All of lore.kernel.org
 help / color / mirror / Atom feed
* Questions about KMS flip
@ 2021-11-04 12:51 Christian König
  2021-11-04 16:44 ` Harry Wentland
  0 siblings, 1 reply; 28+ messages in thread
From: Christian König @ 2021-11-04 12:51 UTC (permalink / raw)
  To: Michel Dänzer, Wentland, Harry, Daniel Vetter, dri-devel

Hi guys,

adding the usual suspects which might know that of hand: When we do a 
KMS page flip, who keeps the reference to the BO while it is scanned out?

We are running into warning backtraces from TTM which look more than odd.

Thanks,
Christian.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Questions about KMS flip
  2021-11-04 12:51 Questions about KMS flip Christian König
@ 2021-11-04 16:44 ` Harry Wentland
  2021-11-05 12:20   ` Ville Syrjälä
  2021-11-05 18:13   ` Daniel Vetter
  0 siblings, 2 replies; 28+ messages in thread
From: Harry Wentland @ 2021-11-04 16:44 UTC (permalink / raw)
  To: Christian König, Michel Dänzer, Daniel Vetter,
	dri-devel, Kazlauskas, Nicholas

+Nick

It looks to be the old drm_plane_state->fb holds that reference. See dm_plane_helper_cleanup_fb() in amdgpu_dm.c.

Harry

On 2021-11-04 08:51, Christian König wrote:
> Hi guys,
> 
> adding the usual suspects which might know that of hand: When we do a KMS page flip, who keeps the reference to the BO while it is scanned out?
> 
> We are running into warning backtraces from TTM which look more than odd.
> 
> Thanks,
> Christian.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Questions about KMS flip
  2021-11-04 16:44 ` Harry Wentland
@ 2021-11-05 12:20   ` Ville Syrjälä
  2021-11-05 18:13   ` Daniel Vetter
  1 sibling, 0 replies; 28+ messages in thread
From: Ville Syrjälä @ 2021-11-05 12:20 UTC (permalink / raw)
  To: Harry Wentland
  Cc: dri-devel, Michel Dänzer, Christian König, Kazlauskas,
	Nicholas

On Thu, Nov 04, 2021 at 12:44:34PM -0400, Harry Wentland wrote:
> +Nick
> 
> It looks to be the old drm_plane_state->fb holds that reference. See dm_plane_helper_cleanup_fb() in amdgpu_dm.c.

BTW looks like you have a possible leak during fb init;
amdgpu_display_framebuffer_init() grabs the refs to the BOs,
but drm_framebuffer_init() might still fail (at least
theoretically) which will then leak those BO refs.

> 
> Harry
> 
> On 2021-11-04 08:51, Christian König wrote:
> > Hi guys,
> > 
> > adding the usual suspects which might know that of hand: When we do a KMS page flip, who keeps the reference to the BO while it is scanned out?
> > 
> > We are running into warning backtraces from TTM which look more than odd.
> > 
> > Thanks,
> > Christian.

-- 
Ville Syrjälä
Intel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Questions about KMS flip
  2021-11-04 16:44 ` Harry Wentland
  2021-11-05 12:20   ` Ville Syrjälä
@ 2021-11-05 18:13   ` Daniel Vetter
  2021-11-08  7:44     ` Christian König
  1 sibling, 1 reply; 28+ messages in thread
From: Daniel Vetter @ 2021-11-05 18:13 UTC (permalink / raw)
  To: Harry Wentland
  Cc: dri-devel, Michel Dänzer, Christian König, Kazlauskas,
	Nicholas

On Thu, Nov 04, 2021 at 12:44:34PM -0400, Harry Wentland wrote:
> +Nick
> 
> It looks to be the old drm_plane_state->fb holds that reference. See dm_plane_helper_cleanup_fb() in amdgpu_dm.c.

Yup plane state holds reference for its entire existing (well this holds
in general as design principle for atomic state structs, just makes life
easier). And the plane state is guaranteed to exist from when we first pin
(prepare_fb plane hook) to when it's getting unpinned (cleanup_fb plane
hook).

Out of curiosity, what's blowing up?
-Daniel

> 
> Harry
> 
> On 2021-11-04 08:51, Christian König wrote:
> > Hi guys,
> > 
> > adding the usual suspects which might know that of hand: When we do a KMS page flip, who keeps the reference to the BO while it is scanned out?
> > 
> > We are running into warning backtraces from TTM which look more than odd.
> > 
> > Thanks,
> > Christian.
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Questions about KMS flip
  2021-11-05 18:13   ` Daniel Vetter
@ 2021-11-08  7:44     ` Christian König
  2021-11-08 14:59       ` Daniel Vetter
  0 siblings, 1 reply; 28+ messages in thread
From: Christian König @ 2021-11-08  7:44 UTC (permalink / raw)
  To: Daniel Vetter, Harry Wentland
  Cc: Michel Dänzer, Kazlauskas, Nicholas, dri-devel



Am 05.11.21 um 19:13 schrieb Daniel Vetter:
> On Thu, Nov 04, 2021 at 12:44:34PM -0400, Harry Wentland wrote:
>> +Nick
>>
>> It looks to be the old drm_plane_state->fb holds that reference. See dm_plane_helper_cleanup_fb() in amdgpu_dm.c.
> Yup plane state holds reference for its entire existing (well this holds
> in general as design principle for atomic state structs, just makes life
> easier). And the plane state is guaranteed to exist from when we first pin
> (prepare_fb plane hook) to when it's getting unpinned (cleanup_fb plane
> hook).
>
> Out of curiosity, what's blowing up?

The TTM pin count warning. What happens is that we try to free up a BO 
while it is still being pinned.

My best guess is that some DMA-buf client is doing something wrong, but 
it could of course also be that the BO was pinned for scanout.

Christian.

> -Daniel
>
>> Harry
>>
>> On 2021-11-04 08:51, Christian König wrote:
>>> Hi guys,
>>>
>>> adding the usual suspects which might know that of hand: When we do a KMS page flip, who keeps the reference to the BO while it is scanned out?
>>>
>>> We are running into warning backtraces from TTM which look more than odd.
>>>
>>> Thanks,
>>> Christian.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Questions about KMS flip
  2021-11-08  7:44     ` Christian König
@ 2021-11-08 14:59       ` Daniel Vetter
  2021-11-10 10:18         ` Christian König
  0 siblings, 1 reply; 28+ messages in thread
From: Daniel Vetter @ 2021-11-08 14:59 UTC (permalink / raw)
  To: Christian König; +Cc: Michel Dänzer, dri-devel, Kazlauskas, Nicholas

On Mon, Nov 08, 2021 at 08:44:24AM +0100, Christian König wrote:
> Am 05.11.21 um 19:13 schrieb Daniel Vetter:
> > On Thu, Nov 04, 2021 at 12:44:34PM -0400, Harry Wentland wrote:
> > > +Nick
> > > 
> > > It looks to be the old drm_plane_state->fb holds that reference. See dm_plane_helper_cleanup_fb() in amdgpu_dm.c.
> > Yup plane state holds reference for its entire existing (well this holds
> > in general as design principle for atomic state structs, just makes life
> > easier). And the plane state is guaranteed to exist from when we first pin
> > (prepare_fb plane hook) to when it's getting unpinned (cleanup_fb plane
> > hook).
> > 
> > Out of curiosity, what's blowing up?
> 
> The TTM pin count warning. What happens is that we try to free up a BO while
> it is still being pinned.
> 
> My best guess is that some DMA-buf client is doing something wrong, but it
> could of course also be that the BO was pinned for scanout.

We check in dma_buf_release whether there's anything left over, so I think
the dma-buf scenario is rather unlikely.

I guess worst case we could add a cookie struct to dma_buf_pin that you
need to pass to dma_buf_unpin, and wherein we can capture a backtrace. Or
maybe implement that in ttm even. Otherwise I don't have good ideas.
-Daniel

> 
> Christian.
> 
> > -Daniel
> > 
> > > Harry
> > > 
> > > On 2021-11-04 08:51, Christian König wrote:
> > > > Hi guys,
> > > > 
> > > > adding the usual suspects which might know that of hand: When we do a KMS page flip, who keeps the reference to the BO while it is scanned out?
> > > > 
> > > > We are running into warning backtraces from TTM which look more than odd.
> > > > 
> > > > Thanks,
> > > > Christian.
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Questions about KMS flip
  2021-11-08 14:59       ` Daniel Vetter
@ 2021-11-10 10:18         ` Christian König
  2021-11-10 13:26           ` Daniel Vetter
  0 siblings, 1 reply; 28+ messages in thread
From: Christian König @ 2021-11-10 10:18 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Michel Dänzer, Kazlauskas, Nicholas, dri-devel

Am 08.11.21 um 15:59 schrieb Daniel Vetter:
> On Mon, Nov 08, 2021 at 08:44:24AM +0100, Christian König wrote:
>> Am 05.11.21 um 19:13 schrieb Daniel Vetter:
>>> On Thu, Nov 04, 2021 at 12:44:34PM -0400, Harry Wentland wrote:
>>>> +Nick
>>>>
>>>> It looks to be the old drm_plane_state->fb holds that reference. See dm_plane_helper_cleanup_fb() in amdgpu_dm.c.
>>> Yup plane state holds reference for its entire existing (well this holds
>>> in general as design principle for atomic state structs, just makes life
>>> easier). And the plane state is guaranteed to exist from when we first pin
>>> (prepare_fb plane hook) to when it's getting unpinned (cleanup_fb plane
>>> hook).
>>>
>>> Out of curiosity, what's blowing up?
>> The TTM pin count warning. What happens is that we try to free up a BO while
>> it is still being pinned.
>>
>> My best guess is that some DMA-buf client is doing something wrong, but it
>> could of course also be that the BO was pinned for scanout.
> We check in dma_buf_release whether there's anything left over, so I think
> the dma-buf scenario is rather unlikely.

That was unfortunately only added quite recently and is currently 
backported to older kernels.

> I guess worst case we could add a cookie struct to dma_buf_pin that you
> need to pass to dma_buf_unpin, and wherein we can capture a backtrace. Or
> maybe implement that in ttm even. Otherwise I don't have good ideas.

I was thinking about something similar for ttm_bo_pin().

BTW: How would I do that? I know that dump_stack() prints the current 
stack trace into the system log, but how would I get this as string?

Thanks,
Christian.

> -Daniel
>
>> Christian.
>>
>>> -Daniel
>>>
>>>> Harry
>>>>
>>>> On 2021-11-04 08:51, Christian König wrote:
>>>>> Hi guys,
>>>>>
>>>>> adding the usual suspects which might know that of hand: When we do a KMS page flip, who keeps the reference to the BO while it is scanned out?
>>>>>
>>>>> We are running into warning backtraces from TTM which look more than odd.
>>>>>
>>>>> Thanks,
>>>>> Christian.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Questions about KMS flip
  2021-11-10 10:18         ` Christian König
@ 2021-11-10 13:26           ` Daniel Vetter
  2021-11-12 12:47               ` Christian König
  0 siblings, 1 reply; 28+ messages in thread
From: Daniel Vetter @ 2021-11-10 13:26 UTC (permalink / raw)
  To: Christian König; +Cc: Michel Dänzer, dri-devel, Kazlauskas, Nicholas

On Wed, Nov 10, 2021 at 11:18:02AM +0100, Christian König wrote:
> Am 08.11.21 um 15:59 schrieb Daniel Vetter:
> > On Mon, Nov 08, 2021 at 08:44:24AM +0100, Christian König wrote:
> > > Am 05.11.21 um 19:13 schrieb Daniel Vetter:
> > > > On Thu, Nov 04, 2021 at 12:44:34PM -0400, Harry Wentland wrote:
> > > > > +Nick
> > > > > 
> > > > > It looks to be the old drm_plane_state->fb holds that reference. See dm_plane_helper_cleanup_fb() in amdgpu_dm.c.
> > > > Yup plane state holds reference for its entire existing (well this holds
> > > > in general as design principle for atomic state structs, just makes life
> > > > easier). And the plane state is guaranteed to exist from when we first pin
> > > > (prepare_fb plane hook) to when it's getting unpinned (cleanup_fb plane
> > > > hook).
> > > > 
> > > > Out of curiosity, what's blowing up?
> > > The TTM pin count warning. What happens is that we try to free up a BO while
> > > it is still being pinned.
> > > 
> > > My best guess is that some DMA-buf client is doing something wrong, but it
> > > could of course also be that the BO was pinned for scanout.
> > We check in dma_buf_release whether there's anything left over, so I think
> > the dma-buf scenario is rather unlikely.
> 
> That was unfortunately only added quite recently and is currently backported
> to older kernels.
> 
> > I guess worst case we could add a cookie struct to dma_buf_pin that you
> > need to pass to dma_buf_unpin, and wherein we can capture a backtrace. Or
> > maybe implement that in ttm even. Otherwise I don't have good ideas.
> 
> I was thinking about something similar for ttm_bo_pin().
> 
> BTW: How would I do that? I know that dump_stack() prints the current stack
> trace into the system log, but how would I get this as string?

stack depot, not stuff it into a string. stack depot is a lot more
efficient and capturing backtraces, since it de-dupes and hashes and all
you need is a pointer iirc. linux/stackdepot.h for interface, I think we
recently gained a few more users of it in drm. It's _really_ nifty.
-Daniel


> 
> Thanks,
> Christian.
> 
> > -Daniel
> > 
> > > Christian.
> > > 
> > > > -Daniel
> > > > 
> > > > > Harry
> > > > > 
> > > > > On 2021-11-04 08:51, Christian König wrote:
> > > > > > Hi guys,
> > > > > > 
> > > > > > adding the usual suspects which might know that of hand: When we do a KMS page flip, who keeps the reference to the BO while it is scanned out?
> > > > > > 
> > > > > > We are running into warning backtraces from TTM which look more than odd.
> > > > > > 
> > > > > > Thanks,
> > > > > > Christian.
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Questions about KMS flip
  2021-11-10 13:26           ` Daniel Vetter
@ 2021-11-12 12:47               ` Christian König
  0 siblings, 0 replies; 28+ messages in thread
From: Christian König @ 2021-11-12 12:47 UTC (permalink / raw)
  To: Harry Wentland, Kazlauskas, Nicholas, Yu, Lang
  Cc: Michel Dänzer, amd-gfx list, dri-devel

Hi guys,

Am 10.11.21 um 14:26 schrieb Daniel Vetter:
> [SNIP]
> stack depot, not stuff it into a string. stack depot is a lot more
> efficient and capturing backtraces, since it de-dupes and hashes and all
> you need is a pointer iirc. linux/stackdepot.h for interface, I think we
> recently gained a few more users of it in drm. It's _really_ nifty.
> -Daniel

thanks for the pointer Daniel, that framework indeed looks like it can 
become handy.

Anyway this unfortunately turned out to be work for Harray and Nicholas. 
In detail it's about this bug report here: 
https://bugzilla.kernel.org/show_bug.cgi?id=214621

Lang was able to reproduce the issue and narrow it down to the pin in 
amdgpu_display_crtc_page_flip_target().

In other words we somehow have an unbalanced pinning of the scanout 
buffer in DC.

What should we do about that?

Regards,
Christian.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Questions about KMS flip
@ 2021-11-12 12:47               ` Christian König
  0 siblings, 0 replies; 28+ messages in thread
From: Christian König @ 2021-11-12 12:47 UTC (permalink / raw)
  To: Harry Wentland, Kazlauskas, Nicholas, Yu, Lang
  Cc: Michel Dänzer, amd-gfx list, Daniel Vetter, dri-devel

Hi guys,

Am 10.11.21 um 14:26 schrieb Daniel Vetter:
> [SNIP]
> stack depot, not stuff it into a string. stack depot is a lot more
> efficient and capturing backtraces, since it de-dupes and hashes and all
> you need is a pointer iirc. linux/stackdepot.h for interface, I think we
> recently gained a few more users of it in drm. It's _really_ nifty.
> -Daniel

thanks for the pointer Daniel, that framework indeed looks like it can 
become handy.

Anyway this unfortunately turned out to be work for Harray and Nicholas. 
In detail it's about this bug report here: 
https://bugzilla.kernel.org/show_bug.cgi?id=214621

Lang was able to reproduce the issue and narrow it down to the pin in 
amdgpu_display_crtc_page_flip_target().

In other words we somehow have an unbalanced pinning of the scanout 
buffer in DC.

What should we do about that?

Regards,
Christian.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Questions about KMS flip
  2021-11-12 12:47               ` Christian König
  (?)
@ 2021-11-12 14:29               ` Michel Dänzer
  2021-11-12 14:30                 ` Michel Dänzer
  -1 siblings, 1 reply; 28+ messages in thread
From: Michel Dänzer @ 2021-11-12 14:29 UTC (permalink / raw)
  To: Christian König, Harry Wentland, Kazlauskas, Nicholas, Yu, Lang
  Cc: dri-devel, amd-gfx list

On 2021-11-12 13:47, Christian König wrote:
> 
> Anyway this unfortunately turned out to be work for Harray and Nicholas. In detail it's about this bug report here: https://bugzilla.kernel.org/show_bug.cgi?id=214621
> 
> Lang was able to reproduce the issue and narrow it down to the pin in amdgpu_display_crtc_page_flip_target().
> 
> In other words we somehow have an unbalanced pinning of the scanout buffer in DC.

DC doesn't use amdgpu_display_crtc_page_flip_target AFAICT. The corresponding pin with DC would be in dm_plane_helper_prepare_fb, paired with the unpin in 
dm_plane_helper_cleanup_fb.


With non-DC, the pin in amdgpu_display_crtc_page_flip_target is paired with the unpin in dm_plane_helper_cleanup_fb & dce_v*_crtc_disable. One thing I notice is that the pin is guarded by if (!adev->enable_virtual_display), but the unpins seem unconditional. So could this be about virtual display, and the problem is actually trying to unpin a BO that was never pinned?


-- 
Earthling Michel Dänzer            |                  https://redhat.com
Libre software enthusiast          |         Mesa and Xwayland developer

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Questions about KMS flip
  2021-11-12 14:29               ` Michel Dänzer
@ 2021-11-12 14:30                 ` Michel Dänzer
  2021-11-12 15:03                   ` Christian König
  0 siblings, 1 reply; 28+ messages in thread
From: Michel Dänzer @ 2021-11-12 14:30 UTC (permalink / raw)
  To: Christian König, Harry Wentland, Kazlauskas, Nicholas, Yu, Lang
  Cc: amd-gfx list, dri-devel

On 2021-11-12 15:29, Michel Dänzer wrote:
> On 2021-11-12 13:47, Christian König wrote:
>>
>> Anyway this unfortunately turned out to be work for Harray and Nicholas. In detail it's about this bug report here: https://bugzilla.kernel.org/show_bug.cgi?id=214621
>>
>> Lang was able to reproduce the issue and narrow it down to the pin in amdgpu_display_crtc_page_flip_target().
>>
>> In other words we somehow have an unbalanced pinning of the scanout buffer in DC.
> 
> DC doesn't use amdgpu_display_crtc_page_flip_target AFAICT. The corresponding pin with DC would be in dm_plane_helper_prepare_fb, paired with the unpin in 
> dm_plane_helper_cleanup_fb.
> 
> 
> With non-DC, the pin in amdgpu_display_crtc_page_flip_target is paired with the unpin in dm_plane_helper_cleanup_fb

This should say amdgpu_display_unpin_work_func.


> & dce_v*_crtc_disable. One thing I notice is that the pin is guarded by if (!adev->enable_virtual_display), but the unpins seem unconditional. So could this be about virtual display, and the problem is actually trying to unpin a BO that was never pinned?
> 
> 


-- 
Earthling Michel Dänzer            |                  https://redhat.com
Libre software enthusiast          |         Mesa and Xwayland developer

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Questions about KMS flip
  2021-11-12 14:30                 ` Michel Dänzer
@ 2021-11-12 15:03                   ` Christian König
  2021-11-12 16:10                     ` Michel Dänzer
  0 siblings, 1 reply; 28+ messages in thread
From: Christian König @ 2021-11-12 15:03 UTC (permalink / raw)
  To: Michel Dänzer, Christian König, Harry Wentland,
	Kazlauskas, Nicholas, Yu, Lang
  Cc: dri-devel, amd-gfx list



Am 12.11.21 um 15:30 schrieb Michel Dänzer:
> On 2021-11-12 15:29, Michel Dänzer wrote:
>> On 2021-11-12 13:47, Christian König wrote:
>>> Anyway this unfortunately turned out to be work for Harray and Nicholas. In detail it's about this bug report here: https://bugzilla.kernel.org/show_bug.cgi?id=214621
>>>
>>> Lang was able to reproduce the issue and narrow it down to the pin in amdgpu_display_crtc_page_flip_target().
>>>
>>> In other words we somehow have an unbalanced pinning of the scanout buffer in DC.
>> DC doesn't use amdgpu_display_crtc_page_flip_target AFAICT. The corresponding pin with DC would be in dm_plane_helper_prepare_fb, paired with the unpin in
>> dm_plane_helper_cleanup_fb.
>>
>>
>> With non-DC, the pin in amdgpu_display_crtc_page_flip_target is paired with the unpin in dm_plane_helper_cleanup_fb
> This should say amdgpu_display_unpin_work_func.

Ah! So that is the classic (e.g. non atomic) path?

>> & dce_v*_crtc_disable. One thing I notice is that the pin is guarded by if (!adev->enable_virtual_display), but the unpins seem unconditional. So could this be about virtual display, and the problem is actually trying to unpin a BO that was never pinned?

Nope, my educated guess is rather that we free up the BO before 
amdgpu_display_unpin_work_func is called.

E.g. not pin unbalance, but rather use after free.

Regards,
Christian.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Questions about KMS flip
  2021-11-12 15:03                   ` Christian König
@ 2021-11-12 16:10                     ` Michel Dänzer
  2021-11-15  6:41                         ` Lang Yu
  2021-11-15  7:25                       ` Christian König
  0 siblings, 2 replies; 28+ messages in thread
From: Michel Dänzer @ 2021-11-12 16:10 UTC (permalink / raw)
  To: Christian König, Christian König, Harry Wentland,
	Kazlauskas, Nicholas, Yu, Lang
  Cc: amd-gfx list, dri-devel

On 2021-11-12 16:03, Christian König wrote:
> Am 12.11.21 um 15:30 schrieb Michel Dänzer:
>> On 2021-11-12 15:29, Michel Dänzer wrote:
>>> On 2021-11-12 13:47, Christian König wrote:
>>>> Anyway this unfortunately turned out to be work for Harray and Nicholas. In detail it's about this bug report here: https://bugzilla.kernel.org/show_bug.cgi?id=214621
>>>>
>>>> Lang was able to reproduce the issue and narrow it down to the pin in amdgpu_display_crtc_page_flip_target().
>>>>
>>>> In other words we somehow have an unbalanced pinning of the scanout buffer in DC.
>>> DC doesn't use amdgpu_display_crtc_page_flip_target AFAICT. The corresponding pin with DC would be in dm_plane_helper_prepare_fb, paired with the unpin in
>>> dm_plane_helper_cleanup_fb.
>>>
>>>
>>> With non-DC, the pin in amdgpu_display_crtc_page_flip_target is paired with the unpin in dm_plane_helper_cleanup_fb
>> This should say amdgpu_display_unpin_work_func.
> 
> Ah! So that is the classic (e.g. non atomic) path?

Presumably.


>>> & dce_v*_crtc_disable. One thing I notice is that the pin is guarded by if (!adev->enable_virtual_display), but the unpins seem unconditional. So could this be about virtual display, and the problem is actually trying to unpin a BO that was never pinned?
> 
> Nope, my educated guess is rather that we free up the BO before amdgpu_display_unpin_work_func is called.
> 
> E.g. not pin unbalance, but rather use after free.

amdgpu_display_crtc_page_flip_target calls amdgpu_bo_ref(work->old_abo), and amdgpu_display_unpin_work_func calls amdgpu_bo_unref(&work->old_abo) only after amdgpu_bo_unpin. So what you describe could only happen if there's an imbalance elsewhere such that amdgpu_bo_unref is called more often than amdgpu_bo_ref, or maybe if amdgpu_bo_reserve fails in amdgpu_display_unpin_work_func (in which case the "failed to reserve buffer after flip" error message should appear in dmesg).


-- 
Earthling Michel Dänzer            |                  https://redhat.com
Libre software enthusiast          |         Mesa and Xwayland developer

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Questions about KMS flip
  2021-11-12 16:10                     ` Michel Dänzer
@ 2021-11-15  6:41                         ` Lang Yu
  2021-11-15  7:25                       ` Christian König
  1 sibling, 0 replies; 28+ messages in thread
From: Lang Yu @ 2021-11-15  6:41 UTC (permalink / raw)
  To: Michel DDDnzer
  Cc: Christian KKKnig, amd-gfx list, Kazlauskas, Nicholas, dri-devel,
	Christian KKKnig

On Fri, Nov 12, 2021 at 05:10:27PM +0100, Michel DDDnzer wrote:
> On 2021-11-12 16:03, Christian König wrote:
> > Am 12.11.21 um 15:30 schrieb Michel Dänzer:
> >> On 2021-11-12 15:29, Michel Dänzer wrote:
> >>> On 2021-11-12 13:47, Christian König wrote:
> >>>> Anyway this unfortunately turned out to be work for Harray and Nicholas. In detail it's about this bug report here: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D214621&data=04%7C01%7CLang.Yu%40amd.com%7Cca557eab16864ab544a108d9a5f6f288%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637723302338811983%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=9oG6k22BVp%2BkSvRX6uMlGXc6G%2BA5UL0Qy3W5iDTpvzs%3D&reserved=0
> >>>>
> >>>> Lang was able to reproduce the issue and narrow it down to the pin in amdgpu_display_crtc_page_flip_target().
> >>>>
> >>>> In other words we somehow have an unbalanced pinning of the scanout buffer in DC.
> >>> DC doesn't use amdgpu_display_crtc_page_flip_target AFAICT. The corresponding pin with DC would be in dm_plane_helper_prepare_fb, paired with the unpin in
> >>> dm_plane_helper_cleanup_fb.
> >>>
> >>>
> >>> With non-DC, the pin in amdgpu_display_crtc_page_flip_target is paired with the unpin in dm_plane_helper_cleanup_fb
> >> This should say amdgpu_display_unpin_work_func.
> > 
> > Ah! So that is the classic (e.g. non atomic) path?
> 
> Presumably.
> 
> 
> >>> & dce_v*_crtc_disable. One thing I notice is that the pin is guarded by if (!adev->enable_virtual_display), but the unpins seem unconditional. So could this be about virtual display, and the problem is actually trying to unpin a BO that was never pinned?
> > 
> > Nope, my educated guess is rather that we free up the BO before amdgpu_display_unpin_work_func is called.
> > 
> > E.g. not pin unbalance, but rather use after free.
> 
> amdgpu_display_crtc_page_flip_target calls amdgpu_bo_ref(work->old_abo), and amdgpu_display_unpin_work_func calls amdgpu_bo_unref(&work->old_abo) only after amdgpu_bo_unpin. So what you describe could only happen if there's an imbalance elsewhere such that amdgpu_bo_unref is called more often than amdgpu_bo_ref, or maybe if amdgpu_bo_reserve fails in amdgpu_display_unpin_work_func (in which case the "failed to reserve buffer after flip" error message should appear in dmesg).


Actually, each call to amdgpu_display_crtc_page_flip_target() will

1, init a work(amdgpu_display_unpin_work_func) to unpin an old buffer
   (crtc->primary->fb), the work will be queued in dce_vX_0_pageflip_irq().

2, pin a new buffer, assign it to crtc->primary->fb. But how to unpin it?
   Next call.

The problem is the pinned buffer of last call to 
amdgpu_display_crtc_page_flip_target() is not unpinned.

It should be an imbalance call to pin/unpin.

> 
> -- 
> Earthling Michel Dänzer            |                  https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fredhat.com%2F&data=04%7C01%7CLang.Yu%40amd.com%7Cca557eab16864ab544a108d9a5f6f288%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637723302338811983%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=fJroRJc4SYuUXX1U52X%2FktYpAKaADg3kM9PsONnKu8c%3D&reserved=0
> Libre software enthusiast          |         Mesa and Xwayland developer

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Questions about KMS flip
@ 2021-11-15  6:41                         ` Lang Yu
  0 siblings, 0 replies; 28+ messages in thread
From: Lang Yu @ 2021-11-15  6:41 UTC (permalink / raw)
  To: Michel DDDnzer
  Cc: Christian KKKnig, amd-gfx list, Kazlauskas, Nicholas, dri-devel,
	Harry Wentland, Christian KKKnig

On Fri, Nov 12, 2021 at 05:10:27PM +0100, Michel DDDnzer wrote:
> On 2021-11-12 16:03, Christian König wrote:
> > Am 12.11.21 um 15:30 schrieb Michel Dänzer:
> >> On 2021-11-12 15:29, Michel Dänzer wrote:
> >>> On 2021-11-12 13:47, Christian König wrote:
> >>>> Anyway this unfortunately turned out to be work for Harray and Nicholas. In detail it's about this bug report here: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D214621&data=04%7C01%7CLang.Yu%40amd.com%7Cca557eab16864ab544a108d9a5f6f288%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637723302338811983%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=9oG6k22BVp%2BkSvRX6uMlGXc6G%2BA5UL0Qy3W5iDTpvzs%3D&reserved=0
> >>>>
> >>>> Lang was able to reproduce the issue and narrow it down to the pin in amdgpu_display_crtc_page_flip_target().
> >>>>
> >>>> In other words we somehow have an unbalanced pinning of the scanout buffer in DC.
> >>> DC doesn't use amdgpu_display_crtc_page_flip_target AFAICT. The corresponding pin with DC would be in dm_plane_helper_prepare_fb, paired with the unpin in
> >>> dm_plane_helper_cleanup_fb.
> >>>
> >>>
> >>> With non-DC, the pin in amdgpu_display_crtc_page_flip_target is paired with the unpin in dm_plane_helper_cleanup_fb
> >> This should say amdgpu_display_unpin_work_func.
> > 
> > Ah! So that is the classic (e.g. non atomic) path?
> 
> Presumably.
> 
> 
> >>> & dce_v*_crtc_disable. One thing I notice is that the pin is guarded by if (!adev->enable_virtual_display), but the unpins seem unconditional. So could this be about virtual display, and the problem is actually trying to unpin a BO that was never pinned?
> > 
> > Nope, my educated guess is rather that we free up the BO before amdgpu_display_unpin_work_func is called.
> > 
> > E.g. not pin unbalance, but rather use after free.
> 
> amdgpu_display_crtc_page_flip_target calls amdgpu_bo_ref(work->old_abo), and amdgpu_display_unpin_work_func calls amdgpu_bo_unref(&work->old_abo) only after amdgpu_bo_unpin. So what you describe could only happen if there's an imbalance elsewhere such that amdgpu_bo_unref is called more often than amdgpu_bo_ref, or maybe if amdgpu_bo_reserve fails in amdgpu_display_unpin_work_func (in which case the "failed to reserve buffer after flip" error message should appear in dmesg).


Actually, each call to amdgpu_display_crtc_page_flip_target() will

1, init a work(amdgpu_display_unpin_work_func) to unpin an old buffer
   (crtc->primary->fb), the work will be queued in dce_vX_0_pageflip_irq().

2, pin a new buffer, assign it to crtc->primary->fb. But how to unpin it?
   Next call.

The problem is the pinned buffer of last call to 
amdgpu_display_crtc_page_flip_target() is not unpinned.

It should be an imbalance call to pin/unpin.

> 
> -- 
> Earthling Michel Dänzer            |                  https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fredhat.com%2F&data=04%7C01%7CLang.Yu%40amd.com%7Cca557eab16864ab544a108d9a5f6f288%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637723302338811983%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=fJroRJc4SYuUXX1U52X%2FktYpAKaADg3kM9PsONnKu8c%3D&reserved=0
> Libre software enthusiast          |         Mesa and Xwayland developer

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Questions about KMS flip
  2021-11-12 16:10                     ` Michel Dänzer
  2021-11-15  6:41                         ` Lang Yu
@ 2021-11-15  7:25                       ` Christian König
  1 sibling, 0 replies; 28+ messages in thread
From: Christian König @ 2021-11-15  7:25 UTC (permalink / raw)
  To: Michel Dänzer, Christian König, Harry Wentland,
	Kazlauskas, Nicholas, Yu, Lang
  Cc: amd-gfx list, dri-devel

Am 12.11.21 um 17:10 schrieb Michel Dänzer:
> On 2021-11-12 16:03, Christian König wrote:
>> Am 12.11.21 um 15:30 schrieb Michel Dänzer:
>>> On 2021-11-12 15:29, Michel Dänzer wrote:
>>>> On 2021-11-12 13:47, Christian König wrote:
>>>>> Anyway this unfortunately turned out to be work for Harray and Nicholas. In detail it's about this bug report here: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D214621&data=04%7C01%7Cchristian.koenig%40amd.com%7Cca557eab16864ab544a108d9a5f6f288%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637723302340621335%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=pvLGq%2FJRvVy0k5GMGF2UPotCSdbiQNfndtjI14luAUg%3D&reserved=0
>>>>>
>>>>> Lang was able to reproduce the issue and narrow it down to the pin in amdgpu_display_crtc_page_flip_target().
>>>>>
>>>>> In other words we somehow have an unbalanced pinning of the scanout buffer in DC.
>>>> DC doesn't use amdgpu_display_crtc_page_flip_target AFAICT. The corresponding pin with DC would be in dm_plane_helper_prepare_fb, paired with the unpin in
>>>> dm_plane_helper_cleanup_fb.
>>>>
>>>>
>>>> With non-DC, the pin in amdgpu_display_crtc_page_flip_target is paired with the unpin in dm_plane_helper_cleanup_fb
>>> This should say amdgpu_display_unpin_work_func.
>> Ah! So that is the classic (e.g. non atomic) path?
> Presumably.
>
>
>>>> & dce_v*_crtc_disable. One thing I notice is that the pin is guarded by if (!adev->enable_virtual_display), but the unpins seem unconditional. So could this be about virtual display, and the problem is actually trying to unpin a BO that was never pinned?
>> Nope, my educated guess is rather that we free up the BO before amdgpu_display_unpin_work_func is called.
>>
>> E.g. not pin unbalance, but rather use after free.
> amdgpu_display_crtc_page_flip_target calls amdgpu_bo_ref(work->old_abo), and amdgpu_display_unpin_work_func calls amdgpu_bo_unref(&work->old_abo) only after amdgpu_bo_unpin. So what you describe could only happen if there's an imbalance elsewhere such that amdgpu_bo_unref is called more often than amdgpu_bo_ref, or maybe if amdgpu_bo_reserve fails in amdgpu_display_unpin_work_func (in which case the "failed to reserve buffer after flip" error message should appear in dmesg).

Yeah, seen that in the meantime as well.

But we also have a WARN_ON() when the pincount overruns, so that can't 
be it either.

Long story short I have no idea what's going on here.

Regards,
Christian.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Questions about KMS flip
  2021-11-15  6:41                         ` Lang Yu
  (?)
@ 2021-11-15  8:38                         ` Michel Dänzer
  2021-11-15  9:04                           ` Lang Yu
  -1 siblings, 1 reply; 28+ messages in thread
From: Michel Dänzer @ 2021-11-15  8:38 UTC (permalink / raw)
  To: Lang Yu
  Cc: Christian König, dri-devel, Kazlauskas, Nicholas,
	amd-gfx list, Christian König

On 2021-11-15 07:41, Lang Yu wrote:
> On Fri, Nov 12, 2021 at 05:10:27PM +0100, Michel DDDnzer wrote:
>> On 2021-11-12 16:03, Christian König wrote:
>>> Am 12.11.21 um 15:30 schrieb Michel Dänzer:
>>>> On 2021-11-12 15:29, Michel Dänzer wrote:
>>>>> On 2021-11-12 13:47, Christian König wrote:
>>>>>> Anyway this unfortunately turned out to be work for Harray and Nicholas. In detail it's about this bug report here: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D214621&data=04%7C01%7CLang.Yu%40amd.com%7Cca557eab16864ab544a108d9a5f6f288%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637723302338811983%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=9oG6k22BVp%2BkSvRX6uMlGXc6G%2BA5UL0Qy3W5iDTpvzs%3D&reserved=0
>>>>>>
>>>>>> Lang was able to reproduce the issue and narrow it down to the pin in amdgpu_display_crtc_page_flip_target().
>>>>>>
>>>>>> In other words we somehow have an unbalanced pinning of the scanout buffer in DC.
>>>>> DC doesn't use amdgpu_display_crtc_page_flip_target AFAICT. The corresponding pin with DC would be in dm_plane_helper_prepare_fb, paired with the unpin in
>>>>> dm_plane_helper_cleanup_fb.
>>>>>
>>>>>
>>>>> With non-DC, the pin in amdgpu_display_crtc_page_flip_target is paired with the unpin in dm_plane_helper_cleanup_fb
>>>> This should say amdgpu_display_unpin_work_func.
>>>
>>> Ah! So that is the classic (e.g. non atomic) path?
>>
>> Presumably.
>>
>>
>>>>> & dce_v*_crtc_disable. One thing I notice is that the pin is guarded by if (!adev->enable_virtual_display), but the unpins seem unconditional. So could this be about virtual display, and the problem is actually trying to unpin a BO that was never pinned?
>>>
>>> Nope, my educated guess is rather that we free up the BO before amdgpu_display_unpin_work_func is called.
>>>
>>> E.g. not pin unbalance, but rather use after free.
>>
>> amdgpu_display_crtc_page_flip_target calls amdgpu_bo_ref(work->old_abo), and amdgpu_display_unpin_work_func calls amdgpu_bo_unref(&work->old_abo) only after amdgpu_bo_unpin. So what you describe could only happen if there's an imbalance elsewhere such that amdgpu_bo_unref is called more often than amdgpu_bo_ref, or maybe if amdgpu_bo_reserve fails in amdgpu_display_unpin_work_func (in which case the "failed to reserve buffer after flip" error message should appear in dmesg).
> 
> 
> Actually, each call to amdgpu_display_crtc_page_flip_target() will
> 
> 1, init a work(amdgpu_display_unpin_work_func) to unpin an old buffer
>    (crtc->primary->fb), the work will be queued in dce_vX_0_pageflip_irq().
> 
> 2, pin a new buffer, assign it to crtc->primary->fb. But how to unpin it?
>    Next call.
> 
> The problem is the pinned buffer of last call to 
> amdgpu_display_crtc_page_flip_target() is not unpinned.

It's unpinned in dce_v*_0_crtc_disable.


I think I've found the problem though: dce_v*_0_crtc_do_set_base pin the BO from target_fb unconditionally, but unpin the BO from the fb parameter only if it's different from the former. So if they're the same, the BO's pin count is incremented by 1.


-- 
Earthling Michel Dänzer            |                  https://redhat.com
Libre software enthusiast          |         Mesa and Xwayland developer

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Questions about KMS flip
  2021-11-15  8:38                         ` Michel Dänzer
@ 2021-11-15  9:04                           ` Lang Yu
  2021-11-15  9:49                             ` Michel Dänzer
  0 siblings, 1 reply; 28+ messages in thread
From: Lang Yu @ 2021-11-15  9:04 UTC (permalink / raw)
  To: Michel DDDnzer
  Cc: Christian KKKnig, dri-devel, Kazlauskas, Nicholas, amd-gfx list,
	Christian KKKnig

On Mon, Nov 15, 2021 at 09:38:47AM +0100, Michel DDDnzer wrote:
> On 2021-11-15 07:41, Lang Yu wrote:
> > On Fri, Nov 12, 2021 at 05:10:27PM +0100, Michel DDDnzer wrote:
> >> On 2021-11-12 16:03, Christian König wrote:
> >>> Am 12.11.21 um 15:30 schrieb Michel Dänzer:
> >>>> On 2021-11-12 15:29, Michel Dänzer wrote:
> >>>>> On 2021-11-12 13:47, Christian König wrote:
> >>>>>> Anyway this unfortunately turned out to be work for Harray and Nicholas. In detail it's about this bug report here: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D214621&data=04%7C01%7CLang.Yu%40amd.com%7C766e41a1c30544442b6b08d9a81358b0%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637725623316543180%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=od%2BNksWOff%2FtsuAYZLX7lIJFQJMY2OScVqhLclPYWAQ%3D&reserved=0
> >>>>>>
> >>>>>> Lang was able to reproduce the issue and narrow it down to the pin in amdgpu_display_crtc_page_flip_target().
> >>>>>>
> >>>>>> In other words we somehow have an unbalanced pinning of the scanout buffer in DC.
> >>>>> DC doesn't use amdgpu_display_crtc_page_flip_target AFAICT. The corresponding pin with DC would be in dm_plane_helper_prepare_fb, paired with the unpin in
> >>>>> dm_plane_helper_cleanup_fb.
> >>>>>
> >>>>>
> >>>>> With non-DC, the pin in amdgpu_display_crtc_page_flip_target is paired with the unpin in dm_plane_helper_cleanup_fb
> >>>> This should say amdgpu_display_unpin_work_func.
> >>>
> >>> Ah! So that is the classic (e.g. non atomic) path?
> >>
> >> Presumably.
> >>
> >>
> >>>>> & dce_v*_crtc_disable. One thing I notice is that the pin is guarded by if (!adev->enable_virtual_display), but the unpins seem unconditional. So could this be about virtual display, and the problem is actually trying to unpin a BO that was never pinned?
> >>>
> >>> Nope, my educated guess is rather that we free up the BO before amdgpu_display_unpin_work_func is called.
> >>>
> >>> E.g. not pin unbalance, but rather use after free.
> >>
> >> amdgpu_display_crtc_page_flip_target calls amdgpu_bo_ref(work->old_abo), and amdgpu_display_unpin_work_func calls amdgpu_bo_unref(&work->old_abo) only after amdgpu_bo_unpin. So what you describe could only happen if there's an imbalance elsewhere such that amdgpu_bo_unref is called more often than amdgpu_bo_ref, or maybe if amdgpu_bo_reserve fails in amdgpu_display_unpin_work_func (in which case the "failed to reserve buffer after flip" error message should appear in dmesg).
> > 
> > 
> > Actually, each call to amdgpu_display_crtc_page_flip_target() will
> > 
> > 1, init a work(amdgpu_display_unpin_work_func) to unpin an old buffer
> >    (crtc->primary->fb), the work will be queued in dce_vX_0_pageflip_irq().
> > 
> > 2, pin a new buffer, assign it to crtc->primary->fb. But how to unpin it?
> >    Next call.
> > 
> > The problem is the pinned buffer of last call to 
> > amdgpu_display_crtc_page_flip_target() is not unpinned.
> 
> It's unpinned in dce_v*_0_crtc_disable.

I just found crtc->primary->fb is NULL when came in dce_v*_0_crtc_disable().
So it's not unpinned...

> 
> I think I've found the problem though: dce_v*_0_crtc_do_set_base pin the BO from target_fb unconditionally, but unpin the BO from the fb parameter only if it's different from the former. So if they're the same, the BO's pin count is incremented by 1.
> 
> 
> -- 
> Earthling Michel Dänzer            |                  https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fredhat.com%2F&data=04%7C01%7CLang.Yu%40amd.com%7C766e41a1c30544442b6b08d9a81358b0%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637725623316543180%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=o2m1p98zVr16n58wdOWBWyEzETMAfEGqMYAtaAZozgo%3D&reserved=0
> Libre software enthusiast          |         Mesa and Xwayland developer

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Questions about KMS flip
  2021-11-15  9:04                           ` Lang Yu
@ 2021-11-15  9:49                             ` Michel Dänzer
  2021-11-15 11:31                               ` Lang Yu
  0 siblings, 1 reply; 28+ messages in thread
From: Michel Dänzer @ 2021-11-15  9:49 UTC (permalink / raw)
  To: Lang Yu
  Cc: Christian KKKnig, amd-gfx list, Kazlauskas, Nicholas, dri-devel,
	Christian KKKnig

On 2021-11-15 10:04, Lang Yu wrote:
> On Mon, Nov 15, 2021 at 09:38:47AM +0100, Michel DDDnzer wrote:
>> On 2021-11-15 07:41, Lang Yu wrote:
>>> On Fri, Nov 12, 2021 at 05:10:27PM +0100, Michel DDDnzer wrote:
>>>> On 2021-11-12 16:03, Christian König wrote:
>>>>> Am 12.11.21 um 15:30 schrieb Michel Dänzer:
>>>>>> On 2021-11-12 15:29, Michel Dänzer wrote:
>>>>>>> On 2021-11-12 13:47, Christian König wrote:
>>>>>>>> Anyway this unfortunately turned out to be work for Harray and Nicholas. In detail it's about this bug report here: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D214621&data=04%7C01%7CLang.Yu%40amd.com%7C766e41a1c30544442b6b08d9a81358b0%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637725623316543180%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=od%2BNksWOff%2FtsuAYZLX7lIJFQJMY2OScVqhLclPYWAQ%3D&reserved=0
>>>>>>>>
>>>>>>>> Lang was able to reproduce the issue and narrow it down to the pin in amdgpu_display_crtc_page_flip_target().
>>>>>>>>
>>>>>>>> In other words we somehow have an unbalanced pinning of the scanout buffer in DC.
>>>>>>> DC doesn't use amdgpu_display_crtc_page_flip_target AFAICT. The corresponding pin with DC would be in dm_plane_helper_prepare_fb, paired with the unpin in
>>>>>>> dm_plane_helper_cleanup_fb.
>>>>>>>
>>>>>>>
>>>>>>> With non-DC, the pin in amdgpu_display_crtc_page_flip_target is paired with the unpin in dm_plane_helper_cleanup_fb
>>>>>> This should say amdgpu_display_unpin_work_func.
>>>>>
>>>>> Ah! So that is the classic (e.g. non atomic) path?
>>>>
>>>> Presumably.
>>>>
>>>>
>>>>>>> & dce_v*_crtc_disable. One thing I notice is that the pin is guarded by if (!adev->enable_virtual_display), but the unpins seem unconditional. So could this be about virtual display, and the problem is actually trying to unpin a BO that was never pinned?
>>>>>
>>>>> Nope, my educated guess is rather that we free up the BO before amdgpu_display_unpin_work_func is called.
>>>>>
>>>>> E.g. not pin unbalance, but rather use after free.
>>>>
>>>> amdgpu_display_crtc_page_flip_target calls amdgpu_bo_ref(work->old_abo), and amdgpu_display_unpin_work_func calls amdgpu_bo_unref(&work->old_abo) only after amdgpu_bo_unpin. So what you describe could only happen if there's an imbalance elsewhere such that amdgpu_bo_unref is called more often than amdgpu_bo_ref, or maybe if amdgpu_bo_reserve fails in amdgpu_display_unpin_work_func (in which case the "failed to reserve buffer after flip" error message should appear in dmesg).
>>>
>>>
>>> Actually, each call to amdgpu_display_crtc_page_flip_target() will
>>>
>>> 1, init a work(amdgpu_display_unpin_work_func) to unpin an old buffer
>>>    (crtc->primary->fb), the work will be queued in dce_vX_0_pageflip_irq().
>>>
>>> 2, pin a new buffer, assign it to crtc->primary->fb. But how to unpin it?
>>>    Next call.
>>>
>>> The problem is the pinned buffer of last call to 
>>> amdgpu_display_crtc_page_flip_target() is not unpinned.
>>
>> It's unpinned in dce_v*_0_crtc_disable.
> 
> I just found crtc->primary->fb is NULL when came in dce_v*_0_crtc_disable().
> So it's not unpinned...

__drm_helper_disable_unused_functions sets crtc->primary->fb = NULL only after calling crtc_funcs->disable. Maybe this path can get hit for a CRTC which was already disabled, in which case crtc->primary->fb == NULL in dce_v*_0_crtc_disable is harmless.

Have you checked for the issue I described below? Should be pretty easy to catch.


>> I think I've found the problem though: dce_v*_0_crtc_do_set_base pin the BO from target_fb unconditionally, but unpin the BO from the fb parameter only if it's different from the former. So if they're the same, the BO's pin count is incremented by 1.


-- 
Earthling Michel Dänzer            |                  https://redhat.com
Libre software enthusiast          |         Mesa and Xwayland developer

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Questions about KMS flip
  2021-11-15  9:49                             ` Michel Dänzer
@ 2021-11-15 11:31                               ` Lang Yu
  2021-11-15 12:04                                 ` Michel Dänzer
  0 siblings, 1 reply; 28+ messages in thread
From: Lang Yu @ 2021-11-15 11:31 UTC (permalink / raw)
  To: Michel DDDnzer
  Cc: Christian KKKnig, amd-gfx list, Kazlauskas, Nicholas, dri-devel,
	Christian KKKnig

On Mon, Nov 15, 2021 at 10:49:39AM +0100, Michel DDDnzer wrote:
> On 2021-11-15 10:04, Lang Yu wrote:
> > On Mon, Nov 15, 2021 at 09:38:47AM +0100, Michel DDDnzer wrote:
> >> On 2021-11-15 07:41, Lang Yu wrote:
> >>> On Fri, Nov 12, 2021 at 05:10:27PM +0100, Michel DDDnzer wrote:
> >>>> On 2021-11-12 16:03, Christian König wrote:
> >>>>> Am 12.11.21 um 15:30 schrieb Michel Dänzer:
> >>>>>> On 2021-11-12 15:29, Michel Dänzer wrote:
> >>>>>>> On 2021-11-12 13:47, Christian König wrote:
> >>>>>>>> Anyway this unfortunately turned out to be work for Harray and Nicholas. In detail it's about this bug report here: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D214621&data=04%7C01%7CLang.Yu%40amd.com%7Cee54c4d055d040ef9f8b08d9a81d3eb9%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637725665833112900%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=7nwIYd1um420XHVpOzeIvz37%2FLQqHF%2F6aRKfzgxUTnM%3D&reserved=0
> >>>>>>>>
> >>>>>>>> Lang was able to reproduce the issue and narrow it down to the pin in amdgpu_display_crtc_page_flip_target().
> >>>>>>>>
> >>>>>>>> In other words we somehow have an unbalanced pinning of the scanout buffer in DC.
> >>>>>>> DC doesn't use amdgpu_display_crtc_page_flip_target AFAICT. The corresponding pin with DC would be in dm_plane_helper_prepare_fb, paired with the unpin in
> >>>>>>> dm_plane_helper_cleanup_fb.
> >>>>>>>
> >>>>>>>
> >>>>>>> With non-DC, the pin in amdgpu_display_crtc_page_flip_target is paired with the unpin in dm_plane_helper_cleanup_fb
> >>>>>> This should say amdgpu_display_unpin_work_func.
> >>>>>
> >>>>> Ah! So that is the classic (e.g. non atomic) path?
> >>>>
> >>>> Presumably.
> >>>>
> >>>>
> >>>>>>> & dce_v*_crtc_disable. One thing I notice is that the pin is guarded by if (!adev->enable_virtual_display), but the unpins seem unconditional. So could this be about virtual display, and the problem is actually trying to unpin a BO that was never pinned?
> >>>>>
> >>>>> Nope, my educated guess is rather that we free up the BO before amdgpu_display_unpin_work_func is called.
> >>>>>
> >>>>> E.g. not pin unbalance, but rather use after free.
> >>>>
> >>>> amdgpu_display_crtc_page_flip_target calls amdgpu_bo_ref(work->old_abo), and amdgpu_display_unpin_work_func calls amdgpu_bo_unref(&work->old_abo) only after amdgpu_bo_unpin. So what you describe could only happen if there's an imbalance elsewhere such that amdgpu_bo_unref is called more often than amdgpu_bo_ref, or maybe if amdgpu_bo_reserve fails in amdgpu_display_unpin_work_func (in which case the "failed to reserve buffer after flip" error message should appear in dmesg).
> >>>
> >>>
> >>> Actually, each call to amdgpu_display_crtc_page_flip_target() will
> >>>
> >>> 1, init a work(amdgpu_display_unpin_work_func) to unpin an old buffer
> >>>    (crtc->primary->fb), the work will be queued in dce_vX_0_pageflip_irq().
> >>>
> >>> 2, pin a new buffer, assign it to crtc->primary->fb. But how to unpin it?
> >>>    Next call.
> >>>
> >>> The problem is the pinned buffer of last call to 
> >>> amdgpu_display_crtc_page_flip_target() is not unpinned.
> >>
> >> It's unpinned in dce_v*_0_crtc_disable.
> > 
> > I just found crtc->primary->fb is NULL when came in dce_v*_0_crtc_disable().
> > So it's not unpinned...
> 
> __drm_helper_disable_unused_functions sets crtc->primary->fb = NULL only after calling crtc_funcs->disable. Maybe this path can get hit for a CRTC which was already disabled, in which case crtc->primary->fb == NULL in dce_v*_0_crtc_disable is harmless.
> 
> Have you checked for the issue I described below? Should be pretty easy to catch.
> 
> 
> >> I think I've found the problem though: dce_v*_0_crtc_do_set_base pin the BO from target_fb unconditionally, but unpin the BO from the fb parameter only if it's different from the former. So if they're the same, the BO's pin count is incremented by 1.

Form my observations, amdgpu_bo_unpin() in dce_v*_0_crtc_disable() is
never called. Though a single call to dce_v*_0_crtc_do_set_base() will 
only pin the BO, I found it will be unpinned in next call to 
dce_v*_0_crtc_do_set_base(). Anyway, these pin/unpin operations looks
error-prone.

Regards,
Lang

> 
> -- 
> Earthling Michel Dänzer            |                  https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fredhat.com%2F&data=04%7C01%7CLang.Yu%40amd.com%7Cee54c4d055d040ef9f8b08d9a81d3eb9%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637725665833112900%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=EppaFKsy78tecP6oIVVuBh3enb%2Fu8jurugqEwUxDCYk%3D&reserved=0
> Libre software enthusiast          |         Mesa and Xwayland developer

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Questions about KMS flip
  2021-11-15 11:31                               ` Lang Yu
@ 2021-11-15 12:04                                 ` Michel Dänzer
  2021-11-16  3:27                                   ` Lang Yu
  0 siblings, 1 reply; 28+ messages in thread
From: Michel Dänzer @ 2021-11-15 12:04 UTC (permalink / raw)
  To: Lang Yu
  Cc: Christian KKKnig, dri-devel, Kazlauskas, Nicholas, amd-gfx list,
	Christian KKKnig

On 2021-11-15 12:31, Lang Yu wrote:
> On Mon, Nov 15, 2021 at 10:49:39AM +0100, Michel DDDnzer wrote:
>> On 2021-11-15 10:04, Lang Yu wrote:
>>> On Mon, Nov 15, 2021 at 09:38:47AM +0100, Michel DDDnzer wrote:
>>>> On 2021-11-15 07:41, Lang Yu wrote:
>>>>> On Fri, Nov 12, 2021 at 05:10:27PM +0100, Michel DDDnzer wrote:
>>>>>> On 2021-11-12 16:03, Christian König wrote:
>>>>>>> Am 12.11.21 um 15:30 schrieb Michel Dänzer:
>>>>>>>> On 2021-11-12 15:29, Michel Dänzer wrote:
>>>>>>>>> On 2021-11-12 13:47, Christian König wrote:
>>>>>>>>>> Anyway this unfortunately turned out to be work for Harray and Nicholas. In detail it's about this bug report here: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D214621&data=04%7C01%7CLang.Yu%40amd.com%7Cee54c4d055d040ef9f8b08d9a81d3eb9%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637725665833112900%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=7nwIYd1um420XHVpOzeIvz37%2FLQqHF%2F6aRKfzgxUTnM%3D&reserved=0
>>>>>>>>>>
>>>>>>>>>> Lang was able to reproduce the issue and narrow it down to the pin in amdgpu_display_crtc_page_flip_target().
>>>>>>>>>>
>>>>>>>>>> In other words we somehow have an unbalanced pinning of the scanout buffer in DC.
>>>>>>>>> DC doesn't use amdgpu_display_crtc_page_flip_target AFAICT. The corresponding pin with DC would be in dm_plane_helper_prepare_fb, paired with the unpin in
>>>>>>>>> dm_plane_helper_cleanup_fb.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> With non-DC, the pin in amdgpu_display_crtc_page_flip_target is paired with the unpin in dm_plane_helper_cleanup_fb
>>>>>>>> This should say amdgpu_display_unpin_work_func.
>>>>>>>
>>>>>>> Ah! So that is the classic (e.g. non atomic) path?
>>>>>>
>>>>>> Presumably.
>>>>>>
>>>>>>
>>>>>>>>> & dce_v*_crtc_disable. One thing I notice is that the pin is guarded by if (!adev->enable_virtual_display), but the unpins seem unconditional. So could this be about virtual display, and the problem is actually trying to unpin a BO that was never pinned?
>>>>>>>
>>>>>>> Nope, my educated guess is rather that we free up the BO before amdgpu_display_unpin_work_func is called.
>>>>>>>
>>>>>>> E.g. not pin unbalance, but rather use after free.
>>>>>>
>>>>>> amdgpu_display_crtc_page_flip_target calls amdgpu_bo_ref(work->old_abo), and amdgpu_display_unpin_work_func calls amdgpu_bo_unref(&work->old_abo) only after amdgpu_bo_unpin. So what you describe could only happen if there's an imbalance elsewhere such that amdgpu_bo_unref is called more often than amdgpu_bo_ref, or maybe if amdgpu_bo_reserve fails in amdgpu_display_unpin_work_func (in which case the "failed to reserve buffer after flip" error message should appear in dmesg).
>>>>>
>>>>>
>>>>> Actually, each call to amdgpu_display_crtc_page_flip_target() will
>>>>>
>>>>> 1, init a work(amdgpu_display_unpin_work_func) to unpin an old buffer
>>>>>    (crtc->primary->fb), the work will be queued in dce_vX_0_pageflip_irq().
>>>>>
>>>>> 2, pin a new buffer, assign it to crtc->primary->fb. But how to unpin it?
>>>>>    Next call.
>>>>>
>>>>> The problem is the pinned buffer of last call to 
>>>>> amdgpu_display_crtc_page_flip_target() is not unpinned.
>>>>
>>>> It's unpinned in dce_v*_0_crtc_disable.
>>>
>>> I just found crtc->primary->fb is NULL when came in dce_v*_0_crtc_disable().
>>> So it's not unpinned...
>>
>> __drm_helper_disable_unused_functions sets crtc->primary->fb = NULL only after calling crtc_funcs->disable. Maybe this path can get hit for a CRTC which was already disabled, in which case crtc->primary->fb == NULL in dce_v*_0_crtc_disable is harmless.
>>
>> Have you checked for the issue I described below? Should be pretty easy to catch.
>>
>>
>>>> I think I've found the problem though: dce_v*_0_crtc_do_set_base pin the BO from target_fb unconditionally, but unpin the BO from the fb parameter only if it's different from the former. So if they're the same, the BO's pin count is incremented by 1.
> 
> Form my observations, amdgpu_bo_unpin() in dce_v*_0_crtc_disable() is
> never called.

It would be expected to happen when the screen turns off, e.g. due to DPMS.


> Though a single call to dce_v*_0_crtc_do_set_base() will 
> only pin the BO, I found it will be unpinned in next call to 
> dce_v*_0_crtc_do_set_base().

Yeah, that's the normal case when the new BO is different from the old one.

To catch the case I described, try something like

diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c

index 18a7b3bd633b..5726bd87a355 100644

--- a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c

+++ b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c

@@ -1926,6 +1926,7 @@ static int dce_v11_0_crtc_do_set_base(struct drm_crtc *crtc,

                return r;



        if (!atomic) {

+               WARN_ON_ONCE(target_fb == fb);

                r = amdgpu_bo_pin(abo, AMDGPU_GEM_DOMAIN_VRAM);

                if (unlikely(r != 0)) {

                        amdgpu_bo_unreserve(abo);



-- 
Earthling Michel Dänzer            |                  https://redhat.com
Libre software enthusiast          |         Mesa and Xwayland developer

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Questions about KMS flip
  2021-11-15 12:04                                 ` Michel Dänzer
@ 2021-11-16  3:27                                   ` Lang Yu
  2021-11-16  7:14                                     ` Christian König
  0 siblings, 1 reply; 28+ messages in thread
From: Lang Yu @ 2021-11-16  3:27 UTC (permalink / raw)
  To: Michel DDDnzer
  Cc: Christian KKKnig, dri-devel, Kazlauskas, Nicholas, amd-gfx list,
	Christian KKKnig

On Mon, Nov 15, 2021 at 01:04:15PM +0100, Michel DDDnzer wrote:
> On 2021-11-15 12:31, Lang Yu wrote:
> > On Mon, Nov 15, 2021 at 10:49:39AM +0100, Michel DDDnzer wrote:
> >> On 2021-11-15 10:04, Lang Yu wrote:
> >>> On Mon, Nov 15, 2021 at 09:38:47AM +0100, Michel DDDnzer wrote:
> >>>> On 2021-11-15 07:41, Lang Yu wrote:
> >>>>> On Fri, Nov 12, 2021 at 05:10:27PM +0100, Michel DDDnzer wrote:
> >>>>>> On 2021-11-12 16:03, Christian König wrote:
> >>>>>>> Am 12.11.21 um 15:30 schrieb Michel Dänzer:
> >>>>>>>> On 2021-11-12 15:29, Michel Dänzer wrote:
> >>>>>>>>> On 2021-11-12 13:47, Christian König wrote:
> >>>>>>>>>> Anyway this unfortunately turned out to be work for Harray and Nicholas. In detail it's about this bug report here: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D214621&data=04%7C01%7CLang.Yu%40amd.com%7C07b2a4c13d2f4eba9ebb08d9a8300cd3%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637725746610767105%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=Qz2VKDE0%2BcIuXZMy0IL3NLZAjeXLimwl8PPZh4Cp4iE%3D&reserved=0
> >>>>>>>>>>
> >>>>>>>>>> Lang was able to reproduce the issue and narrow it down to the pin in amdgpu_display_crtc_page_flip_target().
> >>>>>>>>>>
> >>>>>>>>>> In other words we somehow have an unbalanced pinning of the scanout buffer in DC.
> >>>>>>>>> DC doesn't use amdgpu_display_crtc_page_flip_target AFAICT. The corresponding pin with DC would be in dm_plane_helper_prepare_fb, paired with the unpin in
> >>>>>>>>> dm_plane_helper_cleanup_fb.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> With non-DC, the pin in amdgpu_display_crtc_page_flip_target is paired with the unpin in dm_plane_helper_cleanup_fb
> >>>>>>>> This should say amdgpu_display_unpin_work_func.
> >>>>>>>
> >>>>>>> Ah! So that is the classic (e.g. non atomic) path?
> >>>>>>
> >>>>>> Presumably.
> >>>>>>
> >>>>>>
> >>>>>>>>> & dce_v*_crtc_disable. One thing I notice is that the pin is guarded by if (!adev->enable_virtual_display), but the unpins seem unconditional. So could this be about virtual display, and the problem is actually trying to unpin a BO that was never pinned?
> >>>>>>>
> >>>>>>> Nope, my educated guess is rather that we free up the BO before amdgpu_display_unpin_work_func is called.
> >>>>>>>
> >>>>>>> E.g. not pin unbalance, but rather use after free.
> >>>>>>
> >>>>>> amdgpu_display_crtc_page_flip_target calls amdgpu_bo_ref(work->old_abo), and amdgpu_display_unpin_work_func calls amdgpu_bo_unref(&work->old_abo) only after amdgpu_bo_unpin. So what you describe could only happen if there's an imbalance elsewhere such that amdgpu_bo_unref is called more often than amdgpu_bo_ref, or maybe if amdgpu_bo_reserve fails in amdgpu_display_unpin_work_func (in which case the "failed to reserve buffer after flip" error message should appear in dmesg).
> >>>>>
> >>>>>
> >>>>> Actually, each call to amdgpu_display_crtc_page_flip_target() will
> >>>>>
> >>>>> 1, init a work(amdgpu_display_unpin_work_func) to unpin an old buffer
> >>>>>    (crtc->primary->fb), the work will be queued in dce_vX_0_pageflip_irq().
> >>>>>
> >>>>> 2, pin a new buffer, assign it to crtc->primary->fb. But how to unpin it?
> >>>>>    Next call.
> >>>>>
> >>>>> The problem is the pinned buffer of last call to 
> >>>>> amdgpu_display_crtc_page_flip_target() is not unpinned.
> >>>>
> >>>> It's unpinned in dce_v*_0_crtc_disable.
> >>>
> >>> I just found crtc->primary->fb is NULL when came in dce_v*_0_crtc_disable().
> >>> So it's not unpinned...
> >>
> >> __drm_helper_disable_unused_functions sets crtc->primary->fb = NULL only after calling crtc_funcs->disable. Maybe this path can get hit for a CRTC which was already disabled, in which case crtc->primary->fb == NULL in dce_v*_0_crtc_disable is harmless.
> >>
> >> Have you checked for the issue I described below? Should be pretty easy to catch.
> >>
> >>
> >>>> I think I've found the problem though: dce_v*_0_crtc_do_set_base pin the BO from target_fb unconditionally, but unpin the BO from the fb parameter only if it's different from the former. So if they're the same, the BO's pin count is incremented by 1.
> > 
> > Form my observations, amdgpu_bo_unpin() in dce_v*_0_crtc_disable() is
> > never called.
> 
> It would be expected to happen when the screen turns off, e.g. due to DPMS.
> 
> 
> > Though a single call to dce_v*_0_crtc_do_set_base() will 
> > only pin the BO, I found it will be unpinned in next call to 
> > dce_v*_0_crtc_do_set_base().
> 
> Yeah, that's the normal case when the new BO is different from the old one.
> 
> To catch the case I described, try something like
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
> 
> index 18a7b3bd633b..5726bd87a355 100644
> 
> --- a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
> 
> +++ b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
> 
> @@ -1926,6 +1926,7 @@ static int dce_v11_0_crtc_do_set_base(struct drm_crtc *crtc,
> 
>                 return r;
> 
> 
> 
>         if (!atomic) {
> 
> +               WARN_ON_ONCE(target_fb == fb);
> 
>                 r = amdgpu_bo_pin(abo, AMDGPU_GEM_DOMAIN_VRAM);
> 
>                 if (unlikely(r != 0)) {
> 
>                         amdgpu_bo_unreserve(abo);
>

I did some tests, the warning can be triggered.

pin/unpin operations in *_crtc_do_set_base() and
amdgpu_display_crtc_page_flip_target() are mixed.

Regards,
Lang

> 
> -- 
> Earthling Michel Dänzer            |                  https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fredhat.com%2F&data=04%7C01%7CLang.Yu%40amd.com%7C07b2a4c13d2f4eba9ebb08d9a8300cd3%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637725746610767105%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=jct%2F2NG7LamVRzzpkSy9XjQIDZRBIutSeK3VQ2jh524%3D&reserved=0
> Libre software enthusiast          |         Mesa and Xwayland developer

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Questions about KMS flip
  2021-11-16  3:27                                   ` Lang Yu
@ 2021-11-16  7:14                                     ` Christian König
  2021-11-16  8:00                                       ` Lang Yu
  0 siblings, 1 reply; 28+ messages in thread
From: Christian König @ 2021-11-16  7:14 UTC (permalink / raw)
  To: Lang Yu, Michel DDDnzer
  Cc: dri-devel, Kazlauskas, Nicholas, amd-gfx list, Christian KKKnig

Am 16.11.21 um 04:27 schrieb Lang Yu:
> On Mon, Nov 15, 2021 at 01:04:15PM +0100, Michel DDDnzer wrote:
>> [SNIP]
>>> Though a single call to dce_v*_0_crtc_do_set_base() will
>>> only pin the BO, I found it will be unpinned in next call to
>>> dce_v*_0_crtc_do_set_base().
>> Yeah, that's the normal case when the new BO is different from the old one.
>>
>> To catch the case I described, try something like
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
>>
>> index 18a7b3bd633b..5726bd87a355 100644
>>
>> --- a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
>>
>> +++ b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
>>
>> @@ -1926,6 +1926,7 @@ static int dce_v11_0_crtc_do_set_base(struct drm_crtc *crtc,
>>
>>                  return r;
>>
>>
>>
>>          if (!atomic) {
>>
>> +               WARN_ON_ONCE(target_fb == fb);
>>
>>                  r = amdgpu_bo_pin(abo, AMDGPU_GEM_DOMAIN_VRAM);
>>
>>                  if (unlikely(r != 0)) {
>>
>>                          amdgpu_bo_unreserve(abo);
>>
> I did some tests, the warning can be triggered.
>
> pin/unpin operations in *_crtc_do_set_base() and
> amdgpu_display_crtc_page_flip_target() are mixed.

Ok sounds like we narrowed down the root cause pretty well.

Question is now how can we fix this? Just not pin the BO when target_fb 
== fb?

Thanks,
Christian.

>
> Regards,
> Lang
>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Questions about KMS flip
  2021-11-16  7:14                                     ` Christian König
@ 2021-11-16  8:00                                       ` Lang Yu
  2021-11-16  8:09                                         ` Christian König
  0 siblings, 1 reply; 28+ messages in thread
From: Lang Yu @ 2021-11-16  8:00 UTC (permalink / raw)
  To: Christian Koenig
  Cc: dri-devel, Michel DDDnzer, Kazlauskas, Nicholas, amd-gfx list,
	Christian KKKnig

On Tue, Nov 16, 2021 at 08:14:08AM +0100, Christian KKKnig wrote:
> Am 16.11.21 um 04:27 schrieb Lang Yu:
> > On Mon, Nov 15, 2021 at 01:04:15PM +0100, Michel DDDnzer wrote:
> > > [SNIP]
> > > > Though a single call to dce_v*_0_crtc_do_set_base() will
> > > > only pin the BO, I found it will be unpinned in next call to
> > > > dce_v*_0_crtc_do_set_base().
> > > Yeah, that's the normal case when the new BO is different from the old one.
> > > 
> > > To catch the case I described, try something like
> > > 
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
> > > 
> > > index 18a7b3bd633b..5726bd87a355 100644
> > > 
> > > --- a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
> > > 
> > > +++ b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
> > > 
> > > @@ -1926,6 +1926,7 @@ static int dce_v11_0_crtc_do_set_base(struct drm_crtc *crtc,
> > > 
> > >                  return r;
> > > 
> > > 
> > > 
> > >          if (!atomic) {
> > > 
> > > +               WARN_ON_ONCE(target_fb == fb);
> > > 
> > >                  r = amdgpu_bo_pin(abo, AMDGPU_GEM_DOMAIN_VRAM);
> > > 
> > >                  if (unlikely(r != 0)) {
> > > 
> > >                          amdgpu_bo_unreserve(abo);
> > > 
> > I did some tests, the warning can be triggered.
> > 
> > pin/unpin operations in *_crtc_do_set_base() and
> > amdgpu_display_crtc_page_flip_target() are mixed.
> 
> Ok sounds like we narrowed down the root cause pretty well.
> 
> Question is now how can we fix this? Just not pin the BO when target_fb ==
> fb?

That worked. I did a few simple tests and didn't observe ttm_bo_release warnings 
any more.

The pin/unpin logic,

1, fist crtc_mode_set, dce_v*_0_crtc_do_set_base() pins crtc->primary->fb(new),
old_fb is NULL.

2, second crtc_mode_set, dce_v*_0_crtc_do_set_base() pins crtc->primary->fb(new),
unpins old fb.

3, amdgpu_display_crtc_page_flip_target() pin/unpin operations.

4, third crtc_mode_set, dce_v*_0_crtc_do_set_base() pins crtc->primary->fb(new),
unpins old fb (it is pinned in last call to amdgpu_display_crtc_page_flip_target)

5, amdgpu_display_crtc_page_flip_target() pin/unpin operations.

.....

x, reboot, amdgpu_display_suspend_helper() is called, the last pinned fb was unpinned.

And I didn't observe amdgpu_bo_unpin() in dce_v*_0_crtc_disable() is called.

If the logic is wrong, please correct me.

Regards,
Lang

> Thanks,
> Christian.
> 
> > 
> > Regards,
> > Lang
> > 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Questions about KMS flip
  2021-11-16  8:00                                       ` Lang Yu
@ 2021-11-16  8:09                                         ` Christian König
  2021-11-16 14:10                                           ` Alex Deucher
  0 siblings, 1 reply; 28+ messages in thread
From: Christian König @ 2021-11-16  8:09 UTC (permalink / raw)
  To: Lang Yu
  Cc: dri-devel, Michel DDDnzer, Kazlauskas, Nicholas, amd-gfx list,
	Christian KKKnig

Am 16.11.21 um 09:00 schrieb Lang Yu:
> On Tue, Nov 16, 2021 at 08:14:08AM +0100, Christian KKKnig wrote:
>> Am 16.11.21 um 04:27 schrieb Lang Yu:
>>> On Mon, Nov 15, 2021 at 01:04:15PM +0100, Michel DDDnzer wrote:
>>>> [SNIP]
>>>>> Though a single call to dce_v*_0_crtc_do_set_base() will
>>>>> only pin the BO, I found it will be unpinned in next call to
>>>>> dce_v*_0_crtc_do_set_base().
>>>> Yeah, that's the normal case when the new BO is different from the old one.
>>>>
>>>> To catch the case I described, try something like
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
>>>>
>>>> index 18a7b3bd633b..5726bd87a355 100644
>>>>
>>>> --- a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
>>>>
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
>>>>
>>>> @@ -1926,6 +1926,7 @@ static int dce_v11_0_crtc_do_set_base(struct drm_crtc *crtc,
>>>>
>>>>                   return r;
>>>>
>>>>
>>>>
>>>>           if (!atomic) {
>>>>
>>>> +               WARN_ON_ONCE(target_fb == fb);
>>>>
>>>>                   r = amdgpu_bo_pin(abo, AMDGPU_GEM_DOMAIN_VRAM);
>>>>
>>>>                   if (unlikely(r != 0)) {
>>>>
>>>>                           amdgpu_bo_unreserve(abo);
>>>>
>>> I did some tests, the warning can be triggered.
>>>
>>> pin/unpin operations in *_crtc_do_set_base() and
>>> amdgpu_display_crtc_page_flip_target() are mixed.
>> Ok sounds like we narrowed down the root cause pretty well.
>>
>> Question is now how can we fix this? Just not pin the BO when target_fb ==
>> fb?
> That worked. I did a few simple tests and didn't observe ttm_bo_release warnings
> any more.
>
> The pin/unpin logic,
>
> 1, fist crtc_mode_set, dce_v*_0_crtc_do_set_base() pins crtc->primary->fb(new),
> old_fb is NULL.
>
> 2, second crtc_mode_set, dce_v*_0_crtc_do_set_base() pins crtc->primary->fb(new),
> unpins old fb.
>
> 3, amdgpu_display_crtc_page_flip_target() pin/unpin operations.
>
> 4, third crtc_mode_set, dce_v*_0_crtc_do_set_base() pins crtc->primary->fb(new),
> unpins old fb (it is pinned in last call to amdgpu_display_crtc_page_flip_target)
>
> 5, amdgpu_display_crtc_page_flip_target() pin/unpin operations.
>
> .....
>
> x, reboot, amdgpu_display_suspend_helper() is called, the last pinned fb was unpinned.
>
> And I didn't observe amdgpu_bo_unpin() in dce_v*_0_crtc_disable() is called.
>
> If the logic is wrong, please correct me.

I can't fully judge because I'm not that deep with my head in the old 
display code, but from a ten mile high view it sounds sane to me. Michel 
what do you think?

BTW: Nicholas are there any plans to get rid of all that stuff? It would 
be a really nice cleanup of rather flaky code I think.

Thanks,
Christian.

>
> Regards,
> Lang
>
>> Thanks,
>> Christian.
>>
>>> Regards,
>>> Lang
>>>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Questions about KMS flip
  2021-11-16  8:09                                         ` Christian König
@ 2021-11-16 14:10                                           ` Alex Deucher
  2021-11-16 14:50                                             ` Michel Dänzer
  0 siblings, 1 reply; 28+ messages in thread
From: Alex Deucher @ 2021-11-16 14:10 UTC (permalink / raw)
  To: Christian König
  Cc: Michel DDDnzer, amd-gfx list, Christian KKKnig, dri-devel,
	Lang Yu, Kazlauskas, Nicholas

On Tue, Nov 16, 2021 at 3:09 AM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> Am 16.11.21 um 09:00 schrieb Lang Yu:
> > On Tue, Nov 16, 2021 at 08:14:08AM +0100, Christian KKKnig wrote:
> >> Am 16.11.21 um 04:27 schrieb Lang Yu:
> >>> On Mon, Nov 15, 2021 at 01:04:15PM +0100, Michel DDDnzer wrote:
> >>>> [SNIP]
> >>>>> Though a single call to dce_v*_0_crtc_do_set_base() will
> >>>>> only pin the BO, I found it will be unpinned in next call to
> >>>>> dce_v*_0_crtc_do_set_base().
> >>>> Yeah, that's the normal case when the new BO is different from the old one.
> >>>>
> >>>> To catch the case I described, try something like
> >>>>
> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
> >>>>
> >>>> index 18a7b3bd633b..5726bd87a355 100644
> >>>>
> >>>> --- a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
> >>>>
> >>>> +++ b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
> >>>>
> >>>> @@ -1926,6 +1926,7 @@ static int dce_v11_0_crtc_do_set_base(struct drm_crtc *crtc,
> >>>>
> >>>>                   return r;
> >>>>
> >>>>
> >>>>
> >>>>           if (!atomic) {
> >>>>
> >>>> +               WARN_ON_ONCE(target_fb == fb);
> >>>>
> >>>>                   r = amdgpu_bo_pin(abo, AMDGPU_GEM_DOMAIN_VRAM);
> >>>>
> >>>>                   if (unlikely(r != 0)) {
> >>>>
> >>>>                           amdgpu_bo_unreserve(abo);
> >>>>
> >>> I did some tests, the warning can be triggered.
> >>>
> >>> pin/unpin operations in *_crtc_do_set_base() and
> >>> amdgpu_display_crtc_page_flip_target() are mixed.
> >> Ok sounds like we narrowed down the root cause pretty well.
> >>
> >> Question is now how can we fix this? Just not pin the BO when target_fb ==
> >> fb?
> > That worked. I did a few simple tests and didn't observe ttm_bo_release warnings
> > any more.
> >
> > The pin/unpin logic,
> >
> > 1, fist crtc_mode_set, dce_v*_0_crtc_do_set_base() pins crtc->primary->fb(new),
> > old_fb is NULL.
> >
> > 2, second crtc_mode_set, dce_v*_0_crtc_do_set_base() pins crtc->primary->fb(new),
> > unpins old fb.
> >
> > 3, amdgpu_display_crtc_page_flip_target() pin/unpin operations.
> >
> > 4, third crtc_mode_set, dce_v*_0_crtc_do_set_base() pins crtc->primary->fb(new),
> > unpins old fb (it is pinned in last call to amdgpu_display_crtc_page_flip_target)
> >
> > 5, amdgpu_display_crtc_page_flip_target() pin/unpin operations.
> >
> > .....
> >
> > x, reboot, amdgpu_display_suspend_helper() is called, the last pinned fb was unpinned.
> >
> > And I didn't observe amdgpu_bo_unpin() in dce_v*_0_crtc_disable() is called.
> >
> > If the logic is wrong, please correct me.
>
> I can't fully judge because I'm not that deep with my head in the old
> display code, but from a ten mile high view it sounds sane to me. Michel
> what do you think?
>
> BTW: Nicholas are there any plans to get rid of all that stuff? It would
> be a really nice cleanup of rather flaky code I think.

We just need to add analog support to the DC code.  Darren was looking into it.

Alex


>
> Thanks,
> Christian.
>
> >
> > Regards,
> > Lang
> >
> >> Thanks,
> >> Christian.
> >>
> >>> Regards,
> >>> Lang
> >>>
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Questions about KMS flip
  2021-11-16 14:10                                           ` Alex Deucher
@ 2021-11-16 14:50                                             ` Michel Dänzer
  0 siblings, 0 replies; 28+ messages in thread
From: Michel Dänzer @ 2021-11-16 14:50 UTC (permalink / raw)
  To: Alex Deucher, Christian König
  Cc: dri-devel, Lang Yu, Christian KKKnig, amd-gfx list, Kazlauskas, Nicholas

On 2021-11-16 15:10, Alex Deucher wrote:
> On Tue, Nov 16, 2021 at 3:09 AM Christian König
> <ckoenig.leichtzumerken@gmail.com> wrote:
>>
>> Am 16.11.21 um 09:00 schrieb Lang Yu:
>>> On Tue, Nov 16, 2021 at 08:14:08AM +0100, Christian KKKnig wrote:
>>>> Am 16.11.21 um 04:27 schrieb Lang Yu:
>>>>> On Mon, Nov 15, 2021 at 01:04:15PM +0100, Michel DDDnzer wrote:
>>>>>> [SNIP]
>>>>>>> Though a single call to dce_v*_0_crtc_do_set_base() will
>>>>>>> only pin the BO, I found it will be unpinned in next call to
>>>>>>> dce_v*_0_crtc_do_set_base().
>>>>>> Yeah, that's the normal case when the new BO is different from the old one.
>>>>>>
>>>>>> To catch the case I described, try something like
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
>>>>>>
>>>>>> index 18a7b3bd633b..5726bd87a355 100644
>>>>>>
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
>>>>>>
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
>>>>>>
>>>>>> @@ -1926,6 +1926,7 @@ static int dce_v11_0_crtc_do_set_base(struct drm_crtc *crtc,
>>>>>>
>>>>>>                   return r;
>>>>>>
>>>>>>
>>>>>>
>>>>>>           if (!atomic) {
>>>>>>
>>>>>> +               WARN_ON_ONCE(target_fb == fb);
>>>>>>
>>>>>>                   r = amdgpu_bo_pin(abo, AMDGPU_GEM_DOMAIN_VRAM);
>>>>>>
>>>>>>                   if (unlikely(r != 0)) {
>>>>>>
>>>>>>                           amdgpu_bo_unreserve(abo);
>>>>>>
>>>>> I did some tests, the warning can be triggered.
>>>>>
>>>>> pin/unpin operations in *_crtc_do_set_base() and
>>>>> amdgpu_display_crtc_page_flip_target() are mixed.
>>>> Ok sounds like we narrowed down the root cause pretty well.
>>>>
>>>> Question is now how can we fix this? Just not pin the BO when target_fb ==
>>>> fb?
>>> That worked. I did a few simple tests and didn't observe ttm_bo_release warnings
>>> any more.
>>>
>>> The pin/unpin logic,
>>>
>>> 1, fist crtc_mode_set, dce_v*_0_crtc_do_set_base() pins crtc->primary->fb(new),
>>> old_fb is NULL.
>>>
>>> 2, second crtc_mode_set, dce_v*_0_crtc_do_set_base() pins crtc->primary->fb(new),
>>> unpins old fb.
>>>
>>> 3, amdgpu_display_crtc_page_flip_target() pin/unpin operations.
>>>
>>> 4, third crtc_mode_set, dce_v*_0_crtc_do_set_base() pins crtc->primary->fb(new),
>>> unpins old fb (it is pinned in last call to amdgpu_display_crtc_page_flip_target)
>>>
>>> 5, amdgpu_display_crtc_page_flip_target() pin/unpin operations.
>>>
>>> .....
>>>
>>> x, reboot, amdgpu_display_suspend_helper() is called, the last pinned fb was unpinned.
>>>
>>> And I didn't observe amdgpu_bo_unpin() in dce_v*_0_crtc_disable() is called.
>>>
>>> If the logic is wrong, please correct me.
>>
>> I can't fully judge because I'm not that deep with my head in the old
>> display code, but from a ten mile high view it sounds sane to me. Michel
>> what do you think?

Sounds good to me.


Would be nice to address the other issue identified in this thread as well:

The pin in amdgpu_display_crtc_page_flip_target is guarded by if (!adev->enable_virtual_display), but the corresponding unpins in 
amdgpu_display_unpin_work_func & dce_v*_0_crtc_disable aren't. This probably results in pin count underflows with virtual display.


-- 
Earthling Michel Dänzer            |                  https://redhat.com
Libre software enthusiast          |         Mesa and Xwayland developer

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2021-11-16 14:50 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-04 12:51 Questions about KMS flip Christian König
2021-11-04 16:44 ` Harry Wentland
2021-11-05 12:20   ` Ville Syrjälä
2021-11-05 18:13   ` Daniel Vetter
2021-11-08  7:44     ` Christian König
2021-11-08 14:59       ` Daniel Vetter
2021-11-10 10:18         ` Christian König
2021-11-10 13:26           ` Daniel Vetter
2021-11-12 12:47             ` Christian König
2021-11-12 12:47               ` Christian König
2021-11-12 14:29               ` Michel Dänzer
2021-11-12 14:30                 ` Michel Dänzer
2021-11-12 15:03                   ` Christian König
2021-11-12 16:10                     ` Michel Dänzer
2021-11-15  6:41                       ` Lang Yu
2021-11-15  6:41                         ` Lang Yu
2021-11-15  8:38                         ` Michel Dänzer
2021-11-15  9:04                           ` Lang Yu
2021-11-15  9:49                             ` Michel Dänzer
2021-11-15 11:31                               ` Lang Yu
2021-11-15 12:04                                 ` Michel Dänzer
2021-11-16  3:27                                   ` Lang Yu
2021-11-16  7:14                                     ` Christian König
2021-11-16  8:00                                       ` Lang Yu
2021-11-16  8:09                                         ` Christian König
2021-11-16 14:10                                           ` Alex Deucher
2021-11-16 14:50                                             ` Michel Dänzer
2021-11-15  7:25                       ` Christian König

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.