All of lore.kernel.org
From: Melissa Wen <melissa.srw@gmail.com>
To: Daniel Vetter <daniel@ffwll.ch>
Cc: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com>,
	Haneen Mohammed <hamohammed.sa@gmail.com>,
	David Airlie <airlied@linux.ie>,
	Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>,
	dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org,
	kernel-usp@googlegroups.com, twoerner@gmail.com
Subject: Re: [PATCH] drm/vkms: add missing drm_crtc_vblank_put to the get/put pair on flush
Date: Wed, 22 Jul 2020 11:06:04 -0300
Message-ID: <20200722140604.27dfzfnzug5vb75r@smtp.gmail.com>
In-Reply-To: <20200722120502.GK6419@phenom.ffwll.local>

On 07/22, daniel@ffwll.ch wrote:
> On Wed, Jul 22, 2020 at 08:04:11AM -0300, Melissa Wen wrote:
> > This patch adds the missing drm_crtc_vblank_put op to complete the
> > drm_crtc_vblank_get/put pair (which increments/decrements a counter to
> > keep vblanks enabled).
> > 
> > It fixes the execution of the following kms_cursor_crc subtests:
> > 1. pipe-A-cursor-[size, alpha-opaque, NxN-(on-screen, off-screen, sliding,
> >    random, fast-moving)] - pass when run individually.
> > 2. pipe-A-cursor-dpms passes again
> > 3. pipe-A-cursor-suspend also passes
> > 
> > The issue was initially traced in the sequential execution of the IGT
> > kms_cursor_crc subtests: when the test sequence, or one of its subtests,
> > is run twice, the odd-numbered runs complete and the even-numbered ones
> > get stuck in an endless wait. In the IGT code, calling wait_for_vblank
> > before starting CRC capture prevented the busy-wait, but the problem
> > persisted in the pipe-A-cursor-dpms and -suspend subtests.
> > 
> > Checking the history, the pipe-A-cursor-dpms subtest passed when
> > vkms_atomic_commit_tail used wait_for_vblanks instead of the flip_done
> > op. Another way to prevent the blocking was to call wait_one_vblank
> > when enabling the CRTC. However, in both cases pipe-A-cursor-suspend
> > still blocked on the second start of CRC capture, which may indicate
> > that something gets stuck during CRC setup. Indeed, calling
> > wait_one_vblank in the CRC setup was able to sync things and unblock
> > all kms_cursor_crc subtests.
> > 
> > Tracing and comparing a clean run with a blocked one:
> > - in a clean run, vkms_crtc_atomic_flush enables vblanks;
> > - in a blocked run, vblanks only start in the next op,
> >   vkms_crtc_atomic_enable. Moreover, a series of vkms_vblank_simulate
> >   calls keeps firing until vblanks are disabled.
> > Also, watching the steps of vkms_crtc_atomic_flush: when the very first
> > drm_crtc_vblank_get returned an error, the subtest crashed; when
> > vblank_get succeeded, the subtest completed. Finally, checking the
> > flush steps: it increments the counter to hold a vblank reference
> > (get), but there is no op to decrement it and release vblanks (put).
> > 
> > Cc: Daniel Vetter <daniel@ffwll.ch>
> > Cc: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com>
> > Cc: Haneen Mohammed <hamohammed.sa@gmail.com>
> > Signed-off-by: Melissa Wen <melissa.srw@gmail.com>
> > ---
> >  drivers/gpu/drm/vkms/vkms_crtc.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/gpu/drm/vkms/vkms_crtc.c b/drivers/gpu/drm/vkms/vkms_crtc.c
> > index ac85e17428f8..a99d6b4a92dd 100644
> > --- a/drivers/gpu/drm/vkms/vkms_crtc.c
> > +++ b/drivers/gpu/drm/vkms/vkms_crtc.c
> > @@ -246,6 +246,7 @@ static void vkms_crtc_atomic_flush(struct drm_crtc *crtc,
> >  
> >  		spin_unlock(&crtc->dev->event_lock);
> >  
> > +		drm_crtc_vblank_put(crtc);
> 
> Uh so I reviewed this a bit more carefully now, and I don't think this is
> the correct bugfix. From the kerneldoc of drm_crtc_arm_vblank_event():
> 
>  * Caller must hold a vblank reference for the event @e acquired by a
>  * drm_crtc_vblank_get(), which will be dropped when the next vblank arrives.
> 
> So when we call drm_crtc_arm_vblank_event, the vblank_put gets called
> for us. And that's the only case where we successfully acquired a vblank
> interrupt reference, since on failure of drm_crtc_vblank_get (0 indicates
> success for that function, a negative error number failure) we directly
> send out the event.
> 
> So something else fishy is going on, and now I'm totally confused about
> why this even happens.
> 
> We also have a pile of WARN_ON checks in drm_crtc_vblank_put to make sure
> we don't underflow the refcount, so I don't think it's that either
> (unless this patch creates more WARNING backtraces).
> 
> But clearly it changes behaviour somehow ... can you try to figure out
> what changes? Maybe print out the vblank->refcount at various points in
> the driver, and maybe also trace when exactly the fake vkms vblank hrtimer
> is enabled/disabled ...

:(

I can check these, but I also have other suspicions. When I move the
drm_crtc_vblank_put out of the if (to the end of flush), it not only solves
the blocking in kms_cursor_crc, but the WARN_ON in kms_flip also no longer
appears (a total cleanup). That is, placing it just after:

vkms_output->composer_state = to_vkms_crtc_state(crtc->state);

It looks like something is stuck around here.

Besides, there is a lock taken in atomic_begin:

  /* This lock is held across the atomic commit to block vblank timer
   * from scheduling vkms_composer_worker until the composer is updated
   */
  spin_lock_irq(&vkms_output->lock);

that seems to be released in atomic_flush, which makes me suspect something
is missing in the composer update.

I'll check all these things and come back with news (hope) :)

Thanks,

Melissa
> 
> I'm totally confused about what's going on here now.
> -Daniel
> 
> >  		crtc->state->event = NULL;
> >  	}
> >  
> > -- 
> > 2.27.0
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

Thread overview: 51+ messages
2020-07-22 11:04 [PATCH] drm/vkms: add missing drm_crtc_vblank_put to the get/put pair on flush Melissa Wen
2020-07-22 12:05 ` daniel
2020-07-22 14:06   ` Melissa Wen [this message]
2020-07-22 15:17     ` Daniel Vetter
2020-07-25  1:17       ` Sidong Yang
2020-07-25 15:57         ` Daniel Vetter
2020-07-25 17:45           ` Melissa Wen
2020-07-25 18:12             ` Daniel Vetter
2020-07-25 18:49               ` Melissa Wen
2020-07-25 19:19                 ` Melissa Wen
2020-07-25 19:29                   ` Melissa Wen
2020-07-26 10:26                     ` Daniel Vetter
2020-07-28 16:16                       ` Sidong Yang
2020-07-28 21:55                         ` daniel
2020-07-29 19:09               ` Melissa Wen
2020-07-29 21:48                 ` Daniel Vetter
2020-07-30 10:09                   ` Melissa Wen
2020-07-31  9:08                     ` daniel
2020-07-31 16:13                       ` Sidong Yang
2020-07-31 16:47                         ` Melissa Wen
2020-07-31 17:29                           ` Leandro Ribeiro
2020-07-31 17:36                           ` Leandro Ribeiro
2020-07-31 18:10                           ` Leandro Ribeiro
2020-07-31 18:33                           ` Daniel Vetter
2020-07-31 18:39                             ` Leandro Ribeiro
2020-08-01 16:06                             ` Sidong Yang
