All of lore.kernel.org
 help / color / mirror / Atom feed
From: Daniel Vetter <daniel@ffwll.ch>
To: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
Cc: intel-gfx <intel-gfx@lists.freedesktop.org>,
	Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	Rodrigo Vivi <rodrigo.vivi@intel.com>
Subject: Re: [PATCH] drm/vblank: Avoid storing a timestamp for the same frame twice
Date: Mon, 8 Feb 2021 18:43:53 +0100	[thread overview]
Message-ID: <CAKMK7uG21d_w47eVpGG4JyyT7iXpySHWf9y-+wCOGTXb682WFQ@mail.gmail.com> (raw)
In-Reply-To: <YCFtuRvpm8OpmdR4@intel.com>

On Mon, Feb 8, 2021 at 5:58 PM Ville Syrjälä
<ville.syrjala@linux.intel.com> wrote:
>
> On Mon, Feb 08, 2021 at 10:56:36AM +0100, Daniel Vetter wrote:
> > On Fri, Feb 05, 2021 at 11:19:19PM +0200, Ville Syrjälä wrote:
> > > On Fri, Feb 05, 2021 at 06:24:08PM +0200, Ville Syrjälä wrote:
> > > > On Fri, Feb 05, 2021 at 04:46:27PM +0100, Daniel Vetter wrote:
> > > > > On Thu, Feb 04, 2021 at 05:55:28PM +0200, Ville Syrjälä wrote:
> > > > > > On Thu, Feb 04, 2021 at 04:32:16PM +0100, Daniel Vetter wrote:
> > > > > > > On Thu, Feb 04, 2021 at 04:04:00AM +0200, Ville Syrjala wrote:
> > > > > > > > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > > > > > >
> > > > > > > > drm_vblank_restore() exists because certain power saving states
> > > > > > > > can clobber the hardware frame counter. The way it does this is
> > > > > > > > by guesstimating how many frames were missed purely based on
> > > > > > > > the difference between the last stored timestamp vs. a newly
> > > > > > > > sampled timestamp.
> > > > > > > >
> > > > > > > > If we should call this function before a full frame has
> > > > > > > > elapsed since we sampled the last timestamp we would end up
> > > > > > > > with a possibly slightly different timestamp value for the
> > > > > > > > same frame. Currently we will happily overwrite the already
> > > > > > > > stored timestamp for the frame with the new value. This
> > > > > > > > could cause userspace to observe two different timestamps
> > > > > > > > for the same frame (and the timestamp could even go
> > > > > > > > backwards depending on how much error we introduce when
> > > > > > > > correcting the timestamp based on the scanout position).
> > > > > > > >
> > > > > > > > To avoid that let's not update the stored timestamp unless we're
> > > > > > > > also incrementing the sequence counter. We do still want to update
> > > > > > > > vblank->last with the freshly sampled hw frame counter value so
> > > > > > > > that subsequent vblank irqs/queries can actually use the hw frame
> > > > > > > > counter to determine how many frames have elapsed.
> > > > > > >
> > > > > > > Hm I'm not getting the reason for why we store the updated hw vblank
> > > > > > > counter?
> > > > > >
> > > > > > Because next time a vblank irq happens the code will do:
> > > > > > diff = current_hw_counter - vblank->last
> > > > > >
> > > > > > which won't work very well if vblank->last is garbage.
> > > > > >
> > > > > > Updating vblank->last is pretty much why drm_vblank_restore()
> > > > > > exists at all.
> > > > >
> > > > > Oh sure, _restore has to update this, together with the timestamp.
> > > > >
> > > > > But your code adds such an update where we update the hw vblank counter,
> > > > > but not the timestamp, and that feels buggy. Either we're still in the
> > > > > same frame, and then we should story nothing. Or we advanced, and then we
> > > > > probably want a new timestampt for that frame too.
> > > >
> > > > Even if we're still in the same frame the hw frame counter may already
> > > > have been reset due to the power well having been turned off. That is
> > > > what I'm trying to fix here.
> > > >
> > > > Now I suppose that's fairly unlikely, at least with PSR which probably
> > > > does impose some extra delays before the power gets yanked. But at least
> > > > theoretically possible.
> > >
> > > Pondering about this a bit further. I think the fact that the current
> > > code takes the round-to-closest approach I used for the vblank handler
> > > is perhaps a bit bad. It could push the seq counter forward if we're
> > > past the halfway point of a frame. I think that rounding behaviour
> > > makes sense for the irq since those tick steadily and so allowing a bit
> > > of error either way seems correct to me. Perhaps round-down might be
> > > the better option for _restore(). Not quites sure, need more thinking
> > > probably.
> >
> > Yes this is the rounding I'm worried about.
>
> Actually I don't think this is really an issue since we are working
> with the corrected timestamps here. Those always line up with
> frames, so unless the correction is really buggy or the hw somehow
> skips a partial frame it should work rather well. At least when
> operating with small timescales. For large gaps the error might
> creep up, but I don't think a small error in the predicted seq
> number over a long timespan is really a problem.

That corrected timestamp is what can go wrong I think: There's no
guarantee that drm_crtc_vblank_helper_get_vblank_timestamp_internal()
flips to top-of-frame at the exact same time than when the hw vblank
counter flips. Or at least I'm not seeing where we correct them both
together.

> > But your point above that the hw might reset the counter again is also
> > valid. I'm assuming what you're worried about is that we first do a
> > _restore (and the hw vblank counter hasn't been trashed yet), and then in
> > the same frame we do another restore, but now the hw frame counter has
> > been trashe, and we need to update it?
>
> Yeah, although the pre-trashing _restore could also just be
> a vblank irq I think.
>
> >
> > > Another idea that came to me now is that maybe we should actually just
> > > check if the current hw frame counter value looks sane, as in something
> > > like:
> > >
> > > diff_hw_counter = current_hw_counter-stored_hw_counter
> > > diff_ts = (current_ts-stored_ts)/framedur
> > >
> > > if (diff_hw_counter ~= diff_ts)
> > >     diff = diff_hw_counter;
> > > else
> > >     diff = diff_ts;
> > >
> > > and if they seem to match then just keep trusting the hw counter.
> > > So only if there's a significant difference would we disregard
> > > the diff of the hw counter and instead use the diff based on the
> > > timestamps. Not sure what "significant" is though; One frame, two
> > > frames?
> >
> > Hm, another idea: The only point where we can trust the entire hw counter
> > + timestamp sampling is when the irq happens. Because then we know the
> > driver will have properly corrected for any hw oddities (like hw counter
> > flipping not at top-of-frame, like the core expects).
>
> i915 at least gives out correct data regardless of when you sample
> it. Well, except for the cases where the hw counter gets trashed,
> in which case the hw counter is garbage (when compared with .last)
> but the timestamp is still correct.

Hm where/how do we handle this? Maybe I'm just out of date with how it
all works nowadays.

> > So what if _restore always goes back to the last such trusted hw counter
> > for computing the frame counter diff and all that stuff? That way if we
> > have a bunch of _restore with incosisten hw vblank counter, we will a)
> > only take the last one (fixes the bug you're trying to fix) b) still use
> > the same last trusted baseline for computations (addresses the race I'm
> > seeing).
> >
> > Or does this not work?
>
> I don't think I really understand what you're suggesting here.
> _restore is already using the last trusted data (the stored
> timestamp + .last).
>
> So the one thing _restore will have to update is .last.
> I think it can either do what it does now and set .last
> to the current hw counter value + update the timestamp
> to match, or it could perhaps adjust the stored .last
> such that the already stored timestamp and the updated
> .last match up. But I think both of those options have
> the same level or inaccuracy since both would still do
> the same ts_diff->hw_counter_diff prediction.
>
> >
> > It does complicate the code a bit, because we'd need to store the
> > count/timestamp information from _restore outside of the usual vblank ts
> > array. But I think that addresses everything.
>
> Hmm. So restore would store this extra information
> somewhere else, and not update the normal stuff at all?
> What exactly would we do with that extra data?

Hm I guess I didn't think this through. But the idea I had was:
- _restore always recomputes back from the las
drm_crtc_handl_vblank-stored timestamp.
- the first drm_crtc_handle_vblank bakes in any corrections that
_restore has prepared meanwhile
- same applies to all the sampling functions we might look at lastes
timestamps/counter values.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

WARNING: multiple messages have this Message-ID (diff)
From: Daniel Vetter <daniel@ffwll.ch>
To: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
Cc: intel-gfx <intel-gfx@lists.freedesktop.org>,
	Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>,
	dri-devel <dri-devel@lists.freedesktop.org>
Subject: Re: [Intel-gfx] [PATCH] drm/vblank: Avoid storing a timestamp for the same frame twice
Date: Mon, 8 Feb 2021 18:43:53 +0100	[thread overview]
Message-ID: <CAKMK7uG21d_w47eVpGG4JyyT7iXpySHWf9y-+wCOGTXb682WFQ@mail.gmail.com> (raw)
In-Reply-To: <YCFtuRvpm8OpmdR4@intel.com>

On Mon, Feb 8, 2021 at 5:58 PM Ville Syrjälä
<ville.syrjala@linux.intel.com> wrote:
>
> On Mon, Feb 08, 2021 at 10:56:36AM +0100, Daniel Vetter wrote:
> > On Fri, Feb 05, 2021 at 11:19:19PM +0200, Ville Syrjälä wrote:
> > > On Fri, Feb 05, 2021 at 06:24:08PM +0200, Ville Syrjälä wrote:
> > > > On Fri, Feb 05, 2021 at 04:46:27PM +0100, Daniel Vetter wrote:
> > > > > On Thu, Feb 04, 2021 at 05:55:28PM +0200, Ville Syrjälä wrote:
> > > > > > On Thu, Feb 04, 2021 at 04:32:16PM +0100, Daniel Vetter wrote:
> > > > > > > On Thu, Feb 04, 2021 at 04:04:00AM +0200, Ville Syrjala wrote:
> > > > > > > > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > > > > > >
> > > > > > > > drm_vblank_restore() exists because certain power saving states
> > > > > > > > can clobber the hardware frame counter. The way it does this is
> > > > > > > > by guesstimating how many frames were missed purely based on
> > > > > > > > the difference between the last stored timestamp vs. a newly
> > > > > > > > sampled timestamp.
> > > > > > > >
> > > > > > > > If we should call this function before a full frame has
> > > > > > > > elapsed since we sampled the last timestamp we would end up
> > > > > > > > with a possibly slightly different timestamp value for the
> > > > > > > > same frame. Currently we will happily overwrite the already
> > > > > > > > stored timestamp for the frame with the new value. This
> > > > > > > > could cause userspace to observe two different timestamps
> > > > > > > > for the same frame (and the timestamp could even go
> > > > > > > > backwards depending on how much error we introduce when
> > > > > > > > correcting the timestamp based on the scanout position).
> > > > > > > >
> > > > > > > > To avoid that let's not update the stored timestamp unless we're
> > > > > > > > also incrementing the sequence counter. We do still want to update
> > > > > > > > vblank->last with the freshly sampled hw frame counter value so
> > > > > > > > that subsequent vblank irqs/queries can actually use the hw frame
> > > > > > > > counter to determine how many frames have elapsed.
> > > > > > >
> > > > > > > Hm I'm not getting the reason for why we store the updated hw vblank
> > > > > > > counter?
> > > > > >
> > > > > > Because next time a vblank irq happens the code will do:
> > > > > > diff = current_hw_counter - vblank->last
> > > > > >
> > > > > > which won't work very well if vblank->last is garbage.
> > > > > >
> > > > > > Updating vblank->last is pretty much why drm_vblank_restore()
> > > > > > exists at all.
> > > > >
> > > > > Oh sure, _restore has to update this, together with the timestamp.
> > > > >
> > > > > But your code adds such an update where we update the hw vblank counter,
> > > > > but not the timestamp, and that feels buggy. Either we're still in the
> > > > > same frame, and then we should story nothing. Or we advanced, and then we
> > > > > probably want a new timestampt for that frame too.
> > > >
> > > > Even if we're still in the same frame the hw frame counter may already
> > > > have been reset due to the power well having been turned off. That is
> > > > what I'm trying to fix here.
> > > >
> > > > Now I suppose that's fairly unlikely, at least with PSR which probably
> > > > does impose some extra delays before the power gets yanked. But at least
> > > > theoretically possible.
> > >
> > > Pondering about this a bit further. I think the fact that the current
> > > code takes the round-to-closest approach I used for the vblank handler
> > > is perhaps a bit bad. It could push the seq counter forward if we're
> > > past the halfway point of a frame. I think that rounding behaviour
> > > makes sense for the irq since those tick steadily and so allowing a bit
> > > of error either way seems correct to me. Perhaps round-down might be
> > > the better option for _restore(). Not quites sure, need more thinking
> > > probably.
> >
> > Yes this is the rounding I'm worried about.
>
> Actually I don't think this is really an issue since we are working
> with the corrected timestamps here. Those always line up with
> frames, so unless the correction is really buggy or the hw somehow
> skips a partial frame it should work rather well. At least when
> operating with small timescales. For large gaps the error might
> creep up, but I don't think a small error in the predicted seq
> number over a long timespan is really a problem.

That corrected timestamp is what can go wrong I think: There's no
guarantee that drm_crtc_vblank_helper_get_vblank_timestamp_internal()
flips to top-of-frame at the exact same time than when the hw vblank
counter flips. Or at least I'm not seeing where we correct them both
together.

> > But your point above that the hw might reset the counter again is also
> > valid. I'm assuming what you're worried about is that we first do a
> > _restore (and the hw vblank counter hasn't been trashed yet), and then in
> > the same frame we do another restore, but now the hw frame counter has
> > been trashe, and we need to update it?
>
> Yeah, although the pre-trashing _restore could also just be
> a vblank irq I think.
>
> >
> > > Another idea that came to me now is that maybe we should actually just
> > > check if the current hw frame counter value looks sane, as in something
> > > like:
> > >
> > > diff_hw_counter = current_hw_counter-stored_hw_counter
> > > diff_ts = (current_ts-stored_ts)/framedur
> > >
> > > if (diff_hw_counter ~= diff_ts)
> > >     diff = diff_hw_counter;
> > > else
> > >     diff = diff_ts;
> > >
> > > and if they seem to match then just keep trusting the hw counter.
> > > So only if there's a significant difference would we disregard
> > > the diff of the hw counter and instead use the diff based on the
> > > timestamps. Not sure what "significant" is though; One frame, two
> > > frames?
> >
> > Hm, another idea: The only point where we can trust the entire hw counter
> > + timestamp sampling is when the irq happens. Because then we know the
> > driver will have properly corrected for any hw oddities (like hw counter
> > flipping not at top-of-frame, like the core expects).
>
> i915 at least gives out correct data regardless of when you sample
> it. Well, except for the cases where the hw counter gets trashed,
> in which case the hw counter is garbage (when compared with .last)
> but the timestamp is still correct.

Hm where/how do we handle this? Maybe I'm just out of date with how it
all works nowadays.

> > So what if _restore always goes back to the last such trusted hw counter
> > for computing the frame counter diff and all that stuff? That way if we
> > have a bunch of _restore with incosisten hw vblank counter, we will a)
> > only take the last one (fixes the bug you're trying to fix) b) still use
> > the same last trusted baseline for computations (addresses the race I'm
> > seeing).
> >
> > Or does this not work?
>
> I don't think I really understand what you're suggesting here.
> _restore is already using the last trusted data (the stored
> timestamp + .last).
>
> So the one thing _restore will have to update is .last.
> I think it can either do what it does now and set .last
> to the current hw counter value + update the timestamp
> to match, or it could perhaps adjust the stored .last
> such that the already stored timestamp and the updated
> .last match up. But I think both of those options have
> the same level or inaccuracy since both would still do
> the same ts_diff->hw_counter_diff prediction.
>
> >
> > It does complicate the code a bit, because we'd need to store the
> > count/timestamp information from _restore outside of the usual vblank ts
> > array. But I think that addresses everything.
>
> Hmm. So restore would store this extra information
> somewhere else, and not update the normal stuff at all?
> What exactly would we do with that extra data?

Hm I guess I didn't think this through. But the idea I had was:
- _restore always recomputes back from the las
drm_crtc_handl_vblank-stored timestamp.
- the first drm_crtc_handle_vblank bakes in any corrections that
_restore has prepared meanwhile
- same applies to all the sampling functions we might look at lastes
timestamps/counter values.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2021-02-08 17:44 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-04  2:04 [PATCH] drm/vblank: Avoid storing a timestamp for the same frame twice Ville Syrjala
2021-02-04  2:04 ` [Intel-gfx] " Ville Syrjala
2021-02-04  3:12 ` [Intel-gfx] ✓ Fi.CI.BAT: success for " Patchwork
2021-02-04  5:44 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
2021-02-04 15:32 ` [PATCH] " Daniel Vetter
2021-02-04 15:32   ` [Intel-gfx] " Daniel Vetter
2021-02-04 15:55   ` Ville Syrjälä
2021-02-04 15:55     ` [Intel-gfx] " Ville Syrjälä
2021-02-05 15:46     ` Daniel Vetter
2021-02-05 15:46       ` [Intel-gfx] " Daniel Vetter
2021-02-05 16:24       ` Ville Syrjälä
2021-02-05 16:24         ` [Intel-gfx] " Ville Syrjälä
2021-02-05 21:19         ` Ville Syrjälä
2021-02-05 21:19           ` [Intel-gfx] " Ville Syrjälä
2021-02-08  9:56           ` Daniel Vetter
2021-02-08  9:56             ` [Intel-gfx] " Daniel Vetter
2021-02-08 16:58             ` Ville Syrjälä
2021-02-08 16:58               ` [Intel-gfx] " Ville Syrjälä
2021-02-08 17:43               ` Daniel Vetter [this message]
2021-02-08 17:43                 ` Daniel Vetter
2021-02-08 18:05                 ` Ville Syrjälä
2021-02-08 18:05                   ` [Intel-gfx] " Ville Syrjälä
2021-02-09 10:07 ` Daniel Vetter
2021-02-09 10:07   ` [Intel-gfx] " Daniel Vetter
2021-02-09 15:40   ` Ville Syrjälä
2021-02-09 15:40     ` [Intel-gfx] " Ville Syrjälä
2021-02-09 16:44     ` Daniel Vetter
2021-02-09 16:44       ` [Intel-gfx] " Daniel Vetter
2021-02-18 16:03 ` [PATCH v2] drm/vblank: Do not store a new vblank timestamp in drm_vblank_restore() Ville Syrjala
2021-02-18 16:03   ` [Intel-gfx] " Ville Syrjala
2021-02-18 16:10   ` Ville Syrjälä
2021-02-18 16:10     ` [Intel-gfx] " Ville Syrjälä
2021-02-19 15:08   ` Daniel Vetter
2021-02-19 15:08     ` [Intel-gfx] " Daniel Vetter
2021-02-19 15:47     ` Ville Syrjälä
2021-02-19 15:47       ` [Intel-gfx] " Ville Syrjälä
2021-02-18 19:08 ` [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/vblank: Avoid storing a timestamp for the same frame twice (rev2) Patchwork
2021-02-18 19:22   ` Ville Syrjälä
2021-02-18 19:51     ` Vudum, Lakshminarayana
2021-02-18 19:29 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2021-02-18 20:58 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
2021-02-21  4:18 ` [Intel-gfx] ✓ Fi.CI.BAT: success for drm/vblank: Avoid storing a timestamp for the same frame twice (rev3) Patchwork
2021-02-21  5:41 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAKMK7uG21d_w47eVpGG4JyyT7iXpySHWf9y-+wCOGTXb682WFQ@mail.gmail.com \
    --to=daniel@ffwll.ch \
    --cc=dhinakaran.pandiyan@intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=rodrigo.vivi@intel.com \
    --cc=ville.syrjala@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.