From: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com>
To: Daniel Vetter <daniel@ffwll.ch>
Cc: Daniel Vetter <daniel.vetter@intel.com>,
	Daniel Vetter <daniel.vetter@ffwll.ch>,
	Haneen Mohammed <hamohammed.sa@gmail.com>,
	DRI Development <dri-devel@lists.freedesktop.org>,
	Shayenne Moura <shayenneluzmoura@gmail.com>
Subject: Re: [PATCH 01/10] drm/vkms: Fix crc worker races
Date: Mon, 17 Jun 2019 23:39:31 -0300	[thread overview]
Message-ID: <20190618023931.ccs3glcvh5hnm4m3@smtp.gmail.com> (raw)
In-Reply-To: <20190612144849.GP2458@phenom.ffwll.local>

On 06/12, Daniel Vetter wrote:
> On Wed, Jun 12, 2019 at 10:33:11AM -0300, Rodrigo Siqueira wrote:
> > On Thu, Jun 6, 2019 at 7:28 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > >
> > > The issue we have is that the crc worker might fall behind. We've
> > > tried to handle this by tracking both the earliest frame for which it
> > > still needs to compute a crc, and the last one. Plus when the
> > > crtc_state changes, we get a new work item, and these all run in
> > > order due to the ordered workqueue we allocate for each vkms crtc.
> > >
> > > Trouble is there's been a few small issues in the current code:
> > > - we need to capture frame_end in the vblank hrtimer, not in the
> > >   worker. The worker might run much later, and then we generate a lot
> > >   of crc for which there's already a different worker queued up.
> > > - frame number might be 0, so create a new crc_pending boolean to
> > >   track this without confusion.
> > > - we need to atomically grab frame_start/end and clear it, so do that
> > >   all in one go. This is not going to create a new race, because if we
> > >   race with the hrtimer then our work will be re-run.
> > > - only race that can happen is the following:
> > >   1. worker starts
> > >   2. hrtimer runs and updates frame_end
> > >   3. worker grabs frame_start/end, already reading the new frame_end,
> > >   and clears crc_pending
> > >   4. hrtimer calls queue_work()
> > >   5. worker completes
> > >   6. worker gets re-run, crc_pending is false
> > >   Explain this case a bit better by rewording the comment.
> > >
> > > v2: Demote warning level output to debug when we fail to requeue; this
> > > is expected under high load when the crc worker can't quite keep up.
> > >
> > > Cc: Shayenne Moura <shayenneluzmoura@gmail.com>
> > > Cc: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com>
> > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > Cc: Haneen Mohammed <hamohammed.sa@gmail.com>
> > > Cc: Daniel Vetter <daniel@ffwll.ch>
> > > ---
> > >  drivers/gpu/drm/vkms/vkms_crc.c  | 27 +++++++++++++--------------
> > >  drivers/gpu/drm/vkms/vkms_crtc.c |  9 +++++++--
> > >  drivers/gpu/drm/vkms/vkms_drv.h  |  2 ++
> > >  3 files changed, 22 insertions(+), 16 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/vkms/vkms_crc.c b/drivers/gpu/drm/vkms/vkms_crc.c
> > > index d7b409a3c0f8..66603da634fe 100644
> > > --- a/drivers/gpu/drm/vkms/vkms_crc.c
> > > +++ b/drivers/gpu/drm/vkms/vkms_crc.c
> > > @@ -166,16 +166,24 @@ void vkms_crc_work_handle(struct work_struct *work)
> > >         struct drm_plane *plane;
> > >         u32 crc32 = 0;
> > >         u64 frame_start, frame_end;
> > > +       bool crc_pending;
> > >         unsigned long flags;
> > >
> > >         spin_lock_irqsave(&out->state_lock, flags);
> > >         frame_start = crtc_state->frame_start;
> > >         frame_end = crtc_state->frame_end;
> > > +       crc_pending = crtc_state->crc_pending;
> > > +       crtc_state->frame_start = 0;
> > > +       crtc_state->frame_end = 0;
> > > +       crtc_state->crc_pending = false;
> > >         spin_unlock_irqrestore(&out->state_lock, flags);
> > >
> > > -       /* _vblank_handle() hasn't updated frame_start yet */
> > > -       if (!frame_start || frame_start == frame_end)
> > > -               goto out;
> > > +       /*
> > > +        * We raced with the vblank hrtimer and previous work already computed
> > > +        * the crc, nothing to do.
> > > +        */
> > > +       if (!crc_pending)
> > > +               return;
> > 
> > I think this condition is not reachable because crc_pending will be
> > set to true in `vkms_vblank_simulate()`, which in turn schedules
> > the function `vkms_crc_work_handle()`. Just to check, I tried to
> > reach this condition by running kms_flip, kms_pipe_crc_basic, and
> > kms_cursor_crc with two different VM setups[1], but I couldn't reach
> > it. What do you think?
> 
> thread A			thread B
> 1. run vblank hrtimer
> 
> 				2. starts running crc work (from previous
> 				vblank)
> 
> 3. spin_lock()			-> gets stalled on the spin_lock() because
> 				   thread A has it already
> 
> 4. update frame_end (only in
> later patches, atm this is
> impossible). crc_pending is set
> already.
> 
> 5. schedule_work: since the work
> is running already, this means it
> is scheduled to run once more.
> 
> 6. spin_unlock
> 
> 				7. compute crc, clear crc_pending
> 				8. work finishes
> 				9. work gets run again
> 				10. crc_pending=false
> 
> Since the spin_lock critical section is _very_ short (less than 1 usec I
> bet), this race is very hard to hit.

First of all, thank you very much for the detailed explanation, and
sorry for the delay in replying; I was 'processing' all of your
comments. I believe I understand the issues related to this patchset;
I just want to check with you whether the diagram and the cases below
make sense:

timer   |------|------|------|------|------|...

Case 1:        +----x +---x  +-----x

Case 2:     A  +----------x
            B         +----x

The top line of this diagram illustrates the vblank periods over time.
The lines below it highlight two cases: the '+' represents when the
worker is queued (queue_work()), and the 'x' denotes when the CRC
worker finishes its data processing. Before describing each case in
the diagram, I want to highlight that I'm focused on these two
snippets of code:

static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer)
{
	[..]
	spin_lock(&output->crc_lock);
	[..] data [..]
	spin_unlock(&output->crc_lock);
	[..]
}

void vkms_crc_work_handle(struct work_struct *work)
{
	[..]
	spin_lock_irq(&out->crc_lock);
	crtc_state->crc_pending = false;
	[..] data [..]
	spin_unlock_irq(&out->crc_lock);
	[..]
}

Cases:

1) This is the best scenario; each CRC worker finishes before the next
vblank.

2) In this scenario, one of the CRC workers extends across multiple
vblanks. If worker A has already collected the sensitive data inside
vkms_crc_work_handle(), both workers will finish without problems
(thanks to your changes). However, if for some reason worker A did not
start before worker B, the new work will take care of its own CRC and
the CRC from worker A. Finally, since worker B will have set
crc_pending to false, when worker A starts it will just return because
of the following code:

if (!crc_pending)
  return;

Make sense?
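
To double-check my reading, here is the flow condensed into one place
(just my sketch of the ordering, assembled from the patch and the
snippets above, using the crc_lock name from the later patches in this
series):

	spin_lock_irq(&out->crc_lock);
	frame_start = crtc_state->frame_start;
	frame_end = crtc_state->frame_end;
	crc_pending = crtc_state->crc_pending;
	crtc_state->frame_start = 0;
	crtc_state->frame_end = 0;
	crtc_state->crc_pending = false;
	spin_unlock_irq(&out->crc_lock);

	/* a re-run that raced with the hrtimer sees crc_pending == false
	 * and bails out here, exactly as in case 2 */
	if (!crc_pending)
		return;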
 
> Exercise: Figure out why schedule_work _must_ schedule the work item to
> re-run if it's running already. If it doesn't do that there's another
> race.
> 
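
My understanding of the exercise, for the record (please correct me if
I'm wrong): queue_work() only returns false when the work item is
already queued but has not started yet; the workqueue core clears the
pending bit right before the handler begins executing, so a
queue_work() that lands while the handler is mid-flight queues one
more execution. If a queue-while-running were a no-op instead, the
queue_work() from step 4 would be lost and the new frame_end would
never be processed. A tiny userspace sketch of just that semantic (the
names here are made up, this is not kernel code):

	#include <stdbool.h>
	#include <stdio.h>

	static bool pending;	/* stand-in for WORK_STRUCT_PENDING */

	static bool queue_work_sim(void)
	{
		if (pending)
			return false;	/* queued already, not yet started */
		pending = true;
		return true;
	}

	static void worker_begins(void)
	{
		pending = false;	/* cleared *before* the handler body runs */
	}

	int main(void)
	{
		queue_work_sim();	/* 1. vblank queues the work */
		worker_begins();	/* 2-3. handler starts, bit is clear */
		/* 4. hrtimer queues again while the handler runs: this
		 * succeeds, so the handler is run once more and picks up
		 * the new frame_end instead of losing it */
		printf("requeued: %d\n", queue_work_sim());
		return 0;
	}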
> > 
> > [1] Qemu parameters
> > VM1: -m 1G -smp cores=2,cpus=2
> > VM2: -enable-kvm -m 2G -smp cores=4,cpus=4
> > 
> > >         drm_for_each_plane(plane, &vdev->drm) {
> > >                 struct vkms_plane_state *vplane_state;
> > > @@ -196,20 +204,11 @@ void vkms_crc_work_handle(struct work_struct *work)
> > >         if (primary_crc)
> > >                 crc32 = _vkms_get_crc(primary_crc, cursor_crc);
> > >
> > > -       frame_end = drm_crtc_accurate_vblank_count(crtc);
> > > -
> > > -       /* queue_work can fail to schedule crc_work; add crc for
> > > -        * missing frames
> > > +       /*
> > > +        * The worker can fall behind the vblank hrtimer, make sure we catch up.
> > >          */
> > >         while (frame_start <= frame_end)
> > >                 drm_crtc_add_crc_entry(crtc, true, frame_start++, &crc32);
> > 
> > I want to take this opportunity to ask about this while loop; it's
> > not really specific to this patch.
> > 
> > I have to admit that I never fully got the idea behind this 'while':
> > it looks like we just fill out the missed frames with a repeated
> > value. FWIU, `drm_crtc_add_crc_entry()` adds an entry with the CRC
> > information for a frame, but in this case we are adding the same CRC
> > for a whole set of frames. I agree that nearby frames have similar
> > CRC values, but can we rely on this all the time? What could happen
> > if there is a great difference between frame_start and frame_end?
> 
> It's a cheap trick for slow CPUs: if the crc worker gets behind the
> vblank hrtimer, we need to somehow catch up. With real hw this is not
> possible, but with vkms we simulate the hw. The only quick way to
> catch up is to fill out the same crc for everything. It's a lie, and
> it will make some kms_atomic tests fail, but it's the only thing we
> can really do, aside from trying to make the crc computation code as
> fast as possible.
> -Daniel
> 
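
For my own notes, a concrete instance of the catch-up loop from the
patch (the numbers are made up for illustration):

	/* the worker fell three frames behind */
	u64 frame_start = 100, frame_end = 103;

	while (frame_start <= frame_end)
		drm_crtc_add_crc_entry(crtc, true, frame_start++, &crc32);

	/* userspace now sees entries for frames 100..103, all carrying
	 * the same crc32, so the crc sequence has no gaps */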
> > 
> > > -
> > > -out:
> > > -       /* to avoid using the same value for frame number again */
> > > -       spin_lock_irqsave(&out->state_lock, flags);
> > > -       crtc_state->frame_end = frame_end;
> > > -       crtc_state->frame_start = 0;
> > > -       spin_unlock_irqrestore(&out->state_lock, flags);
> > >  }
> > >
> > >  static int vkms_crc_parse_source(const char *src_name, bool *enabled)
> > > diff --git a/drivers/gpu/drm/vkms/vkms_crtc.c b/drivers/gpu/drm/vkms/vkms_crtc.c
> > > index 1bbe099b7db8..c727d8486e97 100644
> > > --- a/drivers/gpu/drm/vkms/vkms_crtc.c
> > > +++ b/drivers/gpu/drm/vkms/vkms_crtc.c
> > > @@ -30,13 +30,18 @@ static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer)
> > >                  * has read the data
> > >                  */
> > >                 spin_lock(&output->state_lock);
> > > -               if (!state->frame_start)
> > > +               if (!state->crc_pending)
> > >                         state->frame_start = frame;
> > > +               else
> > > +                       DRM_DEBUG_DRIVER("crc worker falling behind, frame_start: %llu, frame_end: %llu\n",
> > > +                                        state->frame_start, frame);
> > > +               state->frame_end = frame;
> > > +               state->crc_pending = true;
> > >                 spin_unlock(&output->state_lock);
> > >
> > >                 ret = queue_work(output->crc_workq, &state->crc_work);
> > >                 if (!ret)
> > > -                       DRM_WARN("failed to queue vkms_crc_work_handle");
> > > +                       DRM_DEBUG_DRIVER("vkms_crc_work_handle already queued\n");
> > >         }
> > >
> > >         spin_unlock(&output->lock);
> > > diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> > > index 81f1cfbeb936..3c7e06b19efd 100644
> > > --- a/drivers/gpu/drm/vkms/vkms_drv.h
> > > +++ b/drivers/gpu/drm/vkms/vkms_drv.h
> > > @@ -56,6 +56,8 @@ struct vkms_plane_state {
> > >  struct vkms_crtc_state {
> > >         struct drm_crtc_state base;
> > >         struct work_struct crc_work;
> > > +
> > > +       bool crc_pending;
> > >         u64 frame_start;
> > >         u64 frame_end;
> > >  };
> > > --
> > > 2.20.1
> > >
> > 
> > 
> > -- 
> > 
> > Rodrigo Siqueira
> > https://siqueira.tech
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

-- 
Rodrigo Siqueira
https://siqueira.tech
