From: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com>
To: Daniel Vetter <daniel@ffwll.ch>
Cc: Daniel Vetter <daniel.vetter@intel.com>,
	Daniel Vetter <daniel.vetter@ffwll.ch>,
	Haneen Mohammed <hamohammed.sa@gmail.com>,
	DRI Development <dri-devel@lists.freedesktop.org>,
	Shayenne Moura <shayenneluzmoura@gmail.com>
Subject: Re: [PATCH 01/10] drm/vkms: Fix crc worker races
Date: Mon, 17 Jun 2019 23:39:31 -0300
Message-ID: <20190618023931.ccs3glcvh5hnm4m3@smtp.gmail.com>
In-Reply-To: <20190612144849.GP2458@phenom.ffwll.local>

On 06/12, Daniel Vetter wrote:
> On Wed, Jun 12, 2019 at 10:33:11AM -0300, Rodrigo Siqueira wrote:
> > On Thu, Jun 6, 2019 at 7:28 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > >
> > > The issue we have is that the crc worker might fall behind. We've
> > > tried to handle this by tracking both the earliest frame for which it
> > > still needs to compute a crc, and the last one. Plus when the
> > > crtc_state changes, we get a new work item; these are all run in
> > > order due to the ordered workqueue we allocate for each vkms crtc.
> > >
> > > Trouble is, there have been a few small issues in the current code:
> > > - we need to capture frame_end in the vblank hrtimer, not in the
> > >   worker. The worker might run much later, and then we generate a lot
> > >   of crc for which there's already a different worker queued up.
> > > - frame number might be 0, so create a new crc_pending boolean to
> > >   track this without confusion.
> > > - we need to atomically grab frame_start/end and clear it, so do that
> > >   all in one go. This is not going to create a new race, because if we
> > >   race with the hrtimer then our work will be re-run.
> > > - only race that can happen is the following:
> > >   1. worker starts
> > >   2. hrtimer runs and updates frame_end
> > >   3. worker grabs frame_start/end, already reading the new frame_end,
> > >   and clears crc_pending
> > >   4. hrtimer calls queue_work()
> > >   5. worker completes
> > >   6. worker gets re-run, crc_pending is false
> > >   Explain this case a bit better by rewording the comment.
> > >
> > > v2: Demote warning level output to debug when we fail to requeue, this
> > > is expected under high load when the crc worker can't quite keep up.
> > >
> > > Cc: Shayenne Moura <shayenneluzmoura@gmail.com>
> > > Cc: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com>
> > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > Cc: Haneen Mohammed <hamohammed.sa@gmail.com>
> > > Cc: Daniel Vetter <daniel@ffwll.ch>
> > > ---
> > >  drivers/gpu/drm/vkms/vkms_crc.c  | 27 +++++++++++++--------------
> > >  drivers/gpu/drm/vkms/vkms_crtc.c |  9 +++++++--
> > >  drivers/gpu/drm/vkms/vkms_drv.h  |  2 ++
> > >  3 files changed, 22 insertions(+), 16 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/vkms/vkms_crc.c b/drivers/gpu/drm/vkms/vkms_crc.c
> > > index d7b409a3c0f8..66603da634fe 100644
> > > --- a/drivers/gpu/drm/vkms/vkms_crc.c
> > > +++ b/drivers/gpu/drm/vkms/vkms_crc.c
> > > @@ -166,16 +166,24 @@ void vkms_crc_work_handle(struct work_struct *work)
> > >         struct drm_plane *plane;
> > >         u32 crc32 = 0;
> > >         u64 frame_start, frame_end;
> > > +       bool crc_pending;
> > >         unsigned long flags;
> > >
> > >         spin_lock_irqsave(&out->state_lock, flags);
> > >         frame_start = crtc_state->frame_start;
> > >         frame_end = crtc_state->frame_end;
> > > +       crc_pending = crtc_state->crc_pending;
> > > +       crtc_state->frame_start = 0;
> > > +       crtc_state->frame_end = 0;
> > > +       crtc_state->crc_pending = false;
> > >         spin_unlock_irqrestore(&out->state_lock, flags);
> > >
> > > -       /* _vblank_handle() hasn't updated frame_start yet */
> > > -       if (!frame_start || frame_start == frame_end)
> > > -               goto out;
> > > +       /*
> > > +        * We raced with the vblank hrtimer and previous work already computed
> > > +        * the crc, nothing to do.
> > > +        */
> > > +       if (!crc_pending)
> > > +               return;
> > 
> > I think this condition is not reachable because crc_pending is set to
> > true in `vkms_vblank_simulate()`, which in turn schedules
> > `vkms_crc_work_handle()`. Just for checking, I tried to
> > reach this condition by running kms_flip, kms_pipe_crc_basic, and
> > kms_cursor_crc with two different VM setups[1], but I couldn't reach
> > it. What do you think?
> 
> thread A			thread B
> 1. run vblank hrtimer
> 
> 				2. starts running crc work (from previous
> 				vblank)
> 
> 3. spin_lock()			-> gets stalled on the spin_lock() because
> 				   thread A has it already
> 
> 4. update frame_end (only in
> later patches, atm this is
> impossible). crc_pending is set
> already.
> 
> 5. schedule_work: since the work
> is running already, this means it
> is scheduled to run once more.
> 
> 6. spin_unlock
> 
> 				7. compute crc, clear crc_pending
> 				8. work finishes
> 				9. work gets run again
> 				10. crc_pending=false
> 
> Since the spin_lock critical section is _very_ short (less than 1 usec I
> bet), this race is very hard to hit.

First of all, thank you very much for the detailed explanation, and
sorry for the delay in replying; I was 'processing' all of your
comments. I believe I now understand the issues related to this
patchset; I just want to check with you whether the diagram and the
cases below make sense:

timer   |------|------|------|------|------|...

Case 1:        +----x +---x  +-----x

Case 2:     A  +----------x
            B         +----x

At the top of this diagram, I illustrate the vblank periods over
time. On the lines below it, I highlight two cases: the '+' marks when
the worker is queued (queue_work()), and the 'x' marks when the CRC
worker finishes its data processing. Before describing each case in
the diagram, I want to highlight that I'm focused on these two
snippets of code:

static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer)
{
	[..]
	spin_lock(&output->crc_lock);
	[..] data [..]
	spin_unlock(&output->crc_lock);
	[..]
}

void vkms_crc_work_handle(struct work_struct *work)
{
	[..]
	spin_lock_irq(&out->crc_lock);
	crtc_state->crc_pending = false;
	[..] data [..]
	spin_unlock_irq(&out->crc_lock);
	[..]
}

Cases:

1) This is the best scenario: each CRC worker finishes before the next
vblank.

2) In this scenario, one of the CRC workers extends across multiple
vblanks. If worker A has already collected its data inside
vkms_crc_work_handle(), workers A and B will both finish without
problems (thanks to your changes). However, if for any reason worker A
does not get to run before worker B, worker B will take care of its
own CRC and the CRC from worker A. Finally, since worker B already set
crc_pending to false, when worker A eventually starts it will just
return because of the following code:

if (!crc_pending)
	return;

Make sense?
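
To tie the two cases back to the code: here is the snapshot-and-clear
section from the diff above, using the crc_lock name from later in this
series as in the snippets earlier; the comments are my own annotations:

	spin_lock_irq(&out->crc_lock);
	/* Snapshot and clear everything in a single critical section, so
	 * the vblank hrtimer either sees the old state or the fully
	 * cleared one, never a half-consumed mix. */
	frame_start = crtc_state->frame_start;
	frame_end = crtc_state->frame_end;
	crc_pending = crtc_state->crc_pending;
	crtc_state->frame_start = 0;
	crtc_state->frame_end = 0;
	crtc_state->crc_pending = false;
	spin_unlock_irq(&out->crc_lock);

	/* Case 2 above: a previous run already consumed the pending frames. */
	if (!crc_pending)
		return;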
 
> Exercise: Figure out why schedule_work _must_ schedule the work item to
> re-run if it's running already. If it doesn't do that there's another
> race.
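
My attempt at the exercise, based on my reading of the workqueue code
(please correct me if I got it wrong): the pending bit of a work item is
cleared right before its handler starts executing, and queue_work() only
refuses to queue while that bit is still set. So a queue_work() issued
while the handler is already running queues exactly one more run. If a
running handler were instead treated as "already queued", we would get:

	worker				hrtimer
	1. starts, snapshots
	   frame_start/frame_end,
	   clears crc_pending
					2. updates frame_end, sets
					   crc_pending
					3. queue_work() silently
					   dropped (hypothetically)
	4. finishes without ever
	   seeing the new frame_end

and the CRC for that frame would only show up after the next vblank
queues the work again; if the vblank simulation stops first (e.g. CRC
capture gets disabled), it would be lost for good.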
> 
> > 
> > [1] Qemu parameters
> > VM1: -m 1G -smp cores=2,cpus=2
> > VM2: -enable-kvm -m 2G -smp cores=4,cpus=4
> > 
> > >         drm_for_each_plane(plane, &vdev->drm) {
> > >                 struct vkms_plane_state *vplane_state;
> > > @@ -196,20 +204,11 @@ void vkms_crc_work_handle(struct work_struct *work)
> > >         if (primary_crc)
> > >                 crc32 = _vkms_get_crc(primary_crc, cursor_crc);
> > >
> > > -       frame_end = drm_crtc_accurate_vblank_count(crtc);
> > > -
> > > -       /* queue_work can fail to schedule crc_work; add crc for
> > > -        * missing frames
> > > +       /*
> > > +        * The worker can fall behind the vblank hrtimer, make sure we catch up.
> > >          */
> > >         while (frame_start <= frame_end)
> > >                 drm_crtc_add_crc_entry(crtc, true, frame_start++, &crc32);
> > 
> > I want to take this opportunity to ask about this 'while'; it's not
> > really specific to this patch.
> > 
> > I have to admit that I never fully got the idea behind this 'while';
> > it looks like we just fill in the missed frames with a repeated
> > value. FWIU, `drm_crtc_add_crc_entry()` will add an entry with the CRC
> > information for a frame, but in this case, we are adding the same CRC
> > for a whole set of frames. I agree that nearby frames have similar
> > CRC values, but can we rely on that all the time? What could happen
> > if there is a large gap between frame_start and frame_end?
> 
> It's a cheap trick for slow CPUs: if the crc work gets behind the vblank
> hrtimer, we need to somehow catch up. With real hw this is not possible,
> but with vkms we simulate the hw. The only quick way to catch up is to
> fill out the same crc for everything. It's a lie, it will make some
> kms_atomic tests fail, but it's the only thing we can really do, aside
> from trying to make the crc computation code as fast as possible.
> -Daniel
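
So, concretely (with made-up frame numbers): if a slow run snapshots
frame_start = 100 and frame_end = 103, the loop

	while (frame_start <= frame_end)
		drm_crtc_add_crc_entry(crtc, true, frame_start++, &crc32);

emits the same crc32 for frames 100 through 103. Three of those four
entries are the "lie", but anything waiting for the CRC of a specific
frame at least gets an entry for it instead of blocking forever.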
> 
> > 
> > > -
> > > -out:
> > > -       /* to avoid using the same value for frame number again */
> > > -       spin_lock_irqsave(&out->state_lock, flags);
> > > -       crtc_state->frame_end = frame_end;
> > > -       crtc_state->frame_start = 0;
> > > -       spin_unlock_irqrestore(&out->state_lock, flags);
> > >  }
> > >
> > >  static int vkms_crc_parse_source(const char *src_name, bool *enabled)
> > > diff --git a/drivers/gpu/drm/vkms/vkms_crtc.c b/drivers/gpu/drm/vkms/vkms_crtc.c
> > > index 1bbe099b7db8..c727d8486e97 100644
> > > --- a/drivers/gpu/drm/vkms/vkms_crtc.c
> > > +++ b/drivers/gpu/drm/vkms/vkms_crtc.c
> > > @@ -30,13 +30,18 @@ static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer)
> > >                  * has read the data
> > >                  */
> > >                 spin_lock(&output->state_lock);
> > > -               if (!state->frame_start)
> > > +               if (!state->crc_pending)
> > >                         state->frame_start = frame;
> > > +               else
> > > +                       DRM_DEBUG_DRIVER("crc worker falling behind, frame_start: %llu, frame_end: %llu\n",
> > > +                                        state->frame_start, frame);
> > > +               state->frame_end = frame;
> > > +               state->crc_pending = true;
> > >                 spin_unlock(&output->state_lock);
> > >
> > >                 ret = queue_work(output->crc_workq, &state->crc_work);
> > >                 if (!ret)
> > > -                       DRM_WARN("failed to queue vkms_crc_work_handle");
> > > +                       DRM_DEBUG_DRIVER("vkms_crc_work_handle already queued\n");
> > >         }
> > >
> > >         spin_unlock(&output->lock);
> > > diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> > > index 81f1cfbeb936..3c7e06b19efd 100644
> > > --- a/drivers/gpu/drm/vkms/vkms_drv.h
> > > +++ b/drivers/gpu/drm/vkms/vkms_drv.h
> > > @@ -56,6 +56,8 @@ struct vkms_plane_state {
> > >  struct vkms_crtc_state {
> > >         struct drm_crtc_state base;
> > >         struct work_struct crc_work;
> > > +
> > > +       bool crc_pending;
> > >         u64 frame_start;
> > >         u64 frame_end;
> > >  };
> > > --
> > > 2.20.1
> > >
> > 
> > 
> > -- 
> > 
> > Rodrigo Siqueira
> > https://siqueira.tech
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

-- 
Rodrigo Siqueira
https://siqueira.tech
