Re: [PATCH] perf/core: fix multiplexing event scheduling issue

From: Stephane Eranian <eranian@google.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	mingo@elte.hu, Arnaldo Carvalho de Melo <acme@redhat.com>,
	Jiri Olsa <jolsa@redhat.com>, "Liang, Kan" <kan.liang@intel.com>,
	Song Liu <songliubraving@fb.com>, Ian Rogers <irogers@google.com>
Subject: Re: [PATCH] perf/core: fix multiplexing event scheduling issue
Date: Wed, 23 Oct 2019 00:06:43 -0700
Message-ID: <CABPqkBRgBegcdNHtXUqkdfJUASjuUYnSkh_cNeqfoO4wF7tnFQ@mail.gmail.com>
In-Reply-To: <20191021100558.GC1800@hirez.programming.kicks-ass.net>

On Mon, Oct 21, 2019 at 3:06 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Oct 17, 2019 at 05:27:46PM -0700, Stephane Eranian wrote:
> > @@ -2153,6 +2157,7 @@ __perf_remove_from_context(struct perf_event *event,
> >                          void *info)
> >  {
> >       unsigned long flags = (unsigned long)info;
> > +     int was_necessary = ctx->rotate_necessary;
> >
> >       if (ctx->is_active & EVENT_TIME) {
> >               update_context_time(ctx);
> > @@ -2171,6 +2176,37 @@ __perf_remove_from_context(struct perf_event *event,
> >                       cpuctx->task_ctx = NULL;
> >               }
> >       }
> > +
> > +     /*
> > +      * sanity check that event_sched_out() does not and will not
> > +      * change the state of ctx->rotate_necessary
> > +      */
> > +     WARN_ON(was_necessary != event->ctx->rotate_necessary);
>
> It doesn't... why is this important to check?
>
I can remove that. It is a leftover from debugging. It is okay to look
at the situation after event_sched_out(). Today, it does not change
rotate_necessary.

> > +     /*
> > +      * if we remove an event AND we were multiplexing then, that means
> > +      * we had more events than we have counters for, and thus, at least,
> > +      * one event was in INACTIVE state. Now, that we removed an event,
> > +      * we need to resched to give a chance to all events to get scheduled,
> > +      * otherwise some may get stuck.
> > +      *
> > +      * By the time this function is called the event is usually in the OFF
> > +      * state.
> > +      * Note that this is not a duplicate of the same code in _perf_event_disable()
> > +      * because the call path are different. Some events may be simply disabled
>
> It is the exact same code twice though; IIRC this C language has a
> feature to help with that.

Sure! I will make a function to check on the condition.
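
As a rough illustration only, the shared condition could be factored out
along these lines. This is a userspace sketch, not the actual refactor:
the helper name resched_type_after_removal() is hypothetical, the struct
is a cut-down mock of perf_event_context, and the enum values are
stand-ins rather than something taken from the patch. It returns the
event types to pass to ctx_resched(), or 0 when no resched is needed.

```c
/* Mock stand-ins for kernel definitions; values are illustrative only. */
enum event_type_t {
	EVENT_FLEXIBLE = 0x1,
	EVENT_PINNED   = 0x2,
};

/* Cut-down mock: only the two fields the condition looks at. */
struct perf_event_context {
	int nr_events;
	int rotate_necessary;
};

/*
 * Hypothetical helper (name not from the patch).
 *
 * Given the type of the event that was just removed or disabled,
 * return the set of event types that should be rescheduled, or 0 if
 * the context was not multiplexing or has no events left.
 */
static int resched_type_after_removal(const struct perf_event_context *ctx,
				      int type)
{
	if (!ctx->rotate_necessary || !ctx->nr_events)
		return 0;

	/*
	 * Losing a pinned event may free a counter for flexible events,
	 * so both classes are rescheduled. The opposite is not true.
	 */
	if (type & EVENT_PINNED)
		type |= EVENT_FLEXIBLE;

	return type;
}
```

Both call sites would then reduce to computing the type once and calling
ctx_resched() only when the result is non-zero.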

>
> > +      * others removed. There is a way to get removed and not be disabled first.
> > +      */
> > +     if (ctx->rotate_necessary && ctx->nr_events) {
> > +             int type = get_event_type(event);
> > +             /*
> > +              * In case we removed a pinned event, then we need to
> > +              * resched for both pinned and flexible events. The
> > +              * opposite is not true. A pinned event can never be
> > +              * inactive due to multiplexing.
> > +              */
> > +             if (type & EVENT_PINNED)
> > +                     type |= EVENT_FLEXIBLE;
> > +             ctx_resched(cpuctx, cpuctx->task_ctx, type);
> > +     }
>
> What you're relying on is that ->rotate_necessary implies ->is_active
> and there's pending events. And if we tighten ->rotate_necessary you can
> remove the && ->nr_events.
>
Imagine I have 6 events and 4 counters, and I delete them all before
the timer expires. Then I can be in a situation where rotate_necessary
is still true and yet there are no more events in the context. That is
because only ctx_sched_out() clears rotate_necessary, IIRC. That is why
there is the && nr_events. Now, calling ctx_resched() with no events
probably wouldn't cause any harm, it would just be wasted work. By
tightening, I am guessing you mean clearing rotate_necessary earlier.
But that would be tricky, because the only reliable way of clearing it
is when you know you are about to reschedule everything. Removing an
event by itself may not be enough to eliminate multiplexing.
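
To make the 6 events / 4 counters case concrete, here is a toy userspace
model of the bookkeeping. Everything in it is an assumption made for
illustration: the mock_* names, the fixed NR_COUNTERS, and the
simplification that rotate_necessary is set as soon as an event cannot
get a counter. It only mirrors the behavior described above, where
removal leaves the flag alone and a full sched-out is what clears it.

```c
#define NR_COUNTERS 4	/* assumed number of hardware counters */

/* Toy model of the context fields involved; not the kernel struct. */
struct mock_ctx {
	int nr_events;		/* events attached to the context */
	int nr_active;		/* events currently holding a counter */
	int rotate_necessary;	/* set once some event could not be scheduled */
};

/* Attach an event; flag multiplexing when no counter is left for it. */
static void mock_add_event(struct mock_ctx *ctx)
{
	ctx->nr_events++;
	if (ctx->nr_active < NR_COUNTERS)
		ctx->nr_active++;
	else
		ctx->rotate_necessary = 1;
}

/* Detach an event; rotate_necessary is deliberately left untouched. */
static void mock_remove_event(struct mock_ctx *ctx)
{
	ctx->nr_events--;
	if (ctx->nr_active > ctx->nr_events)
		ctx->nr_active = ctx->nr_events;
}

/* Full sched-out: the only place in this model that clears the flag. */
static void mock_ctx_sched_out(struct mock_ctx *ctx)
{
	ctx->nr_active = 0;
	ctx->rotate_necessary = 0;
}

/* The condition from the patch: multiplexing AND events remaining. */
static int mock_needs_resched(const struct mock_ctx *ctx)
{
	return ctx->rotate_necessary && ctx->nr_events;
}
```

Adding six events and then removing all six in this model leaves
rotate_necessary set with nr_events at zero, which is the stale state
the && nr_events test filters out.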

> > @@ -2232,6 +2270,35 @@ static void __perf_event_disable(struct perf_event *event,
> >               event_sched_out(event, cpuctx, ctx);
> >
> >       perf_event_set_state(event, PERF_EVENT_STATE_OFF);
> > +     /*
> > +      * sanity check that event_sched_out() does not and will not
> > +      * change the state of ctx->rotate_necessary
> > +      */
> > +     WARN_ON_ONCE(was_necessary != event->ctx->rotate_necessary);
> > +
> > +     /*
> > +      * if we disable an event AND we were multiplexing then, that means
> > +      * we had more events than we have counters for, and thus, at least,
> > +      * one event was in INACTIVE state. Now, that we disabled an event,
> > +      * we need to resched to give a chance to all events to be scheduled,
> > +      * otherwise some may get stuck.
> > +      *
> > +      * Note that this is not a duplicate of the same code in
> > +      * __perf_remove_from_context()
> > +      * because events can be disabled without being removed.
>
> It _IS_ a duplicate, it is the _exact_ same code twice. What you're
> trying to say is that we need it in both places, but that's something
> else entirely.
>

Will refactor.

> > +      */
> > +     if (ctx->rotate_necessary && ctx->nr_events) {
> > +             int type = get_event_type(event);
> > +             /*
> > +              * In case we removed a pinned event, then we need to
> > +              * resched for both pinned and flexible events. The
> > +              * opposite is not true. A pinned event can never be
> > +              * inactive due to multiplexing.
> > +              */
> > +             if (type & EVENT_PINNED)
> > +                     type |= EVENT_FLEXIBLE;
> > +             ctx_resched(cpuctx, cpuctx->task_ctx, type);
> > +     }
> >  }
>
>