linux-kernel.vger.kernel.org archive mirror
From: Peter Zijlstra <peterz@infradead.org>
To: Vince Weaver <vincent.weaver@maine.edu>
Cc: Dave Jones <davej@redhat.com>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@redhat.com>, Paul Mackerras <paulus@samba.org>
Subject: Re: x86_pmu_start WARN_ON.
Date: Mon, 17 Feb 2014 16:28:59 +0100
Message-ID: <20140217152859.GF15586@twins.programming.kicks-ass.net>
In-Reply-To: <alpine.DEB.2.10.1402131700090.24354@vincent-weaver-1.um.maine.edu>

On Thu, Feb 13, 2014 at 05:13:20PM -0500, Vince Weaver wrote:
> On Thu, 13 Feb 2014, Vince Weaver wrote:
> 
> > The plot thickens.  The WARN_ON is not caused by the cycles event that we 
> > open, but it's caused by the NMI Watchdog cycles event.
> 
> The WARN_ON_ONCE at line 1076 in perf_event.c is triggering because
> x86_pmu_enable() is calling x86_pmu_start() for all of the active x86
> events (three plus the NMI watchdog), and the NMI watchdog unexpectedly
> does not have PERF_HES_STOPPED set (its hw.state is 0).

Cute, that is indeed unexpected.

> What's the deal with the PERF_HES_STOPPED name?  Is HES an acronym?
> Or is it just a male event?  

Hardware Event State

> Also it's not really clear what PERF_HES_ARCH indicates.

I ran out of names, it seems; it's used in the reschedule case where we
relocate a stopped event. We save the STOPPED state into ARCH (because
we're going to destroy the STOPPED state by stopping everybody), so that
we know not to (re)enable the event when it's at its new location.

The comment in x86_pmu_enable() near where we set ARCH was supposed to
communicate this.
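
As an illustration of the STOPPED/ARCH hand-off described above, here is a
minimal user-space sketch. It is a simplified model under stated assumptions,
not the kernel's code: the flag values follow include/linux/perf_event.h, but
struct model_event, its "moved" field, and the model_*() functions are
hypothetical names invented for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Flag values as defined in include/linux/perf_event.h. */
#define PERF_HES_STOPPED  0x01 /* the counter is stopped */
#define PERF_HES_UPTODATE 0x02 /* event->count is up to date */
#define PERF_HES_ARCH     0x04 /* arch-private bit; remembers STOPPED here */

/* Hypothetical stand-in for a perf event; only the state bits matter. */
struct model_event {
	int  state; /* plays the role of event->hw.state */
	bool moved; /* assumption: the event was given a new counter */
};

/* Returns false in the situation where the WARN_ON_ONCE would fire. */
static bool model_start(struct model_event *e)
{
	if (!(e->state & PERF_HES_STOPPED))
		return false; /* starting an event that is not stopped */
	e->state = 0;
	return true;
}

static void model_stop(struct model_event *e)
{
	e->state |= PERF_HES_STOPPED | PERF_HES_UPTODATE;
}

/* Relocate the moved events; returns how many warnings would have fired. */
static int model_reschedule(struct model_event *ev, size_t n)
{
	int warnings = 0;
	size_t i;

	/* Pass 1: stop everything that moves, saving STOPPED into ARCH. */
	for (i = 0; i < n; i++) {
		if (!ev[i].moved)
			continue;
		if (ev[i].state & PERF_HES_STOPPED)
			ev[i].state |= PERF_HES_ARCH;
		model_stop(&ev[i]);
	}

	/* Pass 2: restart moved events, except those stopped beforehand. */
	for (i = 0; i < n; i++) {
		if (!ev[i].moved)
			continue;
		if (ev[i].state & PERF_HES_ARCH)
			continue;
		if (!model_start(&ev[i]))
			warnings++;
	}
	return warnings;
}
```

In this model an event that was running before the move is restarted, an
event that was already stopped stays stopped because ARCH is set, and calling
model_start() on an event whose state is 0 reports the condition that the
WARN_ON_ONCE catches.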

> Things rapidly get complicated beyond that, as the NMI watchdog is a 
> kernel-created event bound to the CPU, whereas 2 of the events are x86 hw 
> events with a breakpoint-event groupleader (and the fact one of the events 
> has precise set).

For this code that _should_ not matter much; they're 3 events and we
mapped them into hardware counters.

So the precise event has to run on cnt0 on Core2; the NMI watchdog event
is simple enough to fit in (and we prefer) a fixed-purpose counter. So
there _should_ not be a reshuffle.

Although I should probably assert these _should_ thingies.
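
To make the "no reshuffle" expectation concrete, here is a small sketch of
constraint-based counter assignment. It is an assumption-laden model, not the
x86 scheduler: each event carries a bitmask of counters it may use, the most
constrained events are placed first, and fixed-purpose counters are preferred.
The names weight(), lowest_bit() and assign_counters(), and the bit layout
(general-purpose counters at bits 0 and 1, fixed counters from bit 32 up), are
illustrative choices.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of set bits, i.e. how many counters an event may use. */
static int weight(uint64_t mask)
{
	int w = 0;

	for (; mask; mask &= mask - 1)
		w++;
	return w;
}

/* Index of the lowest set bit; the caller guarantees mask != 0. */
static int lowest_bit(uint64_t mask)
{
	int idx = 0;

	while (!(mask & 1)) {
		mask >>= 1;
		idx++;
	}
	return idx;
}

/*
 * Give every event one counter from its allowed mask. Events with the
 * fewest options go first, and a free fixed-purpose counter (a bit set
 * in 'fixed') wins over a general-purpose one.
 * Returns 0 on success, -1 if some event cannot be placed.
 */
static int assign_counters(const uint64_t *allowed, size_t n,
			   uint64_t fixed, int *assign)
{
	uint64_t used = 0;
	size_t i;
	int w;

	for (i = 0; i < n; i++) {
		if (!allowed[i])
			return -1; /* no counter can ever host this event */
		assign[i] = -1;
	}

	for (w = 1; w <= 64; w++) {
		for (i = 0; i < n; i++) {
			uint64_t avail;

			if (weight(allowed[i]) != w)
				continue;
			avail = allowed[i] & ~used;
			if (!avail)
				return -1;
			if (avail & fixed)
				avail &= fixed; /* prefer fixed-purpose */
			assign[i] = lowest_bit(avail);
			used |= (uint64_t)1 << assign[i];
		}
	}
	return 0;
}
```

With a watchdog-like event allowed on bits 0, 1 and 33, a precise event
allowed only on bit 0, and a third event allowed on bits 0 and 1, the
watchdog lands on bit 33 whether it is scheduled alone or together with the
other two. That stability is the kind of property an assertion could check.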

> From the stacktrace it looks like it is the close of a completely 
> unrelated tracepoint event that triggers this all off, but I'm not
> sure why a list_del_event() call of the tracepoint name would
> trigger a schedule_timeout() and an ensuing __perf_event_task_sched_in()
> which is what eventually triggers the problem.

Enable CONFIG_FRAME_POINTER for better stack traces; I suspect the
list_del_event() is just random stack garbage. The path that makes sense
is:
  wait_rcu()->__wait_for_common()->schedule_timeout()

> Scattering printks around doesn't seem to work because a lot of the related 
> calls are happening in contexts where printks don't really work.
> 
> Anyway I just wanted to summarize my findings as I might not have a chance 
> to look at this again for a while.  For completeness I'm including the 
> backtrace below.

Sure, much appreciated. I'll go read up on the event scheduling code; it's
been a while since I stared at that in too much detail.


Thread overview: 35+ messages
2014-01-30 19:02 x86_pmu_start WARN_ON Dave Jones
2014-02-10 21:26 ` Vince Weaver
2014-02-11 13:29   ` Peter Zijlstra
2014-02-12 21:04     ` Vince Weaver
2014-02-13 14:11       ` Vince Weaver
2014-02-13 17:35         ` Vince Weaver
2014-02-13 22:13           ` Vince Weaver
2014-02-17 15:28             ` Peter Zijlstra [this message]
2014-02-18 18:30               ` Vince Weaver
2014-02-18 22:20                 ` Vince Weaver
2014-02-19 10:19                   ` Peter Zijlstra
2014-02-19 22:34                     ` Vince Weaver
2014-02-20 10:08                       ` Peter Zijlstra
2014-02-20 15:47                         ` Andi Kleen
2014-02-20 15:54                           ` Peter Zijlstra
2014-02-20 17:31                             ` Andi Kleen
2014-02-20 18:15                               ` Peter Zijlstra
2014-02-20 18:23                                 ` Andi Kleen
2014-02-20 19:04                                 ` Steven Rostedt
2014-02-20 16:26                         ` Steven Rostedt
2014-02-20 17:00                           ` Peter Zijlstra
2014-02-20 17:43                             ` Steven Rostedt
2014-02-20 17:46                               ` Steven Rostedt
2014-02-20 18:18                                 ` Peter Zijlstra
2014-02-20 18:03                         ` Vince Weaver
2014-02-20 18:23                           ` Peter Zijlstra
2014-02-20 18:54                             ` Vince Weaver
2014-02-20 19:21                               ` Vince Weaver
2014-02-20 19:46                                 ` Vince Weaver
2014-02-21 14:37                                   ` Vince Weaver
2014-02-21 15:03                             ` Peter Zijlstra
2014-02-21 20:18                               ` Vince Weaver
2014-02-24 11:28                                 ` Peter Zijlstra
2014-02-26  5:59                                   ` Vince Weaver
2014-02-27 13:32                               ` [tip:perf/core] perf/x86: Fix event scheduling tip-bot for Peter Zijlstra
