From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1751088AbdE3Rk0 (ORCPT <rfc822;w@1wt.eu>);
        Tue, 30 May 2017 13:40:26 -0400
Received: from merlin.infradead.org ([205.233.59.134]:51508 "EHLO
        merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1750898AbdE3RkY (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Tue, 30 May 2017 13:40:24 -0400
Date: Tue, 30 May 2017 19:40:14 +0200
From: Peter Zijlstra <peterz@infradead.org>
To: Andi Kleen <ak@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>,
        Vince Weaver <vincent.weaver@maine.edu>,
        "Liang, Kan" <kan.liang@intel.com>,
        "mingo@redhat.com" <mingo@redhat.com>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        "alexander.shishkin@linux.intel.com" 
        <alexander.shishkin@linux.intel.com>,
        "acme@redhat.com" <acme@redhat.com>,
        "jolsa@redhat.com" <jolsa@redhat.com>,
        "torvalds@linux-foundation.org" <torvalds@linux-foundation.org>,
        "tglx@linutronix.de" <tglx@linutronix.de>
Subject: Re: [PATCH 1/2] perf/x86/intel: enable CPU ref_cycles for GP counter
Message-ID: <20170530174014.zjauj22hx7avxqgf@hirez.programming.kicks-ass.net>
References: <CABPqkBTp5muPs32b7YVbfu57aEKv8aXMS+E08xgjxaOvY+B7wQ@mail.gmail.com>
 <20170523063913.363ssgcy7kmeesye@hirez.programming.kicks-ass.net>
 <CABPqkBTo=KC1Qp6vx272UJd2VdPOuX7O1B7J3aY2Y8srQaW-gg@mail.gmail.com>
 <20170524154518.GA24144@tassilo.jf.intel.com>
 <alpine.DEB.2.20.1705241158160.23659@macbook-air>
 <CABPqkBQq_ARmJ-WMk-SXwRguwPAgSHA4F8zhnbU3BWmAYZqo=w@mail.gmail.com>
 <20170530092523.xkuj5lqpq5pb5y4m@hirez.programming.kicks-ass.net>
 <20170530135128.GI24144@tassilo.jf.intel.com>
 <20170530162838.h5tzdnrxpy6upbka@hirez.programming.kicks-ass.net>
 <20170530172208.GL24144@tassilo.jf.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170530172208.GL24144@tassilo.jf.intel.com>
User-Agent: NeoMutt/20170113 (1.7.2)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, May 30, 2017 at 10:22:08AM -0700, Andi Kleen wrote:
> > > You would only need a single one per system however, not one per CPU.
> > > RCU already tracks all the CPUs, all we need is a single NMI watchdog
> > > that makes sure RCU itself does not get stuck.
> > > 
> > > So we just have to find a single watchdog somewhere that can trigger
> > > NMI.
> > 
> > But then you have to IPI broadcast the NMI, which is less than ideal.
> 
> Only when the watchdog times out to print the backtraces.

The current NMI watchdog has per-CPU state. So that means either doing
for_each_online_cpu() loops or IPI broadcasts from the NMI tickle.
Neither is something you really want.
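
To make the cost concrete, here is a rough userspace sketch, not kernel
code. NR_CPUS, hrtimer_ticks[], last_seen[] and both function names are
made up for illustration; the real watchdog keeps this state in per-CPU
variables. The point is only that a single system-wide checker has to
touch every CPU's state on every tickle:

```c
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical fixed CPU count for this sketch */

/* Per-CPU progress counter, bumped by each CPU's own timer tick. */
static unsigned long hrtimer_ticks[NR_CPUS];
/* Counter value observed at the previous watchdog check, per CPU. */
static unsigned long last_seen[NR_CPUS];

/* CPU-local check: only looks at (and updates) that CPU's own state. */
static bool cpu_is_stuck(int cpu)
{
	bool stuck = hrtimer_ticks[cpu] == last_seen[cpu];

	last_seen[cpu] = hrtimer_ticks[cpu];
	return stuck;
}

/*
 * Single system-wide watchdog: has to walk all CPUs each time it fires.
 * Returns the number of CPUs that made no progress since the last check.
 */
static int global_watchdog_check(void)
{
	int cpu, stuck = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu_is_stuck(cpu))
			stuck++;
	return stuck;
}
```

With a per-CPU NMI source each CPU would only ever call cpu_is_stuck()
on itself; the loop (or the equivalent IPI broadcast) only appears once
there is a single watchdog for the whole system.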

> > RCU doesn't have that problem because the quiescent state is a global
> > thing. CPU progress, which is what the NMI watchdog tests, is very much
> > per logical CPU though.
> 
> RCU already has a CPU stall detector. It should work (and usually
> triggers before the NMI watchdog in my experience unless the
> whole system is dead)

I think it only goes and looks at per-CPU state once it detects that
the global QS is stalled. I've not had much luck with the RCU one,
although I think it's been improved since I last had a hard problem.
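
As a toy model of that ordering (an assumption about the behaviour, not
the actual RCU code; struct toy_gp, STALL_TIMEOUT and toy_check_stall()
are hypothetical names), the global check gates the per-CPU one:

```c
#include <stdbool.h>

#define NR_CPUS 4		/* hypothetical CPU count */
#define STALL_TIMEOUT 21	/* hypothetical timeout, arbitrary time units */

struct toy_gp {
	unsigned long start;	/* when the current grace period began */
	bool qs_seen[NR_CPUS];	/* has this CPU reported a quiescent state? */
};

/*
 * Returns a bitmask of CPUs holding up the grace period, or 0 when the
 * global grace period is not yet considered stalled. Per-CPU state is
 * only inspected after the global timeout has expired.
 */
static unsigned int toy_check_stall(const struct toy_gp *gp, unsigned long now)
{
	unsigned int mask = 0;
	int cpu;

	if (now - gp->start < STALL_TIMEOUT)
		return 0;	/* global check passes; per-CPU state untouched */

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (!gp->qs_seen[cpu])
			mask |= 1u << cpu;
	return mask;
}
```

In this model a stuck CPU is only ever noticed as a side effect of the
grace period failing to complete, which is a different test from
per-CPU progress.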