From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751088AbdE3Rk0 (ORCPT ); Tue, 30 May 2017 13:40:26 -0400
Received: from merlin.infradead.org ([205.233.59.134]:51508 "EHLO
	merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750898AbdE3RkY (ORCPT ); Tue, 30 May 2017 13:40:24 -0400
Date: Tue, 30 May 2017 19:40:14 +0200
From: Peter Zijlstra
To: Andi Kleen
Cc: Stephane Eranian, Vince Weaver, "Liang, Kan", "mingo@redhat.com",
	"linux-kernel@vger.kernel.org", "alexander.shishkin@linux.intel.com",
	"acme@redhat.com", "jolsa@redhat.com", "torvalds@linux-foundation.org",
	"tglx@linutronix.de"
Subject: Re: [PATCH 1/2] perf/x86/intel: enable CPU ref_cycles for GP counter
Message-ID: <20170530174014.zjauj22hx7avxqgf@hirez.programming.kicks-ass.net>
References: <20170523063913.363ssgcy7kmeesye@hirez.programming.kicks-ass.net>
	<20170524154518.GA24144@tassilo.jf.intel.com>
	<20170530092523.xkuj5lqpq5pb5y4m@hirez.programming.kicks-ass.net>
	<20170530135128.GI24144@tassilo.jf.intel.com>
	<20170530162838.h5tzdnrxpy6upbka@hirez.programming.kicks-ass.net>
	<20170530172208.GL24144@tassilo.jf.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170530172208.GL24144@tassilo.jf.intel.com>
User-Agent: NeoMutt/20170113 (1.7.2)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, May 30, 2017 at 10:22:08AM -0700, Andi Kleen wrote:
> > > You would only need a single one per system however, not one per CPU.
> > > RCU already tracks all the CPUs, all we need is a single NMI watchdog
> > > that makes sure RCU itself does not get stuck.
> > >
> > > So we just have to find a single watchdog somewhere that can trigger
> > > NMI.
> >
> > But then you have to IPI broadcast the NMI, which is less than ideal.
>
> Only when the watchdog times out to print the backtraces.
The current NMI watchdog has a per-cpu state. So that means either doing
for_all_cpu() loops or IPI broadcasts from the NMI tickle. Neither is
something you really want.

> > RCU doesn't have that problem because the quiescent state is a global
> > thing. CPU progress, which is what the NMI watchdog tests, is very much
> > per logical CPU though.
>
> RCU already has a CPU stall detector. It should work (and usually
> triggers before the NMI watchdog in my experience unless the
> whole system is dead)

It only goes to look at CPU state once it detects the global QS is
stalled, I think.

But I've not had much luck with the RCU one -- although I think it's been
improved since I last had a hard problem.
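To make the per-cpu state point above concrete, here is a rough
user-space sketch, not kernel code. All names (wd_state, wd_touch,
wd_check_cpu, wd_check_all, NR_CPUS as a fixed constant) are made up for
illustration. It only shows why a single system-wide checker ends up
walking every CPU's state on each tick, whereas a per-cpu check touches
local state only.

```c
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical CPU count for this sketch */

/* Hypothetical per-cpu watchdog state; not the kernel's actual layout. */
struct wd_state {
	unsigned long heartbeat;	/* bumped by the CPU when it makes progress */
	unsigned long last_seen;	/* heartbeat value seen at the previous check */
};

struct wd_state wd[NR_CPUS];

/* Progress marker: each CPU only ever writes its own slot. */
void wd_touch(int cpu)
{
	wd[cpu].heartbeat++;
}

/* Per-cpu check: reads and updates one CPU's state, nothing else. */
bool wd_check_cpu(int cpu)
{
	bool stuck = (wd[cpu].heartbeat == wd[cpu].last_seen);

	wd[cpu].last_seen = wd[cpu].heartbeat;
	return stuck;
}

/*
 * Single system-wide watchdog: with per-cpu state it has to loop over
 * all CPUs on every tick. Returns the number of CPUs that made no
 * progress since the previous check.
 */
int wd_check_all(void)
{
	int cpu, stuck = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (wd_check_cpu(cpu))
			stuck++;

	return stuck;
}
```

In this sketch the cost of wd_check_all() grows with the number of CPUs
and it reads every CPU's state from one place, which is the loop side of
the trade-off; the alternative is to have each CPU run wd_check_cpu() on
itself, which is what needs the broadcast.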