Date: Wed, 1 Nov 2017 19:22:54 +0100 (CET)
From: Thomas Gleixner
To: Guenter Roeck
cc: Peter Zijlstra, linux-kernel@vger.kernel.org, Don Zickus, Ingo Molnar
Subject: Re: Crashes in perf_event_ctx_lock_nested
In-Reply-To: <20171031221107.GA12133@roeck-us.net>
References: <20171030224512.GA13592@roeck-us.net>
 <20171031134850.ynix2zqypmca2mtt@hirez.programming.kicks-ass.net>
 <20171031221107.GA12133@roeck-us.net>

On Tue, 31 Oct 2017, Guenter Roeck wrote:

> On Tue, Oct 31, 2017 at 10:32:00PM +0100, Thomas Gleixner wrote:
>
> [ ...]
>
> > So we have to revert
> >
> > a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy")
> >
> > Patch attached.
> >
>
> Tested-by: Guenter Roeck
>
> There is still a problem. When running
>
> echo 6 > /proc/sys/kernel/watchdog_thresh
> echo 5 > /proc/sys/kernel/watchdog_thresh
>
> repeatedly, the message
>
> NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
>
> stops after a while (after ~10-30 iterations, with fluctuations).
> After adding trace messages into hardlockup_detector_perf_disable()
> and hardlockup_detector_perf_enable(), I see:
>
> hardlockup_detector_perf_disable: disable(0): Number of CPUs: 3
> hardlockup_detector_perf_disable: disable(1): Number of CPUs: 2
> hardlockup_detector_perf_disable: disable(2): Number of CPUs: 1
> hardlockup_detector_perf_disable: disable(3): Number of CPUs: 0
> ...
> hardlockup_detector_perf_disable: disable(0): Number of CPUs: 2
> hardlockup_detector_perf_disable: disable(1): Number of CPUs: 1
> hardlockup_detector_perf_disable: disable(2): Number of CPUs: 0
> hardlockup_detector_perf_disable: disable(3): Number of CPUs: -1
> ...
> hardlockup_detector_perf_enable: enable(1): Number of CPUs: -6
> hardlockup_detector_perf_enable: enable(3): Number of CPUs: -5
> hardlockup_detector_perf_enable: enable(2): Number of CPUs: -4
> hardlockup_detector_perf_enable: enable(0): Number of CPUs: -3
>
> Maybe watchdog_cpus needs to be atomic ?

Indeed.

Thanks,

	tglx
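
For illustration only, here is a minimal userspace sketch of the lost-update
problem the trace above suggests. It is not the kernel code and not the actual
fix: cpus_count, racy_add(), atomic_add(), the worker functions and run() are
all hypothetical names, and C11 atomics with pthreads stand in for the kernel's
own primitives (where atomic_t with atomic_inc()/atomic_dec() would be the
analogous facility). The "racy" variant splits the update into a separate load
and store, which is what a plain count++ on a shared variable amounts to.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NR_THREADS 4       /* stands in for the four CPUs in the trace */
#define ITERATIONS 100000  /* enable/disable cycles per thread */

/* Hypothetical userspace stand-in for a shared counter like watchdog_cpus. */
static atomic_int cpus_count;

/*
 * Models an unprotected "count += delta": the load and the store are two
 * separate steps, so an update made by another thread in between is lost.
 */
static void racy_add(int delta)
{
	int old = atomic_load_explicit(&cpus_count, memory_order_relaxed);

	atomic_store_explicit(&cpus_count, old + delta, memory_order_relaxed);
}

/* One indivisible read-modify-write: no concurrent update can be lost. */
static void atomic_add(int delta)
{
	atomic_fetch_add(&cpus_count, delta);
}

static void *racy_worker(void *arg)
{
	(void)arg;
	for (int i = 0; i < ITERATIONS; i++) {
		racy_add(1);   /* "enable" */
		racy_add(-1);  /* "disable" */
	}
	return NULL;
}

static void *atomic_worker(void *arg)
{
	(void)arg;
	for (int i = 0; i < ITERATIONS; i++) {
		atomic_add(1);   /* "enable" */
		atomic_add(-1);  /* "disable" */
	}
	return NULL;
}

/* Runs NR_THREADS copies of worker and returns the final counter value. */
static int run(void *(*worker)(void *))
{
	pthread_t threads[NR_THREADS];

	atomic_store(&cpus_count, 0);
	for (int i = 0; i < NR_THREADS; i++) {
		int rc = pthread_create(&threads[i], NULL, worker, NULL);

		assert(rc == 0);
		(void)rc;
	}
	for (int i = 0; i < NR_THREADS; i++)
		pthread_join(threads[i], NULL);

	return atomic_load(&cpus_count);
}
```

Since every thread performs balanced increments and decrements,
run(atomic_worker) always returns 0. run(racy_worker) may return any value,
including a negative one, depending on how the threads interleave, which
matches the drifting "Number of CPUs" values in the trace.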