Date: Tue, 31 Oct 2017 15:11:07 -0700
From: Guenter Roeck
To: Thomas Gleixner
Cc: Peter Zijlstra, linux-kernel@vger.kernel.org, Don Zickus, Ingo Molnar
Subject: Re: Crashes in perf_event_ctx_lock_nested
Message-ID: <20171031221107.GA12133@roeck-us.net>
References: <20171030224512.GA13592@roeck-us.net> <20171031134850.ynix2zqypmca2mtt@hirez.programming.kicks-ass.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Oct 31, 2017 at 10:32:00PM +0100, Thomas Gleixner wrote:
[ ...]
> So we have to revert
>
> a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy")
>
> Patch attached.
>

Tested-by: Guenter Roeck

There is still a problem. When running

echo 6 > /proc/sys/kernel/watchdog_thresh
echo 5 > /proc/sys/kernel/watchdog_thresh

repeatedly, the message

NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.

stops after a while (after ~10-30 iterations, with fluctuations).

After adding trace messages into hardlockup_detector_perf_disable() and
hardlockup_detector_perf_enable(), I see:

hardlockup_detector_perf_disable: disable(0): Number of CPUs: 3
hardlockup_detector_perf_disable: disable(1): Number of CPUs: 2
hardlockup_detector_perf_disable: disable(2): Number of CPUs: 1
hardlockup_detector_perf_disable: disable(3): Number of CPUs: 0
...
hardlockup_detector_perf_disable: disable(0): Number of CPUs: 2
hardlockup_detector_perf_disable: disable(1): Number of CPUs: 1
hardlockup_detector_perf_disable: disable(2): Number of CPUs: 0
hardlockup_detector_perf_disable: disable(3): Number of CPUs: -1
...
hardlockup_detector_perf_enable: enable(1): Number of CPUs: -6
hardlockup_detector_perf_enable: enable(3): Number of CPUs: -5
hardlockup_detector_perf_enable: enable(2): Number of CPUs: -4
hardlockup_detector_perf_enable: enable(0): Number of CPUs: -3

Maybe watchdog_cpus needs to be atomic?

Guenter
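To illustrate why an atomic counter might help: a plain watchdog_cpus++ or watchdog_cpus-- is a separate load, modify and store, so two CPUs updating the counter at the same time can lose one of the updates. The sketch below is a user-space model only, not the kernel code. The names sketch_watchdog_cpus, sketch_perf_enable(), sketch_perf_disable() and sketch_run() are hypothetical; it uses C11 <stdatomic.h> and POSIX threads as stand-ins for CPUs and assumes 4 CPUs, as in the trace. With atomic read-modify-write operations, balanced disable/enable pairs leave the counter at its starting value no matter how the threads interleave.

```c
/*
 * User-space sketch (hypothetical, not kernel code): each thread plays a
 * CPU that repeatedly disables and re-enables its watchdog, updating a
 * shared "number of enabled CPUs" counter. Because the updates are C11
 * atomic read-modify-write operations, no increment or decrement is lost.
 */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define SKETCH_NR_CPUS     4      /* assumption: 4 CPUs, as in the trace */
#define SKETCH_ITERATIONS  100000 /* assumption: arbitrary iteration count */

/* Hypothetical stand-in for watchdog_cpus; starts with all CPUs enabled. */
static atomic_int sketch_watchdog_cpus = SKETCH_NR_CPUS;

static void sketch_perf_enable(void)
{
	/* Load, add and store happen as one indivisible operation. */
	atomic_fetch_add(&sketch_watchdog_cpus, 1);
}

static void sketch_perf_disable(void)
{
	atomic_fetch_sub(&sketch_watchdog_cpus, 1);
}

/* Body of one simulated CPU: balanced disable/enable pairs. */
static void *sketch_cpu_thread(void *arg)
{
	(void)arg;
	for (int i = 0; i < SKETCH_ITERATIONS; i++) {
		sketch_perf_disable();
		sketch_perf_enable();
	}
	return NULL;
}

/* Runs all simulated CPUs concurrently and returns the final count. */
int sketch_run(void)
{
	pthread_t threads[SKETCH_NR_CPUS];

	for (int i = 0; i < SKETCH_NR_CPUS; i++) {
		int rc = pthread_create(&threads[i], NULL, sketch_cpu_thread, NULL);
		assert(rc == 0);
		(void)rc;
	}
	for (int i = 0; i < SKETCH_NR_CPUS; i++)
		pthread_join(threads[i], NULL);

	return atomic_load(&sketch_watchdog_cpus);
}
```

Calling sketch_run() should return SKETCH_NR_CPUS every time. In kernel code the analogous primitives would be an atomic_t with atomic_inc(), atomic_dec() and atomic_read(). Atomics only rule out lost updates, though; if disable and enable calls were themselves unbalanced, the counter would still drift.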