From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755263AbdKAUbI (ORCPT ); Wed, 1 Nov 2017 16:31:08 -0400 Received: from terminus.zytor.com ([65.50.211.136]:52807 "EHLO terminus.zytor.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755008AbdKAUbG (ORCPT ); Wed, 1 Nov 2017 16:31:06 -0400 Date: Wed, 1 Nov 2017 13:27:53 -0700 From: tip-bot for Thomas Gleixner Message-ID: Cc: hpa@zytor.com, tglx@linutronix.de, linux@roeck-us.net, dzickus@redhat.com, linux-kernel@vger.kernel.org, mingo@kernel.org, peterz@infradead.org Reply-To: tglx@linutronix.de, hpa@zytor.com, linux@roeck-us.net, dzickus@redhat.com, linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@kernel.org In-Reply-To: References: To: linux-tip-commits@vger.kernel.org Subject: [tip:core/urgent] watchdog/harclockup/perf: Revert a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy") Git-Commit-ID: 9c388a5ed1960b2ebbebd3dbe7553092b0c15ec1 X-Mailer: tip-git-log-daemon Robot-ID: Robot-Unsubscribe: Contact to get blacklisted from these emails MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset=UTF-8 Content-Disposition: inline Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Commit-ID: 9c388a5ed1960b2ebbebd3dbe7553092b0c15ec1 Gitweb: https://git.kernel.org/tip/9c388a5ed1960b2ebbebd3dbe7553092b0c15ec1 Author: Thomas Gleixner AuthorDate: Tue, 31 Oct 2017 22:32:00 +0100 Committer: Thomas Gleixner CommitDate: Wed, 1 Nov 2017 21:18:39 +0100 watchdog/harclockup/perf: Revert a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy") Guenter reported a crash in the watchdog/perf code, which is caused by cleanup() and enable() running concurrently. The reason for this is: The watchdog functions are serialized via the watchdog_mutex and cpu hotplug locking, but the enable of the perf based watchdog happens in context of the unpark callback of the smpboot thread. But that unpark function is not synchronous inside the locking. The unparking of the thread just wakes it up and leaves so there is no guarantee when the thread is executing. If it starts running _before_ the cleanup happened then it will create a event and overwrite the dead event pointer. The new event is then cleaned up because the event is marked dead. lock(watchdog_mutex); lockup_detector_reconfigure(); cpus_read_lock(); stop(); park() update(); start(); unpark() cpus_read_unlock(); thread runs() overwrite dead event ptr cleanup(); free new event, which is active inside perf.... unlock(watchdog_mutex); The park side is safe as that actually waits for the thread to reach parked state. Commit a33d44843d45 removed the protection against this kind of scenario under the stupid assumption that the hotplug serialization and the watchdog_mutex cover everything. Bring it back. Reverts: a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy") Reported-and-tested-by: Guenter Roeck Signed-off-by: Thomas Feels-stupid Gleixner Cc: Peter Zijlstra Cc: Don Zickus Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1710312145190.1942@nanos --- kernel/watchdog_hld.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c index 71a62ce..a7f137c 100644 --- a/kernel/watchdog_hld.c +++ b/kernel/watchdog_hld.c @@ -21,6 +21,7 @@ static DEFINE_PER_CPU(bool, hard_watchdog_warn); static DEFINE_PER_CPU(bool, watchdog_nmi_touch); static DEFINE_PER_CPU(struct perf_event *, watchdog_ev); +static DEFINE_PER_CPU(struct perf_event *, dead_event); static struct cpumask dead_events_mask; static unsigned long hardlockup_allcpu_dumped; @@ -203,6 +204,8 @@ void hardlockup_detector_perf_disable(void) if (event) { perf_event_disable(event); + this_cpu_write(watchdog_ev, NULL); + this_cpu_write(dead_event, event); cpumask_set_cpu(smp_processor_id(), &dead_events_mask); watchdog_cpus--; } @@ -218,7 +221,7 @@ void hardlockup_detector_perf_cleanup(void) int cpu; for_each_cpu(cpu, &dead_events_mask) { - struct perf_event *event = per_cpu(watchdog_ev, cpu); + struct perf_event *event = per_cpu(dead_event, cpu); /* * Required because for_each_cpu() reports unconditionally @@ -226,7 +229,7 @@ void hardlockup_detector_perf_cleanup(void) */ if (event) perf_event_release_kernel(event); - per_cpu(watchdog_ev, cpu) = NULL; + per_cpu(dead_event, cpu) = NULL; } cpumask_clear(&dead_events_mask); }