From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755280AbdKAUct (ORCPT ); Wed, 1 Nov 2017 16:32:49 -0400 Received: from mail-pf0-f195.google.com ([209.85.192.195]:48559 "EHLO mail-pf0-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755025AbdKAUcs (ORCPT ); Wed, 1 Nov 2017 16:32:48 -0400 X-Google-Smtp-Source: ABhQp+SdXiajViOqvNdNOtzSNgqtXeM5Q7ambgUStTOuAdtzXkRynUKP9hCnT0oaqjEy5xR4oWkfyQ== Date: Wed, 1 Nov 2017 13:32:45 -0700 From: Guenter Roeck To: mingo@kernel.org, linux-kernel@vger.kernel.org, peterz@infradead.org, tglx@linutronix.de, dzickus@redhat.com, hpa@zytor.com Cc: linux-tip-commits@vger.kernel.org Subject: Re: [tip:core/urgent] watchdog/harclockup/perf: Revert a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy") Message-ID: <20171101203245.GA8089@roeck-us.net> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Nov 01, 2017 at 12:46:00PM -0700, tip-bot for Thomas Gleixner wrote: > Commit-ID: 1c294733b7b9f712f78d15cfa75ffdea72b79abb > Gitweb: https://git.kernel.org/tip/1c294733b7b9f712f78d15cfa75ffdea72b79abb > Author: Thomas Gleixner > AuthorDate: Tue, 31 Oct 2017 22:32:00 +0100 > Committer: Thomas Gleixner > CommitDate: Wed, 1 Nov 2017 20:41:27 +0100 > > watchdog/harclockup/perf: Revert a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy") > > Guenter reported a crash in the watchdog/perf code, which is caused by > cleanup() and enable() running concurrently. The reason for this is: > > The watchdog functions are serialized via the watchdog_mutex and cpu > hotplug locking, but the enable of the perf based watchdog happens in > context of the unpark callback of the smpboot thread. But that unpark > function is not synchronous inside the locking. The unparking of the thread > just wakes it up and leaves so there is no guarantee when the thread is > executing. > > If it starts running _before_ the cleanup happened then it will create a > event and overwrite the dead event pointer. The new event is then cleaned > up because the event is marked dead. > > lock(watchdog_mutex); > lockup_detector_reconfigure(); > cpus_read_lock(); > stop(); > park() > update(); > start(); > unpark() > cpus_read_unlock(); thread runs() > overwrite dead event ptr > cleanup(); > free new event, which is active inside perf.... > unlock(watchdog_mutex); > > The park side is safe as that actually waits for the thread to reach > parked state. > > Commit a33d44843d45 removed the protection against this kind of scenario > under the stupid assumption that the hotplug serialization and the > watchdog_mutex cover everything. > > Bring it back. > > Reverts: a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy") > Reported-and-tested-by: Guenter Roeck > Signed-off-by: Thomas Feels-stupid Gleixner > Cc: Peter Zijlstra > Cc: Don Zickus > Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1710312145190.1942@nanos > > --- > kernel/watchdog_hld.c | 7 +++++-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c > index 71a62ce..f8db56b 100644 > --- a/kernel/watchdog_hld.c > +++ b/kernel/watchdog_hld.c > @@ -21,6 +21,7 @@ > static DEFINE_PER_CPU(bool, hard_watchdog_warn); > static DEFINE_PER_CPU(bool, watchdog_nmi_touch); > static DEFINE_PER_CPU(struct perf_event *, watchdog_ev); > +static DEFINE_PER_CPU(struct perf_event *, dead_event); > static struct cpumask dead_events_mask; > > static unsigned long hardlockup_allcpu_dumped; > @@ -203,6 +204,8 @@ void hardlockup_detector_perf_disable(void) > > if (event) { > perf_event_disable(event); > + this_cpu_write(watchdog_ev, NULL); > + this_cpu_write(dead_event, event); > cpumask_set_cpu(smp_processor_id(), &dead_events_mask); > watchdog_cpus--; > } > @@ -218,7 +221,7 @@ void hardlockup_detector_perf_cleanup(void) > int cpu; > > for_each_cpu(cpu, &dead_events_mask) { > - struct perf_event *event = per_cpu(watchdog_ev, cpu); > + struct perf_event *event = per_cpu(dead_event, cpu); > > /* > * Required because for_each_cpu() reports unconditionally > @@ -226,7 +229,7 @@ void hardlockup_detector_perf_cleanup(void) > */ > if (event) > perf_event_release_kernel(event); > - per_cpu(watchdog_ev, cpu) = NULL; > + per_cpu(dead_event_ev, cpu) = NULL; Uuhh ... there is an extra _ev which somehow slipped in and doesn't compile. Guenter > } > cpumask_clear(&dead_events_mask); > }