LKML Archive on lore.kernel.org
 help / color / Atom feed
From: Guenter Roeck <linux@roeck-us.net>
To: mingo@kernel.org, linux-kernel@vger.kernel.org,
	peterz@infradead.org, tglx@linutronix.de, dzickus@redhat.com,
	hpa@zytor.com
Cc: linux-tip-commits@vger.kernel.org
Subject: Re: [tip:core/urgent] watchdog/harclockup/perf: Revert a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy")
Date: Wed, 1 Nov 2017 13:32:45 -0700
Message-ID: <20171101203245.GA8089@roeck-us.net> (raw)
In-Reply-To: <tip-1c294733b7b9f712f78d15cfa75ffdea72b79abb@git.kernel.org>

On Wed, Nov 01, 2017 at 12:46:00PM -0700, tip-bot for Thomas Gleixner wrote:
> Commit-ID:  1c294733b7b9f712f78d15cfa75ffdea72b79abb
> Gitweb:     https://git.kernel.org/tip/1c294733b7b9f712f78d15cfa75ffdea72b79abb
> Author:     Thomas Gleixner <tglx@linutronix.de>
> AuthorDate: Tue, 31 Oct 2017 22:32:00 +0100
> Committer:  Thomas Gleixner <tglx@linutronix.de>
> CommitDate: Wed, 1 Nov 2017 20:41:27 +0100
> 
> watchdog/harclockup/perf: Revert a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy")
> 
> Guenter reported a crash in the watchdog/perf code, which is caused by
> cleanup() and enable() running concurrently. The reason for this is:
> 
> The watchdog functions are serialized via the watchdog_mutex and cpu
> hotplug locking, but the enable of the perf based watchdog happens in
> context of the unpark callback of the smpboot thread. But that unpark
> function is not synchronous inside the locking. The unparking of the thread
> just wakes it up and leaves so there is no guarantee when the thread is
> executing.
> 
> If it starts running _before_ the cleanup happened then it will create a
> event and overwrite the dead event pointer. The new event is then cleaned
> up because the event is marked dead.
> 
>     lock(watchdog_mutex);
>     lockup_detector_reconfigure();
>         cpus_read_lock();
> 	stop();
> 	   park()
> 	update();
> 	start();
> 	   unpark()
> 	cpus_read_unlock();		thread runs()
> 					  overwrite dead event ptr
> 	cleanup();
> 	  free new event, which is active inside perf....
>     unlock(watchdog_mutex);
> 
> The park side is safe as that actually waits for the thread to reach
> parked state.
> 
> Commit a33d44843d45 removed the protection against this kind of scenario
> under the stupid assumption that the hotplug serialization and the
> watchdog_mutex cover everything. 
> 
> Bring it back.
> 
> Reverts: a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy")
> Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
> Signed-off-by: Thomas Feels-stupid Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Don Zickus <dzickus@redhat.com>
> Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1710312145190.1942@nanos
> 
> ---
>  kernel/watchdog_hld.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
> index 71a62ce..f8db56b 100644
> --- a/kernel/watchdog_hld.c
> +++ b/kernel/watchdog_hld.c
> @@ -21,6 +21,7 @@
>  static DEFINE_PER_CPU(bool, hard_watchdog_warn);
>  static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
>  static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
> +static DEFINE_PER_CPU(struct perf_event *, dead_event);
>  static struct cpumask dead_events_mask;
>  
>  static unsigned long hardlockup_allcpu_dumped;
> @@ -203,6 +204,8 @@ void hardlockup_detector_perf_disable(void)
>  
>  	if (event) {
>  		perf_event_disable(event);
> +		this_cpu_write(watchdog_ev, NULL);
> +		this_cpu_write(dead_event, event);
>  		cpumask_set_cpu(smp_processor_id(), &dead_events_mask);
>  		watchdog_cpus--;
>  	}
> @@ -218,7 +221,7 @@ void hardlockup_detector_perf_cleanup(void)
>  	int cpu;
>  
>  	for_each_cpu(cpu, &dead_events_mask) {
> -		struct perf_event *event = per_cpu(watchdog_ev, cpu);
> +		struct perf_event *event = per_cpu(dead_event, cpu);
>  
>  		/*
>  		 * Required because for_each_cpu() reports  unconditionally
> @@ -226,7 +229,7 @@ void hardlockup_detector_perf_cleanup(void)
>  		 */
>  		if (event)
>  			perf_event_release_kernel(event);
> -		per_cpu(watchdog_ev, cpu) = NULL;
> +		per_cpu(dead_event_ev, cpu) = NULL;

Uuhh ... there is an extra _ev which somehow slipped in and doesn't
compile.

Guenter

>  	}
>  	cpumask_clear(&dead_events_mask);
>  }

  reply index

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-30 22:45 Crashes in perf_event_ctx_lock_nested Guenter Roeck
2017-10-31 13:48 ` Peter Zijlstra
2017-10-31 17:16   ` Guenter Roeck
2017-10-31 18:50     ` Don Zickus
2017-10-31 20:12       ` Guenter Roeck
2017-10-31 20:23         ` Don Zickus
2017-10-31 21:32   ` Thomas Gleixner
2017-10-31 22:11     ` Guenter Roeck
2017-11-01 18:11       ` Don Zickus
2017-11-01 18:34         ` Guenter Roeck
2017-11-01 19:46         ` [tip:core/urgent] watchdog/hardlockup/perf: Use atomics to track in-use cpu counter tip-bot for Don Zickus
2017-11-01 20:28         ` tip-bot for Don Zickus
2017-11-01 18:22       ` Crashes in perf_event_ctx_lock_nested Thomas Gleixner
2017-11-01  8:14     ` Peter Zijlstra
2017-11-01  8:26       ` Thomas Gleixner
2017-11-01 19:46     ` [tip:core/urgent] watchdog/harclockup/perf: Revert a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy") tip-bot for Thomas Gleixner
2017-11-01 20:32       ` Guenter Roeck [this message]
2017-11-01 20:52         ` Thomas Gleixner
2017-11-01 20:27     ` tip-bot for Thomas Gleixner
2017-10-31 18:48 ` Crashes in perf_event_ctx_lock_nested Don Zickus

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171101203245.GA8089@roeck-us.net \
    --to=linux@roeck-us.net \
    --cc=dzickus@redhat.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-tip-commits@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git