LKML Archive on lore.kernel.org
 help / color / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Sebastian Sewior <bigeasy@linutronix.de>,
	Anna-Maria Gleixner <anna-maria@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>
Subject: Re: [PATCH] hrtimer: Reset hrtimer cpu base proper on CPU hotplug
Date: Fri, 26 Jan 2018 14:09:17 -0800
Message-ID: <20180126220917.GI3741@linux.vnet.ibm.com> (raw)
In-Reply-To: <alpine.DEB.2.20.1801261447590.2067@nanos>

On Fri, Jan 26, 2018 at 02:54:32PM +0100, Thomas Gleixner wrote:
> The hrtimer interrupt code contains a hang detection and mitigation
> mechanism, which prevents that a long delayed hrtimer interrupt causes a
> continous retriggering of interrupts which prevent the system from making
> progress. If a hang is detected then the timer hardware is programmed with
> a certain delay into the future and a flag is set in the hrtimer cpu base
> which prevents newly enqueued timers from reprogramming the timer hardware
> prior to the chosen delay. The subsequent hrtimer interrupt after the delay
> clears the flag and resumes normal operation.
> 
> If such a hang happens in the last hrtimer interrupt before a CPU is
> unplugged then the hang_detected flag is set and stays that way when the
> CPU is plugged in again. At that point the timer hardware is not armed and
> it cannot be armed because the hang_detected flag is still active, so
> nothing clears that flag. As a consequence the CPU does not receive hrtimer
> interrupts and no timers expire on that CPU which results in RCU stalls and
> other malfunctions.
> 
> Clear the flag along with some other less critical members of the hrtimer
> cpu base to ensure starting from a clean state when a CPU is plugged in.
> 
> Thanks to Paul, Sebastian and Anna-Maria for their help to get down to the
> root cause of that hard to reproduce heisenbug. Once understood it's
> trivial and certainly justifies a brown paperbag.

Thank you very much, and I do know that feeling!  After reading the
commit log, I feel significantly less incompetent for having failed to
find this one.  ;-)  But it did pass rcutorture testing for a great many
years, didn't it?  :-/

I have started an eight-hour seven-way test on the dreaded rcutorture
TREE01 scenario.  In the meantime, off to the train!

							Thanx, Paul

> Fixes: 41d2e4949377 ("hrtimer: Tune hrtimer_interrupt hang logic")
> Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: stable@vger.kernel.org
> ---
>  kernel/time/hrtimer.c |    3 +++
>  1 file changed, 3 insertions(+)
> 
> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -655,7 +655,9 @@ static void hrtimer_reprogram(struct hrt
>  static inline void hrtimer_init_hres(struct hrtimer_cpu_base *base)
>  {
>  	base->expires_next = KTIME_MAX;
> +	base->hang_detected = 0;
>  	base->hres_active = 0;
> +	base->next_timer = NULL;
>  }
> 
>  /*
> @@ -1589,6 +1591,7 @@ int hrtimers_prepare_cpu(unsigned int cp
>  		timerqueue_init_head(&cpu_base->clock_base[i].active);
>  	}
> 
> +	cpu_base->active_bases = 0;
>  	cpu_base->cpu = cpu;
>  	hrtimer_init_hres(cpu_base);
>  	return 0;
> 

  reply index

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-26 13:54 Thomas Gleixner
2018-01-26 22:09 ` Paul E. McKenney [this message]
2018-01-28  0:53   ` Paul E. McKenney
2018-01-29  8:20   ` Sebastian Sewior
2018-01-29  9:57     ` Paul E. McKenney
2018-01-29 23:43       ` Paul E. McKenney
2018-01-30 21:03         ` Thomas Gleixner
2018-01-31  0:52           ` Paul E. McKenney
2018-01-27 14:31 ` [tip:timers/urgent] " tip-bot for Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180126220917.GI3741@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=anna-maria@linutronix.de \
    --cc=bigeasy@linutronix.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git