From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail.linuxfoundation.org ([140.211.169.12]:44042 "EHLO
	mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1754524AbeA2UQY (ORCPT );
	Mon, 29 Jan 2018 15:16:24 -0500
Date: Mon, 29 Jan 2018 15:32:20 +0100
From: Greg KH
To: Sebastian Andrzej Siewior
Cc: tglx@linutronix.de, anna-maria@linutronix.de,
	paulmck@linux.vnet.ibm.com, peterz@infradead.org,
	stable@vger.kernel.org
Subject: Re: [PATCH] hrtimer: Reset hrtimer cpu base proper on CPU hotplug
Message-ID: <20180129143220.GA11231@kroah.com>
References: <151721332511213@kroah.com>
	<20180129142032.7zbpgk3zm4m3gmai@linutronix.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20180129142032.7zbpgk3zm4m3gmai@linutronix.de>
Sender: stable-owner@vger.kernel.org
List-ID:

On Mon, Jan 29, 2018 at 03:20:32PM +0100, Sebastian Andrzej Siewior wrote:
> From: Thomas Gleixner
>
> commit d5421ea43d30701e03cadc56a38854c36a8b4433 upstream.
>
> The hrtimer interrupt code contains a hang detection and mitigation
> mechanism, which prevents that a long delayed hrtimer interrupt causes a
> continous retriggering of interrupts which prevent the system from making
> progress. If a hang is detected then the timer hardware is programmed with
> a certain delay into the future and a flag is set in the hrtimer cpu base
> which prevents newly enqueued timers from reprogramming the timer hardware
> prior to the chosen delay. The subsequent hrtimer interrupt after the delay
> clears the flag and resumes normal operation.
>
> If such a hang happens in the last hrtimer interrupt before a CPU is
> unplugged then the hang_detected flag is set and stays that way when the
> CPU is plugged in again. At that point the timer hardware is not armed and
> it cannot be armed because the hang_detected flag is still active, so
> nothing clears that flag. As a consequence the CPU does not receive hrtimer
> interrupts and no timers expire on that CPU which results in RCU stalls and
> other malfunctions.
>
> Clear the flag along with some other less critical members of the hrtimer
> cpu base to ensure starting from a clean state when a CPU is plugged in.
>
> Thanks to Paul, Sebastian and Anna-Maria for their help to get down to the
> root cause of that hard to reproduce heisenbug. Once understood it's
> trivial and certainly justifies a brown paperbag.
>
> Fixes: 41d2e4949377 ("hrtimer: Tune hrtimer_interrupt hang logic")
> Reported-by: Paul E. McKenney
> Signed-off-by: Thomas Gleixner
> Cc: Peter Zijlstra
> Cc: Sebastian Sewior
> Cc: Anna-Maria Gleixner
> Cc: stable@vger.kernel.org
> Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801261447590.2067@nanos
> [bigeasy: backport to v3.18, drop ->next_timer it was introduced later]

Thanks for the backport, now queued up.

greg k-h