From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752916AbbBSRvz (ORCPT ); Thu, 19 Feb 2015 12:51:55 -0500 Received: from relais.videotron.ca ([24.201.245.36]:51108 "EHLO relais.videotron.ca" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750856AbbBSRvy (ORCPT ); Thu, 19 Feb 2015 12:51:54 -0500 MIME-version: 1.0 Content-transfer-encoding: 7BIT Content-type: TEXT/PLAIN; CHARSET=US-ASCII Date: Thu, 19 Feb 2015 12:51:52 -0500 (EST) From: Nicolas Pitre To: Peter Zijlstra Cc: linux-kernel@vger.kernel.org, mingo@kernel.org, rjw@rjwysocki.net, tglx@linutronix.de, Preeti U Murthy Subject: Re: [PATCH 32/35] clockevents: Fix cpu down race for hrtimer based broadcasting In-reply-to: <20150216122413.880378334@infradead.org> Message-id: References: <20150216121435.203983131@infradead.org> <20150216122413.880378334@infradead.org> User-Agent: Alpine 2.11 (LFD 23 2013-08-11) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 16 Feb 2015, Peter Zijlstra wrote: > From: Thomas Gleixner > > Preeti reported a cpu down race with hrtimer based broadcasting: > > Assume CPU1 is the CPU which holds the hrtimer broadcasting duty > before it is taken down. > > CPU0 CPU1 > cpu_down() > takedown_cpu() > disable_interrupts() > cpu_die() > while (CPU1 != DEAD) { > msleep(100); > switch_to_idle() > stop_cpu_timer() > schedule_broadcast() > } > > tick_cleanup_dead_cpu() > take_over_broadcast() > > So after CPU1 disabled interrupts it cannot handle the broadcast > hrtimer anymore, so CPU0 will be stuck forever. > > Doing a "while (CPU1 != DEAD) msleep(100);" periodic poll is silly at > best, but we need to fix that nevertheless. > > Split the tick cleanup into two pieces: > > 1) Shutdown and remove all per cpu clockevent devices from > takedown_cpu() > > This is done carefully with respect to existing arch code which > works around the shortcoming of the clockevents core code in > interesting ways. We really want a separate callback for this to > cleanup the workarounds, but that's not scope of this patch > > 2) Takeover the broadcast duty explicitely before calling cpu_die() > > This is a temporary workaround as well. What we really want is a > callback in the clockevent device which allows us to do that from > the dying CPU by pushing the hrtimer onto a different cpu. That > might involve an IPI and is definitely more complex than this > immediate fix. > > Reported-by: Preeti U Murthy > Signed-off-by: Thomas Gleixner This breaks the b.L switcher disabling code which essentially does: static void bL_switcher_restore_cpus(void) { int i; for_each_cpu(i, &bL_switcher_removed_logical_cpus) { struct device *cpu_dev = get_cpu_device(i); int ret = device_online(cpu_dev); if (ret) dev_err(cpu_dev, "switcher: unable to restore CPU\n"); } } However, as soon as one new CPU becomes online, the following crash occurs on that CPU: [ 547.858031] ------------[ cut here ]------------ [ 547.871868] kernel BUG at kernel/time/hrtimer.c:1249! [ 547.886991] Internal error: Oops - BUG: 0 [#1] SMP THUMB2 [ 547.903155] Modules linked in: [ 547.912303] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.19.0-rc5-00058-gdd7a65fbc5 #527 [...] [ 548.599060] [] (hrtimer_interrupt) from [] (tick_do_broadcast.constprop.8+0x8f/0x90) [ 548.627482] [] (tick_do_broadcast.constprop.8) from [] (tick_handle_oneshot_broadcast+0xf1/0x168) [ 548.659290] [] (tick_handle_oneshot_broadcast) from [] (sp804_timer_interrupt+0x2b/0x30) [ 548.688755] [] (sp804_timer_interrupt) from [] (handle_irq_event_percpu+0x37/0x130) [ 548.716916] [] (handle_irq_event_percpu) from [] (handle_irq_event+0x33/0x48) [ 548.743511] [] (handle_irq_event) from [] (handle_fasteoi_irq+0x69/0xe4) [ 548.768804] [] (handle_fasteoi_irq) from [] (generic_handle_irq+0x1d/0x28) [ 548.794619] [] (generic_handle_irq) from [] (__handle_domain_irq+0x3f/0x80) [ 548.820694] [] (__handle_domain_irq) from [] (gic_handle_irq+0x21/0x4c) [ 548.845729] [] (gic_handle_irq) from [] (__irq_svc+0x3b/0x5c) The corresponding code is: void hrtimer_interrupt(struct clock_event_device *dev) { struct hrtimer_cpu_base *cpu_base = this_cpu_ptr(&hrtimer_bases); ktime_t expires_next, now, entry_time, delta; int i, retries = 0; BUG_ON(!cpu_base->hres_active); [...] Reverting this patch "fixes" the problem. Nicolas