From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752203AbbDBJzk (ORCPT );
	Thu, 2 Apr 2015 05:55:40 -0400
Received: from mail-wg0-f42.google.com ([74.125.82.42]:35379 "EHLO
	mail-wg0-f42.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750771AbbDBJzj (ORCPT );
	Thu, 2 Apr 2015 05:55:39 -0400
Date: Thu, 2 Apr 2015 11:55:33 +0200
From: Ingo Molnar
To: Linus Torvalds
Cc: Chris J Arges, Rafael David Tinoco, Peter Anvin, Jiang Liu,
	Peter Zijlstra, LKML, Jens Axboe, Frederic Weisbecker, Gema Gomez,
	the arch/x86 maintainers
Subject: Re: smp_call_function_single lockups
Message-ID: <20150402095533.GA4968@gmail.com>
References: <20150331031536.GA9303@canonical.com>
	<20150331222327.GA12512@canonical.com>
	<20150401143236.GB12730@canonical.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org


* Linus Torvalds wrote:

> On Wed, Apr 1, 2015 at 7:32 AM, Chris J Arges wrote:
> >
> > I included the full patch in reply to Ingo's email, and when
> > running with that I no longer get the ack_APIC_irq WARNs.
>
> Ok. That means that the printk's themselves just change timing
> enough, or change the compiler instruction scheduling so that it
> hides the apic problem.

So another possibility would be that it's the third change causing
this change in behavior:

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 6cedd7914581..833a981c5420 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -335,9 +340,11 @@ int apic_retrigger_irq(struct irq_data *data)
 
 void apic_ack_edge(struct irq_data *data)
 {
+	ack_APIC_irq();
+
+	/* Might generate IPIs, so do this after having ACKed the APIC: */
 	irq_complete_move(irqd_cfg(data));
 	irq_move_irq(data);
-	ack_APIC_irq();
 }
 
 /*

... since with this we won't send IPIs in a semi-nested fashion with
an unacked APIC, which is a good idea to do in general. It's also a
weird enough hardware pattern that virtualization's APIC emulation
might get it slightly wrong or slightly different.

> Which very much indicates that these things are interconnected.
>
> For example, Ingo's printk patch does
>
>                 cfg->move_in_progress =
>                    cpumask_intersects(cfg->old_domain, cpu_online_mask);
> +               if (cfg->move_in_progress)
> +                       pr_info("apic: vector %02x, same-domain move in progress\n", cfg->vector);
>                 cpumask_and(cfg->domain, cfg->domain, tmp_mask);
>
> and that means that now the setting of move_in_progress is
> serialized with the cpumask_and() in a way that it wasn't before.

Yeah, that's a possibility too. It all looks very fragile.

Thanks,

	Ingo
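
PS: to make the ordering argument in the apic_ack_edge() hunk above
concrete, here is a rough user-space sketch. It is a toy model only, not
kernel code: every toy_* name is made up for illustration, and it
assumes the move step always sends exactly one IPI (the real
irq_complete_move()/irq_move_irq() only might). It just counts IPIs
sent while the interrupt is still unacked, for the old and the new
ordering.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: every toy_* name is hypothetical, not a kernel API. */
struct toy_apic {
	bool in_service;	/* set on delivery, cleared by the ACK */
	int ipis_sent;		/* IPIs sent from the ack path */
	int ipis_sent_unacked;	/* those sent before the ACK happened */
};

static void toy_deliver_irq(struct toy_apic *apic)
{
	apic->in_service = true;
}

/* Stands in for ack_APIC_irq(). */
static void toy_ack(struct toy_apic *apic)
{
	assert(apic->in_service);
	apic->in_service = false;
}

/*
 * Stands in for irq_complete_move() + irq_move_irq().
 * Assumption: it always sends one IPI; the real code only might.
 */
static void toy_move_irq(struct toy_apic *apic)
{
	apic->ipis_sent++;
	if (apic->in_service)
		apic->ipis_sent_unacked++;
}

/* Old ordering: move first, ACK last. */
static void toy_ack_edge_old(struct toy_apic *apic)
{
	toy_move_irq(apic);
	toy_ack(apic);
}

/* New ordering from the hunk above: ACK first, then move. */
static void toy_ack_edge_new(struct toy_apic *apic)
{
	toy_ack(apic);
	toy_move_irq(apic);
}

/* Deliver one interrupt, run the given ack path, report unacked IPIs. */
static int toy_count_unacked_ipis(void (*ack_edge)(struct toy_apic *))
{
	struct toy_apic apic = { 0 };

	toy_deliver_irq(&apic);
	ack_edge(&apic);
	return apic.ipis_sent_unacked;
}
```

Called from a main(), toy_count_unacked_ipis(toy_ack_edge_old) returns 1
and toy_count_unacked_ipis(toy_ack_edge_new) returns 0. The model says
nothing about timing or about what real or emulated APIC hardware does
with an IPI sent in that window; it only shows which ordering creates
the window at all.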