From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752719AbbDAPgk (ORCPT ); Wed, 1 Apr 2015 11:36:40 -0400
Received: from mail-ig0-f170.google.com ([209.85.213.170]:35039 "EHLO
	mail-ig0-f170.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751136AbbDAPgh (ORCPT ); Wed, 1 Apr 2015 11:36:37 -0400
MIME-Version: 1.0
In-Reply-To: <20150401143236.GB12730@canonical.com>
References: <20150331031536.GA9303@canonical.com>
	<20150331222327.GA12512@canonical.com>
	<20150401143236.GB12730@canonical.com>
Date: Wed, 1 Apr 2015 08:36:36 -0700
X-Google-Sender-Auth: S6eKgz_cDo_pnH8UBXub6pLI6sw
Message-ID:
Subject: Re: smp_call_function_single lockups
From: Linus Torvalds
To: Chris J Arges
Cc: Rafael David Tinoco, Ingo Molnar, Peter Anvin, Jiang Liu,
	Peter Zijlstra, LKML, Jens Axboe, Frederic Weisbecker, Gema Gomez,
	"the arch/x86 maintainers"
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Apr 1, 2015 at 7:32 AM, Chris J Arges wrote:
>
> I included the full patch in reply to Ingo's email, and when running with that
> I no longer get the ack_APIC_irq WARNs.

Ok. That means that the printk's themselves just change timing enough,
or change the compiler instruction scheduling so that it hides the
apic problem.

Which very much indicates that these things are interconnected.

For example, Ingo's printk patch does

                        cfg->move_in_progress =
                           cpumask_intersects(cfg->old_domain, cpu_online_mask);
+                       if (cfg->move_in_progress)
+                               pr_info("apic: vector %02x, same-domain move in progress\n", cfg->vector);
                        cpumask_and(cfg->domain, cfg->domain, tmp_mask);

and that means that now the setting of move_in_progress is serialized
with the cpumask_and() in a way that it wasn't before.
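
[Editorial sketch, not part of the original message.] The serialization
effect described above can be illustrated in userspace C. A call to a
function the compiler cannot see into generally acts as a compiler
barrier for memory the callee might reach, so the store before the call
has to be emitted before it and the update after it cannot be hoisted
above it. Everything below is a hypothetical stand-in (struct cfg_sketch,
log_hook, assign_sketch, plain bitmasks instead of cpumasks); it is not
the kernel code, and it says nothing about CPU-level memory ordering.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's per-IRQ config; not the real struct. */
struct cfg_sketch {
    int move_in_progress;
    unsigned long old_domain;
    unsigned long domain;
};

/* What the logging hook observed at the time it was called. */
static int log_calls;
static int seen_flag;
static unsigned long seen_domain;

/* Stand-in for pr_info(): records and prints the state it can see. */
static void default_log(const struct cfg_sketch *cfg)
{
    log_calls++;
    seen_flag = cfg->move_in_progress;
    seen_domain = cfg->domain;
    printf("sketch: move in progress, domain %#lx\n", cfg->domain);
}

/* Called through a volatile pointer so the compiler cannot see the callee. */
static void (*volatile log_hook)(const struct cfg_sketch *) = default_log;

static void assign_sketch(struct cfg_sketch *cfg, unsigned long online,
                          unsigned long tmp_mask)
{
    /* Store #1: analogous to setting cfg->move_in_progress. */
    cfg->move_in_progress = (cfg->old_domain & online) != 0;

    /*
     * Without this call the compiler is free to schedule store #1 and
     * the update below in either order. With it, store #1 must be
     * visible to the callee, and the update must happen after the call
     * returns, because the callee may read cfg->domain.
     */
    if (cfg->move_in_progress)
        log_hook(cfg);

    /* Store #2: analogous to cpumask_and(cfg->domain, cfg->domain, tmp_mask). */
    cfg->domain &= tmp_mask;
}
```

The hook sees the flag already set and the domain not yet masked, which
is the ordering the added pr_info() imposes as a side effect.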

And while the code takes the "vector_lock" and disables interrupts,
the interrupts themselves can happily continue on other cpu's, and
they don't take the vector_lock. Neither does send_cleanup_vector(),
which clears that bit, afaik.

I don't know. The locking there is odd.

> My next homework assignments are:
> - Testing with irqbalance disabled

Definitely.

> - Testing w/ the appropriate dump_stack() in Ingo's patch
> - L0 testing

Thanks,

                  Linus
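
[Editorial sketch, not part of the original message.] The locking
concern raised in the message (one side sets move_in_progress under
vector_lock while another path clears it without taking the lock) can be
shown with a small, deterministic userspace C sketch. All names here
(struct vec_sketch, cleanup_sketch, assign_with_interference) are
hypothetical, the lock is a toy spinlock built on C11 atomic_flag, and
the "other CPU" is simulated by an inline call rather than a real
thread. It assumes, as the message says "afaik", that the cleanup path
does not take the lock.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the vector state; not the kernel's layout. */
struct vec_sketch {
    atomic_flag vector_lock;   /* toy spinlock */
    int move_in_progress;
};

static void vec_init(struct vec_sketch *v)
{
    atomic_flag_clear(&v->vector_lock);
    v->move_in_progress = 0;
}

static void vec_lock(struct vec_sketch *v)
{
    while (atomic_flag_test_and_set(&v->vector_lock))
        ;   /* spin until the flag was previously clear */
}

static void vec_unlock(struct vec_sketch *v)
{
    atomic_flag_clear(&v->vector_lock);
}

/* Lockless clear: note that vector_lock is NOT taken here. */
static void cleanup_sketch(struct vec_sketch *v)
{
    v->move_in_progress = 0;
}

/*
 * Sets the flag under the lock and returns the value still visible at
 * the end of the critical section. If interfere is nonzero, the
 * lockless cleanup is run while the lock is held, standing in for
 * another CPU doing the same thing at that moment.
 */
static int assign_with_interference(struct vec_sketch *v, int interfere)
{
    int seen;

    vec_lock(v);
    v->move_in_progress = 1;
    if (interfere)
        cleanup_sketch(v);   /* the held lock does not stop this */
    seen = v->move_in_progress;
    vec_unlock(v);
    return seen;
}
```

Holding the lock on one side only protects against other paths that
also take it; the lockless clear goes through regardless, so the value
set inside the critical section may be gone before the section ends.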