From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753851AbbBSWp4 (ORCPT <rfc822;w@1wt.eu>);
	Thu, 19 Feb 2015 17:45:56 -0500
Received: from mail-ie0-f173.google.com ([209.85.223.173]:46438 "EHLO
	mail-ie0-f173.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752134AbbBSWpz (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Thu, 19 Feb 2015 17:45:55 -0500
MIME-Version: 1.0
In-Reply-To: <CA+55aFxd1WGNBzSHeOGiXXdUD1GqDYv9PUNGdrdiGFwaX7HYJQ@mail.gmail.com>
References: <CAMiJ5CX5Ub66mQiVGwwq+PqV5Hf1Aoz2pdsSLSMKH7vJYuhSzg@mail.gmail.com>
	<CA+55aFxbtyciJwhgaLMK7XH8MyQ9Nm4=5Ke-QCo3WOaacegLXw@mail.gmail.com>
	<CA+55aFyGkOCVGD3Ds7Wt1z9Dw7cmk_yXw7YwruACQh5QAXOQvQ@mail.gmail.com>
	<CA+55aFz492bzLFhdbKN-Hygjcreup7CjMEYk3nTSfRWjppz-OA@mail.gmail.com>
	<20150218222544.GA17717@twins.programming.kicks-ass.net>
	<CAMiJ5CV--EFGnZSvJcrUrYVjy1PWueCQq5i5D+i0=p9BArPnjw@mail.gmail.com>
	<CA+55aFwWT--5mgKqryfFAbgaoEacsZn8dZ0POWH3xpdNgRMuRw@mail.gmail.com>
	<CAMiJ5CU+rvQr-_Ejd3m3ha3HsiSKu0Sq_fTaE2Ws_c_01=qbLQ@mail.gmail.com>
	<CA+55aFxWBKHth7x3FJ+dpGfy0ZT7SUhHnX7tDfgDo-wXTeX5Lg@mail.gmail.com>
	<CA+55aFx2n9zsqwuW=p6KJF62rXp+9_M-HF3wbeJRA-MeT0XLLw@mail.gmail.com>
	<CA+55aFyv1pJod7bhetc0ikmuCKzE=uhmT14KMju_fTbP93gLWA@mail.gmail.com>
	<CA+55aFxd1WGNBzSHeOGiXXdUD1GqDYv9PUNGdrdiGFwaX7HYJQ@mail.gmail.com>
Date: Thu, 19 Feb 2015 14:45:54 -0800
X-Google-Sender-Auth: erO8zsRWvM8k5HuNnrVCw2T_jM0
Message-ID: <CA+55aFxFkw7cKu6R8-v9z=c+yG+jsPHyQKW5-yyn3+M0BuyvxA@mail.gmail.com>
Subject: Re: smp_call_function_single lockups
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Rafael David Tinoco <inaddy@ubuntu.com>, Ingo Molnar <mingo@kernel.org>,
        Peter Anvin <hpa@zytor.com>, Jiang Liu <jiang.liu@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>, LKML <linux-kernel@vger.kernel.org>,
        Jens Axboe <axboe@kernel.dk>, Frederic Weisbecker <fweisbec@gmail.com>,
        Gema Gomez <gema.gomez-solano@canonical.com>,
        Christopher Arges <chris.j.arges@canonical.com>,
        "the arch/x86 maintainers" <x86@kernel.org>
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Feb 19, 2015 at 1:59 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Is this worth looking at? Or is it something spurious? I might have
> gotten the vectors wrong, and maybe the warning is not because the ISR
> bit isn't set, but because I test the wrong bit.

I edited the patch to do ratelimiting (one per 10s max) rather than
"once". And tested it some more. It seems to work correctly. The irq
case during 8042 probing is not repeatable, and I suspect it happens
because the interrupt source goes away (some probe-time thing that
first triggers an interrupt, but then clears it itself), so it doesn't
happen every boot, and I've gotten it with slightly different
backtraces.
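
To illustrate the difference between "once" and "one per 10s max", here
is a minimal userspace sketch of an interval-based limiter. It is not
the patch and not the kernel's ratelimit API: struct warn_limiter and
warn_allowed() are hypothetical names made up for this example, and the
caller supplies the time in milliseconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; NOT the kernel's ratelimit code. */
struct warn_limiter {
	uint64_t interval_ms;	/* minimum spacing between warnings */
	uint64_t last_ms;	/* time of the last warning let through */
	bool fired;		/* has any warning been emitted yet? */
	unsigned int suppressed; /* warnings dropped since the last one */
};

/* Return true if a warning may be printed at time now_ms. */
static bool warn_allowed(struct warn_limiter *wl, uint64_t now_ms)
{
	assert(wl != NULL);

	/* Inside the interval after the last warning: drop and count. */
	if (wl->fired && now_ms - wl->last_ms < wl->interval_ms) {
		wl->suppressed++;
		return false;
	}

	/* First warning ever, or the interval has expired: let it out. */
	wl->fired = true;
	wl->last_ms = now_ms;
	wl->suppressed = 0;
	return true;
}
```

A "once" limiter would return true only for the very first call. With an
interval, a warning that keeps triggering shows up again every 10
seconds, which is what makes it possible to tell "rare" apart from
"every interrupt".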

But it's the only warning that happens for me, so I think my code is
right (at least for the cases that trigger on this machine). It's
definitely not an "every interrupt causes the warning because the code
was buggy, and the WARN_ONCE() just printed the first one" situation.

It would be interesting to hear if others see spurious APIC EOI cases
too. In particular, the people seeing the IPI lockup. Because a lot of
the lockups we've seen have *looked* like the IPI interrupt just never
happened, and so we're waiting forever for the target CPU to react to
it. And just maybe the spurious EOI could cause the wrong bit to be
cleared in the ISR, and then the interrupt never shows up. Something
like that would certainly explain why it only happens on some machines
and under certain timing circumstances.
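
The "wrong bit cleared" theory can be sketched with a toy model of the
ISR. This is a userspace simulation, not kernel code: struct isr_model,
isr_eoi() and ack_checked() are hypothetical names, and the vector
numbers used with it are only illustrative. The one hardware fact it
relies on is that an EOI write carries no vector, so the local APIC
clears the highest-priority bit that is currently in service.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_VECTORS 256

/* Toy model of the local APIC In-Service Register: one bit per vector. */
struct isr_model {
	uint32_t reg[NR_VECTORS / 32];
};

static bool isr_test(const struct isr_model *isr, unsigned int vector)
{
	assert(vector < NR_VECTORS);
	return (isr->reg[vector / 32] >> (vector % 32)) & 1u;
}

static void isr_set(struct isr_model *isr, unsigned int vector)
{
	assert(vector < NR_VECTORS);
	isr->reg[vector / 32] |= 1u << (vector % 32);
}

/*
 * Model of an EOI write: clear the highest-numbered in-service bit.
 * Returns the vector that was cleared, or -1 if nothing was in service.
 */
static int isr_eoi(struct isr_model *isr)
{
	for (int v = NR_VECTORS - 1; v >= 0; v--) {
		if (isr_test(isr, (unsigned int)v)) {
			isr->reg[v / 32] &= ~(1u << (v % 32));
			return v;
		}
	}
	return -1;
}

/*
 * Checked ack: report whether the EOI is spurious, i.e. the ISR bit
 * for the vector being acked is not set. The EOI is issued either way,
 * which is how a spurious one can hit some other vector's bit.
 */
static bool ack_checked(struct isr_model *isr, unsigned int vector)
{
	bool spurious = !isr_test(isr, vector);

	isr_eoi(isr);
	return spurious;
}
```

In this model, if an IPI vector is in service and something acks an
interrupt whose ISR bit is already gone, ack_checked() reports the
spurious EOI and the IPI's bit is the one that gets cleared. Whether
real hardware then loses the IPI in the way the lockups suggest is
exactly the open question; the model only shows the mechanism.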

                    Linus