From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.9 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id F1DFEC4363A for ; Mon, 26 Oct 2020 21:21:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B2AD4216FD for ; Mon, 26 Oct 2020 21:21:40 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="SToqhqmg"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="6HgJyqSF" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731497AbgJZVVj (ORCPT ); Mon, 26 Oct 2020 17:21:39 -0400 Received: from Galois.linutronix.de ([193.142.43.55]:42604 "EHLO galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730958AbgJZVVj (ORCPT ); Mon, 26 Oct 2020 17:21:39 -0400 From: Thomas Gleixner DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1603747296; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=1lkoWn6wcOFMTEOogm/Sqf3fEH2S/rX9rztcdNBPDfA=; b=SToqhqmgRX6Vzs/Mym0aZ8Z2G4TzrKFFfjWUHtG0QByqKG7ws9EAw4uRf2Ez6PT9x8BAb1 LBx3QZEIElMIjEeCkYbCJcAYibNF2Euzo07v2OY9ozAlWswtrzGx/flOvT+pgmlM/7GVmn B4zi7+m2VTjvxgcS0wZqGkdLdHtMdyoanfqbgoqEEe2kEU6FAtIzrVOHRDJ1OUacBlhCJf QUWuZs4NjdYGVGoI2aVJqnvSHmVLtYBiXVK3Jhqs+vtPYGJixLArq98sI1j8AEU4ESs2zg xjGdexISfRQCU6YtSrqom/9k80dBtcfzFOKxyeLNLUmwb4tJGhQcVSnlFjyEqw== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1603747296; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=1lkoWn6wcOFMTEOogm/Sqf3fEH2S/rX9rztcdNBPDfA=; b=6HgJyqSFqo4iDdJgNJrQ9Jjj60Sy44P8+J6oyjZXgqpjmijD/kx/J8YwOJSirJ6rzSldOR qQw34ev+pgekZpCg== To: Guilherme Piccoli Cc: Pingfan Liu , LKML , Peter Zijlstra , Jisheng Zhang , Andrew Morton , Petr Mladek , Marc Zyngier , Linus Walleij , afzal mohammed , Lina Iyer , "Gustavo A. R. Silva" , Maulik Shah , Al Viro , Jonathan Corbet , Pawan Gupta , Mike Kravetz , Oliver Neukum , linux-doc@vger.kernel.org, Kexec Mailing List , Bjorn Helgaas Subject: Re: [PATCH 0/3] warn and suppress irqflood In-Reply-To: References: <1603346163-21645-1-git-send-email-kernelfans@gmail.com> <871rhq7j1h.fsf@nanos.tec.linutronix.de> <87y2js3ghv.fsf@nanos.tec.linutronix.de> Date: Mon, 26 Oct 2020 22:21:36 +0100 Message-ID: <87o8ko3cpr.fsf@nanos.tec.linutronix.de> MIME-Version: 1.0 Content-Type: text/plain Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Oct 26 2020 at 17:28, Guilherme Piccoli wrote: > On Mon, Oct 26, 2020 at 4:59 PM Thomas Gleixner wrote: >> It gets flooded right at the point where the crash kernel enables >> interrupts in start_kernel(). At that point there is no device driver >> and no interupt requested. All you can see on the console for this is >> >> "common_interrupt: $VECTOR.$CPU No irq handler for vector" >> >> And contrary to Liu's patches which try to disable a requested interrupt >> if too many of them arrive, the kernel cannot do anything because there >> is nothing to disable in your case. That's why you needed to do the MSI >> disable magic in the early PCI quirks which run before interrupts get >> enabled. > > Wow, thank you very much for this great explanation (without a > reproducer) - it's nice to hear somebody that deeply understands the > code! And double thanks for CCing Bjorn. Understanding the code is only half of the picture. You need to understand how the hardware works or not :) > So, I don't want to hijack Liu's thread, but do you think it makes > sense to have my approach as a (debug) parameter to prevent such a > degenerate case? At least it makes sense to some extent even if it's incomplete. What bothers me is that it'd be x86 specific while the issue is pretty much architecture independent. I don't think that the APIC is special in that regard. Rogue MSIs should be able to bring down pretty much all architectures. > Or could we have something in core IRQ code to prevent irq flooding in > such scenarios, something "stronger" than disabling MSIs (APIC-level, > likely)? For your case? No. The APIC cannot be protected against rogue MSIs. The only cure is to disable interrupts or disable MSIs on all PCI[E] devices early on. Disabling interrupts is not so much of an option obviously :) Thanks, tglx