From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 83D07C5519F for ; Tue, 17 Nov 2020 12:05:09 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 31F9D2467D for ; Tue, 17 Nov 2020 12:05:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728056AbgKQMEs (ORCPT ); Tue, 17 Nov 2020 07:04:48 -0500 Received: from youngberry.canonical.com ([91.189.89.112]:45449 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726807AbgKQMEr (ORCPT ); Tue, 17 Nov 2020 07:04:47 -0500 Received: from mail-ed1-f70.google.com ([209.85.208.70]) by youngberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1kezj2-0003C6-Jk for linux-kernel@vger.kernel.org; Tue, 17 Nov 2020 12:04:44 +0000 Received: by mail-ed1-f70.google.com with SMTP id l12so9530576edw.11 for ; Tue, 17 Nov 2020 04:04:44 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=SVhaLNmBtuqnmi4tGMJBvp0RPN7vCBWkGFN0wZbfqIU=; b=mbYn/S7r2k4X1/z6EIdCvnSnMPijzE7XKNlRRqhbY1r6+92kZaA/GPHdaz5ti072r0 cn1SVkRQxvV5CnlBO2jcR2YvjfUn7iYyMwR2nTkbQ3tJhkshsjFk7BVNTBbg8Ir1sWqL kz664PliYoi9mJz3Mb4J8J2R9n5oJGIik2QkoKKc2wSRFYm7MvNF87UbzJeWClj8s/p0 KRYVh5wWgfMlGvvrV2F/YANEDlu7XPS0O3DRvHZOHhX+3Q6Lqsw1LdBmNhZlC/Q8f90M i3QIMWqz6FRgnASD3N+DxHsgXdvhX25Fj2JSWdFi8/4NpPbzaDcwjZo1WnrNXJ91Am5/ jJZQ== X-Gm-Message-State: AOAM533bnidDZTm+zwCQrDkKcZRyj7zp+vwkH61uqarHkhASmXhEnbjt A5B34/zhZHzXwPMlDQIjP6EfYLtWcQPNkh0ujV3Czhp8azZUniM+4x/Qq9foBGXd6EemNiW6v9W eQDCSNoVNXiieRGfTQlx5rCGbWDnPw+FRMhzV2lyYmTkBv8+txNeh6QUFdw== X-Received: by 2002:a17:906:fcdb:: with SMTP id qx27mr19442471ejb.470.1605614684124; Tue, 17 Nov 2020 04:04:44 -0800 (PST) X-Google-Smtp-Source: ABdhPJyNjl3vH5wIqmDtDTI5xUTL95hPqz+4DoXqLbqHyC7et+hNdTOMMEjmG2P9V+/rCvbOS8n8iZnmmiiRjQEp+Us= X-Received: by 2002:a17:906:fcdb:: with SMTP id qx27mr19442435ejb.470.1605614683864; Tue, 17 Nov 2020 04:04:43 -0800 (PST) MIME-Version: 1.0 References: <20201117001907.GA1342260@bjorn-Precision-5520> <87h7poeqqn.fsf@x220.int.ebiederm.org> In-Reply-To: <87h7poeqqn.fsf@x220.int.ebiederm.org> From: Guilherme Piccoli Date: Tue, 17 Nov 2020 09:04:07 -0300 Message-ID: Subject: Re: [PATCH 1/3] x86/quirks: Scan all busses for early PCI quirks To: "Eric W. Biederman" , Bjorn Helgaas , Thomas Gleixner Cc: lukas@wunner.de, linux-pci@vger.kernel.org, Pingfan Liu , andi@firstfloor.org, "H. Peter Anvin" , Baoquan He , x86@kernel.org, Sinan Kaya , Ingo Molnar , Jay Vosburgh , Dave Young , Gavin Guo , Borislav Petkov , Bjorn Helgaas , Guowen Shan , "Rafael J. Wysocki" , "Guilherme G. Piccoli" , kexec mailing list , LKML , Dan Streetman , Vivek Goyal Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Nov 16, 2020 at 10:07 PM Eric W. Biederman wrote: > [...] > > I think we need to disable MSIs in the crashing kernel before the > > kexec. It adds a little more code in the crash_kexec() path, but it > > seems like a worthwhile tradeoff. > > Disabling MSIs in the b0rken kernel is not possible. > > Walking the device tree or even a significant subset of it hugely > decreases the chances that we will run into something that is incorrect > in the known broken kernel. I expect the code to do that would double > or triple the amount of code that must be executed in the known broken > kernel. The last time something like that happened (switching from xchg > to ordinary locks) we had cases that stopped working. Walking all of > the pci devices in the system is much more invasive. > I think we could try to decouple this problem in 2, if that makes sense. Bjorn/others can jump in and correct me if I'm wrong. So, the problem is to walk a PCI topology tree, identify every device and disable MSI(/INTx maybe) in them, and the big deal with doing that in the broken kernel before the kexec is that this task is complex due to the tree walk, mainly. But what if we keep a table (a simple 2-D array) with the address and data to be written to disable the MSIs, and before the kexec we could have a parameter enabling a function that just goes through this array and performs the writes? The table itself would be constructed by the PCI core (and updated in the hotplug path), when device discovery is happening. This table would live in a protected area in memory, with no write/execute access, this way if the kernel itself tries to corrupt that, we get a fault (yeah, I know DMAs can corrupt anywhere, but IOMMU could protect against that). If the parameter "kdump_clear_msi" is passed in the cmdline of the regular kernel, for example, then the function walks this simple table and performs the devices' writes before the kexec... If that's not possible or a bad idea for any reason, I still think the early_quirks() idea hereby proposed is not something we should discard, because it'll help a lot of users even with its limitations (in our case it worked very well). Also, taking here the opportunity to clarify my understanding about the limitations of that approach: Bjorn, in our reproducer machine we had 3 parents in the PCI tree (as per lspci -t), 0000:00, 0000:ff and 0000:80 - are those all under "segment 0" as per your verbiage? In our case the troublemaker was under 0000:80, and the early_quirks() shut the device up successfully. Thanks, Guilherme