From: "Oliver O'Halloran"
Date: Mon, 19 Jul 2021 02:31:10 +1000
Subject: Re: [PATCH 1/2] igc: don't rd/wr iomem when PCI is removed
To: Bjorn Helgaas
Cc: Pali Rohár, Aaron Ma, jesse.brandeburg@intel.com,
    anthony.l.nguyen@intel.com, "David S. Miller", Jakub Kicinski,
    intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org,
    Linux Kernel Mailing List, Krzysztof Wilczyński, linux-pci
In-Reply-To: <20210708154550.GA1019947@bjorn-Precision-5520>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jul 9, 2021 at 1:45 AM Bjorn Helgaas wrote:
>
> *snip*
>
> Apologies for rehashing what's probably obvious to everybody but me.
> I'm trying to get a better handle on benign vs poisonous errors.
>
> MMIO means CPU reads or writes to the device. In PCI, writes are
> posted and don't receive a response, so a driver will never see
> writel() return an error (although an error may be reported
> asynchronously via AER or similar).
>
> So I think we're mostly talking about CPU reads here. We expect a PCI
> response containing the data. Sometimes there's no response or an
> error response. The behavior of the host bridge in these error cases
> is not defined by PCI, so what the CPU sees is not consistent across
> platforms. In some cases, the bridge handles this as a catastrophic
> error that forces a system restart.
>
> But in most cases, at least on x86, the bridge logs an error and
> fabricates ~0 data so the CPU read can complete. Then it's up to
> software to recognize that an error occurred and decide what to do
> about it. Is this a benign or a poisonous error?
>
> I'd say this is a benign error. It certainly can't be ignored, but as
> long as the driver recognizes the error, it should be able to deal
> with it without crashing the whole system and forcing a restart.

I was thinking more in terms of what the driver author sees rather
than what's happening on the CPU side. The crash seen in the OP
appears to be because the code is "doing an MMIO." However, the
reasons for the crash have nothing to do with the actual mechanics of
the operation (which should be benign). The point I was making is that
the pattern of:

    if (is_disconnected())
        return failure;
    return do_mmio_read(addr);

does have some utility as a last-ditch attempt to prevent crashes in
the face of obnoxious bridges or bad hardware. Granted, that should be
a platform concern rather than something that should ever appear in
driver code, but considering drivers open-code readl()/writel() calls
there's not really any place to put that sort of workaround.

That all said, the case in the OP is due to an entirely avoidable
driver bug and that sort of hack is absolutely the wrong thing to do.

Oliver
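
For illustration only, the guard pattern and the all-ones read discussed in this mail can be modelled in plain userspace C. This is a rough sketch, not kernel code: `struct fake_dev`, `fake_mmio_read()`, `guarded_read()` and `read_looks_failed()` are hypothetical names invented for the example, and the "device" is just an array standing in for MMIO registers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a device; not a real kernel structure. */
struct fake_dev {
	bool present;             /* hardware truth: is the device still there? */
	bool marked_disconnected; /* what the driver has been told so far */
	uint32_t regs[4];         /* pretend MMIO register file */
};

/*
 * Simulated MMIO read. A read from a missing device completes with
 * all-ones, mimicking a host bridge that fabricates ~0 data.
 */
static uint32_t fake_mmio_read(const struct fake_dev *dev, unsigned int idx)
{
	if (!dev->present)
		return UINT32_MAX;
	return dev->regs[idx % 4];
}

/* The guard from the mail: refuse to touch a device known to be gone. */
static int guarded_read(const struct fake_dev *dev, unsigned int idx,
			uint32_t *val)
{
	if (dev->marked_disconnected)
		return -1;
	*val = fake_mmio_read(dev, idx);
	return 0;
}

/* All-ones is only a hint: a register may legitimately hold ~0. */
static bool read_looks_failed(uint32_t val)
{
	return val == UINT32_MAX;
}
```

In this model there is a window where the device is already gone but `marked_disconnected` has not been set yet. During that window `guarded_read()` still succeeds and hands back all-ones, so the caller has to notice the suspicious value itself. That is one way to see why the guard is at best a last-ditch measure and not a substitute for fixing the driver.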