From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <5d4b3a716f85017c17c52a85915fba9e19509e81.camel@kernel.crashing.org>
Subject: Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86
From: Benjamin Herrenschmidt
To: Bjorn Helgaas, David Laight
Cc: "'Oliver O'Halloran'", Arnd Bergmann, Keith Busch, Paul Mackerras,
 sparclinux, Toan Le, Greg Ungerer, Marek Vasut, Rob Herring,
 Lorenzo Pieralisi, Sagi Grimberg, Russell King, Ley Foon Tan,
 Christoph Hellwig, Geert Uytterhoeven, Kevin Hilman, linux-pci,
 Jakub Kicinski, Matt Turner,
 "linux-kernel-mentees@lists.linuxfoundation.org", Guenter Roeck, Ray Jui,
 Jens Axboe, Ivan Kokshaysky, Shuah Khan, "bjorn@helgaas.com",
 Boris Ostrovsky, Richard Henderson, Juergen Gross, Bjorn Helgaas,
 Thomas Bogendoerfer, Scott Branden, Jingoo Han, "Saheed O. Bolarinwa",
 "linux-kernel@vger.kernel.org", Philipp Zabel, Greg Kroah-Hartman,
 Gustavo Pimentel, linuxppc-dev, "David S. Miller", Heiner Kallweit
Date: Thu, 16 Jul 2020 08:49:21 +1000
In-Reply-To: <20200715221230.GA563957@bjorn-Precision-5520>
References: <20200715221230.GA563957@bjorn-Precision-5520>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.28.5-0ubuntu0.18.04.2
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 2020-07-15 at 17:12 -0500, Bjorn Helgaas wrote:
> > I've 'played' with PCIe error handling - without much success.
> > What might be useful is for a driver that has just read ~0u to
> > be able to ask 'has there been an error signalled for this device?'.
>
> In many cases a driver will know that ~0 is not a valid value for the
> register it's reading. But if ~0 *could* be valid, an interface like
> you suggest could be useful. I don't think we have anything like that
> today, but maybe we could. It would certainly be nice if the PCI core
> noticed, logged, and cleared errors. We have some of that for AER,
> but that's an optional feature, and support for the error bits in the
> garden-variety PCI_STATUS register is pretty haphazard. As you note
> below, this sort of SERR/PERR reporting is frequently hard-wired in
> ways that takes it out of our purview.

We do have pci_channel_state (via pci_channel_offline()), which covers
the cases where the underlying error handling (such as EEH or unplug)
results in the device being offlined, though this tends to be
asynchronous so it might take a few ~0's before you get it. It's
typically used to break potentially infinite loops in some drivers.
There is no interface to check whether *an* error happened though for
the most cases it will be captured in the status register, which is
harvested (and cleared ?) by some EDAC drivers iirc...

All this lacks coordination, I agree.

Cheers,
Ben.
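
The loop-breaking use of pci_channel_offline() mentioned above can be sketched roughly as follows. This is a userspace simulation under stated assumptions, not code from this thread or from any driver: struct fake_dev, fake_readl(), fake_channel_offline(), wait_not_busy() and the STATUS_BUSY bit are all hypothetical stand-ins. In a real driver the read would be an MMIO accessor and the offline check would be pci_channel_offline(pdev). The offline_after field models the asynchronous delay, i.e. the "few ~0's" a driver may see before the channel state changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STATUS_BUSY 0x1u  /* hypothetical "operation in progress" bit */

/* Hypothetical simulated device; not a kernel structure. */
struct fake_dev {
    bool failed;        /* true: every read returns all ones (~0) */
    int  busy_reads;    /* a healthy device reports BUSY for this many reads */
    int  offline_after; /* a failed device is flagged offline after this many reads */
    int  reads;         /* reads issued so far */
};

enum poll_result { POLL_DONE, POLL_OFFLINE, POLL_TIMEOUT };

/* Stand-in for an MMIO read: a failed device returns all ones. */
static uint32_t fake_readl(struct fake_dev *d)
{
    d->reads++;
    if (d->failed)
        return UINT32_MAX;
    return d->reads <= d->busy_reads ? STATUS_BUSY : 0;
}

/* Stand-in for pci_channel_offline(): only becomes true some time
 * after the failure, because error handling runs asynchronously. */
static bool fake_channel_offline(const struct fake_dev *d)
{
    return d->failed && d->reads >= d->offline_after;
}

/* Wait for BUSY to clear. On a dead device ~0 keeps BUSY set forever,
 * so the offline check (plus a poll limit) is what ends the loop. */
static enum poll_result wait_not_busy(struct fake_dev *d, int max_polls)
{
    assert(max_polls > 0);
    for (int i = 0; i < max_polls; i++) {
        uint32_t status = fake_readl(d);

        if (status == UINT32_MAX && fake_channel_offline(d))
            return POLL_OFFLINE;
        if (!(status & STATUS_BUSY))
            return POLL_DONE;
    }
    return POLL_TIMEOUT;
}
```

The ~0 comparison is only a cheap filter here. The simulated channel state is what confirms the failure, which matters when all ones could be a legitimate register value. The poll limit covers the window before the device is marked offline.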