From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-5.3 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,NICE_REPLY_A,SPF_HELO_NONE,
	SPF_PASS,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 7EBE1C49EA6
	for ; Thu, 24 Jun 2021 23:19:00 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 62FA36137D
	for ; Thu, 24 Jun 2021 23:19:00 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S232871AbhFXXVS (ORCPT ); Thu, 24 Jun 2021 19:21:18 -0400
Received: from foss.arm.com ([217.140.110.172]:41494 "EHLO foss.arm.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S229643AbhFXXVP (ORCPT ); Thu, 24 Jun 2021 19:21:15 -0400
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id A7000ED1;
	Thu, 24 Jun 2021 16:18:55 -0700 (PDT)
Received: from [10.57.9.136] (unknown [10.57.9.136])
	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id D58623F718;
	Thu, 24 Jun 2021 16:18:53 -0700 (PDT)
Subject: Re: [PATCH v2] PCI: rockchip: Avoid accessing PCIe registers with
	clocks gated
To: Bjorn Helgaas , Javier Martinez Canillas
Cc: linux-kernel@vger.kernel.org, Peter Robinson , Shawn Lin ,
	Bjorn Helgaas , Heiko Stuebner , Lorenzo Pieralisi , Rob Herring ,
	linux-arm-kernel@lists.infradead.org, linux-pci@vger.kernel.org,
	linux-rockchip@lists.infradead.org
References: <20210624215750.GA3556174@bjorn-Precision-5520>
From: Robin Murphy
Message-ID: <44c551d7-fee4-13cf-2929-6d2383dd5497@arm.com>
Date: Fri, 25 Jun 2021 00:18:48 +0100
User-Agent: Mozilla/5.0 (Windows NT 10.0; rv:78.0) Gecko/20100101
	Thunderbird/78.10.1
MIME-Version: 1.0
In-Reply-To: <20210624215750.GA3556174@bjorn-Precision-5520>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-GB
Content-Transfer-Encoding: 7bit
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 2021-06-24 22:57, Bjorn Helgaas wrote:
> On Tue, Jun 08, 2021 at 10:04:09AM +0200, Javier Martinez Canillas wrote:
>> IRQ handlers that are registered for shared interrupts can be called at
>> any time after have been registered using the request_irq() function.
>>
>> It's up to drivers to ensure that's always safe for these to be called.
>>
>> Both the "pcie-sys" and "pcie-client" interrupts are shared, but since
>> their handlers are registered very early in the probe function, an error
>> later can lead to these handlers being executed before all the required
>> resources have been properly setup.
>>
>> For example, the rockchip_pcie_read() function used by these IRQ handlers
>> expects that some PCIe clocks will already be enabled, otherwise trying
>> to access the PCIe registers causes the read to hang and never return.
>
> The read *never* completes?  That might be a bit problematic because
> it implies that we may not be able to recover from PCIe errors.  Most
> controllers will timeout eventually, log an error, and either
> fabricate some data (typically ~0) to complete the CPU's read or cause
> some kind of abort or machine check.
>
> Just asking in case there's some controller configuration that should
> be tweaked.

If I'm following correctly, that'll be a read transaction to the native
side of the controller itself; it can't complete that read, or do
anything else either, because it's clock-gated, and thus completely
oblivious (it might be that if another CPU was able to enable the clocks
then everything would carry on as normal, or it might end up totally
deadlocking the SoC interconnect).
I think it's safe to assume that in that state nothing of
importance would be happening on the PCIe side, and even if it was we'd
never get to know about it. The only relevant configuration would be
"don't turn the clocks off if you're using the thing", which in actual
operation can be taken for granted.

It's a fairly typical bug to register an IRQ as shared but assume in the
handler that you'll only ever be called for your own device's IRQ while
it's powered up/clocked/etc. in its normal operational state, hence
CONFIG_DEBUG_SHIRQ helps flush those kinds of unreliable assumptions out.

Robin.

(this reminds me of the "fun" I once had where a machine was locking up
during boot, but simply connecting an external debugger to find out
exactly where it was stuck happened to automatically enable the
offending power domain and un-stick it)
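
For illustration only, the bug pattern described above can be modelled in plain userspace C. This is a sketch under stated assumptions, not the rockchip driver and not the kernel IRQ API: `mock_pcie`, `mock_read_status()`, `naive_handler()`, `guarded_handler()` and `failed_probe_teardown()` are all hypothetical names, and a gated register read is merely counted here, whereas on the real hardware it may never return. The spurious call during teardown stands in for the kind of unexpected invocation that CONFIG_DEBUG_SHIRQ is meant to provoke.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model only. Every name here is hypothetical and is not the
 * rockchip driver or the kernel IRQ API.
 */
enum irq_result { IRQ_RESULT_NONE = 0, IRQ_RESULT_HANDLED = 1 };

struct mock_pcie {
	bool clocks_enabled;       /* stands in for the controller clock gate */
	uint32_t status_reg;       /* stands in for an interrupt status register */
	unsigned int gated_reads;  /* register reads attempted while gated */
};

typedef enum irq_result (*mock_handler_t)(struct mock_pcie *dev);

/* On real hardware a gated read may never return; the model just counts it. */
static uint32_t mock_read_status(struct mock_pcie *dev)
{
	if (!dev->clocks_enabled) {
		dev->gated_reads++;
		return 0;
	}
	return dev->status_reg;
}

/* Bug pattern: assumes the device is always clocked when the IRQ fires. */
static enum irq_result naive_handler(struct mock_pcie *dev)
{
	return mock_read_status(dev) ? IRQ_RESULT_HANDLED : IRQ_RESULT_NONE;
}

/* Defensive variant: reports "not mine" without touching registers if gated. */
static enum irq_result guarded_handler(struct mock_pcie *dev)
{
	if (!dev->clocks_enabled)
		return IRQ_RESULT_NONE;
	return mock_read_status(dev) ? IRQ_RESULT_HANDLED : IRQ_RESULT_NONE;
}

/*
 * Models a probe that registers the handler early, fails before enabling
 * the clocks, and then sees one spurious handler call during teardown.
 */
static enum irq_result failed_probe_teardown(struct mock_pcie *dev,
					     mock_handler_t handler)
{
	dev->clocks_enabled = false;	/* probe never got as far as the clocks */
	return handler(dev);		/* spurious call, handler still registered */
}
```

In this model the naive handler attempts one register read while gated and the guarded handler attempts none. A state flag is only one way to get there; registering the handler after the clocks are enabled is another, and this sketch makes no claim about which approach the patch under discussion takes.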