From mboxrd@z Thu Jan 1 00:00:00 1970
From: Marc Zyngier
Subject: Re: [PATCH 04/15] irqchip/gic: WARN if setting the interrupt type fails
Date: Mon, 11 Apr 2016 16:39:20 +0100
Message-ID: <570BC528.9050601@arm.com>
References: <1458224359-32665-1-git-send-email-jonathanh@nvidia.com>
 <1458224359-32665-5-git-send-email-jonathanh@nvidia.com>
 <56EAC761.1040801@nvidia.com>
 <20160409115854.492090a5@arm.com>
 <570BC34A.5030806@nvidia.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <570BC34A.5030806-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
Sender: linux-tegra-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
To: Jon Hunter
Cc: Thomas Gleixner, Jason Cooper, =?UTF-8?Q?Beno=c3=aet_Cousson?=,
 Tony Lindgren, Rob Herring, Pawel Moll, Mark Rutland, Ian Campbell,
 Kumar Gala, Stephen Warren, Thierry Reding, Kevin Hilman,
 Geert Uytterhoeven, Grygorii Strashko, Lars-Peter Clausen, Linus Walleij,
 linux-tegra-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
 linux-omap-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
 devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
 linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
List-Id: linux-tegra@vger.kernel.org

On 11/04/16 16:31, Jon Hunter wrote:
> Hi Marc,
>
> On 09/04/16 11:58, Marc Zyngier wrote:
>> On Thu, 17 Mar 2016 15:04:01 +0000
>> Jon Hunter wrote:
>>
>>>
>>> On 17/03/16 14:51, Thomas Gleixner wrote:
>>>> On Thu, 17 Mar 2016, Jon Hunter wrote:
>>>>
>>>>> Setting the interrupt type for private peripheral interrupts (PPIs) may
>>>>> not be supported by a given GIC because it is IMPLEMENTATION DEFINED
>>>>> whether this is allowed. There is no way to know if setting the type is
>>>>> supported for a given GIC and so the value written is read back to
>>>>> verify it matches the desired configuration. If it does not match then
>>>>> an error is returned.
>>>>>
>>>>> There are cases where the interrupt configuration read from firmware
>>>>> (such as a device-tree blob), has been incorrect and hence
>>>>> gic_configure_irq() has returned an error. This error has gone
>>>>> undetected because the error code returned was ignored but the interrupt
>>>>> still worked fine because the configuration for the interrupt could not
>>>>> be overwritten.
>>>>>
>>>>> Given that this has gone undetected and we should only fail to set the
>>>>> type for PPIs whose configuration cannot be changed anyway, don't return
>>>>> an error and simply WARN if this fails. This will allow us to fix up any
>>>>> places in the kernel where we should be checking the return status and
>>>>> maintain back compatibility with firmware images that may have incorrect
>>>>> interrupt configurations.
>>>>
>>>> Though silently returning 0 is really the wrong thing to do. You can add the
>>>> warn, but why do you want to return success?
>>>
>>> Yes that would be the correct thing to do I agree. However, the problem
>>> is that if we do this, then after the patch "irqdomain: Don't set type
>>> when mapping an IRQ" is applied, we may break interrupts for some
>>> existing device-tree binaries that have bad configuration (such as omap4
>>> and tegra20/30 ... see patches 1 and 2) that have gone unnoticed. So it
>>> is a back compatibility issue.
>>>
>>> If you are wondering why these interrupts break after "irqdomain: Don't
>>> set type when mapping an IRQ", it is because today
>>> irq_create_fwspec_mapping() does not check the return code from setting
>>> the type, but if we defer setting the type until __setup_irq() which
>>> does check the return code, then all of a sudden interrupts that were
>>> working (even with bad configurations) start to fail.
>>>
>>> The reason why I opted not to return an error code from
>>> gic_configure_irq() is it really can't fail. The failure being reported
>>> does not prevent the interrupt from working, but tells you your
>>> configuration does not match the hardware setting which you cannot
>>> overwrite.
>>>
>>> So to maintain back compatibility and avoid any silent errors, I opted
>>> to make it a WARN and not return an error.
>>>
>>> If people are ok with potentially breaking interrupts for device-tree
>>> binaries with bad settings, then I am ok to return an error here.
>>
>> I think we need to phase things. Let's start with warning people for a
>> few kernel releases. Actively maintained platforms will quickly address
>> the issue (fixing their DT). As I see it, this issue seems rather
>> widespread (even kvmtool outputs a DT with the wrong triggering
>> information).
>>
>> Once we've fixed the bulk of the platforms and virtual environments, we
>> can start thinking about making it fail harder.
>
> Ok, so are you OK with this patch as-is? If so, can I add your ACK?

It depends where you plan to handle the error. Ideally, I'd keep on
returning the error (because that's the right thing to do), and move the
WARN_ON() into the core code. We'd keep on ignoring the error as we're
doing today, but we'd scream about it.

After a couple of releases, we'd turn the WARN_ON() into a hard fail.

Thoughts?

	M.
-- 
Jazz is not dead. It just smells funny...
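
The phasing discussed in this message (the driver keeps returning the
error, the core code warns but ignores it for now, and later turns that
into a hard failure) can be illustrated with a small user-space model.
This is only a sketch under stated assumptions, not kernel code: struct
fake_gic, fake_gic_configure_irq(), core_set_type_warn_only() and
core_set_type_strict() are hypothetical names made up for illustration,
the trigger encoding is not the real GIC register layout, and fprintf()
merely stands in for WARN_ON().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical trigger encodings; not the real GIC register layout. */
enum trigger { TRIGGER_LEVEL = 0, TRIGGER_EDGE = 1 };

/* Toy model of the configuration field of a single PPI. */
struct fake_gic {
	enum trigger ppi_cfg;	/* value currently held by the "hardware" */
	bool ppi_cfg_writable;	/* IMPLEMENTATION DEFINED on a real GIC */
};

/* Driver side: write, read back, and report a mismatch to the caller. */
static int fake_gic_configure_irq(struct fake_gic *gic, enum trigger want)
{
	if (gic->ppi_cfg_writable)
		gic->ppi_cfg = want;	/* the write sticks */
	/* otherwise the model drops the write, as fixed-config hardware would */

	return gic->ppi_cfg == want ? 0 : -EINVAL;
}

/* Core side, transitional phase: complain loudly, then carry on. */
static int core_set_type_warn_only(struct fake_gic *gic, enum trigger want)
{
	int ret = fake_gic_configure_irq(gic, want);

	if (ret)	/* stand-in for WARN_ON(ret) in this model */
		fprintf(stderr, "WARNING: setting IRQ type failed (%d), ignored\n",
			ret);
	return 0;
}

/* Core side, later phase: propagate the error (the "hard fail"). */
static int core_set_type_strict(struct fake_gic *gic, enum trigger want)
{
	return fake_gic_configure_irq(gic, want);
}
```

With a fake_gic whose ppi_cfg_writable is false and whose ppi_cfg is
TRIGGER_LEVEL, requesting TRIGGER_EDGE makes the driver-side function
return -EINVAL in both phases; only the core-side policy decides whether
the caller sees 0 with a warning or the error itself.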