From mboxrd@z Thu Jan 1 00:00:00 1970
From: Geoff Blake
CC: Will Deacon, Mark Rutland
Subject: [PATCH] perf/arm-cmn: Add shutdown routine
Date: Thu, 15 Dec 2022 12:00:39 -0600
Message-ID: <20221215180039.18035-1-blakgeof@amazon.com>
In-Reply-To: <2bb86e97-6cef-700e-70ed-4f303da10fd9@amazon.com>
References: <2bb86e97-6cef-700e-70ed-4f303da10fd9@amazon.com>
X-Mailer: git-send-email 2.24.3 (Apple Git-128)
MIME-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 8bit
X-Mailing-List: linux-kernel@vger.kernel.org

Attempt #2, incorporating all the feedback from Robin: do the minimal
amount of shutdown work and handle spurious IRQs within the CMN driver,
but still log a limited number of warnings if a spurious IRQ does occur
in the future. Tested over hundreds of kexecs and have not reproduced
the spurious IRQs.

The CMN driver does not gracefully handle all restart cases, such as kexec.
On a kexec, if the arm-cmn driver is in use, it can be left with events
still active. These can raise spurious and/or unhandled interrupts in
the new kernel, which show up as non-fatal but confusing and misleading
kernel errors like the one below:

[ 3.895093] irq 28: nobody cared (try booting with the "irqpoll" option)
[ 3.895170] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0-1011-aws #12
[ 3.895172] Hardware name: Amazon EC2 c6g.metal/Not Specified, BIOS 1.0 10/16/2017
[ 3.895174] Call trace:
[ 3.895175]  dump_backtrace+0xe8/0x150
[ 3.895181]  show_stack+0x28/0x70
[ 3.895183]  dump_stack_lvl+0x68/0x9c
[ 3.895188]  dump_stack+0x1c/0x48
[ 3.895190]  __report_bad_irq+0x58/0x138
[ 3.895193]  note_interrupt+0x23c/0x360
[ 3.895196]  handle_irq_event+0x108/0x1a0
[ 3.895198]  handle_fasteoi_irq+0xd0/0x24c
[ 3.895201]  generic_handle_domain_irq+0x3c/0x70
[ 3.895203]  __gic_handle_irq_from_irqson.isra.0+0xcc/0x2c0
[ 3.895207]  gic_handle_irq+0x34/0xb0
[ 3.895209]  call_on_irq_stack+0x40/0x50
[ 3.895211]  do_interrupt_handler+0xb0/0xb4
[ 3.895214]  el1_interrupt+0x4c/0xe0
[ 3.895217]  el1h_64_irq_handler+0x1c/0x40
[ 3.895220]  el1h_64_irq+0x78/0x7c
[ 3.895222]  __do_softirq+0xd0/0x450
[ 3.895223]  __irq_exit_rcu+0xcc/0x120
[ 3.895227]  irq_exit_rcu+0x20/0x40
[ 3.895229]  el1_interrupt+0x50/0xe0
[ 3.895231]  el1h_64_irq_handler+0x1c/0x40
[ 3.895233]  el1h_64_irq+0x78/0x7c
[ 3.895235]  arch_cpu_idle+0x1c/0x6c
[ 3.895238]  default_idle_call+0x4c/0x19c
[ 3.895240]  cpuidle_idle_call+0x18c/0x1f0
[ 3.895243]  do_idle+0xb0/0x11c
[ 3.895245]  cpu_startup_entry+0x34/0x40
[ 3.895248]  rest_init+0xec/0x104
[ 3.895250]  arch_post_acpi_subsys_init+0x0/0x30
[ 3.895254]  start_kernel+0x4d0/0x534
[ 3.895256]  __primary_switched+0xc4/0xcc
[ 3.895259] handlers:
[ 3.895292] [<000000008f5364c7>] arm_cmn_handle_irq [arm_cmn]
[ 3.895369] Disabling IRQ #28

This error can be reproduced by running perf with an arm_cmn event
active and then forcing a kexec. Once the new kernel has booted, the
message appears semi-regularly.

Signed-off-by: Geoff Blake
---
 drivers/perf/arm-cmn.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
index b80a9b74662b..5e661a9aa0fe 100644
--- a/drivers/perf/arm-cmn.c
+++ b/drivers/perf/arm-cmn.c
@@ -112,6 +112,7 @@
 #define CMN_DTM_UNIT_INFO		0x0910
 
 #define CMN_DTM_NUM_COUNTERS		4
+#define CMN_DTM_NUM_WPS			4
 /* Want more local counters? Why not replicate the whole DTM! Ugh... */
 #define CMN_DTM_OFFSET(n)		((n) * 0x200)
 
@@ -1797,6 +1798,7 @@ static int arm_cmn_pmu_offline_cpu(unsigned int cpu, struct hlist_node *cpuhp_no
 
 static irqreturn_t arm_cmn_handle_irq(int irq, void *dev_id)
 {
+	static int spurious_count = 100;
 	struct arm_cmn_dtc *dtc = dev_id;
 	irqreturn_t ret = IRQ_NONE;
 
@@ -1825,8 +1827,13 @@ static irqreturn_t arm_cmn_handle_irq(int irq, void *dev_id)
 
 		writel_relaxed(status, dtc->base + CMN_DT_PMOVSR_CLR);
 
-		if (!dtc->irq_friend)
-			return ret;
+		if (!dtc->irq_friend) {
+			if (ret != IRQ_HANDLED && spurious_count > 0) {
+				spurious_count--;
+				WARN_ON(ret != IRQ_HANDLED);
+			}
+			return IRQ_HANDLED;
+		}
 		dtc += dtc->irq_friend;
 	}
 }
@@ -1865,7 +1872,7 @@ static void arm_cmn_init_dtm(struct arm_cmn_dtm *dtm, struct arm_cmn_node *xp, i
 
 	dtm->base = xp->pmu_base + CMN_DTM_OFFSET(idx);
 	dtm->pmu_config_low = CMN_DTM_PMU_CONFIG_PMU_EN;
-	for (i = 0; i < 4; i++) {
+	for (i = 0; i < CMN_DTM_NUM_WPS; i++) {
 		dtm->wp_event[i] = -1;
 		writeq_relaxed(0, dtm->base + CMN_DTM_WPn_MASK(i));
 		writeq_relaxed(~0ULL, dtm->base + CMN_DTM_WPn_VAL(i));
@@ -2312,11 +2319,18 @@ static int arm_cmn_probe(struct platform_device *pdev)
 	return err;
 }
 
-static int arm_cmn_remove(struct platform_device *pdev)
+static void arm_cmn_shutdown(struct platform_device *pdev)
 {
 	struct arm_cmn *cmn = platform_get_drvdata(pdev);
 
 	writel_relaxed(0, cmn->dtc[0].base + CMN_DT_DTC_CTL);
+}
+
+static int arm_cmn_remove(struct platform_device *pdev)
+{
+	struct arm_cmn *cmn = platform_get_drvdata(pdev);
+
+	arm_cmn_shutdown(pdev);
 
 	perf_pmu_unregister(&cmn->pmu);
 	cpuhp_state_remove_instance_nocalls(arm_cmn_hp_state, &cmn->cpuhp_node);
@@ -2353,6 +2367,7 @@ static struct platform_driver arm_cmn_driver = {
 	},
 	.probe = arm_cmn_probe,
 	.remove = arm_cmn_remove,
+	.shutdown = arm_cmn_shutdown,
 };
 
 static int __init arm_cmn_init(void)
-- 
2.24.3 (Apple Git-128)
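
For readers who want to see the bounded-warning behaviour of the
interrupt handler in isolation, here is a minimal userspace sketch. It
is not kernel code and is not part of the patch. The sim_* names, the
state struct and the use of fprintf() in place of WARN_ON() are
assumptions made for illustration only. The budget of 100 and the
"always report the interrupt as handled" policy are taken from the
hunk in arm_cmn_handle_irq() above.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel's IRQ_NONE / IRQ_HANDLED. */
enum sim_irqreturn { SIM_IRQ_NONE, SIM_IRQ_HANDLED };

/* The patch keeps this state in a function-local static; a struct is
 * used here only so the sketch can be exercised repeatedly. */
struct sim_irq_state {
	int warn_budget;	/* warnings still allowed */
	int warned;		/* warnings emitted so far */
};

/*
 * Model of the tail of the handler. overflow_seen says whether any
 * counter overflow was found, i.e. whether the kernel handler would
 * have set ret = IRQ_HANDLED.
 */
static enum sim_irqreturn sim_finish_irq(struct sim_irq_state *st,
					 bool overflow_seen)
{
	if (!overflow_seen && st->warn_budget > 0) {
		st->warn_budget--;
		st->warned++;
		fprintf(stderr, "sim: spurious IRQ, %d warnings left\n",
			st->warn_budget);
	}
	/* Claim the interrupt even when nothing was pending. */
	return SIM_IRQ_HANDLED;
}

/* Deliver a number of spurious interrupts and report how many warned. */
static int sim_count_warnings(int budget, int spurious_irqs)
{
	struct sim_irq_state st = { .warn_budget = budget, .warned = 0 };

	for (int i = 0; i < spurious_irqs; i++)
		sim_finish_irq(&st, false);
	return st.warned;
}
```

With a budget of 100, sim_count_warnings(100, 150) returns 100: the
first 100 spurious interrupts are logged and the remaining 50 are
claimed silently. Because the handler in the patch never returns
IRQ_NONE on this path, the IRQ core no longer counts these interrupts
as unhandled, which is what previously led to the "nobody cared"
report and the IRQ being disabled.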