From: He Ying <heying24@huawei.com>
Subject: [PATCH] arm64: Make CONFIG_ARM64_PSEUDO_NMI macro wrap all the pseudo-NMI code
Date: Fri, 7 Jan 2022 03:55:36 -0500
Message-ID: <20220107085536.214501-1-heying24@huawei.com>
X-Mailer: git-send-email 2.17.1
X-Mailing-List: linux-kernel@vger.kernel.org

Our product has recently been updating its kernel from 4.4 to 5.10 and we
found a performance issue. We run a business test called the ARP test,
which measures the latency of ping-pong packet traffic with a certain
payload. The results are as follows:

- 4.4 kernel: avg = ~20s
- 5.10 kernel (CONFIG_ARM64_PSEUDO_NMI is not set): avg = ~40s

I have just been learning the arm64 pseudo-NMI code and had a question:
why is the related code not wrapped by CONFIG_ARM64_PSEUDO_NMI? I
wondered whether this causes a performance regression, so I made this
patch and ran the test again. Here are the results:

- 5.10 kernel with this patch not applied: avg = ~40s
- 5.10 kernel with this patch applied: avg = ~23s

Amazing! Note that both kernels are built with CONFIG_ARM64_PSEUDO_NMI
not set. It seems the pseudo-NMI feature brings some performance
overhead even if CONFIG_ARM64_PSEUDO_NMI is not set.

Furthermore, the feature also adds to the vmlinux size. I built the 5.10
kernel with and without this patch, with CONFIG_ARM64_PSEUDO_NMI not set
in both cases:

- 5.10 kernel with this patch not applied: vmlinux size is 384060600 bytes
- 5.10 kernel with this patch applied: vmlinux size is 383842936 bytes

That means the arm64 pseudo-NMI feature adds roughly 200 KB to the
vmlinux size (384060600 - 383842936 = 217664 bytes, about 212 KiB).

In summary, the arm64 pseudo-NMI feature adds overhead to both
performance and vmlinux size even when the config option is not set. To
avoid this, wrap all the related code with the CONFIG_ARM64_PSEUDO_NMI
macro.
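For clarity, the fallback this patch introduces is nothing new: with
CONFIG_ARM64_PSEUDO_NMI not set, the basic IRQ helpers reduce to the plain
DAIF sequences shown in the sketch below (this simply mirrors the #else
branches in the diff; it is not additional behaviour):

/*
 * Sketch of the arch/arm64/include/asm/irqflags.h helpers with
 * CONFIG_ARM64_PSEUDO_NMI disabled: only the DAIF I/F bits are touched,
 * with no PMR access and no ALTERNATIVE patching left in the image.
 */
static inline void arch_local_irq_enable(void)
{
	asm volatile("msr	daifclr, #3	// arch_local_irq_enable"
		     : : : "memory");
}

static inline void arch_local_irq_disable(void)
{
	asm volatile("msr	daifset, #3	// arch_local_irq_disable"
		     : : : "memory");
}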
Signed-off-by: He Ying <heying24@huawei.com>
---
 arch/arm64/include/asm/irqflags.h | 38 +++++++++++++++++++++++++++++--
 arch/arm64/kernel/entry.S         |  4 ++++
 2 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/irqflags.h b/arch/arm64/include/asm/irqflags.h
index b57b9b1e4344..82f771b41cf5 100644
--- a/arch/arm64/include/asm/irqflags.h
+++ b/arch/arm64/include/asm/irqflags.h
@@ -26,6 +26,7 @@
  */
 static inline void arch_local_irq_enable(void)
 {
+#ifdef CONFIG_ARM64_PSEUDO_NMI
 	if (system_has_prio_mask_debugging()) {
 		u32 pmr = read_sysreg_s(SYS_ICC_PMR_EL1);
 
@@ -41,10 +42,18 @@ static inline void arch_local_irq_enable(void)
 		: "memory");
 
 	pmr_sync();
+#else
+	asm volatile(
+		"msr	daifclr, #3		// arch_local_irq_enable"
+		:
+		:
+		: "memory");
+#endif
 }
 
 static inline void arch_local_irq_disable(void)
 {
+#ifdef CONFIG_ARM64_PSEUDO_NMI
 	if (system_has_prio_mask_debugging()) {
 		u32 pmr = read_sysreg_s(SYS_ICC_PMR_EL1);
 
@@ -58,6 +67,13 @@ static inline void arch_local_irq_disable(void)
 		:
 		: "r" ((unsigned long) GIC_PRIO_IRQOFF)
 		: "memory");
+#else
+	asm volatile(
+		"msr	daifset, #3		// arch_local_irq_disable"
+		:
+		:
+		: "memory");
+#endif
 }
 
 /*
@@ -66,7 +82,7 @@ static inline void arch_local_irq_disable(void)
 static inline unsigned long arch_local_save_flags(void)
 {
 	unsigned long flags;
-
+#ifdef CONFIG_ARM64_PSEUDO_NMI
 	asm volatile(ALTERNATIVE(
 		"mrs	%0, daif",
 		__mrs_s("%0", SYS_ICC_PMR_EL1),
@@ -74,12 +90,19 @@ static inline unsigned long arch_local_save_flags(void)
 		: "=&r" (flags)
 		:
 		: "memory");
-
+#else
+	asm volatile(
+		"mrs	%0, daif"
+		: "=r" (flags)
+		:
+		: "memory");
+#endif
 	return flags;
 }
 
 static inline int arch_irqs_disabled_flags(unsigned long flags)
 {
+#ifdef CONFIG_ARM64_PSEUDO_NMI
 	int res;
 
 	asm volatile(ALTERNATIVE(
@@ -91,6 +114,9 @@ static inline int arch_irqs_disabled_flags(unsigned long flags)
 		: "memory");
 
 	return res;
+#else
+	return flags & PSR_I_BIT;
+#endif
 }
 
 static inline int arch_irqs_disabled(void)
@@ -119,6 +145,7 @@ static inline unsigned long arch_local_irq_save(void)
  */
 static inline void arch_local_irq_restore(unsigned long flags)
 {
+#ifdef CONFIG_ARM64_PSEUDO_NMI
 	asm volatile(ALTERNATIVE(
 		"msr	daif, %0",
 		__msr_s(SYS_ICC_PMR_EL1, "%0"),
@@ -128,6 +155,13 @@ static inline void arch_local_irq_restore(unsigned long flags)
 		: "memory");
 
 	pmr_sync();
+#else
+	asm volatile(
+		"msr	daif, %0"
+		:
+		: "r" (flags)
+		: "memory");
+#endif
 }
 
 #endif /* __ASM_IRQFLAGS_H */
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 2f69ae43941d..ffc32d3d909a 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -300,6 +300,7 @@ alternative_else_nop_endif
 	str	w21, [sp, #S_SYSCALLNO]
 	.endif
 
+#ifdef CONFIG_ARM64_PSEUDO_NMI
 	/* Save pmr */
 alternative_if ARM64_HAS_IRQ_PRIO_MASKING
 	mrs_s	x20, SYS_ICC_PMR_EL1
@@ -307,6 +308,7 @@ alternative_if ARM64_HAS_IRQ_PRIO_MASKING
 	mov	x20, #GIC_PRIO_IRQON | GIC_PRIO_PSR_I_SET
 	msr_s	SYS_ICC_PMR_EL1, x20
 alternative_else_nop_endif
+#endif
 
 	/* Re-enable tag checking (TCO set on exception entry) */
 #ifdef CONFIG_ARM64_MTE
@@ -330,6 +332,7 @@ alternative_else_nop_endif
 	disable_daif
 	.endif
 
+#ifdef CONFIG_ARM64_PSEUDO_NMI
 	/* Restore pmr */
 alternative_if ARM64_HAS_IRQ_PRIO_MASKING
 	ldr	x20, [sp, #S_PMR_SAVE]
@@ -339,6 +342,7 @@ alternative_if ARM64_HAS_IRQ_PRIO_MASKING
 	dsb	sy			// Ensure priority change is seen by redistributor
 .L__skip_pmr_sync\@:
 alternative_else_nop_endif
+#endif
 
 	ldp	x21, x22, [sp, #S_PC]	// load ELR, SPSR
-- 
2.17.1