From: Stefan Roese
To: u-boot@lists.denx.de
Cc: sjg@chromium.org, trini@konsulko.com, Wolfgang Denk, Rasmus Villemoes
Subject: [PATCH v6 0/3] arm64: Add optimized memset/memcpy/memove functions
Date: Thu, 2 Sep 2021 17:00:16 +0200
Message-Id: <20210902150019.1349263-1-sr@denx.de>

On an NXP LX2160-based platform, it has been noticed that the currently implemented memset/memcpy functions for aarch64 are suboptimal. In particular, the memset() used for clearing the NXP MC firmware memory is very expensive (time-wise).
This patchset adds the optimized functions ported from this repository:

https://github.com/ARM-software/optimized-routines

The optimized memset function makes use of the dc instruction, which requires the caches to be enabled. An additional check is therefore added, and a simple memset version is used when the caches are disabled.

Please note that checkpatch.pl complains about some issues in this imported file:

arch/arm/lib/asmdefs.h

Since it is imported, I deliberately did not make any changes to it, to make potential future syncing easier.

Here are some numbers showing the speed improvements:

Current original version:
-------------------------
memset() 32 Bytes, 16M times:  time: 0.446 seconds
memset() 16MiB, 256 times:     time: 1.076 seconds
memcpy() 512MiB:               time: 0.224 seconds

New optimized version:
----------------------
memset() 32 Bytes, 16M times:  time: 0.287 seconds
memset() 16MiB, 256 times:     time: 0.292 seconds
memcpy() 512MiB:               time: 0.222 seconds

Summary:
The optimized memcpy performs nearly identically to the original one. The optimized memset, however, is much faster for both small and large sizes: by a factor of ~1.6 for small sizes and ~3.7 for large sizes.

Note: These measurements were done on the NXP LX2160ARDB board.
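The cache check mentioned above can be pictured with a small host-side C sketch. This is an illustration under stated assumptions, not the patch's code: patch 2/3 does this inside arch/arm/lib/memset-arm64.S. All names here are hypothetical: cache_enabled(), memset_simple(), memset_fast() and memset_dispatch(). cache_enabled() is a stub flag standing in for reading the data-cache enable (C) bit of SCTLR_ELx. memset_fast() simply calls the host C library's memset in place of the optimized assembly routine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the real cache check. On ARM64 this would
 * come from the data-cache enable (C) bit of SCTLR_ELx; here it is a
 * plain flag so the sketch runs on a host.
 */
static bool dcache_on;

static bool cache_enabled(void)
{
	return dcache_on;
}

/* Simple byte-wise fallback: uses no cache-maintenance instructions. */
static void *memset_simple(void *dst, int c, size_t n)
{
	unsigned char *p = dst;

	while (n--)
		*p++ = (unsigned char)c;
	return dst;
}

/*
 * Placeholder for the optimized routine. The real one is assembly and
 * may use "dc" for large blocks; the host memset() stands in here.
 */
static void *memset_fast(void *dst, int c, size_t n)
{
	return memset(dst, c, n);
}

/* Return early for n == 0; take the fast path only with caches on. */
void *memset_dispatch(void *dst, int c, size_t n)
{
	assert(dst != NULL || n == 0);

	if (n == 0)
		return dst;
	if (cache_enabled())
		return memset_fast(dst, c, n);
	return memset_simple(dst, c, n);
}
```

The design point is that the choice is made at run time. The same U-Boot binary runs both before the caches are enabled (early boot) and after, so a build-time switch alone would not be enough.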
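The speedup factors in the summary are simply original time divided by optimized time. The short C program below recomputes them from the table above. The struct and function names are illustrative only.

```c
#include <assert.h>
#include <stdio.h>

/* Timings in seconds, copied from the measurements above. */
struct result {
	const char *name;
	double orig_s; /* current original version */
	double opt_s;  /* new optimized version */
};

static const struct result results[] = {
	{ "memset() 32 Bytes, 16M times", 0.446, 0.287 },
	{ "memset() 16MiB, 256 times",    1.076, 0.292 },
	{ "memcpy() 512MiB",              0.224, 0.222 },
};

/* Speedup factor = original time / optimized time. */
static double speedup(const struct result *r)
{
	assert(r->opt_s > 0.0);
	return r->orig_s / r->opt_s;
}

void print_speedups(void)
{
	for (size_t i = 0; i < sizeof(results) / sizeof(results[0]); i++)
		printf("%-30s factor %.2f\n", results[i].name,
		       speedup(&results[i]));
}
```

With these inputs, the factors come out at about 1.55, 3.68 and 1.01. These match the ~1.6 and ~3.7 quoted in the summary after rounding, and the "nearly identical" memcpy result.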
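The ported memcpy also handles overlapping buffers, so on ARM64 memmove can share its entry point rather than needing a separate implementation. The C sketch below shows the overlap-safe idea with a plain byte loop. It is not the optimized algorithm. The names copy_overlap_safe(), arm64_memcpy_sketch() and arm64_memmove_sketch() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Overlap-safe byte copy. If dst starts inside [src, src + n), a forward
 * copy would overwrite source bytes before reading them, so copy
 * backwards. Otherwise a forward copy is safe.
 */
static void *copy_overlap_safe(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	assert((dst != NULL && src != NULL) || n == 0);

	/* Unsigned wrap-around makes this true only when s <= d < s + n. */
	if ((uintptr_t)d - (uintptr_t)s < n) {
		while (n--)
			d[n] = s[n];
	} else {
		for (size_t i = 0; i < n; i++)
			d[i] = s[i];
	}
	return dst;
}

/* Both names share one implementation, mirroring the shared entry. */
void *arm64_memcpy_sketch(void *dst, const void *src, size_t n)
{
	return copy_overlap_safe(dst, src, n);
}

void *arm64_memmove_sketch(void *dst, const void *src, size_t n)
{
	return copy_overlap_safe(dst, src, n);
}
```

For example, moving 4 bytes of "abcdef" from offset 0 to offset 2 yields "ababcd". The backward copy avoids reading bytes that were already overwritten.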
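The v6 GCC version gate (>= 9.4 on ARM64) is a Kconfig dependency in the patch. Its logic can be sketched in C using the Linux-style version encoding major * 10000 + minor * 100 + patchlevel, so 9.4.0 becomes 90400. Whether U-Boot's check uses exactly this encoding is an assumption here. MIN_GCC_VERSION, gcc_version_code(), arm64_optimized_allowed() and build_compiler_ok() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* 9.4.0 in major * 10000 + minor * 100 + patchlevel form (assumed encoding). */
#define MIN_GCC_VERSION 90400

static int gcc_version_code(int major, int minor, int patch)
{
	/* The encoding only works with two-digit minor/patch fields. */
	assert(minor >= 0 && minor < 100 && patch >= 0 && patch < 100);
	return major * 10000 + minor * 100 + patch;
}

/* ARM64 only: the new optimized routines need GCC >= 9.4. */
bool arm64_optimized_allowed(int major, int minor, int patch)
{
	return gcc_version_code(major, minor, patch) >= MIN_GCC_VERSION;
}

/*
 * Compile-time variant for the compiler building this file. Clang also
 * defines __GNUC__ (as 4), so it is excluded explicitly and treated as
 * unsupported in this sketch.
 */
#if defined(__GNUC__) && !defined(__clang__)
#define BUILD_GCC_CODE \
	(__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
#else
#define BUILD_GCC_CODE 0
#endif

bool build_compiler_ok(void)
{
	return BUILD_GCC_CODE >= MIN_GCC_VERSION;
}
```

Comparing a single integer keeps the check to one expression, which is why this style of encoding is convenient in Kconfig `depends on` lines.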
Thanks,
Stefan

Changes in v6:
- Add GCC version check >= 9.4 on ARM64, as earlier GCC versions throw
  errors with this new code

Changes in v5:
- memmove is now auto-selected (or deselected) with the memcpy Kconfig
  selection, as its entry is the same as memcpy's on ARM64

Changes in v4:
- Use macros instead of register names, following the optimized code
- Add zero size check

Changes in v3:
- Add memmove alias, as this function also handles memmove in an
  optimized way
- Add memmove as well

Changes in v2:
- Add file names and locations and git commit ID from imported files
  to the commit message
- New patch

Stefan Roese (3):
  arm64: arch/arm/lib: Add optimized memset/memcpy/memmove functions
  arm64: memset-arm64: Use simple memset when cache is disabled
  arm64: Kconfig: Enable usage of optimized memset/memcpy/memmove

 arch/arm/Kconfig              |  39 +++++-
 arch/arm/include/asm/string.h |   4 +
 arch/arm/lib/Makefile         |   5 +
 arch/arm/lib/asmdefs.h        |  98 ++++++++++++++
 arch/arm/lib/memcpy-arm64.S   | 242 ++++++++++++++++++++++++++++++++++
 arch/arm/lib/memset-arm64.S   | 148 +++++++++++++++++++++
 6 files changed, 530 insertions(+), 6 deletions(-)
 create mode 100644 arch/arm/lib/asmdefs.h
 create mode 100644 arch/arm/lib/memcpy-arm64.S
 create mode 100644 arch/arm/lib/memset-arm64.S

-- 
2.33.0