From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 25 May 2021 15:34:32 +0100
From: Gary Guo
To: David Laight
Cc: 'Palmer Dabbelt', Paul Walmsley, "aou@eecs.berkeley.edu", "nickhu@andestech.com", "nylon7@andestech.com", "linux-riscv@lists.infradead.org", "linux-kernel@vger.kernel.org"
Subject: Re: [PATCH] riscv: fix memmove and optimise memcpy when misalign
Message-ID: <20210525153431.0000508d@garyguo.net>
In-Reply-To: <17637b10e71b41b89126cbb1b2fa61cf@AcuMS.aculab.com>
References: <20210522232256.00003f08@garyguo.net> <17637b10e71b41b89126cbb1b2fa61cf@AcuMS.aculab.com>
X-Mailer: Claws Mail 3.17.8 (GTK+ 2.24.33; i686-w64-mingw32)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, 23 May 2021 17:12:23 +0000
David Laight wrote:

> From: Palmer Dabbelt
> > Sent: 23 May 2021 02:47
> ...
> > IMO the right way to go here is to just move to C-based string
> > routines, at least until we get to the point where we're seriously
> > optimizing for specific processors. We went with the C-based
> > string routines in glibc as part of the upstreaming process and
> > found only some small performance differences when compared to the
> > hand-written assembly, and they're way easier to maintain.

I prefer C versions as well, and before commit 04091d6 we were indeed
using the generic C version. The issue is that 04091d6 introduced an
assembly version that's very broken. It does not offer any performance
improvement over the C version, and it breaks all processors without
hardware misalignment support (yes, firmware is expected to trap and
handle these accesses, but the emulation is painfully slow).

I noticed the issue because I ran Linux on my own firmware and found
that the kernel couldn't boot. I hadn't implemented misalignment
emulation at that time (and just sent the trap to the supervisor).

Because 04091d6 was accepted, I assumed that we need an assembly
version, so I spent some time writing, testing and optimising the
assembly.

> >
> > IIRC Linux only has trivial C string routines in lib, I think the
> > best way to go about that would be to higher performance versions
> > in there. That will allow other ports to use them.
>
> I certainly wonder how much benefit these massively unrolled
> loops have on modern superscalar processors - especially those
> with any form of 'out of order' execution.
>
> It is often easy to write assembler where all the loop
> control instructions happen in parallel with the memory
> accesses - which cannot be avoided.
> Loop unrolling is so 1970s.
>
> Sometimes you need to unroll once.
> And maybe interleave the loads and stores.
> But after that you can just be trashing the i-cache.

I didn't introduce the loop unrolling though. The unrolled assembly
was there before this patch, and I didn't even change the unroll
factor. I only added a path to handle the misaligned case.

The diff is large because I changed the register allocation to make
the code more efficient. I also made a few cleanups and added a few
comments. It might be easier to review if you apply the patch locally
and just look at the file.

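For readers unfamiliar with the idea of a misaligned path, here is a
minimal C sketch of the general shift-and-merge technique. It is not the
assembly from the patch, only an illustration under stated assumptions:
the names sketch_memcpy and sketch_selftest are made up, the machine is
assumed to be little-endian, the buffers must not overlap, and the
may_alias attribute is a GCC/Clang extension. The point it shows is that
a source which is not co-aligned with the destination can still be
copied with aligned word loads only, so no misaligned access ever
reaches the hardware or the firmware trap handler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* GCC/Clang extension: a word type that is allowed to alias other types. */
typedef unsigned long __attribute__((may_alias)) word_t;

#define WORD_SIZE sizeof(word_t)

/*
 * Hypothetical illustration, not the code from the patch.
 * Assumes a little-endian machine and non-overlapping buffers.
 */
void *sketch_memcpy(void *dest, const void *src, size_t n)
{
	unsigned char *d = dest;
	const unsigned char *s = src;

	/* Copy single bytes until the destination is word-aligned. */
	while (n > 0 && ((uintptr_t)d % WORD_SIZE) != 0) {
		*d++ = *s++;
		n--;
	}

	size_t off = (uintptr_t)s % WORD_SIZE;

	if (off == 0) {
		/* Source and destination are co-aligned: plain word copy. */
		for (; n >= WORD_SIZE; n -= WORD_SIZE) {
			*(word_t *)d = *(const word_t *)s;
			d += WORD_SIZE;
			s += WORD_SIZE;
		}
	} else if (n >= WORD_SIZE) {
		/*
		 * Misaligned source: use aligned loads only and merge two
		 * neighbouring words with shifts. Every word loaded here
		 * contains at least one byte of the source range, so the
		 * loads never touch a page the source does not touch.
		 */
		const word_t *ws = (const word_t *)(s - off);
		unsigned int shr = 8 * off;
		unsigned int shl = 8 * (WORD_SIZE - off);
		word_t lo = *ws++;

		for (; n >= WORD_SIZE; n -= WORD_SIZE) {
			word_t hi = *ws++;

			*(word_t *)d = (lo >> shr) | (hi << shl);
			lo = hi;
			d += WORD_SIZE;
			s += WORD_SIZE;
		}
	}

	/* Copy the remaining tail bytes one at a time. */
	while (n > 0) {
		*d++ = *s++;
		n--;
	}
	return dest;
}

/* Exercise every combination of source and destination offset. */
void sketch_selftest(void)
{
	static word_t src_words[16], dst_words[16];
	unsigned char *src = (unsigned char *)src_words;
	unsigned char *dst = (unsigned char *)dst_words;
	size_t len = 5 * WORD_SIZE + 3;

	for (size_t i = 0; i < sizeof(src_words); i++)
		src[i] = (unsigned char)(i * 7 + 1);

	for (size_t soff = 0; soff < WORD_SIZE; soff++) {
		for (size_t doff = 0; doff < WORD_SIZE; doff++) {
			memset(dst, 0xAA, sizeof(dst_words));
			sketch_memcpy(dst + doff, src + soff, len);
			assert(memcmp(dst + doff, src + soff, len) == 0);
			assert(dst[doff + len] == 0xAA);
		}
	}
}
```

The sketch deliberately has no unrolling at all, which keeps it in line
with the concern about i-cache pressure raised above. Whether a C
version like this would match the assembly on real hardware is something
only measurement can answer.
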
- Gary