From: Akira Tsukamoto
Date: Fri, 4 Jun 2021 18:53:33 +0900
Subject: [PATCH 0/1] riscv: better network performance with memcpy, uaccess
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Gary Guo, Nick Hu, Nylon Chen, Akira Tsukamoto, linux-riscv@lists.infradead.org, Linux kernel mailing list

I am adding a cover letter to explain the history and details, since the improvement comes from combining this patch with Gary's memcpy patch [1].

Below is a comparison of iperf3 benchmark results with Gary's memcpy patch and my uaccess optimization patch applied. All results are from the same base kernel, the same rootfs and the same BeagleV beta board.
First column:  beaglev 5.13.rc4 kernel [2]
Second column: Palmer's memcpy in C + my uaccess patch [3]
Third column:  Gary's memcpy + my uaccess patch [4]

--- TCP recv ---
686 Mbits/sec | 700 Mbits/sec | 904 Mbits/sec
683 Mbits/sec | 701 Mbits/sec | 898 Mbits/sec
695 Mbits/sec | 702 Mbits/sec | 905 Mbits/sec

--- TCP send ---
383 Mbits/sec | 390 Mbits/sec | 393 Mbits/sec
384 Mbits/sec | 393 Mbits/sec | 392 Mbits/sec

--- UDP send ---
307 Mbits/sec | 358 Mbits/sec | 402 Mbits/sec
307 Mbits/sec | 359 Mbits/sec | 402 Mbits/sec

--- UDP recv ---
630 Mbits/sec | 799 Mbits/sec | 875 Mbits/sec
730 Mbits/sec | 796 Mbits/sec | 873 Mbits/sec

The uaccess patch reduces read-after-write (RAW) pipeline stalls by unrolling the loads and stores. The main reason for writing it in assembler in uaccess.S is that __asm_copy_to_user() and __asm_copy_from_user() must handle page faults manually inside the functions.

The results above combine two effects: Gary's memcpy speeds things up by reducing switching between S-mode and M-mode, and my uaccess patch reduces pipeline stalls when user space passes large data through syscalls.

Palmer and I discussed improving network performance on the BeagleV beta board. Palmer suggested using C-based string routines, which check for unaligned addresses, use an 8-byte aligned copy when both src and dest are aligned, and otherwise fall back to the current copy function.

Gary's assembly version of memcpy avoids unaligned accesses across 64-bit boundaries: it reads with aligned accesses at an offset and then shifts the data into place. This matters because every misaligned access traps and switches to OpenSBI in M-mode, so the main speedup comes from avoiding switches between S-mode (kernel) and M-mode (OpenSBI).

Processing network packets requires many unaligned accesses to packet headers, and the header formats cannot be redesigned to be aligned.
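For illustration, the C-based approach Palmer suggested might look roughly like the sketch below. This is a minimal sketch under stated assumptions, not his actual code: aligned_copy_sketch() and byte_copy() are hypothetical names, and byte_copy() merely stands in for "the current copy function".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for "the current copy function": a plain byte loop. */
static void byte_copy(unsigned char *d, const unsigned char *s, size_t n)
{
    while (n--)
        *d++ = *s++;
}

/*
 * Sketch (hypothetical name): take the 8-byte word path only when BOTH
 * pointers are 8-byte aligned, otherwise fall back to the byte loop.
 */
void *aligned_copy_sketch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if ((((uintptr_t)d | (uintptr_t)s) & 7) == 0) {
        /* Both aligned: every word access below is naturally aligned. */
        while (n >= 8) {
            uint64_t w;
            /* Fixed-size memcpy is the portable C way to express one
               64-bit load and one 64-bit store without aliasing issues. */
            memcpy(&w, s, sizeof(w));
            memcpy(d, &w, sizeof(w));
            d += 8;
            s += 8;
            n -= 8;
        }
    }
    /* Misaligned pointers, or the remaining tail bytes. */
    byte_copy(d, s, n);
    return dst;
}
```

Since the word path requires src and dest to be aligned at the same time, copies whose pointers have different alignments, as with many packet headers, fall back entirely to the slow path.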
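The shifting technique attributed to Gary's memcpy can be illustrated in C as follows. This is a hypothetical sketch of the general technique, not Gary's assembly: shift_copy_sketch(), load64() and store64() are made-up names, and it assumes a little-endian machine (as RISC-V Linux is) with 64-bit words. It aligns dest with a short byte-copy head, then builds each destination word from two aligned source loads merged with shifts, so no word access is ever misaligned.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Aligned 64-bit load/store; memcpy avoids strict-aliasing problems in C. */
static uint64_t load64(const unsigned char *p)
{
    uint64_t w;
    memcpy(&w, p, sizeof(w));
    return w;
}

static void store64(unsigned char *p, uint64_t w)
{
    memcpy(p, &w, sizeof(w));
}

/* Sketch only; assumes little-endian byte order. */
void *shift_copy_sketch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Head: byte copy until dst is 8-byte aligned. */
    while (n > 0 && ((uintptr_t)d & 7) != 0) {
        *d++ = *s++;
        n--;
    }

    size_t off = (uintptr_t)s & 7;             /* source misalignment */
    if (off == 0) {
        while (n >= 8) {                       /* both aligned: plain copy */
            store64(d, load64(s));
            d += 8; s += 8; n -= 8;
        }
    } else if (n >= 16) {
        unsigned rshift = (unsigned)(off * 8); /* 8..56 bits */
        unsigned lshift = 64 - rshift;         /* 8..56 bits */
        const unsigned char *a = s + (8 - off); /* next aligned source word */

        /* Gather s[0 .. 7-off] byte by byte so nothing before src is read. */
        uint64_t carry = 0;
        for (size_t i = 0; i < 8 - off; i++)
            carry |= (uint64_t)s[i] << (8 * i);

        /* n >= 16 guarantees the aligned load at a stays inside src. */
        while (n >= 16) {
            uint64_t next = load64(a);             /* aligned load */
            store64(d, carry | (next << lshift));  /* aligned store */
            carry = next >> rshift;                /* bytes for next word */
            a += 8; d += 8; s += 8; n -= 8;
        }
    }

    /* Tail: whatever is left, byte by byte. */
    while (n-- > 0)
        *d++ = *s++;
    return dst;
}
```

An assembly version can simply load the whole first aligned word instead of gathering bytes, since an aligned 8-byte word never straddles a page boundary; the byte gathering here only keeps the C sketch within the bounds of the source buffer.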
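The load/store unrolling used against RAW stalls can be pictured with the C sketch below. It is illustrative only: unrolled_copy_sketch(), ld64() and st64() are hypothetical names, the real patch is assembly with page-fault handling that this sketch omits, and a C compiler is free to reschedule these accesses. The idea shown is to issue eight independent 64-bit loads before any store, so no store has to wait on the load issued immediately before it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint64_t ld64(const unsigned char *p)
{
    uint64_t w;
    memcpy(&w, p, sizeof(w));
    return w;
}

static void st64(unsigned char *p, uint64_t w)
{
    memcpy(p, &w, sizeof(w));
}

/* Sketch only: non-overlapping buffers, as with memcpy(). */
void *unrolled_copy_sketch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* 64 bytes per iteration: all loads first, then all stores. */
    while (n >= 64) {
        uint64_t t0 = ld64(s + 0),  t1 = ld64(s + 8);
        uint64_t t2 = ld64(s + 16), t3 = ld64(s + 24);
        uint64_t t4 = ld64(s + 32), t5 = ld64(s + 40);
        uint64_t t6 = ld64(s + 48), t7 = ld64(s + 56);
        st64(d + 0, t0);  st64(d + 8, t1);
        st64(d + 16, t2); st64(d + 24, t3);
        st64(d + 32, t4); st64(d + 40, t5);
        st64(d + 48, t6); st64(d + 56, t7);
        s += 64; d += 64; n -= 64;
    }
    /* Remaining whole words, one dependent load/store pair at a time. */
    while (n >= 8) {
        st64(d, ld64(s));
        s += 8; d += 8; n -= 8;
    }
    /* Remaining bytes. */
    while (n-- > 0)
        *d++ = *s++;
    return dst;
}
```

In the naive word loop each store depends on the load just before it, so an in-order core can stall on every pair; the unrolled body gives the earlier loads time to complete before their values are stored.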
User applications also pass large buffers to send()/recv() and sendto()/recvfrom(), as an optimization to reduce the number of calls needed to read and write data.

Akira

[1] https://lkml.org/lkml/2021/2/16/778
[2] https://github.com/mcd500/linux-jh7100/tree/starlight-sdimproved
[3] https://github.com/mcd500/linux-jh7100/tree/starlight-sd-palmer-string
[4] https://github.com/mcd500/linux-jh7100/tree/starlight-sd-gary

Akira Tsukamoto (1):
  riscv: prevent pipeline stall in __asm_to/copy_from_user

 arch/riscv/lib/uaccess.S | 106 +++++++++++++++++++++++++++------------
 1 file changed, 73 insertions(+), 33 deletions(-)

-- 
2.17.1