Date: Wed, 11 Aug 2021 15:16:18 +0100
Message-ID: <87o8a49idp.wl-maz@kernel.org>
From: Marc Zyngier
To: Eric Dumazet
Cc: Thierry Reding, Matteo Croce, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org,
	Giuseppe Cavallaro, Alexandre Torgue, "David S. Miller",
	Jakub Kicinski, Palmer Dabbelt, Paul Walmsley, Drew Fustini,
	Emil Renner Berthing, Jon Hunter, Will Deacon
Subject: Re: [PATCH net-next] stmmac: align RX buffers
In-Reply-To: <202417ef-f8ae-895d-4d07-1f9f3d89b4a4@gmail.com>
References: <20210614022504.24458-1-mcroce@linux.microsoft.com>
	<871r71azjw.wl-maz@kernel.org>
	<202417ef-f8ae-895d-4d07-1f9f3d89b4a4@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 11 Aug 2021 13:53:59 +0100, Eric Dumazet wrote:
>
>
>
> On 8/11/21 12:28 PM, Thierry Reding wrote:
> > On Tue, Aug 10, 2021 at 08:07:47PM +0100, Marc Zyngier wrote:
> >> Hi all,
> >>
> >> [adding Thierry, Jon and Will to the fun]
> >>
> >> On Mon, 14 Jun 2021 03:25:04 +0100,
> >> Matteo Croce wrote:
> >>>
> >>> From: Matteo Croce
> >>>
> >>> On RX an SKB is allocated and the received buffer is copied into it.
> >>> But on some architectures, the memcpy() needs the source and destination
> >>> buffers to have the same alignment to be efficient.
> >>>
> >>> This is not our case, because SKB data pointer is misaligned by two bytes
> >>> to compensate the ethernet header.
> >>>
> >>> Align the RX buffer the same way as the SKB one, so the copy is faster.
> >>> An iperf3 RX test gives a decent improvement on a RISC-V machine:
> >>>
> >>> before:
> >>> [ ID] Interval           Transfer     Bitrate         Retr
> >>> [  5]   0.00-10.00  sec   733 MBytes   615 Mbits/sec   88    sender
> >>> [  5]   0.00-10.01  sec   730 MBytes   612 Mbits/sec         receiver
> >>>
> >>> after:
> >>> [ ID] Interval           Transfer     Bitrate         Retr
> >>> [  5]   0.00-10.00  sec  1.10 GBytes   942 Mbits/sec    0    sender
> >>> [  5]   0.00-10.00  sec  1.09 GBytes   940 Mbits/sec         receiver
> >>>
> >>> And the memcpy() overhead during the RX drops dramatically.
> >>>
> >>> before:
> >>> Overhead  Shared O  Symbol
> >>>   43.35%  [kernel]  [k] memcpy
> >>>   33.77%  [kernel]  [k] __asm_copy_to_user
> >>>    3.64%  [kernel]  [k] sifive_l2_flush64_range
> >>>
> >>> after:
> >>> Overhead  Shared O  Symbol
> >>>   45.40%  [kernel]  [k] __asm_copy_to_user
> >>>   28.09%  [kernel]  [k] memcpy
> >>>    4.27%  [kernel]  [k] sifive_l2_flush64_range
> >>>
> >>> Signed-off-by: Matteo Croce
> >>
> >> This patch completely breaks my Jetson TX2 system, composed of 2
> >> Nvidia Denver and 4 Cortex-A57, in a very "funny" way.
> >>
> >> Any significant amount of traffic result in all sort of corruption
> >> (ssh connections get dropped, Debian packages downloaded have the
> >> wrong checksums) if any Denver core is involved in any significant way
> >> (packet processing, interrupt handling). And it is all triggered by
> >> this very change.
> >>
> >> The only way I have to make it work on a Denver core is to route the
> >> interrupt to that particular core and taskset the workload to it. Any
> >> other configuration involving a Denver CPU results in some sort of
> >> corruption. On their own, the A57s are fine.
> >>
> >> This smells of memory ordering going really wrong, which this change
> >> would expose.
> >> I haven't had a chance to dig into the driver yet (it
> >> took me long enough to bisect it), but if someone points me at what is
> >> supposed to synchronise the DMA when receiving an interrupt, I'll have
> >> a look.
> >
> > I recall that Jon was looking into a similar issue recently, though I
> > think the failure mode was slightly different. I also vaguely recall
> > that CPU frequency was impacting this to some degree (lower CPU
> > frequencies would increase the chances of this happening).
> >
> > Jon's currently out of office, but let me try and dig up the details
> > on this.
> >
> > Thierry
> >
> >>
> >> Thanks,
> >>
> >> M.
> >>
> >>> ---
> >>>  drivers/net/ethernet/stmicro/stmmac/stmmac.h | 4 ++--
> >>>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> >>> index b6cd43eda7ac..04bdb3950d63 100644
> >>> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> >>> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> >>> @@ -338,9 +338,9 @@ static inline bool stmmac_xdp_is_enabled(struct stmmac_priv *priv)
> >>>  static inline unsigned int stmmac_rx_offset(struct stmmac_priv *priv)
> >>>  {
> >>>  	if (stmmac_xdp_is_enabled(priv))
> >>> -		return XDP_PACKET_HEADROOM;
> >>> +		return XDP_PACKET_HEADROOM + NET_IP_ALIGN;
> >>>
> >>> -	return 0;
> >>> +	return NET_SKB_PAD + NET_IP_ALIGN;
> >>>  }
> >>>
> >>>  void stmmac_disable_rx_queue(struct stmmac_priv *priv, u32 queue);
> >>> --
> >>> 2.31.1
> >>>
> >>>
> >>
> >> --
> >> Without deviation from the norm, progress is not possible.
>
> Are you sure you do not need to adjust stmmac_set_bfsize(),
> stmmac_rx_buf1_len() and stmmac_rx_buf2_len() ?
>
> Presumably DEFAULT_BUFSIZE also want to be increased by NET_SKB_PAD
>
> Patch for stmmac_rx_buf1_len() :
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index 7b8404a21544cf29668e8a14240c3971e6bce0c3..041a74e7efca3436bfe3e17f972dd156173957a9 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -4508,12 +4508,12 @@ static unsigned int stmmac_rx_buf1_len(struct stmmac_priv *priv,
>
>  	/* First descriptor, not last descriptor and not split header */
>  	if (status & rx_not_ls)
> -		return priv->dma_buf_sz;
> +		return priv->dma_buf_sz - NET_SKB_PAD - NET_IP_ALIGN;
>
>  	plen = stmmac_get_rx_frame_len(priv, p, coe);
>
>  	/* First descriptor and last descriptor and not split header */
> -	return min_t(unsigned int, priv->dma_buf_sz, plen);
> +	return min_t(unsigned int, priv->dma_buf_sz - NET_SKB_PAD - NET_IP_ALIGN, plen);
>  }
>
>  static unsigned int stmmac_rx_buf2_len(struct stmmac_priv *priv,

Feels like a major deficiency of the original patch. Happy to test a
more complete patch if/when you have one.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
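
For reference, the arithmetic behind the point quoted above can be
sketched in plain userspace C. This is not the driver code, only an
illustration: every demo_* name and every constant below is
hypothetical, and the kernel's real NET_SKB_PAD and NET_IP_ALIGN
values depend on the architecture. The sketch assumes the device
starts writing at a non-zero offset into a fixed-size buffer, so the
length it may be credited with is the buffer size minus that offset.

```c
#include <assert.h>

/* Illustrative stand-ins only; these are NOT the kernel's definitions. */
enum {
	DEMO_SKB_PAD  = 64,	/* assumed headroom in front of the packet */
	DEMO_IP_ALIGN = 2,	/* assumed shift that aligns the IP header */
	DEMO_BUF_SZ   = 1536,	/* assumed size of one RX buffer */
};

/* Offset into the buffer at which the device starts writing. */
static unsigned int demo_rx_offset(void)
{
	return DEMO_SKB_PAD + DEMO_IP_ALIGN;
}

/* Bytes that can be written from that offset without leaving the buffer. */
static unsigned int demo_rx_room(unsigned int buf_sz)
{
	return buf_sz - demo_rx_offset();
}

/* Length credited to a descriptor: capped by the room, not by buf_sz. */
static unsigned int demo_rx_buf1_len(unsigned int buf_sz,
				     unsigned int frame_len)
{
	unsigned int room = demo_rx_room(buf_sz);

	return frame_len < room ? frame_len : room;
}

/* Non-zero if a write of len bytes at the offset stays inside the buffer. */
static int demo_write_fits(unsigned int buf_sz, unsigned int len)
{
	return demo_rx_offset() + len <= buf_sz;
}
```

With these made-up numbers the room is 1536 - 66 = 1470 bytes, so
treating the whole 1536 bytes as available would run 66 bytes past
the end of the buffer.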