From mboxrd@z Thu Jan 1 00:00:00 1970 From: Paul Gortmaker Subject: Re: [PATCH net-next 2/5] gianfar: Cleanup and optimize struct gfar_private Date: Tue, 12 Feb 2013 11:49:37 -0500 Message-ID: <511A72A1.8070009@windriver.com> References: <1360673237-349-1-git-send-email-claudiu.manoil@freescale.com> <1360673237-349-2-git-send-email-claudiu.manoil@freescale.com> <511A5BA3.4020306@windriver.com> <511A66A3.7040704@freescale.com> Mime-Version: 1.0 Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit Cc: , "David S. Miller" To: Claudiu Manoil Return-path: Received: from mail1.windriver.com ([147.11.146.13]:59981 "EHLO mail1.windriver.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1758698Ab3BLQtH (ORCPT ); Tue, 12 Feb 2013 11:49:07 -0500 In-Reply-To: <511A66A3.7040704@freescale.com> Sender: netdev-owner@vger.kernel.org List-ID: On 13-02-12 10:58 AM, Claudiu Manoil wrote: > On 2/12/2013 5:11 PM, Paul Gortmaker wrote: >> On 13-02-12 07:47 AM, Claudiu Manoil wrote: >>> * group run-time critical fields within the 1st cacheline >>> followed by the tx|rx_queue reference arrays and the interrupt >>> group instances (gfargrp) (all cacheline aligned) >>> * change 'padding' from unsigned short to u16 >>> * clear 20+ byte memory hole >> >> Per prev. mail, it gets harder to see which change is where, >> when they are all lumped together like this. For example, it >> wasn't obvious to me where the 20 byte hole was. Also, it >> doesn't look like you changed the padding, but rather instead >> totally re-purposed it, leaving no alignment padding after the >> uchar bitfields (where it was originally). >> >> P. > > The 20 byte hole was here: > > struct gfar_private { > unsigned int num_tx_queues; > unsigned int num_rx_queues; > unsigned int num_grps; > unsigned int mode; > unsigned int total_tx_ring_size; > unsigned int total_rx_ring_size; > struct device_node * node; > struct net_device * ndev; > /* --- cacheline 1 boundary (32 bytes) --- */ > struct platform_device * ofdev; > enum gfar_errata errata; > > /* XXX 24 bytes hole, try to pack */ OK, so this is good information, that wasn't in your commit log. (Is it 20 or 24 though? I'm guessing 24, and original commit log was wrong in saying 20.) Something like: "The default gcc layout of gfar_private leaves an implicit 24 byte hole (bytes XYZ --> ABC in the 1st cache line) after the errata enum...." > > /* --- cacheline 2 boundary (64 bytes) --- */ > struct gfar_priv_grp gfargrp[2]; > > > At the end of the patch series the first cacheline is without holes. > Please note that the re-grouping of members and their order is most > important. For instance why keep in the first cacheline something The importance of ordering and alignment was never in question. What was missing, was conveying to the reviewer exactly how you'd made it better, without forcing them to manually deconstruct gfar_priv in the before and after cases, esp. when you'd already done that analysis. > as unimportant as total_rx_ring_size? Members like rx_buffer_size, > or padding, or even errata, are critical however for the fast path. > Rx processing (gfar_poll + clean_rx_ring) is the bottleneck here, > keeping the CPU to 100%. So the main goal is to optimize this path, > including memory access/cache optimizations. For instance, better > results were obtained by inverting rx|tx_queue[] with gfargrp[], originally: > /* --- cacheline 1 boundary (32 bytes) --- */ > struct gfar_priv_tx_q * tx_queue[8]; /* 32 32 */ > /* --- cacheline 2 boundary (64 bytes) --- */ > struct gfar_priv_rx_q * rx_queue[8]; /* 64 32 */ > /* --- cacheline 3 boundary (96 bytes) --- */ > struct gfar_priv_grp gfargrp[2]; /* 96 192 */ > > The uchar bitfields are unimportant here (used at "init time"), and > they take 4 bytes including padding anyway. So whether it's uchar or > uint, it's the same, maybe better left uchar to discourage the abuse > of these bitfields. (A 2-3byte hole here doesn't change anything > to the whole structure size, which is padded to be at least a 32B > multiple.) This here too, i.e. why the padding after the bitfields is unimportant and hence removed should also be in the long log. Thanks, Paul. -- > > Thanks, > Claudiu > >