From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44148 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728745AbgHDQGx (ORCPT ); Tue, 4 Aug 2020 12:06:53 -0400 From: Arvind Sankar Date: Tue, 4 Aug 2020 12:06:49 -0400 Subject: Re: [PATCH v5 13/36] vmlinux.lds.h: add PGO and AutoFDO input sections Message-ID: <20200804160649.GA2409491@rani.riverdale.lan> References: <20200731230820.1742553-1-keescook@chromium.org> <20200731230820.1742553-14-keescook@chromium.org> <20200801035128.GB2800311@rani.riverdale.lan> <20200803190506.GE1299820@tassilo.jf.intel.com> <20200803201525.GA1351390@rani.riverdale.lan> <20200804044532.GC1321588@tassilo.jf.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20200804044532.GC1321588@tassilo.jf.intel.com> Sender: linux-arch-owner@vger.kernel.org List-ID: To: Andi Kleen Cc: Arvind Sankar , Kees Cook , Thomas Gleixner , Will Deacon , Nick Desaulniers , Jian Cai , =?utf-8?B?RsSBbmctcnXDrCBTw7JuZw==?= , Luis Lozano , Manoj Gupta , stable@vger.kernel.org, Catalin Marinas , Mark Rutland , Ard Biesheuvel , Peter Collingbourne , James Morse , Borislav Petkov , Ingo Molnar , Russell King , Masahiro Yamada , Nathan Chancellor , Arnd Bergmann , x86@kernel.org, clang-built-linux@googlegroups.com, linux-arch@vger.kernel.org, linux-efi@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, Michal Marek Message-ID: <20200804160649.8q__ij3SUxRXwhoQLhsOlFL8xeI-ob31UKmC5shkZ3E@z> On Mon, Aug 03, 2020 at 09:45:32PM -0700, Andi Kleen wrote: > > Why is that? Both .text and .text.hot have alignment of 2^4 (default > > function alignment on x86) by default, so it doesn't seem like it should > > matter for packing density. Avoiding interspersing cold text among > > You may lose part of a cache line on each unit boundary. Linux has > a lot of units, some of them small. All these bytes add up. Separating out .text.unlikely, which isn't aligned, slightly _reduces_ this loss, but not by much -- just over 1K on a defconfig. More importantly, it moves cold code out of line (~320k on a defconfig), giving better code density for the hot code. For .text and .text.hot, you lose the alignment padding on every function boundary, not unit boundary, because of the 16-byte alignment. Whether .text.hot and .text are arranged by translation unit or not makes no difference. With *(.text.hot) *(.text) you get HHTT, with *(.text.hot .text) you get HTHT, but in both cases the individual chunks are already aligned to 16 bytes. If .text.hot _had_ different alignment requirements to .text, the HHTT should actually give better packing in general, I think. > > It's bad for TLB locality too. Sadly with all the fine grained protection > changes the 2MB coverage is eroding anyways, but this makes it even worse. > Yes, that could be true for .text.hot, depending on whether the hot functions are called from all over the kernel (in which case putting them together ought to be better) or mostly from regular text within the unit in which they appeared (in which case it would be better together with that code).