References: <20220920082005.2459826-1-denik@chromium.org> <20220922053145.944786-1-denik@chromium.org> <87h70zk83g.wl-maz@kernel.org>
From: Denis Nikitin
Date: Thu, 29 Sep 2022 09:13:12 -0700
Subject: Re: [PATCH v2] KVM: arm64: nvhe: Fix build with profile optimization
To: Marc Zyngier
Cc: Catalin Marinas, Will Deacon, James Morse, Alexandru Elisei,
 Nick Desaulniers, David Brazdil, linux-arm-kernel@lists.infradead.org,
 kvmarm@lists.cs.columbia.edu, linux-kernel@vger.kernel.org, Manoj Gupta
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Marc,

Please let me know what you think about this approach.

Thanks,
Denis

On Thu, Sep 22, 2022 at 11:04 PM Manoj Gupta wrote:
>
>
>
> On Thu, Sep 22, 2022 at 10:01 PM Denis Nikitin wrote:
>>
>> Hi Marc,
>>
>> On Thu, Sep 22, 2022 at 3:38 AM Marc Zyngier wrote:
>> >
>> > I was really hoping that you'd just drop the flags from the CFLAGS
>> > instead of removing the generated section. Something like:
>> >
>> > diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
>> > index b5c5119c7396..e5b2d43925b4 100644
>> > --- a/arch/arm64/kvm/hyp/nvhe/Makefile
>> > +++ b/arch/arm64/kvm/hyp/nvhe/Makefile
>> > @@ -88,7 +88,7 @@ quiet_cmd_hypcopy = HYPCOPY $@
>> >
>> >  # Remove ftrace, Shadow Call Stack, and CFI CFLAGS.
>> >  # This is equivalent to the 'notrace', '__noscs', and '__nocfi' annotations.
>> > -KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_FTRACE) $(CC_FLAGS_SCS) $(CC_FLAGS_CFI), $(KBUILD_CFLAGS))
>> > +KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_FTRACE) $(CC_FLAGS_SCS) $(CC_FLAGS_CFI) -fprofile-sample-use, $(KBUILD_CFLAGS))
>> >
>> >  # KVM nVHE code is run at a different exception code with a different map, so
>> >  # compiler instrumentation that inserts callbacks or checks into the code may
>>
>> Sorry, I moved on with a different approach and didn't explain the rationale.
>>
>> As you mentioned before, the flag `-fprofile-sample-use` does not appear
>> in the kernel, and it looks confusing when the flag is disabled or filtered
>> out here. This was the first reason.
>>
>> The root cause of the build failure wasn't the compiler's profile-guided
>> optimization but the extra metadata in an SHT_REL section which LLVM injected
>> into kvm_nvhe.tmp.o for further link optimization.
>> If we remove the .llvm.call-graph-profile section, we fix the build and avoid
>> potential problems with relocations optimized by the linker. The profile-guided
>> optimization will still be applied by the compiler.
>>
>> Let me know what you think about it.
>>
>> >
>> > However, I even failed to reproduce your problem using LLVM 14 as
>> > packaged by Debian (if that matters, I'm using an arm64 build
>> > machine). I build the kernel with:
>> >
>> > $ make LLVM=1 KCFLAGS=-fprofile-sample-use -j8 vmlinux
>> >
>> > and the offending object only contains the following sections:
>> >
>
>
> Just some comments based on my ChromeOS build experience.
>
> -fprofile-sample-use needs a profile file name argument to read the PGO data from,
> i.e. -fprofile-sample-use=/path/to/gcov.profile.
>
> Since the path to the file can change, it makes filtering out more difficult.
> It is certainly possible to find and filter the exact argument by some string search of KCFLAGS.
> But passing -fno-profile-sample-use is easier and less error-prone, which I believe the previous patch version tried to do.
>
>
>> > arch/arm64/kvm/hyp/nvhe/kvm_nvhe.tmp.o:     file format elf64-littleaarch64
>> >
>> > Sections:
>> > Idx Name                     Size     VMA              LMA              File off Algn
>> >   0 .hyp.idmap.text          00000ae4 0000000000000000 0000000000000000 00000800 2**11
>> >                              CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
>> >   1 .hyp.text                0000e988 0000000000000000 0000000000000000 00001800 2**11
>> >                              CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
>> >   2 .hyp.data..ro_after_init 00000820 0000000000000000 0000000000000000 00010188 2**3
>> >                              CONTENTS, ALLOC, LOAD, DATA
>> >   3 .hyp.rodata              00002e70 0000000000000000 0000000000000000 000109a8 2**3
>> >                              CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
>> >   4 .hyp.data..percpu        00001ee0 0000000000000000 0000000000000000 00013820 2**4
>> >                              CONTENTS, ALLOC, LOAD, DATA
>> >   5 .hyp.bss                 00001158 0000000000000000 0000000000000000 00015700 2**3
>> >                              ALLOC
>> >   6 .comment                 0000001f 0000000000000000 0000000000000000 00017830 2**0
>> >                              CONTENTS, READONLY
>> >   7 .llvm_addrsig            000000b8 0000000000000000 0000000000000000 0001784f 2**0
>> >                              CONTENTS, READONLY, EXCLUDE
>> >   8 .altinstructions         00001284 0000000000000000 0000000000000000 00015700 2**0
>> >                              CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
>> >   9 __jump_table             00000960 0000000000000000 0000000000000000 00016988 2**3
>> >                              CONTENTS, ALLOC, LOAD, RELOC, DATA
>> >  10 __bug_table              0000051c 0000000000000000 0000000000000000 000172e8 2**2
>> >                              CONTENTS, ALLOC, LOAD, RELOC, DATA
>> >  11 __kvm_ex_table           00000028 0000000000000000 0000000000000000 00017808 2**3
>> >                              CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
>> >  12 .note.GNU-stack          00000000 0000000000000000 0000000000000000 00027370 2**0
>> >                              CONTENTS, READONLY
>> >
>> > So what am I missing to trigger this issue? Does it rely on something
>> > like PGO, which is not upstream yet? A bit of handholding would be
>> > much appreciated.
>>
>> Right, it relies on the PGO profile.
>> On ChromeOS we collect the sample PGO profile from Arm devices with
>> enabled CoreSight/ETM.
>> You can find more details on ETM at
>> https://www.kernel.org/doc/Documentation/trace/coresight/coresight.rst.
>>
>> https://github.com/Linaro/OpenCSD/blob/master/decoder/tests/auto-fdo/autofdo.md
>> contains information about the pipeline of collecting, processing, and applying
>> the profile.
>>
>
> Generally the difficult part is in collecting a good matching profile for the workload.
> So I think this patch is better than the previous one, since it still keeps the compiler optimization for the hot code paths
> in the file but removes the problematic section.
>
> Thanks,
> Manoj
>
>
>>
>> >
>> > Thanks,
>> >
>> > M.
>> >
>> > --
>> > Without deviation from the norm, progress is not possible.
>>
>> Thanks,
>> Denis
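
For reference, the filtering concern raised in the thread can be illustrated
with a small standalone Makefile. This is only a sketch under stated
assumptions: the flag list and the profile path are made-up stand-ins for what
Kbuild would pass, and the '%' pattern form is one possible way to match the
flag together with its argument, not necessarily what the patch should do.
GNU make's filter-out compares whole words, so a bare -fprofile-sample-use does
not match -fprofile-sample-use=/path/to/gcov.profile, while a '%' pattern does.

```make
# Hypothetical flags for illustration only; the profile path is made up.
KBUILD_CFLAGS := -O2 -fprofile-sample-use=/path/to/gcov.profile -Wall

# Exact-word match: the flag stays, because the word carries "=<path>".
EXACT := $(filter-out -fprofile-sample-use, $(KBUILD_CFLAGS))

# Pattern match: '%' matches whatever path follows the '='.
PATTERN := $(filter-out -fprofile-sample-use=%, $(KBUILD_CFLAGS))

# Print both results when the Makefile is parsed.
$(info exact:   $(EXACT))
$(info pattern: $(PATTERN))

# Empty default target so that plain `make` has something to build.
.PHONY: all
all: ;
```

Running `make` on this file prints both lists: the exact-word form still
contains the profile flag and the pattern form does not. Appending
-fno-profile-sample-use instead, as suggested in the thread, avoids the
matching question entirely.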