From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Thu, 30 Jul 2020 12:40:44 +0100
From: Mark Rutland
To: Pingfan Liu
Cc: Jean-Philippe Brucker, Vladimir Murzin, Steve Capper, Catalin Marinas,
 Will Deacon, linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCHv3] arm64/mm: save memory access in check_and_switch_context() fast switch path
Message-ID: <20200730114044.GB46086@C02TD0UTHF1T.local>
References: <1594389852-19949-1-git-send-email-kernelfans@gmail.com>
In-Reply-To: <1594389852-19949-1-git-send-email-kernelfans@gmail.com>
List-Id: linux-arm-kernel@lists.infradead.org

On Fri, Jul 10, 2020 at 10:04:12PM +0800, Pingfan Liu wrote:
> On arm64, smp_processor_id() reads a per-cpu `cpu_number` variable,
> using the per-cpu offset stored in the tpidr_el1 system register. In
> some cases we generate a per-cpu address with a sequence like:
>
>   cpu_ptr = &per_cpu(ptr, smp_processor_id());
>
> which potentially incurs a cache miss for both `cpu_number` and the
> in-memory `__per_cpu_offset` array. This can be written more optimally
> as:
>
>   cpu_ptr = this_cpu_ptr(ptr);
>
> which needs only the offset from tpidr_el1, and does not need to load
> from memory.
>
> The following two test cases show a small performance improvement
> measured on a 46-CPU Qualcomm machine with a 5.8.0-rc4 kernel.
>
> Test 1: (about 0.3% improvement)
>   # cat b.sh
>   make clean && make all -j138
>   # perf stat --repeat 10 --null --sync sh b.sh
>
> - before this patch
>   Performance counter stats for 'sh b.sh' (10 runs):
>
>     298.62 +- 1.86 seconds time elapsed ( +- 0.62% )
>
> - after this patch
>   Performance counter stats for 'sh b.sh' (10 runs):
>
>     297.734 +- 0.954 seconds time elapsed ( +- 0.32% )
>
> Test 2: (about 1.69% improvement)
>   'perf stat -r 10 perf bench sched messaging'
>   Then sum the total time of 'sched/messaging' manually.
>
> - before this patch
>   total 0.707 sec for 10 runs
> - after this patch
>   total 0.695 sec for 10 runs
>
> Signed-off-by: Pingfan Liu
> Cc: Catalin Marinas
> Cc: Will Deacon
> Cc: Steve Capper
> Cc: Mark Rutland
> Cc: Vladimir Murzin
> Cc: Jean-Philippe Brucker
> To: linux-arm-kernel@lists.infradead.org

The patch looks sound, so FWIW:

Acked-by: Mark Rutland

... I'll leave it to Catalin and Will to decide whether to pick this up.

Mark.
> ---
> v2 -> v3: improve commit log with performance result
>
>  arch/arm64/include/asm/mmu_context.h |  6 ++----
>  arch/arm64/mm/context.c              | 10 ++++++----
>  2 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
> index ab46187..808c3be 100644
> --- a/arch/arm64/include/asm/mmu_context.h
> +++ b/arch/arm64/include/asm/mmu_context.h
> @@ -175,7 +175,7 @@ static inline void cpu_replace_ttbr1(pgd_t *pgdp)
>   * take CPU migration into account.
>   */
>  #define destroy_context(mm)		do { } while(0)
> -void check_and_switch_context(struct mm_struct *mm, unsigned int cpu);
> +void check_and_switch_context(struct mm_struct *mm);
>
>  #define init_new_context(tsk,mm)	({ atomic64_set(&(mm)->context.id, 0); 0; })
>
> @@ -214,8 +214,6 @@ enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
>
>  static inline void __switch_mm(struct mm_struct *next)
>  {
> -	unsigned int cpu = smp_processor_id();
> -
>  	/*
>  	 * init_mm.pgd does not contain any user mappings and it is always
>  	 * active for kernel addresses in TTBR1. Just set the reserved TTBR0.
> @@ -225,7 +223,7 @@ static inline void __switch_mm(struct mm_struct *next)
>  		return;
>  	}
>
> -	check_and_switch_context(next, cpu);
> +	check_and_switch_context(next);
>  }
>
>  static inline void
> diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
> index d702d60..a206655 100644
> --- a/arch/arm64/mm/context.c
> +++ b/arch/arm64/mm/context.c
> @@ -198,9 +198,10 @@ static u64 new_context(struct mm_struct *mm)
>  	return idx2asid(asid) | generation;
>  }
>
> -void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
> +void check_and_switch_context(struct mm_struct *mm)
>  {
>  	unsigned long flags;
> +	unsigned int cpu;
>  	u64 asid, old_active_asid;
>
>  	if (system_supports_cnp())
> @@ -222,9 +223,9 @@ void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
>  	 * relaxed xchg in flush_context will treat us as reserved
>  	 * because atomic RmWs are totally ordered for a given location.
>  	 */
> -	old_active_asid = atomic64_read(&per_cpu(active_asids, cpu));
> +	old_active_asid = atomic64_read(this_cpu_ptr(&active_asids));
>  	if (old_active_asid && asid_gen_match(asid) &&
> -	    atomic64_cmpxchg_relaxed(&per_cpu(active_asids, cpu),
> +	    atomic64_cmpxchg_relaxed(this_cpu_ptr(&active_asids),
>  				     old_active_asid, asid))
>  		goto switch_mm_fastpath;
>
> @@ -236,10 +237,11 @@ void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
>  		atomic64_set(&mm->context.id, asid);
>  	}
>
> +	cpu = smp_processor_id();
>  	if (cpumask_test_and_clear_cpu(cpu, &tlb_flush_pending))
>  		local_flush_tlb_all();
>
> -	atomic64_set(&per_cpu(active_asids, cpu), asid);
> +	atomic64_set(this_cpu_ptr(&active_asids), asid);
>  	raw_spin_unlock_irqrestore(&cpu_asid_lock, flags);
>
> switch_mm_fastpath:
> --
> 2.7.5
>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel