From: Pingfan Liu <kernelfans@gmail.com>
To: linux-arm-kernel@lists.infradead.org
Cc: Mark Rutland, Jean-Philippe Brucker, Vladimir Murzin, Steve Capper, Catalin Marinas, Will Deacon
Subject: [PATCHv3] arm64/mm: save memory access in check_and_switch_context() fast switch path
Date: Fri, 10 Jul 2020 22:04:12 +0800
Message-Id: <1594389852-19949-1-git-send-email-kernelfans@gmail.com>

On arm64, smp_processor_id() reads a per-cpu `cpu_number` variable, using
the per-cpu offset stored in the tpidr_el1 system register. In some cases
we generate a per-cpu address with a sequence like:

	cpu_ptr = &per_cpu(ptr, smp_processor_id());

which potentially incurs a cache miss for both `cpu_number` and the
in-memory `__per_cpu_offset` array.
This can be written more optimally as:

	cpu_ptr = this_cpu_ptr(ptr);

which only needs the offset from tpidr_el1, and does not need to load from
memory.

The following two test cases show a small performance improvement measured
on a 46-CPU Qualcomm machine running a 5.8.0-rc4 kernel.

Test 1: (about 0.3% improvement)
	# cat b.sh
	make clean && make all -j138
	# perf stat --repeat 10 --null --sync sh b.sh

	- before this patch
	  Performance counter stats for 'sh b.sh' (10 runs):
	  298.62 +- 1.86 seconds time elapsed  ( +- 0.62% )

	- after this patch
	  Performance counter stats for 'sh b.sh' (10 runs):
	  297.734 +- 0.954 seconds time elapsed  ( +- 0.32% )

Test 2: (about 1.69% improvement)
	'perf stat -r 10 perf bench sched messaging', then sum the total time
	of 'sched/messaging' manually.

	- before this patch: total 0.707 sec for 10 runs
	- after this patch:  total 0.695 sec for 10 runs

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Catalin Marinas
Cc: Will Deacon
Cc: Steve Capper
Cc: Mark Rutland
Cc: Vladimir Murzin
Cc: Jean-Philippe Brucker
To: linux-arm-kernel@lists.infradead.org
---
v2 -> v3: improve commit log with performance results

 arch/arm64/include/asm/mmu_context.h |  6 ++----
 arch/arm64/mm/context.c              | 10 ++++++----
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index ab46187..808c3be 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -175,7 +175,7 @@ static inline void cpu_replace_ttbr1(pgd_t *pgdp)
  * take CPU migration into account.
  */
 #define destroy_context(mm)		do { } while(0)
-void check_and_switch_context(struct mm_struct *mm, unsigned int cpu);
+void check_and_switch_context(struct mm_struct *mm);
 
 #define init_new_context(tsk,mm)	({ atomic64_set(&(mm)->context.id, 0); 0; })
 
@@ -214,8 +214,6 @@ enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
 static inline void __switch_mm(struct mm_struct *next)
 {
-	unsigned int cpu = smp_processor_id();
-
 	/*
 	 * init_mm.pgd does not contain any user mappings and it is always
 	 * active for kernel addresses in TTBR1. Just set the reserved TTBR0.
@@ -225,7 +223,7 @@ static inline void __switch_mm(struct mm_struct *next)
 		return;
 	}
 
-	check_and_switch_context(next, cpu);
+	check_and_switch_context(next);
 }
 
 static inline void
diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index d702d60..a206655 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -198,9 +198,10 @@ static u64 new_context(struct mm_struct *mm)
 	return idx2asid(asid) | generation;
 }
 
-void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
+void check_and_switch_context(struct mm_struct *mm)
 {
 	unsigned long flags;
+	unsigned int cpu;
 	u64 asid, old_active_asid;
 
 	if (system_supports_cnp())
@@ -222,9 +223,9 @@ void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
 	 * relaxed xchg in flush_context will treat us as reserved
 	 * because atomic RmWs are totally ordered for a given location.
 	 */
-	old_active_asid = atomic64_read(&per_cpu(active_asids, cpu));
+	old_active_asid = atomic64_read(this_cpu_ptr(&active_asids));
 	if (old_active_asid && asid_gen_match(asid) &&
-	    atomic64_cmpxchg_relaxed(&per_cpu(active_asids, cpu),
+	    atomic64_cmpxchg_relaxed(this_cpu_ptr(&active_asids),
 				     old_active_asid, asid))
 		goto switch_mm_fastpath;
 
@@ -236,10 +237,11 @@ void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
 		atomic64_set(&mm->context.id, asid);
 	}
 
+	cpu = smp_processor_id();
 	if (cpumask_test_and_clear_cpu(cpu, &tlb_flush_pending))
 		local_flush_tlb_all();
 
-	atomic64_set(&per_cpu(active_asids, cpu), asid);
+	atomic64_set(this_cpu_ptr(&active_asids), asid);
 	raw_spin_unlock_irqrestore(&cpu_asid_lock, flags);
 
 switch_mm_fastpath:
-- 
2.7.5

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel