From: Mark Rutland <mark.rutland@arm.com>
To: Pingfan Liu <kernelfans@gmail.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>,
	Vladimir Murzin <vladimir.murzin@arm.com>,
	Steve Capper <steve.capper@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH] arm64/mm: save memory access in check_and_switch_context() fast switch path
Date: Fri, 3 Jul 2020 11:13:36 +0100
Message-ID: <20200703101336.GA31383@C02TD0UTHF1T.local>
In-Reply-To: <1593755079-2160-1-git-send-email-kernelfans@gmail.com>

On Fri, Jul 03, 2020 at 01:44:39PM +0800, Pingfan Liu wrote:
> The cpu_number and __per_cpu_offset cost two different cache lines, and may
> not exist after a heavy user space load.
> 
> By replacing per_cpu(active_asids, cpu) with this_cpu_ptr(&active_asids) in
> fast path, register is used and these memory access are avoided.

How about:

| On arm64, smp_processor_id() reads a per-cpu `cpu_number` variable,
| using the per-cpu offset stored in the tpidr_el1 system register. In
| some cases we generate a per-cpu address with a sequence like:
|
| | cpu_ptr = &per_cpu(ptr, smp_processor_id());
|
| Which potentially incurs a cache miss for both `cpu_number` and the
| in-memory `__per_cpu_offset` array. This can be written more optimally
| as:
|
| | cpu_ptr = this_cpu_ptr(ptr);
|
| ... which only needs the offset from tpidr_el1, and does not need to
| load from memory.

> By replacing per_cpu(active_asids, cpu) with this_cpu_ptr(&active_asids) in
> fast path, register is used and these memory access are avoided.

Do you have any numbers that show a benefit here? It's not clear to me
how often the above case would apply where the caches would also be hot
for everything else we need, and numbers would help to justify the
change.

> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Steve Capper <steve.capper@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Vladimir Murzin <vladimir.murzin@arm.com>
> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> To: linux-arm-kernel@lists.infradead.org
> ---
>  arch/arm64/include/asm/mmu_context.h |  6 ++----
>  arch/arm64/mm/context.c              | 10 ++++++----
>  2 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
> index ab46187..808c3be 100644
> --- a/arch/arm64/include/asm/mmu_context.h
> +++ b/arch/arm64/include/asm/mmu_context.h
> @@ -175,7 +175,7 @@ static inline void cpu_replace_ttbr1(pgd_t *pgdp)
>   * take CPU migration into account.
>   */
>  #define destroy_context(mm)		do { } while(0)
> -void check_and_switch_context(struct mm_struct *mm, unsigned int cpu);
> +void check_and_switch_context(struct mm_struct *mm);
>  
>  #define init_new_context(tsk,mm)	({ atomic64_set(&(mm)->context.id, 0); 0; })
>  
> @@ -214,8 +214,6 @@ enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
>  
>  static inline void __switch_mm(struct mm_struct *next)
>  {
> -	unsigned int cpu = smp_processor_id();
> -
>  	/*
>  	 * init_mm.pgd does not contain any user mappings and it is always
>  	 * active for kernel addresses in TTBR1. Just set the reserved TTBR0.
> @@ -225,7 +223,7 @@ static inline void __switch_mm(struct mm_struct *next)
>  		return;
>  	}
>  
> -	check_and_switch_context(next, cpu);
> +	check_and_switch_context(next);
>  }
>  
>  static inline void
> diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
> index d702d60..a206655 100644
> --- a/arch/arm64/mm/context.c
> +++ b/arch/arm64/mm/context.c
> @@ -198,9 +198,10 @@ static u64 new_context(struct mm_struct *mm)
>  	return idx2asid(asid) | generation;
>  }
>  
> -void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
> +void check_and_switch_context(struct mm_struct *mm)
>  {
>  	unsigned long flags;
> +	unsigned int cpu;
>  	u64 asid, old_active_asid;
>  
>  	if (system_supports_cnp())
> @@ -222,9 +223,9 @@ void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
>  	 *   relaxed xchg in flush_context will treat us as reserved
>  	 *   because atomic RmWs are totally ordered for a given location.
>  	 */
> -	old_active_asid = atomic64_read(&per_cpu(active_asids, cpu));
> +	old_active_asid = atomic64_read(this_cpu_ptr(&active_asids));
>  	if (old_active_asid && asid_gen_match(asid) &&
> -	    atomic64_cmpxchg_relaxed(&per_cpu(active_asids, cpu),
> +	    atomic64_cmpxchg_relaxed(this_cpu_ptr(&active_asids),
>  				     old_active_asid, asid))
>  		goto switch_mm_fastpath;
>  
> @@ -236,10 +237,11 @@ void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
>  		atomic64_set(&mm->context.id, asid);
>  	}
>  
> +	cpu = smp_processor_id();
>  	if (cpumask_test_and_clear_cpu(cpu, &tlb_flush_pending))
>  		local_flush_tlb_all();
>  
> -	atomic64_set(&per_cpu(active_asids, cpu), asid);
> +	atomic64_set(this_cpu_ptr(&active_asids), asid);
>  	raw_spin_unlock_irqrestore(&cpu_asid_lock, flags);

FWIW, this looks sound to me.

Mark.
