From: Pingfan Liu <kernelfans@gmail.com>
To: Mark Rutland <mark.rutland@arm.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>,
	Vladimir Murzin <vladimir.murzin@arm.com>,
	Steve Capper <steve.capper@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH] arm64/mm: save memory access in check_and_switch_context() fast switch path
Date: Tue, 7 Jul 2020 09:50:58 +0800
Message-ID: <CAFgQCTtu9U2bB9JfXMw5TLd=tcrXkexVZOSgP=CDHnfQamddbQ@mail.gmail.com>
In-Reply-To: <CAFgQCTtnLzZuJ7D4XBkSdD9ba1f5g_2GHo_WPYx+FpJx9XepKA@mail.gmail.com>

On Mon, Jul 6, 2020 at 4:10 PM Pingfan Liu <kernelfans@gmail.com> wrote:
>
> On Fri, Jul 3, 2020 at 6:13 PM Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > On Fri, Jul 03, 2020 at 01:44:39PM +0800, Pingfan Liu wrote:
> > > The cpu_number and __per_cpu_offset cost two different cache lines, and may
> > > not exist after a heavy user space load.
> > >
> > > By replacing per_cpu(active_asids, cpu) with this_cpu_ptr(&active_asids) in
> > > fast path, register is used and these memory access are avoided.
> >
> > How about:
> >
> > | On arm64, smp_processor_id() reads a per-cpu `cpu_number` variable,
> > | using the per-cpu offset stored in the tpidr_el1 system register. In
> > | some cases we generate a per-cpu address with a sequence like:
> > |
> > | | cpu_ptr = &per_cpu(ptr, smp_processor_id());
> > |
> > | Which potentially incurs a cache miss for both `cpu_number` and the
> > | in-memory `__per_cpu_offset` array. This can be written more optimally
> > | as:
> > |
> > | | cpu_ptr = this_cpu_ptr(ptr);
> > |
> > | ... which only needs the offset from tpidr_el1, and does not need to
> > | load from memory.
> Thanks for the clear write-up.
> >
> > > By replacing per_cpu(active_asids, cpu) with this_cpu_ptr(&active_asids) in
> > > fast path, register is used and these memory access are avoided.
> >
> > Do you have any numbers that show benefit here? It's not clear to me how
> > often the above case would apply where the caches would also be hot for
> > everything else we need, and numbers would help to justify that.
> Initially, I was just drawn to the implementation of the __my_cpu_offset
> macro, which led me to this question. But following your reasoning, I
> realized that data is needed to make things clear.
>
> I ran a test with a 5.8.0-rc4 kernel on a 46-CPU Qualcomm machine.
> command: time -p make all -j138
>
> Before this patch:
> real 291.86
> user 11050.18
> sys 362.91
>
> After this patch
> real 291.11
> user 11055.62
> sys 363.39
>
> The data shows a very small improvement.
The data above may be affected by random factors, which makes it less
persuasive, so I ran some repeated tests with perf stat.
#cat b.sh
make clean && make all -j138

#perf stat --repeat 10 --null --sync sh b.sh

- before this patch
 Performance counter stats for 'sh b.sh' (10 runs):

            298.62 +- 1.86 seconds time elapsed  ( +-  0.62% )


- after this patch
 Performance counter stats for 'sh b.sh' (10 runs):

           297.734 +- 0.954 seconds time elapsed  ( +-  0.32% )


As the mean values show (298.62 s before vs. 297.734 s after, about
0.3% less), this trivial change does bring a small improvement in
performance.
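
As a rough sanity check of the two perf stat results, the comparison can be
sketched as below. This is only a minimal sketch: the helper name
compare_runs is hypothetical, and it assumes the "+-" figures can be treated
as independent uncertainties of each mean that combine in quadrature, which
is an assumption about how to read the perf stat output rather than something
established in this thread.

```python
import math


def compare_runs(before_mean, before_err, after_mean, after_err):
    """Compare two 'mean +- err' timings (hypothetical helper).

    Returns (delta, relative, combined_err), where delta > 0 means the
    'after' run is faster.
    """
    delta = before_mean - after_mean
    relative = delta / before_mean
    # Assumption: the two uncertainties are independent, so they are
    # combined in quadrature.
    combined_err = math.hypot(before_err, after_err)
    return delta, relative, combined_err


# Figures copied from the perf stat output above.
delta, relative, combined_err = compare_runs(298.62, 1.86, 297.734, 0.954)
print(f"delta = {delta:.3f} s ({relative:.2%}), combined +- {combined_err:.3f} s")
```

With these figures the gap between the means is below the combined
uncertainty, so more runs would be needed to pin down the size of the gain.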
>
> Thanks,
> Pingfan


