From: Mark Rutland <mark.rutland@arm.com>
To: Pingfan Liu <kernelfans@gmail.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>,
	Vladimir Murzin <vladimir.murzin@arm.com>,
	Steve Capper <steve.capper@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH] arm64/mm: save memory access in check_and_switch_context() fast switch path
Date: Fri, 10 Jul 2020 10:35:16 +0100
Message-ID: <20200710093516.GA25856@C02TD0UTHF1T.local>
In-Reply-To: <CAFgQCTviLCPkvCfrZ0Cwubqfzpht6n6=hJW-RsRQejYNHozT9Q@mail.gmail.com>

On Fri, Jul 10, 2020 at 04:03:39PM +0800, Pingfan Liu wrote:
> On Thu, Jul 9, 2020 at 7:48 PM Mark Rutland <mark.rutland@arm.com> wrote:
> [...]
> >
> > IIUC that's a 0.3% improvement. It'd be worth putting these results in
> > the commit message.
> Sure, I will.
> >
> > Could you also try that with "perf bench sched messaging" as the
> > workload? As a microbenchmark, that might show the highest potential
> > benefit, and it'd be nice to have those figures too if possible.
> I have run this test 10 times and will put the results in the
> commit log too. In summary, this microbenchmark shows about a 1.69%
> improvement with this patch.
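
That is consistent with the raw numbers below: (0.707 - 0.695) / 0.707
comes out at roughly 1.7%. For anyone reading this from the archive,
the pattern being optimised looks roughly like the sketch below (my
paraphrase of the idea in the subject line, not the verbatim diff; see
the patch at the head of the thread for the real change):

	/* arch/arm64/mm/context.c tracks the active ASID per cpu. */
	static DEFINE_PER_CPU(atomic64_t, active_asids);

	/*
	 * Before: per_cpu() via smp_processor_id() first loads the
	 * per-cpu cpu_number variable, then indexes the in-memory
	 * __per_cpu_offset[] array -- extra memory accesses on every
	 * fast switch.
	 */
	atomic64_t *p = &per_cpu(active_asids, smp_processor_id());

	/*
	 * After: this_cpu_ptr() derives the address directly from the
	 * per-cpu offset already held in the TPIDR_EL1 register, so
	 * the fast path avoids those loads.
	 */
	atomic64_t *p = this_cpu_ptr(&active_asids);

The saving per switch is small, which is why a context-switch-heavy
microbenchmark like sched messaging is where it should be most
visible.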

Great; thanks for gathering this data!

Mark.

> 
> Test data:
> 
> 1. without this patch, total 0.707 sec for 10 times
> 
> # perf stat -r 10 perf bench sched messaging
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
> 
>      Total time: 0.074 [sec]
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
> 
>      Total time: 0.071 [sec]
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
> 
>      Total time: 0.068 [sec]
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
> 
>      Total time: 0.072 [sec]
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
> 
>      Total time: 0.070 [sec]
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
> 
>      Total time: 0.070 [sec]
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
> 
>      Total time: 0.072 [sec]
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
> 
>      Total time: 0.072 [sec]
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
> 
>      Total time: 0.068 [sec]
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
> 
>      Total time: 0.070 [sec]
> 
>  Performance counter stats for 'perf bench sched messaging' (10 runs):
> 
>           3,102.15 msec task-clock                #   11.018 CPUs utilized            ( +-  0.47% )
>             16,468      context-switches          #    0.005 M/sec                    ( +-  2.56% )
>              6,877      cpu-migrations            #    0.002 M/sec                    ( +-  3.44% )
>             83,645      page-faults               #    0.027 M/sec                    ( +-  0.05% )
>      6,440,897,966      cycles                    #    2.076 GHz                      ( +-  0.37% )
>      3,620,264,483      instructions              #    0.56  insn per cycle          ( +-  0.11% )
>    <not supported>      branches
>         11,187,394      branch-misses                                                 ( +-  0.73% )
> 
>            0.28155 +- 0.00166 seconds time elapsed  ( +-  0.59% )
> 
> 2. with this patch, total 0.695 sec for 10 times
> 
> # perf stat -r 10 perf bench sched messaging
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
> 
>      Total time: 0.069 [sec]
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
> 
>      Total time: 0.070 [sec]
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
> 
>      Total time: 0.070 [sec]
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
> 
>      Total time: 0.070 [sec]
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
> 
>      Total time: 0.071 [sec]
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
> 
>      Total time: 0.069 [sec]
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
> 
>      Total time: 0.072 [sec]
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
> 
>      Total time: 0.066 [sec]
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
> 
>      Total time: 0.069 [sec]
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
> 
>      Total time: 0.069 [sec]
> 
>  Performance counter stats for 'perf bench sched messaging' (10 runs):
> 
>           3,098.48 msec task-clock                #   11.182 CPUs utilized            ( +-  0.38% )
>             15,485      context-switches          #    0.005 M/sec                    ( +-  2.28% )
>              6,707      cpu-migrations            #    0.002 M/sec                    ( +-  2.80% )
>             83,606      page-faults               #    0.027 M/sec                    ( +-  0.00% )
>      6,435,068,186      cycles                    #    2.077 GHz                      ( +-  0.26% )
>      3,611,197,297      instructions              #    0.56  insn per cycle          ( +-  0.08% )
>    <not supported>      branches
>         11,323,244      branch-misses                                                 ( +-  0.51% )
> 
>           0.277087 +- 0.000625 seconds time elapsed  ( +-  0.23% )
> 
> 
> Thanks,
> Pingfan
