From: xhao@linux.alibaba.com
To: Barry Song <21cnbao@gmail.com>, Yicong Yang <yangyicong@huawei.com>
Cc: "Andrew Morton" <akpm@linux-foundation.org>,
	Linux-MM <linux-mm@kvack.org>,
	LAK <linux-arm-kernel@lists.infradead.org>, x86 <x86@kernel.org>,
	"Catalin Marinas" <catalin.marinas@arm.com>,
	"Will Deacon" <will@kernel.org>,
	"Linux Doc Mailing List" <linux-doc@vger.kernel.org>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Arnd Bergmann" <arnd@arndb.de>,
	LKML <linux-kernel@vger.kernel.org>,
	"Darren Hart" <darren@os.amperecomputing.com>,
	huzhanyuan@oppo.com, "李培锋(wink)" <lipeifeng@oppo.com>,
	"张诗明(Simon Zhang)" <zhangshiming@oppo.com>, 郭健 <guojian@oppo.com>,
	"real mz" <realmz6@gmail.com>,
	linux-mips@vger.kernel.org, openrisc@lists.librecores.org,
	linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org,
	linux-s390@vger.kernel.org,
	"Yicong Yang" <yangyicong@hisilicon.com>,
	"tiantao (H)" <tiantao6@hisilicon.com>
Subject: Re: [PATCH v2 0/4] mm: arm64: bring up BATCHED_UNMAP_TLB_FLUSH
Date: Sat, 23 Jul 2022 17:22:45 +0800	[thread overview]
Message-ID: <3ac4b1a3-8067-3edb-be4f-326e2a4943ed@linux.alibaba.com> (raw)
In-Reply-To: <CAGsJ_4x9hLbXGMU737SShZGS89_4zywyhvkcRfz3W5s_p7O1PA@mail.gmail.com>


On 7/20/22 7:18 PM, Barry Song wrote:
> On Tue, Jul 19, 2022 at 1:28 AM Yicong Yang <yangyicong@huawei.com> wrote:
>> On 2022/7/14 12:51, Barry Song wrote:
>>> On Thu, Jul 14, 2022 at 3:29 PM Xin Hao <xhao@linux.alibaba.com> wrote:
>>>> Hi Barry,
>>>>
>>>> I did some tests on a Kunpeng arm64 machine using UnixBench.
>>>>
>>>> The test results are below.
>>>>
>>>> With one core, we can see a performance improvement of more than 30%.
>>> I am really pleased to see the 30%+ improvement on UnixBench on a single core.
>>>
>>>> ./Run -c 1 -i 1 shell1
>>>> w/o
>>>> System Benchmarks Partial Index              BASELINE RESULT INDEX
>>>> Shell Scripts (1 concurrent)                     42.4 5481.0 1292.7
>>>> ========
>>>> System Benchmarks Index Score (Partial Only)                         1292.7
>>>>
>>>> w/
>>>> System Benchmarks Partial Index              BASELINE RESULT INDEX
>>>> Shell Scripts (1 concurrent)                     42.4 6974.6 1645.0
>>>> ========
>>>> System Benchmarks Index Score (Partial Only)                         1645.0
>>>>
>>>>
>>>> But with all cores, there is a slight performance degradation of about 5%.
>>> That is sad as we might get more concurrency between mprotect(), madvise(),
>>> mremap(), zap_pte_range() and the deferred tlbi.
>>>
>>>> ./Run -c 96 -i 1 shell1
>>>> w/o
>>>> Shell Scripts (1 concurrent)                  80765.5 lpm   (60.0 s, 1 samples)
>>>> System Benchmarks Partial Index              BASELINE RESULT INDEX
>>>> Shell Scripts (1 concurrent)                     42.4 80765.5 19048.5
>>>> ========
>>>> System Benchmarks Index Score (Partial Only)                        19048.5
>>>>
>>>> w/
>>>> Shell Scripts (1 concurrent)                  76333.6 lpm   (60.0 s, 1 samples)
>>>> System Benchmarks Partial Index              BASELINE RESULT INDEX
>>>> Shell Scripts (1 concurrent)                     42.4 76333.6 18003.2
>>>> ========
>>>> System Benchmarks Index Score (Partial Only)                        18003.2
>>>>
>>>> ----------------------------------------------------------------------------------------------
>>>>
>>>>
>>>> After discussing with you, I made some changes to the patch.
>>>>
>>>> index a52381a680db..1ecba81f1277 100644
>>>> --- a/mm/rmap.c
>>>> +++ b/mm/rmap.c
>>>> @@ -727,7 +727,11 @@ void flush_tlb_batched_pending(struct mm_struct *mm)
>>>>           int flushed = batch >> TLB_FLUSH_BATCH_FLUSHED_SHIFT;
>>>>
>>>>           if (pending != flushed) {
>>>> +#ifdef CONFIG_ARCH_HAS_MM_CPUMASK
>>>>                   flush_tlb_mm(mm);
>>>> +#else
>>>> +               dsb(ish);
>>>> +#endif
>>>>
>>> I was guessing the problem might be flush_tlb_batched_pending(),
>>> so I asked you to change this to verify my guess.
>>>
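For context, the hunk above swaps the full flush_tlb_mm() for a bare dsb(ish)
on architectures without mm_cpumask: arm64's TLBI instructions are broadcast
to all CPUs, so the barrier only has to wait for any outstanding deferred
invalidations to complete. Below is a minimal sketch of the whole function
with that change applied, reconstructed from the quoted hunk plus the mainline
mm/rmap.c of that era; the cmpxchg tail and the mask names come from mainline,
not from this thread, and the #else branch is an arm64-only experiment, not
portable code:

    void flush_tlb_batched_pending(struct mm_struct *mm)
    {
            int batch = atomic_read(&mm->tlb_flush_batched);
            int pending = batch & TLB_FLUSH_BATCH_PENDING_MASK;
            int flushed = batch >> TLB_FLUSH_BATCH_FLUSHED_SHIFT;

            if (pending != flushed) {
    #ifdef CONFIG_ARCH_HAS_MM_CPUMASK
                    /* mm_cpumask tells us which CPUs ran this mm: flush them. */
                    flush_tlb_mm(mm);
    #else
                    /*
                     * arm64: the deferred invalidations were already broadcast
                     * by TLBI instructions; a DSB ISH only waits for them to
                     * complete.
                     */
                    dsb(ish);
    #endif
                    /*
                     * Mark the pending flushes as done; if new flushes became
                     * pending meanwhile, the cmpxchg fails and they stay pending.
                     */
                    atomic_cmpxchg(&mm->tlb_flush_batched, batch,
                                   pending | (pending << TLB_FLUSH_BATCH_FLUSHED_SHIFT));
            }
    }
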
>> flush_tlb_batched_pending() looks like the critical path for this issue, and the code
>> above can mitigate it.
>>
>> I cannot reproduce this on a 2P 128C Kunpeng920 server. The kernel is based on
>> v5.19-rc6 with UnixBench version 5.1.3. The result of `./Run -c 128 -i 1 shell1` is:
>>        iter-1      iter-2     iter-3
>> w/o  17708.1     17637.1    17630.1
>> w    17766.0     17752.3    17861.7
>>
>> And flush_tlb_batched_pending() isn't the hot spot with the patch:
>>     7.00%  sh        [kernel.kallsyms]      [k] ptep_clear_flush
>>     4.17%  sh        [kernel.kallsyms]      [k] ptep_set_access_flags
>>     2.43%  multi.sh  [kernel.kallsyms]      [k] ptep_clear_flush
>>     1.98%  sh        [kernel.kallsyms]      [k] _raw_spin_unlock_irqrestore
>>     1.69%  sh        [kernel.kallsyms]      [k] next_uptodate_page
>>     1.66%  sort      [kernel.kallsyms]      [k] ptep_clear_flush
>>     1.56%  multi.sh  [kernel.kallsyms]      [k] ptep_set_access_flags
>>     1.27%  sh        [kernel.kallsyms]      [k] page_counter_cancel
>>     1.11%  sh        [kernel.kallsyms]      [k] page_remove_rmap
>>     1.06%  sh        [kernel.kallsyms]      [k] perf_event_alloc
>>
>> Hi Xin Hao,
>>
>> I'm not sure whether the test setup and the config are the same as yours (96C vs
>> 128C should not be the reason, I think). Did you check whether the 5% is just
>> fluctuation? More information for reproducing this issue would be helpful.
>>
>> Thanks.
> I guess that is because "./Run -c 1 -i 1 shell1" isn't an application that
> stresses memory. Hi Xin, in what kinds of configurations can we reproduce your
> test result?

Oh, my fault, my test was not based on the latest upstream kernel, which may have
some impact here. I will run a new test on the latest kernel.

> I suppose tlbbatch will mainly affect the performance of user scenarios
> that require memory page-out/page-in, like reclaiming file/anon pages.
> "./Run -c 1 -i 1 shell1" on a system with sufficient free memory won't be
> affected by tlbbatch at all, I believe.
>
> Thanks
> Barry
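
To make Barry's point concrete: with BATCHED_UNMAP_TLB_FLUSH, the reclaim path
defers the per-page TLB invalidation while try_to_unmap() clears PTEs, then
issues a single flush for the whole batch. A rough sketch of that flow,
simplified from mm/rmap.c — the helper names follow mainline, the arm64
behaviour in the comments is what this series proposes, and the real
set_tlb_ubc_flush_pending() also tracks a 'writable' flag omitted here:

    /* Called for each PTE cleared while unmapping a page for reclaim. */
    static void set_tlb_ubc_flush_pending(struct mm_struct *mm)
    {
            struct tlbflush_unmap_batch *tlb_ubc = &current->tlb_ubc;

            /* arm64 with this series: issue a broadcast TLBI, don't wait yet. */
            arch_tlbbatch_add_mm(&tlb_ubc->arch, mm);
            tlb_ubc->flush_required = true;
    }

    /* Called once the whole batch of pages has been unmapped. */
    void try_to_unmap_flush(void)
    {
            struct tlbflush_unmap_batch *tlb_ubc = &current->tlb_ubc;

            if (!tlb_ubc->flush_required)
                    return;

            /* arm64 with this series: one dsb(ish) waits for all the TLBIs. */
            arch_tlbbatch_flush(&tlb_ubc->arch);
            tlb_ubc->flush_required = false;
    }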

-- 
Best Regards!
Xin Hao


Thread overview: 14 messages
2022-07-11  3:46 [PATCH v2 0/4] mm: arm64: bring up BATCHED_UNMAP_TLB_FLUSH Barry Song
2022-07-11  3:46 ` [PATCH v2 1/4] Revert "Documentation/features: mark BATCHED_UNMAP_TLB_FLUSH doesn't apply to ARM64" Barry Song
2022-07-11  3:46 ` [PATCH v2 2/4] mm: rmap: Allow platforms without mm_cpumask to defer TLB flush Barry Song
2022-07-11 13:35   ` Kefeng Wang
2022-07-11 22:52     ` Barry Song
2022-07-11  3:46 ` [PATCH v2 3/4] mm: rmap: Extend tlbbatch APIs to fit new platforms Barry Song
2022-07-11  3:46 ` [PATCH v2 4/4] arm64: support batched/deferred tlb shootdown during page reclamation Barry Song
2022-07-14  3:28 ` [PATCH v2 0/4] mm: arm64: bring up BATCHED_UNMAP_TLB_FLUSH Xin Hao
2022-07-14  4:51   ` Barry Song
2022-07-15  2:47     ` Yicong Yang
2022-07-18 13:28     ` Yicong Yang
2022-07-20 11:18       ` Barry Song
2022-07-23  9:22         ` xhao [this message]
2022-07-23  9:17       ` xhao
