From: Aaron Lu <aaron.lu@intel.com>
To: "ying.huang@intel.com" <ying.huang@intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>, kernel test robot <oliver.sang@intel.com>, Linus Torvalds <torvalds@linux-foundation.org>, Vlastimil Babka <vbabka@suse.cz>, Dave Hansen <dave.hansen@linux.intel.com>, Jesper Dangaard Brouer <brouer@redhat.com>, Michal Hocko <mhocko@kernel.org>, Andrew Morton <akpm@linux-foundation.org>, LKML <linux-kernel@vger.kernel.org>, <lkp@lists.01.org>, <lkp@intel.com>, <feng.tang@intel.com>, <zhengjun.xing@linux.intel.com>, <fengwei.yin@intel.com>
Subject: Re: [mm/page_alloc] f26b3fa046: netperf.Throughput_Mbps -18.0% regression
Date: Tue, 10 May 2022 11:43:16 +0800
Message-ID: <c11ae803-cea7-8b7f-9992-2f640c90f104@intel.com>
In-Reply-To: <d13688d1483e9d87ec477292893f2916832b3bdc.camel@intel.com>

On 5/7/2022 3:44 PM, ying.huang@intel.com wrote:
> On Sat, 2022-05-07 at 15:31 +0800, Aaron Lu wrote:
... ...
>>
>> I thought the overhead of changing the cache line from "shared" to
>> "own"/"modify" is pretty cheap.
>
> This is the read/write pattern of cache ping-pong. Although it should
> be cheaper than the write/write pattern of cache ping-pong in theory, we
> have gotten serious regression for that before.
>

Can you point me to the regression report? I would like to take a look,
thanks.

>> Also, this is the same case as the Skylake desktop machine, why is it a
>> gain there but a loss here?
>
> I guess the reason is the private cache size. The size of the private
> L2 cache of SKL server is much larger than that of SKL client (1MB vs.
> 256KB). So there's much more core-2-core traffic on SKL server.
>

It could be. The 256KiB L2 in the Skylake desktop can only store 8
order-3 pages, which means the allocator side has a higher chance of
reusing a page that has been evicted from the freeing CPU's L2 cache
than on the server machine, whose L2 can store 40 order-3 pages.
I can do more tests using different pcp->high values for the two
machines:

1) high=0: page reuse is at its extreme and core-2-core transfer should
   be the most frequent. This is the behavior of the bisected commit.
2) high=L2_size: page reuse is lower than in case 1), but core-2-core
   transfer should still account for the majority.
3) high=2*L2_size (but smaller than LLC size): cache reuse is further
   reduced, and when a page is indeed reused, it shouldn't cause a
   core-2-core transfer but can still benefit from the LLC.
4) high>LLC_size: page reuse is the least frequent, and when a page is
   indeed reused, it is likely not anywhere in the cache hierarchy.
   This is the behavior of the bisected commit's parent commit on the
   Skylake desktop machine.

I expect case 3) to give the best performance and case 1) or 4) the
worst for this testcase.

Case 4) is difficult to test on the server machine because pcp->high is
capped by the low watermark of the zone. The server machine has 128
CPUs but only 128G memory, which caps pcp->high at 421, while the LLC
size is 40MiB, which translates to a page number of 12288.

>> Is it that this "overhead" is much greater
>> in the server machine, to the extent that it is even better to use a
>> totally cold page than a hot one?
>
> Yes. And I think the private cache size matters here. And after being
> evicted from the private cache (L1/L2), the cache lines of the reused
> pages will go to the shared cache (L3), which will help performance.
>

Sounds reasonable.

>> If so, it seems to suggest we should avoid
>> cache reuse in the server machine unless the two CPUs happen to be
>> two hyperthreads of the same core.
>
> Yes. I think so.