From: Andrew Morton <akpm@linux-foundation.org>
To: Aaron Lu <aaron.lu@intel.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Huang Ying <ying.huang@intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Kemi Wang <kemi.wang@intel.com>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Andi Kleen <ak@linux.intel.com>,
	Michal Hocko <mhocko@suse.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Mel Gorman <mgorman@techsingularity.net>,
	Matthew Wilcox <willy@infradead.org>,
	David Rientjes <rientjes@google.com>
Subject: Re: [PATCH v4 2/3] mm/free_pcppages_bulk: do not hold lock when picking pages to free
Date: Fri, 2 Mar 2018 13:23:32 -0800
Message-ID: <20180302132332.2c69559686ff24d15ff44ae8@linux-foundation.org>
In-Reply-To: <20180302080125.GB6356@intel.com>

On Fri, 2 Mar 2018 16:01:25 +0800 Aaron Lu <aaron.lu@intel.com> wrote:

> On Thu, Mar 01, 2018 at 04:01:05PM -0800, Andrew Morton wrote:
> > On Thu, 1 Mar 2018 14:28:44 +0800 Aaron Lu <aaron.lu@intel.com> wrote:
> >
> > > When freeing a batch of pages from Per-CPU-Pages(PCP) back to buddy,
> > > the zone->lock is held and then pages are chosen from PCP's migratetype
> > > list. While there is actually no need to do this 'choose part' under
> > > lock since it's PCP pages, the only CPU that can touch them is us and
> > > irq is also disabled.
> > >
> > > Moving this part outside could reduce lock held time and improve
> > > performance. Test with will-it-scale/page_fault1 full load:
> > >
> > > kernel      Broadwell(2S)  Skylake(2S)   Broadwell(4S)  Skylake(4S)
> > > v4.16-rc2+  9034215        7971818       13667135       15677465
> > > this patch  9536374 +5.6%  8314710 +4.3% 14070408 +3.0% 16675866 +6.4%
> > >
> > > What the test does is: starts $nr_cpu processes and each will repeatedly
> > > do the following for 5 minutes:
> > > 1 mmap 128M anonymouse space;
> > > 2 write access to that space;
> > > 3 munmap.
> > > The score is the aggregated iteration.
> >
> > But it's a loss for uniprocessor systems: it adds more code and adds an
> > additional pass across a list.
>
> Performance wise, I assume the loss is pretty small and can not
> be measured.
>
> On my Sandybridge desktop, with will-it-scale/page_fault1/single process
> run to emulate uniprocessor system, the score is(average of 3 runs):
>
> base(patch 1/3):      649710
> this patch:           653554 +0.6%

Does that mean we got faster or slower?

> prefetch(patch 3/3):  650336 (in noise range compared to base)
>
> On 4 sockets Intel Broadwell with will-it-scale/page_fault1/single
> process run:
>
> base(patch 1/3):      498649
> this patch:           504171 +1.1%
> prefetch(patch 3/3):  506334 +1.5% (compared to base)
>
> It looks like we don't need to worry too much about performance for
> uniprocessor system.

Well.  We can say that of hundreds of patches.  And we end up with a
fatter and slower kernel than we otherwise would.

Please take a look, see if there's a tidy way of avoiding this.
Probably there isn't, in which case oh well.  But let's at least try.
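
For readers following the thread, the trade-off being debated can be sketched in a few lines. This is only an illustration and not the kernel code: Python threads and a `threading.Lock` stand in for CPUs and zone->lock, and the names `Zone`, `pick_pages`, `free_bulk_pick_under_lock` and `free_bulk_pick_outside_lock` are hypothetical. The round-robin over the private lists is an assumed simplification of how pages are chosen from the PCP migratetype lists.

```python
import threading
from collections import deque


class Zone:
    """Hypothetical stand-in for a zone: one shared free list, one lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.free_pages = []


def pick_pages(pcp_lists, count):
    """Remove up to `count` pages from the private lists, round-robin.

    The lists are private to the caller, so no lock is needed to touch them.
    """
    picked = []
    while len(picked) < count and any(pcp_lists):
        for lst in pcp_lists:
            if lst and len(picked) < count:
                picked.append(lst.popleft())
    return picked


def free_bulk_pick_under_lock(zone, pcp_lists, count):
    """Shape before the patch: choosing and freeing both happen under the lock."""
    with zone.lock:
        picked = pick_pages(pcp_lists, count)
        zone.free_pages.extend(picked)
    return len(picked)


def free_bulk_pick_outside_lock(zone, pcp_lists, count):
    """Shape after the patch: choose first, take the lock only to free."""
    picked = pick_pages(pcp_lists, count)  # no lock held here
    with zone.lock:
        # Second walk over the picked pages: the extra pass that the
        # review points out as a cost when there is no contention.
        zone.free_pages.extend(picked)
    return len(picked)
```

Both variants leave the shared list in the same state; the second one shortens the time the lock is held at the price of an intermediate list and a second pass over it, which is the uniprocessor cost raised above.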