From: "Huang\, Ying" <ying.huang@intel.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Aaron Lu <aaron.lu@intel.com>, <linux-mm@kvack.org>,
	<linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dave Hansen <dave.hansen@intel.com>,
	Kemi Wang <kemi.wang@intel.com>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Andi Kleen <ak@linux.intel.com>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	Mel Gorman <mgorman@techsingularity.net>,
	"Matthew Wilcox" <willy@infradead.org>,
	David Rientjes <rientjes@google.com>
Subject: Re: [PATCH v4 2/3] mm/free_pcppages_bulk: do not hold lock when picking pages to free
Date: Fri, 02 Mar 2018 15:31:40 +0800
Message-ID: <87r2p3c4rn.fsf@yhuang-dev.intel.com>
In-Reply-To: <20180301135518.GJ15057@dhcp22.suse.cz> (Michal Hocko's message of "Thu, 1 Mar 2018 14:55:18 +0100")

Michal Hocko <mhocko@kernel.org> writes:

> On Thu 01-03-18 14:28:44, Aaron Lu wrote:
>> When freeing a batch of pages from Per-CPU-Pages(PCP) back to buddy,
>> the zone->lock is held and then pages are chosen from PCP's migratetype
>> list. While there is actually no need to do this 'choose part' under
>> lock since it's PCP pages, the only CPU that can touch them is us and
>> irq is also disabled.
>> 
>> Moving this part outside could reduce lock held time and improve
>> performance. Test with will-it-scale/page_fault1 full load:
>> 
>> kernel      Broadwell(2S)  Skylake(2S)   Broadwell(4S)  Skylake(4S)
>> v4.16-rc2+  9034215        7971818       13667135       15677465
>> this patch  9536374 +5.6%  8314710 +4.3% 14070408 +3.0% 16675866 +6.4%
>> 
>> What the test does is: starts $nr_cpu processes and each will repeatedly
>> do the following for 5 minutes:
>> 1 mmap 128M anonymous space;
>> 2 write access to that space;
>> 3 munmap.
>> The score is the aggregated iteration.
>
> Iteration count I assume. I am still quite surprised that this would
> have such a large impact.
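
For reference, one iteration of the three steps quoted above could be
sketched roughly as follows.  This is not the will-it-scale source; the
function name page_fault_iteration() and the choice to touch one byte
per page are assumptions made for illustration.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS under strict -std= modes */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `len` bytes of anonymous memory, write to
 * every page so that each one is faulted in, then unmap the region.
 * Returns 0 on success and -1 on failure.
 */
static int page_fault_iteration(size_t len)
{
	long page = sysconf(_SC_PAGESIZE);
	char *p;
	size_t off;

	if (page <= 0)
		return -1;

	/* step 1: mmap anonymous space */
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	/* step 2: write access, one byte per page is enough to fault it in */
	for (off = 0; off < len; off += (size_t)page)
		p[off] = 1;

	/* step 3: munmap, which hands the pages back to the allocator */
	return munmap(p, len);
}
```

Calling page_fault_iteration(128UL << 20) in a loop from $nr_cpu
processes and counting the completed calls would approximate the
workload described in the quoted text.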

The test is run at full load, which means that around 100 or more
processes allocate memory in parallel.  According to Amdahl's law, the
performance of a parallel program is dominated by its serial part.  In
this case, that is the part protected by zone->lock.  So small changes
to the code under zone->lock can make larger changes to the overall
score.
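
To make the argument concrete, here is a small Amdahl-style model.  It
is only a sketch: the 5/95 split between serial and parallel work and
the 5% reduction of the serial part are made-up numbers, not
measurements from this patch, and the function names are hypothetical.

```c
/*
 * Amdahl-style run time: the serial part cannot be spread over CPUs,
 * the parallel part is divided evenly among n CPUs.
 */
static double run_time(double serial, double parallel, int n)
{
	return serial + parallel / n;
}

/*
 * Relative throughput gain on n CPUs when only the serial part shrinks
 * from serial_before to serial_after (throughput is 1 / run time).
 */
static double throughput_gain(double serial_before, double serial_after,
			      double parallel, int n)
{
	return run_time(serial_before, parallel, n) /
	       run_time(serial_after, parallel, n) - 1.0;
}
```

With the assumed numbers, throughput_gain(5.0, 4.75, 95.0, 100) is a
little over 4%, while throughput_gain(5.0, 4.75, 95.0, 1) is about
0.25%.  The same small saving under the lock is worth far more once
the parallel part has been divided across about 100 CPUs.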

Best Regards,
Huang, Ying
