From: Aaron Lu <aaron.lu@intel.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Huang Ying <ying.huang@intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Kemi Wang <kemi.wang@intel.com>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Andi Kleen <ak@linux.intel.com>, Michal Hocko <mhocko@suse.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Mel Gorman <mgorman@techsingularity.net>,
	Matthew Wilcox <willy@infradead.org>,
	David Rientjes <rientjes@google.com>
Subject: [PATCH v4 3/3 update] mm/free_pcppages_bulk: prefetch buddy while not holding lock
Date: Fri, 9 Mar 2018 16:24:31 +0800
Message-ID: <20180309082431.GB30868@intel.com>
In-Reply-To: <20180302082756.GC6356@intel.com>

On Fri, Mar 02, 2018 at 04:27:56PM +0800, Aaron Lu wrote:
> With this said, the count here could be pcp->count when pcp's pages
> need to be all drained and though pcp->count's default value is
> (6*pcp->batch)=186, user can increase that value through the above
> mentioned procfs interface and the resulting pcp->count could be too
> big for prefetch. Ying also mentioned this today and suggested adding
> an upper limit here to avoid prefetching too much. Perhaps just prefetch
> the last pcp->batch pages if count here > pcp->batch? Since pcp->batch
> has an upper limit, we won't need to worry prefetching too much.

The following patch adds an upper limit on prefetching. The limit is set
to pcp->batch because 1) it is the most likely value of the input
parameter 'count' in free_pcppages_bulk() and 2) it has an upper limit
itself.

From: Aaron Lu <aaron.lu@intel.com>
Subject: [PATCH v4 3/3 update] mm/free_pcppages_bulk: prefetch buddy while not holding lock

When a page is freed back to the global pool, its buddy will be checked
to see if it's possible to do a merge. This requires accessing buddy's
page structure and that access could take a long time if it's cache cold.

This patch adds a prefetch of the to-be-freed page's buddy outside of
zone->lock, in the hope that accessing the buddy's page structure later
under zone->lock will be faster. Since we *always* do buddy merging and
check an order-0 page's buddy to try to merge it when it goes into the
main allocator, the cacheline will always come in, i.e. the prefetched
data will never be unused.
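
As a rough userspace illustration of the idea above (not the kernel code
itself): the order-0 buddy of a pfn is found by flipping its lowest bit,
and a prefetch hint is then issued for the buddy's page structure.
struct page_sketch and both *_sketch helpers are hypothetical names made
up for this example, the flat pfn-indexed array is an assumption, and
__builtin_prefetch() is the GCC/Clang builtin rather than the kernel's
prefetch().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct page; illustration only. */
struct page_sketch {
	unsigned long flags;
};

/* Buddy of 'pfn' at 'order': flip bit 'order' of the pfn (a single XOR). */
static unsigned long find_buddy_pfn_sketch(unsigned long pfn, unsigned int order)
{
	assert(order < 8 * sizeof(unsigned long));
	return pfn ^ (1UL << order);
}

/*
 * Issue a prefetch hint for the order-0 buddy of pages[pfn] and return the
 * buddy's pfn. Assumes 'pages' is a flat array indexed by pfn and that the
 * buddy's index is within the array.
 */
static unsigned long prefetch_buddy_sketch(const struct page_sketch *pages,
					   unsigned long pfn)
{
	unsigned long buddy_pfn = find_buddy_pfn_sketch(pfn, 0);

	/* Only a hint: it never faults and may be ignored by the CPU. */
	__builtin_prefetch(&pages[buddy_pfn]);
	return buddy_pfn;
}
```

In this sketch pfn 8 and pfn 9 are each other's order-0 buddies, so the
hint for page 8 targets the structure of page 9 and vice versa.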

Normally, the number of to-be-freed pages (i.e. count) equals pcp->batch
(default=31, with an upper limit of (PAGE_SHIFT * 8)=96 on x86_64), but
when all of a pcp's pages are drained, it is pcp->count, which has an
upper limit of pcp->high. pcp->high, although it defaults to 186
(pcp->batch=31 * 6), can be changed by the user through
/proc/sys/vm/percpu_pagelist_fraction, and since there is no software
upper limit it could be large, like several thousand. For this reason,
only the buddy structures of the last pcp->batch pages are prefetched,
to avoid excessive prefetching. pcp->batch is used because:
1 most often, count == pcp->batch;
2 it has an upper limit itself so we won't prefetch excessively.
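
The effect of the cap can be checked with a simplified model of the loop.
This is only a sketch: prefetches_issued_sketch() is a hypothetical
helper, and it ignores batch_free, the per-migratetype lists and pages
rejected by bulkfree_pcp_prepare(). It keeps just the decrementing count
and the "count > batch" test.

```c
#include <assert.h>

/*
 * Count how many buddy prefetches a capped loop would issue when freeing
 * 'count' pages with a cap of 'batch'. Simplified model, illustration only.
 */
static int prefetches_issued_sketch(int count, int batch)
{
	int issued = 0;

	assert(count > 0 && batch > 0);
	do {
		/* In a do-while, 'continue' jumps to the '--count' test below. */
		if (count > batch)
			continue;
		issued++;	/* stands in for prefetch(buddy) */
	} while (--count);

	return issued;
}
```

Under this model, count=31 with batch=31 prefetches for every page, while
count=186 or count=5000 with batch=31 still issues only 31 prefetches,
all for the last 31 pages handled.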

Considering the possibly large value of pcp->high, it also makes sense
to free the last added page first, since it is the most likely to still
be cache hot. That's where the change from list_add_tail() to list_add()
comes in, as we will free them from head to tail one by one.
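
The ordering difference can be seen with a minimal userspace model of a
circular doubly linked list. All names ending in _sketch, plus struct
list_node and struct item, are made up for this example and only mimic
the behaviour of the kernel's list_add() and list_add_tail().

```c
#include <assert.h>
#include <stddef.h>

struct list_node {
	struct list_node *prev, *next;
};

/* Hypothetical element type; 'id' records the order of insertion. */
struct item {
	int id;
	struct list_node lru;
};

static void list_init_sketch(struct list_node *head)
{
	head->prev = head->next = head;
}

static void insert_between_sketch(struct list_node *node,
				  struct list_node *prev,
				  struct list_node *next)
{
	next->prev = node;
	node->next = next;
	node->prev = prev;
	prev->next = node;
}

/* Insert right after the head: the newest entry is walked first. */
static void list_add_sketch(struct list_node *node, struct list_node *head)
{
	insert_between_sketch(node, head, head->next);
}

/* Insert right before the head: the newest entry is walked last. */
static void list_add_tail_sketch(struct list_node *node, struct list_node *head)
{
	insert_between_sketch(node, head->prev, head);
}

/* Walk from head to tail, storing up to 'max' ids in 'out'; return the count. */
static int walk_ids_sketch(struct list_node *head, int *out, int max)
{
	int n = 0;

	assert(head && out);
	for (struct list_node *p = head->next; p != head && n < max; p = p->next) {
		struct item *it = (struct item *)((char *)p - offsetof(struct item, lru));

		out[n++] = it->id;
	}
	return n;
}
```

Adding items 0, 1, 2 with list_add_sketch() and walking head to tail
visits them as 2, 1, 0, i.e. the most recently added item comes first;
with list_add_tail_sketch() the walk visits 0, 1, 2.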

In the meantime, there are two concerns:
1 the prefetch could potentially evict existing cachelines, especially
  for L1D cache since it is not huge;
2 there is some additional instruction overhead, namely calculating
  buddy pfn twice.

For 1, it's hard to say: this microbenchmark shows good results, but
the actual benefit of this patch will be workload/CPU dependent;
For 2, since the calculation is an XOR on two local variables, it's
expected that in many cases the cycles spent will be offset by reduced
memory latency later. This is especially true for NUMA machines where
multiple CPUs contend on zone->lock and the most time-consuming part
under zone->lock is waiting for the 'struct page' cachelines of the
to-be-freed pages and their buddies.

Test with will-it-scale/page_fault1 full load:

kernel      Broadwell(2S)   Skylake(2S)     Broadwell(4S)   Skylake(4S)
v4.16-rc2+   9034215         7971818        13667135        15677465
patch2/3     9536374 +5.6%   8314710 +4.3%  14070408 +3.0%  16675866 +6.4%
this patch  10180856 +6.8%   8506369 +2.3%  14756865 +4.9%  17325324 +3.9%
Note: the percentages for patch2/3 are relative to v4.16-rc2+; those for
this patch are relative to patch2/3.
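
For reference, the percentages can be reproduced with the small helper
below. percent_gain_sketch() is a hypothetical name used only here, and
the inputs are the raw throughput numbers from the table.

```c
#include <assert.h>

/* Relative improvement of 'after' over 'before', in percent. */
static double percent_gain_sketch(long before, long after)
{
	assert(before > 0);
	return 100.0 * (double)(after - before) / (double)before;
}
```

For Broadwell(2S), percent_gain_sketch(9536374, 10180856) is about 6.8,
matching the table, while the same result measured against v4.16-rc2+
(9034215) is about 12.7.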

(Changelog stolen from Dave Hansen and Mel Gorman's comments at
http://lkml.kernel.org/r/148a42d8-8306-2f2f-7f7c-86bc118f8ccd@intel.com)

Link: http://lkml.kernel.org/r/20180301062845.26038-4-aaron.lu@intel.com
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Suggested-by: Ying Huang <ying.huang@intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
---
 mm/page_alloc.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index dafdcdec9c1f..5f31f7bab583 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1141,6 +1141,9 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 			batch_free = count;
 
 		do {
+			unsigned long pfn, buddy_pfn;
+			struct page *buddy;
+
 			page = list_last_entry(list, struct page, lru);
 			/* must delete to avoid corrupting pcp list */
 			list_del(&page->lru);
@@ -1149,7 +1152,23 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 			if (bulkfree_pcp_prepare(page))
 				continue;
 
-			list_add_tail(&page->lru, &head);
+			list_add(&page->lru, &head);
+
+			/*
+			 * We are going to put the page back to the global
+			 * pool, prefetch its buddy to speed up later access
+			 * under zone->lock. It is believed the overhead of
+			 * an additional test and calculating buddy_pfn here
+			 * can be offset by reduced memory latency later. To
+			 * avoid excessive prefetching due to large count, only
+			 * prefetch buddy for the last pcp->batch nr of pages.
+			 */
+			if (count > pcp->batch)
+				continue;
+			pfn = page_to_pfn(page);
+			buddy_pfn = __find_buddy_pfn(pfn, 0);
+			buddy = page + (buddy_pfn - pfn);
+			prefetch(buddy);
 		} while (--count && --batch_free && !list_empty(list));
 	}
 
-- 
2.14.3
