linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/24] Optimise page alloc/free fast paths v2
@ 2016-04-12 10:12 Mel Gorman
  2016-04-12 10:12 ` [PATCH 01/24] mm, page_alloc: Only check PageCompound for high-order pages Mel Gorman
                   ` (23 more replies)
  0 siblings, 24 replies; 25+ messages in thread
From: Mel Gorman @ 2016-04-12 10:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Vlastimil Babka, Linux-MM, LKML, Mel Gorman

Sorry for the quick resend. One patch still had a warning and, while I
was there, I added a few patches to the bulk pcp free path.

Changelog since v1
o Fix an unused variable warning
o Throw in a few optimisations in the bulk pcp free path
o Rebase to 4.6-rc3

Another year, another round of page allocator optimisations focusing this
time on the alloc and free fast paths. This should be of help to workloads
that are allocator-intensive from kernel space where the cost of zeroing
is not necessarily incurred.

The series is motivated by two observations. First, page alloc
microbenchmarks on multiple machines regressed between 3.12.44 and 4.4.
Second, there were discussions before LSF/MM about the possibility of adding
another page allocator, which is potentially hazardous, but a patch series
improving performance is better than whining.

After the series is applied, there are still hazards.  In the free paths,
the debugging checks and page zone/pageblock lookups dominate but
there was no obvious solution to that. In the alloc path, the major
contributors are dealing with zonelists, new page preparation, the fair
zone allocation and numerous statistics updates. The fair zone allocator
is removed by the per-node LRU series if that gets merged, so it's not a
major concern at the moment.

On normal userspace benchmarks, there is little impact as the zeroing cost
is significant, but the improvement is still visible:

aim9
                               4.6.0-rc2             4.6.0-rc2
                                 vanilla          cpuset-v1r20
Min      page_test   864733.33 (  0.00%)   922986.67 (  6.74%)
Min      brk_test   6212191.87 (  0.00%)  6271866.67 (  0.96%)
Min      exec_test     1294.67 (  0.00%)     1306.00 (  0.88%)
Min      fork_test    12644.90 (  0.00%)    12713.33 (  0.54%)

The overall impact on a page allocator microbenchmark for a range of orders
and number of pages allocated in a batch is

                                           4.6.0-rc3                  4.6.0-rc2
                                             vanilla                   micro-v2
Min      alloc-odr0-1               428.00 (  0.00%)           343.00 ( 19.86%)
Min      alloc-odr0-2               314.00 (  0.00%)           252.00 ( 19.75%)
Min      alloc-odr0-4               256.00 (  0.00%)           209.00 ( 18.36%)
Min      alloc-odr0-8               223.00 (  0.00%)           182.00 ( 18.39%)
Min      alloc-odr0-16              207.00 (  0.00%)           168.00 ( 18.84%)
Min      alloc-odr0-32              197.00 (  0.00%)           162.00 ( 17.77%)
Min      alloc-odr0-64              193.00 (  0.00%)           159.00 ( 17.62%)
Min      alloc-odr0-128             191.00 (  0.00%)           157.00 ( 17.80%)
Min      alloc-odr0-256             200.00 (  0.00%)           167.00 ( 16.50%)
Min      alloc-odr0-512             212.00 (  0.00%)           179.00 ( 15.57%)
Min      alloc-odr0-1024            220.00 (  0.00%)           184.00 ( 16.36%)
Min      alloc-odr0-2048            226.00 (  0.00%)           190.00 ( 15.93%)
Min      alloc-odr0-4096            233.00 (  0.00%)           197.00 ( 15.45%)
Min      alloc-odr0-8192            235.00 (  0.00%)           199.00 ( 15.32%)
Min      alloc-odr0-16384           235.00 (  0.00%)           199.00 ( 15.32%)
Min      alloc-odr1-1               519.00 (  0.00%)           461.00 ( 11.18%)
Min      alloc-odr1-2               391.00 (  0.00%)           344.00 ( 12.02%)
Min      alloc-odr1-4               312.00 (  0.00%)           276.00 ( 11.54%)
Min      alloc-odr1-8               276.00 (  0.00%)           238.00 ( 13.77%)
Min      alloc-odr1-16              256.00 (  0.00%)           220.00 ( 14.06%)
Min      alloc-odr1-32              247.00 (  0.00%)           211.00 ( 14.57%)
Min      alloc-odr1-64              242.00 (  0.00%)           208.00 ( 14.05%)
Min      alloc-odr1-128             245.00 (  0.00%)           206.00 ( 15.92%)
Min      alloc-odr1-256             244.00 (  0.00%)           206.00 ( 15.57%)
Min      alloc-odr1-512             245.00 (  0.00%)           209.00 ( 14.69%)
Min      alloc-odr1-1024            246.00 (  0.00%)           211.00 ( 14.23%)
Min      alloc-odr1-2048            253.00 (  0.00%)           220.00 ( 13.04%)
Min      alloc-odr1-4096            258.00 (  0.00%)           224.00 ( 13.18%)
Min      alloc-odr1-8192            261.00 (  0.00%)           226.00 ( 13.41%)
Min      alloc-odr2-1               560.00 (  0.00%)           480.00 ( 14.29%)
Min      alloc-odr2-2               422.00 (  0.00%)           366.00 ( 13.27%)
Min      alloc-odr2-4               339.00 (  0.00%)           289.00 ( 14.75%)
Min      alloc-odr2-8               297.00 (  0.00%)           250.00 ( 15.82%)
Min      alloc-odr2-16              277.00 (  0.00%)           233.00 ( 15.88%)
Min      alloc-odr2-32              268.00 (  0.00%)           223.00 ( 16.79%)
Min      alloc-odr2-64              266.00 (  0.00%)           219.00 ( 17.67%)
Min      alloc-odr2-128             264.00 (  0.00%)           218.00 ( 17.42%)
Min      alloc-odr2-256             265.00 (  0.00%)           219.00 ( 17.36%)
Min      alloc-odr2-512             270.00 (  0.00%)           224.00 ( 17.04%)
Min      alloc-odr2-1024            279.00 (  0.00%)           234.00 ( 16.13%)
Min      alloc-odr2-2048            284.00 (  0.00%)           239.00 ( 15.85%)
Min      alloc-odr2-4096            285.00 (  0.00%)           239.00 ( 16.14%)
Min      alloc-odr3-1               629.00 (  0.00%)           526.00 ( 16.38%)
Min      alloc-odr3-2               471.00 (  0.00%)           395.00 ( 16.14%)
Min      alloc-odr3-4               382.00 (  0.00%)           315.00 ( 17.54%)
Min      alloc-odr3-8               466.00 (  0.00%)           279.00 ( 40.13%)
Min      alloc-odr3-16              316.00 (  0.00%)           259.00 ( 18.04%)
Min      alloc-odr3-32              307.00 (  0.00%)           251.00 ( 18.24%)
Min      alloc-odr3-64              305.00 (  0.00%)           248.00 ( 18.69%)
Min      alloc-odr3-128             308.00 (  0.00%)           248.00 ( 19.48%)
Min      alloc-odr3-256             317.00 (  0.00%)           256.00 ( 19.24%)
Min      alloc-odr3-512             327.00 (  0.00%)           262.00 ( 19.88%)
Min      alloc-odr3-1024            332.00 (  0.00%)           268.00 ( 19.28%)
Min      alloc-odr3-2048            333.00 (  0.00%)           269.00 ( 19.22%)
Min      alloc-odr4-1               764.00 (  0.00%)           607.00 ( 20.55%)
Min      alloc-odr4-2               577.00 (  0.00%)           459.00 ( 20.45%)
Min      alloc-odr4-4               473.00 (  0.00%)           370.00 ( 21.78%)
Min      alloc-odr4-8               420.00 (  0.00%)           327.00 ( 22.14%)
Min      alloc-odr4-16              397.00 (  0.00%)           309.00 ( 22.17%)
Min      alloc-odr4-32              391.00 (  0.00%)           303.00 ( 22.51%)
Min      alloc-odr4-64              395.00 (  0.00%)           302.00 ( 23.54%)
Min      alloc-odr4-128             408.00 (  0.00%)           311.00 ( 23.77%)
Min      alloc-odr4-256             421.00 (  0.00%)           326.00 ( 22.57%)
Min      alloc-odr4-512             428.00 (  0.00%)           333.00 ( 22.20%)
Min      alloc-odr4-1024            429.00 (  0.00%)           330.00 ( 23.08%)
Min      free-odr0-1                216.00 (  0.00%)           193.00 ( 10.65%)
Min      free-odr0-2                152.00 (  0.00%)           137.00 (  9.87%)
Min      free-odr0-4                119.00 (  0.00%)           107.00 ( 10.08%)
Min      free-odr0-8                106.00 (  0.00%)            95.00 ( 10.38%)
Min      free-odr0-16                97.00 (  0.00%)            87.00 ( 10.31%)
Min      free-odr0-32                92.00 (  0.00%)            82.00 ( 10.87%)
Min      free-odr0-64                89.00 (  0.00%)            80.00 ( 10.11%)
Min      free-odr0-128               89.00 (  0.00%)            79.00 ( 11.24%)
Min      free-odr0-256              102.00 (  0.00%)            94.00 (  7.84%)
Min      free-odr0-512              117.00 (  0.00%)           110.00 (  5.98%)
Min      free-odr0-1024             125.00 (  0.00%)           118.00 (  5.60%)
Min      free-odr0-2048             131.00 (  0.00%)           123.00 (  6.11%)
Min      free-odr0-4096             136.00 (  0.00%)           126.00 (  7.35%)
Min      free-odr0-8192             136.00 (  0.00%)           127.00 (  6.62%)
Min      free-odr0-16384            137.00 (  0.00%)           127.00 (  7.30%)
Min      free-odr1-1                317.00 (  0.00%)           292.00 (  7.89%)
Min      free-odr1-2                228.00 (  0.00%)           210.00 (  7.89%)
Min      free-odr1-4                182.00 (  0.00%)           169.00 (  7.14%)
Min      free-odr1-8                162.00 (  0.00%)           148.00 (  8.64%)
Min      free-odr1-16               152.00 (  0.00%)           138.00 (  9.21%)
Min      free-odr1-32               144.00 (  0.00%)           132.00 (  8.33%)
Min      free-odr1-64               143.00 (  0.00%)           131.00 (  8.39%)
Min      free-odr1-128              148.00 (  0.00%)           136.00 (  8.11%)
Min      free-odr1-256              150.00 (  0.00%)           141.00 (  6.00%)
Min      free-odr1-512              151.00 (  0.00%)           144.00 (  4.64%)
Min      free-odr1-1024             155.00 (  0.00%)           147.00 (  5.16%)
Min      free-odr1-2048             157.00 (  0.00%)           150.00 (  4.46%)
Min      free-odr1-4096             156.00 (  0.00%)           147.00 (  5.77%)
Min      free-odr1-8192             156.00 (  0.00%)           146.00 (  6.41%)
Min      free-odr2-1                363.00 (  0.00%)           315.00 ( 13.22%)
Min      free-odr2-2                256.00 (  0.00%)           229.00 ( 10.55%)
Min      free-odr2-4                209.00 (  0.00%)           189.00 (  9.57%)
Min      free-odr2-8                182.00 (  0.00%)           162.00 ( 10.99%)
Min      free-odr2-16               171.00 (  0.00%)           154.00 (  9.94%)
Min      free-odr2-32               165.00 (  0.00%)           152.00 (  7.88%)
Min      free-odr2-64               166.00 (  0.00%)           153.00 (  7.83%)
Min      free-odr2-128              167.00 (  0.00%)           156.00 (  6.59%)
Min      free-odr2-256              170.00 (  0.00%)           159.00 (  6.47%)
Min      free-odr2-512              177.00 (  0.00%)           165.00 (  6.78%)
Min      free-odr2-1024             184.00 (  0.00%)           168.00 (  8.70%)
Min      free-odr2-2048             182.00 (  0.00%)           165.00 (  9.34%)
Min      free-odr2-4096             181.00 (  0.00%)           163.00 (  9.94%)
Min      free-odr3-1                442.00 (  0.00%)           376.00 ( 14.93%)
Min      free-odr3-2                310.00 (  0.00%)           272.00 ( 12.26%)
Min      free-odr3-4                253.00 (  0.00%)           215.00 ( 15.02%)
Min      free-odr3-8                285.00 (  0.00%)           193.00 ( 32.28%)
Min      free-odr3-16               207.00 (  0.00%)           179.00 ( 13.53%)
Min      free-odr3-32               207.00 (  0.00%)           180.00 ( 13.04%)
Min      free-odr3-64               212.00 (  0.00%)           184.00 ( 13.21%)
Min      free-odr3-128              216.00 (  0.00%)           189.00 ( 12.50%)
Min      free-odr3-256              224.00 (  0.00%)           197.00 ( 12.05%)
Min      free-odr3-512              231.00 (  0.00%)           201.00 ( 12.99%)
Min      free-odr3-1024             230.00 (  0.00%)           202.00 ( 12.17%)
Min      free-odr3-2048             229.00 (  0.00%)           199.00 ( 13.10%)
Min      free-odr4-1                559.00 (  0.00%)           460.00 ( 17.71%)
Min      free-odr4-2                406.00 (  0.00%)           333.00 ( 17.98%)
Min      free-odr4-4                336.00 (  0.00%)           272.00 ( 19.05%)
Min      free-odr4-8                298.00 (  0.00%)           240.00 ( 19.46%)
Min      free-odr4-16               283.00 (  0.00%)           235.00 ( 16.96%)
Min      free-odr4-32               291.00 (  0.00%)           239.00 ( 17.87%)
Min      free-odr4-64               297.00 (  0.00%)           242.00 ( 18.52%)
Min      free-odr4-128              309.00 (  0.00%)           257.00 ( 16.83%)
Min      free-odr4-256              322.00 (  0.00%)           275.00 ( 14.60%)
Min      free-odr4-512              326.00 (  0.00%)           279.00 ( 14.42%)
Min      free-odr4-1024             325.00 (  0.00%)           275.00 ( 15.38%)
Min      total-odr0-1               644.00 (  0.00%)           536.00 ( 16.77%)
Min      total-odr0-2               466.00 (  0.00%)           389.00 ( 16.52%)
Min      total-odr0-4               375.00 (  0.00%)           316.00 ( 15.73%)
Min      total-odr0-8               329.00 (  0.00%)           277.00 ( 15.81%)
Min      total-odr0-16              304.00 (  0.00%)           255.00 ( 16.12%)
Min      total-odr0-32              289.00 (  0.00%)           244.00 ( 15.57%)
Min      total-odr0-64              282.00 (  0.00%)           239.00 ( 15.25%)
Min      total-odr0-128             280.00 (  0.00%)           236.00 ( 15.71%)
Min      total-odr0-256             302.00 (  0.00%)           261.00 ( 13.58%)
Min      total-odr0-512             329.00 (  0.00%)           289.00 ( 12.16%)
Min      total-odr0-1024            345.00 (  0.00%)           302.00 ( 12.46%)
Min      total-odr0-2048            357.00 (  0.00%)           313.00 ( 12.32%)
Min      total-odr0-4096            369.00 (  0.00%)           323.00 ( 12.47%)
Min      total-odr0-8192            371.00 (  0.00%)           326.00 ( 12.13%)
Min      total-odr0-16384           372.00 (  0.00%)           326.00 ( 12.37%)
Min      total-odr1-1               836.00 (  0.00%)           754.00 (  9.81%)
Min      total-odr1-2               619.00 (  0.00%)           554.00 ( 10.50%)
Min      total-odr1-4               495.00 (  0.00%)           445.00 ( 10.10%)
Min      total-odr1-8               438.00 (  0.00%)           386.00 ( 11.87%)
Min      total-odr1-16              408.00 (  0.00%)           358.00 ( 12.25%)
Min      total-odr1-32              391.00 (  0.00%)           343.00 ( 12.28%)
Min      total-odr1-64              385.00 (  0.00%)           339.00 ( 11.95%)
Min      total-odr1-128             393.00 (  0.00%)           342.00 ( 12.98%)
Min      total-odr1-256             394.00 (  0.00%)           347.00 ( 11.93%)
Min      total-odr1-512             396.00 (  0.00%)           353.00 ( 10.86%)
Min      total-odr1-1024            401.00 (  0.00%)           358.00 ( 10.72%)
Min      total-odr1-2048            410.00 (  0.00%)           370.00 (  9.76%)
Min      total-odr1-4096            414.00 (  0.00%)           371.00 ( 10.39%)
Min      total-odr1-8192            417.00 (  0.00%)           372.00 ( 10.79%)
Min      total-odr2-1               923.00 (  0.00%)           795.00 ( 13.87%)
Min      total-odr2-2               678.00 (  0.00%)           595.00 ( 12.24%)
Min      total-odr2-4               548.00 (  0.00%)           478.00 ( 12.77%)
Min      total-odr2-8               480.00 (  0.00%)           412.00 ( 14.17%)
Min      total-odr2-16              448.00 (  0.00%)           387.00 ( 13.62%)
Min      total-odr2-32              433.00 (  0.00%)           375.00 ( 13.39%)
Min      total-odr2-64              432.00 (  0.00%)           372.00 ( 13.89%)
Min      total-odr2-128             431.00 (  0.00%)           374.00 ( 13.23%)
Min      total-odr2-256             436.00 (  0.00%)           378.00 ( 13.30%)
Min      total-odr2-512             447.00 (  0.00%)           389.00 ( 12.98%)
Min      total-odr2-1024            463.00 (  0.00%)           402.00 ( 13.17%)
Min      total-odr2-2048            466.00 (  0.00%)           404.00 ( 13.30%)
Min      total-odr2-4096            466.00 (  0.00%)           402.00 ( 13.73%)
Min      total-odr3-1              1071.00 (  0.00%)           904.00 ( 15.59%)
Min      total-odr3-2               781.00 (  0.00%)           667.00 ( 14.60%)
Min      total-odr3-4               636.00 (  0.00%)           531.00 ( 16.51%)
Min      total-odr3-8               751.00 (  0.00%)           472.00 ( 37.15%)
Min      total-odr3-16              523.00 (  0.00%)           438.00 ( 16.25%)
Min      total-odr3-32              514.00 (  0.00%)           431.00 ( 16.15%)
Min      total-odr3-64              517.00 (  0.00%)           432.00 ( 16.44%)
Min      total-odr3-128             524.00 (  0.00%)           437.00 ( 16.60%)
Min      total-odr3-256             541.00 (  0.00%)           453.00 ( 16.27%)
Min      total-odr3-512             558.00 (  0.00%)           463.00 ( 17.03%)
Min      total-odr3-1024            562.00 (  0.00%)           470.00 ( 16.37%)
Min      total-odr3-2048            562.00 (  0.00%)           468.00 ( 16.73%)
Min      total-odr4-1              1323.00 (  0.00%)          1067.00 ( 19.35%)
Min      total-odr4-2               983.00 (  0.00%)           792.00 ( 19.43%)
Min      total-odr4-4               809.00 (  0.00%)           642.00 ( 20.64%)
Min      total-odr4-8               718.00 (  0.00%)           567.00 ( 21.03%)
Min      total-odr4-16              680.00 (  0.00%)           544.00 ( 20.00%)
Min      total-odr4-32              682.00 (  0.00%)           542.00 ( 20.53%)
Min      total-odr4-64              692.00 (  0.00%)           544.00 ( 21.39%)
Min      total-odr4-128             717.00 (  0.00%)           568.00 ( 20.78%)
Min      total-odr4-256             743.00 (  0.00%)           601.00 ( 19.11%)
Min      total-odr4-512             754.00 (  0.00%)           612.00 ( 18.83%)
Min      total-odr4-1024            754.00 (  0.00%)           605.00 ( 19.76%)
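
For reference, the alloc-odrN-B, free-odrN-B and total-odrN-B rows in the
tables above correspond to allocating a batch of B pages at order N, freeing
them again, and the combined cost respectively (lower is better). A minimal
sketch of such a timing harness, written as kernel code purely for
illustration and not the exact benchmark used to generate the numbers above,
would be:

	/*
	 * Illustrative sketch only: time a batch of order-N allocations and
	 * the matching frees from a kernel-module context. This is an
	 * assumption about what such a harness looks like, not the exact
	 * benchmark used for the numbers above.
	 */
	#include <linux/gfp.h>
	#include <linux/printk.h>
	#include <linux/timex.h>

	static void time_alloc_free(unsigned int order, unsigned int batch,
				    struct page **pages)
	{
		cycles_t start, alloc_cycles, free_cycles;
		unsigned int i;

		start = get_cycles();
		for (i = 0; i < batch; i++)
			pages[i] = alloc_pages(GFP_KERNEL, order);
		alloc_cycles = get_cycles() - start;

		start = get_cycles();
		for (i = 0; i < batch; i++)
			if (pages[i])
				__free_pages(pages[i], order);
		free_cycles = get_cycles() - start;

		pr_info("odr%u-%u: alloc %llu free %llu total %llu cycles\n",
			order, batch,
			(unsigned long long)alloc_cycles,
			(unsigned long long)free_cycles,
			(unsigned long long)(alloc_cycles + free_cycles));
	}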

 fs/buffer.c                |  10 +-
 include/linux/compaction.h |   6 +-
 include/linux/cpuset.h     |  42 ++++--
 include/linux/mm.h         |   5 +-
 include/linux/mmzone.h     |  34 +++--
 include/linux/page-flags.h |   7 +-
 include/linux/vmstat.h     |   2 -
 kernel/cpuset.c            |  14 +-
 mm/compaction.c            |  12 +-
 mm/internal.h              |   4 +-
 mm/mempolicy.c             |  19 +--
 mm/mmzone.c                |   2 +-
 mm/page_alloc.c            | 328 +++++++++++++++++++++++++++------------------
 mm/vmstat.c                |  25 ----
 14 files changed, 293 insertions(+), 217 deletions(-)

-- 
2.6.4

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 01/24] mm, page_alloc: Only check PageCompound for high-order pages
  2016-04-12 10:12 [PATCH 00/24] Optimise page alloc/free fast paths v2 Mel Gorman
@ 2016-04-12 10:12 ` Mel Gorman
  2016-04-12 10:12 ` [PATCH 02/24] mm, page_alloc: Use new PageAnonHead helper in the free page fast path Mel Gorman
                   ` (22 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Mel Gorman @ 2016-04-12 10:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Vlastimil Babka, Linux-MM, LKML, Mel Gorman

order-0 pages by definition cannot be compound, so avoid the PageCompound
check in the fast path for those pages.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/page_alloc.c | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 59de90d5d3a3..5d205bcfe10d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1024,24 +1024,33 @@ void __meminit reserve_bootmem_region(unsigned long start, unsigned long end)
 
 static bool free_pages_prepare(struct page *page, unsigned int order)
 {
-	bool compound = PageCompound(page);
-	int i, bad = 0;
+	int bad = 0;
 
 	VM_BUG_ON_PAGE(PageTail(page), page);
-	VM_BUG_ON_PAGE(compound && compound_order(page) != order, page);
 
 	trace_mm_page_free(page, order);
 	kmemcheck_free_shadow(page, order);
 	kasan_free_pages(page, order);
 
+	/*
+	 * Check tail pages before head page information is cleared to
+	 * avoid checking PageCompound for order-0 pages.
+	 */
+	if (order) {
+		bool compound = PageCompound(page);
+		int i;
+
+		VM_BUG_ON_PAGE(compound && compound_order(page) != order, page);
+
+		for (i = 1; i < (1 << order); i++) {
+			if (compound)
+				bad += free_tail_pages_check(page, page + i);
+			bad += free_pages_check(page + i);
+		}
+	}
 	if (PageAnon(page))
 		page->mapping = NULL;
 	bad += free_pages_check(page);
-	for (i = 1; i < (1 << order); i++) {
-		if (compound)
-			bad += free_tail_pages_check(page, page + i);
-		bad += free_pages_check(page + i);
-	}
 	if (bad)
 		return false;
 
-- 
2.6.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 02/24] mm, page_alloc: Use new PageAnonHead helper in the free page fast path
  2016-04-12 10:12 [PATCH 00/24] Optimise page alloc/free fast paths v2 Mel Gorman
  2016-04-12 10:12 ` [PATCH 01/24] mm, page_alloc: Only check PageCompound for high-order pages Mel Gorman
@ 2016-04-12 10:12 ` Mel Gorman
  2016-04-12 10:12 ` [PATCH 03/24] mm, page_alloc: Reduce branches in zone_statistics Mel Gorman
                   ` (21 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Mel Gorman @ 2016-04-12 10:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Vlastimil Babka, Linux-MM, LKML, Mel Gorman

The PageAnon check always looks up compound_head() but this is a relatively
expensive operation if the caller already knows the page is a head page. This
patch creates a helper and uses it in the page free path, which only operates
on head pages.

With this patch and "Only check PageCompound for high-order pages", the
performance difference on a page allocator microbenchmark is;

                                           4.6.0-rc2                  4.6.0-rc2
                                             vanilla           nocompound-v1r20
Min      alloc-odr0-1               425.00 (  0.00%)           417.00 (  1.88%)
Min      alloc-odr0-2               313.00 (  0.00%)           308.00 (  1.60%)
Min      alloc-odr0-4               257.00 (  0.00%)           253.00 (  1.56%)
Min      alloc-odr0-8               224.00 (  0.00%)           221.00 (  1.34%)
Min      alloc-odr0-16              208.00 (  0.00%)           205.00 (  1.44%)
Min      alloc-odr0-32              199.00 (  0.00%)           199.00 (  0.00%)
Min      alloc-odr0-64              195.00 (  0.00%)           193.00 (  1.03%)
Min      alloc-odr0-128             192.00 (  0.00%)           191.00 (  0.52%)
Min      alloc-odr0-256             204.00 (  0.00%)           200.00 (  1.96%)
Min      alloc-odr0-512             213.00 (  0.00%)           212.00 (  0.47%)
Min      alloc-odr0-1024            219.00 (  0.00%)           219.00 (  0.00%)
Min      alloc-odr0-2048            225.00 (  0.00%)           225.00 (  0.00%)
Min      alloc-odr0-4096            230.00 (  0.00%)           231.00 ( -0.43%)
Min      alloc-odr0-8192            235.00 (  0.00%)           234.00 (  0.43%)
Min      alloc-odr0-16384           235.00 (  0.00%)           234.00 (  0.43%)
Min      free-odr0-1                215.00 (  0.00%)           191.00 ( 11.16%)
Min      free-odr0-2                152.00 (  0.00%)           136.00 ( 10.53%)
Min      free-odr0-4                119.00 (  0.00%)           107.00 ( 10.08%)
Min      free-odr0-8                106.00 (  0.00%)            96.00 (  9.43%)
Min      free-odr0-16                97.00 (  0.00%)            87.00 ( 10.31%)
Min      free-odr0-32                91.00 (  0.00%)            83.00 (  8.79%)
Min      free-odr0-64                89.00 (  0.00%)            81.00 (  8.99%)
Min      free-odr0-128               88.00 (  0.00%)            80.00 (  9.09%)
Min      free-odr0-256              106.00 (  0.00%)            95.00 ( 10.38%)
Min      free-odr0-512              116.00 (  0.00%)           111.00 (  4.31%)
Min      free-odr0-1024             125.00 (  0.00%)           118.00 (  5.60%)
Min      free-odr0-2048             133.00 (  0.00%)           126.00 (  5.26%)
Min      free-odr0-4096             136.00 (  0.00%)           130.00 (  4.41%)
Min      free-odr0-8192             138.00 (  0.00%)           130.00 (  5.80%)
Min      free-odr0-16384            137.00 (  0.00%)           130.00 (  5.11%)

There is a sizable boost to free path performance. While there is an
apparent boost on the allocation side, it is likely a coincidence or due
to the patches slightly reducing cache footprint.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 include/linux/page-flags.h | 7 ++++++-
 mm/page_alloc.c            | 2 +-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index f4ed4f1b0c77..ccd04ee1ba2d 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -371,10 +371,15 @@ PAGEFLAG(Idle, idle, PF_ANY)
 #define PAGE_MAPPING_KSM	2
 #define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM)
 
+static __always_inline int PageAnonHead(struct page *page)
+{
+	return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
+}
+
 static __always_inline int PageAnon(struct page *page)
 {
 	page = compound_head(page);
-	return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
+	return PageAnonHead(page);
 }
 
 #ifdef CONFIG_KSM
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5d205bcfe10d..6812de41f698 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1048,7 +1048,7 @@ static bool free_pages_prepare(struct page *page, unsigned int order)
 			bad += free_pages_check(page + i);
 		}
 	}
-	if (PageAnon(page))
+	if (PageAnonHead(page))
 		page->mapping = NULL;
 	bad += free_pages_check(page);
 	if (bad)
-- 
2.6.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 03/24] mm, page_alloc: Reduce branches in zone_statistics
  2016-04-12 10:12 [PATCH 00/24] Optimise page alloc/free fast paths v2 Mel Gorman
  2016-04-12 10:12 ` [PATCH 01/24] mm, page_alloc: Only check PageCompound for high-order pages Mel Gorman
  2016-04-12 10:12 ` [PATCH 02/24] mm, page_alloc: Use new PageAnonHead helper in the free page fast path Mel Gorman
@ 2016-04-12 10:12 ` Mel Gorman
  2016-04-12 10:12 ` [PATCH 04/24] mm, page_alloc: Inline zone_statistics Mel Gorman
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Mel Gorman @ 2016-04-12 10:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Vlastimil Babka, Linux-MM, LKML, Mel Gorman

zone_statistics has more branches than it really needs in order to take an
unlikely GFP flag into account. Reduce the number of branches and annotate
the unlikely flag check.
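
For reference, the unlikely() annotation used in the patch below is the
kernel's branch-prediction hint built on GCC's __builtin_expect(); in
simplified form it is:

	/* simplified form of the hint from include/linux/compiler.h */
	#define unlikely(x)	__builtin_expect(!!(x), 0)

so the compiler can lay out the __GFP_OTHER_NODE case as the out-of-line
branch.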

The performance difference on a page allocator microbenchmark is;

                                           4.6.0-rc2                  4.6.0-rc2
                                    nocompound-v1r10           statbranch-v1r10
Min      alloc-odr0-1               417.00 (  0.00%)           419.00 ( -0.48%)
Min      alloc-odr0-2               308.00 (  0.00%)           305.00 (  0.97%)
Min      alloc-odr0-4               253.00 (  0.00%)           250.00 (  1.19%)
Min      alloc-odr0-8               221.00 (  0.00%)           219.00 (  0.90%)
Min      alloc-odr0-16              205.00 (  0.00%)           203.00 (  0.98%)
Min      alloc-odr0-32              199.00 (  0.00%)           195.00 (  2.01%)
Min      alloc-odr0-64              193.00 (  0.00%)           191.00 (  1.04%)
Min      alloc-odr0-128             191.00 (  0.00%)           189.00 (  1.05%)
Min      alloc-odr0-256             200.00 (  0.00%)           198.00 (  1.00%)
Min      alloc-odr0-512             212.00 (  0.00%)           210.00 (  0.94%)
Min      alloc-odr0-1024            219.00 (  0.00%)           216.00 (  1.37%)
Min      alloc-odr0-2048            225.00 (  0.00%)           221.00 (  1.78%)
Min      alloc-odr0-4096            231.00 (  0.00%)           227.00 (  1.73%)
Min      alloc-odr0-8192            234.00 (  0.00%)           232.00 (  0.85%)
Min      alloc-odr0-16384           234.00 (  0.00%)           232.00 (  0.85%)

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/vmstat.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 5e4300482897..2e58ead9bcf5 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -581,17 +581,21 @@ void drain_zonestat(struct zone *zone, struct per_cpu_pageset *pset)
  */
 void zone_statistics(struct zone *preferred_zone, struct zone *z, gfp_t flags)
 {
-	if (z->zone_pgdat == preferred_zone->zone_pgdat) {
+	int local_nid = numa_node_id();
+	enum zone_stat_item local_stat = NUMA_LOCAL;
+
+	if (unlikely(flags & __GFP_OTHER_NODE)) {
+		local_stat = NUMA_OTHER;
+		local_nid = preferred_zone->node;
+	}
+
+	if (z->node == local_nid) {
 		__inc_zone_state(z, NUMA_HIT);
+		__inc_zone_state(z, local_stat);
 	} else {
 		__inc_zone_state(z, NUMA_MISS);
 		__inc_zone_state(preferred_zone, NUMA_FOREIGN);
 	}
-	if (z->node == ((flags & __GFP_OTHER_NODE) ?
-			preferred_zone->node : numa_node_id()))
-		__inc_zone_state(z, NUMA_LOCAL);
-	else
-		__inc_zone_state(z, NUMA_OTHER);
 }
 
 /*
-- 
2.6.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 04/24] mm, page_alloc: Inline zone_statistics
  2016-04-12 10:12 [PATCH 00/24] Optimise page alloc/free fast paths v2 Mel Gorman
                   ` (2 preceding siblings ...)
  2016-04-12 10:12 ` [PATCH 03/24] mm, page_alloc: Reduce branches in zone_statistics Mel Gorman
@ 2016-04-12 10:12 ` Mel Gorman
  2016-04-12 10:12 ` [PATCH 05/24] mm, page_alloc: Inline the fast path of the zonelist iterator Mel Gorman
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Mel Gorman @ 2016-04-12 10:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Vlastimil Babka, Linux-MM, LKML, Mel Gorman

zone_statistics has only one call site but it is a public function. Make
it static inline.

The performance difference on a page allocator microbenchmark is;

                                           4.6.0-rc2                  4.6.0-rc2
                                    statbranch-v1r20           statinline-v1r20
Min      alloc-odr0-1               419.00 (  0.00%)           412.00 (  1.67%)
Min      alloc-odr0-2               305.00 (  0.00%)           301.00 (  1.31%)
Min      alloc-odr0-4               250.00 (  0.00%)           247.00 (  1.20%)
Min      alloc-odr0-8               219.00 (  0.00%)           215.00 (  1.83%)
Min      alloc-odr0-16              203.00 (  0.00%)           199.00 (  1.97%)
Min      alloc-odr0-32              195.00 (  0.00%)           191.00 (  2.05%)
Min      alloc-odr0-64              191.00 (  0.00%)           187.00 (  2.09%)
Min      alloc-odr0-128             189.00 (  0.00%)           185.00 (  2.12%)
Min      alloc-odr0-256             198.00 (  0.00%)           193.00 (  2.53%)
Min      alloc-odr0-512             210.00 (  0.00%)           207.00 (  1.43%)
Min      alloc-odr0-1024            216.00 (  0.00%)           213.00 (  1.39%)
Min      alloc-odr0-2048            221.00 (  0.00%)           220.00 (  0.45%)
Min      alloc-odr0-4096            227.00 (  0.00%)           226.00 (  0.44%)
Min      alloc-odr0-8192            232.00 (  0.00%)           229.00 (  1.29%)
Min      alloc-odr0-16384           232.00 (  0.00%)           229.00 (  1.29%)

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 include/linux/vmstat.h |  2 --
 mm/page_alloc.c        | 31 +++++++++++++++++++++++++++++++
 mm/vmstat.c            | 29 -----------------------------
 3 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 73fae8c4a5fb..152d26b7f972 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -163,12 +163,10 @@ static inline unsigned long zone_page_state_snapshot(struct zone *zone,
 #ifdef CONFIG_NUMA
 
 extern unsigned long node_page_state(int node, enum zone_stat_item item);
-extern void zone_statistics(struct zone *, struct zone *, gfp_t gfp);
 
 #else
 
 #define node_page_state(node, item) global_page_state(item)
-#define zone_statistics(_zl, _z, gfp) do { } while (0)
 
 #endif /* CONFIG_NUMA */
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6812de41f698..b56c2b2911a2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2352,6 +2352,37 @@ int split_free_page(struct page *page)
 }
 
 /*
+ * Update NUMA hit/miss statistics 
+ *
+ * Must be called with interrupts disabled.
+ *
+ * When __GFP_OTHER_NODE is set assume the node of the preferred
+ * zone is the local node. This is useful for daemons who allocate
+ * memory on behalf of other processes.
+ */
+static inline void zone_statistics(struct zone *preferred_zone, struct zone *z,
+								gfp_t flags)
+{
+#ifdef CONFIG_NUMA
+	int local_nid = numa_node_id();
+	enum zone_stat_item local_stat = NUMA_LOCAL;
+
+	if (unlikely(flags & __GFP_OTHER_NODE)) {
+		local_stat = NUMA_OTHER;
+		local_nid = preferred_zone->node;
+	}
+
+	if (z->node == local_nid) {
+		__inc_zone_state(z, NUMA_HIT);
+		__inc_zone_state(z, local_stat);
+	} else {
+		__inc_zone_state(z, NUMA_MISS);
+		__inc_zone_state(preferred_zone, NUMA_FOREIGN);
+	}
+#endif
+}
+
+/*
  * Allocate a page from the given zone. Use pcplists for order-0 allocations.
  */
 static inline
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 2e58ead9bcf5..a4bda11eac8d 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -570,35 +570,6 @@ void drain_zonestat(struct zone *zone, struct per_cpu_pageset *pset)
 
 #ifdef CONFIG_NUMA
 /*
- * zonelist = the list of zones passed to the allocator
- * z 	    = the zone from which the allocation occurred.
- *
- * Must be called with interrupts disabled.
- *
- * When __GFP_OTHER_NODE is set assume the node of the preferred
- * zone is the local node. This is useful for daemons who allocate
- * memory on behalf of other processes.
- */
-void zone_statistics(struct zone *preferred_zone, struct zone *z, gfp_t flags)
-{
-	int local_nid = numa_node_id();
-	enum zone_stat_item local_stat = NUMA_LOCAL;
-
-	if (unlikely(flags & __GFP_OTHER_NODE)) {
-		local_stat = NUMA_OTHER;
-		local_nid = preferred_zone->node;
-	}
-
-	if (z->node == local_nid) {
-		__inc_zone_state(z, NUMA_HIT);
-		__inc_zone_state(z, local_stat);
-	} else {
-		__inc_zone_state(z, NUMA_MISS);
-		__inc_zone_state(preferred_zone, NUMA_FOREIGN);
-	}
-}
-
-/*
  * Determine the per node value of a stat item.
  */
 unsigned long node_page_state(int node, enum zone_stat_item item)
-- 
2.6.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 05/24] mm, page_alloc: Inline the fast path of the zonelist iterator
  2016-04-12 10:12 [PATCH 00/24] Optimise page alloc/free fast paths v2 Mel Gorman
                   ` (3 preceding siblings ...)
  2016-04-12 10:12 ` [PATCH 04/24] mm, page_alloc: Inline zone_statistics Mel Gorman
@ 2016-04-12 10:12 ` Mel Gorman
  2016-04-12 10:12 ` [PATCH 06/24] mm, page_alloc: Use __dec_zone_state for order-0 page allocation Mel Gorman
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Mel Gorman @ 2016-04-12 10:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Vlastimil Babka, Linux-MM, LKML, Mel Gorman

The page allocator iterates through a zonelist for zones that match
the addressing limitations and nodemask of the caller, but many allocations
will not be restricted. Despite this, there is always function call
overhead, which builds up.

This patch inlines the optimistic basic case and only calls the
iterator function for the complex case. A hindrance was the fact that
cpuset_current_mems_allowed is used in the fastpath as the allowed nodemask
even though all nodes are allowed on most systems. The patch handles this
by only considering cpuset_current_mems_allowed if a cpuset exists. As well
as being faster in the fast-path, this removes some junk in the slowpath.
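
For context, callers walk the zonelist through the for_each_zone_zonelist*()
macros, which expand to roughly the loop below (an illustrative sketch rather
than the exact macro definition), so the inlined fast path of
next_zones_zonelist() is taken on every iteration of the allocator's zone
scan:

	struct zoneref *z;
	struct zone *zone;

	for (z = first_zones_zonelist(zonelist, high_zoneidx, nodemask, &zone);
	     zone;
	     z = next_zones_zonelist(++z, high_zoneidx, nodemask),
	     zone = zonelist_zone(z)) {
		/* attempt to satisfy the allocation from 'zone' */
	}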

The performance difference on a page allocator microbenchmark is;

                                           4.6.0-rc2                  4.6.0-rc2
                                    statinline-v1r20              optiter-v1r20
Min      alloc-odr0-1               412.00 (  0.00%)           382.00 (  7.28%)
Min      alloc-odr0-2               301.00 (  0.00%)           282.00 (  6.31%)
Min      alloc-odr0-4               247.00 (  0.00%)           233.00 (  5.67%)
Min      alloc-odr0-8               215.00 (  0.00%)           203.00 (  5.58%)
Min      alloc-odr0-16              199.00 (  0.00%)           188.00 (  5.53%)
Min      alloc-odr0-32              191.00 (  0.00%)           182.00 (  4.71%)
Min      alloc-odr0-64              187.00 (  0.00%)           177.00 (  5.35%)
Min      alloc-odr0-128             185.00 (  0.00%)           175.00 (  5.41%)
Min      alloc-odr0-256             193.00 (  0.00%)           184.00 (  4.66%)
Min      alloc-odr0-512             207.00 (  0.00%)           197.00 (  4.83%)
Min      alloc-odr0-1024            213.00 (  0.00%)           203.00 (  4.69%)
Min      alloc-odr0-2048            220.00 (  0.00%)           209.00 (  5.00%)
Min      alloc-odr0-4096            226.00 (  0.00%)           214.00 (  5.31%)
Min      alloc-odr0-8192            229.00 (  0.00%)           218.00 (  4.80%)
Min      alloc-odr0-16384           229.00 (  0.00%)           219.00 (  4.37%)

perf indicated that next_zones_zonelist disappeared from the profile and
__next_zones_zonelist did not appear. This is expected as the micro-benchmark
would hit the inlined fast path every time.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 include/linux/mmzone.h | 13 +++++++++++--
 mm/mmzone.c            |  2 +-
 mm/page_alloc.c        | 26 +++++++++-----------------
 3 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index c60df9257cc7..0c4d5ebb3849 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -922,6 +922,10 @@ static inline int zonelist_node_idx(struct zoneref *zoneref)
 #endif /* CONFIG_NUMA */
 }
 
+struct zoneref *__next_zones_zonelist(struct zoneref *z,
+					enum zone_type highest_zoneidx,
+					nodemask_t *nodes);
+
 /**
  * next_zones_zonelist - Returns the next zone at or below highest_zoneidx within the allowed nodemask using a cursor within a zonelist as a starting point
  * @z - The cursor used as a starting point for the search
@@ -934,9 +938,14 @@ static inline int zonelist_node_idx(struct zoneref *zoneref)
  * being examined. It should be advanced by one before calling
  * next_zones_zonelist again.
  */
-struct zoneref *next_zones_zonelist(struct zoneref *z,
+static __always_inline struct zoneref *next_zones_zonelist(struct zoneref *z,
 					enum zone_type highest_zoneidx,
-					nodemask_t *nodes);
+					nodemask_t *nodes)
+{
+	if (likely(!nodes && zonelist_zone_idx(z) <= highest_zoneidx))
+		return z;
+	return __next_zones_zonelist(z, highest_zoneidx, nodes);
+}
 
 /**
  * first_zones_zonelist - Returns the first zone at or below highest_zoneidx within the allowed nodemask in a zonelist
diff --git a/mm/mmzone.c b/mm/mmzone.c
index 52687fb4de6f..5652be858e5e 100644
--- a/mm/mmzone.c
+++ b/mm/mmzone.c
@@ -52,7 +52,7 @@ static inline int zref_in_nodemask(struct zoneref *zref, nodemask_t *nodes)
 }
 
 /* Returns the next zone at or below highest_zoneidx in a zonelist */
-struct zoneref *next_zones_zonelist(struct zoneref *z,
+struct zoneref *__next_zones_zonelist(struct zoneref *z,
 					enum zone_type highest_zoneidx,
 					nodemask_t *nodes)
 {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b56c2b2911a2..e9acc0b0f787 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3193,17 +3193,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	 */
 	alloc_flags = gfp_to_alloc_flags(gfp_mask);
 
-	/*
-	 * Find the true preferred zone if the allocation is unconstrained by
-	 * cpusets.
-	 */
-	if (!(alloc_flags & ALLOC_CPUSET) && !ac->nodemask) {
-		struct zoneref *preferred_zoneref;
-		preferred_zoneref = first_zones_zonelist(ac->zonelist,
-				ac->high_zoneidx, NULL, &ac->preferred_zone);
-		ac->classzone_idx = zonelist_zone_idx(preferred_zoneref);
-	}
-
 	/* This is the last chance, in general, before the goto nopage. */
 	page = get_page_from_freelist(gfp_mask, order,
 				alloc_flags & ~ALLOC_NO_WATERMARKS, ac);
@@ -3359,14 +3348,21 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 	struct zoneref *preferred_zoneref;
 	struct page *page = NULL;
 	unsigned int cpuset_mems_cookie;
-	int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR;
+	int alloc_flags = ALLOC_WMARK_LOW|ALLOC_FAIR;
 	gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
 	struct alloc_context ac = {
 		.high_zoneidx = gfp_zone(gfp_mask),
+		.zonelist = zonelist,
 		.nodemask = nodemask,
 		.migratetype = gfpflags_to_migratetype(gfp_mask),
 	};
 
+	if (cpusets_enabled()) {
+		alloc_flags |= ALLOC_CPUSET;
+		if (!ac.nodemask)
+			ac.nodemask = &cpuset_current_mems_allowed;
+	}
+
 	gfp_mask &= gfp_allowed_mask;
 
 	lockdep_trace_alloc(gfp_mask);
@@ -3390,16 +3386,12 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 retry_cpuset:
 	cpuset_mems_cookie = read_mems_allowed_begin();
 
-	/* We set it here, as __alloc_pages_slowpath might have changed it */
-	ac.zonelist = zonelist;
-
 	/* Dirty zone balancing only done in the fast path */
 	ac.spread_dirty_pages = (gfp_mask & __GFP_WRITE);
 
 	/* The preferred zone is used for statistics later */
 	preferred_zoneref = first_zones_zonelist(ac.zonelist, ac.high_zoneidx,
-				ac.nodemask ? : &cpuset_current_mems_allowed,
-				&ac.preferred_zone);
+				ac.nodemask, &ac.preferred_zone);
 	if (!ac.preferred_zone)
 		goto out;
 	ac.classzone_idx = zonelist_zone_idx(preferred_zoneref);
-- 
2.6.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 06/24] mm, page_alloc: Use __dec_zone_state for order-0 page allocation
  2016-04-12 10:12 [PATCH 00/24] Optimise page alloc/free fast paths v2 Mel Gorman
                   ` (4 preceding siblings ...)
  2016-04-12 10:12 ` [PATCH 05/24] mm, page_alloc: Inline the fast path of the zonelist iterator Mel Gorman
@ 2016-04-12 10:12 ` Mel Gorman
  2016-04-12 10:12 ` [PATCH 07/24] mm, page_alloc: Avoid unnecessary zone lookups during pageblock operations Mel Gorman
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Mel Gorman @ 2016-04-12 10:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Vlastimil Babka, Linux-MM, LKML, Mel Gorman

__dec_zone_state is cheaper to use for removing an order-0 page as it
has fewer conditions to check.

The performance difference on a page allocator microbenchmark is;

                                           4.6.0-rc2                  4.6.0-rc2
                                       optiter-v1r20              decstat-v1r20
Min      alloc-odr0-1               382.00 (  0.00%)           381.00 (  0.26%)
Min      alloc-odr0-2               282.00 (  0.00%)           275.00 (  2.48%)
Min      alloc-odr0-4               233.00 (  0.00%)           229.00 (  1.72%)
Min      alloc-odr0-8               203.00 (  0.00%)           199.00 (  1.97%)
Min      alloc-odr0-16              188.00 (  0.00%)           186.00 (  1.06%)
Min      alloc-odr0-32              182.00 (  0.00%)           179.00 (  1.65%)
Min      alloc-odr0-64              177.00 (  0.00%)           174.00 (  1.69%)
Min      alloc-odr0-128             175.00 (  0.00%)           172.00 (  1.71%)
Min      alloc-odr0-256             184.00 (  0.00%)           181.00 (  1.63%)
Min      alloc-odr0-512             197.00 (  0.00%)           193.00 (  2.03%)
Min      alloc-odr0-1024            203.00 (  0.00%)           201.00 (  0.99%)
Min      alloc-odr0-2048            209.00 (  0.00%)           206.00 (  1.44%)
Min      alloc-odr0-4096            214.00 (  0.00%)           212.00 (  0.93%)
Min      alloc-odr0-8192            218.00 (  0.00%)           215.00 (  1.38%)
Min      alloc-odr0-16384           219.00 (  0.00%)           216.00 (  1.37%)

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/page_alloc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e9acc0b0f787..ab16560b76e6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2414,6 +2414,7 @@ struct page *buffered_rmqueue(struct zone *preferred_zone,
 		else
 			page = list_first_entry(list, struct page, lru);
 
+		__dec_zone_state(zone, NR_ALLOC_BATCH);
 		list_del(&page->lru);
 		pcp->count--;
 	} else {
@@ -2435,11 +2436,11 @@ struct page *buffered_rmqueue(struct zone *preferred_zone,
 		spin_unlock(&zone->lock);
 		if (!page)
 			goto failed;
+		__mod_zone_page_state(zone, NR_ALLOC_BATCH, -(1 << order));
 		__mod_zone_freepage_state(zone, -(1 << order),
 					  get_pcppage_migratetype(page));
 	}
 
-	__mod_zone_page_state(zone, NR_ALLOC_BATCH, -(1 << order));
 	if (atomic_long_read(&zone->vm_stat[NR_ALLOC_BATCH]) <= 0 &&
 	    !test_bit(ZONE_FAIR_DEPLETED, &zone->flags))
 		set_bit(ZONE_FAIR_DEPLETED, &zone->flags);
-- 
2.6.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 07/24] mm, page_alloc: Avoid unnecessary zone lookups during pageblock operations
  2016-04-12 10:12 [PATCH 00/24] Optimise page alloc/free fast paths v2 Mel Gorman
                   ` (5 preceding siblings ...)
  2016-04-12 10:12 ` [PATCH 06/24] mm, page_alloc: Use __dec_zone_state for order-0 page allocation Mel Gorman
@ 2016-04-12 10:12 ` Mel Gorman
  2016-04-12 10:12 ` [PATCH 08/24] mm, page_alloc: Convert alloc_flags to unsigned Mel Gorman
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Mel Gorman @ 2016-04-12 10:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Vlastimil Babka, Linux-MM, LKML, Mel Gorman

Pageblocks have an associated bitmap to store migrate types and whether
the pageblock should be skipped during compaction. The bitmap may be
associated with a memory section or a zone but the zone is looked up
unconditionally. The compiler should optimise this away automatically,
so in many cases this is only a cosmetic patch.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/page_alloc.c | 22 +++++++++-------------
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ab16560b76e6..d00847bb1612 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6759,23 +6759,23 @@ void *__init alloc_large_system_hash(const char *tablename,
 }
 
 /* Return a pointer to the bitmap storing bits affecting a block of pages */
-static inline unsigned long *get_pageblock_bitmap(struct zone *zone,
+static inline unsigned long *get_pageblock_bitmap(struct page *page,
 							unsigned long pfn)
 {
 #ifdef CONFIG_SPARSEMEM
 	return __pfn_to_section(pfn)->pageblock_flags;
 #else
-	return zone->pageblock_flags;
+	return page_zone(page)->pageblock_flags;
 #endif /* CONFIG_SPARSEMEM */
 }
 
-static inline int pfn_to_bitidx(struct zone *zone, unsigned long pfn)
+static inline int pfn_to_bitidx(struct page *page, unsigned long pfn)
 {
 #ifdef CONFIG_SPARSEMEM
 	pfn &= (PAGES_PER_SECTION-1);
 	return (pfn >> pageblock_order) * NR_PAGEBLOCK_BITS;
 #else
-	pfn = pfn - round_down(zone->zone_start_pfn, pageblock_nr_pages);
+	pfn = pfn - round_down(page_zone(page)->zone_start_pfn, pageblock_nr_pages);
 	return (pfn >> pageblock_order) * NR_PAGEBLOCK_BITS;
 #endif /* CONFIG_SPARSEMEM */
 }
@@ -6793,14 +6793,12 @@ unsigned long get_pfnblock_flags_mask(struct page *page, unsigned long pfn,
 					unsigned long end_bitidx,
 					unsigned long mask)
 {
-	struct zone *zone;
 	unsigned long *bitmap;
 	unsigned long bitidx, word_bitidx;
 	unsigned long word;
 
-	zone = page_zone(page);
-	bitmap = get_pageblock_bitmap(zone, pfn);
-	bitidx = pfn_to_bitidx(zone, pfn);
+	bitmap = get_pageblock_bitmap(page, pfn);
+	bitidx = pfn_to_bitidx(page, pfn);
 	word_bitidx = bitidx / BITS_PER_LONG;
 	bitidx &= (BITS_PER_LONG-1);
 
@@ -6822,20 +6820,18 @@ void set_pfnblock_flags_mask(struct page *page, unsigned long flags,
 					unsigned long end_bitidx,
 					unsigned long mask)
 {
-	struct zone *zone;
 	unsigned long *bitmap;
 	unsigned long bitidx, word_bitidx;
 	unsigned long old_word, word;
 
 	BUILD_BUG_ON(NR_PAGEBLOCK_BITS != 4);
 
-	zone = page_zone(page);
-	bitmap = get_pageblock_bitmap(zone, pfn);
-	bitidx = pfn_to_bitidx(zone, pfn);
+	bitmap = get_pageblock_bitmap(page, pfn);
+	bitidx = pfn_to_bitidx(page, pfn);
 	word_bitidx = bitidx / BITS_PER_LONG;
 	bitidx &= (BITS_PER_LONG-1);
 
-	VM_BUG_ON_PAGE(!zone_spans_pfn(zone, pfn), page);
+	VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
 
 	bitidx += end_bitidx;
 	mask <<= (BITS_PER_LONG - bitidx - 1);
-- 
2.6.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 08/24] mm, page_alloc: Convert alloc_flags to unsigned
  2016-04-12 10:12 [PATCH 00/24] Optimise page alloc/free fast paths v2 Mel Gorman
                   ` (6 preceding siblings ...)
  2016-04-12 10:12 ` [PATCH 07/24] mm, page_alloc: Avoid unnecessary zone lookups during pageblock operations Mel Gorman
@ 2016-04-12 10:12 ` Mel Gorman
  2016-04-12 10:12 ` [PATCH 09/24] mm, page_alloc: Convert nr_fair_skipped to bool Mel Gorman
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Mel Gorman @ 2016-04-12 10:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Vlastimil Babka, Linux-MM, LKML, Mel Gorman

alloc_flags is a bitmask of flags but it is signed, which does not
necessarily generate the best code depending on the compiler. Even if
there is no measurable impact, it makes more sense for this to be unsigned.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 include/linux/compaction.h |  6 +++---
 include/linux/mmzone.h     |  3 ++-
 mm/compaction.c            | 12 +++++++-----
 mm/internal.h              |  2 +-
 mm/page_alloc.c            | 26 ++++++++++++++------------
 5 files changed, 27 insertions(+), 22 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index d7c8de583a23..242b660f64e6 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -39,12 +39,12 @@ extern int sysctl_compact_unevictable_allowed;
 
 extern int fragmentation_index(struct zone *zone, unsigned int order);
 extern unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
-			int alloc_flags, const struct alloc_context *ac,
-			enum migrate_mode mode, int *contended);
+		unsigned int alloc_flags, const struct alloc_context *ac,
+		enum migrate_mode mode, int *contended);
 extern void compact_pgdat(pg_data_t *pgdat, int order);
 extern void reset_isolation_suitable(pg_data_t *pgdat);
 extern unsigned long compaction_suitable(struct zone *zone, int order,
-					int alloc_flags, int classzone_idx);
+		unsigned int alloc_flags, int classzone_idx);
 
 extern void defer_compaction(struct zone *zone, int order);
 extern bool compaction_deferred(struct zone *zone, int order);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 0c4d5ebb3849..f49bb9add372 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -747,7 +747,8 @@ extern struct mutex zonelists_mutex;
 void build_all_zonelists(pg_data_t *pgdat, struct zone *zone);
 void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx);
 bool zone_watermark_ok(struct zone *z, unsigned int order,
-		unsigned long mark, int classzone_idx, int alloc_flags);
+		unsigned long mark, int classzone_idx,
+		unsigned int alloc_flags);
 bool zone_watermark_ok_safe(struct zone *z, unsigned int order,
 		unsigned long mark, int classzone_idx);
 enum memmap_context {
diff --git a/mm/compaction.c b/mm/compaction.c
index ccf97b02b85f..244bb669b5a6 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1259,7 +1259,8 @@ static int compact_finished(struct zone *zone, struct compact_control *cc,
  *   COMPACT_CONTINUE - If compaction should run now
  */
 static unsigned long __compaction_suitable(struct zone *zone, int order,
-					int alloc_flags, int classzone_idx)
+					unsigned int alloc_flags,
+					int classzone_idx)
 {
 	int fragindex;
 	unsigned long watermark;
@@ -1304,7 +1305,8 @@ static unsigned long __compaction_suitable(struct zone *zone, int order,
 }
 
 unsigned long compaction_suitable(struct zone *zone, int order,
-					int alloc_flags, int classzone_idx)
+					unsigned int alloc_flags,
+					int classzone_idx)
 {
 	unsigned long ret;
 
@@ -1464,7 +1466,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 
 static unsigned long compact_zone_order(struct zone *zone, int order,
 		gfp_t gfp_mask, enum migrate_mode mode, int *contended,
-		int alloc_flags, int classzone_idx)
+		unsigned int alloc_flags, int classzone_idx)
 {
 	unsigned long ret;
 	struct compact_control cc = {
@@ -1505,8 +1507,8 @@ int sysctl_extfrag_threshold = 500;
  * This is the main entry point for direct page compaction.
  */
 unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
-			int alloc_flags, const struct alloc_context *ac,
-			enum migrate_mode mode, int *contended)
+		unsigned int alloc_flags, const struct alloc_context *ac,
+		enum migrate_mode mode, int *contended)
 {
 	int may_enter_fs = gfp_mask & __GFP_FS;
 	int may_perform_io = gfp_mask & __GFP_IO;
diff --git a/mm/internal.h b/mm/internal.h
index b79abb6721cf..f6d0a5875ec4 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -175,7 +175,7 @@ struct compact_control {
 	bool direct_compaction;		/* False from kcompactd or /proc/... */
 	int order;			/* order a direct compactor needs */
 	const gfp_t gfp_mask;		/* gfp mask of a direct compactor */
-	const int alloc_flags;		/* alloc flags of a direct compactor */
+	const unsigned int alloc_flags;	/* alloc flags of a direct compactor */
 	const int classzone_idx;	/* zone index of a direct compactor */
 	struct zone *zone;
 	int contended;			/* Signal need_sched() or lock
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d00847bb1612..4bce6298dd07 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1526,7 +1526,7 @@ static inline bool free_pages_prezeroed(bool poisoned)
 }
 
 static int prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
-								int alloc_flags)
+							unsigned int alloc_flags)
 {
 	int i;
 	bool poisoned = true;
@@ -2388,7 +2388,8 @@ static inline void zone_statistics(struct zone *preferred_zone, struct zone *z,
 static inline
 struct page *buffered_rmqueue(struct zone *preferred_zone,
 			struct zone *zone, unsigned int order,
-			gfp_t gfp_flags, int alloc_flags, int migratetype)
+			gfp_t gfp_flags, unsigned int alloc_flags,
+			int migratetype)
 {
 	unsigned long flags;
 	struct page *page;
@@ -2542,12 +2543,13 @@ static inline bool should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
  * to check in the allocation paths if no pages are free.
  */
 static bool __zone_watermark_ok(struct zone *z, unsigned int order,
-			unsigned long mark, int classzone_idx, int alloc_flags,
+			unsigned long mark, int classzone_idx,
+			unsigned int alloc_flags,
 			long free_pages)
 {
 	long min = mark;
 	int o;
-	const int alloc_harder = (alloc_flags & ALLOC_HARDER);
+	const bool alloc_harder = (alloc_flags & ALLOC_HARDER);
 
 	/* free_pages may go negative - that's OK */
 	free_pages -= (1 << order) - 1;
@@ -2610,7 +2612,7 @@ static bool __zone_watermark_ok(struct zone *z, unsigned int order,
 }
 
 bool zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
-		      int classzone_idx, int alloc_flags)
+		      int classzone_idx, unsigned int alloc_flags)
 {
 	return __zone_watermark_ok(z, order, mark, classzone_idx, alloc_flags,
 					zone_page_state(z, NR_FREE_PAGES));
@@ -2958,7 +2960,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
 /* Try memory compaction for high-order allocations before reclaim */
 static struct page *
 __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
-		int alloc_flags, const struct alloc_context *ac,
+		unsigned int alloc_flags, const struct alloc_context *ac,
 		enum migrate_mode mode, int *contended_compaction,
 		bool *deferred_compaction)
 {
@@ -3014,7 +3016,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 #else
 static inline struct page *
 __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
-		int alloc_flags, const struct alloc_context *ac,
+		unsigned int alloc_flags, const struct alloc_context *ac,
 		enum migrate_mode mode, int *contended_compaction,
 		bool *deferred_compaction)
 {
@@ -3054,7 +3056,7 @@ __perform_reclaim(gfp_t gfp_mask, unsigned int order,
 /* The really slow allocator path where we enter direct reclaim */
 static inline struct page *
 __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
-		int alloc_flags, const struct alloc_context *ac,
+		unsigned int alloc_flags, const struct alloc_context *ac,
 		unsigned long *did_some_progress)
 {
 	struct page *page = NULL;
@@ -3093,10 +3095,10 @@ static void wake_all_kswapds(unsigned int order, const struct alloc_context *ac)
 		wakeup_kswapd(zone, order, zone_idx(ac->preferred_zone));
 }
 
-static inline int
+static inline unsigned int
 gfp_to_alloc_flags(gfp_t gfp_mask)
 {
-	int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;
+	unsigned int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;
 
 	/* __GFP_HIGH is assumed to be the same as ALLOC_HIGH to save a branch. */
 	BUILD_BUG_ON(__GFP_HIGH != (__force gfp_t) ALLOC_HIGH);
@@ -3157,7 +3159,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 {
 	bool can_direct_reclaim = gfp_mask & __GFP_DIRECT_RECLAIM;
 	struct page *page = NULL;
-	int alloc_flags;
+	unsigned int alloc_flags;
 	unsigned long pages_reclaimed = 0;
 	unsigned long did_some_progress;
 	enum migrate_mode migration_mode = MIGRATE_ASYNC;
@@ -3349,7 +3351,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 	struct zoneref *preferred_zoneref;
 	struct page *page = NULL;
 	unsigned int cpuset_mems_cookie;
-	int alloc_flags = ALLOC_WMARK_LOW|ALLOC_FAIR;
+	unsigned int alloc_flags = ALLOC_WMARK_LOW|ALLOC_FAIR;
 	gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
 	struct alloc_context ac = {
 		.high_zoneidx = gfp_zone(gfp_mask),
-- 
2.6.4


* [PATCH 09/24] mm, page_alloc: Convert nr_fair_skipped to bool
  2016-04-12 10:12 [PATCH 00/24] Optimise page alloc/free fast paths v2 Mel Gorman
                   ` (7 preceding siblings ...)
  2016-04-12 10:12 ` [PATCH 08/24] mm, page_alloc: Convert alloc_flags to unsigned Mel Gorman
@ 2016-04-12 10:12 ` Mel Gorman
  2016-04-12 10:12 ` [PATCH 10/24] mm, page_alloc: Remove unnecessary local variable in get_page_from_freelist Mel Gorman
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Mel Gorman @ 2016-04-12 10:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Vlastimil Babka, Linux-MM, LKML, Mel Gorman

The number of zones skipped because a zone had expired its fair zone allocation
quota is irrelevant; all that matters is whether any zone was skipped. Convert
the counter to a bool.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/page_alloc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4bce6298dd07..e778485a64c1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2677,7 +2677,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 	struct zoneref *z;
 	struct page *page = NULL;
 	struct zone *zone;
-	int nr_fair_skipped = 0;
+	bool fair_skipped;
 	bool zonelist_rescan;
 
 zonelist_scan:
@@ -2705,7 +2705,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 			if (!zone_local(ac->preferred_zone, zone))
 				break;
 			if (test_bit(ZONE_FAIR_DEPLETED, &zone->flags)) {
-				nr_fair_skipped++;
+				fair_skipped = true;
 				continue;
 			}
 		}
@@ -2798,7 +2798,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 	 */
 	if (alloc_flags & ALLOC_FAIR) {
 		alloc_flags &= ~ALLOC_FAIR;
-		if (nr_fair_skipped) {
+		if (fair_skipped) {
 			zonelist_rescan = true;
 			reset_alloc_batches(ac->preferred_zone);
 		}
-- 
2.6.4


* [PATCH 10/24] mm, page_alloc: Remove unnecessary local variable in get_page_from_freelist
  2016-04-12 10:12 [PATCH 00/24] Optimise page alloc/free fast paths v2 Mel Gorman
                   ` (8 preceding siblings ...)
  2016-04-12 10:12 ` [PATCH 09/24] mm, page_alloc: Convert nr_fair_skipped to bool Mel Gorman
@ 2016-04-12 10:12 ` Mel Gorman
  2016-04-12 10:12 ` [PATCH 11/24] mm, page_alloc: Remove unnecessary initialisation " Mel Gorman
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Mel Gorman @ 2016-04-12 10:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Vlastimil Babka, Linux-MM, LKML, Mel Gorman

zonelist here is a copy of a struct field that is used once. Ditch it.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/page_alloc.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e778485a64c1..313db1c43839 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2673,7 +2673,6 @@ static struct page *
 get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 						const struct alloc_context *ac)
 {
-	struct zonelist *zonelist = ac->zonelist;
 	struct zoneref *z;
 	struct page *page = NULL;
 	struct zone *zone;
@@ -2687,7 +2686,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 	 * Scan zonelist, looking for a zone with enough free.
 	 * See also __cpuset_node_allowed() comment in kernel/cpuset.c.
 	 */
-	for_each_zone_zonelist_nodemask(zone, z, zonelist, ac->high_zoneidx,
+	for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, ac->high_zoneidx,
 								ac->nodemask) {
 		unsigned long mark;
 
-- 
2.6.4


* [PATCH 11/24] mm, page_alloc: Remove unnecessary initialisation in get_page_from_freelist
  2016-04-12 10:12 [PATCH 00/24] Optimise page alloc/free fast paths v2 Mel Gorman
                   ` (9 preceding siblings ...)
  2016-04-12 10:12 ` [PATCH 10/24] mm, page_alloc: Remove unnecessary local variable in get_page_from_freelist Mel Gorman
@ 2016-04-12 10:12 ` Mel Gorman
  2016-04-12 10:12 ` [PATCH 12/24] mm, page_alloc: Remove unnecessary initialisation from __alloc_pages_nodemask() Mel Gorman
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Mel Gorman @ 2016-04-12 10:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Vlastimil Babka, Linux-MM, LKML, Mel Gorman

The page local variable in get_page_from_freelist() is always assigned before
it is read, so initialising it to NULL at function scope is unnecessary.
Declare it inside the zonelist loop instead.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 313db1c43839..f5ddb342c967 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2674,7 +2674,6 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 						const struct alloc_context *ac)
 {
 	struct zoneref *z;
-	struct page *page = NULL;
 	struct zone *zone;
 	bool fair_skipped;
 	bool zonelist_rescan;
@@ -2688,6 +2687,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 	 */
 	for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, ac->high_zoneidx,
 								ac->nodemask) {
+		struct page *page;
 		unsigned long mark;
 
 		if (cpusets_enabled() &&
-- 
2.6.4


* [PATCH 12/24] mm, page_alloc: Remove unnecessary initialisation from __alloc_pages_nodemask()
  2016-04-12 10:12 [PATCH 00/24] Optimise page alloc/free fast paths v2 Mel Gorman
                   ` (10 preceding siblings ...)
  2016-04-12 10:12 ` [PATCH 11/24] mm, page_alloc: Remove unnecessary initialisation " Mel Gorman
@ 2016-04-12 10:12 ` Mel Gorman
  2016-04-12 10:12 ` [PATCH 13/24] mm, page_alloc: Remove redundant check for empty zonelist Mel Gorman
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Mel Gorman @ 2016-04-12 10:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Vlastimil Babka, Linux-MM, LKML, Mel Gorman

page is guaranteed to be set before it is read, with or without the
initialisation, so the initialisation can be dropped.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f5ddb342c967..df03ccc7f07c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3348,7 +3348,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 			struct zonelist *zonelist, nodemask_t *nodemask)
 {
 	struct zoneref *preferred_zoneref;
-	struct page *page = NULL;
+	struct page *page;
 	unsigned int cpuset_mems_cookie;
 	unsigned int alloc_flags = ALLOC_WMARK_LOW|ALLOC_FAIR;
 	gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
-- 
2.6.4


* [PATCH 13/24] mm, page_alloc: Remove redundant check for empty zonelist
  2016-04-12 10:12 [PATCH 00/24] Optimise page alloc/free fast paths v2 Mel Gorman
                   ` (11 preceding siblings ...)
  2016-04-12 10:12 ` [PATCH 12/24] mm, page_alloc: Remove unnecessary initialisation from __alloc_pages_nodemask() Mel Gorman
@ 2016-04-12 10:12 ` Mel Gorman
  2016-04-12 10:12 ` [PATCH 14/24] mm, page_alloc: Simplify last cpupid reset Mel Gorman
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Mel Gorman @ 2016-04-12 10:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Vlastimil Babka, Linux-MM, LKML, Mel Gorman

A check is made for an empty zonelist early in the page allocator fast path
but it is unnecessary: the check after the first_zones_zonelist() call catches
the same situation. Removing the early check is slightly slower for machines
with memoryless nodes, but that is a corner case that can live with the
overhead.
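
For reference, the check that continues to cover the empty-zonelist case
sits just after the preferred zone lookup. At this point in the series it
reads roughly as follows (taken from the context of a later patch in this
series; the comment is added here for illustration):

    preferred_zoneref = first_zones_zonelist(ac.zonelist, ac.high_zoneidx,
                ac.nodemask, &ac.preferred_zone);
    if (!ac.preferred_zone)
        goto out;   /* empty zonelist ends up here */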

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/page_alloc.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index df03ccc7f07c..e50e754ec9eb 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3374,14 +3374,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 	if (should_fail_alloc_page(gfp_mask, order))
 		return NULL;
 
-	/*
-	 * Check the zones suitable for the gfp_mask contain at least one
-	 * valid zone. It's possible to have an empty zonelist as a result
-	 * of __GFP_THISNODE and a memoryless node
-	 */
-	if (unlikely(!zonelist->_zonerefs->zone))
-		return NULL;
-
 	if (IS_ENABLED(CONFIG_CMA) && ac.migratetype == MIGRATE_MOVABLE)
 		alloc_flags |= ALLOC_CMA;
 
-- 
2.6.4


* [PATCH 14/24] mm, page_alloc: Simplify last cpupid reset
  2016-04-12 10:12 [PATCH 00/24] Optimise page alloc/free fast paths v2 Mel Gorman
                   ` (12 preceding siblings ...)
  2016-04-12 10:12 ` [PATCH 13/24] mm, page_alloc: Remove redundant check for empty zonelist Mel Gorman
@ 2016-04-12 10:12 ` Mel Gorman
  2016-04-12 10:12 ` [PATCH 15/24] mm, page_alloc: Move might_sleep_if check to the allocator slowpath Mel Gorman
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Mel Gorman @ 2016-04-12 10:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Vlastimil Babka, Linux-MM, LKML, Mel Gorman

The current reset unnecessarily clears the last_cpupid field and performs a
pointless calculation only to write back a value with every bit in the field
set; a single OR of the mask achieves the same result.
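
A minimal standalone sketch (userspace, with stand-in constants rather than
the kernel's LAST_CPUPID_* macros) of why the single OR below is equivalent
to the old clear-then-set sequence:

    #include <assert.h>
    #include <stdio.h>

    #define SHIFT   8                       /* stand-in for LAST_CPUPID_SHIFT */
    #define PGSHIFT 4                       /* stand-in for LAST_CPUPID_PGSHIFT */
    #define MASK    ((1UL << SHIFT) - 1)    /* stand-in for LAST_CPUPID_MASK */

    int main(void)
    {
        unsigned long flags = 0xdeadbeefUL;
        unsigned long old_flags = flags, new_flags = flags;
        unsigned long cpupid = (1UL << SHIFT) - 1;  /* reset value: all ones */

        /* old reset: clear the field, then write the all-ones value back */
        old_flags &= ~(MASK << PGSHIFT);
        old_flags |= (cpupid & MASK) << PGSHIFT;

        /* new reset: setting every bit in the field needs no prior clear */
        new_flags |= MASK << PGSHIFT;

        assert(old_flags == new_flags);
        printf("old %#lx == new %#lx\n", old_flags, new_flags);
        return 0;
    }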

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 include/linux/mm.h | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ffcff53e3b2b..60656db00abd 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -837,10 +837,7 @@ extern int page_cpupid_xchg_last(struct page *page, int cpupid);
 
 static inline void page_cpupid_reset_last(struct page *page)
 {
-	int cpupid = (1 << LAST_CPUPID_SHIFT) - 1;
-
-	page->flags &= ~(LAST_CPUPID_MASK << LAST_CPUPID_PGSHIFT);
-	page->flags |= (cpupid & LAST_CPUPID_MASK) << LAST_CPUPID_PGSHIFT;
+	page->flags |= LAST_CPUPID_MASK << LAST_CPUPID_PGSHIFT;
 }
 #endif /* LAST_CPUPID_NOT_IN_PAGE_FLAGS */
 #else /* !CONFIG_NUMA_BALANCING */
-- 
2.6.4


* [PATCH 15/24] mm, page_alloc: Move might_sleep_if check to the allocator slowpath
  2016-04-12 10:12 [PATCH 00/24] Optimise page alloc/free fast paths v2 Mel Gorman
                   ` (13 preceding siblings ...)
  2016-04-12 10:12 ` [PATCH 14/24] mm, page_alloc: Simplify last cpupid reset Mel Gorman
@ 2016-04-12 10:12 ` Mel Gorman
  2016-04-12 10:12 ` [PATCH 16/24] mm, page_alloc: Move __GFP_HARDWALL modifications out of the fastpath Mel Gorman
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Mel Gorman @ 2016-04-12 10:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Vlastimil Babka, Linux-MM, LKML, Mel Gorman

There is a debugging check for callers that specify __GFP_DIRECT_RECLAIM
from a context that cannot sleep. Triggering it is almost certainly a bug,
but the check is also overhead in the fast path. Move the check to the slow
path. It will be harder to trigger, as it is only evaluated once watermarks
are depleted, but it is also now only checked in a path that is allowed to
sleep.
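
For illustration only (the lock and variable names are made up), the class
of caller bug the check is meant to catch looks like:

    spin_lock(&some_lock);                  /* atomic context, must not sleep */
    page = alloc_pages(GFP_KERNEL, 0);      /* GFP_KERNEL includes __GFP_DIRECT_RECLAIM */
    spin_unlock(&some_lock);

After the move, the warning only fires if such a caller actually reaches the
slowpath, which is where direct reclaim, and hence the actual sleeping, takes
place.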

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/page_alloc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e50e754ec9eb..73dc0413e997 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3176,6 +3176,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		return NULL;
 	}
 
+	might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM);
+
 	/*
 	 * We also sanity check to catch abuse of atomic reserves being used by
 	 * callers that are not in atomic context.
@@ -3369,8 +3371,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 
 	lockdep_trace_alloc(gfp_mask);
 
-	might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM);
-
 	if (should_fail_alloc_page(gfp_mask, order))
 		return NULL;
 
-- 
2.6.4


* [PATCH 16/24] mm, page_alloc: Move __GFP_HARDWALL modifications out of the fastpath
  2016-04-12 10:12 [PATCH 00/24] Optimise page alloc/free fast paths v2 Mel Gorman
                   ` (14 preceding siblings ...)
  2016-04-12 10:12 ` [PATCH 15/24] mm, page_alloc: Move might_sleep_if check to the allocator slowpath Mel Gorman
@ 2016-04-12 10:12 ` Mel Gorman
  2016-04-12 10:12 ` [PATCH 17/24] mm, page_alloc: Reduce cost of fair zone allocation policy retry Mel Gorman
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Mel Gorman @ 2016-04-12 10:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Vlastimil Babka, Linux-MM, LKML, Mel Gorman

__GFP_HARDWALL only has meaning in the context of cpusets but the fast path
always applies the flag on the first attempt. Move the manipulations into
the cpuset paths, where in the common case they are skipped behind a static
branch.

With the other micro-optimisations in this series combined, the impact on
a page allocator microbenchmark is

                                           4.6.0-rc2                  4.6.0-rc2
                                       decstat-v1r20                micro-v1r20
Min      alloc-odr0-1               381.00 (  0.00%)           377.00 (  1.05%)
Min      alloc-odr0-2               275.00 (  0.00%)           273.00 (  0.73%)
Min      alloc-odr0-4               229.00 (  0.00%)           226.00 (  1.31%)
Min      alloc-odr0-8               199.00 (  0.00%)           196.00 (  1.51%)
Min      alloc-odr0-16              186.00 (  0.00%)           183.00 (  1.61%)
Min      alloc-odr0-32              179.00 (  0.00%)           175.00 (  2.23%)
Min      alloc-odr0-64              174.00 (  0.00%)           172.00 (  1.15%)
Min      alloc-odr0-128             172.00 (  0.00%)           170.00 (  1.16%)
Min      alloc-odr0-256             181.00 (  0.00%)           183.00 ( -1.10%)
Min      alloc-odr0-512             193.00 (  0.00%)           191.00 (  1.04%)
Min      alloc-odr0-1024            201.00 (  0.00%)           199.00 (  1.00%)
Min      alloc-odr0-2048            206.00 (  0.00%)           204.00 (  0.97%)
Min      alloc-odr0-4096            212.00 (  0.00%)           210.00 (  0.94%)
Min      alloc-odr0-8192            215.00 (  0.00%)           213.00 (  0.93%)
Min      alloc-odr0-16384           216.00 (  0.00%)           214.00 (  0.93%)

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/page_alloc.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 73dc0413e997..219e0d05ed88 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3353,7 +3353,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 	struct page *page;
 	unsigned int cpuset_mems_cookie;
 	unsigned int alloc_flags = ALLOC_WMARK_LOW|ALLOC_FAIR;
-	gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
+	gfp_t alloc_mask = gfp_mask; /* The gfp_t that was actually used for allocation */
 	struct alloc_context ac = {
 		.high_zoneidx = gfp_zone(gfp_mask),
 		.zonelist = zonelist,
@@ -3362,6 +3362,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 	};
 
 	if (cpusets_enabled()) {
+		alloc_mask |= __GFP_HARDWALL;
 		alloc_flags |= ALLOC_CPUSET;
 		if (!ac.nodemask)
 			ac.nodemask = &cpuset_current_mems_allowed;
@@ -3391,7 +3392,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 	ac.classzone_idx = zonelist_zone_idx(preferred_zoneref);
 
 	/* First allocation attempt */
-	alloc_mask = gfp_mask|__GFP_HARDWALL;
 	page = get_page_from_freelist(alloc_mask, order, alloc_flags, &ac);
 	if (unlikely(!page)) {
 		/*
@@ -3417,8 +3417,10 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 	 * the mask is being updated. If a page allocation is about to fail,
 	 * check if the cpuset changed during allocation and if so, retry.
 	 */
-	if (unlikely(!page && read_mems_allowed_retry(cpuset_mems_cookie)))
+	if (unlikely(!page && read_mems_allowed_retry(cpuset_mems_cookie))) {
+		alloc_mask = gfp_mask;
 		goto retry_cpuset;
+	}
 
 	return page;
 }
-- 
2.6.4


* [PATCH 17/24] mm, page_alloc: Reduce cost of fair zone allocation policy retry
  2016-04-12 10:12 [PATCH 00/24] Optimise page alloc/free fast paths v2 Mel Gorman
                   ` (15 preceding siblings ...)
  2016-04-12 10:12 ` [PATCH 16/24] mm, page_alloc: Move __GFP_HARDWALL modifications out of the fastpath Mel Gorman
@ 2016-04-12 10:12 ` Mel Gorman
  2016-04-12 10:12 ` [PATCH 18/24] mm, page_alloc: Shortcut watermark checks for order-0 pages Mel Gorman
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Mel Gorman @ 2016-04-12 10:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Vlastimil Babka, Linux-MM, LKML, Mel Gorman

The fair zone allocation policy is not without cost but that cost can be
reduced slightly. This patch removes an unnecessary local variable, checks
the likely conditions of the fair zone policy first, uses a bool instead of
a flags check and falls through when a remote node is encountered instead
of doing a full restart. The benefit is marginal but it is there:

                                           4.6.0-rc2                  4.6.0-rc2
                                       decstat-v1r20              optfair-v1r20
Min      alloc-odr0-1               377.00 (  0.00%)           380.00 ( -0.80%)
Min      alloc-odr0-2               273.00 (  0.00%)           273.00 (  0.00%)
Min      alloc-odr0-4               226.00 (  0.00%)           227.00 ( -0.44%)
Min      alloc-odr0-8               196.00 (  0.00%)           196.00 (  0.00%)
Min      alloc-odr0-16              183.00 (  0.00%)           183.00 (  0.00%)
Min      alloc-odr0-32              175.00 (  0.00%)           173.00 (  1.14%)
Min      alloc-odr0-64              172.00 (  0.00%)           169.00 (  1.74%)
Min      alloc-odr0-128             170.00 (  0.00%)           169.00 (  0.59%)
Min      alloc-odr0-256             183.00 (  0.00%)           180.00 (  1.64%)
Min      alloc-odr0-512             191.00 (  0.00%)           190.00 (  0.52%)
Min      alloc-odr0-1024            199.00 (  0.00%)           198.00 (  0.50%)
Min      alloc-odr0-2048            204.00 (  0.00%)           204.00 (  0.00%)
Min      alloc-odr0-4096            210.00 (  0.00%)           209.00 (  0.48%)
Min      alloc-odr0-8192            213.00 (  0.00%)           213.00 (  0.00%)
Min      alloc-odr0-16384           214.00 (  0.00%)           214.00 (  0.00%)

The benefit is marginal at best, but one of the most important gains,
avoiding a second search when falling back to another node, is not exercised
by this particular test, so the benefit in some corner cases is understated.
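
Pieced together from the two hunks below, the restructured scan is roughly
(watermark checks and the allocation attempt omitted):

    apply_fair = alloc_flags & ALLOC_FAIR;
zonelist_scan:
    for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, ...) {
        if (apply_fair) {
            if (test_bit(ZONE_FAIR_DEPLETED, &zone->flags)) {
                fair_skipped = true;
                continue;
            }
            if (!zone_local(ac->preferred_zone, zone)) {
                if (fair_skipped)
                    goto reset_fair;
                /* remote node: stop applying the policy, keep scanning */
                apply_fair = false;
            }
        }
        /* ... watermark checks and allocation attempt ... */
    }

    if (fair_skipped) {
reset_fair:
        apply_fair = false;
        fair_skipped = false;
        reset_alloc_batches(ac->preferred_zone);
        goto zonelist_scan;
    }
    return NULL;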

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/page_alloc.c | 32 ++++++++++++++------------------
 1 file changed, 14 insertions(+), 18 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 219e0d05ed88..25a8ab07b287 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2675,12 +2675,10 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 {
 	struct zoneref *z;
 	struct zone *zone;
-	bool fair_skipped;
-	bool zonelist_rescan;
+	bool fair_skipped = false;
+	bool apply_fair = (alloc_flags & ALLOC_FAIR);
 
 zonelist_scan:
-	zonelist_rescan = false;
-
 	/*
 	 * Scan zonelist, looking for a zone with enough free.
 	 * See also __cpuset_node_allowed() comment in kernel/cpuset.c.
@@ -2700,13 +2698,16 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 		 * page was allocated in should have no effect on the
 		 * time the page has in memory before being reclaimed.
 		 */
-		if (alloc_flags & ALLOC_FAIR) {
-			if (!zone_local(ac->preferred_zone, zone))
-				break;
+		if (apply_fair) {
 			if (test_bit(ZONE_FAIR_DEPLETED, &zone->flags)) {
 				fair_skipped = true;
 				continue;
 			}
+			if (!zone_local(ac->preferred_zone, zone)) {
+				if (fair_skipped)
+					goto reset_fair;
+				apply_fair = false;
+			}
 		}
 		/*
 		 * When allocating a page cache page for writing, we
@@ -2795,18 +2796,13 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 	 * include remote zones now, before entering the slowpath and waking
 	 * kswapd: prefer spilling to a remote zone over swapping locally.
 	 */
-	if (alloc_flags & ALLOC_FAIR) {
-		alloc_flags &= ~ALLOC_FAIR;
-		if (fair_skipped) {
-			zonelist_rescan = true;
-			reset_alloc_batches(ac->preferred_zone);
-		}
-		if (nr_online_nodes > 1)
-			zonelist_rescan = true;
-	}
-
-	if (zonelist_rescan)
+	if (fair_skipped) {
+reset_fair:
+		apply_fair = false;
+		fair_skipped = false;
+		reset_alloc_batches(ac->preferred_zone);
 		goto zonelist_scan;
+	}
 
 	return NULL;
 }
-- 
2.6.4


* [PATCH 18/24] mm, page_alloc: Shortcut watermark checks for order-0 pages
  2016-04-12 10:12 [PATCH 00/24] Optimise page alloc/free fast paths v2 Mel Gorman
                   ` (16 preceding siblings ...)
  2016-04-12 10:12 ` [PATCH 17/24] mm, page_alloc: Reduce cost of fair zone allocation policy retry Mel Gorman
@ 2016-04-12 10:12 ` Mel Gorman
  2016-04-12 10:12 ` [PATCH 19/24] mm, page_alloc: Avoid looking up the first zone in a zonelist twice Mel Gorman
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Mel Gorman @ 2016-04-12 10:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Vlastimil Babka, Linux-MM, LKML, Mel Gorman

Watermarks have to be checked on every allocation, taking into account the
number of pages being allocated and whether reserves can be accessed. The
reserves only matter if memory is limited and the free_pages adjustment only
applies to high-order pages. This patch adds a shortcut for order-0 pages
that avoids the detailed calculation when there is plenty of free memory.
For example, if a zone has tens of thousands of free pages against a
watermark plus lowmem reserve of a few thousand, the single order-0
comparison passes immediately and __zone_watermark_ok() is never entered.
This yields the following performance difference in a page allocator
microbenchmark:

                                           4.6.0-rc2                  4.6.0-rc2
                                       optfair-v1r20             fastmark-v1r20
Min      alloc-odr0-1               380.00 (  0.00%)           364.00 (  4.21%)
Min      alloc-odr0-2               273.00 (  0.00%)           262.00 (  4.03%)
Min      alloc-odr0-4               227.00 (  0.00%)           214.00 (  5.73%)
Min      alloc-odr0-8               196.00 (  0.00%)           186.00 (  5.10%)
Min      alloc-odr0-16              183.00 (  0.00%)           173.00 (  5.46%)
Min      alloc-odr0-32              173.00 (  0.00%)           165.00 (  4.62%)
Min      alloc-odr0-64              169.00 (  0.00%)           161.00 (  4.73%)
Min      alloc-odr0-128             169.00 (  0.00%)           159.00 (  5.92%)
Min      alloc-odr0-256             180.00 (  0.00%)           168.00 (  6.67%)
Min      alloc-odr0-512             190.00 (  0.00%)           180.00 (  5.26%)
Min      alloc-odr0-1024            198.00 (  0.00%)           190.00 (  4.04%)
Min      alloc-odr0-2048            204.00 (  0.00%)           196.00 (  3.92%)
Min      alloc-odr0-4096            209.00 (  0.00%)           202.00 (  3.35%)
Min      alloc-odr0-8192            213.00 (  0.00%)           206.00 (  3.29%)
Min      alloc-odr0-16384           214.00 (  0.00%)           206.00 (  3.74%)

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/page_alloc.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 25a8ab07b287..c131218913e8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2618,6 +2618,32 @@ bool zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 					zone_page_state(z, NR_FREE_PAGES));
 }
 
+static inline bool zone_watermark_fast(struct zone *z, unsigned int order,
+		unsigned long mark, int classzone_idx, unsigned int alloc_flags)
+{
+	long free_pages = zone_page_state(z, NR_FREE_PAGES);
+	long cma_pages = 0;
+
+#ifdef CONFIG_CMA
+	/* If allocation can't use CMA areas don't use free CMA pages */
+	if (!(alloc_flags & ALLOC_CMA))
+		cma_pages = zone_page_state(z, NR_FREE_CMA_PAGES);
+#endif
+
+	/*
+	 * Fast check for order-0 only. If this fails then the reserves
+	 * need to be calculated. There is a corner case where the check
+	 * passes but only the high-order atomic reserve are free. If
+	 * the caller is !atomic then it'll uselessly search the free
+	 * list. That corner case is then slower but it is harmless.
+	 */
+	if (!order && (free_pages - cma_pages) > mark + z->lowmem_reserve[classzone_idx])
+		return true;
+
+	return __zone_watermark_ok(z, order, mark, classzone_idx, alloc_flags,
+					free_pages);
+}
+
 bool zone_watermark_ok_safe(struct zone *z, unsigned int order,
 			unsigned long mark, int classzone_idx)
 {
@@ -2739,7 +2765,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 			continue;
 
 		mark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK];
-		if (!zone_watermark_ok(zone, order, mark,
+		if (!zone_watermark_fast(zone, order, mark,
 				       ac->classzone_idx, alloc_flags)) {
 			int ret;
 
-- 
2.6.4


* [PATCH 19/24] mm, page_alloc: Avoid looking up the first zone in a zonelist twice
  2016-04-12 10:12 [PATCH 00/24] Optimise page alloc/free fast paths v2 Mel Gorman
                   ` (17 preceding siblings ...)
  2016-04-12 10:12 ` [PATCH 18/24] mm, page_alloc: Shortcut watermark checks for order-0 pages Mel Gorman
@ 2016-04-12 10:12 ` Mel Gorman
  2016-04-12 10:12 ` [PATCH 20/24] mm, page_alloc: Check multiple page fields with a single branch Mel Gorman
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Mel Gorman @ 2016-04-12 10:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Vlastimil Babka, Linux-MM, LKML, Mel Gorman

The allocator fast path looks up the first usable zone in a zonelist and
then get_page_from_freelist() repeats the same lookup in the zonelist
iterator. This patch preserves the zoneref returned by the first lookup so
that get_page_from_freelist() can start from it.

                                           4.6.0-rc2                  4.6.0-rc2
                                      fastmark-v1r20             initonce-v1r20
Min      alloc-odr0-1               364.00 (  0.00%)           359.00 (  1.37%)
Min      alloc-odr0-2               262.00 (  0.00%)           260.00 (  0.76%)
Min      alloc-odr0-4               214.00 (  0.00%)           214.00 (  0.00%)
Min      alloc-odr0-8               186.00 (  0.00%)           186.00 (  0.00%)
Min      alloc-odr0-16              173.00 (  0.00%)           173.00 (  0.00%)
Min      alloc-odr0-32              165.00 (  0.00%)           165.00 (  0.00%)
Min      alloc-odr0-64              161.00 (  0.00%)           162.00 ( -0.62%)
Min      alloc-odr0-128             159.00 (  0.00%)           161.00 ( -1.26%)
Min      alloc-odr0-256             168.00 (  0.00%)           170.00 ( -1.19%)
Min      alloc-odr0-512             180.00 (  0.00%)           181.00 ( -0.56%)
Min      alloc-odr0-1024            190.00 (  0.00%)           190.00 (  0.00%)
Min      alloc-odr0-2048            196.00 (  0.00%)           196.00 (  0.00%)
Min      alloc-odr0-4096            202.00 (  0.00%)           202.00 (  0.00%)
Min      alloc-odr0-8192            206.00 (  0.00%)           205.00 (  0.49%)
Min      alloc-odr0-16384           206.00 (  0.00%)           205.00 (  0.49%)

The benefit is negligible and the results are within the noise, but each
cycle counts.
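
For callers outside the allocator the conversion is mechanical; schematically
(see the fs/buffer.c and mm/mempolicy.c hunks below):

    /* before: the helper filled in a struct zone pointer */
    struct zone *zone;
    (void)first_zones_zonelist(zonelist, highest_zoneidx, nodemask, &zone);
    if (zone)
        ...

    /* after: the returned zoneref carries both the zone and its index */
    struct zoneref *z;
    z = first_zones_zonelist(zonelist, highest_zoneidx, nodemask);
    if (z->zone)
        ...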

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 fs/buffer.c            | 10 +++++-----
 include/linux/mmzone.h | 18 +++++++++++-------
 mm/internal.h          |  2 +-
 mm/mempolicy.c         | 19 ++++++++++---------
 mm/page_alloc.c        | 34 ++++++++++++++++------------------
 5 files changed, 43 insertions(+), 40 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index af0d9a82a8ed..754813a6962b 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -255,17 +255,17 @@ __find_get_block_slow(struct block_device *bdev, sector_t block)
  */
 static void free_more_memory(void)
 {
-	struct zone *zone;
+	struct zoneref *z;
 	int nid;
 
 	wakeup_flusher_threads(1024, WB_REASON_FREE_MORE_MEM);
 	yield();
 
 	for_each_online_node(nid) {
-		(void)first_zones_zonelist(node_zonelist(nid, GFP_NOFS),
-						gfp_zone(GFP_NOFS), NULL,
-						&zone);
-		if (zone)
+
+		z = first_zones_zonelist(node_zonelist(nid, GFP_NOFS),
+						gfp_zone(GFP_NOFS), NULL);
+		if (z->zone)
 			try_to_free_pages(node_zonelist(nid, GFP_NOFS), 0,
 						GFP_NOFS, NULL);
 	}
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index f49bb9add372..bf153ed097d5 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -962,13 +962,10 @@ static __always_inline struct zoneref *next_zones_zonelist(struct zoneref *z,
  */
 static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist,
 					enum zone_type highest_zoneidx,
-					nodemask_t *nodes,
-					struct zone **zone)
+					nodemask_t *nodes)
 {
-	struct zoneref *z = next_zones_zonelist(zonelist->_zonerefs,
+	return next_zones_zonelist(zonelist->_zonerefs,
 							highest_zoneidx, nodes);
-	*zone = zonelist_zone(z);
-	return z;
 }
 
 /**
@@ -983,10 +980,17 @@ static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist,
  * within a given nodemask
  */
 #define for_each_zone_zonelist_nodemask(zone, z, zlist, highidx, nodemask) \
-	for (z = first_zones_zonelist(zlist, highidx, nodemask, &zone);	\
+	for (z = first_zones_zonelist(zlist, highidx, nodemask), zone = zonelist_zone(z);	\
 		zone;							\
 		z = next_zones_zonelist(++z, highidx, nodemask),	\
-			zone = zonelist_zone(z))			\
+			zone = zonelist_zone(z))
+
+#define for_next_zone_zonelist_nodemask(zone, z, zlist, highidx, nodemask) \
+	for (zone = z->zone;	\
+		zone;							\
+		z = next_zones_zonelist(++z, highidx, nodemask),	\
+			zone = zonelist_zone(z))
+
 
 /**
  * for_each_zone_zonelist - helper macro to iterate over valid zones in a zonelist at or below a given zone index
diff --git a/mm/internal.h b/mm/internal.h
index f6d0a5875ec4..4c2396cd514c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -102,7 +102,7 @@ extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address);
 struct alloc_context {
 	struct zonelist *zonelist;
 	nodemask_t *nodemask;
-	struct zone *preferred_zone;
+	struct zoneref *preferred_zoneref;
 	int classzone_idx;
 	int migratetype;
 	enum zone_type high_zoneidx;
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 36cc01bc950a..66d73efba370 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1744,18 +1744,18 @@ unsigned int mempolicy_slab_node(void)
 		return interleave_nodes(policy);
 
 	case MPOL_BIND: {
+		struct zoneref *z;
+
 		/*
 		 * Follow bind policy behavior and start allocation at the
 		 * first node.
 		 */
 		struct zonelist *zonelist;
-		struct zone *zone;
 		enum zone_type highest_zoneidx = gfp_zone(GFP_KERNEL);
 		zonelist = &NODE_DATA(node)->node_zonelists[0];
-		(void)first_zones_zonelist(zonelist, highest_zoneidx,
-							&policy->v.nodes,
-							&zone);
-		return zone ? zone->node : node;
+		z = first_zones_zonelist(zonelist, highest_zoneidx,
+							&policy->v.nodes);
+		return z->zone ? z->zone->node : node;
 	}
 
 	default:
@@ -2284,7 +2284,7 @@ static void sp_free(struct sp_node *n)
 int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long addr)
 {
 	struct mempolicy *pol;
-	struct zone *zone;
+	struct zoneref *z;
 	int curnid = page_to_nid(page);
 	unsigned long pgoff;
 	int thiscpu = raw_smp_processor_id();
@@ -2316,6 +2316,7 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
 		break;
 
 	case MPOL_BIND:
+
 		/*
 		 * allows binding to multiple nodes.
 		 * use current page if in policy nodemask,
@@ -2324,11 +2325,11 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
 		 */
 		if (node_isset(curnid, pol->v.nodes))
 			goto out;
-		(void)first_zones_zonelist(
+		z = first_zones_zonelist(
 				node_zonelist(numa_node_id(), GFP_HIGHUSER),
 				gfp_zone(GFP_HIGHUSER),
-				&pol->v.nodes, &zone);
-		polnid = zone->node;
+				&pol->v.nodes);
+		polnid = z->zone->node;
 		break;
 
 	default:
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c131218913e8..4019dfe26b11 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2699,7 +2699,7 @@ static struct page *
 get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 						const struct alloc_context *ac)
 {
-	struct zoneref *z;
+	struct zoneref *z = ac->preferred_zoneref;
 	struct zone *zone;
 	bool fair_skipped = false;
 	bool apply_fair = (alloc_flags & ALLOC_FAIR);
@@ -2709,7 +2709,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 	 * Scan zonelist, looking for a zone with enough free.
 	 * See also __cpuset_node_allowed() comment in kernel/cpuset.c.
 	 */
-	for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, ac->high_zoneidx,
+	for_next_zone_zonelist_nodemask(zone, z, ac->zonelist, ac->high_zoneidx,
 								ac->nodemask) {
 		struct page *page;
 		unsigned long mark;
@@ -2729,7 +2729,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 				fair_skipped = true;
 				continue;
 			}
-			if (!zone_local(ac->preferred_zone, zone)) {
+			if (!zone_local(ac->preferred_zoneref->zone, zone)) {
 				if (fair_skipped)
 					goto reset_fair;
 				apply_fair = false;
@@ -2775,7 +2775,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 				goto try_this_zone;
 
 			if (zone_reclaim_mode == 0 ||
-			    !zone_allows_reclaim(ac->preferred_zone, zone))
+			    !zone_allows_reclaim(ac->preferred_zoneref->zone, zone))
 				continue;
 
 			ret = zone_reclaim(zone, gfp_mask, order);
@@ -2797,7 +2797,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 		}
 
 try_this_zone:
-		page = buffered_rmqueue(ac->preferred_zone, zone, order,
+		page = buffered_rmqueue(ac->preferred_zoneref->zone, zone, order,
 				gfp_mask, alloc_flags, ac->migratetype);
 		if (page) {
 			if (prep_new_page(page, order, gfp_mask, alloc_flags))
@@ -2826,7 +2826,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 reset_fair:
 		apply_fair = false;
 		fair_skipped = false;
-		reset_alloc_batches(ac->preferred_zone);
+		reset_alloc_batches(ac->preferred_zoneref->zone);
 		goto zonelist_scan;
 	}
 
@@ -3113,7 +3113,7 @@ static void wake_all_kswapds(unsigned int order, const struct alloc_context *ac)
 
 	for_each_zone_zonelist_nodemask(zone, z, ac->zonelist,
 						ac->high_zoneidx, ac->nodemask)
-		wakeup_kswapd(zone, order, zone_idx(ac->preferred_zone));
+		wakeup_kswapd(zone, order, zonelist_zone_idx(ac->preferred_zoneref));
 }
 
 static inline unsigned int
@@ -3333,7 +3333,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	if ((did_some_progress && order <= PAGE_ALLOC_COSTLY_ORDER) ||
 	    ((gfp_mask & __GFP_REPEAT) && pages_reclaimed < (1 << order))) {
 		/* Wait for some write requests to complete then retry */
-		wait_iff_congested(ac->preferred_zone, BLK_RW_ASYNC, HZ/50);
+		wait_iff_congested(ac->preferred_zoneref->zone, BLK_RW_ASYNC, HZ/50);
 		goto retry;
 	}
 
@@ -3371,7 +3371,6 @@ struct page *
 __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 			struct zonelist *zonelist, nodemask_t *nodemask)
 {
-	struct zoneref *preferred_zoneref;
 	struct page *page;
 	unsigned int cpuset_mems_cookie;
 	unsigned int alloc_flags = ALLOC_WMARK_LOW|ALLOC_FAIR;
@@ -3407,11 +3406,11 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 	ac.spread_dirty_pages = (gfp_mask & __GFP_WRITE);
 
 	/* The preferred zone is used for statistics later */
-	preferred_zoneref = first_zones_zonelist(ac.zonelist, ac.high_zoneidx,
-				ac.nodemask, &ac.preferred_zone);
-	if (!ac.preferred_zone)
+	ac.preferred_zoneref = first_zones_zonelist(ac.zonelist, ac.high_zoneidx,
+				ac.nodemask);
+	if (!ac.preferred_zoneref->zone)
 		goto out;
-	ac.classzone_idx = zonelist_zone_idx(preferred_zoneref);
+	ac.classzone_idx = zonelist_zone_idx(ac.preferred_zoneref);
 
 	/* First allocation attempt */
 	page = get_page_from_freelist(alloc_mask, order, alloc_flags, &ac);
@@ -4440,13 +4439,12 @@ static void build_zonelists(pg_data_t *pgdat)
  */
 int local_memory_node(int node)
 {
-	struct zone *zone;
+	struct zoneref *z;
 
-	(void)first_zones_zonelist(node_zonelist(node, GFP_KERNEL),
+	z = first_zones_zonelist(node_zonelist(node, GFP_KERNEL),
 				   gfp_zone(GFP_KERNEL),
-				   NULL,
-				   &zone);
-	return zone->node;
+				   NULL);
+	return z->zone->node;
 }
 #endif
 
-- 
2.6.4


* [PATCH 20/24] mm, page_alloc: Check multiple page fields with a single branch
  2016-04-12 10:12 [PATCH 00/24] Optimise page alloc/free fast paths v2 Mel Gorman
                   ` (18 preceding siblings ...)
  2016-04-12 10:12 ` [PATCH 19/24] mm, page_alloc: Avoid looking up the first zone in a zonelist twice Mel Gorman
@ 2016-04-12 10:12 ` Mel Gorman
  2016-04-12 10:12 ` [PATCH 21/24] cpuset: use static key better and convert to new API Mel Gorman
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Mel Gorman @ 2016-04-12 10:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Vlastimil Babka, Linux-MM, LKML, Mel Gorman

Every page allocated or freed is checked for sanity to avoid corruptions
that are difficult to detect later. A bad page can show up in any one of a
number of fields. Instead of testing each field with its own branch, this
patch folds the fields into a single combined check; the detailed per-field
check is only necessary if the combined check fails.

                                           4.6.0-rc2                  4.6.0-rc2
                                      initonce-v1r20            multcheck-v1r20
Min      alloc-odr0-1               359.00 (  0.00%)           348.00 (  3.06%)
Min      alloc-odr0-2               260.00 (  0.00%)           254.00 (  2.31%)
Min      alloc-odr0-4               214.00 (  0.00%)           213.00 (  0.47%)
Min      alloc-odr0-8               186.00 (  0.00%)           186.00 (  0.00%)
Min      alloc-odr0-16              173.00 (  0.00%)           173.00 (  0.00%)
Min      alloc-odr0-32              165.00 (  0.00%)           166.00 ( -0.61%)
Min      alloc-odr0-64              162.00 (  0.00%)           162.00 (  0.00%)
Min      alloc-odr0-128             161.00 (  0.00%)           160.00 (  0.62%)
Min      alloc-odr0-256             170.00 (  0.00%)           169.00 (  0.59%)
Min      alloc-odr0-512             181.00 (  0.00%)           180.00 (  0.55%)
Min      alloc-odr0-1024            190.00 (  0.00%)           188.00 (  1.05%)
Min      alloc-odr0-2048            196.00 (  0.00%)           194.00 (  1.02%)
Min      alloc-odr0-4096            202.00 (  0.00%)           199.00 (  1.49%)
Min      alloc-odr0-8192            205.00 (  0.00%)           202.00 (  1.46%)
Min      alloc-odr0-16384           205.00 (  0.00%)           203.00 (  0.98%)

Again, the benefit is marginal but avoiding excessive branches is
important. Ideally the paths would not have to check these conditions at
all but regrettably abandoning the tests would make use-after-free bugs
much harder to detect.
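
A standalone userspace sketch of the single-branch idea (illustrative only;
the struct, field names and values are made up, not the kernel's):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    struct obj {
        void *mapping;          /* expected NULL */
        int refcount;           /* expected 0 */
        unsigned long flags;    /* expected to have no check_flags bits set */
    };

    /* OR the "must be zero" fields together: the common good case pays for
     * one test and one branch instead of one per field. */
    static bool obj_expected_state(const struct obj *o, unsigned long check_flags)
    {
        return ((uintptr_t)o->mapping |
                (unsigned int)o->refcount |
                (o->flags & check_flags)) == 0;
    }

    int main(void)
    {
        struct obj good = { NULL, 0, 0x10 };
        struct obj bad  = { NULL, 1, 0x10 };

        printf("good: %d bad: %d\n",
               obj_expected_state(&good, 0x1),
               obj_expected_state(&bad, 0x1));
        return 0;
    }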

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/page_alloc.c | 55 +++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 43 insertions(+), 12 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4019dfe26b11..0100609f6510 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -784,10 +784,42 @@ static inline void __free_one_page(struct page *page,
 	zone->free_area[order].nr_free++;
 }
 
+/*
+ * A bad page could be due to a number of fields. Instead of multiple branches,
+ * try and check multiple fields with one check. The caller must do a detailed
+ * check if necessary.
+ */
+static inline bool page_expected_state(struct page *page,
+					unsigned long check_flags)
+{
+	if (unlikely(atomic_read(&page->_mapcount) != -1))
+		return false;
+
+	if (unlikely((unsigned long)page->mapping |
+			page_ref_count(page) |
+#ifdef CONFIG_MEMCG
+			(unsigned long)page->mem_cgroup |
+#endif
+			(page->flags & check_flags)))
+		return false;
+
+	return true;
+}
+
 static inline int free_pages_check(struct page *page)
 {
-	const char *bad_reason = NULL;
-	unsigned long bad_flags = 0;
+	const char *bad_reason;
+	unsigned long bad_flags;
+
+	if (page_expected_state(page, PAGE_FLAGS_CHECK_AT_FREE)) {
+		page_cpupid_reset_last(page);
+		page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
+		return 0;
+	}
+
+	/* Something has gone sideways, find it */
+	bad_reason = NULL;
+	bad_flags = 0;
 
 	if (unlikely(atomic_read(&page->_mapcount) != -1))
 		bad_reason = "nonzero mapcount";
@@ -803,14 +835,8 @@ static inline int free_pages_check(struct page *page)
 	if (unlikely(page->mem_cgroup))
 		bad_reason = "page still charged to cgroup";
 #endif
-	if (unlikely(bad_reason)) {
-		bad_page(page, bad_reason, bad_flags);
-		return 1;
-	}
-	page_cpupid_reset_last(page);
-	if (page->flags & PAGE_FLAGS_CHECK_AT_PREP)
-		page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
-	return 0;
+	bad_page(page, bad_reason, bad_flags);
+	return 1;
 }
 
 /*
@@ -1491,9 +1517,14 @@ static inline void expand(struct zone *zone, struct page *page,
  */
 static inline int check_new_page(struct page *page)
 {
-	const char *bad_reason = NULL;
-	unsigned long bad_flags = 0;
+	const char *bad_reason;
+	unsigned long bad_flags;
+
+	if (page_expected_state(page, PAGE_FLAGS_CHECK_AT_PREP|__PG_HWPOISON))
+		return 0;
 
+	bad_reason = NULL;
+	bad_flags = 0;
 	if (unlikely(atomic_read(&page->_mapcount) != -1))
 		bad_reason = "nonzero mapcount";
 	if (unlikely(page->mapping != NULL))
-- 
2.6.4


* [PATCH 21/24] cpuset: use static key better and convert to new API
  2016-04-12 10:12 [PATCH 00/24] Optimise page alloc/free fast paths v2 Mel Gorman
                   ` (19 preceding siblings ...)
  2016-04-12 10:12 ` [PATCH 20/24] mm, page_alloc: Check multiple page fields with a single branch Mel Gorman
@ 2016-04-12 10:12 ` Mel Gorman
  2016-04-12 10:12 ` [PATCH 22/24] mm, page_alloc: Check once if a zone has isolated pageblocks Mel Gorman
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Mel Gorman @ 2016-04-12 10:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Vlastimil Babka, Linux-MM, LKML, Mel Gorman

From: Vlastimil Babka <vbabka@suse.cz>

An important function for cpusets is cpuset_node_allowed(), which optimizes on
the fact that if there is only the single root cpuset, allocation must be
trivially allowed. But the "nr_cpusets() <= 1" check doesn't use the
cpusets_enabled_key static key the way static keys are intended, where jump
labels eliminate the branching overhead.

This patch converts it so that the static key is used properly. It is also
switched to the new static key API and the checking functions are converted
to return bool instead of int. We also provide a new variant
__cpuset_zone_allowed() which expects that the static key check was already
done and the key was enabled. This is needed for get_page_from_freelist(),
where we also want to avoid the relatively slower check when ALLOC_CPUSET is
not set in alloc_flags.
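
The general shape of the new API, as used in the hunks below (sketch with
made-up names):

    DEFINE_STATIC_KEY_FALSE(my_feature_key);        /* key starts disabled */

    if (static_branch_unlikely(&my_feature_key))    /* patched-out jump while disabled */
        do_slow_feature_work();

    static_branch_inc(&my_feature_key);             /* enable (reference counted) */
    static_branch_dec(&my_feature_key);             /* disable when the count drops */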

The impact on the page allocator microbenchmark is less than expected but the
cleanup in itself is worthwhile.

                                           4.6.0-rc2                  4.6.0-rc2
                                     multcheck-v1r20               cpuset-v1r20
Min      alloc-odr0-1               348.00 (  0.00%)           348.00 (  0.00%)
Min      alloc-odr0-2               254.00 (  0.00%)           254.00 (  0.00%)
Min      alloc-odr0-4               213.00 (  0.00%)           213.00 (  0.00%)
Min      alloc-odr0-8               186.00 (  0.00%)           183.00 (  1.61%)
Min      alloc-odr0-16              173.00 (  0.00%)           171.00 (  1.16%)
Min      alloc-odr0-32              166.00 (  0.00%)           163.00 (  1.81%)
Min      alloc-odr0-64              162.00 (  0.00%)           159.00 (  1.85%)
Min      alloc-odr0-128             160.00 (  0.00%)           157.00 (  1.88%)
Min      alloc-odr0-256             169.00 (  0.00%)           166.00 (  1.78%)
Min      alloc-odr0-512             180.00 (  0.00%)           180.00 (  0.00%)
Min      alloc-odr0-1024            188.00 (  0.00%)           187.00 (  0.53%)
Min      alloc-odr0-2048            194.00 (  0.00%)           193.00 (  0.52%)
Min      alloc-odr0-4096            199.00 (  0.00%)           198.00 (  0.50%)
Min      alloc-odr0-8192            202.00 (  0.00%)           201.00 (  0.50%)
Min      alloc-odr0-16384           203.00 (  0.00%)           202.00 (  0.49%)

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 include/linux/cpuset.h | 42 ++++++++++++++++++++++++++++--------------
 kernel/cpuset.c        | 14 +++++++-------
 mm/page_alloc.c        |  2 +-
 3 files changed, 36 insertions(+), 22 deletions(-)

diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index fea160ee5803..054c734d0170 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -16,26 +16,26 @@
 
 #ifdef CONFIG_CPUSETS
 
-extern struct static_key cpusets_enabled_key;
+extern struct static_key_false cpusets_enabled_key;
 static inline bool cpusets_enabled(void)
 {
-	return static_key_false(&cpusets_enabled_key);
+	return static_branch_unlikely(&cpusets_enabled_key);
 }
 
 static inline int nr_cpusets(void)
 {
 	/* jump label reference count + the top-level cpuset */
-	return static_key_count(&cpusets_enabled_key) + 1;
+	return static_key_count(&cpusets_enabled_key.key) + 1;
 }
 
 static inline void cpuset_inc(void)
 {
-	static_key_slow_inc(&cpusets_enabled_key);
+	static_branch_inc(&cpusets_enabled_key);
 }
 
 static inline void cpuset_dec(void)
 {
-	static_key_slow_dec(&cpusets_enabled_key);
+	static_branch_dec(&cpusets_enabled_key);
 }
 
 extern int cpuset_init(void);
@@ -48,16 +48,25 @@ extern nodemask_t cpuset_mems_allowed(struct task_struct *p);
 void cpuset_init_current_mems_allowed(void);
 int cpuset_nodemask_valid_mems_allowed(nodemask_t *nodemask);
 
-extern int __cpuset_node_allowed(int node, gfp_t gfp_mask);
+extern bool __cpuset_node_allowed(int node, gfp_t gfp_mask);
 
-static inline int cpuset_node_allowed(int node, gfp_t gfp_mask)
+static inline bool cpuset_node_allowed(int node, gfp_t gfp_mask)
 {
-	return nr_cpusets() <= 1 || __cpuset_node_allowed(node, gfp_mask);
+	if (cpusets_enabled())
+		return __cpuset_node_allowed(node, gfp_mask);
+	return true;
 }
 
-static inline int cpuset_zone_allowed(struct zone *z, gfp_t gfp_mask)
+static inline bool __cpuset_zone_allowed(struct zone *z, gfp_t gfp_mask)
 {
-	return cpuset_node_allowed(zone_to_nid(z), gfp_mask);
+	return __cpuset_node_allowed(zone_to_nid(z), gfp_mask);
+}
+
+static inline bool cpuset_zone_allowed(struct zone *z, gfp_t gfp_mask)
+{
+	if (cpusets_enabled())
+		return __cpuset_zone_allowed(z, gfp_mask);
+	return true;
 }
 
 extern int cpuset_mems_allowed_intersects(const struct task_struct *tsk1,
@@ -174,14 +183,19 @@ static inline int cpuset_nodemask_valid_mems_allowed(nodemask_t *nodemask)
 	return 1;
 }
 
-static inline int cpuset_node_allowed(int node, gfp_t gfp_mask)
+static inline bool cpuset_node_allowed(int node, gfp_t gfp_mask)
 {
-	return 1;
+	return true;
 }
 
-static inline int cpuset_zone_allowed(struct zone *z, gfp_t gfp_mask)
+static inline bool __cpuset_zone_allowed(struct zone *z, gfp_t gfp_mask)
 {
-	return 1;
+	return true;
+}
+
+static inline bool cpuset_zone_allowed(struct zone *z, gfp_t gfp_mask)
+{
+	return true;
 }
 
 static inline int cpuset_mems_allowed_intersects(const struct task_struct *tsk1,
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 00ab5c2b7c5b..37a0b44d101f 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -62,7 +62,7 @@
 #include <linux/cgroup.h>
 #include <linux/wait.h>
 
-struct static_key cpusets_enabled_key __read_mostly = STATIC_KEY_INIT_FALSE;
+DEFINE_STATIC_KEY_FALSE(cpusets_enabled_key);
 
 /* See "Frequency meter" comments, below. */
 
@@ -2528,27 +2528,27 @@ static struct cpuset *nearest_hardwall_ancestor(struct cpuset *cs)
  *	GFP_KERNEL   - any node in enclosing hardwalled cpuset ok
  *	GFP_USER     - only nodes in current tasks mems allowed ok.
  */
-int __cpuset_node_allowed(int node, gfp_t gfp_mask)
+bool __cpuset_node_allowed(int node, gfp_t gfp_mask)
 {
 	struct cpuset *cs;		/* current cpuset ancestors */
 	int allowed;			/* is allocation in zone z allowed? */
 	unsigned long flags;
 
 	if (in_interrupt())
-		return 1;
+		return true;
 	if (node_isset(node, current->mems_allowed))
-		return 1;
+		return true;
 	/*
 	 * Allow tasks that have access to memory reserves because they have
 	 * been OOM killed to get memory anywhere.
 	 */
 	if (unlikely(test_thread_flag(TIF_MEMDIE)))
-		return 1;
+		return true;
 	if (gfp_mask & __GFP_HARDWALL)	/* If hardwall request, stop here */
-		return 0;
+		return false;
 
 	if (current->flags & PF_EXITING) /* Let dying task have memory */
-		return 1;
+		return true;
 
 	/* Not hardwall and node outside mems_allowed: scan up cpusets */
 	spin_lock_irqsave(&callback_lock, flags);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0100609f6510..3fd8489b3055 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2747,7 +2747,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 
 		if (cpusets_enabled() &&
 			(alloc_flags & ALLOC_CPUSET) &&
-			!cpuset_zone_allowed(zone, gfp_mask))
+			!__cpuset_zone_allowed(zone, gfp_mask))
 				continue;
 		/*
 		 * Distribute pages in proportion to the individual
-- 
2.6.4


* [PATCH 22/24] mm, page_alloc: Check once if a zone has isolated pageblocks
  2016-04-12 10:12 [PATCH 00/24] Optimise page alloc/free fast paths v2 Mel Gorman
                   ` (20 preceding siblings ...)
  2016-04-12 10:12 ` [PATCH 21/24] cpuset: use static key better and convert to new API Mel Gorman
@ 2016-04-12 10:12 ` Mel Gorman
  2016-04-12 10:12 ` [PATCH 23/24] mm, page_alloc: Remove unnecessary variable from free_pcppages_bulk Mel Gorman
  2016-04-12 10:12 ` [PATCH 24/24] mm, page_alloc: Do not lookup pcp migratetype during bulk free Mel Gorman
  23 siblings, 0 replies; 25+ messages in thread
From: Mel Gorman @ 2016-04-12 10:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Vlastimil Babka, Linux-MM, LKML, Mel Gorman

When bulk freeing pages from the per-cpu lists, the zone is checked for
isolated pageblocks on every page released. This patch checks it once per
drain. Technically this is race-prone, but so is the existing code.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/page_alloc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3fd8489b3055..854925c99c23 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -857,6 +857,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 	int batch_free = 0;
 	int to_free = count;
 	unsigned long nr_scanned;
+	bool isolated_pageblocks = has_isolate_pageblock(zone);
 
 	spin_lock(&zone->lock);
 	nr_scanned = zone_page_state(zone, NR_PAGES_SCANNED);
@@ -896,7 +897,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 			/* MIGRATE_ISOLATE page should not go to pcplists */
 			VM_BUG_ON_PAGE(is_migrate_isolate(mt), page);
 			/* Pageblock could have been isolated meanwhile */
-			if (unlikely(has_isolate_pageblock(zone)))
+			if (unlikely(isolated_pageblocks))
 				mt = get_pageblock_migratetype(page);
 
 			__free_one_page(page, page_to_pfn(page), zone, 0, mt);
-- 
2.6.4


* [PATCH 23/24] mm, page_alloc: Remove unnecessary variable from free_pcppages_bulk
  2016-04-12 10:12 [PATCH 00/24] Optimise page alloc/free fast paths v2 Mel Gorman
                   ` (21 preceding siblings ...)
  2016-04-12 10:12 ` [PATCH 22/24] mm, page_alloc: Check once if a zone has isolated pageblocks Mel Gorman
@ 2016-04-12 10:12 ` Mel Gorman
  2016-04-12 10:12 ` [PATCH 24/24] mm, page_alloc: Do not lookup pcp migratetype during bulk free Mel Gorman
  23 siblings, 0 replies; 25+ messages in thread
From: Mel Gorman @ 2016-04-12 10:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Vlastimil Babka, Linux-MM, LKML, Mel Gorman

The original value of count is never used after the loop, so the to_free
copy of it can be removed and count decremented directly.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/page_alloc.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 854925c99c23..1b1553c1156c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -855,7 +855,6 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 {
 	int migratetype = 0;
 	int batch_free = 0;
-	int to_free = count;
 	unsigned long nr_scanned;
 	bool isolated_pageblocks = has_isolate_pageblock(zone);
 
@@ -864,7 +863,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 	if (nr_scanned)
 		__mod_zone_page_state(zone, NR_PAGES_SCANNED, -nr_scanned);
 
-	while (to_free) {
+	while (count) {
 		struct page *page;
 		struct list_head *list;
 
@@ -884,7 +883,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 
 		/* This is the only non-empty list. Free them all. */
 		if (batch_free == MIGRATE_PCPTYPES)
-			batch_free = to_free;
+			batch_free = count;
 
 		do {
 			int mt;	/* migratetype of the to-be-freed page */
@@ -902,7 +901,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 
 			__free_one_page(page, page_to_pfn(page), zone, 0, mt);
 			trace_mm_page_pcpu_drain(page, 0, mt);
-		} while (--to_free && --batch_free && !list_empty(list));
+		} while (--count && --batch_free && !list_empty(list));
 	}
 	spin_unlock(&zone->lock);
 }
-- 
2.6.4


* [PATCH 24/24] mm, page_alloc: Do not lookup pcp migratetype during bulk free
  2016-04-12 10:12 [PATCH 00/24] Optimise page alloc/free fast paths v2 Mel Gorman
                   ` (22 preceding siblings ...)
  2016-04-12 10:12 ` [PATCH 23/24] mm, page_alloc: Remove unnecessary variable from free_pcppages_bulk Mel Gorman
@ 2016-04-12 10:12 ` Mel Gorman
  23 siblings, 0 replies; 25+ messages in thread
From: Mel Gorman @ 2016-04-12 10:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Vlastimil Babka, Linux-MM, LKML, Mel Gorman

During bulk free, the pcp migratetype of a page is already known because
the page was removed from a migratetype-specific list. It only needs to
be looked up again if the zone has isolated pageblocks. This patch reuses
the list index directly instead of rereading the migratetype for every
page, removing an unnecessary local variable in the process. The side
effect is that the round-robin freeing of the PCP lists is slightly
distorted when an isolated pageblock is encountered, but that is a rare
and harmless corner case.
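
A standalone sketch of the reuse follows (not kernel code; NTYPES and
ISOLATE are stand-ins for MIGRATE_PCPTYPES and MIGRATE_ISOLATE): because
migratetype now serves as both the round-robin list index and the value
passed to __free_one_page(), an isolated pageblock can leave it outside
the valid index range, which is why the wrap-around test in the diff
becomes ">=" rather than "==" and why the round-robin order drifts
slightly afterwards.

  #include <stdio.h>

  /*
   * Standalone illustration, not kernel code: "mt" plays the same double
   * role as migratetype in the patch, acting as both the round-robin list
   * selector and the value reported for each freed item.  NTYPES and
   * ISOLATE are stand-ins for MIGRATE_PCPTYPES and MIGRATE_ISOLATE.
   */
  #define NTYPES	3
  #define ISOLATE	4	/* any value >= NTYPES */

  int main(void)
  {
  	int mt = 0;
  	int i;

  	for (i = 0; i < 6; i++) {
  		/*
  		 * Advance the round-robin index.  ">=" (not "==") copes
  		 * with mt having been overwritten with ISOLATE on the
  		 * previous iteration.
  		 */
  		if (++mt >= NTYPES)
  			mt = 0;

  		printf("iteration %d frees from list %d\n", i, mt);

  		/* pretend every third item sat in an isolated pageblock */
  		if (i % 3 == 2)
  			mt = ISOLATE;	/* distorts the next round-robin step */
  	}
  	return 0;
  }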

The impact of the bulk free patches on the page allocator microbenchmark
is visible at higher batch counts, where the bulk free paths are actually
hit (a sketch after the results illustrates when that path triggers).

pagealloc
                                           4.6.0-rc3                  4.6.0-rc3
                                         cpuset-v2r2                 micro-v2r2
Min      free-odr0-1                191.00 (  0.00%)           195.00 ( -2.09%)
Min      free-odr0-2                136.00 (  0.00%)           136.00 (  0.00%)
Min      free-odr0-4                107.00 (  0.00%)           107.00 (  0.00%)
Min      free-odr0-8                 95.00 (  0.00%)            95.00 (  0.00%)
Min      free-odr0-16                87.00 (  0.00%)            87.00 (  0.00%)
Min      free-odr0-32                82.00 (  0.00%)            82.00 (  0.00%)
Min      free-odr0-64                80.00 (  0.00%)            80.00 (  0.00%)
Min      free-odr0-128               79.00 (  0.00%)            79.00 (  0.00%)
Min      free-odr0-256               94.00 (  0.00%)            97.00 ( -3.19%)
Min      free-odr0-512              112.00 (  0.00%)           109.00 (  2.68%)
Min      free-odr0-1024             118.00 (  0.00%)           118.00 (  0.00%)
Min      free-odr0-2048             123.00 (  0.00%)           121.00 (  1.63%)
Min      free-odr0-4096             127.00 (  0.00%)           125.00 (  1.57%)
Min      free-odr0-8192             129.00 (  0.00%)           127.00 (  1.55%)
Min      free-odr0-16384            128.00 (  0.00%)           127.00 (  0.78%)

The gain is tiny, but so is the cost of the patches.
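
For context on when the bulk free path is hit at all, the standalone
model below (not kernel code; PCP_HIGH and PCP_BATCH are illustrative
constants, the real watermarks are per-zone and tuned at runtime) mimics
the usual behaviour of parking order-0 frees on a per-cpu list and only
draining in bulk once that list overflows, which is why only the larger
free-odr0-* batch sizes above exercise free_pcppages_bulk().

  #include <stdio.h>

  /*
   * Standalone model, not kernel code: order-0 frees are parked on a
   * per-cpu list and only drained in bulk once the list overflows a high
   * watermark.  PCP_HIGH and PCP_BATCH are illustrative values only.
   */
  #define PCP_HIGH	186
  #define PCP_BATCH	 31

  int main(void)
  {
  	int pcp_count = 0;
  	int bulk_drains = 0;
  	int i;

  	for (i = 0; i < 16384; i++) {		/* free 16384 order-0 pages */
  		pcp_count++;			/* page parked on the per-cpu list */
  		if (pcp_count >= PCP_HIGH) {
  			bulk_drains++;		/* this is where the bulk path runs */
  			pcp_count -= PCP_BATCH;
  		}
  	}
  	printf("%d bulk drains for 16384 frees\n", bulk_drains);
  	return 0;
  }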

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/page_alloc.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1b1553c1156c..4d4079309760 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -876,7 +876,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 		 */
 		do {
 			batch_free++;
-			if (++migratetype == MIGRATE_PCPTYPES)
+			if (++migratetype >= MIGRATE_PCPTYPES)
 				migratetype = 0;
 			list = &pcp->lists[migratetype];
 		} while (list_empty(list));
@@ -886,21 +886,16 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 			batch_free = count;
 
 		do {
-			int mt;	/* migratetype of the to-be-freed page */
-
 			page = list_last_entry(list, struct page, lru);
 			/* must delete as __free_one_page list manipulates */
 			list_del(&page->lru);
 
-			mt = get_pcppage_migratetype(page);
-			/* MIGRATE_ISOLATE page should not go to pcplists */
-			VM_BUG_ON_PAGE(is_migrate_isolate(mt), page);
 			/* Pageblock could have been isolated meanwhile */
 			if (unlikely(isolated_pageblocks))
-				mt = get_pageblock_migratetype(page);
+				migratetype = get_pageblock_migratetype(page);
 
-			__free_one_page(page, page_to_pfn(page), zone, 0, mt);
-			trace_mm_page_pcpu_drain(page, 0, mt);
+			__free_one_page(page, page_to_pfn(page), zone, 0, migratetype);
+			trace_mm_page_pcpu_drain(page, 0, migratetype);
 		} while (--count && --batch_free && !list_empty(list));
 	}
 	spin_unlock(&zone->lock);
-- 
2.6.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2016-04-12 10:17 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-12 10:12 [PATCH 00/24] Optimise page alloc/free fast paths v2 Mel Gorman
2016-04-12 10:12 ` [PATCH 01/24] mm, page_alloc: Only check PageCompound for high-order pages Mel Gorman
2016-04-12 10:12 ` [PATCH 02/24] mm, page_alloc: Use new PageAnonHead helper in the free page fast path Mel Gorman
2016-04-12 10:12 ` [PATCH 03/24] mm, page_alloc: Reduce branches in zone_statistics Mel Gorman
2016-04-12 10:12 ` [PATCH 04/24] mm, page_alloc: Inline zone_statistics Mel Gorman
2016-04-12 10:12 ` [PATCH 05/24] mm, page_alloc: Inline the fast path of the zonelist iterator Mel Gorman
2016-04-12 10:12 ` [PATCH 06/24] mm, page_alloc: Use __dec_zone_state for order-0 page allocation Mel Gorman
2016-04-12 10:12 ` [PATCH 07/24] mm, page_alloc: Avoid unnecessary zone lookups during pageblock operations Mel Gorman
2016-04-12 10:12 ` [PATCH 08/24] mm, page_alloc: Convert alloc_flags to unsigned Mel Gorman
2016-04-12 10:12 ` [PATCH 09/24] mm, page_alloc: Convert nr_fair_skipped to bool Mel Gorman
2016-04-12 10:12 ` [PATCH 10/24] mm, page_alloc: Remove unnecessary local variable in get_page_from_freelist Mel Gorman
2016-04-12 10:12 ` [PATCH 11/24] mm, page_alloc: Remove unnecessary initialisation " Mel Gorman
2016-04-12 10:12 ` [PATCH 12/24] mm, page_alloc: Remove unnecessary initialisation from __alloc_pages_nodemask() Mel Gorman
2016-04-12 10:12 ` [PATCH 13/24] mm, page_alloc: Remove redundant check for empty zonelist Mel Gorman
2016-04-12 10:12 ` [PATCH 14/24] mm, page_alloc: Simplify last cpupid reset Mel Gorman
2016-04-12 10:12 ` [PATCH 15/24] mm, page_alloc: Move might_sleep_if check to the allocator slowpath Mel Gorman
2016-04-12 10:12 ` [PATCH 16/24] mm, page_alloc: Move __GFP_HARDWALL modifications out of the fastpath Mel Gorman
2016-04-12 10:12 ` [PATCH 17/24] mm, page_alloc: Reduce cost of fair zone allocation policy retry Mel Gorman
2016-04-12 10:12 ` [PATCH 18/24] mm, page_alloc: Shortcut watermark checks for order-0 pages Mel Gorman
2016-04-12 10:12 ` [PATCH 19/24] mm, page_alloc: Avoid looking up the first zone in a zonelist twice Mel Gorman
2016-04-12 10:12 ` [PATCH 20/24] mm, page_alloc: Check multiple page fields with a single branch Mel Gorman
2016-04-12 10:12 ` [PATCH 21/24] cpuset: use static key better and convert to new API Mel Gorman
2016-04-12 10:12 ` [PATCH 22/24] mm, page_alloc: Check once if a zone has isolated pageblocks Mel Gorman
2016-04-12 10:12 ` [PATCH 23/24] mm, page_alloc: Remove unnecessary variable from free_pcppages_bulk Mel Gorman
2016-04-12 10:12 ` [PATCH 24/24] mm, page_alloc: Do not lookup pcp migratetype during bulk free Mel Gorman
