linux-mm.kvack.org archive mirror
* [RFC][PATCH 0/8] mm: freshen percpu pageset code
@ 2013-10-15 20:35 Dave Hansen
  2013-10-15 20:35 ` [RFC][PATCH 1/8] mm: pcp: rename percpu pageset functions Dave Hansen
                   ` (7 more replies)
  0 siblings, 8 replies; 11+ messages in thread
From: Dave Hansen @ 2013-10-15 20:35 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Cody P Schafer, Andi Kleen, cl, Andrew Morton,
	Mel Gorman, Dave Hansen

The percpu pageset (pcp) code is looking a little old and
neglected these days.  This set does a few things (in order of
importance, not in order of implementation in the series):

1. Change the default pageset pcp->high value from 744kB
   to 512kB.  (see "consolidate high-to-batch ratio code")
2. Allow setting of vm.percpu_pagelist_fraction=0, which
   takes you back to the boot-time behavior
3. Resolve inconsistencies in the way the boot-time and
   sysctl pcp code works.
4. Clarify some function names and code comments.
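
To give a taste of #2, here is a minimal userspace sketch
(illustrative only; the sysctl file already exists today, this
series just makes 0 a valid value to write):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* 0 == "forget the fraction, go back to boot-time sizing" */
	int fd = open("/proc/sys/vm/percpu_pagelist_fraction", O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (write(fd, "0", 1) != 1)
		perror("write");
	close(fd);
	return 0;
}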


* [RFC][PATCH 1/8] mm: pcp: rename percpu pageset functions
  2013-10-15 20:35 [RFC][PATCH 0/8] mm: freshen percpu pageset code Dave Hansen
@ 2013-10-15 20:35 ` Dave Hansen
  2013-10-17  1:32   ` David Rientjes
  2013-10-15 20:35 ` [RFC][PATCH 2/8] mm: pcp: consolidate percpu_pagelist_fraction code Dave Hansen
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 11+ messages in thread
From: Dave Hansen @ 2013-10-15 20:35 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Cody P Schafer, Andi Kleen, cl, Andrew Morton,
	Mel Gorman, Dave Hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

The per-cpu-pageset code has two distinct ways of being set up:
 1. The boot-time code (the defaults that everybody runs with)
    calculates a batch size, then sets pcp->high to 6x that
    batch size.
 2. The percpu_pagelist_fraction sysctl code takes a pcp->high
    value in from userspace and sets the pcp->batch value to 1/4
    of that ->high value.
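
To see the asymmetry at a glance, here is a compilable userspace
mock of the two paths (the setup_from_*() names are illustration
only, not the kernel's, and it ignores the PAGE_SHIFT*8 clamp on
the sysctl side):

#include <stdio.h>

struct pcp { unsigned long high, batch; };

/* 1. boot-time path: start from a batch size, derive the high mark */
static void setup_from_batch(struct pcp *p, unsigned long batch)
{
	p->high  = 6 * batch;
	p->batch = batch ? batch : 1;
}

/* 2. sysctl path: start from a high mark, derive the batch size */
static void setup_from_high(struct pcp *p, unsigned long high)
{
	p->high  = high;
	p->batch = high / 4 ? high / 4 : 1;
}

int main(void)
{
	struct pcp boot, frac;

	setup_from_batch(&boot, 31);	/* a typical boot-time batch */
	setup_from_high(&frac, 186);	/* a sysctl-style high mark */
	printf("boot:   high=%lu batch=%lu\n", boot.high, boot.batch);
	printf("sysctl: high=%lu batch=%lu\n", frac.high, frac.batch);
	return 0;
}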

The crummy part is that those are called pageset_set_batch() and
pageset_set_high(), respectively.  Those names make it sound
awfully like high *OR* batch is being set, when actually both
are being set.

This patch renames those two setup functions to be more clear in
what they are doing:
 1. pageset_setup_from_batch_size(batch)
 2. pageset_setup_from_high_mark(high)

The "max(1UL, 1 * batch)" construct was from Christoph Lameter in
commit 2caaad41.  I'm not quite sure what the purpose of the
"1 * batch" is.  Considering that 'batch' is unsigned, the only
value the max() could be correcting is 0.  Just make the check a
plain old if() so that it is a bit less obtuse.

Note: pageset_setup_from_high_mark() does not survive this
series.  I change it here for clarity and parity with its twin
even though I eventually kill it.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 linux.git-davehans/mm/page_alloc.c |   33 +++++++++++++++++++++------------
 1 file changed, 21 insertions(+), 12 deletions(-)

diff -puN mm/page_alloc.c~rename-pageset-functions mm/page_alloc.c
--- linux.git/mm/page_alloc.c~rename-pageset-functions	2013-10-15 09:57:05.870612107 -0700
+++ linux.git-davehans/mm/page_alloc.c	2013-10-15 09:57:05.875612329 -0700
@@ -4136,10 +4136,18 @@ static void pageset_update(struct per_cp
 	pcp->batch = batch;
 }
 
-/* a companion to pageset_set_high() */
-static void pageset_set_batch(struct per_cpu_pageset *p, unsigned long batch)
+/*
+ * Set the batch size for hot per_cpu_pagelist, and derive
+ * the high water mark from the batch size.
+ */
+static void pageset_setup_from_batch_size(struct per_cpu_pageset *p,
+					unsigned long batch)
 {
-	pageset_update(&p->pcp, 6 * batch, max(1UL, 1 * batch));
+	unsigned long high;
+	high = 6 * batch;
+	if (!batch)
+		batch = 1;
+	pageset_update(&p->pcp, high, batch);
 }
 
 static void pageset_init(struct per_cpu_pageset *p)
@@ -4158,15 +4166,15 @@ static void pageset_init(struct per_cpu_
 static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
 {
 	pageset_init(p);
-	pageset_set_batch(p, batch);
+	pageset_setup_from_batch_size(p, batch);
 }
 
 /*
- * pageset_set_high() sets the high water mark for hot per_cpu_pagelist
- * to the value high for the pageset p.
+ * Set the high water mark for the per_cpu_pagelist, and derive
+ * the batch size from this high mark.
  */
-static void pageset_set_high(struct per_cpu_pageset *p,
-				unsigned long high)
+static void pageset_setup_from_high_mark(struct per_cpu_pageset *p,
+					unsigned long high)
 {
 	unsigned long batch = max(1UL, high / 4);
 	if ((high / 4) > (PAGE_SHIFT * 8))
@@ -4179,11 +4187,11 @@ static void __meminit pageset_set_high_a
 		struct per_cpu_pageset *pcp)
 {
 	if (percpu_pagelist_fraction)
-		pageset_set_high(pcp,
+		pageset_setup_from_high_mark(pcp,
 			(zone->managed_pages /
 				percpu_pagelist_fraction));
 	else
-		pageset_set_batch(pcp, zone_batchsize(zone));
+		pageset_setup_from_batch_size(pcp, zone_batchsize(zone));
 }
 
 static void __meminit zone_pageset_init(struct zone *zone, int cpu)
@@ -5781,8 +5789,9 @@ int percpu_pagelist_fraction_sysctl_hand
 		unsigned long  high;
 		high = zone->managed_pages / percpu_pagelist_fraction;
 		for_each_possible_cpu(cpu)
-			pageset_set_high(per_cpu_ptr(zone->pageset, cpu),
-					 high);
+			pageset_setup_from_high_mark(
+					per_cpu_ptr(zone->pageset, cpu),
+					high);
 	}
 	mutex_unlock(&pcp_batch_high_lock);
 	return 0;
_


* [RFC][PATCH 2/8] mm: pcp: consolidate percpu_pagelist_fraction code
  2013-10-15 20:35 [RFC][PATCH 0/8] mm: freshen percpu pageset code Dave Hansen
  2013-10-15 20:35 ` [RFC][PATCH 1/8] mm: pcp: rename percpu pageset functions Dave Hansen
@ 2013-10-15 20:35 ` Dave Hansen
  2013-10-15 20:35 ` [RFC][PATCH 3/8] mm: pcp: separate pageset update code from sysctl code Dave Hansen
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Dave Hansen @ 2013-10-15 20:35 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Cody P Schafer, Andi Kleen, cl, Andrew Morton,
	Mel Gorman, Dave Hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

pageset_set_high_and_batch() and percpu_pagelist_fraction_sysctl_handler()
both do the same calculation for establishing pcp->high:

	high = zone->managed_pages / percpu_pagelist_fraction;

pageset_set_high_and_batch() also knows whether it should be
using the sysctl-provided value or the boot-time default
behavior.  There's no reason to keep
percpu_pagelist_fraction_sysctl_handler()'s copy separate.
So, consolidate them.
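
(For a concrete feel, with illustrative numbers: a zone with
1,048,576 managed pages -- 4GB of 4kB pages -- and
vm.percpu_pagelist_fraction=100 gives each cpu a pcp->high of
10,485 pages, about 41MB.)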

The only bummer here is that pageset_set_high_and_batch() is
currently __meminit.  So, axe that and make it available at
runtime.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 linux.git-davehans/mm/page_alloc.c |   12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff -puN mm/page_alloc.c~consolidate-percpu_pagelist_fraction-code mm/page_alloc.c
--- linux.git/mm/page_alloc.c~consolidate-percpu_pagelist_fraction-code	2013-10-15 09:57:06.143624213 -0700
+++ linux.git-davehans/mm/page_alloc.c	2013-10-15 09:57:06.148624435 -0700
@@ -4183,7 +4183,7 @@ static void pageset_setup_from_high_mark
 	pageset_update(&p->pcp, high, batch);
 }
 
-static void __meminit pageset_set_high_and_batch(struct zone *zone,
+static void pageset_set_high_and_batch(struct zone *zone,
 		struct per_cpu_pageset *pcp)
 {
 	if (percpu_pagelist_fraction)
@@ -5785,14 +5785,10 @@ int percpu_pagelist_fraction_sysctl_hand
 		return ret;
 
 	mutex_lock(&pcp_batch_high_lock);
-	for_each_populated_zone(zone) {
-		unsigned long  high;
-		high = zone->managed_pages / percpu_pagelist_fraction;
+	for_each_populated_zone(zone)
 		for_each_possible_cpu(cpu)
-			pageset_setup_from_high_mark(
-					per_cpu_ptr(zone->pageset, cpu),
-					high);
-	}
+			pageset_set_high_and_batch(zone,
+					per_cpu_ptr(zone->pageset, cpu));
 	mutex_unlock(&pcp_batch_high_lock);
 	return 0;
 }
_


* [RFC][PATCH 3/8] mm: pcp: separate pageset update code from sysctl code
  2013-10-15 20:35 [RFC][PATCH 0/8] mm: freshen percpu pageset code Dave Hansen
  2013-10-15 20:35 ` [RFC][PATCH 1/8] mm: pcp: rename percpu pageset functions Dave Hansen
  2013-10-15 20:35 ` [RFC][PATCH 2/8] mm: pcp: consolidate percpu_pagelist_fraction code Dave Hansen
@ 2013-10-15 20:35 ` Dave Hansen
  2013-10-15 20:35 ` [RFC][PATCH 4/8] mm: pcp: move pageset sysctl code to sysctl.c Dave Hansen
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Dave Hansen @ 2013-10-15 20:35 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Cody P Schafer, Andi Kleen, cl, Andrew Morton,
	Mel Gorman, Dave Hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

This begins the work of moving the percpu pageset sysctl code
out of page_alloc.c.  update_all_zone_pageset_limits() is now
the only interface that the sysctl code *really* needs out
of page_alloc.c.

This helps make it very clear what the interactions are between
the actual sysctl code and the core page alloc code.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 linux.git-davehans/mm/page_alloc.c |   27 ++++++++++++++++-----------
 1 file changed, 16 insertions(+), 11 deletions(-)

diff -puN mm/page_alloc.c~separate-pageset-code-from-sysctl mm/page_alloc.c
--- linux.git/mm/page_alloc.c~separate-pageset-code-from-sysctl	2013-10-15 09:57:06.415636275 -0700
+++ linux.git-davehans/mm/page_alloc.c	2013-10-15 09:57:06.421636541 -0700
@@ -5768,6 +5768,19 @@ int lowmem_reserve_ratio_sysctl_handler(
 	return 0;
 }
 
+void update_all_zone_pageset_limits(void)
+{
+	struct zone *zone;
+	unsigned int cpu;
+
+	mutex_lock(&pcp_batch_high_lock);
+	for_each_populated_zone(zone)
+		for_each_possible_cpu(cpu)
+			pageset_set_high_and_batch(zone,
+					per_cpu_ptr(zone->pageset, cpu));
+	mutex_unlock(&pcp_batch_high_lock);
+}
+
 /*
  * percpu_pagelist_fraction - changes the pcp->high for each zone on each
  * cpu.  It is the fraction of total pages in each zone that a hot per cpu
@@ -5776,20 +5789,12 @@ int lowmem_reserve_ratio_sysctl_handler(
 int percpu_pagelist_fraction_sysctl_handler(ctl_table *table, int write,
 	void __user *buffer, size_t *length, loff_t *ppos)
 {
-	struct zone *zone;
-	unsigned int cpu;
-	int ret;
-
-	ret = proc_dointvec_minmax(table, write, buffer, length, ppos);
+	int ret = proc_dointvec_minmax(table, write, buffer, length, ppos);
 	if (!write || (ret < 0))
 		return ret;
 
-	mutex_lock(&pcp_batch_high_lock);
-	for_each_populated_zone(zone)
-		for_each_possible_cpu(cpu)
-			pageset_set_high_and_batch(zone,
-					per_cpu_ptr(zone->pageset, cpu));
-	mutex_unlock(&pcp_batch_high_lock);
+	update_all_zone_pageset_limits();
+
 	return 0;
 }
 
_


* [RFC][PATCH 4/8] mm: pcp: move pageset sysctl code to sysctl.c
  2013-10-15 20:35 [RFC][PATCH 0/8] mm: freshen percpu pageset code Dave Hansen
                   ` (2 preceding siblings ...)
  2013-10-15 20:35 ` [RFC][PATCH 3/8] mm: pcp: separate pageset update code from sysctl code Dave Hansen
@ 2013-10-15 20:35 ` Dave Hansen
  2013-10-15 20:35 ` [RFC][PATCH 5/8] mm: pcp: make percpu_pagelist_fraction sysctl undoable Dave Hansen
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Dave Hansen @ 2013-10-15 20:35 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Cody P Schafer, Andi Kleen, cl, Andrew Morton,
	Mel Gorman, Dave Hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

The percpu_pagelist_fraction_sysctl_handler() code is currently
in page_alloc.c, probably because it uses some functions static
to that file.  Now that it is smaller and its interactions with
the rest of the allocator code are confined to
update_all_zone_pageset_limits(), it is much less bound to that
file.

We will replace proc_dointvec_minmax() with a function private
to sysctl.c in the next patch.  We are stuck either exporting
that (ugly) function in the sysctl header, or exporting
update_all_zone_pageset_limits() from the mm headers.  I chose
to export from the mm headers since the function is simpler and
much less likely to get used in bad ways.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 linux.git-davehans/include/linux/gfp.h    |    1 +
 linux.git-davehans/include/linux/mmzone.h |    2 --
 linux.git-davehans/kernel/sysctl.c        |   20 ++++++++++++++++++++
 linux.git-davehans/mm/page_alloc.c        |   17 -----------------
 4 files changed, 21 insertions(+), 19 deletions(-)

diff -puN include/linux/gfp.h~move-pageset-sysctl-code include/linux/gfp.h
--- linux.git/include/linux/gfp.h~move-pageset-sysctl-code	2013-10-15 09:57:06.691648515 -0700
+++ linux.git-davehans/include/linux/gfp.h	2013-10-15 09:57:06.700648914 -0700
@@ -374,6 +374,7 @@ extern void free_memcg_kmem_pages(unsign
 #define free_page(addr) free_pages((addr), 0)
 
 void page_alloc_init(void);
+void update_all_zone_pageset_limits(void);
 void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp);
 void drain_all_pages(void);
 void drain_local_pages(void *dummy);
diff -puN include/linux/mmzone.h~move-pageset-sysctl-code include/linux/mmzone.h
--- linux.git/include/linux/mmzone.h~move-pageset-sysctl-code	2013-10-15 09:57:06.693648603 -0700
+++ linux.git-davehans/include/linux/mmzone.h	2013-10-15 09:57:06.701648958 -0700
@@ -894,8 +894,6 @@ int min_free_kbytes_sysctl_handler(struc
 extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1];
 int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *, int,
 					void __user *, size_t *, loff_t *);
-int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *, int,
-					void __user *, size_t *, loff_t *);
 int sysctl_min_unmapped_ratio_sysctl_handler(struct ctl_table *, int,
 			void __user *, size_t *, loff_t *);
 int sysctl_min_slab_ratio_sysctl_handler(struct ctl_table *, int,
diff -puN kernel/sysctl.c~move-pageset-sysctl-code kernel/sysctl.c
--- linux.git/kernel/sysctl.c~move-pageset-sysctl-code	2013-10-15 09:57:06.694648648 -0700
+++ linux.git-davehans/kernel/sysctl.c	2013-10-15 09:57:06.702649002 -0700
@@ -176,6 +176,9 @@ static int proc_taint(struct ctl_table *
 			       void __user *buffer, size_t *lenp, loff_t *ppos);
 #endif
 
+static int percpu_pagelist_fraction_sysctl_handler(ctl_table *table, int write,
+			void __user *buffer, size_t *length, loff_t *ppos);
+
 #ifdef CONFIG_PRINTK
 static int proc_dointvec_minmax_sysadmin(struct ctl_table *table, int write,
 				void __user *buffer, size_t *lenp, loff_t *ppos);
@@ -2455,6 +2458,23 @@ static int proc_do_cad_pid(struct ctl_ta
 	return 0;
 }
 
+/*
+ * percpu_pagelist_fraction - changes the pcp->high for each zone on each
+ * cpu.  It is the fraction of total pages in each zone that a hot per cpu pagelist
+ * can have before it gets flushed back to buddy allocator.
+ */
+static int percpu_pagelist_fraction_sysctl_handler(ctl_table *table, int write,
+	void __user *buffer, size_t *length, loff_t *ppos)
+{
+	int ret = proc_dointvec_minmax(table, write, buffer, length, ppos);
+	if (!write || (ret < 0))
+		return ret;
+
+	update_all_zone_pageset_limits();
+
+	return 0;
+}
+
 /**
  * proc_do_large_bitmap - read/write from/to a large bitmap
  * @table: the sysctl table
diff -puN mm/page_alloc.c~move-pageset-sysctl-code mm/page_alloc.c
--- linux.git/mm/page_alloc.c~move-pageset-sysctl-code	2013-10-15 09:57:06.697648781 -0700
+++ linux.git-davehans/mm/page_alloc.c	2013-10-15 09:57:06.704649091 -0700
@@ -5781,23 +5781,6 @@ void update_all_zone_pageset_limits(void
 	mutex_unlock(&pcp_batch_high_lock);
 }
 
-/*
- * percpu_pagelist_fraction - changes the pcp->high for each zone on each
- * cpu.  It is the fraction of total pages in each zone that a hot per cpu
- * pagelist can have before it gets flushed back to buddy allocator.
- */
-int percpu_pagelist_fraction_sysctl_handler(ctl_table *table, int write,
-	void __user *buffer, size_t *length, loff_t *ppos)
-{
-	int ret = proc_dointvec_minmax(table, write, buffer, length, ppos);
-	if (!write || (ret < 0))
-		return ret;
-
-	update_all_zone_pageset_limits();
-
-	return 0;
-}
-
 int hashdist = HASHDIST_DEFAULT;
 
 #ifdef CONFIG_NUMA
_


* [RFC][PATCH 5/8] mm: pcp: make percpu_pagelist_fraction sysctl undoable
  2013-10-15 20:35 [RFC][PATCH 0/8] mm: freshen percpu pageset code Dave Hansen
                   ` (3 preceding siblings ...)
  2013-10-15 20:35 ` [RFC][PATCH 4/8] mm: pcp: move pageset sysctl code to sysctl.c Dave Hansen
@ 2013-10-15 20:35 ` Dave Hansen
  2013-10-15 20:35 ` [RFC][PATCH 6/8] mm: pcp: consolidate high-to-batch ratio code Dave Hansen
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Dave Hansen @ 2013-10-15 20:35 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Cody P Schafer, Andi Kleen, cl, Andrew Morton,
	Mel Gorman, Dave Hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

The kernel has two methods of setting the sizes of the percpu
pagesets:

 1. The default, according to a page_alloc.c comment is "set to
    around 1000th of the size of the zone.  But no more than 1/2
    of a meg."
 2. After boot, vm.percpu_pagelist_fraction can be set to
    override the default.

However, the trip from 1->2 is a one-way street.  There's no way
to get back.  You can get either the 'high' or 'batch' value to
match its boot-time value, but since the relationship between the
two is different in the two different modes, you can never get
back _exactly_ to where you were.  This kinda sucks if you are
trying to do performance testing to find optimal values.
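
To put numbers on it (illustrative, assuming 4kB pages and a large
zone): the boot-time path ends up with batch=31 and high=6*31=186.
A fraction that recreates high=186 makes the sysctl path derive
batch=186/4=46; a fraction that recreates batch=31 instead forces
high down to ~124.  Either way, one of the two values no longer
matches its boot-time counterpart.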

Note that we remove the .extra1 field from the sysctl table entry.
The bounding behavior is now open-coded in the handler.

Since we are now able to go back to the boot-time values, we
need the boot-time function zone_batchsize() to be available
at runtime, so remove its __meminit.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 linux.git-davehans/Documentation/sysctl/vm.txt |    6 +++---
 linux.git-davehans/kernel/sysctl.c             |   25 +++++++++++++++++++++----
 linux.git-davehans/mm/page_alloc.c             |    2 +-
 3 files changed, 25 insertions(+), 8 deletions(-)

diff -puN Documentation/sysctl/vm.txt~make-percpu_pagelist_fraction-sysctl-undoable Documentation/sysctl/vm.txt
--- linux.git/Documentation/sysctl/vm.txt~make-percpu_pagelist_fraction-sysctl-undoable	2013-10-15 09:57:07.004662395 -0700
+++ linux.git-davehans/Documentation/sysctl/vm.txt	2013-10-15 09:57:07.011662705 -0700
@@ -653,6 +653,9 @@ why oom happens. You can get snapshot.
 
 percpu_pagelist_fraction
 
+Set (at boot) to 0.  The kernel will size each percpu pagelist to around
+1/1000th of the size of the zone but limited to be around 0.75MB.
+
 This is the fraction of pages at most (high mark pcp->high) in each zone that
 are allocated for each per cpu page list.  The min value for this is 8.  It
 means that we don't allow more than 1/8th of pages in each zone to be
@@ -663,9 +666,6 @@ of hot per cpu pagelists.  User can spec
 The batch value of each per cpu pagelist is also updated as a result.  It is
 set to pcp->high/4.  The upper limit of batch is (PAGE_SHIFT * 8)
 
-The initial value is zero.  Kernel does not use this value at boot time to set
-the high water marks for each per cpu page list.
-
 ==============================================================
 
 stat_interval
diff -puN kernel/sysctl.c~make-percpu_pagelist_fraction-sysctl-undoable kernel/sysctl.c
--- linux.git/kernel/sysctl.c~make-percpu_pagelist_fraction-sysctl-undoable	2013-10-15 09:57:07.005662439 -0700
+++ linux.git-davehans/kernel/sysctl.c	2013-10-15 09:57:07.012662750 -0700
@@ -138,7 +138,6 @@ static unsigned long dirty_bytes_min = 2
 /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
 static int maxolduid = 65535;
 static int minolduid;
-static int min_percpu_pagelist_fract = 8;
 
 static int ngroups_max = NGROUPS_MAX;
 static const int cap_last_cap = CAP_LAST_CAP;
@@ -1289,7 +1288,6 @@ static struct ctl_table vm_table[] = {
 		.maxlen		= sizeof(percpu_pagelist_fraction),
 		.mode		= 0644,
 		.proc_handler	= percpu_pagelist_fraction_sysctl_handler,
-		.extra1		= &min_percpu_pagelist_fract,
 	},
 #ifdef CONFIG_MMU
 	{
@@ -1910,7 +1908,7 @@ static int do_proc_dointvec_conv(bool *n
 
 static const char proc_wspace_sep[] = { ' ', '\t', '\n' };
 
-static int __do_proc_dointvec(void *tbl_data, struct ctl_table *table,
+int __do_proc_dointvec(void *tbl_data, struct ctl_table *table,
 		  int write, void __user *buffer,
 		  size_t *lenp, loff_t *ppos,
 		  int (*conv)(bool *negp, unsigned long *lvalp, int *valp,
@@ -2466,7 +2464,26 @@ static int proc_do_cad_pid(struct ctl_ta
 static int percpu_pagelist_fraction_sysctl_handler(ctl_table *table, int write,
 	void __user *buffer, size_t *length, loff_t *ppos)
 {
-	int ret = proc_dointvec_minmax(table, write, buffer, length, ppos);
+ 	int ret;
+	int tmp = percpu_pagelist_fraction;
+	int min_percpu_pagelist_fract = 8;
+
+	ret = __do_proc_dointvec(&tmp, table, write, buffer, length, ppos,
+		       NULL, NULL);
+	/*
+	 * We want values >= min_percpu_pagelist_fract, but we
+	 * also accept 0 to mean "stop using the fractions and
+	 * go back to the default behavior".
+	 */
+	if (write) {
+		if (tmp < 0)
+			return -EINVAL;
+		if ((tmp < min_percpu_pagelist_fract) &&
+		    (tmp != 0))
+			return -EINVAL;
+		percpu_pagelist_fraction = tmp;
+	}
+
 	if (!write || (ret < 0))
 		return ret;
 
diff -puN mm/page_alloc.c~make-percpu_pagelist_fraction-sysctl-undoable mm/page_alloc.c
--- linux.git/mm/page_alloc.c~make-percpu_pagelist_fraction-sysctl-undoable	2013-10-15 09:57:07.008662572 -0700
+++ linux.git-davehans/mm/page_alloc.c	2013-10-15 09:57:07.015662883 -0700
@@ -4059,7 +4059,7 @@ static void __meminit zone_init_free_lis
 	memmap_init_zone((size), (nid), (zone), (start_pfn), MEMMAP_EARLY)
 #endif
 
-static int __meminit zone_batchsize(struct zone *zone)
+static int zone_batchsize(struct zone *zone)
 {
 #ifdef CONFIG_MMU
 	int batch;
_


* [RFC][PATCH 6/8] mm: pcp: consolidate high-to-batch ratio code
  2013-10-15 20:35 [RFC][PATCH 0/8] mm: freshen percpu pageset code Dave Hansen
                   ` (4 preceding siblings ...)
  2013-10-15 20:35 ` [RFC][PATCH 5/8] mm: pcp: make percpu_pagelist_fraction sysctl undoable Dave Hansen
@ 2013-10-15 20:35 ` Dave Hansen
  2013-10-15 20:35 ` [RFC][PATCH 7/8] mm: pcp: move page coloring optimization away from pcp sizing Dave Hansen
  2013-10-15 20:35 ` [RFC][PATCH 8/8] mm: pcp: create setup_boot_pageset() Dave Hansen
  7 siblings, 0 replies; 11+ messages in thread
From: Dave Hansen @ 2013-10-15 20:35 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Cody P Schafer, Andi Kleen, cl, Andrew Morton,
	Mel Gorman, Dave Hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

Up until now in this patch set, we really should not have been
changing any behavior that users would notice.  This patch
potentially has performance implications for virtually all users
since it changes the kernel's default behavior.

The per-cpu-pageset code currently has hard-coded ratios which
relate the batch size to the high watermark.  However, the ratio
is different for the boot-time and sysctl-set variants, and I
believe this difference in the code was an accident.

This patch introduces a common variable to store this ratio, no
matter whether we are using the default or sysctl code.  It also
changes the default boot-time ratio from 6:1 to 4:1, since I
believe that we never intended to make it a 6:1 ratio.  As best
I can tell, that change came from e46a5e28c, and there is no
mention in that patch of doing this.  The *correct* thing in
that patch would have been to drop ->low from 2->0 and also drop
high from 6->4 to keep the average size of the pool the same.

BTW, I'm fairly ambivalent on whether the ratio should really be
4:1 or 6:1.  We obviously intended it to be 4:1, but it's been
6:1 for 8 or so years.

I did quite a bit of testing on some large (160-cpu) and medium
(12-cpu) systems.  On the 12-cpu system, I ran several hundred
allyesconfig compiles varying the ->high watermark (x axis) to
see if there was a sweet spot for these values (y axis is seconds
to complete a kernel compile):

	http://sr71.net/~dave/intel/201310-pcp/pcp1.png

As you can see, the results are all over the map.  Doing a
running-average, things calm down a bit:

	http://sr71.net/~dave/intel/201310-pcp/pcp-runavg5.png

but still not enough for me to say that we can see any real
trends.

A little more investigation of the code follows, but it's probably
more than most readers care about.

---

Looking at the code, I can not really grok what this comment in
zone_batchsize() means:

	batch /= 4;             /* We effectively *= 4 below */

It surely can't refer to the:

	batch = rounddown_pow_of_two(batch + batch/2) - 1;

code in the same function since the round down code at *MOST*
does a *= 1.5 (but *averages* out to be just under 1).  I
*think* this comment refers to the code which is now in:

static void pageset_set_batch(struct per_cpu_pageset *p...
{
        pageset_update(&p->pcp, 6 * batch, max(1UL, 1 * batch));
}

Where the 6*batch argument is the "high" mark.  Note that we do a
/=4, but then follow up with a 6*batch later.  These got
mismatched when the pcp->low code got removed.  The result is
that we now operate by default with a 6:1 high:batch ratio where
the percpu_pagelist_fraction sysctl code operates with a 4:1
ratio:

static void pageset_set_high(struct per_cpu_pageset *p...
{
        unsigned long batch = max(1UL, high / 4);

I would suspect that this ratio isn't all that important since
nobody seems to have ever noticed this, plus I wasn't able to
observe it _doing_ anything in my benchmarks.  Furthermore, the
*actual* ratio for the sysctl-set pagelist sizes is variable since
it clamps the batch size to <=PAGE_SHIFT*8 no matter how large
->high is.
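
For the curious, a quick userspace stand-in (not the kernel code
itself) shows what the coloring clamp actually does to a range of
batch sizes:

#include <stdio.h>

/* mimics the kernel's rounddown_pow_of_two(): the largest power
 * of two <= n, for n >= 1 */
static unsigned long rounddown_pow_of_two(unsigned long n)
{
	unsigned long p = 1;

	while (p * 2 <= n)
		p *= 2;
	return p;
}

int main(void)
{
	unsigned long batch;

	/* the result never exceeds 1.5x the input and averages out
	 * just below 1x, so this can not be the "*= 4" that the
	 * zone_batchsize() comment alludes to */
	for (batch = 4; batch <= 32; batch++)
		printf("batch %2lu -> %2lu\n", batch,
		       rounddown_pow_of_two(batch + batch / 2) - 1);
	return 0;
}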

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 linux.git-davehans/Documentation/sysctl/vm.txt |    2 +-
 linux.git-davehans/mm/page_alloc.c             |   10 ++++++----
 2 files changed, 7 insertions(+), 5 deletions(-)

diff -puN Documentation/sysctl/vm.txt~fix-pcp-batch-calculation Documentation/sysctl/vm.txt
--- linux.git/Documentation/sysctl/vm.txt~fix-pcp-batch-calculation	2013-10-15 09:57:07.304675699 -0700
+++ linux.git-davehans/Documentation/sysctl/vm.txt	2013-10-15 09:57:07.309675920 -0700
@@ -654,7 +654,7 @@ why oom happens. You can get snapshot.
 percpu_pagelist_fraction
 
 Set (at boot) to 0.  The kernel will size each percpu pagelist to around
-1/1000th of the size of the zone but limited to be around 0.75MB.
+1/1000th of the size of the zone (but no larger than 512kB).
 
 This is the fraction of pages at most (high mark pcp->high) in each zone that
 are allocated for each per cpu page list.  The min value for this is 8.  It
diff -puN mm/page_alloc.c~fix-pcp-batch-calculation mm/page_alloc.c
--- linux.git/mm/page_alloc.c~fix-pcp-batch-calculation	2013-10-15 09:57:07.306675787 -0700
+++ linux.git-davehans/mm/page_alloc.c	2013-10-15 09:57:07.312676053 -0700
@@ -4059,6 +4059,8 @@ static void __meminit zone_init_free_lis
 	memmap_init_zone((size), (nid), (zone), (start_pfn), MEMMAP_EARLY)
 #endif
 
+static int pcp_high_to_batch_ratio = 4;
+
 static int zone_batchsize(struct zone *zone)
 {
 #ifdef CONFIG_MMU
@@ -4073,7 +4075,7 @@ static int zone_batchsize(struct zone *z
 	batch = zone->managed_pages / 1024;
 	if (batch * PAGE_SIZE > 512 * 1024)
 		batch = (512 * 1024) / PAGE_SIZE;
-	batch /= 4;		/* We effectively *= 4 below */
+	batch /= pcp_high_to_batch_ratio;
 	if (batch < 1)
 		batch = 1;
 
@@ -4144,7 +4146,7 @@ static void pageset_setup_from_batch_siz
 					unsigned long batch)
 {
 	unsigned long high;
-	high = 6 * batch;
+	high = pcp_high_to_batch_ratio * batch;
 	if (!batch)
 		batch = 1;
 	pageset_update(&p->pcp, high, batch);
@@ -4176,8 +4178,8 @@ static void setup_pageset(struct per_cpu
 static void pageset_setup_from_high_mark(struct per_cpu_pageset *p,
 					unsigned long high)
 {
-	unsigned long batch = max(1UL, high / 4);
-	if ((high / 4) > (PAGE_SHIFT * 8))
+	unsigned long batch = max(1UL, high / pcp_high_to_batch_ratio);
+	if ((high / pcp_high_to_batch_ratio) > (PAGE_SHIFT * 8))
 		batch = PAGE_SHIFT * 8;
 
 	pageset_update(&p->pcp, high, batch);
_


* [RFC][PATCH 7/8] mm: pcp: move page coloring optimization away from pcp sizing
  2013-10-15 20:35 [RFC][PATCH 0/8] mm: freshen percpu pageset code Dave Hansen
                   ` (5 preceding siblings ...)
  2013-10-15 20:35 ` [RFC][PATCH 6/8] mm: pcp: consolidate high-to-batch ratio code Dave Hansen
@ 2013-10-15 20:35 ` Dave Hansen
  2013-10-15 20:35 ` [RFC][PATCH 8/8] mm: pcp: create setup_boot_pageset() Dave Hansen
  7 siblings, 0 replies; 11+ messages in thread
From: Dave Hansen @ 2013-10-15 20:35 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Cody P Schafer, Andi Kleen, cl, Andrew Morton,
	Mel Gorman, Dave Hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

The percpu pages calculations are a bit convoluted.  Right now,
zone_batchsize() claims to be calculating the ->batch size, but
what actually happens is:

1. Calculate how large we want the entire pcp set to be (->high)
2. Scale that down by the ratio that we want high:batch to be
3. Adjust ->batch for good cache-coloring behavior
4. Re-derive ->high by scaling back up by the (2) ratio

We actually feed the cache-coloring scaling back in to the ->high
value, when it really only *should* apply to the batch value.
That was probably unintentional, and it was one of the things
that led us to mismatching the high:batch ratio that we saw in
the previous patch.
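
A compilable sketch of the difference (using the 4:1 ratio from the
previous patch; color_clamp() is a stand-in name for the
rounddown_pow_of_two() adjustment, and the zone size is made up):

#include <stdio.h>

static unsigned long color_clamp(unsigned long b)
{
	unsigned long p = 1;

	while (p * 2 <= b + b / 2)
		p *= 2;
	return (p - 1) ? (p - 1) : 1;
}

int main(void)
{
	unsigned long size = 128;	/* pretend zone-derived high */

	/* old flow: the coloring clamp feeds back into ->high */
	unsigned long old_batch = color_clamp(size / 4);
	unsigned long old_high  = 4 * old_batch;	/* 124, not 128 */

	/* new flow: ->high is fixed first, the clamp only hits ->batch */
	unsigned long new_high  = size;
	unsigned long new_batch = color_clamp(new_high / 4);

	printf("old: high=%lu batch=%lu\n", old_high, old_batch);
	printf("new: high=%lu batch=%lu\n", new_high, new_batch);
	return 0;
}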

This patch reorganizes the code.  It separates out the ->batch
and ->high calculations so that it's clear when we are
calculating each of them.  It also ensures that we always
calculate ->high _first_, then derive ->batch from it, finally
we adjust ->batch for good cache coloring behavior.

Since we are no longer calculating the batch size by itself, it
is not simple to print it out in zone_pcp_init() during boot.
We, instead, print out the 'high' value.  If anyone really misses
this, they can surely just read /proc/zoneinfo after boot.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 linux.git-davehans/mm/page_alloc.c |   54 ++++++++++++++++++-------------------
 1 file changed, 27 insertions(+), 27 deletions(-)

diff -puN mm/page_alloc.c~rename-zone_batchsize mm/page_alloc.c
--- linux.git/mm/page_alloc.c~rename-zone_batchsize	2013-10-15 09:57:07.597688692 -0700
+++ linux.git-davehans/mm/page_alloc.c	2013-10-15 09:57:07.602688914 -0700
@@ -4061,10 +4061,10 @@ static void __meminit zone_init_free_lis
 
 static int pcp_high_to_batch_ratio = 4;
 
-static int zone_batchsize(struct zone *zone)
+static int calculate_zone_pcp_high(struct zone *zone)
 {
 #ifdef CONFIG_MMU
-	int batch;
+	int high;
 
 	/*
 	 * The per-cpu-pages pools are set to around 1000th of the
@@ -4072,26 +4072,13 @@ static int zone_batchsize(struct zone *z
 	 *
 	 * OK, so we don't know how big the cache is.  So guess.
 	 */
-	batch = zone->managed_pages / 1024;
-	if (batch * PAGE_SIZE > 512 * 1024)
-		batch = (512 * 1024) / PAGE_SIZE;
-	batch /= pcp_high_to_batch_ratio;
-	if (batch < 1)
-		batch = 1;
-
-	/*
-	 * Clamp the batch to a 2^n - 1 value. Having a power
-	 * of 2 value was found to be more likely to have
-	 * suboptimal cache aliasing properties in some cases.
-	 *
-	 * For example if 2 tasks are alternately allocating
-	 * batches of pages, one task can end up with a lot
-	 * of pages of one half of the possible page colors
-	 * and the other with pages of the other colors.
-	 */
-	batch = rounddown_pow_of_two(batch + batch/2) - 1;
+	high = zone->managed_pages / 1024;
+	if (high * PAGE_SIZE > 512 * 1024)
+		high = (512 * 1024) / PAGE_SIZE;
+	if (high < 1)
+		high = 1;
 
-	return batch;
+	return high;
 
 #else
 	/* The deferral and batching of frees should be suppressed under NOMMU
@@ -4181,6 +4168,19 @@ static void pageset_setup_from_high_mark
 	unsigned long batch = max(1UL, high / pcp_high_to_batch_ratio);
 	if ((high / pcp_high_to_batch_ratio) > (PAGE_SHIFT * 8))
 		batch = PAGE_SHIFT * 8;
+	/*
+	 * Clamp the batch to a 2^n - 1 value. Having a power
+	 * of 2 value was found to be more likely to have
+	 * suboptimal cache aliasing properties in some cases.
+	 *
+	 * For example if 2 tasks are alternately allocating
+	 * batches of pages, one task can end up with a lot
+	 * of pages of one half of the possible page colors
+	 * and the other with pages of the other colors.
+	 */
+	batch = rounddown_pow_of_two(batch + batch/2) - 1;
+	if (!batch)
+		batch = 1;
 
 	pageset_update(&p->pcp, high, batch);
 }
@@ -4188,12 +4188,12 @@ static void pageset_setup_from_high_mark
 static void pageset_set_high_and_batch(struct zone *zone,
 		struct per_cpu_pageset *pcp)
 {
+	int high;
 	if (percpu_pagelist_fraction)
-		pageset_setup_from_high_mark(pcp,
-			(zone->managed_pages /
-				percpu_pagelist_fraction));
+		high = (zone->managed_pages / percpu_pagelist_fraction);
 	else
-		pageset_setup_from_batch_size(pcp, zone_batchsize(zone));
+		high = calculate_zone_pcp_high(zone);
+	pageset_setup_from_high_mark(pcp, high);
 }
 
 static void __meminit zone_pageset_init(struct zone *zone, int cpu)
@@ -4277,9 +4277,9 @@ static __meminit void zone_pcp_init(stru
 	zone->pageset = &boot_pageset;
 
 	if (zone->present_pages)
-		printk(KERN_DEBUG "  %s zone: %lu pages, LIFO batch:%u\n",
+		printk(KERN_DEBUG "  %s zone: %lu pages, pcp high:%d\n",
 			zone->name, zone->present_pages,
-					 zone_batchsize(zone));
+					 calculate_zone_pcp_high(zone));
 }
 
 int __meminit init_currently_empty_zone(struct zone *zone,
_


* [RFC][PATCH 8/8] mm: pcp: create setup_boot_pageset()
  2013-10-15 20:35 [RFC][PATCH 0/8] mm: freshen percpu pageset code Dave Hansen
                   ` (6 preceding siblings ...)
  2013-10-15 20:35 ` [RFC][PATCH 7/8] mm: pcp: move page coloring optimization away from pcp sizing Dave Hansen
@ 2013-10-15 20:35 ` Dave Hansen
  7 siblings, 0 replies; 11+ messages in thread
From: Dave Hansen @ 2013-10-15 20:35 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Cody P Schafer, Andi Kleen, cl, Andrew Morton,
	Mel Gorman, Dave Hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

pageset_setup_from_batch_size() has one remaining call path:

__build_all_zonelists()
	-> setup_pageset()
		-> pageset_setup_from_batch_size()

And that one path is specialized.  It is meant to essentially
turn off the per-cpu-pagelists.  It's also questionably buggy.
It sets up ->batch=1, but ->high=0, when called with batch=0,
which is contrary to the comment in there that says:

	->batch must never be higher then ->high.

This patch creates a new function, setup_boot_pageset().  This
just (more) directly sets ->high=1 and ->batch=1.  It is
functionally equivalent to the existing (->high=0 and ->batch=1)
code since high is really only used like this:

	pcp->count++;
	if (pcp->count >= pcp->high) {
		free_pcppages_bulk(zone, batch, pcp);
		pcp->count -= batch;
	}

Looking at that if() above, if pcp->count=1, then

	if (pcp->count >= 1)
and
	if (pcp->count >= 0)

are equivalent, so it does not matter whether we set ->high=0
or ->high=1.  I just find it much more intuitive to have
->high=1 since ->high=0 _looks_ invalid at first.

Also note that this ends up net removing code.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 linux.git-davehans/mm/page_alloc.c |   29 +++++++++++------------------
 1 file changed, 11 insertions(+), 18 deletions(-)

diff -puN mm/page_alloc.c~setup_pageset-specialize mm/page_alloc.c
--- linux.git/mm/page_alloc.c~setup_pageset-specialize	2013-10-15 09:57:07.869700754 -0700
+++ linux.git-davehans/mm/page_alloc.c	2013-10-15 09:57:07.874700976 -0700
@@ -3703,7 +3703,7 @@ static void build_zonelist_cache(pg_data
  * not check if the processor is online before following the pageset pointer.
  * Other parts of the kernel may not check if the zone is available.
  */
-static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch);
+static void setup_boot_pageset(struct per_cpu_pageset *p);
 static DEFINE_PER_CPU(struct per_cpu_pageset, boot_pageset);
 static void setup_zone_pageset(struct zone *zone);
 
@@ -3750,7 +3750,7 @@ static int __build_all_zonelists(void *d
 	 * (a chicken-egg dilemma).
 	 */
 	for_each_possible_cpu(cpu) {
-		setup_pageset(&per_cpu(boot_pageset, cpu), 0);
+		setup_boot_pageset(&per_cpu(boot_pageset, cpu));
 
 #ifdef CONFIG_HAVE_MEMORYLESS_NODES
 		/*
@@ -4125,20 +4125,6 @@ static void pageset_update(struct per_cp
 	pcp->batch = batch;
 }
 
-/*
- * Set the batch size for hot per_cpu_pagelist, and derive
- * the high water mark from the batch size.
- */
-static void pageset_setup_from_batch_size(struct per_cpu_pageset *p,
-					unsigned long batch)
-{
-	unsigned long high;
-	high = pcp_high_to_batch_ratio * batch;
-	if (!batch)
-		batch = 1;
-	pageset_update(&p->pcp, high, batch);
-}
-
 static void pageset_init(struct per_cpu_pageset *p)
 {
 	struct per_cpu_pages *pcp;
@@ -4152,10 +4138,17 @@ static void pageset_init(struct per_cpu_
 		INIT_LIST_HEAD(&pcp->lists[migratetype]);
 }
 
-static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
+/*
+ * Turn off per-cpu-pages until we have the
+ * full percpu allocator up.
+ */
+static void setup_boot_pageset(struct per_cpu_pageset *p)
 {
+	unsigned long batch = 1;
+	unsigned long high = 1;
+
 	pageset_init(p);
-	pageset_setup_from_batch_size(p, batch);
+	pageset_update(&p->pcp, high, batch);
 }
 
 /*
_


* Re: [RFC][PATCH 1/8] mm: pcp: rename percpu pageset functions
  2013-10-15 20:35 ` [RFC][PATCH 1/8] mm: pcp: rename percpu pageset functions Dave Hansen
@ 2013-10-17  1:32   ` David Rientjes
  2013-10-17 16:11     ` Dave Hansen
  0 siblings, 1 reply; 11+ messages in thread
From: David Rientjes @ 2013-10-17  1:32 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-mm, linux-kernel, Cody P Schafer, Andi Kleen, cl,
	Andrew Morton, Mel Gorman

On Tue, 15 Oct 2013, Dave Hansen wrote:

> diff -puN mm/page_alloc.c~rename-pageset-functions mm/page_alloc.c
> --- linux.git/mm/page_alloc.c~rename-pageset-functions	2013-10-15 09:57:05.870612107 -0700
> +++ linux.git-davehans/mm/page_alloc.c	2013-10-15 09:57:05.875612329 -0700
> @@ -4136,10 +4136,18 @@ static void pageset_update(struct per_cp
>  	pcp->batch = batch;
>  }
>  
> -/* a companion to pageset_set_high() */
> -static void pageset_set_batch(struct per_cpu_pageset *p, unsigned long batch)
> +/*
> + * Set the batch size for hot per_cpu_pagelist, and derive
> + * the high water mark from the batch size.
> + */
> +static void pageset_setup_from_batch_size(struct per_cpu_pageset *p,
> +					unsigned long batch)
>  {
> -	pageset_update(&p->pcp, 6 * batch, max(1UL, 1 * batch));
> +	unsigned long high;
> +	high = 6 * batch;
> +	if (!batch)
> +		batch = 1;

high = 6 * batch should be here?

> +	pageset_update(&p->pcp, high, batch);
>  }
>  
>  static void pageset_init(struct per_cpu_pageset *p)
> @@ -4158,15 +4166,15 @@ static void pageset_init(struct per_cpu_
>  static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
>  {
>  	pageset_init(p);
> -	pageset_set_batch(p, batch);
> +	pageset_setup_from_batch_size(p, batch);
>  }
>  
>  /*
> - * pageset_set_high() sets the high water mark for hot per_cpu_pagelist
> - * to the value high for the pageset p.
> + * Set the high water mark for the per_cpu_pagelist, and derive
> + * the batch size from this high mark.
>   */
> -static void pageset_set_high(struct per_cpu_pageset *p,
> -				unsigned long high)
> +static void pageset_setup_from_high_mark(struct per_cpu_pageset *p,
> +					unsigned long high)
>  {
>  	unsigned long batch = max(1UL, high / 4);
>  	if ((high / 4) > (PAGE_SHIFT * 8))
> @@ -4179,11 +4187,11 @@ static void __meminit pageset_set_high_a
>  		struct per_cpu_pageset *pcp)
>  {
>  	if (percpu_pagelist_fraction)
> -		pageset_set_high(pcp,
> +		pageset_setup_from_high_mark(pcp,
>  			(zone->managed_pages /
>  				percpu_pagelist_fraction));
>  	else
> -		pageset_set_batch(pcp, zone_batchsize(zone));
> +		pageset_setup_from_batch_size(pcp, zone_batchsize(zone));
>  }
>  
>  static void __meminit zone_pageset_init(struct zone *zone, int cpu)
> @@ -5781,8 +5789,9 @@ int percpu_pagelist_fraction_sysctl_hand
>  		unsigned long  high;
>  		high = zone->managed_pages / percpu_pagelist_fraction;
>  		for_each_possible_cpu(cpu)
> -			pageset_set_high(per_cpu_ptr(zone->pageset, cpu),
> -					 high);
> +			pageset_setup_from_high_mark(
> +					per_cpu_ptr(zone->pageset, cpu),
> +					high);
>  	}
>  	mutex_unlock(&pcp_batch_high_lock);
>  	return 0;


* Re: [RFC][PATCH 1/8] mm: pcp: rename percpu pageset functions
  2013-10-17  1:32   ` David Rientjes
@ 2013-10-17 16:11     ` Dave Hansen
  0 siblings, 0 replies; 11+ messages in thread
From: Dave Hansen @ 2013-10-17 16:11 UTC (permalink / raw)
  To: David Rientjes
  Cc: linux-mm, linux-kernel, Cody P Schafer, Andi Kleen, cl,
	Andrew Morton, Mel Gorman

On 10/16/2013 06:32 PM, David Rientjes wrote:
>> > +static void pageset_setup_from_batch_size(struct per_cpu_pageset *p,
>> > +					unsigned long batch)
>> >  {
>> > -	pageset_update(&p->pcp, 6 * batch, max(1UL, 1 * batch));
>> > +	unsigned long high;
>> > +	high = 6 * batch;
>> > +	if (!batch)
>> > +		batch = 1;
> high = 6 * batch should be here?

Ahh, nice catch, thanks.  I'll fix that up and resend.



