linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 00/10] mm: fixup changers of per cpu pageset's ->high and ->batch
@ 2013-04-09 23:28 Cody P Schafer
  2013-04-09 23:28 ` [PATCH v2 01/10] mm/page_alloc: factor out setting of pcp->high and pcp->batch Cody P Schafer
                   ` (9 more replies)
  0 siblings, 10 replies; 14+ messages in thread
From: Cody P Schafer @ 2013-04-09 23:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Gilad Ben-Yossef, Simon Jeons, KOSAKI Motohiro, Mel Gorman,
	Linux MM, LKML, Cody P Schafer

"Problems" with the current code:
 1. there is a lack of synchronization when setting ->high and ->batch in percpu_pagelist_fraction_sysctl_handler()
 2. stop_machine() in zone_pcp_update() is unnecessary.
 3. zone_pcp_update() does not consider the case where percpu_pagelist_fraction is non-zero

To fix:
 1. add memory barriers, a safe ->batch value, and an update side mutex when updating ->high and ->batch
 2. avoid draining pages in zone_pcp_update(), rely upon the memory barriers added to fix #1
 3. factor out quite a few functions, and then call the appropriate one.
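
Taken together, fixes #1 and #2 define the update-side protocol. A user-space
sketch of it (the struct, the pthread mutex standing in for
pcp_batch_high_lock, and __sync_synchronize() standing in for smp_wmb() are
all stand-ins for illustration, not the kernel code):

```c
#include <assert.h>
#include <pthread.h>

/* minimal stand-in for the kernel's struct per_cpu_pages */
struct per_cpu_pages {
	unsigned long high;	/* high watermark, emptying needed */
	unsigned long batch;	/* chunk size for buddy add/remove */
};

/* stand-in for the kernel's smp_wmb() */
#define smp_wmb() __sync_synchronize()

/* stand-in for the update-side mutex added by fix #1 */
static pthread_mutex_t pcp_batch_high_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Order matters: publish a safe ->batch first, then the new ->high,
 * then the real ->batch, with a write barrier between each store.
 */
static void pageset_update(struct per_cpu_pages *pcp,
			   unsigned long high, unsigned long batch)
{
	pthread_mutex_lock(&pcp_batch_high_lock);

	pcp->batch = 1;		/* fail-safe value while ->high changes */
	smp_wmb();

	pcp->high = high;
	smp_wmb();

	pcp->batch = batch;

	pthread_mutex_unlock(&pcp_batch_high_lock);
}
```

A reader that loads ->batch before acting never sees a ->batch larger than
the ->high it observes, which is what lets zone_pcp_update() drop
stop_machine().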

Note that this results in a change to the behavior of zone_pcp_update(),
which is used by memory_hotplug. I'm rather certain that I've discerned (and
preserved) the essential behavior (changing ->high and ->batch) and
eliminated only unneeded actions (draining the per cpu pages), but this may
not be the case.

Further note that the draining of pages that previously took place in
zone_pcp_update() occurred after repeated draining when attempting to offline
a page, and after the offline had "succeeded". It appears that the draining
was added to zone_pcp_update() to avoid refactoring setup_pageset() into 2
functions.

--
Changes since v1:

 - instead of using on_each_cpu(), use memory barriers (Gilad) and an update side mutex.
 - add "Problem" #3 above, and fix.
 - rename function to match naming style of similar function
 - move unrelated comment

Cody P Schafer (10):
  mm/page_alloc: factor out setting of pcp->high and pcp->batch.
  mm/page_alloc: prevent concurrent updaters of pcp ->batch and ->high
  mm/page_alloc: insert memory barriers to allow async update of pcp
    batch and high
  mm/page_alloc: convert zone_pcp_update() to rely on memory barriers
    instead of stop_machine()
  mm/page_alloc: when handling percpu_pagelist_fraction, don't needlessly
    recalculate high
  mm/page_alloc: factor setup_pageset() into pageset_init() and
    pageset_set_batch()
  mm/page_alloc: relocate comment to be directly above code it refers
    to.
  mm/page_alloc: factor zone_pageset_init() out of setup_zone_pageset()
  mm/page_alloc: in zone_pcp_update(), use zone_pageset_init()
  mm/page_alloc: rename setup_pagelist_highmark() to match naming of
    pageset_set_batch()

 mm/page_alloc.c | 124 +++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 73 insertions(+), 51 deletions(-)

-- 
1.8.2


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 01/10] mm/page_alloc: factor out setting of pcp->high and pcp->batch.
  2013-04-09 23:28 [PATCH v2 00/10] mm: fixup changers of per cpu pageset's ->high and ->batch Cody P Schafer
@ 2013-04-09 23:28 ` Cody P Schafer
  2013-04-09 23:28 ` [PATCH v2 02/10] mm/page_alloc: prevent concurrent updaters of pcp ->batch and ->high Cody P Schafer
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Cody P Schafer @ 2013-04-09 23:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Gilad Ben-Yossef, Simon Jeons, KOSAKI Motohiro, Mel Gorman,
	Linux MM, LKML, Cody P Schafer

Create pageset_set_batch() for use in setup_pageset().
pageset_set_batch() imitates the functionality of
setup_pagelist_highmark(), but uses the boot-time
(percpu_pagelist_fraction == 0) calculation for determining ->high
based on ->batch.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 mm/page_alloc.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8fcced7..5877cf0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4004,6 +4004,14 @@ static int __meminit zone_batchsize(struct zone *zone)
 #endif
 }
 
+/* a companion to setup_pagelist_highmark() */
+static void pageset_set_batch(struct per_cpu_pageset *p, unsigned long batch)
+{
+	struct per_cpu_pages *pcp = &p->pcp;
+	pcp->high = 6 * batch;
+	pcp->batch = max(1UL, 1 * batch);
+}
+
 static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
 {
 	struct per_cpu_pages *pcp;
@@ -4013,8 +4021,7 @@ static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
 
 	pcp = &p->pcp;
 	pcp->count = 0;
-	pcp->high = 6 * batch;
-	pcp->batch = max(1UL, 1 * batch);
+	pageset_set_batch(p, batch);
 	for (migratetype = 0; migratetype < MIGRATE_PCPTYPES; migratetype++)
 		INIT_LIST_HEAD(&pcp->lists[migratetype]);
 }
@@ -4023,7 +4030,6 @@ static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
  * setup_pagelist_highmark() sets the high water mark for hot per_cpu_pagelist
  * to the value high for the pageset p.
  */
-
 static void setup_pagelist_highmark(struct per_cpu_pageset *p,
 				unsigned long high)
 {
-- 
1.8.2



* [PATCH v2 02/10] mm/page_alloc: prevent concurrent updaters of pcp ->batch and ->high
  2013-04-09 23:28 [PATCH v2 00/10] mm: fixup changers of per cpu pageset's ->high and ->batch Cody P Schafer
  2013-04-09 23:28 ` [PATCH v2 01/10] mm/page_alloc: factor out setting of pcp->high and pcp->batch Cody P Schafer
@ 2013-04-09 23:28 ` Cody P Schafer
  2013-04-09 23:28 ` [PATCH v2 03/10] mm/page_alloc: insert memory barriers to allow async update of pcp batch and high Cody P Schafer
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Cody P Schafer @ 2013-04-09 23:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Gilad Ben-Yossef, Simon Jeons, KOSAKI Motohiro, Mel Gorman,
	Linux MM, LKML, Cody P Schafer

Because we are going to rely upon a careful transition between old and
new ->high and ->batch values using memory barriers, and will remove
stop_machine(), we need to prevent multiple updaters from interleaving
their memory writes.

Add a simple mutex to protect both update loops.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 mm/page_alloc.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5877cf0..d259599 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -64,6 +64,9 @@
 #include <asm/div64.h>
 #include "internal.h"
 
+/* prevent >1 _updater_ of zone percpu pageset ->high and ->batch fields */
+static DEFINE_MUTEX(pcp_batch_high_lock);
+
 #ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
 DEFINE_PER_CPU(int, numa_node);
 EXPORT_PER_CPU_SYMBOL(numa_node);
@@ -5491,6 +5494,8 @@ int percpu_pagelist_fraction_sysctl_handler(ctl_table *table, int write,
 	ret = proc_dointvec_minmax(table, write, buffer, length, ppos);
 	if (!write || (ret < 0))
 		return ret;
+
+	mutex_lock(&pcp_batch_high_lock);
 	for_each_populated_zone(zone) {
 		for_each_possible_cpu(cpu) {
 			unsigned long  high;
@@ -5499,6 +5504,7 @@ int percpu_pagelist_fraction_sysctl_handler(ctl_table *table, int write,
 				per_cpu_ptr(zone->pageset, cpu), high);
 		}
 	}
+	mutex_unlock(&pcp_batch_high_lock);
 	return 0;
 }
 
@@ -6012,7 +6018,9 @@ static int __meminit __zone_pcp_update(void *data)
 
 void __meminit zone_pcp_update(struct zone *zone)
 {
+	mutex_lock(&pcp_batch_high_lock);
 	stop_machine(__zone_pcp_update, zone, NULL);
+	mutex_unlock(&pcp_batch_high_lock);
 }
 #endif
 
-- 
1.8.2



* [PATCH v2 03/10] mm/page_alloc: insert memory barriers to allow async update of pcp batch and high
  2013-04-09 23:28 [PATCH v2 00/10] mm: fixup changers of per cpu pageset's ->high and ->batch Cody P Schafer
  2013-04-09 23:28 ` [PATCH v2 01/10] mm/page_alloc: factor out setting of pcp->high and pcp->batch Cody P Schafer
  2013-04-09 23:28 ` [PATCH v2 02/10] mm/page_alloc: prevent concurrent updaters of pcp ->batch and ->high Cody P Schafer
@ 2013-04-09 23:28 ` Cody P Schafer
  2013-04-10  6:19   ` Gilad Ben-Yossef
  2013-04-09 23:28 ` [PATCH v2 04/10] mm/page_alloc: convert zone_pcp_update() to rely on memory barriers instead of stop_machine() Cody P Schafer
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 14+ messages in thread
From: Cody P Schafer @ 2013-04-09 23:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Gilad Ben-Yossef, Simon Jeons, KOSAKI Motohiro, Mel Gorman,
	Linux MM, LKML, Cody P Schafer

In pageset_set_batch() and setup_pagelist_highmark(), ensure that batch
is always set to a safe value (1) prior to updating high, and ensure
that high is fully updated before setting the real value of batch.

Suggested by Gilad Ben-Yossef <gilad@benyossef.com> in this thread:

	https://lkml.org/lkml/2013/4/9/23

Also reproduces his proposed comment.
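
The invariant this ordering protects can be made explicit with asserts. The
following user-space model (the helper name and assert placement are
illustrative, not part of the patch) checks, after each store in program
order, that a reader could never observe ->batch exceeding ->high:

```c
#include <assert.h>

/* minimal stand-in for the kernel's struct per_cpu_pages */
struct per_cpu_pages {
	unsigned long high;
	unsigned long batch;
};

/*
 * Apply the three ordered stores one at a time, checking after each
 * that ->batch never exceeds ->high. (This only models program order;
 * on weakly ordered systems the smp_wmb() calls in the patch are what
 * make this ordering visible to other CPUs. Assumes ->high >= 1 on
 * entry, as on any initialized pageset.)
 */
static void checked_update(struct per_cpu_pages *pcp,
			   unsigned long high, unsigned long batch)
{
	pcp->batch = 1;			/* step 1: safe intermediate batch */
	assert(pcp->batch <= pcp->high);

	pcp->high = high;		/* step 2: publish new high */
	assert(pcp->batch <= pcp->high);

	pcp->batch = batch;		/* step 3: publish real batch */
	assert(pcp->batch <= pcp->high);
}
```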

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 mm/page_alloc.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d259599..a07bd4c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4007,11 +4007,26 @@ static int __meminit zone_batchsize(struct zone *zone)
 #endif
 }
 
+static void pageset_update_prep(struct per_cpu_pages *pcp)
+{
+	/*
+	 * We're about to mess with PCP in a non-atomic fashion.  Put an
+	 * intermediate safe value of batch and make sure it is visible before
+	 * any other change
+	 */
+	pcp->batch = 1;
+	smp_wmb();
+}
+
 /* a companion to setup_pagelist_highmark() */
 static void pageset_set_batch(struct per_cpu_pageset *p, unsigned long batch)
 {
 	struct per_cpu_pages *pcp = &p->pcp;
+	pageset_update_prep(pcp);
+
 	pcp->high = 6 * batch;
+	smp_wmb();
+
 	pcp->batch = max(1UL, 1 * batch);
 }
 
@@ -4039,7 +4054,11 @@ static void setup_pagelist_highmark(struct per_cpu_pageset *p,
 	struct per_cpu_pages *pcp;
 
 	pcp = &p->pcp;
+	pageset_update_prep(pcp);
+
 	pcp->high = high;
+	smp_wmb();
+
 	pcp->batch = max(1UL, high/4);
 	if ((high/4) > (PAGE_SHIFT * 8))
 		pcp->batch = PAGE_SHIFT * 8;
-- 
1.8.2



* [PATCH v2 04/10] mm/page_alloc: convert zone_pcp_update() to rely on memory barriers instead of stop_machine()
  2013-04-09 23:28 [PATCH v2 00/10] mm: fixup changers of per cpu pageset's ->high and ->batch Cody P Schafer
                   ` (2 preceding siblings ...)
  2013-04-09 23:28 ` [PATCH v2 03/10] mm/page_alloc: insert memory barriers to allow async update of pcp batch and high Cody P Schafer
@ 2013-04-09 23:28 ` Cody P Schafer
  2013-04-09 23:28 ` [PATCH v2 05/10] mm/page_alloc: when handling percpu_pagelist_fraction, don't needlessly recalculate high Cody P Schafer
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Cody P Schafer @ 2013-04-09 23:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Gilad Ben-Yossef, Simon Jeons, KOSAKI Motohiro, Mel Gorman,
	Linux MM, LKML, Cody P Schafer

zone_pcp_update()'s goal is to adjust the ->high and ->batch members of a
percpu pageset based on a zone's ->managed_pages. We don't need to drain
the entire percpu pageset just to modify these fields.

This lets us avoid calling setup_pageset() (and the draining required to
call it) and instead allows simply setting the fields' values (with some
attention paid to memory barriers to prevent the relationship between
->batch and ->high from being thrown off).

This does change the behavior of zone_pcp_update() as the percpu
pagesets will not be drained when zone_pcp_update() is called (they will
end up being shrunk, not completely drained, later when a 0-order page
is freed in free_hot_cold_page()).
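
The gradual shrink described above can be modeled in a few lines of
user-space C. This is a sketch of the trimming done on the 0-order free
path, not the kernel code itself:

```c
#include <assert.h>

/* minimal stand-in for the kernel's struct per_cpu_pages */
struct per_cpu_pages {
	unsigned long count;	/* pages currently on the pcp list */
	unsigned long high;
	unsigned long batch;
};

/*
 * Model of what happens on each 0-order free: the page joins the pcp
 * list, and once ->count reaches the (possibly newly lowered) ->high,
 * ->batch pages are returned to the buddy allocator. An oversized list
 * therefore shrinks incrementally rather than being drained at once.
 */
static void free_one_page(struct per_cpu_pages *pcp)
{
	pcp->count++;
	if (pcp->count >= pcp->high)
		pcp->count -= pcp->batch;	/* free_pcppages_bulk() */
}
```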

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 mm/page_alloc.c | 33 +++++++++------------------------
 1 file changed, 9 insertions(+), 24 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a07bd4c..4a03c56 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6012,33 +6012,18 @@ void free_contig_range(unsigned long pfn, unsigned nr_pages)
 #endif
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-static int __meminit __zone_pcp_update(void *data)
-{
-	struct zone *zone = data;
-	int cpu;
-	unsigned long batch = zone_batchsize(zone), flags;
-
-	for_each_possible_cpu(cpu) {
-		struct per_cpu_pageset *pset;
-		struct per_cpu_pages *pcp;
-
-		pset = per_cpu_ptr(zone->pageset, cpu);
-		pcp = &pset->pcp;
-
-		local_irq_save(flags);
-		if (pcp->count > 0)
-			free_pcppages_bulk(zone, pcp->count, pcp);
-		drain_zonestat(zone, pset);
-		setup_pageset(pset, batch);
-		local_irq_restore(flags);
-	}
-	return 0;
-}
-
+/*
+ * The zone indicated has a new number of managed_pages; batch sizes and percpu
+ * page high values need to be recalculated.
+ */
 void __meminit zone_pcp_update(struct zone *zone)
 {
+	unsigned cpu;
+	unsigned long batch;
 	mutex_lock(&pcp_batch_high_lock);
-	stop_machine(__zone_pcp_update, zone, NULL);
+	batch = zone_batchsize(zone);
+	for_each_possible_cpu(cpu)
+		pageset_set_batch(per_cpu_ptr(zone->pageset, cpu), batch);
 	mutex_unlock(&pcp_batch_high_lock);
 }
 #endif
-- 
1.8.2



* [PATCH v2 05/10] mm/page_alloc: when handling percpu_pagelist_fraction, don't needlessly recalculate high
  2013-04-09 23:28 [PATCH v2 00/10] mm: fixup changers of per cpu pageset's ->high and ->batch Cody P Schafer
                   ` (3 preceding siblings ...)
  2013-04-09 23:28 ` [PATCH v2 04/10] mm/page_alloc: convert zone_pcp_update() to rely on memory barriers instead of stop_machine() Cody P Schafer
@ 2013-04-09 23:28 ` Cody P Schafer
  2013-04-09 23:28 ` [PATCH v2 06/10] mm/page_alloc: factor setup_pageset() into pageset_init() and pageset_set_batch() Cody P Schafer
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Cody P Schafer @ 2013-04-09 23:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Gilad Ben-Yossef, Simon Jeons, KOSAKI Motohiro, Mel Gorman,
	Linux MM, LKML, Cody P Schafer

Simply move the calculation of the new 'high' value outside the
for_each_possible_cpu() loop, as it does not depend on the CPU.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 mm/page_alloc.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4a03c56..50a277a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5502,7 +5502,6 @@ int lowmem_reserve_ratio_sysctl_handler(ctl_table *table, int write,
  * cpu.  It is the fraction of total pages in each zone that a hot per cpu pagelist
  * can have before it gets flushed back to buddy allocator.
  */
-
 int percpu_pagelist_fraction_sysctl_handler(ctl_table *table, int write,
 	void __user *buffer, size_t *length, loff_t *ppos)
 {
@@ -5516,12 +5515,11 @@ int percpu_pagelist_fraction_sysctl_handler(ctl_table *table, int write,
 
 	mutex_lock(&pcp_batch_high_lock);
 	for_each_populated_zone(zone) {
-		for_each_possible_cpu(cpu) {
-			unsigned long  high;
-			high = zone->managed_pages / percpu_pagelist_fraction;
+		unsigned long  high;
+		high = zone->managed_pages / percpu_pagelist_fraction;
+		for_each_possible_cpu(cpu)
 			setup_pagelist_highmark(
-				per_cpu_ptr(zone->pageset, cpu), high);
-		}
+					per_cpu_ptr(zone->pageset, cpu), high);
 	}
 	mutex_unlock(&pcp_batch_high_lock);
 	return 0;
-- 
1.8.2



* [PATCH v2 06/10] mm/page_alloc: factor setup_pageset() into pageset_init() and pageset_set_batch()
  2013-04-09 23:28 [PATCH v2 00/10] mm: fixup changers of per cpu pageset's ->high and ->batch Cody P Schafer
                   ` (4 preceding siblings ...)
  2013-04-09 23:28 ` [PATCH v2 05/10] mm/page_alloc: when handling percpu_pagelist_fraction, don't needlessly recalculate high Cody P Schafer
@ 2013-04-09 23:28 ` Cody P Schafer
  2013-04-09 23:28 ` [PATCH v2 07/10] mm/page_alloc: relocate comment to be directly above code it refers to Cody P Schafer
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Cody P Schafer @ 2013-04-09 23:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Gilad Ben-Yossef, Simon Jeons, KOSAKI Motohiro, Mel Gorman,
	Linux MM, LKML, Cody P Schafer

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 mm/page_alloc.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 50a277a..a2f2207 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4030,7 +4030,7 @@ static void pageset_set_batch(struct per_cpu_pageset *p, unsigned long batch)
 	pcp->batch = max(1UL, 1 * batch);
 }
 
-static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
+static void pageset_init(struct per_cpu_pageset *p)
 {
 	struct per_cpu_pages *pcp;
 	int migratetype;
@@ -4039,11 +4039,16 @@ static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
 
 	pcp = &p->pcp;
 	pcp->count = 0;
-	pageset_set_batch(p, batch);
 	for (migratetype = 0; migratetype < MIGRATE_PCPTYPES; migratetype++)
 		INIT_LIST_HEAD(&pcp->lists[migratetype]);
 }
 
+static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
+{
+	pageset_init(p);
+	pageset_set_batch(p, batch);
+}
+
 /*
  * setup_pagelist_highmark() sets the high water mark for hot per_cpu_pagelist
  * to the value high for the pageset p.
-- 
1.8.2



* [PATCH v2 07/10] mm/page_alloc: relocate comment to be directly above code it refers to.
  2013-04-09 23:28 [PATCH v2 00/10] mm: fixup changers of per cpu pageset's ->high and ->batch Cody P Schafer
                   ` (5 preceding siblings ...)
  2013-04-09 23:28 ` [PATCH v2 06/10] mm/page_alloc: factor setup_pageset() into pageset_init() and pageset_set_batch() Cody P Schafer
@ 2013-04-09 23:28 ` Cody P Schafer
  2013-04-09 23:28 ` [PATCH v2 08/10] mm/page_alloc: factor zone_pageset_init() out of setup_zone_pageset() Cody P Schafer
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Cody P Schafer @ 2013-04-09 23:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Gilad Ben-Yossef, Simon Jeons, KOSAKI Motohiro, Mel Gorman,
	Linux MM, LKML, Cody P Schafer

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 mm/page_alloc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a2f2207..6e52e67 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3680,12 +3680,12 @@ void __ref build_all_zonelists(pg_data_t *pgdat, struct zone *zone)
 		mminit_verify_zonelist();
 		cpuset_init_current_mems_allowed();
 	} else {
-		/* we have to stop all cpus to guarantee there is no user
-		   of zonelist */
 #ifdef CONFIG_MEMORY_HOTPLUG
 		if (zone)
 			setup_zone_pageset(zone);
 #endif
+		/* we have to stop all cpus to guarantee there is no user
+		   of zonelist */
 		stop_machine(__build_all_zonelists, pgdat, NULL);
 		/* cpuset refresh routine should be here */
 	}
-- 
1.8.2



* [PATCH v2 08/10] mm/page_alloc: factor zone_pageset_init() out of setup_zone_pageset()
  2013-04-09 23:28 [PATCH v2 00/10] mm: fixup changers of per cpu pageset's ->high and ->batch Cody P Schafer
                   ` (6 preceding siblings ...)
  2013-04-09 23:28 ` [PATCH v2 07/10] mm/page_alloc: relocate comment to be directly above code it refers to Cody P Schafer
@ 2013-04-09 23:28 ` Cody P Schafer
  2013-04-09 23:28 ` [PATCH v2 09/10] mm/page_alloc: in zone_pcp_update(), use zone_pageset_init() Cody P Schafer
  2013-04-09 23:28 ` [PATCH v2 10/10] mm/page_alloc: rename setup_pagelist_highmark() to match naming of pageset_set_batch() Cody P Schafer
  9 siblings, 0 replies; 14+ messages in thread
From: Cody P Schafer @ 2013-04-09 23:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Gilad Ben-Yossef, Simon Jeons, KOSAKI Motohiro, Mel Gorman,
	Linux MM, LKML, Cody P Schafer

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 mm/page_alloc.c | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6e52e67..c663e62 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4069,22 +4069,25 @@ static void setup_pagelist_highmark(struct per_cpu_pageset *p,
 		pcp->batch = PAGE_SHIFT * 8;
 }
 
+static void __meminit zone_pageset_init(struct zone *zone, int cpu)
+{
+	struct per_cpu_pageset *pcp = per_cpu_ptr(zone->pageset, cpu);
+
+	pageset_init(pcp);
+	if (percpu_pagelist_fraction)
+		setup_pagelist_highmark(pcp,
+			(zone->managed_pages /
+				percpu_pagelist_fraction));
+	else
+		pageset_set_batch(pcp, zone_batchsize(zone));
+}
+
 static void __meminit setup_zone_pageset(struct zone *zone)
 {
 	int cpu;
-
 	zone->pageset = alloc_percpu(struct per_cpu_pageset);
-
-	for_each_possible_cpu(cpu) {
-		struct per_cpu_pageset *pcp = per_cpu_ptr(zone->pageset, cpu);
-
-		setup_pageset(pcp, zone_batchsize(zone));
-
-		if (percpu_pagelist_fraction)
-			setup_pagelist_highmark(pcp,
-				(zone->managed_pages /
-					percpu_pagelist_fraction));
-	}
+	for_each_possible_cpu(cpu)
+		zone_pageset_init(zone, cpu);
 }
 
 /*
-- 
1.8.2



* [PATCH v2 09/10] mm/page_alloc: in zone_pcp_update(), use zone_pageset_init()
  2013-04-09 23:28 [PATCH v2 00/10] mm: fixup changers of per cpu pageset's ->high and ->batch Cody P Schafer
                   ` (7 preceding siblings ...)
  2013-04-09 23:28 ` [PATCH v2 08/10] mm/page_alloc: factor zone_pageset_init() out of setup_zone_pageset() Cody P Schafer
@ 2013-04-09 23:28 ` Cody P Schafer
  2013-04-09 23:28 ` [PATCH v2 10/10] mm/page_alloc: rename setup_pagelist_highmark() to match naming of pageset_set_batch() Cody P Schafer
  9 siblings, 0 replies; 14+ messages in thread
From: Cody P Schafer @ 2013-04-09 23:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Gilad Ben-Yossef, Simon Jeons, KOSAKI Motohiro, Mel Gorman,
	Linux MM, LKML, Cody P Schafer

Previously, zone_pcp_update() called pageset_set_batch() directly,
essentially assuming that percpu_pagelist_fraction == 0. Correct this by
calling zone_pageset_init(), which chooses the appropriate ->batch and
->high calculations.
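
The calculation zone_pageset_init() selects between can be sketched as a
pure function in user-space C. The helper name and the PAGE_SHIFT value
(4K pages) are assumptions for illustration; the constants mirror
pageset_set_batch() and setup_pagelist_highmark() as shown in the earlier
patches:

```c
#include <assert.h>

#define PAGE_SHIFT 12	/* assumption: 4K pages */

struct pcp_cfg {
	unsigned long high;
	unsigned long batch;
};

/*
 * Which calculation applies depends on the percpu_pagelist_fraction
 * sysctl: zero means boot-time sizing from the zone's batch size,
 * non-zero means a fraction of the zone's managed pages with a capped
 * batch.
 */
static struct pcp_cfg pick_high_batch(unsigned long managed_pages,
				      unsigned long zone_batch,
				      unsigned long fraction)
{
	struct pcp_cfg c;

	if (fraction) {
		/* setup_pagelist_highmark() path */
		c.high = managed_pages / fraction;
		c.batch = c.high / 4;
		if (c.batch > PAGE_SHIFT * 8)
			c.batch = PAGE_SHIFT * 8;
		if (c.batch < 1)
			c.batch = 1;
	} else {
		/* pageset_set_batch() path */
		c.high = 6 * zone_batch;
		c.batch = zone_batch ? zone_batch : 1;
	}
	return c;
}
```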

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 mm/page_alloc.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c663e62..334387e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6025,11 +6025,9 @@ void free_contig_range(unsigned long pfn, unsigned nr_pages)
 void __meminit zone_pcp_update(struct zone *zone)
 {
 	unsigned cpu;
-	unsigned long batch;
 	mutex_lock(&pcp_batch_high_lock);
-	batch = zone_batchsize(zone);
 	for_each_possible_cpu(cpu)
-		pageset_set_batch(per_cpu_ptr(zone->pageset, cpu), batch);
+		zone_pageset_init(zone, cpu);
 	mutex_unlock(&pcp_batch_high_lock);
 }
 #endif
-- 
1.8.2



* [PATCH v2 10/10] mm/page_alloc: rename setup_pagelist_highmark() to match naming of pageset_set_batch()
  2013-04-09 23:28 [PATCH v2 00/10] mm: fixup changers of per cpu pageset's ->high and ->batch Cody P Schafer
                   ` (8 preceding siblings ...)
  2013-04-09 23:28 ` [PATCH v2 09/10] mm/page_alloc: in zone_pcp_update(), use zone_pageset_init() Cody P Schafer
@ 2013-04-09 23:28 ` Cody P Schafer
  9 siblings, 0 replies; 14+ messages in thread
From: Cody P Schafer @ 2013-04-09 23:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Gilad Ben-Yossef, Simon Jeons, KOSAKI Motohiro, Mel Gorman,
	Linux MM, LKML, Cody P Schafer

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 mm/page_alloc.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 334387e..66b8bc2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4018,7 +4018,7 @@ static void pageset_update_prep(struct per_cpu_pages *pcp)
 	smp_wmb();
 }
 
-/* a companion to setup_pagelist_highmark() */
+/* a companion to pageset_set_high() */
 static void pageset_set_batch(struct per_cpu_pageset *p, unsigned long batch)
 {
 	struct per_cpu_pages *pcp = &p->pcp;
@@ -4050,10 +4050,10 @@ static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
 }
 
 /*
- * setup_pagelist_highmark() sets the high water mark for hot per_cpu_pagelist
+ * pageset_set_high() sets the high water mark for hot per_cpu_pagelist
  * to the value high for the pageset p.
  */
-static void setup_pagelist_highmark(struct per_cpu_pageset *p,
+static void pageset_set_high(struct per_cpu_pageset *p,
 				unsigned long high)
 {
 	struct per_cpu_pages *pcp;
@@ -4075,7 +4075,7 @@ static void __meminit zone_pageset_init(struct zone *zone, int cpu)
 
 	pageset_init(pcp);
 	if (percpu_pagelist_fraction)
-		setup_pagelist_highmark(pcp,
+		pageset_set_high(pcp,
 			(zone->managed_pages /
 				percpu_pagelist_fraction));
 	else
@@ -5526,8 +5526,8 @@ int percpu_pagelist_fraction_sysctl_handler(ctl_table *table, int write,
 		unsigned long  high;
 		high = zone->managed_pages / percpu_pagelist_fraction;
 		for_each_possible_cpu(cpu)
-			setup_pagelist_highmark(
-					per_cpu_ptr(zone->pageset, cpu), high);
+			pageset_set_high(per_cpu_ptr(zone->pageset, cpu),
+					 high);
 	}
 	mutex_unlock(&pcp_batch_high_lock);
 	return 0;
-- 
1.8.2



* Re: [PATCH v2 03/10] mm/page_alloc: insert memory barriers to allow async update of pcp batch and high
  2013-04-09 23:28 ` [PATCH v2 03/10] mm/page_alloc: insert memory barriers to allow async update of pcp batch and high Cody P Schafer
@ 2013-04-10  6:19   ` Gilad Ben-Yossef
  2013-04-10  6:22     ` Gilad Ben-Yossef
  0 siblings, 1 reply; 14+ messages in thread
From: Gilad Ben-Yossef @ 2013-04-10  6:19 UTC (permalink / raw)
  To: Cody P Schafer
  Cc: Andrew Morton, Simon Jeons, KOSAKI Motohiro, Mel Gorman, Linux MM, LKML

On Wed, Apr 10, 2013 at 2:28 AM, Cody P Schafer <cody@linux.vnet.ibm.com> wrote:
> In pageset_set_batch() and setup_pagelist_highmark(), ensure that batch
> is always set to a safe value (1) prior to updating high, and ensure
> that high is fully updated before setting the real value of batch.
>
> Suggested by Gilad Ben-Yossef <gilad@benyossef.com> in this thread:
>
>         https://lkml.org/lkml/2013/4/9/23
>
> Also reproduces his proposed comment.
>
> Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
> ---
>  mm/page_alloc.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d259599..a07bd4c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4007,11 +4007,26 @@ static int __meminit zone_batchsize(struct zone *zone)
>  #endif
>  }
>
> +static void pageset_update_prep(struct per_cpu_pages *pcp)
> +{
> +       /*
> +        * We're about to mess with PCP in an non atomic fashion.  Put an
> +        * intermediate safe value of batch and make sure it is visible before
> +        * any other change
> +        */
> +       pcp->batch = 1;
> +       smp_wmb();
> +}
> +
>  /* a companion to setup_pagelist_highmark() */
>  static void pageset_set_batch(struct per_cpu_pageset *p, unsigned long batch)
>  {
>         struct per_cpu_pages *pcp = &p->pcp;
> +       pageset_update_prep(pcp);
> +
>         pcp->high = 6 * batch;
> +       smp_wmb();
> +
>         pcp->batch = max(1UL, 1 * batch);
>  }
>
> @@ -4039,7 +4054,11 @@ static void setup_pagelist_highmark(struct per_cpu_pageset *p,
>         struct per_cpu_pages *pcp;
>
>         pcp = &p->pcp;
> +       pageset_update_prep(pcp);
> +
>         pcp->high = high;
> +       smp_wmb();
> +
>         pcp->batch = max(1UL, high/4);
>         if ((high/4) > (PAGE_SHIFT * 8))
>                 pcp->batch = PAGE_SHIFT * 8;
> --
> 1.8.2
>

That is very good.
However, now we've created a "protocol" for updating ->high and ->batch:

1. Call pageset_update_prep(pcp)
2. Update ->high
3. smp_wmb()
4. Update ->batch

But that protocol is not documented anywhere, and someone who reads the
code two years from now will not be aware of it or why it is done like
that.

How about if we create:

/*
 * pcp->high and pcp->batch values are related and dependent on one
 * another: ->batch must never be higher than ->high.
 * The following function updates them in a safe manner without a
 * costly atomic transaction.
 */
static void pageset_update(struct per_cpu_pages *pcp, unsigned int high,
			   unsigned int batch)
{
	/* start with a fail safe value for batch */
	pcp->batch = 1;
	smp_wmb();

	/* Update high, then batch, in order */
	pcp->high = high;
	smp_wmb();
	pcp->batch = batch;
}

And use that at the update sites? Then the protocol becomes explicit.

Thanks,
Gilad.

-- 
Gilad Ben-Yossef
Chief Coffee Drinker
gilad@benyossef.com
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com

"If you take a class in large-scale robotics, can you end up in a
situation where the homework eats your dog?"
 -- Jean-Baptiste Queru


* Re: [PATCH v2 03/10] mm/page_alloc: insert memory barriers to allow async update of pcp batch and high
  2013-04-10  6:19   ` Gilad Ben-Yossef
@ 2013-04-10  6:22     ` Gilad Ben-Yossef
  2013-04-10 18:31       ` Cody P Schafer
  0 siblings, 1 reply; 14+ messages in thread
From: Gilad Ben-Yossef @ 2013-04-10  6:22 UTC (permalink / raw)
  To: Cody P Schafer
  Cc: Andrew Morton, Simon Jeons, KOSAKI Motohiro, Mel Gorman, Linux MM, LKML

On Wed, Apr 10, 2013 at 9:19 AM, Gilad Ben-Yossef <gilad@benyossef.com> wrote:
> On Wed, Apr 10, 2013 at 2:28 AM, Cody P Schafer <cody@linux.vnet.ibm.com> wrote:
>> In pageset_set_batch() and setup_pagelist_highmark(), ensure that batch
>> is always set to a safe value (1) prior to updating high, and ensure
>> that high is fully updated before setting the real value of batch.
>>
>> Suggested by Gilad Ben-Yossef <gilad@benyossef.com> in this thread:
>>
>>         https://lkml.org/lkml/2013/4/9/23
>>
>> Also reproduces his proposed comment.
>>
>> Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
>> ---
>>  mm/page_alloc.c | 19 +++++++++++++++++++
>>  1 file changed, 19 insertions(+)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index d259599..a07bd4c 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -4007,11 +4007,26 @@ static int __meminit zone_batchsize(struct zone *zone)
>>  #endif
>>  }
>>
>> +static void pageset_update_prep(struct per_cpu_pages *pcp)
>> +{
>> +       /*
>> +        * We're about to mess with PCP in an non atomic fashion.  Put an
>> +        * intermediate safe value of batch and make sure it is visible before
>> +        * any other change
>> +        */
>> +       pcp->batch = 1;
>> +       smp_wmb();
>> +}
>> +
>>  /* a companion to setup_pagelist_highmark() */
>>  static void pageset_set_batch(struct per_cpu_pageset *p, unsigned long batch)
>>  {
>>         struct per_cpu_pages *pcp = &p->pcp;
>> +       pageset_update_prep(pcp);
>> +
>>         pcp->high = 6 * batch;
>> +       smp_wmb();
>> +
>>         pcp->batch = max(1UL, 1 * batch);
>>  }
>>
>> @@ -4039,7 +4054,11 @@ static void setup_pagelist_highmark(struct per_cpu_pageset *p,
>>         struct per_cpu_pages *pcp;
>>
>>         pcp = &p->pcp;
>> +       pageset_update_prep(pcp);
>> +
>>         pcp->high = high;
>> +       smp_wmb();
>> +
>>         pcp->batch = max(1UL, high/4);
>>         if ((high/4) > (PAGE_SHIFT * 8))
>>                 pcp->batch = PAGE_SHIFT * 8;
>> --
>> 1.8.2
>>
>
> That is very good.
> However, now we've created a "protocol" for updating ->high and ->batch:
>
> 1. Call pageset_update_prep(pcp)
> 2. Update ->high
> 3. smp_wmb()
> 4. Update ->batch
>
> But that protocol is not documented anywhere, and someone who reads the
> code two years from now will not be aware of it or why it is done like that.
>
> How about if we create:
>
> /*
>  * pcp->high and pcp->batch values are related and dependent on one another:
>  * ->batch must never be higher than ->high.
>  * The following function updates them in a safe manner without a costly
>  * atomic transaction.
>  */
> static void pageset_update(struct per_cpu_pages *pcp,
>                            unsigned int high, unsigned int batch)
> {
>        /* start with a fail-safe value for batch */
>        pcp->batch = 1;
>        smp_wmb();
>
>        /* Update high, then batch, in order */
>        pcp->high = high;
>        smp_wmb();
>        pcp->batch = batch;
> }
>
> And use that at the update sites? Then the protocol becomes explicit.

Oh, and other than that it looks good to me, so assuming you do that -

Reviewed-By: Gilad Ben-Yossef <gilad@benyossef.com>

Many thanks,
Gilad


-- 
Gilad Ben-Yossef
Chief Coffee Drinker
gilad@benyossef.com
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com

"If you take a class in large-scale robotics, can you end up in a
situation where the homework eats your dog?"
 -- Jean-Baptiste Queru

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 03/10] mm/page_alloc: insert memory barriers to allow async update of pcp batch and high
  2013-04-10  6:22     ` Gilad Ben-Yossef
@ 2013-04-10 18:31       ` Cody P Schafer
  0 siblings, 0 replies; 14+ messages in thread
From: Cody P Schafer @ 2013-04-10 18:31 UTC (permalink / raw)
  To: Gilad Ben-Yossef
  Cc: Andrew Morton, Simon Jeons, KOSAKI Motohiro, Mel Gorman, Linux MM, LKML

On 04/09/2013 11:22 PM, Gilad Ben-Yossef wrote:
> On Wed, Apr 10, 2013 at 9:19 AM, Gilad Ben-Yossef <gilad@benyossef.com> wrote:
>> On Wed, Apr 10, 2013 at 2:28 AM, Cody P Schafer <cody@linux.vnet.ibm.com> wrote:
>>> In pageset_set_batch() and setup_pagelist_highmark(), ensure that batch
>>> is always set to a safe value (1) prior to updating high, and ensure
>>> that high is fully updated before setting the real value of batch.
>>>
>>> Suggested by Gilad Ben-Yossef <gilad@benyossef.com> in this thread:
>>>
>>>          https://lkml.org/lkml/2013/4/9/23
>>>
>>> Also reproduces his proposed comment.
>>>
>>> Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
>>> ---
>>>   mm/page_alloc.c | 19 +++++++++++++++++++
>>>   1 file changed, 19 insertions(+)
>>>
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index d259599..a07bd4c 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -4007,11 +4007,26 @@ static int __meminit zone_batchsize(struct zone *zone)
>>>   #endif
>>>   }
>>>
>>> +static void pageset_update_prep(struct per_cpu_pages *pcp)
>>> +{
>>> +       /*
>>> +        * We're about to mess with PCP in an non atomic fashion.  Put an
>>> +        * intermediate safe value of batch and make sure it is visible before
>>> +        * any other change
>>> +        */
>>> +       pcp->batch = 1;
>>> +       smp_wmb();
>>> +}
>>> +
>>>   /* a companion to setup_pagelist_highmark() */
>>>   static void pageset_set_batch(struct per_cpu_pageset *p, unsigned long batch)
>>>   {
>>>          struct per_cpu_pages *pcp = &p->pcp;
>>> +       pageset_update_prep(pcp);
>>> +
>>>          pcp->high = 6 * batch;
>>> +       smp_wmb();
>>> +
>>>          pcp->batch = max(1UL, 1 * batch);
>>>   }
>>>
>>> @@ -4039,7 +4054,11 @@ static void setup_pagelist_highmark(struct per_cpu_pageset *p,
>>>          struct per_cpu_pages *pcp;
>>>
>>>          pcp = &p->pcp;
>>> +       pageset_update_prep(pcp);
>>> +
>>>          pcp->high = high;
>>> +       smp_wmb();
>>> +
>>>          pcp->batch = max(1UL, high/4);
>>>          if ((high/4) > (PAGE_SHIFT * 8))
>>>                  pcp->batch = PAGE_SHIFT * 8;
>>> --
>>> 1.8.2
>>>
>>
>> That is very good.
>> However, now we've created a "protocol" for updating ->high and ->batch:
>>
>> 1. Call pageset_update_prep(pcp)
>> 2. Update ->high
>> 3. smp_wmb()
>> 4. Update ->batch
>>
>> But that protocol is not documented anywhere, and someone who reads the
>> code two years from now will not be aware of it or why it is done like that.
>>
>> How about if we create:
>>
>> /*
>>  * pcp->high and pcp->batch values are related and dependent on one another:
>>  * ->batch must never be higher than ->high.
>>  * The following function updates them in a safe manner without a costly
>>  * atomic transaction.
>>  */
>> static void pageset_update(struct per_cpu_pages *pcp,
>>                            unsigned int high, unsigned int batch)
>> {
>>         /* start with a fail-safe value for batch */
>>         pcp->batch = 1;
>>         smp_wmb();
>>
>>         /* Update high, then batch, in order */
>>         pcp->high = high;
>>         smp_wmb();
>>         pcp->batch = batch;
>> }
>>
>> And use that at the update sites? Then the protocol becomes explicit.

Yep, this looks like exactly the right thing.

>
> Oh, and other than that it looks good to me, so assuming you do that -
>
> Reviewed-By: Gilad Ben-Yossef <gilad@benyossef.com>

I've added it only to the patch with pageset_update() in it; if you 
meant to apply it to more patches, feel free to reply to the v3 posting.


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2013-04-10 18:32 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-04-09 23:28 [PATCH v2 00/10] mm: fixup changers of per cpu pageset's ->high and ->batch Cody P Schafer
2013-04-09 23:28 ` [PATCH v2 01/10] mm/page_alloc: factor out setting of pcp->high and pcp->batch Cody P Schafer
2013-04-09 23:28 ` [PATCH v2 02/10] mm/page_alloc: prevent concurrent updaters of pcp ->batch and ->high Cody P Schafer
2013-04-09 23:28 ` [PATCH v2 03/10] mm/page_alloc: insert memory barriers to allow async update of pcp batch and high Cody P Schafer
2013-04-10  6:19   ` Gilad Ben-Yossef
2013-04-10  6:22     ` Gilad Ben-Yossef
2013-04-10 18:31       ` Cody P Schafer
2013-04-09 23:28 ` [PATCH v2 04/10] mm/page_alloc: convert zone_pcp_update() to rely on memory barriers instead of stop_machine() Cody P Schafer
2013-04-09 23:28 ` [PATCH v2 05/10] mm/page_alloc: when handling percpu_pagelist_fraction, don't unneedly recalulate high Cody P Schafer
2013-04-09 23:28 ` [PATCH v2 06/10] mm/page_alloc: factor setup_pageset() into pageset_init() and pageset_set_batch() Cody P Schafer
2013-04-09 23:28 ` [PATCH v2 07/10] mm/page_alloc: relocate comment to be directly above code it refers to Cody P Schafer
2013-04-09 23:28 ` [PATCH v2 08/10] mm/page_alloc: factor zone_pageset_init() out of setup_zone_pageset() Cody P Schafer
2013-04-09 23:28 ` [PATCH v2 09/10] mm/page_alloc: in zone_pcp_update(), uze zone_pageset_init() Cody P Schafer
2013-04-09 23:28 ` [PATCH v2 10/10] mm/page_alloc: rename setup_pagelist_highmark() to match naming of pageset_set_batch() Cody P Schafer
