* [PATCH 0/3] mm: fixup changers of per cpu pageset's ->high and ->batch
@ 2013-04-05 20:33 ` Cody P Schafer
  0 siblings, 0 replies; 60+ messages in thread
From: Cody P Schafer @ 2013-04-05 20:33 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Mel Gorman, Linux MM, LKML

In one case, while modifying the ->high and ->batch fields of per cpu
pagesets, we're needlessly using stop_machine() (patches 1 & 2); in another
we don't have any synchronization at all (patch 3).

This patchset fixes both of them.

Note that it results in a change to the behavior of zone_pcp_update(), which is
used by memory_hotplug. I _think_ that I've discerned (and preserved) the
essential behavior (changing ->high and ->batch), and only eliminated unneeded
actions (draining the per cpu pages), but this may not be the case.

--
 mm/page_alloc.c | 63 +++++++++++++++++++++++++++------------------------------
 1 file changed, 30 insertions(+), 33 deletions(-)


^ permalink raw reply	[flat|nested] 60+ messages in thread


* [PATCH 1/3] mm/page_alloc: factor out setting of pcp->high and pcp->batch.
  2013-04-05 20:33 ` Cody P Schafer
@ 2013-04-05 20:33   ` Cody P Schafer
  -1 siblings, 0 replies; 60+ messages in thread
From: Cody P Schafer @ 2013-04-05 20:33 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Mel Gorman, Linux MM, LKML, Cody P Schafer

Creates pageset_set_batch() for use in setup_pageset().
pageset_set_batch() imitates the functionality of
setup_pagelist_highmark(), but uses the boot time
(percpu_pagelist_fraction == 0) calculations for determining ->high
based on ->batch.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 mm/page_alloc.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8fcced7..5877cf0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4004,6 +4004,14 @@ static int __meminit zone_batchsize(struct zone *zone)
 #endif
 }
 
+/* a companion to setup_pagelist_highmark() */
+static void pageset_set_batch(struct per_cpu_pageset *p, unsigned long batch)
+{
+	struct per_cpu_pages *pcp = &p->pcp;
+	pcp->high = 6 * batch;
+	pcp->batch = max(1UL, 1 * batch);
+}
+
 static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
 {
 	struct per_cpu_pages *pcp;
@@ -4013,8 +4021,7 @@ static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
 
 	pcp = &p->pcp;
 	pcp->count = 0;
-	pcp->high = 6 * batch;
-	pcp->batch = max(1UL, 1 * batch);
+	pageset_set_batch(p, batch);
 	for (migratetype = 0; migratetype < MIGRATE_PCPTYPES; migratetype++)
 		INIT_LIST_HEAD(&pcp->lists[migratetype]);
 }
@@ -4023,7 +4030,6 @@ static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
  * setup_pagelist_highmark() sets the high water mark for hot per_cpu_pagelist
  * to the value high for the pageset p.
  */
-
 static void setup_pagelist_highmark(struct per_cpu_pageset *p,
 				unsigned long high)
 {
-- 
1.8.2




* [PATCH 2/3] mm/page_alloc: convert zone_pcp_update() to use on_each_cpu() instead of stop_machine()
  2013-04-05 20:33 ` Cody P Schafer
@ 2013-04-05 20:33   ` Cody P Schafer
  -1 siblings, 0 replies; 60+ messages in thread
From: Cody P Schafer @ 2013-04-05 20:33 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Mel Gorman, Linux MM, LKML, Cody P Schafer

No off-cpu users of the percpu pagesets exist.

zone_pcp_update()'s goal is to adjust the ->high and ->batch members of a
percpu pageset based on a zone's ->managed_pages. We don't need to drain
the entire percpu pageset just to modify these fields. Avoid calling
setup_pageset() (and the draining required to call it) and instead just
set the fields' values.

This does change the behavior of zone_pcp_update() as the percpu
pagesets will not be drained when zone_pcp_update() is called (they will
end up being shrunk, not completely drained, later when a 0-order page
is freed in free_hot_cold_page()).

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 mm/page_alloc.c | 30 ++++++++++--------------------
 1 file changed, 10 insertions(+), 20 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5877cf0..48f2faa 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5987,32 +5987,22 @@ void free_contig_range(unsigned long pfn, unsigned nr_pages)
 #endif
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-static int __meminit __zone_pcp_update(void *data)
+static void __meminit __zone_pcp_update(void *data)
 {
 	struct zone *zone = data;
-	int cpu;
-	unsigned long batch = zone_batchsize(zone), flags;
-
-	for_each_possible_cpu(cpu) {
-		struct per_cpu_pageset *pset;
-		struct per_cpu_pages *pcp;
-
-		pset = per_cpu_ptr(zone->pageset, cpu);
-		pcp = &pset->pcp;
-
-		local_irq_save(flags);
-		if (pcp->count > 0)
-			free_pcppages_bulk(zone, pcp->count, pcp);
-		drain_zonestat(zone, pset);
-		setup_pageset(pset, batch);
-		local_irq_restore(flags);
-	}
-	return 0;
+	unsigned long batch = zone_batchsize(zone);
+	struct per_cpu_pageset *pset =
+		per_cpu_ptr(zone->pageset, smp_processor_id());
+	pageset_set_batch(pset, batch);
 }
 
+/*
+ * The zone indicated has a new number of managed_pages; batch sizes and percpu
+ * page high values need to be recalculated.
+ */
 void __meminit zone_pcp_update(struct zone *zone)
 {
-	stop_machine(__zone_pcp_update, zone, NULL);
+	on_each_cpu(__zone_pcp_update, zone, true);
 }
 #endif
 
-- 
1.8.2




* [PATCH 3/3] mm: when handling percpu_pagelist_fraction, use on_each_cpu() to set percpu pageset fields.
  2013-04-05 20:33 ` Cody P Schafer
@ 2013-04-05 20:33   ` Cody P Schafer
  -1 siblings, 0 replies; 60+ messages in thread
From: Cody P Schafer @ 2013-04-05 20:33 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Mel Gorman, Linux MM, LKML, Cody P Schafer

In free_hot_cold_page(), we rely on pcp->batch remaining stable.
Updating it without being on the cpu owning the percpu pageset
potentially destroys this stability.

Change for_each_possible_cpu() to on_each_cpu() to fix.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 mm/page_alloc.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 48f2faa..507db31 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5475,30 +5475,31 @@ int lowmem_reserve_ratio_sysctl_handler(ctl_table *table, int write,
 	return 0;
 }
 
+static void _zone_set_pageset_highmark(void *data)
+{
+	struct zone *zone = data;
+	unsigned long  high;
+	high = zone->managed_pages / percpu_pagelist_fraction;
+	setup_pagelist_highmark(
+			per_cpu_ptr(zone->pageset, smp_processor_id()), high);
+}
+
 /*
  * percpu_pagelist_fraction - changes the pcp->high for each zone on each
  * cpu.  It is the fraction of total pages in each zone that a hot per cpu pagelist
  * can have before it gets flushed back to buddy allocator.
  */
-
 int percpu_pagelist_fraction_sysctl_handler(ctl_table *table, int write,
 	void __user *buffer, size_t *length, loff_t *ppos)
 {
 	struct zone *zone;
-	unsigned int cpu;
 	int ret;
 
 	ret = proc_dointvec_minmax(table, write, buffer, length, ppos);
 	if (!write || (ret < 0))
 		return ret;
-	for_each_populated_zone(zone) {
-		for_each_possible_cpu(cpu) {
-			unsigned long  high;
-			high = zone->managed_pages / percpu_pagelist_fraction;
-			setup_pagelist_highmark(
-				per_cpu_ptr(zone->pageset, cpu), high);
-		}
-	}
+	for_each_populated_zone(zone)
+		on_each_cpu(_zone_set_pageset_highmark, zone, true);
 	return 0;
 }
 
-- 
1.8.2




* Re: [PATCH 0/3] mm: fixup changers of per cpu pageset's ->high and ->batch
  2013-04-05 20:33 ` Cody P Schafer
@ 2013-04-07  1:32   ` Simon Jeons
  -1 siblings, 0 replies; 60+ messages in thread
From: Simon Jeons @ 2013-04-07  1:32 UTC (permalink / raw)
  To: Cody P Schafer; +Cc: Andrew Morton, Mel Gorman, Linux MM, LKML

Hi Cody,
On 04/06/2013 04:33 AM, Cody P Schafer wrote:
> In one case, while modifying the ->high and ->batch fields of per cpu
> pagesets, we're needlessly using stop_machine() (patches 1 & 2); in another
> we don't have any synchronization at all (patch 3).

Do you mean stop_machine() is used for synchronization between the
online cpus?

>
> This patchset fixes both of them.
>
> Note that it results in a change to the behavior of zone_pcp_update(), which is
> used by memory_hotplug. I _think_ that I've discerned (and preserved) the
> essential behavior (changing ->high and ->batch), and only eliminated unneeded
> actions (draining the per cpu pages), but this may not be the case.
>
> --
>   mm/page_alloc.c | 63 +++++++++++++++++++++++++++------------------------------
>   1 file changed, 30 insertions(+), 33 deletions(-)
>




* Re: [PATCH 1/3] mm/page_alloc: factor out setting of pcp->high and pcp->batch.
  2013-04-05 20:33   ` Cody P Schafer
@ 2013-04-07  1:37     ` Simon Jeons
  -1 siblings, 0 replies; 60+ messages in thread
From: Simon Jeons @ 2013-04-07  1:37 UTC (permalink / raw)
  To: Cody P Schafer; +Cc: Andrew Morton, Mel Gorman, Linux MM, LKML

Hi Cody,
On 04/06/2013 04:33 AM, Cody P Schafer wrote:
> Creates pageset_set_batch() for use in setup_pageset().
> pageset_set_batch() imitates the functionality of
> setup_pagelist_highmark(), but uses the boot time
> (percpu_pagelist_fraction == 0) calculations for determining ->high

Why do pcp->high and pcp->batch need to be adjusted while the system is
running? What's the requirement?

> based on ->batch.
>
> Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
> ---
>   mm/page_alloc.c | 12 +++++++++---
>   1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 8fcced7..5877cf0 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4004,6 +4004,14 @@ static int __meminit zone_batchsize(struct zone *zone)
>   #endif
>   }
>   
> +/* a companion to setup_pagelist_highmark() */
> +static void pageset_set_batch(struct per_cpu_pageset *p, unsigned long batch)
> +{
> +	struct per_cpu_pages *pcp = &p->pcp;
> +	pcp->high = 6 * batch;
> +	pcp->batch = max(1UL, 1 * batch);
> +}
> +
>   static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
>   {
>   	struct per_cpu_pages *pcp;
> @@ -4013,8 +4021,7 @@ static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
>   
>   	pcp = &p->pcp;
>   	pcp->count = 0;
> -	pcp->high = 6 * batch;
> -	pcp->batch = max(1UL, 1 * batch);
> +	pageset_set_batch(p, batch);
>   	for (migratetype = 0; migratetype < MIGRATE_PCPTYPES; migratetype++)
>   		INIT_LIST_HEAD(&pcp->lists[migratetype]);
>   }
> @@ -4023,7 +4030,6 @@ static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
>    * setup_pagelist_highmark() sets the high water mark for hot per_cpu_pagelist
>    * to the value high for the pageset p.
>    */
> -
>   static void setup_pagelist_highmark(struct per_cpu_pageset *p,
>   				unsigned long high)
>   {




* Re: [PATCH 3/3] mm: when handling percpu_pagelist_fraction, use on_each_cpu() to set percpu pageset fields.
  2013-04-05 20:33   ` Cody P Schafer
@ 2013-04-07  1:56     ` Simon Jeons
  -1 siblings, 0 replies; 60+ messages in thread
From: Simon Jeons @ 2013-04-07  1:56 UTC (permalink / raw)
  To: Cody P Schafer; +Cc: Andrew Morton, Mel Gorman, Linux MM, LKML

Hi Cody,
On 04/06/2013 04:33 AM, Cody P Schafer wrote:
> In free_hot_cold_page(), we rely on pcp->batch remaining stable.
> Updating it without being on the cpu owning the percpu pageset
> potentially destroys this stability.

If a cpu is offline, can its pcp pageset still be used in free_hot_cold_page()?

>
> Change for_each_possible_cpu() to on_each_cpu() to fix.
>
> Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
> ---
>   mm/page_alloc.c | 21 +++++++++++----------
>   1 file changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 48f2faa..507db31 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5475,30 +5475,31 @@ int lowmem_reserve_ratio_sysctl_handler(ctl_table *table, int write,
>   	return 0;
>   }
>   
> +static void _zone_set_pageset_highmark(void *data)
> +{
> +	struct zone *zone = data;
> +	unsigned long  high;
> +	high = zone->managed_pages / percpu_pagelist_fraction;
> +	setup_pagelist_highmark(
> +			per_cpu_ptr(zone->pageset, smp_processor_id()), high);
> +}
> +
>   /*
>    * percpu_pagelist_fraction - changes the pcp->high for each zone on each
>    * cpu.  It is the fraction of total pages in each zone that a hot per cpu pagelist
>    * can have before it gets flushed back to buddy allocator.
>    */
> -
>   int percpu_pagelist_fraction_sysctl_handler(ctl_table *table, int write,
>   	void __user *buffer, size_t *length, loff_t *ppos)
>   {
>   	struct zone *zone;
> -	unsigned int cpu;
>   	int ret;
>   
>   	ret = proc_dointvec_minmax(table, write, buffer, length, ppos);
>   	if (!write || (ret < 0))
>   		return ret;
> -	for_each_populated_zone(zone) {
> -		for_each_possible_cpu(cpu) {
> -			unsigned long  high;
> -			high = zone->managed_pages / percpu_pagelist_fraction;
> -			setup_pagelist_highmark(
> -				per_cpu_ptr(zone->pageset, cpu), high);
> -		}
> -	}
> +	for_each_populated_zone(zone)
> +		on_each_cpu(_zone_set_pageset_highmark, zone, true);
>   	return 0;
>   }
>   




* Re: [PATCH 0/3] mm: fixup changers of per cpu pageset's ->high and ->batch
  2013-04-05 20:33 ` Cody P Schafer
@ 2013-04-07 15:23   ` KOSAKI Motohiro
  -1 siblings, 0 replies; 60+ messages in thread
From: KOSAKI Motohiro @ 2013-04-07 15:23 UTC (permalink / raw)
  To: Cody P Schafer; +Cc: Andrew Morton, Mel Gorman, Linux MM, LKML, kosaki.motohiro

(4/5/13 4:33 PM), Cody P Schafer wrote:
> In one case, while modifying the ->high and ->batch fields of per cpu
> pagesets, we're needlessly using stop_machine() (patches 1 & 2); in another
> we don't have any synchronization at all (patch 3).
> 
> This patchset fixes both of them.
> 
> Note that it results in a change to the behavior of zone_pcp_update(), which is
> used by memory_hotplug. I _think_ that I've discerned (and preserved) the
> essential behavior (changing ->high and ->batch), and only eliminated unneeded
> actions (draining the per cpu pages), but this may not be the case.

At least, memory hotplug needs to drain.





* Re: [PATCH 2/3] mm/page_alloc: convert zone_pcp_update() to use on_each_cpu() instead of stop_machine()
  2013-04-05 20:33   ` Cody P Schafer
@ 2013-04-07 15:39     ` KOSAKI Motohiro
  -1 siblings, 0 replies; 60+ messages in thread
From: KOSAKI Motohiro @ 2013-04-07 15:39 UTC (permalink / raw)
  To: Cody P Schafer; +Cc: Andrew Morton, Mel Gorman, Linux MM, LKML, kosaki.motohiro

(4/5/13 4:33 PM), Cody P Schafer wrote:
> No off-cpu users of the percpu pagesets exist.
> 
> zone_pcp_update()'s goal is to adjust the ->high and ->batch members of a
> percpu pageset based on a zone's ->managed_pages. We don't need to drain
> the entire percpu pageset just to modify these fields. Avoid calling
> setup_pageset() (and the draining required to call it) and instead just
> set the fields' values.
> 
> This does change the behavior of zone_pcp_update() as the percpu
> pagesets will not be drained when zone_pcp_update() is called (they will
> end up being shrunk, not completely drained, later when a 0-order page
> is freed in free_hot_cold_page()).
> 
> Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>

NAK.

1) zone_pcp_update() is only used from memory hotplug, and it requires the page drain.
2) stop_machine() is used to avoid a race; just removing it is insane.



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 3/3] mm: when handling percpu_pagelist_fraction, use on_each_cpu() to set percpu pageset fields.
  2013-04-05 20:33   ` Cody P Schafer
@ 2013-04-07 15:41     ` KOSAKI Motohiro
  -1 siblings, 0 replies; 60+ messages in thread
From: KOSAKI Motohiro @ 2013-04-07 15:41 UTC (permalink / raw)
  To: Cody P Schafer; +Cc: Andrew Morton, Mel Gorman, Linux MM, LKML, kosaki.motohiro

(4/5/13 4:33 PM), Cody P Schafer wrote:
> In free_hot_cold_page(), we rely on pcp->batch remaining stable.
> Updating it without being on the cpu owning the percpu pageset
> potentially destroys this stability.
> 
> Change for_each_cpu() to on_each_cpu() to fix.
> 
> Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>

Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 3/3] mm: when handling percpu_pagelist_fraction, use on_each_cpu() to set percpu pageset fields.
  2013-04-05 20:33   ` Cody P Schafer
@ 2013-04-08 12:20     ` Gilad Ben-Yossef
  -1 siblings, 0 replies; 60+ messages in thread
From: Gilad Ben-Yossef @ 2013-04-08 12:20 UTC (permalink / raw)
  To: Cody P Schafer; +Cc: Andrew Morton, Mel Gorman, Linux MM, LKML

On Fri, Apr 5, 2013 at 11:33 PM, Cody P Schafer <cody@linux.vnet.ibm.com> wrote:
> In free_hot_cold_page(), we rely on pcp->batch remaining stable.
> Updating it without being on the cpu owning the percpu pageset
> potentially destroys this stability.
>
> Change for_each_cpu() to on_each_cpu() to fix.

Are you referring to this? -

        if (pcp->count >= pcp->high) {
                free_pcppages_bulk(zone, pcp->batch, pcp);
                pcp->count -= pcp->batch;
        }

I'm probably missing the obvious but won't it be simpler to do this in
 free_hot_cold_page() -

        if (pcp->count >= pcp->high) {
                unsigned int batch = ACCESS_ONCE(pcp->batch);
                free_pcppages_bulk(zone, batch, pcp);
                pcp->count -= batch;
        }

Now the batch value used is stable and you don't have to IPI every CPU
in the system just to change a config knob...

Thanks,
Gilad



--
Gilad Ben-Yossef
Chief Coffee Drinker
gilad@benyossef.com
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com

"If you take a class in large-scale robotics, can you end up in a situation
where the homework eats your dog?"
 -- Jean-Baptiste Queru

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/3] mm: fixup changers of per cpu pageset's ->high and ->batch
  2013-04-07 15:23   ` KOSAKI Motohiro
@ 2013-04-08 17:16     ` Cody P Schafer
  -1 siblings, 0 replies; 60+ messages in thread
From: Cody P Schafer @ 2013-04-08 17:16 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: Andrew Morton, Mel Gorman, Linux MM, LKML

On 04/07/2013 08:23 AM, KOSAKI Motohiro wrote:
> (4/5/13 4:33 PM), Cody P Schafer wrote:
>> In one case, while modifying the ->high and ->batch fields of per cpu pagesets,
>> we're needlessly using stop_machine() (patches 1 & 2), and in another we don't have any
>> synchronization at all (patch 3).
>>
>> This patchset fixes both of them.
>>
>> Note that it results in a change to the behavior of zone_pcp_update(), which is
>> used by memory_hotplug. I _think_ that I've discerned (and preserved) the
>> essential behavior (changing ->high and ->batch), and only eliminated unneeded
>> actions (draining the per cpu pages), but this may not be the case.
> 
> At least memory hotplug needs the drain.

Could you explain why the drain is required here? From what I can tell,
after the stop_machine() completes, the per cpu page sets could be
repopulated at any point, making the combination of draining and
modifying ->batch & ->high unneeded.


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 3/3] mm: when handling percpu_pagelist_fraction, use on_each_cpu() to set percpu pageset fields.
  2013-04-08 12:20     ` Gilad Ben-Yossef
@ 2013-04-08 17:28       ` Cody P Schafer
  -1 siblings, 0 replies; 60+ messages in thread
From: Cody P Schafer @ 2013-04-08 17:28 UTC (permalink / raw)
  To: Gilad Ben-Yossef; +Cc: Andrew Morton, Mel Gorman, Linux MM, LKML

On 04/08/2013 05:20 AM, Gilad Ben-Yossef wrote:
> On Fri, Apr 5, 2013 at 11:33 PM, Cody P Schafer <cody@linux.vnet.ibm.com> wrote:
>> In free_hot_cold_page(), we rely on pcp->batch remaining stable.
>> Updating it without being on the cpu owning the percpu pageset
>> potentially destroys this stability.
>>
>> Change for_each_cpu() to on_each_cpu() to fix.
>
> Are you referring to this? -

This was the case I noticed.

>
> if (pcp->count >= pcp->high) {
>         free_pcppages_bulk(zone, pcp->batch, pcp);
>         pcp->count -= pcp->batch;
> }
>
> I'm probably missing the obvious but won't it be simpler to do this in
>   free_hot_cold_page() -
>
> if (pcp->count >= pcp->high) {
>         unsigned int batch = ACCESS_ONCE(pcp->batch);
>         free_pcppages_bulk(zone, batch, pcp);
>         pcp->count -= batch;
> }
>

Potentially, yes. Note that this was simply the one case I noticed, 
rather than certainly the only case.

I also wonder whether there could be unexpected interactions between 
->high and ->batch not changing together atomically. For example, could 
adjusting this knob cause ->batch to rise enough that it is greater than 
the previous ->high? If the code above then runs with the previous 
->high, ->count wouldn't be correct (checking this inside 
free_pcppages_bulk() might help on this one issue).

> Now the batch value used is stable and you don't have to IPI every CPU
> in the system just to change a config knob...

Is this really considered an issue? I wouldn't have expected someone to 
adjust the config knob often enough (or even more than once) to cause 
problems. Of course as a "It'd be nice" thing, I completely agree.


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 2/3] mm/page_alloc: convert zone_pcp_update() to use on_each_cpu() instead of stop_machine()
  2013-04-07 15:39     ` KOSAKI Motohiro
@ 2013-04-08 17:32       ` Cody P Schafer
  -1 siblings, 0 replies; 60+ messages in thread
From: Cody P Schafer @ 2013-04-08 17:32 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: Andrew Morton, Mel Gorman, Linux MM, LKML

On 04/07/2013 08:39 AM, KOSAKI Motohiro wrote:
> (4/5/13 4:33 PM), Cody P Schafer wrote:
>> No off-cpu users of the percpu pagesets exist.
>>
>> zone_pcp_update()'s goal is to adjust the ->high and ->batch members of a
>> percpu pageset based on a zone's ->managed_pages. We don't need to drain
>> the entire percpu pageset just to modify these fields. Avoid calling
>> setup_pageset() (and the draining required to call it) and instead just
>> set the fields' values.
>>
>> This does change the behavior of zone_pcp_update() as the percpu
>> pagesets will not be drained when zone_pcp_update() is called (they will
>> end up being shrunk, not completely drained, later when a 0-order page
>> is freed in free_hot_cold_page()).
>>
>> Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
> 
> NAK.
> 
>> 1) zone_pcp_update() is only used from memory hotplug, and it requires the page drain.

I'm looking at this code because I'm currently working on a patchset
which adds another interface which modifies zone sizes, so "only used
from memory hotplug" is a temporary thing (unless I discover that
zone_pcp_update() is not intended to do what I want it to do).

>> 2) stop_machine() is used to avoid a race; just removing it is insane.

What race? Is there a cross cpu access to ->high & ->batch that makes
using on_each_cpu() instead of stop_machine() inappropriate? It is
absolutely not just being removed.


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 3/3] mm: when handling percpu_pagelist_fraction, use on_each_cpu() to set percpu pageset fields.
  2013-04-07  1:56     ` Simon Jeons
@ 2013-04-08 17:34       ` Cody P Schafer
  -1 siblings, 0 replies; 60+ messages in thread
From: Cody P Schafer @ 2013-04-08 17:34 UTC (permalink / raw)
  To: Simon Jeons; +Cc: Andrew Morton, Mel Gorman, Linux MM, LKML

On 04/06/2013 06:56 PM, Simon Jeons wrote:
> Hi Cody,
> On 04/06/2013 04:33 AM, Cody P Schafer wrote:
>> In free_hot_cold_page(), we rely on pcp->batch remaining stable.
>> Updating it without being on the cpu owning the percpu pageset
>> potentially destroys this stability.
>
> If a cpu is offline, can its pcp pageset still be used in free_hot_cold_page()?
>

I don't expect it to be as we use this_cpu_ptr() to access the pcp 
pageset. Unless there is some crazy mode where we can override the cpu a 
task believes it is running on...


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/3] mm/page_alloc: factor out setting of pcp->high and pcp->batch.
  2013-04-07  1:37     ` Simon Jeons
@ 2013-04-08 17:39       ` Cody P Schafer
  -1 siblings, 0 replies; 60+ messages in thread
From: Cody P Schafer @ 2013-04-08 17:39 UTC (permalink / raw)
  To: Simon Jeons; +Cc: Andrew Morton, Mel Gorman, Linux MM, LKML

On 04/06/2013 06:37 PM, Simon Jeons wrote:
> Hi Cody,
> On 04/06/2013 04:33 AM, Cody P Schafer wrote:
>> Creates pageset_set_batch() for use in setup_pageset().
>> pageset_set_batch() imitates the functionality of
>> setup_pagelist_highmark(), but uses the boot time
>> (percpu_pagelist_fraction == 0) calculations for determining ->high
>
> Why do we need to adjust pcp->high and pcp->batch while the system is
> running? What's the requirement?
>

There is currently a sysctl (which I patch later in this series) that 
allows adjusting the ->high mark (and, indirectly, ->batch). 
Additionally, memory hotplug changes ->high and ->batch because the zone 
size changes (essentially, zone->managed_pages and zone->present_pages 
have changed), meaning that zone_batchsize(), which is used at boot to 
set ->batch and (indirectly) ->high, produces a different output.

Note that in addition to the 2 users of this functionality mentioned 
here, I'm currently working on another resizer of zones (runtime NUMA 
reconfiguration).


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/3] mm: fixup changers of per cpu pageset's ->high and ->batch
  2013-04-08 17:16     ` Cody P Schafer
@ 2013-04-08 19:08       ` KOSAKI Motohiro
  -1 siblings, 0 replies; 60+ messages in thread
From: KOSAKI Motohiro @ 2013-04-08 19:08 UTC (permalink / raw)
  To: Cody P Schafer; +Cc: KOSAKI Motohiro, Andrew Morton, Mel Gorman, Linux MM, LKML

(4/8/13 1:16 PM), Cody P Schafer wrote:
> On 04/07/2013 08:23 AM, KOSAKI Motohiro wrote:
>> (4/5/13 4:33 PM), Cody P Schafer wrote:
>>> In one case, while modifying the ->high and ->batch fields of per cpu pagesets,
>>> we're needlessly using stop_machine() (patches 1 & 2), and in another we don't have any
>>> synchronization at all (patch 3).
>>>
>>> This patchset fixes both of them.
>>>
>>> Note that it results in a change to the behavior of zone_pcp_update(), which is
>>> used by memory_hotplug. I _think_ that I've discerned (and preserved) the
>>> essential behavior (changing ->high and ->batch), and only eliminated unneeded
>>> actions (draining the per cpu pages), but this may not be the case.
>>
>> At least memory hotplug needs the drain.
> 
> Could you explain why the drain is required here? From what I can tell,
> after the stop_machine() completes, the per cpu page sets could be
> repopulated at any point, making the combination of draining and
> modifying ->batch & ->high unneeded.

Memory hotplug then has to attempt the drain again and again. Moreover, hotplug
prevents repopulation by using MIGRATE_ISOLATE.
Without the drain, the pcp page count never reaches 0, and that prevents memory hot-remove.


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 3/3] mm: when handling percpu_pagelist_fraction, use on_each_cpu() to set percpu pageset fields.
  2013-04-08 12:20     ` Gilad Ben-Yossef
@ 2013-04-08 19:16       ` KOSAKI Motohiro
  -1 siblings, 0 replies; 60+ messages in thread
From: KOSAKI Motohiro @ 2013-04-08 19:16 UTC (permalink / raw)
  To: Gilad Ben-Yossef
  Cc: Cody P Schafer, Andrew Morton, Mel Gorman, Linux MM, LKML,
	kosaki.motohiro

(4/8/13 8:20 AM), Gilad Ben-Yossef wrote:
> On Fri, Apr 5, 2013 at 11:33 PM, Cody P Schafer <cody@linux.vnet.ibm.com> wrote:
>> In free_hot_cold_page(), we rely on pcp->batch remaining stable.
>> Updating it without being on the cpu owning the percpu pageset
>> potentially destroys this stability.
>>
>> Change for_each_cpu() to on_each_cpu() to fix.
> 
> Are you referring to this? -
> 
> if (pcp->count >= pcp->high) {
>         free_pcppages_bulk(zone, pcp->batch, pcp);
>         pcp->count -= pcp->batch;
> }
> 
> I'm probably missing the obvious but won't it be simpler to do this in
>  free_hot_cold_page() -
> 
> if (pcp->count >= pcp->high) {
>         unsigned int batch = ACCESS_ONCE(pcp->batch);
>         free_pcppages_bulk(zone, batch, pcp);
>         pcp->count -= batch;
> }
> 
> Now the batch value used is stable and you don't have to IPI every CPU
> in the system just to change a config knob...

OK, right. Your approach is much better.



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 2/3] mm/page_alloc: convert zone_pcp_update() to use on_each_cpu() instead of stop_machine()
  2013-04-08 17:32       ` Cody P Schafer
@ 2013-04-08 19:26         ` KOSAKI Motohiro
  -1 siblings, 0 replies; 60+ messages in thread
From: KOSAKI Motohiro @ 2013-04-08 19:26 UTC (permalink / raw)
  To: Cody P Schafer; +Cc: KOSAKI Motohiro, Andrew Morton, Mel Gorman, Linux MM, LKML

(4/8/13 1:32 PM), Cody P Schafer wrote:
> On 04/07/2013 08:39 AM, KOSAKI Motohiro wrote:
>> (4/5/13 4:33 PM), Cody P Schafer wrote:
>>> No off-cpu users of the percpu pagesets exist.
>>>
>>> zone_pcp_update()'s goal is to adjust the ->high and ->batch members of a
>>> percpu pageset based on a zone's ->managed_pages. We don't need to drain
>>> the entire percpu pageset just to modify these fields. Avoid calling
>>> setup_pageset() (and the draining required to call it) and instead just
>>> set the fields' values.
>>>
>>> This does change the behavior of zone_pcp_update() as the percpu
>>> pagesets will not be drained when zone_pcp_update() is called (they will
>>> end up being shrunk, not completely drained, later when a 0-order page
>>> is freed in free_hot_cold_page()).
>>>
>>> Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
>>
>> NAK.
>>
>> 1) zone_pcp_update() is only used from memory hotplug, and it requires the page drain.
> 
> I'm looking at this code because I'm currently working on a patchset
> which adds another interface which modifies zone sizes, so "only used
> from memory hotplug" is a temporary thing (unless I discover that
> zone_pcp_update() is not intended to do what I want it to do).

Maybe yes, maybe no; I don't know whether it is temporary or not. However,
the fact is that you must not break anything. You always need to look at all
the callers.


>> 2) stop_machine() is used for avoiding a race. Just removing it is insane.
> 
> What race? Is there a cross cpu access to ->high & ->batch that makes
> using on_each_cpu() instead of stop_machine() inappropriate? It is
> absolutely not just being removed.

OK, I missed that. However, your code is still wrong: you can't call
free_pcppages_bulk() from interrupt context, and therefore you can't use
on_each_cpu() anyway.


^ permalink raw reply	[flat|nested] 60+ messages in thread


* Re: [PATCH 0/3] mm: fixup changers of per cpu pageset's ->high and ->batch
  2013-04-07  1:32   ` Simon Jeons
@ 2013-04-08 19:37     ` Cody P Schafer
  -1 siblings, 0 replies; 60+ messages in thread
From: Cody P Schafer @ 2013-04-08 19:37 UTC (permalink / raw)
  To: Simon Jeons; +Cc: Andrew Morton, Mel Gorman, Linux MM, LKML

On 04/06/2013 06:32 PM, Simon Jeons wrote:
> Hi Cody,
> On 04/06/2013 04:33 AM, Cody P Schafer wrote:
>> In one case while modifying the ->high and ->batch fields of per cpu
>> pagesets
>> we're needlessly using stop_machine() (patches 1 & 2), and in another
>> we don't have any
>> synchronization at all (patch 3).
>
> Do you mean stop_machine() is used for synchronization between each
> online cpu?
>

I mean that it looks like synchronization between cpus is unneeded 
because of how per cpu pagesets are used, so stop_machine() (which does 
provide synchronization between all cpus) is unnecessarily "strong".


^ permalink raw reply	[flat|nested] 60+ messages in thread


* Re: [PATCH 2/3] mm/page_alloc: convert zone_pcp_update() to use on_each_cpu() instead of stop_machine()
  2013-04-08 19:26         ` KOSAKI Motohiro
@ 2013-04-08 19:49           ` Cody P Schafer
  -1 siblings, 0 replies; 60+ messages in thread
From: Cody P Schafer @ 2013-04-08 19:49 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: Andrew Morton, Mel Gorman, Linux MM, LKML

On 04/08/2013 12:26 PM, KOSAKI Motohiro wrote:
> (4/8/13 1:32 PM), Cody P Schafer wrote:
>> On 04/07/2013 08:39 AM, KOSAKI Motohiro wrote:
>>> (4/5/13 4:33 PM), Cody P Schafer wrote:
>>>> No off-cpu users of the percpu pagesets exist.
>>>>
>>>> zone_pcp_update()'s goal is to adjust the ->high and ->mark members of a
>>>> percpu pageset based on a zone's ->managed_pages. We don't need to drain
>>>> the entire percpu pageset just to modify these fields. Avoid calling
>>>> setup_pageset() (and the draining required to call it) and instead just
>>>> set the fields' values.
>>>>
>>>> This does change the behavior of zone_pcp_update() as the percpu
>>>> pagesets will not be drained when zone_pcp_update() is called (they will
>>>> end up being shrunk, not completely drained, later when a 0-order page
>>>> is freed in free_hot_cold_page()).
>>>>
>>>> Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
>>>
>>> NAK.
>>>
>>> 1) zone_pcp_update() is only used from memory hotplug and it requires a page drain.
>>
>> I'm looking at this code because I'm currently working on a patchset
>> which adds another interface which modifies zone sizes, so "only used
>> from memory hotplug" is a temporary thing (unless I discover that
>> zone_pcp_update() is not intended to do what I want it to do).
>
> Maybe yes, maybe no; I don't know whether it is temporary or not. However,
> the fact is that you must not break anything. You always need to look at all
> the callers.

Right, which is why I want to understand memory hotplug's actual 
requirements.

>>> 2) stop_machine() is used for avoiding a race. Just removing it is insane.
>>
>> What race? Is there a cross cpu access to ->high & ->batch that makes
>> using on_each_cpu() instead of stop_machine() inappropriate? It is
>> absolutely not just being removed.
>
> OK, I missed that. However, your code is still wrong: you can't call
> free_pcppages_bulk() from interrupt context, and therefore you can't use
> on_each_cpu() anyway.

Given the drain_pages() implementation, I find that hard to believe (it uses 
on_each_cpu_mask() and eventually calls free_pcppages_bulk()).

Can you provide a reference backing up your statement?

If this turns out to be an issue, schedule_on_each_cpu() could be an 
alternative.


^ permalink raw reply	[flat|nested] 60+ messages in thread


* Re: [PATCH 3/3] mm: when handling percpu_pagelist_fraction, use on_each_cpu() to set percpu pageset fields.
  2013-04-08 17:28       ` Cody P Schafer
@ 2013-04-08 19:50         ` Cody P Schafer
  -1 siblings, 0 replies; 60+ messages in thread
From: Cody P Schafer @ 2013-04-08 19:50 UTC (permalink / raw)
  To: Gilad Ben-Yossef; +Cc: Andrew Morton, Mel Gorman, Linux MM, LKML

On 04/08/2013 10:28 AM, Cody P Schafer wrote:
> On 04/08/2013 05:20 AM, Gilad Ben-Yossef wrote:
>> On Fri, Apr 5, 2013 at 11:33 PM, Cody P Schafer
>> <cody@linux.vnet.ibm.com> wrote:
>>> In free_hot_cold_page(), we rely on pcp->batch remaining stable.
>>> Updating it without being on the cpu owning the percpu pageset
>>> potentially destroys this stability.
>>>
>>> Change for_each_cpu() to on_each_cpu() to fix.
>>
>> Are you referring to this? -
>
> This was the case I noticed.
>
>>
>> 1329         if (pcp->count >= pcp->high) {
>> 1330                 free_pcppages_bulk(zone, pcp->batch, pcp);
>> 1331                 pcp->count -= pcp->batch;
>> 1332         }
>>
>> I'm probably missing the obvious but won't it be simpler to do this in
>>   free_hot_cold_page() -
>>
>> 1329         if (pcp->count >= pcp->high) {
>> 1330                  unsigned int batch = ACCESS_ONCE(pcp->batch);
>> 1331                 free_pcppages_bulk(zone, batch, pcp);
>> 1332                 pcp->count -= batch;
>> 1333         }
>>
>
> Potentially, yes. Note that this was simply the one case I noticed,
> rather than certainly the only case.
>
> I also wonder whether there could be unexpected interactions between
> ->high and ->batch not changing together atomically. For example, could
> adjusting this knob cause ->batch to rise enough that it is greater than
> the previous ->high? If the code above then runs with the previous
> ->high, ->count wouldn't be correct (checking this inside
> free_pcppages_bulk() might help on this one issue).
>
>> Now the batch value used is stable and you don't have to IPI every CPU
>> in the system just to change a config knob...
>
> Is this really considered an issue? I wouldn't have expected someone to
> adjust the config knob often enough (or even more than once) to cause
> problems. Of course as a "It'd be nice" thing, I completely agree.

Would using schedule_on_each_cpu() instead of on_each_cpu() be an 
improvement, in your opinion?
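The ->batch-greater-than->high concern raised above could, as suggested, be handled by clamping inside the bulk-free path. A minimal user-space sketch, where `bulk_free_clamped()` is a hypothetical stand-in for a check inside free_pcppages_bulk():

```c
#include <assert.h>

/* Simplified model of struct per_cpu_pages from mm/page_alloc.c. */
struct per_cpu_pages {
	int count;	/* pages currently on the list */
	int high;	/* flush threshold */
	int batch;	/* chunk size used when flushing */
};

/* Hypothetical clamp inside a free_pcppages_bulk()-style helper: if a
 * config change raised ->batch past what is actually queued, free only
 * the pages that exist so ->count cannot go negative. */
static int bulk_free_clamped(struct per_cpu_pages *pcp, int batch)
{
	if (batch > pcp->count)
		batch = pcp->count;
	pcp->count -= batch;
	return batch;	/* number of pages actually freed */
}
```

The caller then subtracts the returned value rather than the raw `pcp->batch`, so `->count` stays correct even when `->high` and `->batch` change non-atomically.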


^ permalink raw reply	[flat|nested] 60+ messages in thread


* Re: [PATCH 0/3] mm: fixup changers of per cpu pageset's ->high and ->batch
  2013-04-08 19:08       ` KOSAKI Motohiro
@ 2013-04-08 21:17         ` Cody P Schafer
  -1 siblings, 0 replies; 60+ messages in thread
From: Cody P Schafer @ 2013-04-08 21:17 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: Andrew Morton, Mel Gorman, Linux MM, LKML

On 04/08/2013 12:08 PM, KOSAKI Motohiro wrote:
> (4/8/13 1:16 PM), Cody P Schafer wrote:
>> On 04/07/2013 08:23 AM, KOSAKI Motohiro wrote:
>>> (4/5/13 4:33 PM), Cody P Schafer wrote:
>>>> In one case while modifying the ->high and ->batch fields of per cpu pagesets
>>>> we're needlessly using stop_machine() (patches 1 & 2), and in another we don't have any
>>>> synchronization at all (patch 3).
>>>>
>>>> This patchset fixes both of them.
>>>>
>>>> Note that it results in a change to the behavior of zone_pcp_update(), which is
>>>> used by memory_hotplug. I _think_ that I've discerned (and preserved) the
>>>> essential behavior (changing ->high and ->batch), and only eliminated unneeded
>>>> actions (draining the per cpu pages), but this may not be the case.
>>>
>>> at least, memory hotplug need to drain.
>>
>> Could you explain why the drain is required here? From what I can tell,
>> after the stop_machine() completes, the per cpu page sets could be
>> repopulated at any point, making the combination of draining and
>> modifying ->batch & ->high unneeded.
> 
> Then memory hotplug tries to drain again and again. Moreover, hotplug prevents
> repopulation by using MIGRATE_ISOLATE.
> The pcp page count never reaches 0, and that prevents memory hot-remove.

zone_pcp_update() is not part of the drain retry loop; in the offline
pages case, it is only called following success in "removal" (online
pages also calls zone_pcp_update(), and I don't see how it could benefit in
any way from draining the per cpu pagesets).

So I still don't see where the need for draining pages combined with
modifying ->high and ->batch came from.


^ permalink raw reply	[flat|nested] 60+ messages in thread


* Re: [PATCH 2/3] mm/page_alloc: convert zone_pcp_update() to use on_each_cpu() instead of stop_machine()
  2013-04-08 19:49           ` Cody P Schafer
@ 2013-04-08 22:18             ` KOSAKI Motohiro
  -1 siblings, 0 replies; 60+ messages in thread
From: KOSAKI Motohiro @ 2013-04-08 22:18 UTC (permalink / raw)
  To: Cody P Schafer; +Cc: KOSAKI Motohiro, Andrew Morton, Mel Gorman, Linux MM, LKML

(4/8/13 3:49 PM), Cody P Schafer wrote:
> On 04/08/2013 12:26 PM, KOSAKI Motohiro wrote:
>> (4/8/13 1:32 PM), Cody P Schafer wrote:
>>> On 04/07/2013 08:39 AM, KOSAKI Motohiro wrote:
>>>> (4/5/13 4:33 PM), Cody P Schafer wrote:
>>>>> No off-cpu users of the percpu pagesets exist.
>>>>>
>>>>> zone_pcp_update()'s goal is to adjust the ->high and ->mark members of a
>>>>> percpu pageset based on a zone's ->managed_pages. We don't need to drain
>>>>> the entire percpu pageset just to modify these fields. Avoid calling
>>>>> setup_pageset() (and the draining required to call it) and instead just
>>>>> set the fields' values.
>>>>>
>>>>> This does change the behavior of zone_pcp_update() as the percpu
>>>>> pagesets will not be drained when zone_pcp_update() is called (they will
>>>>> end up being shrunk, not completely drained, later when a 0-order page
>>>>> is freed in free_hot_cold_page()).
>>>>>
>>>>> Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
>>>>
>>>> NAK.
>>>>
>>>> 1) zone_pcp_update() is only used from memory hotplug and it requires a page drain.
>>>
>>> I'm looking at this code because I'm currently working on a patchset
>>> which adds another interface which modifies zone sizes, so "only used
>>> from memory hotplug" is a temporary thing (unless I discover that
>>> zone_pcp_update() is not intended to do what I want it to do).
>>
>> Maybe yes, maybe no; I don't know whether it is temporary or not. However,
>> the fact is that you must not break anything. You always need to look at all
>> the callers.
> 
> Right, which is why I want to understand memory hotplug's actual 
> requirements.
> 
>>>> 2) stop_machine() is used for avoiding a race. Just removing it is insane.
>>>
>>> What race? Is there a cross cpu access to ->high & ->batch that makes
>>> using on_each_cpu() instead of stop_machine() inappropriate? It is
>>> absolutely not just being removed.
>>
>> OK, I missed that. However, your code is still wrong: you can't call
>> free_pcppages_bulk() from interrupt context, and therefore you can't use
>> on_each_cpu() anyway.
> 
> Given drain_pages() implementation, I find that hard to believe (It uses 
> on_each_cpu_mask() and eventually calls free_pcppages_bulk()).
> 
> Can you provide a reference backing up your statement?

Grr, I missed it again. OK, you are right; go ahead.



> If this turns out to be an issue, schedule_on_each_cpu() could be an 
> alternative.

No way. schedule_on_each_cpu() is more problematic, and it should be removed
in the future.
schedule_on_each_cpu() can only be used when the caller task doesn't hold any
locks; otherwise it may deadlock.
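The deadlock class being described can be modelled in user space: a task holds a lock while waiting for deferred work to complete, and the work item needs that same lock. A sketch using pthreads (`work_fn()` and the trylock probe are illustrative; the kernel case involves workqueue flushing, not raw threads):

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical work item that needs the lock its submitter holds.
 * trylock is used only so the demo can report the would-be deadlock
 * instead of actually hanging. */
static void *work_fn(void *result)
{
	if (pthread_mutex_trylock(&lock) != 0) {
		*(int *)result = 0;	/* lock held by submitter: deadlock */
		return NULL;
	}
	pthread_mutex_unlock(&lock);
	*(int *)result = 1;		/* lock was free: no deadlock */
	return NULL;
}
```

If the submitter takes `lock` before queueing `work_fn()` and then waits for it, a blocking lock acquisition inside the work item could never succeed, which is the hazard with calling schedule_on_each_cpu() while holding locks.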

^ permalink raw reply	[flat|nested] 60+ messages in thread


* Re: [PATCH 3/3] mm: when handling percpu_pagelist_fraction, use on_each_cpu() to set percpu pageset fields.
  2013-04-08 19:50         ` Cody P Schafer
@ 2013-04-08 22:23           ` KOSAKI Motohiro
  -1 siblings, 0 replies; 60+ messages in thread
From: KOSAKI Motohiro @ 2013-04-08 22:23 UTC (permalink / raw)
  To: Cody P Schafer
  Cc: Gilad Ben-Yossef, Andrew Morton, Mel Gorman, Linux MM, LKML,
	kosaki.motohiro

(4/8/13 3:50 PM), Cody P Schafer wrote:
> On 04/08/2013 10:28 AM, Cody P Schafer wrote:
>> On 04/08/2013 05:20 AM, Gilad Ben-Yossef wrote:
>>> On Fri, Apr 5, 2013 at 11:33 PM, Cody P Schafer
>>> <cody@linux.vnet.ibm.com> wrote:
>>>> In free_hot_cold_page(), we rely on pcp->batch remaining stable.
>>>> Updating it without being on the cpu owning the percpu pageset
>>>> potentially destroys this stability.
>>>>
>>>> Change for_each_cpu() to on_each_cpu() to fix.
>>>
>>> Are you referring to this? -
>>
>> This was the case I noticed.
>>
>>>
>>> 1329         if (pcp->count >= pcp->high) {
>>> 1330                 free_pcppages_bulk(zone, pcp->batch, pcp);
>>> 1331                 pcp->count -= pcp->batch;
>>> 1332         }
>>>
>>> I'm probably missing the obvious but won't it be simpler to do this in
>>>   free_hot_cold_page() -
>>>
>>> 1329         if (pcp->count >= pcp->high) {
>>> 1330                  unsigned int batch = ACCESS_ONCE(pcp->batch);
>>> 1331                 free_pcppages_bulk(zone, batch, pcp);
>>> 1332                 pcp->count -= batch;
>>> 1333         }
>>>
>>
>> Potentially, yes. Note that this was simply the one case I noticed,
>> rather than certainly the only case.
>>
>> I also wonder whether there could be unexpected interactions between
>> ->high and ->batch not changing together atomically. For example, could
>> adjusting this knob cause ->batch to rise enough that it is greater than
>> the previous ->high? If the code above then runs with the previous
>> ->high, ->count wouldn't be correct (checking this inside
>> free_pcppages_bulk() might help on this one issue).
>>
>>> Now the batch value used is stable and you don't have to IPI every CPU
>>> in the system just to change a config knob...
>>
>> Is this really considered an issue? I wouldn't have expected someone to
>> adjust the config knob often enough (or even more than once) to cause
>> problems. Of course as a "It'd be nice" thing, I completely agree.
> 
> Would using schedule_on_each_cpu() instead of on_each_cpu() be an 
> improvement, in your opinion?

No. As long as a lightweight solution works, we should never introduce
heavyweight code. on_each_cpu() is really heavyweight, especially when the
number of cpus is more than a thousand.



^ permalink raw reply	[flat|nested] 60+ messages in thread


* Re: [PATCH 2/3] mm/page_alloc: convert zone_pcp_update() to use on_each_cpu() instead of stop_machine()
  2013-04-08 22:18             ` KOSAKI Motohiro
@ 2013-04-09  1:52               ` Cody P Schafer
  -1 siblings, 0 replies; 60+ messages in thread
From: Cody P Schafer @ 2013-04-09  1:52 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: Andrew Morton, Mel Gorman, Linux MM, LKML

On 04/08/2013 03:18 PM, KOSAKI Motohiro wrote:
> (4/8/13 3:49 PM), Cody P Schafer wrote:
>> If this turns out to be an issue, schedule_on_each_cpu() could be an
>> alternative.
>
> no way. schedule_on_each_cpu() is more problematic and it should be removed
> in the future.
> schedule_on_each_cpu() can only be used when caller task don't have any lock.
> otherwise it may make deadlock.

I wasn't aware of that. Just to be clear, the deadlock you're referring 
to isn't the same one referred to in

commit b71ab8c2025caef8db719aa41af0ed735dc543cd
Author: Tejun Heo <tj@kernel.org>
Date:   Tue Jun 29 10:07:14 2010 +0200
workqueue: increase max_active of keventd and kill current_is_keventd()

and

commit 65a64464349883891e21e74af16c05d6e1eeb4e9
Author: Andi Kleen <ak@linux.intel.com>
Date:   Wed Oct 14 06:22:47 2009 +0200
HWPOISON: Allow schedule_on_each_cpu() from keventd

If you're referencing some other deadlock, could you please provide a 
link to the relevant discussion? (I'd really like to add a note to 
schedule_on_each_cpu()'s doc comment about it so others can avoid that 
pitfall).


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/3] mm/page_alloc: factor out setting of pcp->high and pcp->batch.
  2013-04-08 17:39       ` Cody P Schafer
@ 2013-04-09  5:42         ` Simon Jeons
  -1 siblings, 0 replies; 60+ messages in thread
From: Simon Jeons @ 2013-04-09  5:42 UTC (permalink / raw)
  To: Cody P Schafer; +Cc: Andrew Morton, Mel Gorman, Linux MM, LKML

Hi Cody,
On 04/09/2013 01:39 AM, Cody P Schafer wrote:
> On 04/06/2013 06:37 PM, Simon Jeons wrote:
>> Hi Cody,
>> On 04/06/2013 04:33 AM, Cody P Schafer wrote:
>>> Creates pageset_set_batch() for use in setup_pageset().
>>> pageset_set_batch() imitates the functionality of
>>> setup_pagelist_highmark(), but uses the boot time
>>> (percpu_pagelist_fraction == 0) calculations for determining ->high
>>
>> Why need adjust pcp->high, pcp->batch during system running? What's the
>> requirement?
>>
>
> There is currently a sysctl (which I patch later in this series) which 
> allows adjusting the ->high mark (and, indirectly, ->batch). 
> Additionally, memory hotplug changes ->high and ->batch due to the 
> zone size changing (essentially, zone->managed_pages and 
> zone->present_pages have changed) , meaning that zone_batchsize(), 
> which is used at boot to set ->batch and (indirectly) ->high has a 
> different output.

Thanks for your explanation. I'm curious about this sysctl: when does 
->high/->batch need adjusting while the system is running, other than 
for memory hotplug, which changes the zone size?

>
> Note that in addition to the 2 users of this functionality mentioned 
> here, I'm currently working on another resizer of zones (runtime NUMA 
> reconfiguration).
>


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 3/3] mm: when handling percpu_pagelist_fraction, use on_each_cpu() to set percpu pageset fields.
  2013-04-08 17:28       ` Cody P Schafer
@ 2013-04-09  6:03         ` Gilad Ben-Yossef
  -1 siblings, 0 replies; 60+ messages in thread
From: Gilad Ben-Yossef @ 2013-04-09  6:03 UTC (permalink / raw)
  To: Cody P Schafer; +Cc: Andrew Morton, Mel Gorman, Linux MM, LKML

On Mon, Apr 8, 2013 at 8:28 PM, Cody P Schafer <cody@linux.vnet.ibm.com> wrote:
> On 04/08/2013 05:20 AM, Gilad Ben-Yossef wrote:
>>
>> On Fri, Apr 5, 2013 at 11:33 PM, Cody P Schafer <cody@linux.vnet.ibm.com>
>> wrote:
>>>
>>> In free_hot_cold_page(), we rely on pcp->batch remaining stable.
>>> Updating it without being on the cpu owning the percpu pageset
>>> potentially destroys this stability.
>>>
>>> Change for_each_cpu() to on_each_cpu() to fix.
>>
>>
>> Are you referring to this? -
>
>
> This was the case I noticed.
>
>
>>
>> 1329         if (pcp->count >= pcp->high) {
>> 1330                 free_pcppages_bulk(zone, pcp->batch, pcp);
>> 1331                 pcp->count -= pcp->batch;
>> 1332         }
>>
>> I'm probably missing the obvious but won't it be simpler to do this in
>>   free_hot_cold_page() -
>>
>> 1329         if (pcp->count >= pcp->high) {
>> 1330                  unsigned int batch = ACCESS_ONCE(pcp->batch);
>> 1331                 free_pcppages_bulk(zone, batch, pcp);
>> 1332                 pcp->count -= batch;
>> 1333         }
>>
>
> Potentially, yes. Note that this was simply the one case I noticed, rather
> than certainly the only case.

OK, so perhaps the right thing to do is to understand what (some of) the
other cases are, so that we may choose the right solution.

> I also wonder whether there could be unexpected interactions between ->high
> and ->batch not changing together atomically. For example, could adjusting
> this knob cause ->batch to rise enough that it is greater than the previous
> ->high? If the code above then runs with the previous ->high, ->count
> wouldn't be correct (checking this inside free_pcppages_bulk() might help on
> this one issue).

You are right, but that can be treated in  setup_pagelist_highmark()  e.g.:

3993 static void setup_pagelist_highmark(struct per_cpu_pageset *p,
3994                                 unsigned long high)
3995 {
3996         struct per_cpu_pages *pcp;
                unsigned int batch;
3997
3998         pcp = &p->pcp;
                /* We're about to mess with PCP in a non-atomic fashion.
                   Put an intermediate safe value of batch and make sure it
                   is visible before any other change */
                pcp->batch = 1UL;
                smp_mb();

3999         pcp->high = high;

4000         batch = max(1UL, high/4);
4001         if ((high/4) > (PAGE_SHIFT * 8))
4002                 batch = PAGE_SHIFT * 8;

               pcp->batch = batch;
4003 }

Or we could use RCU here, but that might be overkill.

>
>
>> Now the batch value used is stable and you don't have to IPI every CPU
>> in the system just to change a config knob...
>
>
> Is this really considered an issue? I wouldn't have expected someone to
> adjust the config knob often enough (or even more than once) to cause
> problems. Of course as a "It'd be nice" thing, I completely agree.

Well, interfering unconditionally with other CPUs, either via IPIs or by
scheduling work on them, is a major headache for people who run machines
with 4k CPUs, especially the HPC or RT users and the combinations of those
found among finance and networking users.

If this were the only little knob or trigger that does this, then maybe
it wouldn't be so bad, but the problem is that there is a whole list of
these little knobs and items that potentially cause cross-machine
interference, and the poor sysadmin has to keep them all in his or her
head: "Now, is it ok to pull this knob now, or will it cause an IPI s**t storm?"

We can never get rid of them all, but I'd really prefer to keep them
down to a minimum if at all possible. Here, it looks to me that it is
possible and that the price is not great - that is, the resulting code
is not too hairy or unmaintainable. At least, that is how it looks to me.

Thanks,
Gilad

-- 
Gilad Ben-Yossef
Chief Coffee Drinker
gilad@benyossef.com
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com

"If you take a class in large-scale robotics, can you end up in a
situation where the homework eats your dog?"
 -- Jean-Baptiste Queru

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 3/3] mm: when handling percpu_pagelist_fraction, use on_each_cpu() to set percpu pageset fields.
  2013-04-09  6:03         ` Gilad Ben-Yossef
@ 2013-04-09  6:06           ` Gilad Ben-Yossef
  -1 siblings, 0 replies; 60+ messages in thread
From: Gilad Ben-Yossef @ 2013-04-09  6:06 UTC (permalink / raw)
  To: Cody P Schafer; +Cc: Andrew Morton, Mel Gorman, Linux MM, LKML

On Tue, Apr 9, 2013 at 9:03 AM, Gilad Ben-Yossef <gilad@benyossef.com> wrote:

>
>> I also wonder whether there could be unexpected interactions between ->high
>> and ->batch not changing together atomically. For example, could adjusting
>> this knob cause ->batch to rise enough that it is greater than the previous
>> ->high? If the code above then runs with the previous ->high, ->count
>> wouldn't be correct (checking this inside free_pcppages_bulk() might help on
>> this one issue).
>
> You are right, but that can be treated in  setup_pagelist_highmark()  e.g.:
>
> 3993 static void setup_pagelist_highmark(struct per_cpu_pageset *p,
> 3994                                 unsigned long high)
> 3995 {
> 3996         struct per_cpu_pages *pcp;
>                 unsigned int batch;
> 3997
> 3998         pcp = &p->pcp;
>                 /* We're about to mess with PCP in an non atomic fashion.
>                    Put an intermediate safe value of batch and make sure it
>                    is visible before any other change */
>                 pcp->batch = 1UL;
>                 smb_mb();
>
> 3999         pcp->high = high;

and I think I missed another needed barrier here:
                  smp_mb();

>
> 4000         batch = max(1UL, high/4);
> 4001         if ((high/4) > (PAGE_SHIFT * 8))
> 4002                 batch = PAGE_SHIFT * 8;
>
>                pcp->batch = batch;
> 4003 }
>

-- 
Gilad Ben-Yossef
Chief Coffee Drinker
gilad@benyossef.com
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com

"If you take a class in large-scale robotics, can you end up in a
situation where the homework eats your dog?"
 -- Jean-Baptiste Queru

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 3/3] mm: when handling percpu_pagelist_fraction, use on_each_cpu() to set percpu pageset fields.
  2013-04-09  6:06           ` Gilad Ben-Yossef
@ 2013-04-09 19:27             ` Cody P Schafer
  -1 siblings, 0 replies; 60+ messages in thread
From: Cody P Schafer @ 2013-04-09 19:27 UTC (permalink / raw)
  To: Gilad Ben-Yossef; +Cc: Andrew Morton, Mel Gorman, Linux MM, LKML

On 04/08/2013 11:06 PM, Gilad Ben-Yossef wrote:
> On Tue, Apr 9, 2013 at 9:03 AM, Gilad Ben-Yossef <gilad@benyossef.com> wrote:
>
>>
>>> I also wonder whether there could be unexpected interactions between ->high
>>> and ->batch not changing together atomically. For example, could adjusting
>>> this knob cause ->batch to rise enough that it is greater than the previous
>>> ->high? If the code above then runs with the previous ->high, ->count
>>> wouldn't be correct (checking this inside free_pcppages_bulk() might help on
>>> this one issue).
>>
>> You are right, but that can be treated in  setup_pagelist_highmark()  e.g.:
>>
>> 3993 static void setup_pagelist_highmark(struct per_cpu_pageset *p,
>> 3994                                 unsigned long high)
>> 3995 {
>> 3996         struct per_cpu_pages *pcp;
>>                  unsigned int batch;
>> 3997
>> 3998         pcp = &p->pcp;
>>                  /* We're about to mess with PCP in a non-atomic fashion.
>>                     Put an intermediate safe value of batch and make sure it
>>                     is visible before any other change */
>>                  pcp->batch = 1UL;
>>                  smp_mb();
>>
>> 3999         pcp->high = high;
>
> and i think I missed another needed barrier here:
>                    smp_mb();
>
>>
>> 4000         batch = max(1UL, high/4);
>> 4001         if ((high/4) > (PAGE_SHIFT * 8))
>> 4002                 batch = PAGE_SHIFT * 8;
>>
>>                 pcp->batch = batch;
>> 4003 }
>>
>

Yep, that appears to work, provided no additional users of ->batch and 
->high show up. It seems we'll also need some locking to prevent 
concurrent updaters, but that is relatively lightweight.

I'll roll up a new patchset that uses this methodology.


^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2013-04-09 19:27 UTC | newest]

Thread overview: 60+ messages
2013-04-05 20:33 [PATCH 0/3] mm: fixup changers of per cpu pageset's ->high and ->batch Cody P Schafer
2013-04-05 20:33 ` Cody P Schafer
2013-04-05 20:33 ` [PATCH 1/3] mm/page_alloc: factor out setting of pcp->high and pcp->batch Cody P Schafer
2013-04-05 20:33   ` Cody P Schafer
2013-04-07  1:37   ` Simon Jeons
2013-04-07  1:37     ` Simon Jeons
2013-04-08 17:39     ` Cody P Schafer
2013-04-08 17:39       ` Cody P Schafer
2013-04-09  5:42       ` Simon Jeons
2013-04-09  5:42         ` Simon Jeons
2013-04-05 20:33 ` [PATCH 2/3] mm/page_alloc: convert zone_pcp_update() to use on_each_cpu() instead of stop_machine() Cody P Schafer
2013-04-05 20:33   ` Cody P Schafer
2013-04-07 15:39   ` KOSAKI Motohiro
2013-04-07 15:39     ` KOSAKI Motohiro
2013-04-08 17:32     ` Cody P Schafer
2013-04-08 17:32       ` Cody P Schafer
2013-04-08 19:26       ` KOSAKI Motohiro
2013-04-08 19:26         ` KOSAKI Motohiro
2013-04-08 19:49         ` Cody P Schafer
2013-04-08 19:49           ` Cody P Schafer
2013-04-08 22:18           ` KOSAKI Motohiro
2013-04-08 22:18             ` KOSAKI Motohiro
2013-04-09  1:52             ` Cody P Schafer
2013-04-09  1:52               ` Cody P Schafer
2013-04-05 20:33 ` [PATCH 3/3] mm: when handling percpu_pagelist_fraction, use on_each_cpu() to set percpu pageset fields Cody P Schafer
2013-04-05 20:33   ` Cody P Schafer
2013-04-07  1:56   ` Simon Jeons
2013-04-07  1:56     ` Simon Jeons
2013-04-08 17:34     ` Cody P Schafer
2013-04-08 17:34       ` Cody P Schafer
2013-04-07 15:41   ` KOSAKI Motohiro
2013-04-07 15:41     ` KOSAKI Motohiro
2013-04-08 12:20   ` Gilad Ben-Yossef
2013-04-08 12:20     ` Gilad Ben-Yossef
2013-04-08 17:28     ` Cody P Schafer
2013-04-08 17:28       ` Cody P Schafer
2013-04-08 19:50       ` Cody P Schafer
2013-04-08 19:50         ` Cody P Schafer
2013-04-08 22:23         ` KOSAKI Motohiro
2013-04-08 22:23           ` KOSAKI Motohiro
2013-04-09  6:03       ` Gilad Ben-Yossef
2013-04-09  6:03         ` Gilad Ben-Yossef
2013-04-09  6:06         ` Gilad Ben-Yossef
2013-04-09  6:06           ` Gilad Ben-Yossef
2013-04-09 19:27           ` Cody P Schafer
2013-04-09 19:27             ` Cody P Schafer
2013-04-08 19:16     ` KOSAKI Motohiro
2013-04-08 19:16       ` KOSAKI Motohiro
2013-04-07  1:32 ` [PATCH 0/3] mm: fixup changers of per cpu pageset's ->high and ->batch Simon Jeons
2013-04-07  1:32   ` Simon Jeons
2013-04-08 19:37   ` Cody P Schafer
2013-04-08 19:37     ` Cody P Schafer
2013-04-07 15:23 ` KOSAKI Motohiro
2013-04-07 15:23   ` KOSAKI Motohiro
2013-04-08 17:16   ` Cody P Schafer
2013-04-08 17:16     ` Cody P Schafer
2013-04-08 19:08     ` KOSAKI Motohiro
2013-04-08 19:08       ` KOSAKI Motohiro
2013-04-08 21:17       ` Cody P Schafer
2013-04-08 21:17         ` Cody P Schafer
