All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 0/8] mm: sched: numa: several fixups
@ 2013-12-11  0:49 ` Wanpeng Li
  0 siblings, 0 replies; 34+ messages in thread
From: Wanpeng Li @ 2013-12-11  0:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, Rik van Riel, Mel Gorman, Peter Zijlstra,
	Naoya Horiguchi, linux-kernel, linux-mm, Wanpeng Li

Hi Andrew,

I rebased this patchset against the latest mmotm tree, since Mel's [PATCH 00/17]
"NUMA balancing segmentation fault fixes and misc followups v4" has been merged.
Several patches were dropped from v5: one is already merged in the tip tree and
three others conflict with Mel's series. I have picked up everybody's Acked-by
and Reviewed-by tags in v5; hopefully the patches can be merged soon. ;-)

Wanpeng Li (8):
  sched/numa: fix set cpupid on page migration twice against thp
  sched/numa: drop sysctl_numa_balancing_settle_count sysctl
  sched/numa: use wrapper function task_node to get node which task is on
  sched/numa: fix set cpupid on page migration twice against normal page
  sched/numa: use wrapper function task_faults_idx to calculate index in group_faults
  sched/numa: fix period_slot recalculation
  sched/numa: fix record hinting faults check
  sched/numa: drop unnecessary variable in task_weight

 include/linux/sched/sysctl.h |    1 -
 kernel/sched/debug.c         |    2 +-
 kernel/sched/fair.c          |   30 +++++++-----------------------
 kernel/sysctl.c              |    7 -------
 mm/migrate.c                 |    4 ----
 5 files changed, 8 insertions(+), 36 deletions(-)

-- 
1.7.7.6


^ permalink raw reply	[flat|nested] 34+ messages in thread


* [PATCH v5 1/8] sched/numa: fix set cpupid on page migration twice against thp
  2013-12-11  0:49 ` Wanpeng Li
@ 2013-12-11  0:49   ` Wanpeng Li
  -1 siblings, 0 replies; 34+ messages in thread
From: Wanpeng Li @ 2013-12-11  0:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, Rik van Riel, Mel Gorman, Peter Zijlstra,
	Naoya Horiguchi, linux-kernel, linux-mm, Wanpeng Li

Commit 7851a45cd3 ("mm: numa: Copy cpupid on page migration") already copies
the cpupid over at page migration time, so there is no need to set it again in
migrate_misplaced_transhuge_page(). This patch removes the redundant call.

Acked-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
---
 mm/migrate.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 8dc277d..b13e181 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1758,8 +1758,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	if (!new_page)
 		goto out_fail;
 
-	page_cpupid_xchg_last(new_page, page_cpupid_last(page));
-
 	isolated = numamigrate_isolate_page(pgdat, page);
 	if (!isolated) {
 		put_page(new_page);
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 34+ messages in thread


* [PATCH v5 2/8] sched/numa: drop sysctl_numa_balancing_settle_count sysctl
  2013-12-11  0:49 ` Wanpeng Li
@ 2013-12-11  0:49   ` Wanpeng Li
  -1 siblings, 0 replies; 34+ messages in thread
From: Wanpeng Li @ 2013-12-11  0:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, Rik van Riel, Mel Gorman, Peter Zijlstra,
	Naoya Horiguchi, linux-kernel, linux-mm, Wanpeng Li

Commit 887c290e ("sched/numa: Decide whether to favour task or group weights
based on swap candidate relationships") dropped the check against
sysctl_numa_balancing_settle_count; this patch removes the now-unused sysctl.

Acked-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
---
 include/linux/sched/sysctl.h |    1 -
 kernel/sched/fair.c          |    9 ---------
 kernel/sysctl.c              |    7 -------
 3 files changed, 0 insertions(+), 17 deletions(-)

diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
index 41467f8..31e0193 100644
--- a/include/linux/sched/sysctl.h
+++ b/include/linux/sched/sysctl.h
@@ -48,7 +48,6 @@ extern unsigned int sysctl_numa_balancing_scan_delay;
 extern unsigned int sysctl_numa_balancing_scan_period_min;
 extern unsigned int sysctl_numa_balancing_scan_period_max;
 extern unsigned int sysctl_numa_balancing_scan_size;
-extern unsigned int sysctl_numa_balancing_settle_count;
 
 #ifdef CONFIG_SCHED_DEBUG
 extern unsigned int sysctl_sched_migration_cost;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 42bb745..57f28d0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -886,15 +886,6 @@ static unsigned int task_scan_max(struct task_struct *p)
 	return max(smin, smax);
 }
 
-/*
- * Once a preferred node is selected the scheduler balancer will prefer moving
- * a task to that node for sysctl_numa_balancing_settle_count number of PTE
- * scans. This will give the process the chance to accumulate more faults on
- * the preferred node but still allow the scheduler to move the task again if
- * the nodes CPUs are overloaded.
- */
-unsigned int sysctl_numa_balancing_settle_count __read_mostly = 4;
-
 static void account_numa_enqueue(struct rq *rq, struct task_struct *p)
 {
 	rq->nr_numa_running += (p->numa_preferred_nid != -1);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 34a6047..c8da99f 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -385,13 +385,6 @@ static struct ctl_table kern_table[] = {
 		.proc_handler	= proc_dointvec,
 	},
 	{
-		.procname       = "numa_balancing_settle_count",
-		.data           = &sysctl_numa_balancing_settle_count,
-		.maxlen         = sizeof(unsigned int),
-		.mode           = 0644,
-		.proc_handler   = proc_dointvec,
-	},
-	{
 		.procname       = "numa_balancing_migrate_deferred",
 		.data           = &sysctl_numa_balancing_migrate_deferred,
 		.maxlen         = sizeof(unsigned int),
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 34+ messages in thread


* [PATCH v5 3/8] sched/numa: use wrapper function task_node to get node which task is on
  2013-12-11  0:49 ` Wanpeng Li
@ 2013-12-11  0:49   ` Wanpeng Li
  -1 siblings, 0 replies; 34+ messages in thread
From: Wanpeng Li @ 2013-12-11  0:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, Rik van Riel, Mel Gorman, Peter Zijlstra,
	Naoya Horiguchi, linux-kernel, linux-mm, Wanpeng Li

Changelog:
 v2 -> v3:
  * translate cpu_to_node(task_cpu(p)) to task_node(p) in sched/debug.c

Use the wrapper function task_node() to get the node a task is on.

Acked-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
---
 kernel/sched/debug.c |    2 +-
 kernel/sched/fair.c  |    4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 5c34d18..374fe04 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -139,7 +139,7 @@ print_task(struct seq_file *m, struct rq *rq, struct task_struct *p)
 		0LL, 0LL, 0LL, 0L, 0LL, 0L, 0LL, 0L);
 #endif
 #ifdef CONFIG_NUMA_BALANCING
-	SEQ_printf(m, " %d", cpu_to_node(task_cpu(p)));
+	SEQ_printf(m, " %d", task_node(p));
 #endif
 #ifdef CONFIG_CGROUP_SCHED
 	SEQ_printf(m, " %s", task_group_path(task_group(p)));
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 57f28d0..c20d22f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1216,7 +1216,7 @@ static int task_numa_migrate(struct task_struct *p)
 	 * elsewhere, so there is no point in (re)trying.
 	 */
 	if (unlikely(!sd)) {
-		p->numa_preferred_nid = cpu_to_node(task_cpu(p));
+		p->numa_preferred_nid = task_node(p);
 		return -EINVAL;
 	}
 
@@ -1287,7 +1287,7 @@ static void numa_migrate_preferred(struct task_struct *p)
 	p->numa_migrate_retry = jiffies + HZ;
 
 	/* Success if task is already running on preferred CPU */
-	if (cpu_to_node(task_cpu(p)) == p->numa_preferred_nid)
+	if (task_node(p) == p->numa_preferred_nid)
 		return;
 
 	/* Otherwise, try migrate to a CPU on the preferred node */
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 34+ messages in thread


* [PATCH v5 4/8] sched/numa: fix set cpupid on page migration twice against normal page
  2013-12-11  0:49 ` Wanpeng Li
@ 2013-12-11  0:49   ` Wanpeng Li
  -1 siblings, 0 replies; 34+ messages in thread
From: Wanpeng Li @ 2013-12-11  0:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, Rik van Riel, Mel Gorman, Peter Zijlstra,
	Naoya Horiguchi, linux-kernel, linux-mm, Wanpeng Li

Commit 7851a45cd3 ("mm: numa: Copy cpupid on page migration") already copies
the cpupid over at page migration time, so there is no need to set it again in
alloc_misplaced_dst_page(). This patch removes the redundant call.

Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
---
 mm/migrate.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index b13e181..30ba8fb 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1558,8 +1558,6 @@ static struct page *alloc_misplaced_dst_page(struct page *page,
 					  __GFP_NOMEMALLOC | __GFP_NORETRY |
 					  __GFP_NOWARN) &
 					 ~GFP_IOFS, 0);
-	if (newpage)
-		page_cpupid_xchg_last(newpage, page_cpupid_last(page));
 
 	return newpage;
 }
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 34+ messages in thread


* [PATCH v5 5/8] sched/numa: use wrapper function task_faults_idx to calculate index in group_faults
  2013-12-11  0:49 ` Wanpeng Li
@ 2013-12-11  0:49   ` Wanpeng Li
  -1 siblings, 0 replies; 34+ messages in thread
From: Wanpeng Li @ 2013-12-11  0:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, Rik van Riel, Mel Gorman, Peter Zijlstra,
	Naoya Horiguchi, linux-kernel, linux-mm, Wanpeng Li

Use the wrapper function task_faults_idx() to calculate the fault-array index
in group_faults().

Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
---
 kernel/sched/fair.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c20d22f..106a607 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -935,7 +935,8 @@ static inline unsigned long group_faults(struct task_struct *p, int nid)
 	if (!p->numa_group)
 		return 0;
 
-	return p->numa_group->faults[2*nid] + p->numa_group->faults[2*nid+1];
+	return p->numa_group->faults[task_faults_idx(nid, 0)] +
+		p->numa_group->faults[task_faults_idx(nid, 1)];
 }
 
 /*
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 34+ messages in thread


* [PATCH v5 6/8] sched/numa: fix period_slot recalculation
  2013-12-11  0:49 ` Wanpeng Li
@ 2013-12-11  0:49   ` Wanpeng Li
  -1 siblings, 0 replies; 34+ messages in thread
From: Wanpeng Li @ 2013-12-11  0:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, Rik van Riel, Mel Gorman, Peter Zijlstra,
	Naoya Horiguchi, linux-kernel, linux-mm, Wanpeng Li

Changelog:
 v3 -> v4:
  * remove period_slot recalculation

The original code is as intended: it scales the difference between
NUMA_PERIOD_THRESHOLD and the local/remote ratio when adjusting the scan
period. The period_slot recalculation can therefore be dropped.

Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
---
 kernel/sched/fair.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 106a607..ac5f1e7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1360,7 +1360,6 @@ static void update_task_scan_period(struct task_struct *p,
 		 * scanning faster if shared accesses dominate as it may
 		 * simply bounce migrations uselessly
 		 */
-		period_slot = DIV_ROUND_UP(diff, NUMA_PERIOD_SLOTS);
 		ratio = DIV_ROUND_UP(private * NUMA_PERIOD_SLOTS, (private + shared));
 		diff = (diff * ratio) / NUMA_PERIOD_SLOTS;
 	}
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 34+ messages in thread


* [PATCH v5 7/8] sched/numa: fix record hinting faults check
  2013-12-11  0:49 ` Wanpeng Li
@ 2013-12-11  0:50   ` Wanpeng Li
  -1 siblings, 0 replies; 34+ messages in thread
From: Wanpeng Li @ 2013-12-11  0:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, Rik van Riel, Mel Gorman, Peter Zijlstra,
	Naoya Horiguchi, linux-kernel, linux-mm, Wanpeng Li

numa_scan_period is adjusted in task_numa_placement() depending on how much
useful work the NUMA code can do. Local and remote faults, rather than local
and shared faults, should be used to check whether any hinting faults were
recorded. This patch fixes the check.

Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
---
 kernel/sched/fair.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ac5f1e7..f507e12 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1326,7 +1326,7 @@ static void update_task_scan_period(struct task_struct *p,
 	 * completely idle or all activity is areas that are not of interest
 	 * to automatic numa balancing. Scan slower
 	 */
-	if (local + shared == 0) {
+	if (local + remote == 0) {
 		p->numa_scan_period = min(p->numa_scan_period_max,
 			p->numa_scan_period << 1);
 
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 34+ messages in thread


* [PATCH v5 8/8] sched/numa: drop unnecessary variable in task_weight
  2013-12-11  0:49 ` Wanpeng Li
@ 2013-12-11  0:50   ` Wanpeng Li
  -1 siblings, 0 replies; 34+ messages in thread
From: Wanpeng Li @ 2013-12-11  0:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, Rik van Riel, Mel Gorman, Peter Zijlstra,
	Naoya Horiguchi, linux-kernel, linux-mm, Wanpeng Li

Drop the unnecessary total_faults local variable in task_weight() so that
task_weight() and group_weight() share the same structure.

Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
---
 kernel/sched/fair.c |   11 ++---------
 1 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f507e12..5c54837 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -947,17 +947,10 @@ static inline unsigned long group_faults(struct task_struct *p, int nid)
  */
 static inline unsigned long task_weight(struct task_struct *p, int nid)
 {
-	unsigned long total_faults;
-
-	if (!p->numa_faults)
-		return 0;
-
-	total_faults = p->total_numa_faults;
-
-	if (!total_faults)
+	if (!p->numa_faults || !p->total_numa_faults)
 		return 0;
 
-	return 1000 * task_faults(p, nid) / total_faults;
+	return 1000 * task_faults(p, nid) / p->total_numa_faults;
 }
 
 static inline unsigned long group_weight(struct task_struct *p, int nid)
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 34+ messages in thread


* Re: [PATCH v5 4/8] sched/numa: fix set cpupid on page migration twice against normal page
  2013-12-11  0:49   ` Wanpeng Li
@ 2013-12-11  9:01     ` Mel Gorman
  -1 siblings, 0 replies; 34+ messages in thread
From: Mel Gorman @ 2013-12-11  9:01 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Andrew Morton, Ingo Molnar, Rik van Riel, Peter Zijlstra,
	Naoya Horiguchi, linux-kernel, linux-mm

On Wed, Dec 11, 2013 at 08:49:57AM +0800, Wanpeng Li wrote:
> commit 7851a45cd3 ("mm: numa: Copy cpupid on page migration") copies over
> the cpupid at page migration time, so it is unnecessary to set it again in
> alloc_misplaced_dst_page(). This patch fixes that.
> 
> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>

A migratepages aop is not strictly required to go through migrate_page_copy(),
but in practice all of them do, and it's hard to imagine one that wouldn't,
so:

Acked-by: Mel Gorman <mgorman@suse.de>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 5/8] sched/numa: use wrapper function task_faults_idx to calculate index in group_faults
  2013-12-11  0:49   ` Wanpeng Li
@ 2013-12-11  9:02     ` Mel Gorman
  -1 siblings, 0 replies; 34+ messages in thread
From: Mel Gorman @ 2013-12-11  9:02 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Andrew Morton, Ingo Molnar, Rik van Riel, Peter Zijlstra,
	Naoya Horiguchi, linux-kernel, linux-mm

On Wed, Dec 11, 2013 at 08:49:58AM +0800, Wanpeng Li wrote:
> Use wrapper function task_faults_idx to calculate index in group_faults.
> 
> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>

Acked-by: Mel Gorman <mgorman@suse.de>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 6/8] sched/numa: fix period_slot recalculation
  2013-12-11  0:49   ` Wanpeng Li
@ 2013-12-11  9:02     ` Mel Gorman
  -1 siblings, 0 replies; 34+ messages in thread
From: Mel Gorman @ 2013-12-11  9:02 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Andrew Morton, Ingo Molnar, Rik van Riel, Peter Zijlstra,
	Naoya Horiguchi, linux-kernel, linux-mm

On Wed, Dec 11, 2013 at 08:49:59AM +0800, Wanpeng Li wrote:
> Changelog:
>  v3 -> v4:
>   * remove period_slot recalculation
> 
> The original code is as intended and was meant to scale the difference
> between the NUMA_PERIOD_THRESHOLD and local/remote ratio when adjusting
> the scan period. The period_slot recalculation can be dropped.
> 
> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>

Acked-by: Mel Gorman <mgorman@suse.de>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 7/8] sched/numa: fix record hinting faults check
  2013-12-11  0:50   ` Wanpeng Li
@ 2013-12-11  9:14     ` Mel Gorman
  -1 siblings, 0 replies; 34+ messages in thread
From: Mel Gorman @ 2013-12-11  9:14 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Andrew Morton, Ingo Molnar, Rik van Riel, Peter Zijlstra,
	Naoya Horiguchi, linux-kernel, linux-mm

On Wed, Dec 11, 2013 at 08:50:00AM +0800, Wanpeng Li wrote:
> Adjust numa_scan_period in task_numa_placement() depending on how much useful
> work the NUMA code can do. Local and remote faults should be used to check
> whether any hinting faults were recorded, rather than local and shared
> faults. This patch fixes that.
> 
> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>

This potentially has the side-effect of making it easier to reduce the
scan rate because it'll only take the most recent scan window into
account. The existing code takes recent shared accesses into account.
What sort of tests did you do on this patch and what was the result?

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 8/8] sched/numa: drop unnecessary variable in task_weight
  2013-12-11  0:50   ` Wanpeng Li
@ 2013-12-11  9:21     ` Mel Gorman
  -1 siblings, 0 replies; 34+ messages in thread
From: Mel Gorman @ 2013-12-11  9:21 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Andrew Morton, Ingo Molnar, Rik van Riel, Peter Zijlstra,
	Naoya Horiguchi, linux-kernel, linux-mm

On Wed, Dec 11, 2013 at 08:50:01AM +0800, Wanpeng Li wrote:
> Drop unnecessary total_faults variable in function task_weight to unify
> task_weight and group_weight.
> 
> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>

Nak.

task_weight() is called for tasks other than current. If p handles a fault
in parallel, then total_numa_faults can drop to 0 between when it is checked
and when it is used as the divisor, resulting in an oops.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 8/8] sched/numa: drop unnecessary variable in task_weight
  2013-12-11  9:21     ` Mel Gorman
  (?)
@ 2013-12-11  9:34     ` Wanpeng Li
  -1 siblings, 0 replies; 34+ messages in thread
From: Wanpeng Li @ 2013-12-11  9:34 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Ingo Molnar, Rik van Riel, Peter Zijlstra,
	Naoya Horiguchi, linux-kernel, linux-mm

On Wed, Dec 11, 2013 at 09:21:23AM +0000, Mel Gorman wrote:
>On Wed, Dec 11, 2013 at 08:50:01AM +0800, Wanpeng Li wrote:
>> Drop unnecessary total_faults variable in function task_weight to unify
>> task_weight and group_weight.
>> 
>> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>> Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
>
>Nak.
>
>task_weight is called for tasks other than current. If p handles a fault
>in parallel then it can drop to 0 between when it's checked and used to
>divide resulting in an oops.

I see, thanks for your pointing out.

Regards,
Wanpeng Li 

>
>-- 
>Mel Gorman
>SUSE Labs

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 7/8] sched/numa: fix record hinting faults check
  2013-12-11  9:14     ` Mel Gorman
  (?)
@ 2013-12-11  9:41     ` Wanpeng Li
  -1 siblings, 0 replies; 34+ messages in thread
From: Wanpeng Li @ 2013-12-11  9:41 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Ingo Molnar, Rik van Riel, Peter Zijlstra,
	Naoya Horiguchi, linux-kernel, linux-mm

Hi Mel,
On Wed, Dec 11, 2013 at 09:14:22AM +0000, Mel Gorman wrote:
>On Wed, Dec 11, 2013 at 08:50:00AM +0800, Wanpeng Li wrote:
>> Adjust numa_scan_period in task_numa_placement() depending on how much useful
>> work the NUMA code can do. Local and remote faults should be used to check
>> whether any hinting faults were recorded, rather than local and shared
>> faults. This patch fixes that.
>> 
>> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>> Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
>
>This potentially has the side-effect of making it easier to reduce the
>scan rate because it'll only take the most recent scan window into
>account. The existing code takes recent shared accesses into account.

Both the local/remote and shared/private counters accumulate over the
just-finished scan window, so why would taking only the most recent scan
window into account reduce the scan rate compared with taking recent shared
accesses into account?

>What sort of tests did you do on this patch and what was the result?

I found this by code review; I can drop this patch if your point is
correct. ;-)

Regards,
Wanpeng Li 

>
>-- 
>Mel Gorman
>SUSE Labs

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 7/8] sched/numa: fix record hinting faults check
       [not found]     ` <20131211094156.GB26093@hacker.(null)>
@ 2013-12-11 10:15         ` Mel Gorman
  0 siblings, 0 replies; 34+ messages in thread
From: Mel Gorman @ 2013-12-11 10:15 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Andrew Morton, Ingo Molnar, Rik van Riel, Peter Zijlstra,
	Naoya Horiguchi, linux-kernel, linux-mm

On Wed, Dec 11, 2013 at 05:41:56PM +0800, Wanpeng Li wrote:
> Hi Mel,
> On Wed, Dec 11, 2013 at 09:14:22AM +0000, Mel Gorman wrote:
> >On Wed, Dec 11, 2013 at 08:50:00AM +0800, Wanpeng Li wrote:
> >> Adjust numa_scan_period in task_numa_placement() depending on how much useful
> >> work the NUMA code can do. Local and remote faults should be used to check
> >> whether any hinting faults were recorded, rather than local and shared
> >> faults. This patch fixes that.
> >> 
> >> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> >> Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
> >
> >This potentially has the side-effect of making it easier to reduce the
> >scan rate because it'll only take the most recent scan window into
> >account. The existing code takes recent shared accesses into account.
> 
> The local/remote and share/private both accumulate the just finished
> scan window, why takes the most recent scan window into account will 
> reduce the scan rate than takes recent shared accesses into account?
> 

Ok, shoddy reasoning and explanation on my part. It was the second question
I really cared about -- was this tested? It wasn't and this patch is
surprisingly subtle.

The intent of the code was to check "is this process's recent activity
of interest to automatic NUMA balancing?"

If it's incurring local faults, then it's interesting.

If it's sharing faults then it is interesting. Shared accesses are
inherently dirty data because the task is racing with other threads to be
the first to trap the hinting fault.

The current code takes those points into account and decides to slow
scanning on that basis. The change to using remote accesses is not
equivalent. The change is not necessarily better or worse, because it's
workload dependent. It's just different, and it should be supported by more
detailed reasoning than either you or I are giving it right now. It could
be argued that it should also take remote accesses into account, but
again, it is a subtle patch that would require a bit of backup.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 8/8] sched/numa: drop unnecessary variable in task_weight
  2013-12-11  9:21     ` Mel Gorman
@ 2013-12-11 14:50       ` Naoya Horiguchi
  -1 siblings, 0 replies; 34+ messages in thread
From: Naoya Horiguchi @ 2013-12-11 14:50 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Wanpeng Li, Andrew Morton, Ingo Molnar, Rik van Riel,
	Peter Zijlstra, linux-kernel, linux-mm

On Wed, Dec 11, 2013 at 09:21:23AM +0000, Mel Gorman wrote:
> On Wed, Dec 11, 2013 at 08:50:01AM +0800, Wanpeng Li wrote:
> > Drop unnecessary total_faults variable in function task_weight to unify
> > task_weight and group_weight.
> > 
> > Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> > Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
> 
> Nak.
> 
> task_weight is called for tasks other than current. If p handles a fault
> in parallel then it can drop to 0 between when it's checked and used to
> divide resulting in an oops.

So we have the same race in group_weight(), and we have to add a local
variable to store p->numa_group->total_faults?

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2013-12-11 14:51 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-12-11  0:49 [PATCH v5 0/8] mm: sched: numa: several fixups Wanpeng Li
2013-12-11  0:49 ` Wanpeng Li
2013-12-11  0:49 ` [PATCH v5 1/8] sched/numa: fix set cpupid on page migration twice against thp Wanpeng Li
2013-12-11  0:49   ` Wanpeng Li
2013-12-11  0:49 ` [PATCH v5 2/8] sched/numa: drop sysctl_numa_balancing_settle_count sysctl Wanpeng Li
2013-12-11  0:49   ` Wanpeng Li
2013-12-11  0:49 ` [PATCH v5 3/8] sched/numa: use wrapper function task_node to get node which task is on Wanpeng Li
2013-12-11  0:49   ` Wanpeng Li
2013-12-11  0:49 ` [PATCH v5 4/8] sched/numa: fix set cpupid on page migration twice against normal page Wanpeng Li
2013-12-11  0:49   ` Wanpeng Li
2013-12-11  9:01   ` Mel Gorman
2013-12-11  9:01     ` Mel Gorman
2013-12-11  0:49 ` [PATCH v5 5/8] sched/numa: use wrapper function task_faults_idx to calculate index in group_faults Wanpeng Li
2013-12-11  0:49   ` Wanpeng Li
2013-12-11  9:02   ` Mel Gorman
2013-12-11  9:02     ` Mel Gorman
2013-12-11  0:49 ` [PATCH v5 6/8] sched/numa: fix period_slot recalculation Wanpeng Li
2013-12-11  0:49   ` Wanpeng Li
2013-12-11  9:02   ` Mel Gorman
2013-12-11  9:02     ` Mel Gorman
2013-12-11  0:50 ` [PATCH v5 7/8] sched/numa: fix record hinting faults check Wanpeng Li
2013-12-11  0:50   ` Wanpeng Li
2013-12-11  9:14   ` Mel Gorman
2013-12-11  9:14     ` Mel Gorman
2013-12-11  9:41     ` Wanpeng Li
     [not found]     ` <20131211094156.GB26093@hacker.(null)>
2013-12-11 10:15       ` Mel Gorman
2013-12-11 10:15         ` Mel Gorman
2013-12-11  0:50 ` [PATCH v5 8/8] sched/numa: drop unnecessary variable in task_weight Wanpeng Li
2013-12-11  0:50   ` Wanpeng Li
2013-12-11  9:21   ` Mel Gorman
2013-12-11  9:21     ` Mel Gorman
2013-12-11  9:34     ` Wanpeng Li
2013-12-11 14:50     ` Naoya Horiguchi
2013-12-11 14:50       ` Naoya Horiguchi
