* vmstat: On demand vmstat workers V8
@ 2014-07-10 14:04 ` Christoph Lameter
  0 siblings, 0 replies; 73+ messages in thread
From: Christoph Lameter @ 2014-07-10 14:04 UTC (permalink / raw)
  To: akpm
  Cc: Gilad Ben-Yossef, Thomas Gleixner, Tejun Heo, John Stultz,
	Mike Frysinger, Minchan Kim, Hakan Akkan, Max Krasnyansky,
	Frederic Weisbecker, Paul E. McKenney, linux-kernel, linux-mm,
	hughd, viresh.kumar, hpa, mingo, peterz


V7->V8:
- Hackbench regression test shows a tiny performance increase due
  to reduced OS processing.
- Rediff against 3.16-rc4.

V6->V7:
- Remove sysfs support.

V5->V6:
- Run the shepherd thread as a general worker thread. This
  means that Frederic Weisbecker's general mechanism to
  control worker thread cpu use is necessary to restrict
  the shepherd thread to the cpus not used for low latency
  tasks. Hopefully that is ready to be merged soon. There
  is no longer any need for a specific cpu to be the
  housekeeper cpu.

V4->V5:
- Shepherd thread on a specific cpu (HOUSEKEEPING_CPU).
- Incorporate Andrew's feedback.
- Work out the races.
- Make visible which CPUs have stat updates switched off
  in /sys/devices/system/cpu/stat_off

V3->V4:
- Make the shepherd task not deferrable. It runs on the tick cpu
  anyway. Deferral could get the deltas too far out of sync if
  vmstat operations are disabled for a certain processor.

V2->V3:
- Introduce a new tick_get_housekeeping_cpu() function. Not sure
  if that is exactly what we want but it is a start. Thomas?
- Migrate the shepherd task if the output of
  tick_get_housekeeping_cpu() changes.
- Fixes recommended by Andrew.

V1->V2:
- Optimize the need_update check by using memchr_inv.
- Clean up.

vmstat workers are used for folding counter differentials into the
zone, per-node and global counters at certain time intervals.
They currently run at defined intervals on all processors, which
causes some holdoff for processors that need minimal intrusion by
the OS.

The current vmstat_update mechanism depends on a deferrable timer
firing every other second by default which registers a work queue item
that runs on the local CPU, with the result that we have one interrupt
and one additional schedulable task on each CPU every 2 seconds.
If a workload indeed causes VM activity or multiple tasks are running
on a CPU, then there are probably bigger issues to deal with.
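
For reference, the pre-patch flow, condensed from the lines this
patch removes in the diff below (a sketch, not a standalone file):

static void vmstat_update(struct work_struct *w)
{
	/*
	 * Fold this cpu's diffs and then unconditionally reschedule
	 * ourselves, whether or not anything actually changed.
	 */
	refresh_cpu_vm_stats();
	schedule_delayed_work(this_cpu_ptr(&vmstat_work),
		round_jiffies_relative(sysctl_stat_interval));
}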

However, some workloads dedicate a CPU to a single CPU-bound task.
This is done in high performance computing, in high frequency
financial applications and in networking (Intel DPDK, EZchip NPS).
With the advent of systems with more and more CPUs over time, this
may become increasingly common: when one has enough CPUs, one cares
less about efficiently sharing a CPU with other tasks and more about
efficiently monopolizing a CPU per task.

The difference made by this timer firing and a workqueue kernel
thread being scheduled periodically can be enormous. An artificial
test measuring the worst-case time to do a simple "i++" in an endless
loop on a bare metal system and under Linux on an isolated CPU with
dynticks, with and without this patch, shows Linux matching the bare
metal performance (~700 cycles) with this patch and losing by a
couple of orders of magnitude (~200k cycles) without it[*]. The loss
occurs for something that just calculates statistics. For networking
applications, for example, this could be the difference between
dropping packets and sustaining line rate.

Statistics are important and useful, but it would be great if there
were a way to gather them without producing a huge performance
difference. This patch does just that.

This patch creates a vmstat shepherd worker that monitors the
per-cpu differentials on all processors. If there are differentials
on a processor, then a vmstat worker local to that processor is
created. That worker will then keep folding the diffs at regular
intervals. Should the worker find that there is no work to be done,
it will hand the processor back to the shepherd worker, which
monitors the differentials again.
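
In outline, the handoff looks like the following sketch, condensed
from the patch below (cpu hotplug handling and the race notes are
omitted here; see the real code for those):

static void vmstat_shepherd(struct work_struct *w)
{
	int cpu;

	/* Kick a local worker on any quiet cpu that has new diffs. */
	for_each_cpu(cpu, cpu_stat_off)
		if (need_update(cpu) &&
		    cpumask_test_and_clear_cpu(cpu, cpu_stat_off))
			schedule_delayed_work_on(cpu,
				&per_cpu(vmstat_work, cpu),
				__round_jiffies_relative(sysctl_stat_interval, cpu));

	/* The shepherd itself always keeps running. */
	schedule_delayed_work(&shepherd,
		round_jiffies_relative(sysctl_stat_interval));
}

static void vmstat_update(struct work_struct *w)
{
	if (refresh_cpu_vm_stats())
		/* Counters changed: keep this cpu's worker running. */
		schedule_delayed_work(this_cpu_ptr(&vmstat_work),
			round_jiffies_relative(sysctl_stat_interval));
	else
		/* Nothing changed: hand this cpu back to the shepherd. */
		cpumask_set_cpu(smp_processor_id(), cpu_stat_off);
}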

With this patch it is then possible to have periods longer than
2 seconds without any OS event on a "cpu" (hardware thread).

The patch shows a very minor increase in system performance.


hackbench -s 512 -l 2000 -g 15 -f 25 -P
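
(Reading the options against the output below: -g 15 groups with
-f 25 sender/receiver pairs each gives 50 tasks per group, 750 in
total; -l 2000 messages per sender, -s 512 bytes per message, and
-P selects process rather than thread mode.)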

Hackbench before the patch:

Running in process mode with 15 groups using 50 file descriptors each (== 750 tasks)
Each sender will pass 2000 messages of 512 bytes
Time: 4.992
Running in process mode with 15 groups using 50 file descriptors each (== 750 tasks)
Each sender will pass 2000 messages of 512 bytes
Time: 4.971
Running in process mode with 15 groups using 50 file descriptors each (== 750 tasks)
Each sender will pass 2000 messages of 512 bytes
Time: 5.063

Hackbench after the patch:

Running in process mode with 15 groups using 50 file descriptors each (== 750 tasks)
Each sender will pass 2000 messages of 512 bytes
Time: 4.973
Running in process mode with 15 groups using 50 file descriptors each (== 750 tasks)
Each sender will pass 2000 messages of 512 bytes
Time: 4.990
Running in process mode with 15 groups using 50 file descriptors each (== 750 tasks)
Each sender will pass 2000 messages of 512 bytes
Time: 4.993



Reviewed-by: Gilad Ben-Yossef <gilad@benyossef.com>
Signed-off-by: Christoph Lameter <cl@linux.com>


Index: linux/mm/vmstat.c
===================================================================
--- linux.orig/mm/vmstat.c	2014-07-07 10:15:01.790099463 -0500
+++ linux/mm/vmstat.c	2014-07-07 10:17:17.397891143 -0500
@@ -7,6 +7,7 @@
  *  zoned VM statistics
  *  Copyright (C) 2006 Silicon Graphics, Inc.,
  *		Christoph Lameter <christoph@lameter.com>
+ *  Copyright (C) 2008-2014 Christoph Lameter
  */
 #include <linux/fs.h>
 #include <linux/mm.h>
@@ -14,6 +15,7 @@
 #include <linux/module.h>
 #include <linux/slab.h>
 #include <linux/cpu.h>
+#include <linux/cpumask.h>
 #include <linux/vmstat.h>
 #include <linux/sched.h>
 #include <linux/math64.h>
@@ -419,13 +421,22 @@ void dec_zone_page_state(struct page *pa
 EXPORT_SYMBOL(dec_zone_page_state);
 #endif

-static inline void fold_diff(int *diff)
+
+/*
+ * Fold a differential into the global counters.
+ * Returns the number of counters updated.
+ */
+static int fold_diff(int *diff)
 {
 	int i;
+	int changes = 0;

 	for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
-		if (diff[i])
+		if (diff[i]) {
 			atomic_long_add(diff[i], &vm_stat[i]);
+			changes++;
+		}
+	return changes;
 }

 /*
@@ -441,12 +452,15 @@ static inline void fold_diff(int *diff)
  * statistics in the remote zone struct as well as the global cachelines
  * with the global counters. These could cause remote node cache line
  * bouncing and will have to be only done when necessary.
+ *
+ * The function returns the number of global counters updated.
  */
-static void refresh_cpu_vm_stats(void)
+static int refresh_cpu_vm_stats(void)
 {
 	struct zone *zone;
 	int i;
 	int global_diff[NR_VM_ZONE_STAT_ITEMS] = { 0, };
+	int changes = 0;

 	for_each_populated_zone(zone) {
 		struct per_cpu_pageset __percpu *p = zone->pageset;
@@ -486,15 +500,17 @@ static void refresh_cpu_vm_stats(void)
 			continue;
 		}

-
 		if (__this_cpu_dec_return(p->expire))
 			continue;

-		if (__this_cpu_read(p->pcp.count))
+		if (__this_cpu_read(p->pcp.count)) {
 			drain_zone_pages(zone, this_cpu_ptr(&p->pcp));
+			changes++;
+		}
 #endif
 	}
-	fold_diff(global_diff);
+	changes += fold_diff(global_diff);
+	return changes;
 }

 /*
@@ -1228,20 +1244,105 @@ static const struct file_operations proc
 #ifdef CONFIG_SMP
 static DEFINE_PER_CPU(struct delayed_work, vmstat_work);
 int sysctl_stat_interval __read_mostly = HZ;
+struct cpumask *cpu_stat_off;

 static void vmstat_update(struct work_struct *w)
 {
-	refresh_cpu_vm_stats();
-	schedule_delayed_work(this_cpu_ptr(&vmstat_work),
+	if (refresh_cpu_vm_stats())
+		/*
+		 * Counters were updated so we expect more updates
+		 * to occur in the future. Keep on running the
+		 * update worker thread.
+		 */
+		schedule_delayed_work(this_cpu_ptr(&vmstat_work),
+			round_jiffies_relative(sysctl_stat_interval));
+	else {
+		/*
+		 * We did not update any counters so the app may be in
+		 * a mode where it does not cause counter updates.
+		 * We may be uselessly running vmstat_update.
+		 * Defer the checking for differentials to the
+		 * shepherd thread on a different processor.
+		 */
+		int r;
+		/*
+		 * The shepherd work thread does not race since it never
+		 * changes the bit if it is zero, but the cpu
+		 * online/offline code may race if
+		 * worker threads are still allowed during
+		 * shutdown/startup.
+		 */
+		r = cpumask_test_and_set_cpu(smp_processor_id(),
+			cpu_stat_off);
+		VM_BUG_ON(r);
+	}
+}
+
+/*
+ * Check if the diffs for a certain cpu indicate that
+ * an update is needed.
+ */
+static bool need_update(int cpu)
+{
+	struct zone *zone;
+
+	for_each_populated_zone(zone) {
+		struct per_cpu_pageset *p = per_cpu_ptr(zone->pageset, cpu);
+
+		BUILD_BUG_ON(sizeof(p->vm_stat_diff[0]) != 1);
+		/*
+		 * The fast way of checking if there are any vmstat diffs.
+		 * This works because the diffs are byte sized items.
+		 */
+		if (memchr_inv(p->vm_stat_diff, 0, NR_VM_ZONE_STAT_ITEMS))
+			return true;
+
+	}
+	return false;
+}
+
+
+/*
+ * Shepherd worker thread that checks the
+ * differentials of processors that have their worker
+ * threads for vm statistics updates disabled because of
+ * inactivity.
+ */
+static void vmstat_shepherd(struct work_struct *w);
+
+static DECLARE_DELAYED_WORK(shepherd, vmstat_shepherd);
+
+static void vmstat_shepherd(struct work_struct *w)
+{
+	int cpu;
+
+	/* Check processors whose vmstat worker threads have been disabled */
+	for_each_cpu(cpu, cpu_stat_off)
+		if (need_update(cpu) &&
+			cpumask_test_and_clear_cpu(cpu, cpu_stat_off))
+
+			schedule_delayed_work_on(cpu, &per_cpu(vmstat_work, cpu),
+				__round_jiffies_relative(sysctl_stat_interval, cpu));
+
+
+	schedule_delayed_work(&shepherd,
 		round_jiffies_relative(sysctl_stat_interval));
+
 }

-static void start_cpu_timer(int cpu)
+static void __init start_shepherd_timer(void)
 {
-	struct delayed_work *work = &per_cpu(vmstat_work, cpu);
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
+			vmstat_update);
+
+	cpu_stat_off = kmalloc(cpumask_size(), GFP_KERNEL);
+	cpumask_copy(cpu_stat_off, cpu_online_mask);

-	INIT_DEFERRABLE_WORK(work, vmstat_update);
-	schedule_delayed_work_on(cpu, work, __round_jiffies_relative(HZ, cpu));
+	schedule_delayed_work(&shepherd,
+		round_jiffies_relative(sysctl_stat_interval));
 }

 static void vmstat_cpu_dead(int node)
@@ -1272,17 +1373,17 @@ static int vmstat_cpuup_callback(struct
 	case CPU_ONLINE:
 	case CPU_ONLINE_FROZEN:
 		refresh_zone_stat_thresholds();
-		start_cpu_timer(cpu);
 		node_set_state(cpu_to_node(cpu), N_CPU);
+		cpumask_set_cpu(cpu, cpu_stat_off);
 		break;
 	case CPU_DOWN_PREPARE:
 	case CPU_DOWN_PREPARE_FROZEN:
-		cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
-		per_cpu(vmstat_work, cpu).work.func = NULL;
+		if (!cpumask_test_and_set_cpu(cpu, cpu_stat_off))
+			cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
 		break;
 	case CPU_DOWN_FAILED:
 	case CPU_DOWN_FAILED_FROZEN:
-		start_cpu_timer(cpu);
+		cpumask_set_cpu(cpu, cpu_stat_off);
 		break;
 	case CPU_DEAD:
 	case CPU_DEAD_FROZEN:
@@ -1302,15 +1403,10 @@ static struct notifier_block vmstat_noti
 static int __init setup_vmstat(void)
 {
 #ifdef CONFIG_SMP
-	int cpu;
-
 	cpu_notifier_register_begin();
 	__register_cpu_notifier(&vmstat_notifier);

-	for_each_online_cpu(cpu) {
-		start_cpu_timer(cpu);
-		node_set_state(cpu_to_node(cpu), N_CPU);
-	}
+	start_shepherd_timer();
 	cpu_notifier_register_done();
 #endif
 #ifdef CONFIG_PROC_FS

* Re: vmstat: On demand vmstat workers V8
  2014-07-10 14:04 ` Christoph Lameter
@ 2014-07-11 13:20   ` Frederic Weisbecker
  0 siblings, 0 replies; 73+ messages in thread
From: Frederic Weisbecker @ 2014-07-11 13:20 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: akpm, Gilad Ben-Yossef, Thomas Gleixner, Tejun Heo, John Stultz,
	Mike Frysinger, Minchan Kim, Hakan Akkan, Max Krasnyansky,
	Paul E. McKenney, linux-kernel, linux-mm, hughd, viresh.kumar,
	hpa, mingo, peterz

On Thu, Jul 10, 2014 at 09:04:55AM -0500, Christoph Lameter wrote:
> @@ -1228,20 +1244,105 @@ static const struct file_operations proc
>  #ifdef CONFIG_SMP
>  static DEFINE_PER_CPU(struct delayed_work, vmstat_work);
>  int sysctl_stat_interval __read_mostly = HZ;
> +struct cpumask *cpu_stat_off;

I thought you converted it?

* Re: vmstat: On demand vmstat workers V8
  2014-07-11 13:20   ` Frederic Weisbecker
@ 2014-07-11 13:56     ` Christoph Lameter
  0 siblings, 0 replies; 73+ messages in thread
From: Christoph Lameter @ 2014-07-11 13:56 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: akpm, Gilad Ben-Yossef, Thomas Gleixner, Tejun Heo, John Stultz,
	Mike Frysinger, Minchan Kim, Hakan Akkan, Max Krasnyansky,
	Paul E. McKenney, linux-kernel, linux-mm, hughd, viresh.kumar,
	hpa, mingo, peterz

On Fri, 11 Jul 2014, Frederic Weisbecker wrote:

> > @@ -1228,20 +1244,105 @@ static const struct file_operations proc
> >  #ifdef CONFIG_SMP
> >  static DEFINE_PER_CPU(struct delayed_work, vmstat_work);
> >  int sysctl_stat_interval __read_mostly = HZ;
> > +struct cpumask *cpu_stat_off;
>
> I thought you converted it?

Converted what? We still need to keep a cpumask around that tells us which
processors have vmstat running and which do not.


* Re: vmstat: On demand vmstat workers V8
  2014-07-11 13:56     ` Christoph Lameter
@ 2014-07-11 13:58       ` Frederic Weisbecker
  0 siblings, 0 replies; 73+ messages in thread
From: Frederic Weisbecker @ 2014-07-11 13:58 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: akpm, Gilad Ben-Yossef, Thomas Gleixner, Tejun Heo, John Stultz,
	Mike Frysinger, Minchan Kim, Hakan Akkan, Max Krasnyansky,
	Paul E. McKenney, linux-kernel, linux-mm, hughd, viresh.kumar,
	hpa, mingo, peterz

On Fri, Jul 11, 2014 at 08:56:04AM -0500, Christoph Lameter wrote:
> On Fri, 11 Jul 2014, Frederic Weisbecker wrote:
> 
> > > @@ -1228,20 +1244,105 @@ static const struct file_operations proc
> > >  #ifdef CONFIG_SMP
> > >  static DEFINE_PER_CPU(struct delayed_work, vmstat_work);
> > >  int sysctl_stat_interval __read_mostly = HZ;
> > > +struct cpumask *cpu_stat_off;
> >
> > I thought you converted it?
> 
> Converted what? We still need to keep a cpumask around that tells us which
> processors have vmstat running and which do not.
> 

Converted to cpumask_var_t.

I mean we spent dozens of emails on that...
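
For context, cpumask_var_t exists so that large masks live off the
stack only when the kernel needs it; roughly, from the 3.16-era
include/linux/cpumask.h:

#ifdef CONFIG_CPUMASK_OFFSTACK
typedef struct cpumask *cpumask_var_t;   /* needs alloc_cpumask_var() */
#else
typedef struct cpumask cpumask_var_t[1]; /* alloc_cpumask_var() is a no-op */
#endif

Either way the same cpumask_* accessors apply, so the conversion is
mechanical.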

* Re: vmstat: On demand vmstat workers V8
  2014-07-11 13:58       ` Frederic Weisbecker
@ 2014-07-11 15:17         ` Christoph Lameter
  0 siblings, 0 replies; 73+ messages in thread
From: Christoph Lameter @ 2014-07-11 15:17 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: akpm, Gilad Ben-Yossef, Thomas Gleixner, Tejun Heo, John Stultz,
	Mike Frysinger, Minchan Kim, Hakan Akkan, Max Krasnyansky,
	Paul E. McKenney, linux-kernel, linux-mm, hughd, viresh.kumar,
	hpa, mingo, peterz

On Fri, 11 Jul 2014, Frederic Weisbecker wrote:

> > Converted what? We still need to keep a cpumask around that tells us which
> > processors have vmstat running and which do not.
> >
>
> Converted to cpumask_var_t.
>
> I mean we spent dozens of emails on that...


Oh there is this outstanding fix, right.


Subject: on demand vmstat: Do not open code alloc_cpumask_var

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/mm/vmstat.c
===================================================================
--- linux.orig/mm/vmstat.c	2014-07-11 10:15:55.356856916 -0500
+++ linux/mm/vmstat.c	2014-07-11 10:15:55.352856994 -0500
@@ -1244,7 +1244,7 @@
 #ifdef CONFIG_SMP
 static DEFINE_PER_CPU(struct delayed_work, vmstat_work);
 int sysctl_stat_interval __read_mostly = HZ;
-struct cpumask *cpu_stat_off;
+cpumask_var_t cpu_stat_off;

 static void vmstat_update(struct work_struct *w)
 {
@@ -1338,7 +1338,8 @@
 		INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
 			vmstat_update);

-	cpu_stat_off = kmalloc(cpumask_size(), GFP_KERNEL);
+	if (!alloc_cpumask_var(&cpu_stat_off, GFP_KERNEL))
+		BUG();
 	cpumask_copy(cpu_stat_off, cpu_online_mask);

 	schedule_delayed_work(&shepherd,


* Re: vmstat: On demand vmstat workers V8
  2014-07-11 15:17         ` Christoph Lameter
@ 2014-07-11 15:19           ` Frederic Weisbecker
  0 siblings, 0 replies; 73+ messages in thread
From: Frederic Weisbecker @ 2014-07-11 15:19 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: akpm, Gilad Ben-Yossef, Thomas Gleixner, Tejun Heo, John Stultz,
	Mike Frysinger, Minchan Kim, Hakan Akkan, Max Krasnyansky,
	Paul E. McKenney, linux-kernel, linux-mm, hughd, viresh.kumar,
	hpa, mingo, peterz

On Fri, Jul 11, 2014 at 10:17:41AM -0500, Christoph Lameter wrote:
> On Fri, 11 Jul 2014, Frederic Weisbecker wrote:
> 
> > > Converted what? We still need to keep a cpumask around that tells us which
> > > processors have vmstat running and which do not.
> > >
> >
> > Converted to cpumask_var_t.
> >
> > I mean we spent dozens of emails on that...
> 
> 
> Oh there is this outstanding fix, right.
> 
> 
> Subject: on demand vmstat: Do not open code alloc_cpumask_var
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>

Thanks :)

Maybe just merge both? The whole thing looks good.


> 
> Index: linux/mm/vmstat.c
> ===================================================================
> --- linux.orig/mm/vmstat.c	2014-07-11 10:15:55.356856916 -0500
> +++ linux/mm/vmstat.c	2014-07-11 10:15:55.352856994 -0500
> @@ -1244,7 +1244,7 @@
>  #ifdef CONFIG_SMP
>  static DEFINE_PER_CPU(struct delayed_work, vmstat_work);
>  int sysctl_stat_interval __read_mostly = HZ;
> -struct cpumask *cpu_stat_off;
> +cpumask_var_t cpu_stat_off;
> 
>  static void vmstat_update(struct work_struct *w)
>  {
> @@ -1338,7 +1338,8 @@
>  		INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
>  			vmstat_update);
> 
> -	cpu_stat_off = kmalloc(cpumask_size(), GFP_KERNEL);
> +	if (!alloc_cpumask_var(&cpu_stat_off, GFP_KERNEL))
> +		BUG();
>  	cpumask_copy(cpu_stat_off, cpu_online_mask);
> 
>  	schedule_delayed_work(&shepherd,


* Re: vmstat: On demand vmstat workers V8
  2014-07-11 15:19           ` Frederic Weisbecker
@ 2014-07-11 15:22             ` Christoph Lameter
  0 siblings, 0 replies; 73+ messages in thread
From: Christoph Lameter @ 2014-07-11 15:22 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: akpm, Gilad Ben-Yossef, Thomas Gleixner, Tejun Heo, John Stultz,
	Mike Frysinger, Minchan Kim, Hakan Akkan, Max Krasnyansky,
	Paul E. McKenney, linux-kernel, linux-mm, hughd, viresh.kumar,
	hpa, mingo, peterz

On Fri, 11 Jul 2014, Frederic Weisbecker wrote:

> Maybe just merge both? The whole thing looks good.

I hope so. Andrew?


* Re: vmstat: On demand vmstat workers V8
  2014-07-11 15:22             ` Christoph Lameter
@ 2014-07-14 20:10               ` Hugh Dickins
  0 siblings, 0 replies; 73+ messages in thread
From: Hugh Dickins @ 2014-07-14 20:10 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Frederic Weisbecker, akpm, Gilad Ben-Yossef, Thomas Gleixner,
	Tejun Heo, John Stultz, Mike Frysinger, Minchan Kim, Hakan Akkan,
	Max Krasnyansky, Paul E. McKenney, linux-kernel, linux-mm, hughd,
	viresh.kumar, hpa, mingo, peterz

On Fri, 11 Jul 2014, Christoph Lameter wrote:
> On Fri, 11 Jul 2014, Frederic Weisbecker wrote:
> 
> > > Maybe just merge both? The whole thing looks good.
> 
> I hope so. Andrew?

I hope so, too: I know there are idle feckless^Htickless people
eager for it.  I did take the briefest of looks, but couldn't
really find any mm change to ack or otherwise: if Frederic is
happy with it now, seems good to go.

Hugh


* Re: vmstat: On demand vmstat workers V8
  2014-07-14 20:10               ` Hugh Dickins
@ 2014-07-14 20:51                 ` Christoph Lameter
  -1 siblings, 0 replies; 73+ messages in thread
From: Christoph Lameter @ 2014-07-14 20:51 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Frederic Weisbecker, akpm, Gilad Ben-Yossef, Thomas Gleixner,
	Tejun Heo, John Stultz, Mike Frysinger, Minchan Kim, Hakan Akkan,
	Max Krasnyansky, Paul E. McKenney, linux-kernel, linux-mm,
	viresh.kumar, hpa, mingo, peterz

On Mon, 14 Jul 2014, Hugh Dickins wrote:

> On Fri, 11 Jul 2014, Christoph Lameter wrote:
> > On Fri, 11 Jul 2014, Frederic Weisbecker wrote:
> >
> > > Maybe just merge both? The whole looks good.
> >
> > I hope so. Andrew?
>
> I hope so, too: I know there are idle feckless^Htickless people
> eager for it.  I did take the briefest of looks, but couldn't
> really find any mm change to ack or otherwise: if Frederic is
> happy with it now, seems good to go.

No, it's all self contained in mm/vmstat.c now.


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: vmstat: On demand vmstat workers V8
  2014-07-10 14:04 ` Christoph Lameter
@ 2014-07-26  2:22   ` Sasha Levin
  -1 siblings, 0 replies; 73+ messages in thread
From: Sasha Levin @ 2014-07-26  2:22 UTC (permalink / raw)
  To: Christoph Lameter, akpm
  Cc: Gilad Ben-Yossef, Thomas Gleixner, Tejun Heo, John Stultz,
	Mike Frysinger, Minchan Kim, Hakan Akkan, Max Krasnyansky,
	Frederic Weisbecker, Paul E. McKenney, linux-kernel, linux-mm,
	hughd, viresh.kumar, hpa, mingo, peterz

On 07/10/2014 10:04 AM, Christoph Lameter wrote:
> This patch creates a vmstat shepherd worker that monitors the
> per cpu differentials on all processors. If there are differentials
> on a processor then a vmstat worker local to the processors
> with the differentials is created. That worker will then start
> folding the diffs in regular intervals. Should the worker
> find that there is no work to be done then it will make the shepherd
> worker monitor the differentials again.

Hi Christoph, all,

This patch doesn't interact well with my fuzzing setup. I'm seeing
the following:

[  490.446927] BUG: using __this_cpu_read() in preemptible [00000000] code: kworker/16:1/7368
[  490.447909] caller is __this_cpu_preempt_check+0x13/0x20
[  490.448596] CPU: 8 PID: 7368 Comm: kworker/16:1 Not tainted 3.16.0-rc6-next-20140725-sasha-00047-g9eb9a52 #933
[  490.449847] Workqueue: events vmstat_update
[  490.450558]  ffffffff97383bb6 0000000000000000 ffffffff9727df83 ffff8803077cfb68
[  490.451520]  ffffffff95dc96b3 0000000000000008 ffff8803077cfba0 ffffffff92002438
[  490.452475]  ffff8803077cfc80 ffff880be21ea138 ffff8803077cfc80 00000000001e6a48
[  490.453459] Call Trace:
[  490.453776] dump_stack (lib/dump_stack.c:52)
[  490.454394] check_preemption_disabled (lib/smp_processor_id.c:46)
[  490.455161] __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[  490.455927] refresh_cpu_vm_stats (mm/vmstat.c:492)
[  490.456753] vmstat_update (mm/vmstat.c:1252)
[  490.457463] process_one_work (kernel/workqueue.c:2022 include/linux/jump_label.h:115 include/trace/events/workqueue.h:111 kernel/workqueue.c:2027)
[  490.458159] ? process_one_work (include/linux/workqueue.h:185 kernel/workqueue.c:598 kernel/workqueue.c:625 kernel/workqueue.c:2015)
[  490.458887] worker_thread (include/linux/list.h:188 kernel/workqueue.c:2154)
[  490.459555] ? __schedule (./arch/x86/include/asm/bitops.h:311 include/linux/thread_info.h:91 include/linux/sched.h:2854 kernel/sched/core.c:2825)
[  490.460370] ? process_one_work (kernel/workqueue.c:2098)
[  490.461177] kthread (kernel/kthread.c:207)
[  490.461792] ? flush_kthread_work (kernel/kthread.c:176)
[  490.462529] ret_from_fork (arch/x86/kernel/entry_64.S:348)
[  490.463181] ? flush_kthread_work (kernel/kthread.c:176)
[  490.464008] ------------[ cut here ]------------
[  490.464613] kernel BUG at mm/vmstat.c:1278!
[  490.465116] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[  490.465981] Dumping ftrace buffer:
[  490.466585]    (ftrace buffer empty)
[  490.467030] Modules linked in:
[  490.467429] CPU: 8 PID: 7368 Comm: kworker/16:1 Not tainted 3.16.0-rc6-next-20140725-sasha-00047-g9eb9a52 #933
[  490.468641] Workqueue: events vmstat_update
[  490.469163] task: ffff88030772b000 ti: ffff8803077cc000 task.ti: ffff8803077cc000
[  490.470033] RIP: vmstat_update (mm/vmstat.c:1278)
[  490.470269] RSP: 0000:ffff8803077cfcb8  EFLAGS: 00010287
[  490.470269] RAX: ffff87ffffffffff RBX: 0000000000000008 RCX: 0000000000000000
[  490.470269] RDX: ffff88030772bcf8 RSI: ffffffff972e5fd0 RDI: ffffffff986fa5d0
[  490.470269] RBP: ffff8803077cfcd0 R08: 0000000000000002 R09: 0000000000000000
[  490.470269] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8805fa7e34d0
[  490.470269] R13: ffff8803117e2240 R14: 0000000000000800 R15: 0000000000000000
[  490.470269] FS:  0000000000000000(0000) GS:ffff880311200000(0000) knlGS:0000000000000000
[  490.470269] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  490.470269] CR2: 00007fffc67fef1a CR3: 0000000017a22000 CR4: 00000000000006a0
[  490.470269] Stack:
[  490.470269]  ffff8803117dd240 ffff88030af938e0 ffff8803117e2240 ffff8803077cfd88
[  490.470269]  ffffffff911f45f5 ffffffff911f455d ffff88030af93928 ffff88031081e900
[  490.470269]  ffff88030af938f0 ffff88030af93900 ffff88030af938e8 ffff88030af938f8
[  490.470269] Call Trace:
[  490.470269] process_one_work (kernel/workqueue.c:2022 include/linux/jump_label.h:115 include/trace/events/workqueue.h:111 kernel/workqueue.c:2027)
[  490.470269] ? process_one_work (include/linux/workqueue.h:185 kernel/workqueue.c:598 kernel/workqueue.c:625 kernel/workqueue.c:2015)
[  490.470269] worker_thread (include/linux/list.h:188 kernel/workqueue.c:2154)
[  490.470269] ? __schedule (./arch/x86/include/asm/bitops.h:311 include/linux/thread_info.h:91 include/linux/sched.h:2854 kernel/sched/core.c:2825)
[  490.470269] ? process_one_work (kernel/workqueue.c:2098)
[  490.470269] kthread (kernel/kthread.c:207)
[  490.470269] ? flush_kthread_work (kernel/kthread.c:176)
[  490.470269] ret_from_fork (arch/x86/kernel/entry_64.S:348)
[  490.470269] ? flush_kthread_work (kernel/kthread.c:176)
[ 490.470269] Code: c7 d0 a5 6f 98 89 c3 e8 9f 9e 08 00 3b 1d 89 8b 35 07 73 7f f0 49 0f ab 1c 24 72 0f 5b 41 5c 41 5d 5d c3 0f 1f 84 00 00 00 00 00 <0f> 0b 66 0f 1f 44 00 00 48 63 3d f1 be 36 07 48 c7 c3 40 d2 1d
All code
========
   0:   c7                      (bad)
   1:   d0 a5 6f 98 89 c3       shlb   -0x3c766791(%rbp)
   7:   e8 9f 9e 08 00          callq  0x89eab
   c:   3b 1d 89 8b 35 07       cmp    0x7358b89(%rip),%ebx        # 0x7358b9b
  12:   73 7f                   jae    0x93
  14:   f0 49 0f ab 1c 24       lock bts %rbx,(%r12)
  1a:   72 0f                   jb     0x2b
  1c:   5b                      pop    %rbx
  1d:   41 5c                   pop    %r12
  1f:   41 5d                   pop    %r13
  21:   5d                      pop    %rbp
  22:   c3                      retq
  23:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  2a:   00
  2b:*  0f 0b                   ud2             <-- trapping instruction
  2d:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
  33:   48 63 3d f1 be 36 07    movslq 0x736bef1(%rip),%rdi        # 0x736bf2b
  3a:   48 c7 c3 40 d2 1d 00    mov    $0x1dd240,%rbx

Code starting with the faulting instruction
===========================================
   0:   0f 0b                   ud2
   2:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
   8:   48 63 3d f1 be 36 07    movslq 0x736bef1(%rip),%rdi        # 0x736bf00
   f:   48 c7 c3 40 d2 1d 00    mov    $0x1dd240,%rbx
[  490.470269] RIP vmstat_update (mm/vmstat.c:1278)
[  490.470269]  RSP <ffff8803077cfcb8>


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: vmstat: On demand vmstat workers V8
  2014-07-26  2:22   ` Sasha Levin
@ 2014-07-28 18:55     ` Christoph Lameter
  -1 siblings, 0 replies; 73+ messages in thread
From: Christoph Lameter @ 2014-07-28 18:55 UTC (permalink / raw)
  To: Sasha Levin
  Cc: akpm, Gilad Ben-Yossef, Thomas Gleixner, Tejun Heo, John Stultz,
	Mike Frysinger, Minchan Kim, Hakan Akkan, Max Krasnyansky,
	Frederic Weisbecker, Paul E. McKenney, linux-kernel, linux-mm,
	hughd, viresh.kumar, hpa, mingo, peterz

On Fri, 25 Jul 2014, Sasha Levin wrote:

> This patch doesn't interact well with my fuzzing setup. I'm seeing
> the following:
>
> [  490.446927] BUG: using __this_cpu_read() in preemptible [00000000] code: kworker/16:1/7368
> [  490.447909] caller is __this_cpu_preempt_check+0x13/0x20

__this_cpu_read() from vmstat_update is only called from a kworker that
is bound to a single cpu. A false positive?
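
For reference, a minimal sketch of the rule the debug check enforces,
using a hypothetical per-cpu variable (this is not the vmstat code):

#include <linux/percpu.h>
#include <linux/preempt.h>

static DEFINE_PER_CPU(int, demo_counter);	/* hypothetical */

static void demo(void)
{
	int v;

	/* Fires the CONFIG_DEBUG_PREEMPT warning when preemptible: */
	v = __this_cpu_read(demo_counter);

	/* Safe from preemptible context; handles protection itself: */
	v = this_cpu_read(demo_counter);

	/* With preemption off, the __ variant is legitimate: */
	preempt_disable();
	v = __this_cpu_read(demo_counter);
	preempt_enable();
	(void)v;
}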

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: vmstat: On demand vmstat workers V8
  2014-07-28 18:55     ` Christoph Lameter
@ 2014-07-28 21:54       ` Andrew Morton
  -1 siblings, 0 replies; 73+ messages in thread
From: Andrew Morton @ 2014-07-28 21:54 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Sasha Levin, Gilad Ben-Yossef, Thomas Gleixner, Tejun Heo,
	John Stultz, Mike Frysinger, Minchan Kim, Hakan Akkan,
	Max Krasnyansky, Frederic Weisbecker, Paul E. McKenney,
	linux-kernel, linux-mm, hughd, viresh.kumar, hpa, mingo, peterz

On Mon, 28 Jul 2014 13:55:17 -0500 (CDT) Christoph Lameter <cl@gentwo.org> wrote:

> On Fri, 25 Jul 2014, Sasha Levin wrote:
> 
> > This patch doesn't interact well with my fuzzing setup. I'm seeing
> > the following:
> >
> > [  490.446927] BUG: using __this_cpu_read() in preemptible [00000000] code: kworker/16:1/7368
> > [  490.447909] caller is __this_cpu_preempt_check+0x13/0x20
> 
> __this_cpu_read() from vmstat_update is only called from a kworker that
> is bound to a single cpu. A false positive?

schedule_delayed_work() uses system_wq.  The comment in workqueue.h says

 * system_wq is the one used by schedule[_delayed]_work[_on]().
 * Multi-CPU multi-threaded.  There are users which expect relatively
 * short queue flush time.  Don't queue works which can run for too
 * long.

but the code itself does

	system_wq = alloc_workqueue("events", 0, 0);

ie: it didn't pass WQ_UNBOUND in the flags.


Tejun, wazzup?



Also, Sasha's report showed this:

[  490.464613] kernel BUG at mm/vmstat.c:1278!

That's your VM_BUG_ON() in vmstat_update().  That ain't no false
positive!



Is this code expecting that schedule_delayed_work() will schedule the
work on the current CPU?  I don't think it will do that.  Maybe you
should be looking at schedule_delayed_work_on().
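
The distinction, sketched with a hypothetical per-cpu delayed work
(not the actual vmstat code):

#include <linux/percpu.h>
#include <linux/workqueue.h>

static DEFINE_PER_CPU(struct delayed_work, demo_work);	/* hypothetical */

static void demo_requeue(int cpu)
{
	/* No CPU binding promised; runs wherever the pool decides: */
	schedule_delayed_work(&per_cpu(demo_work, cpu), HZ);

	/* Explicitly queued for @cpu's per-cpu worker pool: */
	schedule_delayed_work_on(cpu, &per_cpu(demo_work, cpu), HZ);
}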


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: vmstat: On demand vmstat workers V8
  2014-07-28 21:54       ` Andrew Morton
@ 2014-07-28 22:00         ` Sasha Levin
  -1 siblings, 0 replies; 73+ messages in thread
From: Sasha Levin @ 2014-07-28 22:00 UTC (permalink / raw)
  To: Andrew Morton, Christoph Lameter
  Cc: Gilad Ben-Yossef, Thomas Gleixner, Tejun Heo, John Stultz,
	Mike Frysinger, Minchan Kim, Hakan Akkan, Max Krasnyansky,
	Frederic Weisbecker, Paul E. McKenney, linux-kernel, linux-mm,
	hughd, viresh.kumar, hpa, mingo, peterz

On 07/28/2014 05:54 PM, Andrew Morton wrote:
> Also, Sasha's report showed this:
> 
> [  490.464613] kernel BUG at mm/vmstat.c:1278!
> 
> That's your VM_BUG_ON() in vmstat_update().  That ain't no false
> positive!
> 
> 
> 
> Is this code expecting that schedule_delayed_work() will schedule the
> work on the current CPU?  I don't think it will do that.  Maybe you
> should be looking at schedule_delayed_work_on().

I suspected that the re-queue might be wrong (schedule_delayed_work vs
schedule_delayed_work_on) and tried fixing that, but that didn't solve
the issue.

I could give it another go unless someone sees the issue; I might have
messed something up.


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: vmstat: On demand vmstat workers V8
  2014-07-28 18:55     ` Christoph Lameter
  (?)
  (?)
@ 2014-07-29  7:56     ` Peter Zijlstra
  2014-07-29 12:05         ` Tejun Heo
  -1 siblings, 1 reply; 73+ messages in thread
From: Peter Zijlstra @ 2014-07-29  7:56 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Sasha Levin, akpm, Gilad Ben-Yossef, Thomas Gleixner, Tejun Heo,
	John Stultz, Mike Frysinger, Minchan Kim, Hakan Akkan,
	Max Krasnyansky, Frederic Weisbecker, Paul E. McKenney,
	linux-kernel, linux-mm, hughd, viresh.kumar, hpa, mingo

On Mon, Jul 28, 2014 at 01:55:17PM -0500, Christoph Lameter wrote:
> On Fri, 25 Jul 2014, Sasha Levin wrote:
> 
> > This patch doesn't interact well with my fuzzing setup. I'm seeing
> > the following:
> >
> > [  490.446927] BUG: using __this_cpu_read() in preemptible [00000000] code: kworker/16:1/7368
> > [  490.447909] caller is __this_cpu_preempt_check+0x13/0x20
> 
> __this_cpu_read() from vmstat_update is only called from a kworker that
> is bound to a single cpu. A false positive?

kworkers are never guaranteed to be so, it's a 'feature' :/


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: vmstat: On demand vmstat workers V8
  2014-07-29  7:56     ` Peter Zijlstra
@ 2014-07-29 12:05         ` Tejun Heo
  0 siblings, 0 replies; 73+ messages in thread
From: Tejun Heo @ 2014-07-29 12:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christoph Lameter, Sasha Levin, akpm, Gilad Ben-Yossef,
	Thomas Gleixner, John Stultz, Mike Frysinger, Minchan Kim,
	Hakan Akkan, Max Krasnyansky, Frederic Weisbecker,
	Paul E. McKenney, linux-kernel, linux-mm, hughd, viresh.kumar,
	hpa, mingo

On Tue, Jul 29, 2014 at 09:56:37AM +0200, Peter Zijlstra wrote:
> On Mon, Jul 28, 2014 at 01:55:17PM -0500, Christoph Lameter wrote:
> > On Fri, 25 Jul 2014, Sasha Levin wrote:
> > 
> > > This patch doesn't interact well with my fuzzing setup. I'm seeing
> > > the following:
> > >
> > > [  490.446927] BUG: using __this_cpu_read() in preemptible [00000000] code: kworker/16:1/7368
> > > [  490.447909] caller is __this_cpu_preempt_check+0x13/0x20
> > 
> > __this_cpu_read() from vmstat_update is only called from a kworker that
> > is bound to a single cpu. A false positive?
> 
> kworkers are never guaranteed to be so, it's a 'feature' :/

It's because we don't distinguish work items which are per-cpu for
optimization and per-cpu for correctness and can't automatically flush
/ cancel / block per-cpu work items when a cpu goes down.  I like the
idea of distinguishing them but it's gonna take a lot of auditing.

Any work item usage which requires per-cpu for correctness should
implement cpu down hook to flush in-flight work items and block
further issuance.  This hasn't changed from the beginning and was
necessary even before cmwq.
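
A sketch of such a hook with the 3.16-era notifier API, for a
hypothetical per-cpu work item (mm/vmstat.c has its own version):

#include <linux/cpu.h>
#include <linux/notifier.h>
#include <linux/percpu.h>
#include <linux/workqueue.h>

static DEFINE_PER_CPU(struct delayed_work, demo_work);	/* hypothetical */

static int demo_cpu_callback(struct notifier_block *nb,
			     unsigned long action, void *hcpu)
{
	long cpu = (long)hcpu;

	switch (action) {
	case CPU_DOWN_PREPARE:
	case CPU_DOWN_PREPARE_FROZEN:
		/* Flush any in-flight instance before the CPU goes away;
		 * the caller must also stop requeueing it. */
		cancel_delayed_work_sync(&per_cpu(demo_work, cpu));
		break;
	case CPU_DOWN_FAILED:
	case CPU_DOWN_FAILED_FROZEN:
		/* The CPU stays up after all; resume the per-cpu work. */
		schedule_delayed_work_on(cpu, &per_cpu(demo_work, cpu), HZ);
		break;
	}
	return NOTIFY_OK;
}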

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: vmstat: On demand vmstat workers V8
  2014-07-29 12:05         ` Tejun Heo
@ 2014-07-29 12:23           ` Peter Zijlstra
  -1 siblings, 0 replies; 73+ messages in thread
From: Peter Zijlstra @ 2014-07-29 12:23 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Christoph Lameter, Sasha Levin, akpm, Gilad Ben-Yossef,
	Thomas Gleixner, John Stultz, Mike Frysinger, Minchan Kim,
	Hakan Akkan, Max Krasnyansky, Frederic Weisbecker,
	Paul E. McKenney, linux-kernel, linux-mm, hughd, viresh.kumar,
	hpa, mingo

On Tue, Jul 29, 2014 at 08:05:25AM -0400, Tejun Heo wrote:
> On Tue, Jul 29, 2014 at 09:56:37AM +0200, Peter Zijlstra wrote:
> > On Mon, Jul 28, 2014 at 01:55:17PM -0500, Christoph Lameter wrote:
> > > On Fri, 25 Jul 2014, Sasha Levin wrote:
> > > 
> > > > This patch doesn't interact well with my fuzzing setup. I'm seeing
> > > > the following:
> > > >
> > > > [  490.446927] BUG: using __this_cpu_read() in preemptible [00000000] code: kworker/16:1/7368
> > > > [  490.447909] caller is __this_cpu_preempt_check+0x13/0x20
> > > 
> > > __this_cpu_read() from vmstat_update is only called from a kworker that
> > > is bound to a single cpu. A false positive?
> > 
> > kworkers are never guaranteed to be so, it's a 'feature' :/
> 
> It's because we don't distinguish work items which are per-cpu for
> optimization and per-cpu for correctness and can't automatically flush
> / cancel / block per-cpu work items when a cpu goes down.  I like the
> > idea of distinguishing them but it's gonna take a lot of auditing.

Just force flush on unplug and fix those that complain. No auditing
needed for that.

> Any work item usage which requires per-cpu for correctness should
> implement cpu down hook to flush in-flight work items and block
> further issuance.  This hasn't changed from the beginning and was
> necessary even before cmwq.

I think before cmwq we'd run into the broken affinity warning in the
scheduler.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: vmstat: On demand vmstat workers V8
  2014-07-29 12:23           ` Peter Zijlstra
@ 2014-07-29 13:12             ` Tejun Heo
  -1 siblings, 0 replies; 73+ messages in thread
From: Tejun Heo @ 2014-07-29 13:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christoph Lameter, Sasha Levin, akpm, Gilad Ben-Yossef,
	Thomas Gleixner, John Stultz, Mike Frysinger, Minchan Kim,
	Hakan Akkan, Max Krasnyansky, Frederic Weisbecker,
	Paul E. McKenney, linux-kernel, linux-mm, hughd, viresh.kumar,
	hpa, mingo, Lai Jiangshan

(cc'ing Lai)

Hello,

On Tue, Jul 29, 2014 at 02:23:03PM +0200, Peter Zijlstra wrote:
> > It's because we don't distinguish work items which are per-cpu for
> > optimization and per-cpu for correctness and can't automatically flush
> > / cancel / block per-cpu work items when a cpu goes down.  I like the
> > idea of distinguishing them but it's gonna take a lot of auditing.
> 
> Just force flush on unplug and fix those that complain. No auditing
> needed for that.

I'm not sure that's a viable way forward.  It's not like we can
readily trigger the problematic cases which can lead to long pauses
during cpu down.  Besides, we need the distinction at the API level,
which is the whole point of this.  The best way probably is converting
all the correctness ones (these are the minorities) over to
queue_work_on() so that the per-cpu requirement is explicit.
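
Sketched with hypothetical names, the correctness cases would then
state the CPU explicitly:

#include <linux/percpu.h>
#include <linux/workqueue.h>

static DEFINE_PER_CPU(struct work_struct, demo_work);	/* hypothetical */

static void demo_kick(int cpu)
{
	/* Per-cpu for correctness: must run on @cpu, not merely near it. */
	queue_work_on(cpu, system_wq, &per_cpu(demo_work, cpu));
}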

> > Any work item usage which requires per-cpu for correctness should
> > implement cpu down hook to flush in-flight work items and block
> > further issuance.  This hasn't changed from the beginning and was
> > necessary even before cmwq.
> 
> I think before cmwq we'd run into the broken affinity warning in the
> scheduler.

That and work items silently not executed if queued on a downed cpu.
IIRC, we also had quite a few broken ones which were per-cpu but w/o
cpu down handling which just happened to work most of the time because
queueing itself was per-cpu in most cases and we didn't do cpu
on/offlining as often back then.  During cmwq conversion, I just
allowed them as I didn't want to add cpu down hooks for all of the
many per-cpu workqueue usages.  The lack of the distinction between
the two sets has always been there.

I agree this can be improved, but at least for now, please add cpu
down hooks.  We need them right now and they'll be helpful when later
separating out the correctness ones.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: vmstat: On demand vmstat workers V8
  2014-07-29 13:12             ` Tejun Heo
@ 2014-07-29 15:10               ` Christoph Lameter
  -1 siblings, 0 replies; 73+ messages in thread
From: Christoph Lameter @ 2014-07-29 15:10 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Peter Zijlstra, Sasha Levin, akpm, Gilad Ben-Yossef,
	Thomas Gleixner, John Stultz, Mike Frysinger, Minchan Kim,
	Hakan Akkan, Max Krasnyansky, Frederic Weisbecker,
	Paul E. McKenney, linux-kernel, linux-mm, hughd, viresh.kumar,
	hpa, mingo, Lai Jiangshan

On Tue, 29 Jul 2014, Tejun Heo wrote:

> I agree this can be improved, but at least for now, please add cpu
> down hooks.  We need them right now and they'll be helpful when later
> separating out the correctness ones.

mm/vmstat.c already has cpu down hooks. See vmstat_cpuup_callback().


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: vmstat: On demand vmstat workers V8
  2014-07-29 15:10               ` Christoph Lameter
@ 2014-07-29 15:14                 ` Tejun Heo
  -1 siblings, 0 replies; 73+ messages in thread
From: Tejun Heo @ 2014-07-29 15:14 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Peter Zijlstra, Sasha Levin, akpm, Gilad Ben-Yossef,
	Thomas Gleixner, John Stultz, Mike Frysinger, Minchan Kim,
	Hakan Akkan, Max Krasnyansky, Frederic Weisbecker,
	Paul E. McKenney, linux-kernel, linux-mm, hughd, viresh.kumar,
	hpa, mingo, Lai Jiangshan

On Tue, Jul 29, 2014 at 10:10:11AM -0500, Christoph Lameter wrote:
> On Tue, 29 Jul 2014, Tejun Heo wrote:
> 
> > I agree this can be improved, but at least for now, please add cpu
> > down hooks.  We need them right now and they'll be helpful when later
> > separating out the correctness ones.
> 
> mm/vmstat.c already has cpu down hooks. See vmstat_cpuup_callback().

Hmmm, well, then it's something else.  Either a bug in workqueue or in
the caller.  Given the track record, the latter is more likely.
e.g. it looks kinda suspicious that the work func is cleared after
cancel_delayed_work_sync() is called.  What happens if somebody tries
to schedule it inbetween?
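
The window in question, as a sketch (hypothetical interleaving):

/*
 *  hotplug path                        requeue path
 *  ------------                        ------------
 *  cancel_delayed_work_sync(dw);
 *                                      dw->work.func still set, so a
 *                                      schedule_delayed_work_on() here
 *                                      puts dw back on the queue
 *  dw->work.func = NULL;               cleared too late
 */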

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: vmstat: On demand vmstat workers V8
  2014-07-28 21:54       ` Andrew Morton
@ 2014-07-29 15:17         ` Christoph Lameter
  -1 siblings, 0 replies; 73+ messages in thread
From: Christoph Lameter @ 2014-07-29 15:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Sasha Levin, Gilad Ben-Yossef, Thomas Gleixner, Tejun Heo,
	John Stultz, Mike Frysinger, Minchan Kim, Hakan Akkan,
	Max Krasnyansky, Frederic Weisbecker, Paul E. McKenney,
	linux-kernel, linux-mm, hughd, viresh.kumar, hpa, mingo, peterz

On Mon, 28 Jul 2014, Andrew Morton wrote:

> Also, Sasha's report showed this:
>
> [  490.464613] kernel BUG at mm/vmstat.c:1278!
>
> That's your VM_BUG_ON() in vmstat_update().  That ain't no false
> positive!

AFAICT this is because of the earlier BUG().



^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: vmstat: On demand vmstat workers V8
  2014-07-29 13:12             ` Tejun Heo
@ 2014-07-29 15:22               ` Christoph Lameter
  -1 siblings, 0 replies; 73+ messages in thread
From: Christoph Lameter @ 2014-07-29 15:22 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Peter Zijlstra, Sasha Levin, akpm, Gilad Ben-Yossef,
	Thomas Gleixner, John Stultz, Mike Frysinger, Minchan Kim,
	Hakan Akkan, Max Krasnyansky, Frederic Weisbecker,
	Paul E. McKenney, linux-kernel, linux-mm, hughd, viresh.kumar,
	hpa, mingo, Lai Jiangshan

On Tue, 29 Jul 2014, Tejun Heo wrote:

> I'm not sure that's a viable way forward.  It's not like we can
> readily trigger the problematic cases which can lead to long pauses
> during cpu down.  Besides, we need the distinction at the API level,
> which is the whole point of this.  The best way probably is converting
> all the correctness ones (these are the minorities) over to
> queue_work_on() so that the per-cpu requirement is explicit.

Ok so we would need this fix to avoid the message:


Subject: vmstat: use schedule_delayed_work_on to avoid false positives

It seems that schedule_delayed_work will check for preemption even
though none can occur. schedule_delayed_work_on will not do that. So
use that function to suppress false positives.

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/mm/vmstat.c
===================================================================
--- linux.orig/mm/vmstat.c	2014-07-29 10:14:42.356988271 -0500
+++ linux/mm/vmstat.c	2014-07-29 10:18:28.205920997 -0500
@@ -1255,7 +1255,8 @@ static void vmstat_update(struct work_st
 		 * to occur in the future. Keep on running the
 		 * update worker thread.
 		 */
-		schedule_delayed_work(this_cpu_ptr(&vmstat_work),
+		schedule_delayed_work_on(smp_processor_id(),
+			this_cpu_ptr(&vmstat_work),
 			round_jiffies_relative(sysctl_stat_interval));
 	else {
 		/*

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: vmstat: On demand vmstat workers V8
  2014-07-29 15:14                 ` Tejun Heo
@ 2014-07-29 15:26                   ` Christoph Lameter
  -1 siblings, 0 replies; 73+ messages in thread
From: Christoph Lameter @ 2014-07-29 15:26 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Peter Zijlstra, Sasha Levin, akpm, Gilad Ben-Yossef,
	Thomas Gleixner, John Stultz, Mike Frysinger, Minchan Kim,
	Hakan Akkan, Max Krasnyansky, Frederic Weisbecker,
	Paul E. McKenney, linux-kernel, linux-mm, hughd, viresh.kumar,
	hpa, mingo, Lai Jiangshan

On Tue, 29 Jul 2014, Tejun Heo wrote:

> > mm/vmstat.c already has cpu down hooks. See vmstat_cpuup_callback().
>
> Hmmm, well, then it's something else.  Either a bug in workqueue or in
> the caller.  Given the track record, the latter is more likely.
> e.g. it looks kinda suspicious that the work func is cleared after
> cancel_delayed_work_sync() is called.  What happens if somebody tries

Ok we can clear it before then.

Just looked at the current upstream code. It also does a __this_cpu_read()
in refresh_cpu_vm_stats() without triggering the preemption check. What
changed in -next that made the test trigger now?


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: vmstat: On demand vmstat workers V8
  2014-07-29 15:14                 ` Tejun Heo
@ 2014-07-29 15:39                   ` Christoph Lameter
  -1 siblings, 0 replies; 73+ messages in thread
From: Christoph Lameter @ 2014-07-29 15:39 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Peter Zijlstra, Sasha Levin, akpm, Gilad Ben-Yossef,
	Thomas Gleixner, John Stultz, Mike Frysinger, Minchan Kim,
	Hakan Akkan, Max Krasnyansky, Frederic Weisbecker,
	Paul E. McKenney, linux-kernel, linux-mm, hughd, viresh.kumar,
	hpa, mingo, Lai Jiangshan

On Tue, 29 Jul 2014, Tejun Heo wrote:

> Hmmm, well, then it's something else.  Either a bug in workqueue or in
> the caller.  Given the track record, the latter is more likely.
> e.g. it looks kinda suspicious that the work func is cleared after
> cancel_delayed_work_sync() is called.  What happens if somebody tries
> to schedule it inbetween?

Here is yet another patch to also address this idea:

Subject: vmstat: Clear the work.func before cancelling delayed work

Looks strange to me but Tejun thinks this could do some good.
If this really is the right thing to do then cancel_delayed_work should
zap the work func itself, I think.

Signed-off-by: Christoph Lameter <cl@linux.com>


Index: linux/mm/vmstat.c
===================================================================
--- linux.orig/mm/vmstat.c	2014-07-29 10:22:45.073884943 -0500
+++ linux/mm/vmstat.c	2014-07-29 10:34:45.083369228 -0500
@@ -1277,8 +1277,8 @@ static int vmstat_cpuup_callback(struct
 		break;
 	case CPU_DOWN_PREPARE:
 	case CPU_DOWN_PREPARE_FROZEN:
-		cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
 		per_cpu(vmstat_work, cpu).work.func = NULL;
+		cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
 		break;
 	case CPU_DOWN_FAILED:
 	case CPU_DOWN_FAILED_FROZEN:

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: vmstat: On demand vmstat workers V8
  2014-07-29 15:22               ` Christoph Lameter
@ 2014-07-29 15:43                 ` Sasha Levin
  -1 siblings, 0 replies; 73+ messages in thread
From: Sasha Levin @ 2014-07-29 15:43 UTC (permalink / raw)
  To: Christoph Lameter, Tejun Heo
  Cc: Peter Zijlstra, akpm, Gilad Ben-Yossef, Thomas Gleixner,
	John Stultz, Mike Frysinger, Minchan Kim, Hakan Akkan,
	Max Krasnyansky, Frederic Weisbecker, Paul E. McKenney,
	linux-kernel, linux-mm, hughd, viresh.kumar, hpa, mingo,
	Lai Jiangshan

On 07/29/2014 11:22 AM, Christoph Lameter wrote:
> On Tue, 29 Jul 2014, Tejun Heo wrote:
> 
>> I'm not sure that's a viable way forward.  It's not like we can
>> readily trigger the problematic cases which can lead to long pauses
>> during cpu down.  Besides, we need the distinction at the API level,
>> which is the whole point of this.  The best way probably is converting
>> all the correctness ones (these are the minorities) over to
>> queue_work_on() so that the per-cpu requirement is explicit.
> 
> Ok so we would need this fix to avoid the message:
> 
> 
> Subject: vmstat: use schedule_delayed_work_on to avoid false positives
> 
> It seems that schedule_delayed_work will check for preemption even
> though none can occur. schedule_delayed_work_on will not do that. So
> use that function to suppress false positives.
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>
> 
> Index: linux/mm/vmstat.c
> ===================================================================
> --- linux.orig/mm/vmstat.c	2014-07-29 10:14:42.356988271 -0500
> +++ linux/mm/vmstat.c	2014-07-29 10:18:28.205920997 -0500
> @@ -1255,7 +1255,8 @@ static void vmstat_update(struct work_st
>  		 * to occur in the future. Keep on running the
>  		 * update worker thread.
>  		 */
> -		schedule_delayed_work(this_cpu_ptr(&vmstat_work),
> +		schedule_delayed_work_on(smp_processor_id(),
> +			this_cpu_ptr(&vmstat_work),
>  			round_jiffies_relative(sysctl_stat_interval));
>  	else {
>  		/*
> 

I've tested, and this patch doesn't fix either of the bugs reported.


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: vmstat: On demand vmstat workers V8
  2014-07-29 15:39                   ` Christoph Lameter
@ 2014-07-29 15:47                     ` Sasha Levin
  -1 siblings, 0 replies; 73+ messages in thread
From: Sasha Levin @ 2014-07-29 15:47 UTC (permalink / raw)
  To: Christoph Lameter, Tejun Heo
  Cc: Peter Zijlstra, akpm, Gilad Ben-Yossef, Thomas Gleixner,
	John Stultz, Mike Frysinger, Minchan Kim, Hakan Akkan,
	Max Krasnyansky, Frederic Weisbecker, Paul E. McKenney,
	linux-kernel, linux-mm, hughd, viresh.kumar, hpa, mingo,
	Lai Jiangshan

On 07/29/2014 11:39 AM, Christoph Lameter wrote:
> On Tue, 29 Jul 2014, Tejun Heo wrote:
> 
>> Hmmm, well, then it's something else.  Either a bug in workqueue or in
>> the caller.  Given the track record, the latter is more likely.
>> e.g. it looks kinda suspicious that the work func is cleared after
>> cancel_delayed_work_sync() is called.  What happens if somebody tries
>> to schedule it in between?
> 
> Here is yet another patch to also address this idea:
> 
> Subject: vmstat: Clear the work.func before cancelling delayed work
> 
> Looks strange to me but Tejun thinks this could do some good.
> If this really is the right thing to do then cancel_delayed_work should
> zap the work func itself I think.
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>
> 
> 
> Index: linux/mm/vmstat.c
> ===================================================================
> --- linux.orig/mm/vmstat.c	2014-07-29 10:22:45.073884943 -0500
> +++ linux/mm/vmstat.c	2014-07-29 10:34:45.083369228 -0500
> @@ -1277,8 +1277,8 @@ static int vmstat_cpuup_callback(struct
>  		break;
>  	case CPU_DOWN_PREPARE:
>  	case CPU_DOWN_PREPARE_FROZEN:
> -		cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
>  		per_cpu(vmstat_work, cpu).work.func = NULL;
> +		cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
>  		break;
>  	case CPU_DOWN_FAILED:
>  	case CPU_DOWN_FAILED_FROZEN:
> 

I'm slightly confused here. The on demand vmstat workers patch did this:

        case CPU_DOWN_PREPARE_FROZEN:
-               cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
-               per_cpu(vmstat_work, cpu).work.func = NULL;
+               if (!cpumask_test_and_set_cpu(cpu, cpu_stat_off))
+                       cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));

So your new patch doesn't apply on top of it, and doesn't make sense before it.


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: vmstat: On demand vmstat workers V8
  2014-07-29 15:47                     ` Sasha Levin
@ 2014-07-29 15:59                       ` Christoph Lameter
  -1 siblings, 0 replies; 73+ messages in thread
From: Christoph Lameter @ 2014-07-29 15:59 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Tejun Heo, Peter Zijlstra, akpm, Gilad Ben-Yossef,
	Thomas Gleixner, John Stultz, Mike Frysinger, Minchan Kim,
	Hakan Akkan, Max Krasnyansky, Frederic Weisbecker,
	Paul E. McKenney, linux-kernel, linux-mm, hughd, viresh.kumar,
	hpa, mingo, Lai Jiangshan

On Tue, 29 Jul 2014, Sasha Levin wrote:

> > Index: linux/mm/vmstat.c
> > ===================================================================
> > --- linux.orig/mm/vmstat.c	2014-07-29 10:22:45.073884943 -0500
> > +++ linux/mm/vmstat.c	2014-07-29 10:34:45.083369228 -0500
> > @@ -1277,8 +1277,8 @@ static int vmstat_cpuup_callback(struct
> >  		break;
> >  	case CPU_DOWN_PREPARE:
> >  	case CPU_DOWN_PREPARE_FROZEN:
> > -		cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
> >  		per_cpu(vmstat_work, cpu).work.func = NULL;
> > +		cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
> >  		break;
> >  	case CPU_DOWN_FAILED:
> >  	case CPU_DOWN_FAILED_FROZEN:
> >
>
> I'm slightly confused here. The on demand vmstat workers patch did this:
>
>         case CPU_DOWN_PREPARE_FROZEN:
> -               cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
> -               per_cpu(vmstat_work, cpu).work.func = NULL;
> +               if (!cpumask_test_and_set_cpu(cpu, cpu_stat_off))
> +                       cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
>
> So your new patch doesn't apply on top of it, and doesn't make sense before it.

Tejun was looking at upstream and so I fixed upstream ;-)

Is it really necessary to set the work.func to NULL? If so then the
work.func will have to be initialized when a processor is brought online.

Canceling the work should be enough to disable the execution of the
function.
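
If it were necessary, a minimal sketch of that re-initialization in the
CPU_ONLINE path could look like this (illustration only, mirroring the
INIT_DEFERRABLE_WORK call used elsewhere in the series; not a proposed
patch):

	case CPU_ONLINE:
	case CPU_ONLINE_FROZEN:
		/* hypothetical: restore the func cleared on DOWN_PREPARE */
		INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
			vmstat_update);
		break;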


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: vmstat: On demand vmstat workers V8
  2014-07-10 14:04 ` Christoph Lameter
@ 2014-07-30  2:57   ` Lai Jiangshan
  -1 siblings, 0 replies; 73+ messages in thread
From: Lai Jiangshan @ 2014-07-30  2:57 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: akpm, Gilad Ben-Yossef, Thomas Gleixner, Tejun Heo, John Stultz,
	Mike Frysinger, Minchan Kim, Hakan Akkan, Max Krasnyansky,
	Frederic Weisbecker, Paul E. McKenney, linux-kernel, linux-mm,
	hughd, viresh.kumar, hpa, mingo, peterz

Please check whether I understand the semantics of cpu_stat_off correctly.

cpu_stat_off = the set of CPUs for which: the cpu is online && vmstat_work is off
I believe some of the code forgets to guarantee that each cpu in cpu_stat_off is online.
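
In code terms, the intended invariant reads roughly as follows (a sketch of
the semantics stated above; delayed_work_pending() stands in here for
"vmstat_work is off"):

	/* for every cpu in cpu_stat_off: */
	VM_BUG_ON(!cpu_online(cpu));
	VM_BUG_ON(delayed_work_pending(&per_cpu(vmstat_work, cpu)));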

Thanks,
Lai

On 07/10/2014 10:04 PM, Christoph Lameter wrote:

> +
> +/*
> + * Shepherd worker thread that checks the
> + * differentials of processors that have their worker
> + * threads for vm statistics updates disabled because of
> + * inactivity.
> + */
> +static void vmstat_shepherd(struct work_struct *w);
> +
> +static DECLARE_DELAYED_WORK(shepherd, vmstat_shepherd);
> +
> +static void vmstat_shepherd(struct work_struct *w)
> +{
> +	int cpu;
> +
> +	/* Check processors whose vmstat worker threads have been disabled */

I think the bug is here: it re-queues per_cpu(vmstat_work, cpu) for a cpu that is
going offline (after vmstat_cpuup_callback(CPU_DOWN_PREPARE)).  And cpu_stat_off is
accessed without a proper lock.

I suggest using get_online_cpus() or a new cpu_stat_off_mutex to protect it.


	get_online_cpus(); /* mutex_lock(&cpu_stat_off_mutex); */
	for_each_cpu(cpu, cpu_stat_off)
		if (need_update(cpu) &&
			cpumask_test_and_clear_cpu(cpu, cpu_stat_off))

			schedule_delayed_work_on(cpu, &per_cpu(vmstat_work, cpu),
				__round_jiffies_relative(sysctl_stat_interval, cpu));
	put_online_cpus(); /* mutex_unlock(&cpu_stat_off_mutex); */



> +
> +
> +	schedule_delayed_work(&shepherd,
>  		round_jiffies_relative(sysctl_stat_interval));
> +
>  }
> 
> -static void start_cpu_timer(int cpu)
> +static void __init start_shepherd_timer(void)
>  {
> -	struct delayed_work *work = &per_cpu(vmstat_work, cpu);
> +	int cpu;
> +
> +	for_each_possible_cpu(cpu)
> +		INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
> +			vmstat_update);
> +
> +	cpu_stat_off = kmalloc(cpumask_size(), GFP_KERNEL);
> +	cpumask_copy(cpu_stat_off, cpu_online_mask);
> 
> -	INIT_DEFERRABLE_WORK(work, vmstat_update);
> -	schedule_delayed_work_on(cpu, work, __round_jiffies_relative(HZ, cpu));
> +	schedule_delayed_work(&shepherd,
> +		round_jiffies_relative(sysctl_stat_interval));
>  }
> 
>  static void vmstat_cpu_dead(int node)
> @@ -1272,17 +1373,17 @@ static int vmstat_cpuup_callback(struct
>  	case CPU_ONLINE:
>  	case CPU_ONLINE_FROZEN:
>  		refresh_zone_stat_thresholds();
> -		start_cpu_timer(cpu);
>  		node_set_state(cpu_to_node(cpu), N_CPU);
> +		cpumask_set_cpu(cpu, cpu_stat_off);
>  		break;
>  	case CPU_DOWN_PREPARE:
>  	case CPU_DOWN_PREPARE_FROZEN:
> -		cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
> -		per_cpu(vmstat_work, cpu).work.func = NULL;
> +		if (!cpumask_test_and_set_cpu(cpu, cpu_stat_off))
> +			cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));

I suggest that cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu)) should
be called unconditionally.  And the cpu should be cleared from cpu_stat_off.
(You set it, which is a BUG according to vmstat_shepherd() and the semantics of
cpu_stat_off.)

	/* mutex_lock(&cpu_stat_off_mutex); */
		/* if you use cpu_stat_off_mutex instead of get_online_cpus() */
	cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
	cpumask_clear_cpu(cpu, cpu_stat_off);
	/* mutex_unlock(&cpu_stat_off_mutex); */

	/* don't forget to use cpu_stat_off_mutex in the other places that
	   access cpu_stat_off, except the one in vmstat_update() which
	   is protected by cancel_delayed_work_sync() + other things;
	   please also update those comments and keep that VM_BUG_ON() */


>  		break;
>  	case CPU_DOWN_FAILED:
>  	case CPU_DOWN_FAILED_FROZEN:
> -		start_cpu_timer(cpu);
> +		cpumask_set_cpu(cpu, cpu_stat_off);
>  		break;
>  	case CPU_DEAD:
>  	case CPU_DEAD_FROZEN:
> @@ -1302,15 +1403,10 @@ static struct notifier_block vmstat_noti
>  static int __init setup_vmstat(void)
>  {
>  #ifdef CONFIG_SMP
> -	int cpu;
> -
>  	cpu_notifier_register_begin();
>  	__register_cpu_notifier(&vmstat_notifier);
> 
> -	for_each_online_cpu(cpu) {
> -		start_cpu_timer(cpu);
> -		node_set_state(cpu_to_node(cpu), N_CPU);
> -	}
> +	start_shepherd_timer();
>  	cpu_notifier_register_done();
>  #endif
>  #ifdef CONFIG_PROC_FS


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: vmstat: On demand vmstat workers V8
  2014-07-11 15:17         ` Christoph Lameter
@ 2014-07-30  3:04           ` Lai Jiangshan
  -1 siblings, 0 replies; 73+ messages in thread
From: Lai Jiangshan @ 2014-07-30  3:04 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Frederic Weisbecker, akpm, Gilad Ben-Yossef, Thomas Gleixner,
	Tejun Heo, John Stultz, Mike Frysinger, Minchan Kim, Hakan Akkan,
	Max Krasnyansky, Paul E. McKenney, linux-kernel, linux-mm, hughd,
	viresh.kumar, hpa, mingo, peterz

On 07/11/2014 11:17 PM, Christoph Lameter wrote:
> On Fri, 11 Jul 2014, Frederic Weisbecker wrote:
> 
>>> Converted what? We still need to keep a cpumask around that tells us which
>>> processor have vmstat running and which do not.
>>>
>>
>> Converted to cpumask_var_t.
>>
>> I mean we spent dozens emails on that...
> 
> 
> Oh there is this outstanding fix, right.
> 
> 
> Subject: on demand vmstat: Do not open code alloc_cpumask_var
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>
> 
> Index: linux/mm/vmstat.c
> ===================================================================
> --- linux.orig/mm/vmstat.c	2014-07-11 10:15:55.356856916 -0500
> +++ linux/mm/vmstat.c	2014-07-11 10:15:55.352856994 -0500
> @@ -1244,7 +1244,7 @@
>  #ifdef CONFIG_SMP
>  static DEFINE_PER_CPU(struct delayed_work, vmstat_work);
>  int sysctl_stat_interval __read_mostly = HZ;
> -struct cpumask *cpu_stat_off;
> +cpumask_var_t cpu_stat_off;
> 
>  static void vmstat_update(struct work_struct *w)
>  {
> @@ -1338,7 +1338,8 @@
>  		INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
>  			vmstat_update);
> 
> -	cpu_stat_off = kmalloc(cpumask_size(), GFP_KERNEL);
> +	if (!alloc_cpumask_var(&cpu_stat_off, GFP_KERNEL))
> +		BUG();

	BUG_ON(!alloc_cpumask_var(&cpu_stat_off, GFP_KERNEL));

>  	cpumask_copy(cpu_stat_off, cpu_online_mask);
> 
>  	schedule_delayed_work(&shepherd,


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: vmstat: On demand vmstat workers V8
  2014-07-29 15:39                   ` Christoph Lameter
@ 2014-07-30  3:11                     ` Lai Jiangshan
  -1 siblings, 0 replies; 73+ messages in thread
From: Lai Jiangshan @ 2014-07-30  3:11 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Tejun Heo, Peter Zijlstra, Sasha Levin, akpm, Gilad Ben-Yossef,
	Thomas Gleixner, John Stultz, Mike Frysinger, Minchan Kim,
	Hakan Akkan, Max Krasnyansky, Frederic Weisbecker,
	Paul E. McKenney, linux-kernel, linux-mm, hughd, viresh.kumar,
	hpa, mingo

On 07/29/2014 11:39 PM, Christoph Lameter wrote:
> On Tue, 29 Jul 2014, Tejun Heo wrote:
> 
>> Hmmm, well, then it's something else.  Either a bug in workqueue or in
>> the caller.  Given the track record, the latter is more likely.
>> e.g. it looks kinda suspicious that the work func is cleared after
>> cancel_delayed_work_sync() is called.  What happens if somebody tries
>> to schedule it in between?
> 
> Here is yet another patch to also address this idea:
> 
> Subject: vmstat: Clear the work.func before cancelling delayed work
> 
> Looks strange to me but Tejun thinks this could do some good.
> If this really is the right thing to do then cancel_delayed_work should
> zap the work func itself I think.
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>
> 
> 
> Index: linux/mm/vmstat.c
> ===================================================================
> --- linux.orig/mm/vmstat.c	2014-07-29 10:22:45.073884943 -0500
> +++ linux/mm/vmstat.c	2014-07-29 10:34:45.083369228 -0500
> @@ -1277,8 +1277,8 @@ static int vmstat_cpuup_callback(struct
>  		break;
>  	case CPU_DOWN_PREPARE:
>  	case CPU_DOWN_PREPARE_FROZEN:
> -		cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
>  		per_cpu(vmstat_work, cpu).work.func = NULL;
> +		cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));

I think we should just remove "per_cpu(vmstat_work, cpu).work.func = NULL;"

>  		break;
>  	case CPU_DOWN_FAILED:
>  	case CPU_DOWN_FAILED_FROZEN:
> .
> 


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: vmstat: On demand vmstat workers V8
  2014-07-30  3:11                     ` Lai Jiangshan
@ 2014-07-30 14:34                       ` Christoph Lameter
  -1 siblings, 0 replies; 73+ messages in thread
From: Christoph Lameter @ 2014-07-30 14:34 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Tejun Heo, Peter Zijlstra, Sasha Levin, akpm, Gilad Ben-Yossef,
	Thomas Gleixner, John Stultz, Mike Frysinger, Minchan Kim,
	Hakan Akkan, Max Krasnyansky, Frederic Weisbecker,
	Paul E. McKenney, linux-kernel, linux-mm, hughd, viresh.kumar,
	hpa, mingo

On Wed, 30 Jul 2014, Lai Jiangshan wrote:

> >
> >
> > Index: linux/mm/vmstat.c
> > ===================================================================
> > --- linux.orig/mm/vmstat.c	2014-07-29 10:22:45.073884943 -0500
> > +++ linux/mm/vmstat.c	2014-07-29 10:34:45.083369228 -0500
> > @@ -1277,8 +1277,8 @@ static int vmstat_cpuup_callback(struct
> >  		break;
> >  	case CPU_DOWN_PREPARE:
> >  	case CPU_DOWN_PREPARE_FROZEN:
> > -		cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
> >  		per_cpu(vmstat_work, cpu).work.func = NULL;
> > +		cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
>
> I think we should just remove "per_cpu(vmstat_work, cpu).work.func = NULL;"

It has been removed by the vmstat patch. The patch I posted is against
upstream, not against -next.


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: vmstat: On demand vmstat workers V8
  2014-07-30  2:57   ` Lai Jiangshan
@ 2014-07-30 14:45     ` Christoph Lameter
  -1 siblings, 0 replies; 73+ messages in thread
From: Christoph Lameter @ 2014-07-30 14:45 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: akpm, Gilad Ben-Yossef, Thomas Gleixner, Tejun Heo, John Stultz,
	Mike Frysinger, Minchan Kim, Hakan Akkan, Max Krasnyansky,
	Frederic Weisbecker, Paul E. McKenney, linux-kernel, linux-mm,
	hughd, viresh.kumar, hpa, mingo, peterz

On Wed, 30 Jul 2014, Lai Jiangshan wrote:

> I think the bug is here: it re-queues per_cpu(vmstat_work, cpu) for a cpu that is
> going offline (after vmstat_cpuup_callback(CPU_DOWN_PREPARE)).  And cpu_stat_off is
> accessed without a proper lock.

Ok. I guess we need to make the preemption check output more information
so that it tells us that an operation was performed on a processor that is
down?

> I suggest using get_online_cpus() or a new cpu_stat_off_mutex to protect it.

If a processor is downed then its cpu_stat_off bit should be cleared, but
the worker thread should also not run.

> >  	case CPU_DOWN_PREPARE:
> >  	case CPU_DOWN_PREPARE_FROZEN:
> > -		cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
> > -		per_cpu(vmstat_work, cpu).work.func = NULL;
> > +		if (!cpumask_test_and_set_cpu(cpu, cpu_stat_off))
> > +			cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
>
> I suggest that cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu)) should
> be called unconditionally.  And the cpu should be cleared from cpu_stat_off.
> (You set it, which is a BUG according to vmstat_shepherd() and the semantics of
> cpu_stat_off.)

True.

Subject: vmstat ondemand: Fix online/offline races

Do not allow onlining/offlining while the shepherd task is checking
for vmstat threads.

On offlining a processor, do the right thing: cancel the vmstat
worker thread if it exists and also exclude it from the shepherd
process checks.

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/mm/vmstat.c
===================================================================
--- linux.orig/mm/vmstat.c	2014-07-30 09:35:54.602662306 -0500
+++ linux/mm/vmstat.c	2014-07-30 09:43:07.109037043 -0500
@@ -1317,6 +1317,7 @@ static void vmstat_shepherd(struct work_
 {
 	int cpu;

+	get_online_cpus();
 	/* Check processors whose vmstat worker threads have been disabled */
 	for_each_cpu(cpu, cpu_stat_off)
 		if (need_update(cpu) &&
@@ -1325,6 +1326,7 @@ static void vmstat_shepherd(struct work_
 			schedule_delayed_work_on(cpu, &per_cpu(vmstat_work, cpu),
 				__round_jiffies_relative(sysctl_stat_interval, cpu));

+	put_online_cpus();

 	schedule_delayed_work(&shepherd,
 		round_jiffies_relative(sysctl_stat_interval));
@@ -1380,8 +1382,8 @@ static int vmstat_cpuup_callback(struct
 		break;
 	case CPU_DOWN_PREPARE:
 	case CPU_DOWN_PREPARE_FROZEN:
-		if (!cpumask_test_and_set_cpu(cpu, cpu_stat_off))
-			cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
+		cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
+		cpumask_clear_cpu(cpu, cpu_stat_off);
 		break;
 	case CPU_DOWN_FAILED:
 	case CPU_DOWN_FAILED_FROZEN:

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: vmstat: On demand vmstat workers V8
  2014-07-30 14:45     ` Christoph Lameter
@ 2014-07-31  0:52       ` Lai Jiangshan
  -1 siblings, 0 replies; 73+ messages in thread
From: Lai Jiangshan @ 2014-07-31  0:52 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: akpm, Gilad Ben-Yossef, Thomas Gleixner, Tejun Heo, John Stultz,
	Mike Frysinger, Minchan Kim, Hakan Akkan, Max Krasnyansky,
	Frederic Weisbecker, Paul E. McKenney, linux-kernel, linux-mm,
	hughd, viresh.kumar, hpa, mingo, peterz

On 07/30/2014 10:45 PM, Christoph Lameter wrote:
> On Wed, 30 Jul 2014, Lai Jiangshan wrote:
> 
>> I think the bug is here: it re-queues per_cpu(vmstat_work, cpu) for a cpu that is
>> going offline (after vmstat_cpuup_callback(CPU_DOWN_PREPARE)).  And cpu_stat_off is
>> accessed without a proper lock.
> 
> Ok. I guess we need to make the preemption check output more information
> so that it tells us that an operation was performed on a processor that is
> down?

If the cpus_allowed of the per-cpu kworker is changed, the kworker's specific
processor must have gone down, if the workqueue is implemented correctly.
(The preemption check checks cpus_allowed.)
> 
>> I suggest using get_online_cpus() or a new cpu_stat_off_mutex to protect it.
> 
> If a processor is downed then its cpu_stat_off bit should be cleared, but
> the worker thread should also not run.

The kworker needs to run for some reasons after the processor is down.
Peter and TJ were just discussing it.

The root cause (in my view) is that vmstat queues work to an offline (or
offlining) CPU, so the kworker has to run it.  We may add some check for
queuing on an offline CPU, but we can't check for higher-level user
guarantees.  (For example, vmstat must not queue work to a CPU which is
still online but already past vmstat_cpuup_callback(CPU_DOWN_PREPARE).)
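
A sketch of the kind of check mentioned above (this wrapper and its name
are hypothetical, not an existing workqueue API):

	/* warn when work is queued on a CPU that is already offline */
	static bool queue_work_on_checked(int cpu, struct workqueue_struct *wq,
					  struct work_struct *work)
	{
		WARN_ON_ONCE(!cpu_online(cpu));
		return queue_work_on(cpu, wq, work);
	}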

> 
>>>  	case CPU_DOWN_PREPARE:
>>>  	case CPU_DOWN_PREPARE_FROZEN:
>>> -		cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
>>> -		per_cpu(vmstat_work, cpu).work.func = NULL;
>>> +		if (!cpumask_test_and_set_cpu(cpu, cpu_stat_off))
>>> +			cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
>>
>> I suggest that cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu)) should
>> be called unconditionally.  And the cpu should be cleared from cpu_stat_off.
>> (You set it, which is a BUG according to vmstat_shepherd() and the semantics of
>> cpu_stat_off.)
> 
> True.
> 
> Subject: vmstat ondemand: Fix online/offline races
> 
> Do not allow onlining/offlining while the shepherd task is checking
> for vmstat threads.
> 
> On offlining a processor, do the right thing: cancel the vmstat
> worker thread if it exists and also exclude it from the shepherd
> process checks.
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>
> 
> Index: linux/mm/vmstat.c
> ===================================================================
> --- linux.orig/mm/vmstat.c	2014-07-30 09:35:54.602662306 -0500
> +++ linux/mm/vmstat.c	2014-07-30 09:43:07.109037043 -0500
> @@ -1317,6 +1317,7 @@ static void vmstat_shepherd(struct work_
>  {
>  	int cpu;
> 
> +	get_online_cpus();
>  	/* Check processors whose vmstat worker threads have been disabled */
>  	for_each_cpu(cpu, cpu_stat_off)
>  		if (need_update(cpu) &&
> @@ -1325,6 +1326,7 @@ static void vmstat_shepherd(struct work_
>  			schedule_delayed_work_on(cpu, &per_cpu(vmstat_work, cpu),
>  				__round_jiffies_relative(sysctl_stat_interval, cpu));
> 
> +	put_online_cpus();
> 
>  	schedule_delayed_work(&shepherd,
>  		round_jiffies_relative(sysctl_stat_interval));
> @@ -1380,8 +1382,8 @@ static int vmstat_cpuup_callback(struct
>  		break;
>  	case CPU_DOWN_PREPARE:
>  	case CPU_DOWN_PREPARE_FROZEN:
> -		if (!cpumask_test_and_set_cpu(cpu, cpu_stat_off))
> -			cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
> +		cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
> +		cpumask_clear_cpu(cpu, cpu_stat_off);

Sasha Levin's test result?

>  		break;
>  	case CPU_DOWN_FAILED:
>  	case CPU_DOWN_FAILED_FROZEN:
> .
> 


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: vmstat: On demand vmstat workers V8
  2014-07-26  2:22   ` Sasha Levin
@ 2014-08-04 21:37     ` Sasha Levin
  -1 siblings, 0 replies; 73+ messages in thread
From: Sasha Levin @ 2014-08-04 21:37 UTC (permalink / raw)
  To: Christoph Lameter, akpm
  Cc: Gilad Ben-Yossef, Thomas Gleixner, Tejun Heo, John Stultz,
	Mike Frysinger, Minchan Kim, Hakan Akkan, Max Krasnyansky,
	Frederic Weisbecker, Paul E. McKenney, linux-kernel, linux-mm,
	hughd, viresh.kumar, hpa, mingo, peterz

On 07/25/2014 10:22 PM, Sasha Levin wrote:
> On 07/10/2014 10:04 AM, Christoph Lameter wrote:
>> > This patch creates a vmstat shepherd worker that monitors the
>> > per cpu differentials on all processors. If there are differentials
>> > on a processor then a vmstat worker local to the processors
>> > with the differentials is created. That worker will then start
>> > folding the diffs in regular intervals. Should the worker
>> > find that there is no work to be done then it will make the shepherd
>> > worker monitor the differentials again.
> Hi Christoph, all,
> 
> This patch doesn't interact well with my fuzzing setup. I'm seeing
> the following:

I think we got sidetracked here a bit. I've noticed that this issue
is still happening in -next and the discussion here died out.


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: vmstat: On demand vmstat workers V8
  2014-08-04 21:37     ` Sasha Levin
@ 2014-08-05 14:51       ` Christoph Lameter
  -1 siblings, 0 replies; 73+ messages in thread
From: Christoph Lameter @ 2014-08-05 14:51 UTC (permalink / raw)
  To: Sasha Levin
  Cc: akpm, Gilad Ben-Yossef, Thomas Gleixner, Tejun Heo, John Stultz,
	Mike Frysinger, Minchan Kim, Hakan Akkan, Max Krasnyansky,
	Frederic Weisbecker, Paul E. McKenney, linux-kernel, linux-mm,
	hughd, viresh.kumar, hpa, mingo, peterz

On Mon, 4 Aug 2014, Sasha Levin wrote:

> On 07/25/2014 10:22 PM, Sasha Levin wrote:
> > On 07/10/2014 10:04 AM, Christoph Lameter wrote:
> >> > This patch creates a vmstat shepherd worker that monitors the
> >> > per cpu differentials on all processors. If there are differentials
> >> > on a processor then a vmstat worker local to the processors
> >> > with the differentials is created. That worker will then start
> >> > folding the diffs in regular intervals. Should the worker
> >> > find that there is no work to be done then it will make the shepherd
> >> > worker monitor the differentials again.
> > Hi Christoph, all,
> >
> > This patch doesn't interact well with my fuzzing setup. I'm seeing
> > the following:
>
> I think we got sidetracked here a bit. I've noticed that this issue
> is still happening in -next and the discussion here died out.

Ok I saw in another thread that this issue has gone away. Is there an
easy way to reproduce this on my system?


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: vmstat: On demand vmstat workers V8
  2014-08-05 14:51       ` Christoph Lameter
@ 2014-08-05 22:25         ` Sasha Levin
  -1 siblings, 0 replies; 73+ messages in thread
From: Sasha Levin @ 2014-08-05 22:25 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: akpm, Gilad Ben-Yossef, Thomas Gleixner, Tejun Heo, John Stultz,
	Mike Frysinger, Minchan Kim, Hakan Akkan, Max Krasnyansky,
	Frederic Weisbecker, Paul E. McKenney, linux-kernel, linux-mm,
	hughd, viresh.kumar, hpa, mingo, peterz

On 08/05/2014 10:51 AM, Christoph Lameter wrote:
> On Mon, 4 Aug 2014, Sasha Levin wrote:
> 
>> On 07/25/2014 10:22 PM, Sasha Levin wrote:
>>> On 07/10/2014 10:04 AM, Christoph Lameter wrote:
>>>>> This patch creates a vmstat shepherd worker that monitors the
>>>>> per cpu differentials on all processors. If there are differentials
>>>>> on a processor then a vmstat worker local to the processors
>>>>> with the differentials is created. That worker will then start
>>>>> folding the diffs in regular intervals. Should the worker
>>>>> find that there is no work to be done then it will make the shepherd
>>>>> worker monitor the differentials again.
>>> Hi Christoph, all,
>>>
>>> This patch doesn't interact well with my fuzzing setup. I'm seeing
>>> the following:
>>
>> I think we got sidetracked here a bit. I've noticed that this issue
>> is still happening in -next and the discussion here died out.
> 
> Ok, I saw in another thread that this issue had gone away. Is there an
> easy way to reproduce it on my system?
> 

I don't see the VM_BUG_ON anymore, but the CPU warnings are still there.

I can easily trigger it by cranking up the cpu hotplug code. Just
offline and online cpus frequently; it should reproduce quickly.
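
Such a loop is easy to drive through the standard sysfs hotplug
interface. The sketch below is only an illustration of that kind of
test, not the harness actually used here; the iteration count and the
cpu1-cpu3 range are arbitrary assumptions, and it must run as root:

	/* hotplug-stress.c: repeatedly offline and online non-boot CPUs */
	#include <stdio.h>

	static void set_online(int cpu, int online)
	{
		char path[64];
		FILE *f;

		snprintf(path, sizeof(path),
			 "/sys/devices/system/cpu/cpu%d/online", cpu);
		f = fopen(path, "w");
		if (!f) {
			perror(path);	/* cpu0 and some CPUs may not support hotplug */
			return;
		}
		fprintf(f, "%d\n", online);
		fclose(f);
	}

	int main(void)
	{
		int iter, cpu;

		for (iter = 0; iter < 1000; iter++)
			for (cpu = 1; cpu <= 3; cpu++) {	/* never touch cpu0 */
				set_online(cpu, 0);	/* offline */
				set_online(cpu, 1);	/* online again */
			}
		return 0;
	}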


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: vmstat: On demand vmstat workers V8
  2014-08-05 22:25         ` Sasha Levin
@ 2014-08-06 14:12           ` Christoph Lameter
  -1 siblings, 0 replies; 73+ messages in thread
From: Christoph Lameter @ 2014-08-06 14:12 UTC (permalink / raw)
  To: Sasha Levin
  Cc: akpm, Gilad Ben-Yossef, Thomas Gleixner, Tejun Heo, John Stultz,
	Mike Frysinger, Minchan Kim, Hakan Akkan, Max Krasnyansky,
	Frederic Weisbecker, Paul E. McKenney, linux-kernel, linux-mm,
	hughd, viresh.kumar, hpa, mingo, peterz

On Tue, 5 Aug 2014, Sasha Levin wrote:

> I can easily trigger it by cranking up the cpu hotplug code. Just
> offline and online cpus frequently; it should reproduce quickly.

That's what I thought.

The test was done with this fix applied, right?


Subject: vmstat ondemand: Fix online/offline races

Do not allow onlining/offlining while the shepherd task is checking
for vmstat threads.

On offlining a processor, do the right thing: cancel the vmstat
worker thread if it exists, and exclude the processor from the
shepherd's checks.

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/mm/vmstat.c
===================================================================
--- linux.orig/mm/vmstat.c	2014-07-30 09:35:54.602662306 -0500
+++ linux/mm/vmstat.c	2014-07-30 09:43:07.109037043 -0500
@@ -1317,6 +1317,7 @@ static void vmstat_shepherd(struct work_
 {
 	int cpu;

+	get_online_cpus();
 	/* Check processors whose vmstat worker threads have been disabled */
 	for_each_cpu(cpu, cpu_stat_off)
 		if (need_update(cpu) &&
@@ -1325,6 +1326,7 @@ static void vmstat_shepherd(struct work_
 			schedule_delayed_work_on(cpu, &per_cpu(vmstat_work, cpu),
 				__round_jiffies_relative(sysctl_stat_interval, cpu));

+	put_online_cpus();

 	schedule_delayed_work(&shepherd,
 		round_jiffies_relative(sysctl_stat_interval));
@@ -1380,8 +1382,8 @@ static int vmstat_cpuup_callback(struct
 		break;
 	case CPU_DOWN_PREPARE:
 	case CPU_DOWN_PREPARE_FROZEN:
-		if (!cpumask_test_and_set_cpu(cpu, cpu_stat_off))
-			cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
+		cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
+		cpumask_clear_cpu(cpu, cpu_stat_off);
 		break;
 	case CPU_DOWN_FAILED:
 	case CPU_DOWN_FAILED_FROZEN:
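
To make the combined effect easier to read, here is roughly what the two
touched paths look like with the fix applied, reconstructed from the
hunks above rather than taken verbatim from mm/vmstat.c. The second half
of the shepherd's if-condition falls between the two hunks, so the
cpumask_test_and_clear_cpu() line below is an assumption about the
elided code:

	/* Shepherd scan, now serialized against CPU hotplug. */
	static void vmstat_shepherd(struct work_struct *w)
	{
		int cpu;

		get_online_cpus();
		/* Check processors whose vmstat worker threads have been disabled */
		for_each_cpu(cpu, cpu_stat_off)
			if (need_update(cpu) &&
			    cpumask_test_and_clear_cpu(cpu, cpu_stat_off))	/* assumed */
				/* Counters moved on this cpu: kick its local worker */
				schedule_delayed_work_on(cpu, &per_cpu(vmstat_work, cpu),
					__round_jiffies_relative(sysctl_stat_interval, cpu));
		put_online_cpus();

		/* Rearm the shepherd itself. */
		schedule_delayed_work(&shepherd,
			round_jiffies_relative(sysctl_stat_interval));
	}

	/* Fragment of the switch in vmstat_cpuup_callback(): the offline
	 * path now always cancels the worker and drops the cpu from the
	 * mask, instead of cancelling only when the mask bit was clear. */
	case CPU_DOWN_PREPARE:
	case CPU_DOWN_PREPARE_FROZEN:
		cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
		cpumask_clear_cpu(cpu, cpu_stat_off);
		break;

The get_online_cpus()/put_online_cpus() pair holds the hotplug lock
across the scan, so a cpu can no longer go away between the
need_update() check and schedule_delayed_work_on(). Without it, work
could be queued on an already-offlined cpu, which fits the
hotplug-triggered warnings reported above.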

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: vmstat: On demand vmstat workers V8
  2014-08-06 14:12           ` Christoph Lameter
@ 2014-08-07  1:50             ` Sasha Levin
  -1 siblings, 0 replies; 73+ messages in thread
From: Sasha Levin @ 2014-08-07  1:50 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: akpm, Gilad Ben-Yossef, Thomas Gleixner, Tejun Heo, John Stultz,
	Mike Frysinger, Minchan Kim, Hakan Akkan, Max Krasnyansky,
	Frederic Weisbecker, Paul E. McKenney, linux-kernel, linux-mm,
	hughd, viresh.kumar, hpa, mingo, peterz

On 08/06/2014 10:12 AM, Christoph Lameter wrote:
> On Tue, 5 Aug 2014, Sasha Levin wrote:
> 
>> > I can easily trigger it by cranking up the cpu hotplug code. Just
>> > offline and online cpus frequently; it should reproduce quickly.
> That's what I thought.
> 
> The test was done with this fix applied, right?

Nope, I hadn't seen the patch before. Applied it, and the problem goes away. Thanks!


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 73+ messages in thread

end of thread, other threads:[~2014-08-07  1:51 UTC | newest]

Thread overview: 73+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-10 14:04 vmstat: On demand vmstat workers V8 Christoph Lameter
2014-07-11 13:20 ` Frederic Weisbecker
2014-07-11 13:56   ` Christoph Lameter
2014-07-11 13:58     ` Frederic Weisbecker
2014-07-11 15:17       ` Christoph Lameter
2014-07-11 15:19         ` Frederic Weisbecker
2014-07-11 15:22           ` Christoph Lameter
2014-07-14 20:10             ` Hugh Dickins
2014-07-14 20:51               ` Christoph Lameter
2014-07-30  3:04         ` Lai Jiangshan
2014-07-26  2:22 ` Sasha Levin
2014-07-28 18:55   ` Christoph Lameter
2014-07-28 21:54     ` Andrew Morton
2014-07-28 22:00       ` Sasha Levin
2014-07-29 15:17       ` Christoph Lameter
2014-07-29  7:56     ` Peter Zijlstra
2014-07-29 12:05       ` Tejun Heo
2014-07-29 12:23         ` Peter Zijlstra
2014-07-29 13:12           ` Tejun Heo
2014-07-29 15:10             ` Christoph Lameter
2014-07-29 15:14               ` Tejun Heo
2014-07-29 15:26                 ` Christoph Lameter
2014-07-29 15:39                 ` Christoph Lameter
2014-07-29 15:47                   ` Sasha Levin
2014-07-29 15:59                     ` Christoph Lameter
2014-07-30  3:11                   ` Lai Jiangshan
2014-07-30 14:34                     ` Christoph Lameter
2014-07-29 15:22             ` Christoph Lameter
2014-07-29 15:43               ` Sasha Levin
2014-08-04 21:37   ` Sasha Levin
2014-08-05 14:51     ` Christoph Lameter
2014-08-05 22:25       ` Sasha Levin
2014-08-06 14:12         ` Christoph Lameter
2014-08-07  1:50           ` Sasha Levin
2014-07-30  2:57 ` Lai Jiangshan
2014-07-30 14:45   ` Christoph Lameter
2014-07-31  0:52     ` Lai Jiangshan
