All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC 0/4] Migrate timers away from cpuset on setting cpuset.quiesce
@ 2014-03-20 13:48 ` Viresh Kumar
  0 siblings, 0 replies; 32+ messages in thread
From: Viresh Kumar @ 2014-03-20 13:48 UTC (permalink / raw)
  To: tglx, fweisbec, peterz, mingo, tj, lizefan
  Cc: linaro-kernel, linux-kernel, cgroups, Viresh Kumar

For networking platforms we need to provide one isolated CPU per user space data
plane thread. These CPUs should not be interrupted by kernel at all unless
userspace has requested for some syscalls. And so must live in isolated mode.
Currently, there are background kernel activities that are running on almost
every CPU, like: timers/hrtimers/watchdogs/etc, and these are required to be
migrated to other CPUs.

This patchset tries to migrate un-pinned timers away in the first attempt. And
support for migrating others like hrtimers/workqueues/etc would be added later.

This has only went through basic testing currently on ARM Samsung Exynos board
which only has two CPUs. Separate cpusets were created for these two CPUs and
then timers were migrated from one cpuset to other.

This option was earlier suggested by Peter Z. here.

https://lkml.org/lkml/2014/1/15/186

Please provide your inputs on how this can be improved..

Viresh Kumar (4):
  timer: track pinned timers with TIMER_PINNED flag
  timer: don't migrate pinned timers
  timer: create timer_quiesce_cpu() for cpusets.quiesce option
  cpuset: Add cpusets.quiesce option

 include/linux/timer.h | 10 +++---
 kernel/cpuset.c       | 56 +++++++++++++++++++++++++++++++
 kernel/timer.c        | 92 +++++++++++++++++++++++++++++++++++++++++----------
 3 files changed, 136 insertions(+), 22 deletions(-)

-- 
1.7.12.rc2.18.g61b472e


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC 0/4] Migrate timers away from cpuset on setting cpuset.quiesce
@ 2014-03-20 13:48 ` Viresh Kumar
  0 siblings, 0 replies; 32+ messages in thread
From: Viresh Kumar @ 2014-03-20 13:48 UTC (permalink / raw)
  To: tglx-hfZtesqFncYOwBW4kG4KsQ, fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
	peterz-wEGCiKHe2LqWVfeAwA7xHQ, mingo-DgEjT+Ai2ygdnm+yROfE0A,
	tj-DgEjT+Ai2ygdnm+yROfE0A, lizefan-hv44wF8Li93QT0dZR+AlfA
  Cc: linaro-kernel-cunTk1MwBs8s++Sfvej+rw,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Viresh Kumar

For networking platforms we need to provide one isolated CPU per user space data
plane thread. These CPUs should not be interrupted by kernel at all unless
userspace has requested for some syscalls. And so must live in isolated mode.
Currently, there are background kernel activities that are running on almost
every CPU, like: timers/hrtimers/watchdogs/etc, and these are required to be
migrated to other CPUs.

This patchset tries to migrate un-pinned timers away in the first attempt. And
support for migrating others like hrtimers/workqueues/etc would be added later.

This has only went through basic testing currently on ARM Samsung Exynos board
which only has two CPUs. Separate cpusets were created for these two CPUs and
then timers were migrated from one cpuset to other.

This option was earlier suggested by Peter Z. here.

https://lkml.org/lkml/2014/1/15/186

Please provide your inputs on how this can be improved..

Viresh Kumar (4):
  timer: track pinned timers with TIMER_PINNED flag
  timer: don't migrate pinned timers
  timer: create timer_quiesce_cpu() for cpusets.quiesce option
  cpuset: Add cpusets.quiesce option

 include/linux/timer.h | 10 +++---
 kernel/cpuset.c       | 56 +++++++++++++++++++++++++++++++
 kernel/timer.c        | 92 +++++++++++++++++++++++++++++++++++++++++----------
 3 files changed, 136 insertions(+), 22 deletions(-)

-- 
1.7.12.rc2.18.g61b472e

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC 1/4] timer: track pinned timers with TIMER_PINNED flag
  2014-03-20 13:48 ` Viresh Kumar
  (?)
@ 2014-03-20 13:48 ` Viresh Kumar
  -1 siblings, 0 replies; 32+ messages in thread
From: Viresh Kumar @ 2014-03-20 13:48 UTC (permalink / raw)
  To: tglx, fweisbec, peterz, mingo, tj, lizefan
  Cc: linaro-kernel, linux-kernel, cgroups, Viresh Kumar

In order to quiesce a CPU on which Isolation might be required, we need to move
away all the timers queued on that CPU. There are two types of timers queued on
any CPU: ones that are pinned to that CPU and others can run on any CPU but are
queued on CPU in question. And we need to migrate only the second type of timers
away from the CPU entering quiesce state.

For this we need some basic infrastructure in timer core to identify which
timers are pinned and which are not.

Hence, this patch adds another flag bit TIMER_PINNED which will be set only for
the timers which are pinned to a CPU.

It also removes 'pinned' parameter of __mod_timer() as it is no more required.

NOTE: One functional change worth mentioning

Existing Behavior: add_timer_on() followed by multiple mod_timer() wouldn't pin
the timer on CPU mentioned in add_timer_on()..

New Behavior: add_timer_on() followed by multiple mod_timer() would pin the
timer on CPU running mod_timer().

I didn't gave much attention to this as we should call mod_timer_on() for the
timers queued with add_timer_on(). Though if required we can simply clear the
TIMER_PINNED flag in mod_timer().

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 include/linux/timer.h | 10 ++++++----
 kernel/timer.c        | 27 ++++++++++++++++++++-------
 2 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 8c5a197..2962403 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -49,7 +49,7 @@ extern struct tvec_base boot_tvec_bases;
 #endif
 
 /*
- * Note that all tvec_bases are at least 4 byte aligned and lower two bits
+ * Note that all tvec_bases are at least 8 byte aligned and lower three bits
  * of base in timer_list is guaranteed to be zero. Use them for flags.
  *
  * A deferrable timer will work normally when the system is busy, but
@@ -61,14 +61,18 @@ extern struct tvec_base boot_tvec_bases;
  * the completion of the running instance from IRQ handlers, for example,
  * by calling del_timer_sync().
  *
+ * A pinned timer is allowed to run only on the cpu mentioned and shouldn't be
+ * migrated to any other CPU.
+ *
  * Note: The irq disabled callback execution is a special case for
  * workqueue locking issues. It's not meant for executing random crap
  * with interrupts disabled. Abuse is monitored!
  */
 #define TIMER_DEFERRABLE		0x1LU
 #define TIMER_IRQSAFE			0x2LU
+#define TIMER_PINNED			0x4LU
 
-#define TIMER_FLAG_MASK			0x3LU
+#define TIMER_FLAG_MASK			0x7LU
 
 #define __TIMER_INITIALIZER(_function, _expires, _data, _flags) { \
 		.entry = { .prev = TIMER_ENTRY_STATIC },	\
@@ -179,8 +183,6 @@ extern int mod_timer_pinned(struct timer_list *timer, unsigned long expires);
 
 extern void set_timer_slack(struct timer_list *time, int slack_hz);
 
-#define TIMER_NOT_PINNED	0
-#define TIMER_PINNED		1
 /*
  * The jiffies value which is added to now, when there is no timer
  * in the timer wheel:
diff --git a/kernel/timer.c b/kernel/timer.c
index 87bd529..fec4ab4 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -104,6 +104,11 @@ static inline unsigned int tbase_get_irqsafe(struct tvec_base *base)
 	return ((unsigned int)(unsigned long)base & TIMER_IRQSAFE);
 }
 
+static inline unsigned int tbase_get_pinned(struct tvec_base *base)
+{
+	return ((unsigned int)(unsigned long)base & TIMER_PINNED);
+}
+
 static inline struct tvec_base *tbase_get_base(struct tvec_base *base)
 {
 	return ((struct tvec_base *)((unsigned long)base & ~TIMER_FLAG_MASK));
@@ -117,6 +122,13 @@ timer_set_base(struct timer_list *timer, struct tvec_base *new_base)
 	timer->base = (struct tvec_base *)((unsigned long)(new_base) | flags);
 }
 
+static inline void
+timer_set_flags(struct timer_list *timer, unsigned int flags)
+{
+	timer->base = (struct tvec_base *)((unsigned long)(timer->base) |
+					   flags);
+}
+
 static unsigned long round_jiffies_common(unsigned long j, int cpu,
 		bool force_up)
 {
@@ -742,8 +754,7 @@ static struct tvec_base *lock_timer_base(struct timer_list *timer,
 }
 
 static inline int
-__mod_timer(struct timer_list *timer, unsigned long expires,
-						bool pending_only, int pinned)
+__mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only)
 {
 	struct tvec_base *base, *new_base;
 	unsigned long flags;
@@ -760,7 +771,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
 
 	debug_activate(timer, expires);
 
-	cpu = get_nohz_timer_target(pinned);
+	cpu = get_nohz_timer_target(!!tbase_get_pinned(timer->base));
 	new_base = per_cpu(tvec_bases, cpu);
 
 	if (base != new_base) {
@@ -802,7 +813,7 @@ out_unlock:
  */
 int mod_timer_pending(struct timer_list *timer, unsigned long expires)
 {
-	return __mod_timer(timer, expires, true, TIMER_NOT_PINNED);
+	return __mod_timer(timer, expires, true);
 }
 EXPORT_SYMBOL(mod_timer_pending);
 
@@ -877,7 +888,7 @@ int mod_timer(struct timer_list *timer, unsigned long expires)
 	if (timer_pending(timer) && timer->expires == expires)
 		return 1;
 
-	return __mod_timer(timer, expires, false, TIMER_NOT_PINNED);
+	return __mod_timer(timer, expires, false);
 }
 EXPORT_SYMBOL(mod_timer);
 
@@ -905,7 +916,8 @@ int mod_timer_pinned(struct timer_list *timer, unsigned long expires)
 	if (timer->expires == expires && timer_pending(timer))
 		return 1;
 
-	return __mod_timer(timer, expires, false, TIMER_PINNED);
+	timer_set_flags(timer, TIMER_PINNED);
+	return __mod_timer(timer, expires, false);
 }
 EXPORT_SYMBOL(mod_timer_pinned);
 
@@ -944,6 +956,7 @@ void add_timer_on(struct timer_list *timer, int cpu)
 
 	timer_stats_timer_set_start_info(timer);
 	BUG_ON(timer_pending(timer) || !timer->function);
+	timer_set_flags(timer, TIMER_PINNED);
 	spin_lock_irqsave(&base->lock, flags);
 	timer_set_base(timer, base);
 	debug_activate(timer, timer->expires);
@@ -1493,7 +1506,7 @@ signed long __sched schedule_timeout(signed long timeout)
 	expire = timeout + jiffies;
 
 	setup_timer_on_stack(&timer, process_timeout, (unsigned long)current);
-	__mod_timer(&timer, expire, false, TIMER_NOT_PINNED);
+	__mod_timer(&timer, expire, false);
 	schedule();
 	del_singleshot_timer_sync(&timer);
 
-- 
1.7.12.rc2.18.g61b472e


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC 2/4] timer: don't migrate pinned timers
  2014-03-20 13:48 ` Viresh Kumar
  (?)
  (?)
@ 2014-03-20 13:48 ` Viresh Kumar
  2014-03-31 15:56     ` Kevin Hilman
  -1 siblings, 1 reply; 32+ messages in thread
From: Viresh Kumar @ 2014-03-20 13:48 UTC (permalink / raw)
  To: tglx, fweisbec, peterz, mingo, tj, lizefan
  Cc: linaro-kernel, linux-kernel, cgroups, Viresh Kumar

migrate_timer() is called when a CPU goes down and its timers are required to be
migrated to some other CPU. Its the responsibility of the users of the timer to
remove it before control reaches to migrate_timers().

As these were the pinned timers, the best we can do is: don't migrate these and
report to the user as well.

That's all this patch does.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 kernel/timer.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/kernel/timer.c b/kernel/timer.c
index fec4ab4..a7f8b99 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1606,11 +1606,22 @@ static int init_timers_cpu(int cpu)
 static void migrate_timer_list(struct tvec_base *new_base, struct list_head *head)
 {
 	struct timer_list *timer;
+	int is_pinned;
 
 	while (!list_empty(head)) {
 		timer = list_first_entry(head, struct timer_list, entry);
 		/* We ignore the accounting on the dying cpu */
 		detach_timer(timer, false);
+
+		is_pinned = tbase_get_pinned(timer->base);
+
+		/* Check if CPU still has pinned timers */
+		if (is_pinned) {
+			pr_warn("%s: can't migrate pinned timer: %p, removing it\n",
+					__func__, timer);
+			continue;
+		}
+
 		timer_set_base(timer, new_base);
 		internal_add_timer(new_base, timer);
 	}
-- 
1.7.12.rc2.18.g61b472e


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC 3/4] timer: create timer_quiesce_cpu() for cpusets.quiesce option
  2014-03-20 13:48 ` Viresh Kumar
                   ` (2 preceding siblings ...)
  (?)
@ 2014-03-20 13:49 ` Viresh Kumar
  -1 siblings, 0 replies; 32+ messages in thread
From: Viresh Kumar @ 2014-03-20 13:49 UTC (permalink / raw)
  To: tglx, fweisbec, peterz, mingo, tj, lizefan
  Cc: linaro-kernel, linux-kernel, cgroups, Viresh Kumar

Cpusets would be using timer migration code to isolate/quiese a CPU. And hence
that should be changed a bit as the CPU in question would be online now. Also we
need to make sure that we don't remove pinned timers.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 kernel/timer.c | 54 +++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 43 insertions(+), 11 deletions(-)

diff --git a/kernel/timer.c b/kernel/timer.c
index a7f8b99..81d64e0 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1602,18 +1602,27 @@ static int init_timers_cpu(int cpu)
 	return 0;
 }
 
-#ifdef CONFIG_HOTPLUG_CPU
-static void migrate_timer_list(struct tvec_base *new_base, struct list_head *head)
+#if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_CPUSETS)
+static void migrate_timer_list(struct tvec_base *new_base,
+			       struct list_head *head, bool remove_pinned)
 {
 	struct timer_list *timer;
+	struct list_head pinned_list;
 	int is_pinned;
 
+	INIT_LIST_HEAD(&pinned_list);
+
 	while (!list_empty(head)) {
 		timer = list_first_entry(head, struct timer_list, entry);
-		/* We ignore the accounting on the dying cpu */
-		detach_timer(timer, false);
 
 		is_pinned = tbase_get_pinned(timer->base);
+		if (!remove_pinned && is_pinned) {
+			list_move_tail(&timer->entry, &pinned_list);
+			continue;
+		} else {
+			/* We ignore the accounting on the dying cpu */
+			detach_timer(timer, false);
+		}
 
 		/* Check if CPU still has pinned timers */
 		if (is_pinned) {
@@ -1625,15 +1634,18 @@ static void migrate_timer_list(struct tvec_base *new_base, struct list_head *hea
 		timer_set_base(timer, new_base);
 		internal_add_timer(new_base, timer);
 	}
+
+	if (!list_empty(&pinned_list))
+		list_splice_tail(&pinned_list, head);
 }
 
-static void migrate_timers(int cpu)
+/* Migrate timers from 'cpu' to this_cpu */
+static void __migrate_timers(int cpu, bool remove_pinned)
 {
 	struct tvec_base *old_base;
 	struct tvec_base *new_base;
 	int i;
 
-	BUG_ON(cpu_online(cpu));
 	old_base = per_cpu(tvec_bases, cpu);
 	new_base = get_cpu_var(tvec_bases);
 	/*
@@ -1646,20 +1658,40 @@ static void migrate_timers(int cpu)
 	BUG_ON(old_base->running_timer);
 
 	for (i = 0; i < TVR_SIZE; i++)
-		migrate_timer_list(new_base, old_base->tv1.vec + i);
+		migrate_timer_list(new_base, old_base->tv1.vec + i,
+				remove_pinned);
 	for (i = 0; i < TVN_SIZE; i++) {
-		migrate_timer_list(new_base, old_base->tv2.vec + i);
-		migrate_timer_list(new_base, old_base->tv3.vec + i);
-		migrate_timer_list(new_base, old_base->tv4.vec + i);
-		migrate_timer_list(new_base, old_base->tv5.vec + i);
+		migrate_timer_list(new_base, old_base->tv2.vec + i,
+				remove_pinned);
+		migrate_timer_list(new_base, old_base->tv3.vec + i,
+				remove_pinned);
+		migrate_timer_list(new_base, old_base->tv4.vec + i,
+				remove_pinned);
+		migrate_timer_list(new_base, old_base->tv5.vec + i,
+				remove_pinned);
 	}
 
 	spin_unlock(&old_base->lock);
 	spin_unlock_irq(&new_base->lock);
 	put_cpu_var(tvec_bases);
 }
+#endif /* CONFIG_HOTPLUG_CPU || CONFIG_CPUSETS */
+
+#ifdef CONFIG_HOTPLUG_CPU
+static void migrate_timers(int cpu)
+{
+	BUG_ON(cpu_online(cpu));
+	__migrate_timers(cpu, true);
+}
 #endif /* CONFIG_HOTPLUG_CPU */
 
+#ifdef CONFIG_CPUSETS
+void timer_quiesce_cpu(void *data)
+{
+	__migrate_timers((int)data, false);
+}
+#endif /* CONFIG_CPUSETS */
+
 static int timer_cpu_notify(struct notifier_block *self,
 				unsigned long action, void *hcpu)
 {
-- 
1.7.12.rc2.18.g61b472e


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC 4/4] cpuset: Add cpusets.quiesce option
  2014-03-20 13:48 ` Viresh Kumar
                   ` (3 preceding siblings ...)
  (?)
@ 2014-03-20 13:49 ` Viresh Kumar
  2014-03-27  2:47     ` Li Zefan
  -1 siblings, 1 reply; 32+ messages in thread
From: Viresh Kumar @ 2014-03-20 13:49 UTC (permalink / raw)
  To: tglx, fweisbec, peterz, mingo, tj, lizefan
  Cc: linaro-kernel, linux-kernel, cgroups, Viresh Kumar

For networking applications platforms need to provide one CPU per each user
space data plane thread. These CPUs should not be interrupted by kernel at all
unless userspace has requested for some syscalls. Currently, there are
background kernel activities that are running on almost every CPU, like:
timers/hrtimers/watchdogs/etc, and these are required to be migrated to other
CPUs.

To achieve that, this patch adds another option to cpusets, i.e. 'quiesce'.
Writing '1' on this file would migrate these unbound/unpinned timers/workqueues
away from the CPUs of the cpuset in question. Writing '0' has no effect and this
file can't be read from userspace as we aren't maintaining a state here.

Currently, only timers are migrated. This would be followed by other kernel
infrastructure later.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 kernel/cpuset.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 3d54c41..1b79ae6 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -43,10 +43,12 @@
 #include <linux/pagemap.h>
 #include <linux/proc_fs.h>
 #include <linux/rcupdate.h>
+#include <linux/tick.h>
 #include <linux/sched.h>
 #include <linux/seq_file.h>
 #include <linux/security.h>
 #include <linux/slab.h>
+#include <linux/smp.h>
 #include <linux/spinlock.h>
 #include <linux/stat.h>
 #include <linux/string.h>
@@ -150,6 +152,7 @@ typedef enum {
 	CS_SCHED_LOAD_BALANCE,
 	CS_SPREAD_PAGE,
 	CS_SPREAD_SLAB,
+	CS_QUIESCE,
 } cpuset_flagbits_t;
 
 /* convenient tests for these bits */
@@ -1208,6 +1211,44 @@ static int update_relax_domain_level(struct cpuset *cs, s64 val)
 	return 0;
 }
 
+void timer_quiesce_cpu(void *cpu);
+
+/**
+ * quiesce_cpuset - Move unbound timers/workqueues away from cpuset.cpus
+ * @cs: cpuset to be quiesced
+ *
+ * For isolating a core with cpusets we require all unbound timers/workqueues to
+ * move away for isolated core. For simplicity, currently we migrate these to
+ * the first online CPU which is not part of tick_nohz_full_mask.
+ *
+ * Currently we are only migrating timers away.
+ */
+void quiesce_cpuset(struct cpuset *cs)
+{
+	int from_cpu, to_cpu;
+	cpumask_t cpumask;
+
+	cpumask_andnot(&cpumask, cpu_online_mask, cs->cpus_allowed);
+
+#ifdef CONFIG_NO_HZ_FULL
+	cpumask_andnot(&cpumask, &cpumask, tick_nohz_full_mask);
+#endif
+
+	if (cpumask_empty(&cpumask)) {
+		pr_err("%s: Couldn't find a CPU to migrate to\n", __func__);
+		return;
+	}
+
+	to_cpu = cpumask_first(&cpumask);
+
+	for_each_cpu(from_cpu, cs->cpus_allowed) {
+		pr_debug("%s: Migrating from CPU:%d to CPU:%d\n", __func__,
+				from_cpu, to_cpu);
+		smp_call_function_single(to_cpu, timer_quiesce_cpu,
+				(void *)from_cpu, true);
+	}
+}
+
 /**
  * update_tasks_flags - update the spread flags of tasks in the cpuset.
  * @cs: the cpuset in which each task's spread flags needs to be changed
@@ -1244,6 +1285,11 @@ static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
 	int spread_flag_changed;
 	int err;
 
+	if (bit == CS_QUIESCE && turning_on) {
+		quiesce_cpuset(cs);
+		return 0;
+	}
+
 	trialcs = alloc_trial_cpuset(cs);
 	if (!trialcs)
 		return -ENOMEM;
@@ -1526,6 +1572,7 @@ typedef enum {
 	FILE_MEMORY_PRESSURE,
 	FILE_SPREAD_PAGE,
 	FILE_SPREAD_SLAB,
+	FILE_CPU_QUIESCE,
 } cpuset_filetype_t;
 
 static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
@@ -1569,6 +1616,9 @@ static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
 	case FILE_SPREAD_SLAB:
 		retval = update_flag(CS_SPREAD_SLAB, cs, val);
 		break;
+	case FILE_CPU_QUIESCE:
+		retval = update_flag(CS_QUIESCE, cs, val);
+		break;
 	default:
 		retval = -EINVAL;
 		break;
@@ -1837,6 +1887,12 @@ static struct cftype files[] = {
 		.private = FILE_MEMORY_PRESSURE_ENABLED,
 	},
 
+	{
+		.name = "quiesce",
+		.write_u64 = cpuset_write_u64,
+		.private = FILE_CPU_QUIESCE,
+	},
+
 	{ }	/* terminate */
 };
 
-- 
1.7.12.rc2.18.g61b472e


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [RFC 4/4] cpuset: Add cpusets.quiesce option
@ 2014-03-27  2:47     ` Li Zefan
  0 siblings, 0 replies; 32+ messages in thread
From: Li Zefan @ 2014-03-27  2:47 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: tglx, fweisbec, peterz, mingo, tj, linaro-kernel, linux-kernel, cgroups

On 2014/3/20 21:49, Viresh Kumar wrote:
> For networking applications platforms need to provide one CPU per each user
> space data plane thread. These CPUs should not be interrupted by kernel at all
> unless userspace has requested for some syscalls. Currently, there are
> background kernel activities that are running on almost every CPU, like:
> timers/hrtimers/watchdogs/etc, and these are required to be migrated to other
> CPUs.
> 
> To achieve that, this patch adds another option to cpusets, i.e. 'quiesce'.
> Writing '1' on this file would migrate these unbound/unpinned timers/workqueues
> away from the CPUs of the cpuset in question. Writing '0' has no effect and this
> file can't be read from userspace as we aren't maintaining a state here.
> 

This doesn't look like a complete solution, because newer timers/workqueues can
still run in those CPUs. Seems like the proposal discussed is to support setting
cpu affinity for workqueues through sysfs. If so, we can migrate workqueues when
affinity is set, so we don't need this cpuset.quiesce ?

> Currently, only timers are migrated. This would be followed by other kernel
> infrastructure later.
> 
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC 4/4] cpuset: Add cpusets.quiesce option
@ 2014-03-27  2:47     ` Li Zefan
  0 siblings, 0 replies; 32+ messages in thread
From: Li Zefan @ 2014-03-27  2:47 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: tglx-hfZtesqFncYOwBW4kG4KsQ, fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
	peterz-wEGCiKHe2LqWVfeAwA7xHQ, mingo-DgEjT+Ai2ygdnm+yROfE0A,
	tj-DgEjT+Ai2ygdnm+yROfE0A, linaro-kernel-cunTk1MwBs8s++Sfvej+rw,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On 2014/3/20 21:49, Viresh Kumar wrote:
> For networking applications platforms need to provide one CPU per each user
> space data plane thread. These CPUs should not be interrupted by kernel at all
> unless userspace has requested for some syscalls. Currently, there are
> background kernel activities that are running on almost every CPU, like:
> timers/hrtimers/watchdogs/etc, and these are required to be migrated to other
> CPUs.
> 
> To achieve that, this patch adds another option to cpusets, i.e. 'quiesce'.
> Writing '1' on this file would migrate these unbound/unpinned timers/workqueues
> away from the CPUs of the cpuset in question. Writing '0' has no effect and this
> file can't be read from userspace as we aren't maintaining a state here.
> 

This doesn't look like a complete solution, because newer timers/workqueues can
still run in those CPUs. Seems like the proposal discussed is to support setting
cpu affinity for workqueues through sysfs. If so, we can migrate workqueues when
affinity is set, so we don't need this cpuset.quiesce ?

> Currently, only timers are migrated. This would be followed by other kernel
> infrastructure later.
> 
> Suggested-by: Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
> Signed-off-by: Viresh Kumar <viresh.kumar-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC 4/4] cpuset: Add cpusets.quiesce option
@ 2014-03-27  4:29       ` Viresh Kumar
  0 siblings, 0 replies; 32+ messages in thread
From: Viresh Kumar @ 2014-03-27  4:29 UTC (permalink / raw)
  To: Li Zefan
  Cc: Thomas Gleixner, Frédéric Weisbecker, Peter Zijlstra,
	Ingo Molnar, Tejun Heo, Lists linaro-kernel,
	Linux Kernel Mailing List, Cgroups

On 27 March 2014 08:17, Li Zefan <lizefan@huawei.com> wrote:
> This doesn't look like a complete solution, because newer timers/workqueues can
> still run in those CPUs.

The initial idea was to disable load balance between CPUs and then do this.
So, that new timers and workqueues from other CPUs would never get
queued on this CPU..

But I think we can just modify get_nohz_timer_target() for making sure this
for timers..

> Seems like the proposal discussed is to support setting
> cpu affinity for workqueues through sysfs. If so, we can migrate workqueues when
> affinity is set, so we don't need this cpuset.quiesce ?

That was another thread just for workqueues, but this one is about migrating
everything else as well.. Probably some more additions apart from timers/
hrtimers/wqs in future. So, for us it is still required :)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC 4/4] cpuset: Add cpusets.quiesce option
@ 2014-03-27  4:29       ` Viresh Kumar
  0 siblings, 0 replies; 32+ messages in thread
From: Viresh Kumar @ 2014-03-27  4:29 UTC (permalink / raw)
  To: Li Zefan
  Cc: Thomas Gleixner, Frédéric Weisbecker, Peter Zijlstra,
	Ingo Molnar, Tejun Heo, Lists linaro-kernel,
	Linux Kernel Mailing List, Cgroups

On 27 March 2014 08:17, Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> wrote:
> This doesn't look like a complete solution, because newer timers/workqueues can
> still run in those CPUs.

The initial idea was to disable load balance between CPUs and then do this.
So, that new timers and workqueues from other CPUs would never get
queued on this CPU..

But I think we can just modify get_nohz_timer_target() for making sure this
for timers..

> Seems like the proposal discussed is to support setting
> cpu affinity for workqueues through sysfs. If so, we can migrate workqueues when
> affinity is set, so we don't need this cpuset.quiesce ?

That was another thread just for workqueues, but this one is about migrating
everything else as well.. Probably some more additions apart from timers/
hrtimers/wqs in future. So, for us it is still required :)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC 2/4] timer: don't migrate pinned timers
@ 2014-03-31 15:56     ` Kevin Hilman
  0 siblings, 0 replies; 32+ messages in thread
From: Kevin Hilman @ 2014-03-31 15:56 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: tglx, fweisbec, peterz, mingo, tj, lizefan, linaro-kernel,
	linux-kernel, cgroups

Viresh Kumar <viresh.kumar@linaro.org> writes:

> migrate_timer() is called when a CPU goes down and its timers are required to be
> migrated to some other CPU. Its the responsibility of the users of the timer to
> remove it before control reaches to migrate_timers().
>
> As these were the pinned timers, the best we can do is: don't migrate these and
> report to the user as well.
>
> That's all this patch does.
>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>  kernel/timer.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/kernel/timer.c b/kernel/timer.c
> index fec4ab4..a7f8b99 100644
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -1606,11 +1606,22 @@ static int init_timers_cpu(int cpu)
>  static void migrate_timer_list(struct tvec_base *new_base, struct list_head *head)
>  {
>  	struct timer_list *timer;
> +	int is_pinned;
>  
>  	while (!list_empty(head)) {
>  		timer = list_first_entry(head, struct timer_list, entry);
>  		/* We ignore the accounting on the dying cpu */
>  		detach_timer(timer, false);
> +
> +		is_pinned = tbase_get_pinned(timer->base);
> +
> +		/* Check if CPU still has pinned timers */
> +		if (is_pinned) {
> +			pr_warn("%s: can't migrate pinned timer: %p, removing it\n",
> +					__func__, timer);

printk message will be confusing: removing it from what? 

Kevin


> +			continue;
> +		}
> +
>  		timer_set_base(timer, new_base);
>  		internal_add_timer(new_base, timer);
>  	}

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC 2/4] timer: don't migrate pinned timers
@ 2014-03-31 15:56     ` Kevin Hilman
  0 siblings, 0 replies; 32+ messages in thread
From: Kevin Hilman @ 2014-03-31 15:56 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: tglx-hfZtesqFncYOwBW4kG4KsQ, fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
	peterz-wEGCiKHe2LqWVfeAwA7xHQ, mingo-DgEjT+Ai2ygdnm+yROfE0A,
	tj-DgEjT+Ai2ygdnm+yROfE0A, lizefan-hv44wF8Li93QT0dZR+AlfA,
	linaro-kernel-cunTk1MwBs8s++Sfvej+rw,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

Viresh Kumar <viresh.kumar-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> writes:

> migrate_timer() is called when a CPU goes down and its timers are required to be
> migrated to some other CPU. Its the responsibility of the users of the timer to
> remove it before control reaches to migrate_timers().
>
> As these were the pinned timers, the best we can do is: don't migrate these and
> report to the user as well.
>
> That's all this patch does.
>
> Signed-off-by: Viresh Kumar <viresh.kumar-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
> ---
>  kernel/timer.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/kernel/timer.c b/kernel/timer.c
> index fec4ab4..a7f8b99 100644
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -1606,11 +1606,22 @@ static int init_timers_cpu(int cpu)
>  static void migrate_timer_list(struct tvec_base *new_base, struct list_head *head)
>  {
>  	struct timer_list *timer;
> +	int is_pinned;
>  
>  	while (!list_empty(head)) {
>  		timer = list_first_entry(head, struct timer_list, entry);
>  		/* We ignore the accounting on the dying cpu */
>  		detach_timer(timer, false);
> +
> +		is_pinned = tbase_get_pinned(timer->base);
> +
> +		/* Check if CPU still has pinned timers */
> +		if (is_pinned) {
> +			pr_warn("%s: can't migrate pinned timer: %p, removing it\n",
> +					__func__, timer);

printk message will be confusing: removing it from what? 

Kevin


> +			continue;
> +		}
> +
>  		timer_set_base(timer, new_base);
>  		internal_add_timer(new_base, timer);
>  	}

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC 2/4] timer: don't migrate pinned timers
@ 2014-04-01  4:32       ` Viresh Kumar
  0 siblings, 0 replies; 32+ messages in thread
From: Viresh Kumar @ 2014-04-01  4:32 UTC (permalink / raw)
  To: Kevin Hilman
  Cc: Thomas Gleixner, Frédéric Weisbecker, Peter Zijlstra,
	Ingo Molnar, Tejun Heo, Li Zefan, Lists linaro-kernel,
	Linux Kernel Mailing List, Cgroups

On 31 March 2014 21:26, Kevin Hilman <khilman@linaro.org> wrote:
> Viresh Kumar <viresh.kumar@linaro.org> writes:
>> +             if (is_pinned) {
>> +                     pr_warn("%s: can't migrate pinned timer: %p, removing it\n",
>> +                                     __func__, timer);
>
> printk message will be confusing: removing it from what?

Hmm.. So, I am looking to do two modifications here. Just need inputs
if that would be the right thing to do:

- do a WARN() here as these timers should have been already removed
- change print to: "can't migrate pinned timer %p, deactivating timer"

Looks fine?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC 2/4] timer: don't migrate pinned timers
@ 2014-04-01  4:32       ` Viresh Kumar
  0 siblings, 0 replies; 32+ messages in thread
From: Viresh Kumar @ 2014-04-01  4:32 UTC (permalink / raw)
  To: Kevin Hilman
  Cc: Thomas Gleixner, Frédéric Weisbecker, Peter Zijlstra,
	Ingo Molnar, Tejun Heo, Li Zefan, Lists linaro-kernel,
	Linux Kernel Mailing List, Cgroups

On 31 March 2014 21:26, Kevin Hilman <khilman-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> Viresh Kumar <viresh.kumar-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> writes:
>> +             if (is_pinned) {
>> +                     pr_warn("%s: can't migrate pinned timer: %p, removing it\n",
>> +                                     __func__, timer);
>
> printk message will be confusing: removing it from what?

Hmm.. So, I am looking to do two modifications here. Just need inputs
if that would be the right thing to do:

- do a WARN() here as these timers should have been already removed
- change print to: "can't migrate pinned timer %p, deactivating timer"

Looks fine?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC 2/4] timer: don't migrate pinned timers
@ 2014-04-04 15:15         ` Kevin Hilman
  0 siblings, 0 replies; 32+ messages in thread
From: Kevin Hilman @ 2014-04-04 15:15 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Thomas Gleixner, Frédéric Weisbecker, Peter Zijlstra,
	Ingo Molnar, Tejun Heo, Li Zefan, Lists linaro-kernel,
	Linux Kernel Mailing List, Cgroups

Viresh Kumar <viresh.kumar@linaro.org> writes:

> On 31 March 2014 21:26, Kevin Hilman <khilman@linaro.org> wrote:
>> Viresh Kumar <viresh.kumar@linaro.org> writes:
>>> +             if (is_pinned) {
>>> +                     pr_warn("%s: can't migrate pinned timer: %p, removing it\n",
>>> +                                     __func__, timer);
>>
>> printk message will be confusing: removing it from what?
>
> Hmm.. So, I am looking to do two modifications here. Just need inputs
> if that would be the right thing to do:
>
> - do a WARN() here as these timers should have been already removed
> - change print to: "can't migrate pinned timer %p, deactivating timer"
>
> Looks fine?

Yes.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC 2/4] timer: don't migrate pinned timers
@ 2014-04-04 15:15         ` Kevin Hilman
  0 siblings, 0 replies; 32+ messages in thread
From: Kevin Hilman @ 2014-04-04 15:15 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Thomas Gleixner, Frédéric Weisbecker, Peter Zijlstra,
	Ingo Molnar, Tejun Heo, Li Zefan, Lists linaro-kernel,
	Linux Kernel Mailing List, Cgroups

Viresh Kumar <viresh.kumar-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> writes:

> On 31 March 2014 21:26, Kevin Hilman <khilman-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> Viresh Kumar <viresh.kumar-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> writes:
>>> +             if (is_pinned) {
>>> +                     pr_warn("%s: can't migrate pinned timer: %p, removing it\n",
>>> +                                     __func__, timer);
>>
>> printk message will be confusing: removing it from what?
>
> Hmm.. So, I am looking to do two modifications here. Just need inputs
> if that would be the right thing to do:
>
> - do a WARN() here as these timers should have been already removed
> - change print to: "can't migrate pinned timer %p, deactivating timer"
>
> Looks fine?

Yes.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC 0/4] Migrate timers away from cpuset on setting cpuset.quiesce
@ 2014-04-24  7:25   ` Daniel Sangorrin
  0 siblings, 0 replies; 32+ messages in thread
From: Daniel Sangorrin @ 2014-04-24  7:25 UTC (permalink / raw)
  To: Viresh Kumar, tglx, fweisbec, peterz, mingo, tj, lizefan
  Cc: linaro-kernel, linux-kernel, cgroups

Dear Viresh Kumar,

I tried your set of patches for isolating particular CPU cores from unpinned
timers. On x86_64 they were working fine, however I found out that on ARM
they would fail under the following test:

# mount -t cpuset none /cpuset
# cd /cpuset
# mkdir rt
# cd rt
# echo 1 > cpus
# echo 1 > cpu_exclusive
# cd
# taskset 0x2 ./setquiesce.sh  <--- contains "echo 1 > /cpuset/rt/quiesce"
[   75.622375] ------------[ cut here ]------------
[   75.627258] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:2595 __migrate_hrtimers+0x17c/0x1bc()
[   75.636840] DEBUG_LOCKS_WARN_ON(current->hardirq_context)
[   75.636840] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.14.0-rc1-37710-g23c8f02 #1
[   75.649627] [<c0014d18>] (unwind_backtrace) from [<c00119e8>] (show_stack+0x10/0x14)
[   75.649627] [<c00119e8>] (show_stack) from [<c065b61c>] (dump_stack+0x78/0x94)
[   75.662689] [<c065b61c>] (dump_stack) from [<c003e9a4>] (warn_slowpath_common+0x60/0x84)
[   75.670410] [<c003e9a4>] (warn_slowpath_common) from [<c003ea24>] (warn_slowpath_fmt+0x30/0x40)
[   75.677673] [<c003ea24>] (warn_slowpath_fmt) from [<c005d7b0>] (__migrate_hrtimers+0x17c/0x1bc)
[   75.677673] [<c005d7b0>] (__migrate_hrtimers) from [<c009e004>] (generic_smp_call_function_single_interrupt+0x8c/0x104)
[   75.699645] [<c009e004>] (generic_smp_call_function_single_interrupt) from [<c00134d0>] (handle_IPI+0xa4/0x16c)
[   75.706970] [<c00134d0>] (handle_IPI) from [<c0008614>] (gic_handle_irq+0x54/0x5c)
[   75.715087] [<c0008614>] (gic_handle_irq) from [<c0012624>] (__irq_svc+0x44/0x5c)
[   75.725311] Exception stack(0xc08a3f58 to 0xc08a3fa0)
....

I also backported your patches to Linux 3.10.y and found the same problem
both in ARM and x86_64. However, I think I figured out the reason for those
errors. Please, could you check the patch below (it applies on the top of
your tree, branch isolate-cpusets) and let me know what you think?

Thanks,
Daniel

-------------------------PATCH STARTS HERE---------------------------------
cpuset: quiesce: change irq disable/enable by irq save/restore

The function __migrate_timers can be called under interrupt context
or thread context depending on the core where the system call was
executed. In case it executes under interrupt context, it
seems a bad idea to leave interrupts enabled after migrating the
timers. In fact, this caused kernel errors on the ARM architecture and
on the x86_64 architecture with the 3.10 kernel (backported version
of the cpuset-quiesce patch).

Signed-off-by: Daniel Sangorrin <daniel.sangorrin@toshiba.co.jp>
Signed-off-by: Yoshitake Kobayashi <yoshitake.kobayashi@toshiba.co.jp>
---
 kernel/hrtimer.c | 5 +++--
 kernel/timer.c   | 5 +++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index e8cd1db..abb1707 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1701,8 +1701,9 @@ static void __migrate_hrtimers(int scpu, bool remove_pinned)
 	struct hrtimer_clock_base *clock_base;
 	unsigned int active_bases;
 	int i;
+	unsigned long flags;

-	local_irq_disable();
+	local_irq_save(flags);
 	old_base = &per_cpu(hrtimer_bases, scpu);
 	new_base = &__get_cpu_var(hrtimer_bases);
 	/*
@@ -1722,7 +1723,7 @@ static void __migrate_hrtimers(int scpu, bool remove_pinned)

 	/* Check, if we got expired work to do */
 	__hrtimer_peek_ahead_timers();
-	local_irq_enable();
+	local_irq_restore(flags);
 }
 #endif /* CONFIG_HOTPLUG_CPU || CONFIG_CPUSETS */

diff --git a/kernel/timer.c b/kernel/timer.c
index 4676a07..2715b7d 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1644,6 +1644,7 @@ static void __migrate_timers(int cpu, bool remove_pinned)
 	struct tvec_base *old_base;
 	struct tvec_base *new_base;
 	int i;
+	unsigned long flags;

 	old_base = per_cpu(tvec_bases, cpu);
 	new_base = get_cpu_var(tvec_bases);
@@ -1651,7 +1652,7 @@ static void __migrate_timers(int cpu, bool remove_pinned)
 	 * The caller is globally serialized and nobody else
 	 * takes two locks at once, deadlock is not possible.
 	 */
-	spin_lock_irq(&new_base->lock);
+	spin_lock_irqsave(&new_base->lock, flags);
 	spin_lock_nested(&old_base->lock, SINGLE_DEPTH_NESTING);

 	BUG_ON(old_base->running_timer);
@@ -1671,7 +1672,7 @@ static void __migrate_timers(int cpu, bool remove_pinned)
 	}

 	spin_unlock(&old_base->lock);
-	spin_unlock_irq(&new_base->lock);
+	spin_unlock_irqrestore(&new_base->lock, flags);
 	put_cpu_var(tvec_bases);
 }
 #endif /* CONFIG_HOTPLUG_CPU || CONFIG_CPUSETS */
-- 
1.8.1.1
-------------------------PATCH ENDS HERE---------------------------------

On 2014/03/20 22:48, Viresh Kumar wrote:
> For networking platforms we need to provide one isolated CPU per user space data
> plane thread. These CPUs should not be interrupted by kernel at all unless
> userspace has requested for some syscalls. And so must live in isolated mode.
> Currently, there are background kernel activities that are running on almost
> every CPU, like: timers/hrtimers/watchdogs/etc, and these are required to be
> migrated to other CPUs.
> 
> This patchset tries to migrate un-pinned timers away in the first attempt. And
> support for migrating others like hrtimers/workqueues/etc would be added later.
> 
> This has only went through basic testing currently on ARM Samsung Exynos board
> which only has two CPUs. Separate cpusets were created for these two CPUs and
> then timers were migrated from one cpuset to other.
> 
> This option was earlier suggested by Peter Z. here.
> 
> https://lkml.org/lkml/2014/1/15/186
> 
> Please provide your inputs on how this can be improved..
> 
> Viresh Kumar (4):
>   timer: track pinned timers with TIMER_PINNED flag
>   timer: don't migrate pinned timers
>   timer: create timer_quiesce_cpu() for cpusets.quiesce option
>   cpuset: Add cpusets.quiesce option
> 
>  include/linux/timer.h | 10 +++---
>  kernel/cpuset.c       | 56 +++++++++++++++++++++++++++++++
>  kernel/timer.c        | 92 +++++++++++++++++++++++++++++++++++++++++----------
>  3 files changed, 136 insertions(+), 22 deletions(-)
> 

-- 
Toshiba Corporate Software Engineering Center
Daniel SANGORRIN
E-mail:  daniel.sangorrin@toshiba.co.jp

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [RFC 0/4] Migrate timers away from cpuset on setting cpuset.quiesce
@ 2014-04-24  7:25   ` Daniel Sangorrin
  0 siblings, 0 replies; 32+ messages in thread
From: Daniel Sangorrin @ 2014-04-24  7:25 UTC (permalink / raw)
  To: Viresh Kumar, tglx-hfZtesqFncYOwBW4kG4KsQ,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w, peterz-wEGCiKHe2LqWVfeAwA7xHQ,
	mingo-DgEjT+Ai2ygdnm+yROfE0A, tj-DgEjT+Ai2ygdnm+yROfE0A,
	lizefan-hv44wF8Li93QT0dZR+AlfA
  Cc: linaro-kernel-cunTk1MwBs8s++Sfvej+rw,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

Dear Viresh Kumar,

I tried your set of patches for isolating particular CPU cores from unpinned
timers. On x86_64 they were working fine, however I found out that on ARM
they would fail under the following test:

# mount -t cpuset none /cpuset
# cd /cpuset
# mkdir rt
# cd rt
# echo 1 > cpus
# echo 1 > cpu_exclusive
# cd
# taskset 0x2 ./setquiesce.sh  <--- contains "echo 1 > /cpuset/rt/quiesce"
[   75.622375] ------------[ cut here ]------------
[   75.627258] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:2595 __migrate_hrtimers+0x17c/0x1bc()
[   75.636840] DEBUG_LOCKS_WARN_ON(current->hardirq_context)
[   75.636840] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.14.0-rc1-37710-g23c8f02 #1
[   75.649627] [<c0014d18>] (unwind_backtrace) from [<c00119e8>] (show_stack+0x10/0x14)
[   75.649627] [<c00119e8>] (show_stack) from [<c065b61c>] (dump_stack+0x78/0x94)
[   75.662689] [<c065b61c>] (dump_stack) from [<c003e9a4>] (warn_slowpath_common+0x60/0x84)
[   75.670410] [<c003e9a4>] (warn_slowpath_common) from [<c003ea24>] (warn_slowpath_fmt+0x30/0x40)
[   75.677673] [<c003ea24>] (warn_slowpath_fmt) from [<c005d7b0>] (__migrate_hrtimers+0x17c/0x1bc)
[   75.677673] [<c005d7b0>] (__migrate_hrtimers) from [<c009e004>] (generic_smp_call_function_single_interrupt+0x8c/0x104)
[   75.699645] [<c009e004>] (generic_smp_call_function_single_interrupt) from [<c00134d0>] (handle_IPI+0xa4/0x16c)
[   75.706970] [<c00134d0>] (handle_IPI) from [<c0008614>] (gic_handle_irq+0x54/0x5c)
[   75.715087] [<c0008614>] (gic_handle_irq) from [<c0012624>] (__irq_svc+0x44/0x5c)
[   75.725311] Exception stack(0xc08a3f58 to 0xc08a3fa0)
....

I also backported your patches to Linux 3.10.y and found the same problem
both in ARM and x86_64. However, I think I figured out the reason for those
errors. Please, could you check the patch below (it applies on the top of
your tree, branch isolate-cpusets) and let me know what you think?

Thanks,
Daniel

-------------------------PATCH STARTS HERE---------------------------------
cpuset: quiesce: change irq disable/enable by irq save/restore

The function __migrate_timers can be called under interrupt context
or thread context depending on the core where the system call was
executed. In case it executes under interrupt context, it
seems a bad idea to leave interrupts enabled after migrating the
timers. In fact, this caused kernel errors on the ARM architecture and
on the x86_64 architecture with the 3.10 kernel (backported version
of the cpuset-quiesce patch).

Signed-off-by: Daniel Sangorrin <daniel.sangorrin-g3qfMnKXm6lL9jVzuh4AOg@public.gmane.org>
Signed-off-by: Yoshitake Kobayashi <yoshitake.kobayashi-g3qfMnKXm6lL9jVzuh4AOg@public.gmane.org>
---
 kernel/hrtimer.c | 5 +++--
 kernel/timer.c   | 5 +++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index e8cd1db..abb1707 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1701,8 +1701,9 @@ static void __migrate_hrtimers(int scpu, bool remove_pinned)
 	struct hrtimer_clock_base *clock_base;
 	unsigned int active_bases;
 	int i;
+	unsigned long flags;

-	local_irq_disable();
+	local_irq_save(flags);
 	old_base = &per_cpu(hrtimer_bases, scpu);
 	new_base = &__get_cpu_var(hrtimer_bases);
 	/*
@@ -1722,7 +1723,7 @@ static void __migrate_hrtimers(int scpu, bool remove_pinned)

 	/* Check, if we got expired work to do */
 	__hrtimer_peek_ahead_timers();
-	local_irq_enable();
+	local_irq_restore(flags);
 }
 #endif /* CONFIG_HOTPLUG_CPU || CONFIG_CPUSETS */

diff --git a/kernel/timer.c b/kernel/timer.c
index 4676a07..2715b7d 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1644,6 +1644,7 @@ static void __migrate_timers(int cpu, bool remove_pinned)
 	struct tvec_base *old_base;
 	struct tvec_base *new_base;
 	int i;
+	unsigned long flags;

 	old_base = per_cpu(tvec_bases, cpu);
 	new_base = get_cpu_var(tvec_bases);
@@ -1651,7 +1652,7 @@ static void __migrate_timers(int cpu, bool remove_pinned)
 	 * The caller is globally serialized and nobody else
 	 * takes two locks at once, deadlock is not possible.
 	 */
-	spin_lock_irq(&new_base->lock);
+	spin_lock_irqsave(&new_base->lock, flags);
 	spin_lock_nested(&old_base->lock, SINGLE_DEPTH_NESTING);

 	BUG_ON(old_base->running_timer);
@@ -1671,7 +1672,7 @@ static void __migrate_timers(int cpu, bool remove_pinned)
 	}

 	spin_unlock(&old_base->lock);
-	spin_unlock_irq(&new_base->lock);
+	spin_unlock_irqrestore(&new_base->lock, flags);
 	put_cpu_var(tvec_bases);
 }
 #endif /* CONFIG_HOTPLUG_CPU || CONFIG_CPUSETS */
-- 
1.8.1.1
-------------------------PATCH ENDS HERE---------------------------------

On 2014/03/20 22:48, Viresh Kumar wrote:
> For networking platforms we need to provide one isolated CPU per user space data
> plane thread. These CPUs should not be interrupted by kernel at all unless
> userspace has requested for some syscalls. And so must live in isolated mode.
> Currently, there are background kernel activities that are running on almost
> every CPU, like: timers/hrtimers/watchdogs/etc, and these are required to be
> migrated to other CPUs.
> 
> This patchset tries to migrate un-pinned timers away in the first attempt. And
> support for migrating others like hrtimers/workqueues/etc would be added later.
> 
> This has only went through basic testing currently on ARM Samsung Exynos board
> which only has two CPUs. Separate cpusets were created for these two CPUs and
> then timers were migrated from one cpuset to other.
> 
> This option was earlier suggested by Peter Z. here.
> 
> https://lkml.org/lkml/2014/1/15/186
> 
> Please provide your inputs on how this can be improved..
> 
> Viresh Kumar (4):
>   timer: track pinned timers with TIMER_PINNED flag
>   timer: don't migrate pinned timers
>   timer: create timer_quiesce_cpu() for cpusets.quiesce option
>   cpuset: Add cpusets.quiesce option
> 
>  include/linux/timer.h | 10 +++---
>  kernel/cpuset.c       | 56 +++++++++++++++++++++++++++++++
>  kernel/timer.c        | 92 +++++++++++++++++++++++++++++++++++++++++----------
>  3 files changed, 136 insertions(+), 22 deletions(-)
> 

-- 
Toshiba Corporate Software Engineering Center
Daniel SANGORRIN
E-mail:  daniel.sangorrin-g3qfMnKXm6lL9jVzuh4AOg@public.gmane.org

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [RFC 0/4] Migrate timers away from cpuset on setting cpuset.quiesce
@ 2014-04-24  7:43     ` Viresh Kumar
  0 siblings, 0 replies; 32+ messages in thread
From: Viresh Kumar @ 2014-04-24  7:43 UTC (permalink / raw)
  To: Daniel Sangorrin
  Cc: Thomas Gleixner, Frédéric Weisbecker, Peter Zijlstra,
	Ingo Molnar, Tejun Heo, Li Zefan, Lists linaro-kernel,
	Linux Kernel Mailing List, Cgroups

On 24 April 2014 12:55, Daniel Sangorrin <daniel.sangorrin@toshiba.co.jp> wrote:
> I tried your set of patches for isolating particular CPU cores from unpinned
> timers. On x86_64 they were working fine, however I found out that on ARM
> they would fail under the following test:

I am happy that these drew attention from somebody Atleast :)

> # mount -t cpuset none /cpuset
> # cd /cpuset
> # mkdir rt
> # cd rt
> # echo 1 > cpus
> # echo 1 > cpu_exclusive
> # cd
> # taskset 0x2 ./setquiesce.sh  <--- contains "echo 1 > /cpuset/rt/quiesce"
> [   75.622375] ------------[ cut here ]------------
> [   75.627258] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:2595 __migrate_hrtimers+0x17c/0x1bc()
> [   75.636840] DEBUG_LOCKS_WARN_ON(current->hardirq_context)
> [   75.636840] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.14.0-rc1-37710-g23c8f02 #1
> [   75.649627] [<c0014d18>] (unwind_backtrace) from [<c00119e8>] (show_stack+0x10/0x14)
> [   75.649627] [<c00119e8>] (show_stack) from [<c065b61c>] (dump_stack+0x78/0x94)
> [   75.662689] [<c065b61c>] (dump_stack) from [<c003e9a4>] (warn_slowpath_common+0x60/0x84)
> [   75.670410] [<c003e9a4>] (warn_slowpath_common) from [<c003ea24>] (warn_slowpath_fmt+0x30/0x40)
> [   75.677673] [<c003ea24>] (warn_slowpath_fmt) from [<c005d7b0>] (__migrate_hrtimers+0x17c/0x1bc)
> [   75.677673] [<c005d7b0>] (__migrate_hrtimers) from [<c009e004>] (generic_smp_call_function_single_interrupt+0x8c/0x104)
> [   75.699645] [<c009e004>] (generic_smp_call_function_single_interrupt) from [<c00134d0>] (handle_IPI+0xa4/0x16c)
> [   75.706970] [<c00134d0>] (handle_IPI) from [<c0008614>] (gic_handle_irq+0x54/0x5c)
> [   75.715087] [<c0008614>] (gic_handle_irq) from [<c0012624>] (__irq_svc+0x44/0x5c)
> [   75.725311] Exception stack(0xc08a3f58 to 0xc08a3fa0)

I couldn't understand why we went via a interrupt here ? Probably CPU1
was idle and was woken up with a IPI and then this happened. But in
that case too,
shouldn't the script run from process context instead ?

> I also backported your patches to Linux 3.10.y and found the same problem
> both in ARM and x86_64.

There are very few changes in between 3.10 and latest for timers/hrtimers
and so things are expected to be the same.

> However, I think I figured out the reason for those
> errors. Please, could you check the patch below (it applies on the top of
> your tree, branch isolate-cpusets) and let me know what you think?

Okay, just to let you know, I have also found some issues and they are
now pushed in my tree.. Also it is rebased over 3.15-rc2 now.

> -------------------------PATCH STARTS HERE---------------------------------
> cpuset: quiesce: change irq disable/enable by irq save/restore
>
> The function __migrate_timers can be called under interrupt context
> or thread context depending on the core where the system call was
> executed. In case it executes under interrupt context, it

How exactly?

> seems a bad idea to leave interrupts enabled after migrating the
> timers. In fact, this caused kernel errors on the ARM architecture and
> on the x86_64 architecture with the 3.10 kernel (backported version
> of the cpuset-quiesce patch).

I can't keep it as a separate patch and so would be required to merge
it into my original patch..

Thanks for your inputs :)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC 0/4] Migrate timers away from cpuset on setting cpuset.quiesce
@ 2014-04-24  7:43     ` Viresh Kumar
  0 siblings, 0 replies; 32+ messages in thread
From: Viresh Kumar @ 2014-04-24  7:43 UTC (permalink / raw)
  To: Daniel Sangorrin
  Cc: Thomas Gleixner, Frédéric Weisbecker, Peter Zijlstra,
	Ingo Molnar, Tejun Heo, Li Zefan, Lists linaro-kernel,
	Linux Kernel Mailing List, Cgroups

On 24 April 2014 12:55, Daniel Sangorrin <daniel.sangorrin-g3qfMnKXm6lL9jVzuh4AOg@public.gmane.org> wrote:
> I tried your set of patches for isolating particular CPU cores from unpinned
> timers. On x86_64 they were working fine, however I found out that on ARM
> they would fail under the following test:

I am happy that these drew attention from somebody Atleast :)

> # mount -t cpuset none /cpuset
> # cd /cpuset
> # mkdir rt
> # cd rt
> # echo 1 > cpus
> # echo 1 > cpu_exclusive
> # cd
> # taskset 0x2 ./setquiesce.sh  <--- contains "echo 1 > /cpuset/rt/quiesce"
> [   75.622375] ------------[ cut here ]------------
> [   75.627258] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:2595 __migrate_hrtimers+0x17c/0x1bc()
> [   75.636840] DEBUG_LOCKS_WARN_ON(current->hardirq_context)
> [   75.636840] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.14.0-rc1-37710-g23c8f02 #1
> [   75.649627] [<c0014d18>] (unwind_backtrace) from [<c00119e8>] (show_stack+0x10/0x14)
> [   75.649627] [<c00119e8>] (show_stack) from [<c065b61c>] (dump_stack+0x78/0x94)
> [   75.662689] [<c065b61c>] (dump_stack) from [<c003e9a4>] (warn_slowpath_common+0x60/0x84)
> [   75.670410] [<c003e9a4>] (warn_slowpath_common) from [<c003ea24>] (warn_slowpath_fmt+0x30/0x40)
> [   75.677673] [<c003ea24>] (warn_slowpath_fmt) from [<c005d7b0>] (__migrate_hrtimers+0x17c/0x1bc)
> [   75.677673] [<c005d7b0>] (__migrate_hrtimers) from [<c009e004>] (generic_smp_call_function_single_interrupt+0x8c/0x104)
> [   75.699645] [<c009e004>] (generic_smp_call_function_single_interrupt) from [<c00134d0>] (handle_IPI+0xa4/0x16c)
> [   75.706970] [<c00134d0>] (handle_IPI) from [<c0008614>] (gic_handle_irq+0x54/0x5c)
> [   75.715087] [<c0008614>] (gic_handle_irq) from [<c0012624>] (__irq_svc+0x44/0x5c)
> [   75.725311] Exception stack(0xc08a3f58 to 0xc08a3fa0)

I couldn't understand why we went via a interrupt here ? Probably CPU1
was idle and was woken up with a IPI and then this happened. But in
that case too,
shouldn't the script run from process context instead ?

> I also backported your patches to Linux 3.10.y and found the same problem
> both in ARM and x86_64.

There are very few changes in between 3.10 and latest for timers/hrtimers
and so things are expected to be the same.

> However, I think I figured out the reason for those
> errors. Please, could you check the patch below (it applies on the top of
> your tree, branch isolate-cpusets) and let me know what you think?

Okay, just to let you know, I have also found some issues and they are
now pushed in my tree.. Also it is rebased over 3.15-rc2 now.

> -------------------------PATCH STARTS HERE---------------------------------
> cpuset: quiesce: change irq disable/enable by irq save/restore
>
> The function __migrate_timers can be called under interrupt context
> or thread context depending on the core where the system call was
> executed. In case it executes under interrupt context, it

How exactly?

> seems a bad idea to leave interrupts enabled after migrating the
> timers. In fact, this caused kernel errors on the ARM architecture and
> on the x86_64 architecture with the 3.10 kernel (backported version
> of the cpuset-quiesce patch).

I can't keep it as a separate patch and so would be required to merge
it into my original patch..

Thanks for your inputs :)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC 0/4] Migrate timers away from cpuset on setting cpuset.quiesce
@ 2014-04-24  8:31       ` Daniel Sangorrin
  0 siblings, 0 replies; 32+ messages in thread
From: Daniel Sangorrin @ 2014-04-24  8:31 UTC (permalink / raw)
  To: Viresh Kumar, Daniel Sangorrin
  Cc: Thomas Gleixner, Frédéric Weisbecker, Peter Zijlstra,
	Ingo Molnar, Tejun Heo, Li Zefan, Lists linaro-kernel,
	Linux Kernel Mailing List, Cgroups


On 2014/04/24 16:43, Viresh Kumar wrote:
> On 24 April 2014 12:55, Daniel Sangorrin <daniel.sangorrin@toshiba.co.jp> wrote:
>> I tried your set of patches for isolating particular CPU cores from unpinned
>> timers. On x86_64 they were working fine, however I found out that on ARM
>> they would fail under the following test:
> 
> I am happy that these drew attention from somebody Atleast :)

Thanks to you for your hard work.

>> # mount -t cpuset none /cpuset
>> # cd /cpuset
>> # mkdir rt
>> # cd rt
>> # echo 1 > cpus
>> # echo 1 > cpu_exclusive
>> # cd
>> # taskset 0x2 ./setquiesce.sh  <--- contains "echo 1 > /cpuset/rt/quiesce"
>> [   75.622375] ------------[ cut here ]------------
>> [   75.627258] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:2595 __migrate_hrtimers+0x17c/0x1bc()
>> [   75.636840] DEBUG_LOCKS_WARN_ON(current->hardirq_context)
>> [   75.636840] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.14.0-rc1-37710-g23c8f02 #1
>> [   75.649627] [<c0014d18>] (unwind_backtrace) from [<c00119e8>] (show_stack+0x10/0x14)
>> [   75.649627] [<c00119e8>] (show_stack) from [<c065b61c>] (dump_stack+0x78/0x94)
>> [   75.662689] [<c065b61c>] (dump_stack) from [<c003e9a4>] (warn_slowpath_common+0x60/0x84)
>> [   75.670410] [<c003e9a4>] (warn_slowpath_common) from [<c003ea24>] (warn_slowpath_fmt+0x30/0x40)
>> [   75.677673] [<c003ea24>] (warn_slowpath_fmt) from [<c005d7b0>] (__migrate_hrtimers+0x17c/0x1bc)
>> [   75.677673] [<c005d7b0>] (__migrate_hrtimers) from [<c009e004>] (generic_smp_call_function_single_interrupt+0x8c/0x104)
>> [   75.699645] [<c009e004>] (generic_smp_call_function_single_interrupt) from [<c00134d0>] (handle_IPI+0xa4/0x16c)
>> [   75.706970] [<c00134d0>] (handle_IPI) from [<c0008614>] (gic_handle_irq+0x54/0x5c)
>> [   75.715087] [<c0008614>] (gic_handle_irq) from [<c0012624>] (__irq_svc+0x44/0x5c)
>> [   75.725311] Exception stack(0xc08a3f58 to 0xc08a3fa0)
> 
> I couldn't understand why we went via a interrupt here ? Probably CPU1
> was idle and was woken up with a IPI and then this happened. But in
> that case too,
> shouldn't the script run from process context instead ?

In kernel/cpuset.c:quiesce_cpuset() you are using the function
'smp_call_function_any' which asks CPU cores in 'cpumask' to
execute the functions 'hrtimer_quiesce_cpu' and 'timer_quiesce_cpu'.

In the case above, 'cpumask' corresponds to core 0. Since I'm forcing
the call to be executed from core 1 (by using taskset),
an inter-processor interrupt is sent to core 0 for those functions
to be executed.

>> I also backported your patches to Linux 3.10.y and found the same problem
>> both in ARM and x86_64.
> 
> There are very few changes in between 3.10 and latest for timers/hrtimers
> and so things are expected to be the same.
> 
>> However, I think I figured out the reason for those
>> errors. Please, could you check the patch below (it applies on the top of
>> your tree, branch isolate-cpusets) and let me know what you think?
> 
> Okay, just to let you know, I have also found some issues and they are
> now pushed in my tree.. Also it is rebased over 3.15-rc2 now.

Ok, thank you! I see that you have already fixed the problem. I tested
your tree on ARM and now it seems to work correctly.



> 
>> -------------------------PATCH STARTS HERE---------------------------------
>> cpuset: quiesce: change irq disable/enable by irq save/restore
>>
>> The function __migrate_timers can be called under interrupt context
>> or thread context depending on the core where the system call was
>> executed. In case it executes under interrupt context, it
> 
> How exactly?

See my reply above.

>> seems a bad idea to leave interrupts enabled after migrating the
>> timers. In fact, this caused kernel errors on the ARM architecture and
>> on the x86_64 architecture with the 3.10 kernel (backported version
>> of the cpuset-quiesce patch).
> 
> I can't keep it as a separate patch and so would be required to merge
> it into my original patch..
> 
> Thanks for your inputs :)
> --
> To unsubscribe from this list: send the line "unsubscribe cgroups" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

Thanks,
Daniel

-- 
Toshiba Corporate Software Engineering Center
Daniel SANGORRIN
E-mail:  daniel.sangorrin@toshiba.co.jp

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC 0/4] Migrate timers away from cpuset on setting cpuset.quiesce
@ 2014-04-24  8:31       ` Daniel Sangorrin
  0 siblings, 0 replies; 32+ messages in thread
From: Daniel Sangorrin @ 2014-04-24  8:31 UTC (permalink / raw)
  To: Viresh Kumar, Daniel Sangorrin
  Cc: Thomas Gleixner, Frédéric Weisbecker, Peter Zijlstra,
	Ingo Molnar, Tejun Heo, Li Zefan, Lists linaro-kernel,
	Linux Kernel Mailing List, Cgroups


On 2014/04/24 16:43, Viresh Kumar wrote:
> On 24 April 2014 12:55, Daniel Sangorrin <daniel.sangorrin-g3qfMnKXm6lL9jVzuh4AOg@public.gmane.org> wrote:
>> I tried your set of patches for isolating particular CPU cores from unpinned
>> timers. On x86_64 they were working fine, however I found out that on ARM
>> they would fail under the following test:
> 
> I am happy that these drew attention from somebody Atleast :)

Thanks to you for your hard work.

>> # mount -t cpuset none /cpuset
>> # cd /cpuset
>> # mkdir rt
>> # cd rt
>> # echo 1 > cpus
>> # echo 1 > cpu_exclusive
>> # cd
>> # taskset 0x2 ./setquiesce.sh  <--- contains "echo 1 > /cpuset/rt/quiesce"
>> [   75.622375] ------------[ cut here ]------------
>> [   75.627258] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:2595 __migrate_hrtimers+0x17c/0x1bc()
>> [   75.636840] DEBUG_LOCKS_WARN_ON(current->hardirq_context)
>> [   75.636840] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.14.0-rc1-37710-g23c8f02 #1
>> [   75.649627] [<c0014d18>] (unwind_backtrace) from [<c00119e8>] (show_stack+0x10/0x14)
>> [   75.649627] [<c00119e8>] (show_stack) from [<c065b61c>] (dump_stack+0x78/0x94)
>> [   75.662689] [<c065b61c>] (dump_stack) from [<c003e9a4>] (warn_slowpath_common+0x60/0x84)
>> [   75.670410] [<c003e9a4>] (warn_slowpath_common) from [<c003ea24>] (warn_slowpath_fmt+0x30/0x40)
>> [   75.677673] [<c003ea24>] (warn_slowpath_fmt) from [<c005d7b0>] (__migrate_hrtimers+0x17c/0x1bc)
>> [   75.677673] [<c005d7b0>] (__migrate_hrtimers) from [<c009e004>] (generic_smp_call_function_single_interrupt+0x8c/0x104)
>> [   75.699645] [<c009e004>] (generic_smp_call_function_single_interrupt) from [<c00134d0>] (handle_IPI+0xa4/0x16c)
>> [   75.706970] [<c00134d0>] (handle_IPI) from [<c0008614>] (gic_handle_irq+0x54/0x5c)
>> [   75.715087] [<c0008614>] (gic_handle_irq) from [<c0012624>] (__irq_svc+0x44/0x5c)
>> [   75.725311] Exception stack(0xc08a3f58 to 0xc08a3fa0)
> 
> I couldn't understand why we went via a interrupt here ? Probably CPU1
> was idle and was woken up with a IPI and then this happened. But in
> that case too,
> shouldn't the script run from process context instead ?

In kernel/cpuset.c:quiesce_cpuset() you are using the function
'smp_call_function_any' which asks CPU cores in 'cpumask' to
execute the functions 'hrtimer_quiesce_cpu' and 'timer_quiesce_cpu'.

In the case above, 'cpumask' corresponds to core 0. Since I'm forcing
the call to be executed from core 1 (by using taskset),
an inter-processor interrupt is sent to core 0 for those functions
to be executed.

>> I also backported your patches to Linux 3.10.y and found the same problem
>> both in ARM and x86_64.
> 
> There are very few changes in between 3.10 and latest for timers/hrtimers
> and so things are expected to be the same.
> 
>> However, I think I figured out the reason for those
>> errors. Please, could you check the patch below (it applies on the top of
>> your tree, branch isolate-cpusets) and let me know what you think?
> 
> Okay, just to let you know, I have also found some issues and they are
> now pushed in my tree.. Also it is rebased over 3.15-rc2 now.

Ok, thank you! I see that you have already fixed the problem. I tested
your tree on ARM and now it seems to work correctly.



> 
>> -------------------------PATCH STARTS HERE---------------------------------
>> cpuset: quiesce: change irq disable/enable by irq save/restore
>>
>> The function __migrate_timers can be called under interrupt context
>> or thread context depending on the core where the system call was
>> executed. In case it executes under interrupt context, it
> 
> How exactly?

See my reply above.

>> seems a bad idea to leave interrupts enabled after migrating the
>> timers. In fact, this caused kernel errors on the ARM architecture and
>> on the x86_64 architecture with the 3.10 kernel (backported version
>> of the cpuset-quiesce patch).
> 
> I can't keep it as a separate patch and so would be required to merge
> it into my original patch..
> 
> Thanks for your inputs :)
> --
> To unsubscribe from this list: send the line "unsubscribe cgroups" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

Thanks,
Daniel

-- 
Toshiba Corporate Software Engineering Center
Daniel SANGORRIN
E-mail:  daniel.sangorrin-g3qfMnKXm6lL9jVzuh4AOg@public.gmane.org

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC 0/4] Migrate timers away from cpuset on setting cpuset.quiesce
@ 2014-04-24  8:41         ` Viresh Kumar
  0 siblings, 0 replies; 32+ messages in thread
From: Viresh Kumar @ 2014-04-24  8:41 UTC (permalink / raw)
  To: Daniel Sangorrin
  Cc: Thomas Gleixner, Frédéric Weisbecker, Peter Zijlstra,
	Ingo Molnar, Tejun Heo, Li Zefan, Lists linaro-kernel,
	Linux Kernel Mailing List, Cgroups

On 24 April 2014 14:01, Daniel Sangorrin <daniel.sangorrin@toshiba.co.jp> wrote:
> In kernel/cpuset.c:quiesce_cpuset() you are using the function
> 'smp_call_function_any' which asks CPU cores in 'cpumask' to
> execute the functions 'hrtimer_quiesce_cpu' and 'timer_quiesce_cpu'.
>
> In the case above, 'cpumask' corresponds to core 0. Since I'm forcing
> the call to be executed from core 1 (by using taskset),
> an inter-processor interrupt is sent to core 0 for those functions
> to be executed.

Ahh, I understood that now :) .. So we are setting cpuset.quiesce from CPU1
which will do a IPI to get migrate_timers called on CPU0.. I was setting quiesce
from CPU0 only in my tests :)

But how does this work fine on x86 then? There we should have exactly same
problem, isn't it?

> Ok, thank you! I see that you have already fixed the problem. I tested
> your tree on ARM and now it seems to work correctly.

Yeah, I just pushed your changes as well at the time I wrote last mail :)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC 0/4] Migrate timers away from cpuset on setting cpuset.quiesce
@ 2014-04-24  8:41         ` Viresh Kumar
  0 siblings, 0 replies; 32+ messages in thread
From: Viresh Kumar @ 2014-04-24  8:41 UTC (permalink / raw)
  To: Daniel Sangorrin
  Cc: Thomas Gleixner, Frédéric Weisbecker, Peter Zijlstra,
	Ingo Molnar, Tejun Heo, Li Zefan, Lists linaro-kernel,
	Linux Kernel Mailing List, Cgroups

On 24 April 2014 14:01, Daniel Sangorrin <daniel.sangorrin-g3qfMnKXm6lL9jVzuh4AOg@public.gmane.org> wrote:
> In kernel/cpuset.c:quiesce_cpuset() you are using the function
> 'smp_call_function_any' which asks CPU cores in 'cpumask' to
> execute the functions 'hrtimer_quiesce_cpu' and 'timer_quiesce_cpu'.
>
> In the case above, 'cpumask' corresponds to core 0. Since I'm forcing
> the call to be executed from core 1 (by using taskset),
> an inter-processor interrupt is sent to core 0 for those functions
> to be executed.

Ahh, I understood that now :) .. So we are setting cpuset.quiesce from CPU1
which will do a IPI to get migrate_timers called on CPU0.. I was setting quiesce
from CPU0 only in my tests :)

But how does this work fine on x86 then? There we should have exactly same
problem, isn't it?

> Ok, thank you! I see that you have already fixed the problem. I tested
> your tree on ARM and now it seems to work correctly.

Yeah, I just pushed your changes as well at the time I wrote last mail :)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC 0/4] Migrate timers away from cpuset on setting cpuset.quiesce
  2014-04-24  8:41         ` Viresh Kumar
  (?)
@ 2014-04-24  9:24         ` Daniel Sangorrin
  2014-04-24  9:30           ` Viresh Kumar
  -1 siblings, 1 reply; 32+ messages in thread
From: Daniel Sangorrin @ 2014-04-24  9:24 UTC (permalink / raw)
  To: Viresh Kumar, Daniel Sangorrin
  Cc: Thomas Gleixner, Frédéric Weisbecker, Peter Zijlstra,
	Ingo Molnar, Tejun Heo, Li Zefan, Lists linaro-kernel,
	Linux Kernel Mailing List, Cgroups

On 2014/04/24 17:41, Viresh Kumar wrote:
> On 24 April 2014 14:01, Daniel Sangorrin <daniel.sangorrin@toshiba.co.jp> wrote:
>> In kernel/cpuset.c:quiesce_cpuset() you are using the function
>> 'smp_call_function_any' which asks CPU cores in 'cpumask' to
>> execute the functions 'hrtimer_quiesce_cpu' and 'timer_quiesce_cpu'.
>>
>> In the case above, 'cpumask' corresponds to core 0. Since I'm forcing
>> the call to be executed from core 1 (by using taskset),
>> an inter-processor interrupt is sent to core 0 for those functions
>> to be executed.
> 
> Ahh, I understood that now :) .. So we are setting cpuset.quiesce from CPU1
> which will do a IPI to get migrate_timers called on CPU0.. I was setting quiesce
> from CPU0 only in my tests :)
> 
> But how does this work fine on x86 then? There we should have exactly same
> problem, isn't it?

Yeah, I'm not sure why it is working on 3.15 x86_64 but not in 3.10 x86_64.
Perhaps it's related to this patch: https://lkml.org/lkml/2013/9/19/349

>> Ok, thank you! I see that you have already fixed the problem. I tested
>> your tree on ARM and now it seems to work correctly.
> 
> Yeah, I just pushed your changes as well at the time I wrote last mail :)

Oh, I see!

Why didn't you just apply the patch on top of your tree so that the
information included in the git commit (e.g: my name and mail) remains?

This part:

cpuset: quiesce: change irq disable/enable by irq save/restore

The function __migrate_timers can be called under interrupt context
or thread context depending on the core where the system call was
executed. In case it executes under interrupt context, it
seems a bad idea to leave interrupts enabled after migrating the
timers. In fact, this caused kernel errors on the ARM architecture and
on the x86_64 architecture with the 3.10 kernel (backported version
of the cpuset-quiesce patch).

Signed-off-by: Daniel Sangorrin <daniel.sangorrin@toshiba.co.jp>
Signed-off-by: Yoshitake Kobayashi <yoshitake.kobayashi@toshiba.co.jp>

Thanks,
Daniel

-- 
Toshiba Corporate Software Engineering Center
Daniel SANGORRIN
E-mail:  daniel.sangorrin@toshiba.co.jp

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC 0/4] Migrate timers away from cpuset on setting cpuset.quiesce
  2014-04-24  9:24         ` Daniel Sangorrin
@ 2014-04-24  9:30           ` Viresh Kumar
  2014-04-25  0:31               ` Daniel Sangorrin
  0 siblings, 1 reply; 32+ messages in thread
From: Viresh Kumar @ 2014-04-24  9:30 UTC (permalink / raw)
  To: Daniel Sangorrin
  Cc: Thomas Gleixner, Frédéric Weisbecker, Peter Zijlstra,
	Ingo Molnar, Tejun Heo, Li Zefan, Lists linaro-kernel,
	Linux Kernel Mailing List, Cgroups

On 24 April 2014 14:54, Daniel Sangorrin <daniel.sangorrin@toshiba.co.jp> wrote:
> Why didn't you just apply the patch on top of your tree so that the
> information included in the git commit (e.g: my name and mail) remains?
>
> This part:
>
> cpuset: quiesce: change irq disable/enable by irq save/restore
>
> The function __migrate_timers can be called under interrupt context
> or thread context depending on the core where the system call was
> executed. In case it executes under interrupt context, it
> seems a bad idea to leave interrupts enabled after migrating the
> timers. In fact, this caused kernel errors on the ARM architecture and
> on the x86_64 architecture with the 3.10 kernel (backported version
> of the cpuset-quiesce patch).
>
> Signed-off-by: Daniel Sangorrin <daniel.sangorrin@toshiba.co.jp>
> Signed-off-by: Yoshitake Kobayashi <yoshitake.kobayashi@toshiba.co.jp>

That's what I told you earlier when I said this:

> I can't keep it as a separate patch and so would be required to merge
> it into my original patch..

And the reason being: "No patch is supposed to break things, otherwise
git bisect wouldn't work smoothly".. And so git bisect would complain
this issue after my patch and so I have to merge the fixes you gave into
the original patch as its not yet merged.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC 0/4] Migrate timers away from cpuset on setting cpuset.quiesce
@ 2014-04-25  0:31               ` Daniel Sangorrin
  0 siblings, 0 replies; 32+ messages in thread
From: Daniel Sangorrin @ 2014-04-25  0:31 UTC (permalink / raw)
  To: Viresh Kumar, Daniel Sangorrin
  Cc: Thomas Gleixner, Frédéric Weisbecker, Peter Zijlstra,
	Ingo Molnar, Tejun Heo, Li Zefan, Lists linaro-kernel,
	Linux Kernel Mailing List, Cgroups


>> I can't keep it as a separate patch and so would be required to merge
>> it into my original patch..
> 
> And the reason being: "No patch is supposed to break things, otherwise
> git bisect wouldn't work smoothly".. And so git bisect would complain
> this issue after my patch and so I have to merge the fixes you gave into
> the original patch as its not yet merged.
> --
> To unsubscribe from this list: send the line "unsubscribe cgroups" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Ok, no problem then.

Thanks,
Daniel

-- 
Toshiba Corporate Software Engineering Center
Daniel SANGORRIN
E-mail:  daniel.sangorrin@toshiba.co.jp

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC 0/4] Migrate timers away from cpuset on setting cpuset.quiesce
@ 2014-04-25  0:31               ` Daniel Sangorrin
  0 siblings, 0 replies; 32+ messages in thread
From: Daniel Sangorrin @ 2014-04-25  0:31 UTC (permalink / raw)
  To: Viresh Kumar, Daniel Sangorrin
  Cc: Thomas Gleixner, Frédéric Weisbecker, Peter Zijlstra,
	Ingo Molnar, Tejun Heo, Li Zefan, Lists linaro-kernel,
	Linux Kernel Mailing List, Cgroups


>> I can't keep it as a separate patch and so would be required to merge
>> it into my original patch..
> 
> And the reason being: "No patch is supposed to break things, otherwise
> git bisect wouldn't work smoothly".. And so git bisect would complain
> this issue after my patch and so I have to merge the fixes you gave into
> the original patch as its not yet merged.
> --
> To unsubscribe from this list: send the line "unsubscribe cgroups" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Ok, no problem then.

Thanks,
Daniel

-- 
Toshiba Corporate Software Engineering Center
Daniel SANGORRIN
E-mail:  daniel.sangorrin-g3qfMnKXm6lL9jVzuh4AOg@public.gmane.org

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC 0/4] Migrate timers away from cpuset on setting cpuset.quiesce
@ 2014-04-25  4:51                 ` Viresh Kumar
  0 siblings, 0 replies; 32+ messages in thread
From: Viresh Kumar @ 2014-04-25  4:51 UTC (permalink / raw)
  To: Daniel Sangorrin
  Cc: Thomas Gleixner, Frédéric Weisbecker, Peter Zijlstra,
	Ingo Molnar, Tejun Heo, Li Zefan, Lists linaro-kernel,
	Linux Kernel Mailing List, Cgroups

On 25 April 2014 06:01, Daniel Sangorrin <daniel.sangorrin@toshiba.co.jp> wrote:
>
>>> I can't keep it as a separate patch and so would be required to merge
>>> it into my original patch..
>>
>> And the reason being: "No patch is supposed to break things, otherwise
>> git bisect wouldn't work smoothly".. And so git bisect would complain
>> this issue after my patch and so I have to merge the fixes you gave into
>> the original patch as its not yet merged.
>> --
>> To unsubscribe from this list: send the line "unsubscribe cgroups" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
> Ok, no problem then.

Do you want me to add your Tested-by for my next version ?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC 0/4] Migrate timers away from cpuset on setting cpuset.quiesce
@ 2014-04-25  4:51                 ` Viresh Kumar
  0 siblings, 0 replies; 32+ messages in thread
From: Viresh Kumar @ 2014-04-25  4:51 UTC (permalink / raw)
  To: Daniel Sangorrin
  Cc: Thomas Gleixner, Frédéric Weisbecker, Peter Zijlstra,
	Ingo Molnar, Tejun Heo, Li Zefan, Lists linaro-kernel,
	Linux Kernel Mailing List, Cgroups

On 25 April 2014 06:01, Daniel Sangorrin <daniel.sangorrin-g3qfMnKXm6lL9jVzuh4AOg@public.gmane.org> wrote:
>
>>> I can't keep it as a separate patch and so would be required to merge
>>> it into my original patch..
>>
>> And the reason being: "No patch is supposed to break things, otherwise
>> git bisect wouldn't work smoothly".. And so git bisect would complain
>> this issue after my patch and so I have to merge the fixes you gave into
>> the original patch as its not yet merged.
>> --
>> To unsubscribe from this list: send the line "unsubscribe cgroups" in
>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
> Ok, no problem then.

Do you want me to add your Tested-by for my next version ?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC 0/4] Migrate timers away from cpuset on setting cpuset.quiesce
@ 2014-04-25  5:21                   ` Daniel Sangorrin
  0 siblings, 0 replies; 32+ messages in thread
From: Daniel Sangorrin @ 2014-04-25  5:21 UTC (permalink / raw)
  To: Viresh Kumar, Daniel Sangorrin
  Cc: Thomas Gleixner, Frédéric Weisbecker, Peter Zijlstra,
	Ingo Molnar, Tejun Heo, Li Zefan, Lists linaro-kernel,
	Linux Kernel Mailing List, Cgroups



On 2014/04/25 13:51, Viresh Kumar wrote:
> On 25 April 2014 06:01, Daniel Sangorrin <daniel.sangorrin@toshiba.co.jp> wrote:
>>
>>>> I can't keep it as a separate patch and so would be required to merge
>>>> it into my original patch..
>>>
>>> And the reason being: "No patch is supposed to break things, otherwise
>>> git bisect wouldn't work smoothly".. And so git bisect would complain
>>> this issue after my patch and so I have to merge the fixes you gave into
>>> the original patch as its not yet merged.
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe cgroups" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>> Ok, no problem then.
> 
> Do you want me to add your Tested-by for my next version ?
> 
> 

Sure.

Tested-by: Daniel Sangorrin <daniel.sangorrin@toshiba.co.jp>

-- 
Toshiba Corporate Software Engineering Center
Daniel SANGORRIN
E-mail:  daniel.sangorrin@toshiba.co.jp

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC 0/4] Migrate timers away from cpuset on setting cpuset.quiesce
@ 2014-04-25  5:21                   ` Daniel Sangorrin
  0 siblings, 0 replies; 32+ messages in thread
From: Daniel Sangorrin @ 2014-04-25  5:21 UTC (permalink / raw)
  To: Viresh Kumar, Daniel Sangorrin
  Cc: Thomas Gleixner, Frédéric Weisbecker, Peter Zijlstra,
	Ingo Molnar, Tejun Heo, Li Zefan, Lists linaro-kernel,
	Linux Kernel Mailing List, Cgroups



On 2014/04/25 13:51, Viresh Kumar wrote:
> On 25 April 2014 06:01, Daniel Sangorrin <daniel.sangorrin-g3qfMnKXm6lL9jVzuh4AOg@public.gmane.org> wrote:
>>
>>>> I can't keep it as a separate patch and so would be required to merge
>>>> it into my original patch..
>>>
>>> And the reason being: "No patch is supposed to break things, otherwise
>>> git bisect wouldn't work smoothly".. And so git bisect would complain
>>> this issue after my patch and so I have to merge the fixes you gave into
>>> the original patch as its not yet merged.
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe cgroups" in
>>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>> Ok, no problem then.
> 
> Do you want me to add your Tested-by for my next version ?
> 
> 

Sure.

Tested-by: Daniel Sangorrin <daniel.sangorrin-g3qfMnKXm6lL9jVzuh4AOg@public.gmane.org>

-- 
Toshiba Corporate Software Engineering Center
Daniel SANGORRIN
E-mail:  daniel.sangorrin-g3qfMnKXm6lL9jVzuh4AOg@public.gmane.org

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2014-04-25  5:21 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-03-20 13:48 [RFC 0/4] Migrate timers away from cpuset on setting cpuset.quiesce Viresh Kumar
2014-03-20 13:48 ` Viresh Kumar
2014-03-20 13:48 ` [RFC 1/4] timer: track pinned timers with TIMER_PINNED flag Viresh Kumar
2014-03-20 13:48 ` [RFC 2/4] timer: don't migrate pinned timers Viresh Kumar
2014-03-31 15:56   ` Kevin Hilman
2014-03-31 15:56     ` Kevin Hilman
2014-04-01  4:32     ` Viresh Kumar
2014-04-01  4:32       ` Viresh Kumar
2014-04-04 15:15       ` Kevin Hilman
2014-04-04 15:15         ` Kevin Hilman
2014-03-20 13:49 ` [RFC 3/4] timer: create timer_quiesce_cpu() for cpusets.quiesce option Viresh Kumar
2014-03-20 13:49 ` [RFC 4/4] cpuset: Add " Viresh Kumar
2014-03-27  2:47   ` Li Zefan
2014-03-27  2:47     ` Li Zefan
2014-03-27  4:29     ` Viresh Kumar
2014-03-27  4:29       ` Viresh Kumar
2014-04-24  7:25 ` [RFC 0/4] Migrate timers away from cpuset on setting cpuset.quiesce Daniel Sangorrin
2014-04-24  7:25   ` Daniel Sangorrin
2014-04-24  7:43   ` Viresh Kumar
2014-04-24  7:43     ` Viresh Kumar
2014-04-24  8:31     ` Daniel Sangorrin
2014-04-24  8:31       ` Daniel Sangorrin
2014-04-24  8:41       ` Viresh Kumar
2014-04-24  8:41         ` Viresh Kumar
2014-04-24  9:24         ` Daniel Sangorrin
2014-04-24  9:30           ` Viresh Kumar
2014-04-25  0:31             ` Daniel Sangorrin
2014-04-25  0:31               ` Daniel Sangorrin
2014-04-25  4:51               ` Viresh Kumar
2014-04-25  4:51                 ` Viresh Kumar
2014-04-25  5:21                 ` Daniel Sangorrin
2014-04-25  5:21                   ` Daniel Sangorrin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.