* [RFC PATCH(Experimental) 0/4] Freezer based Cpu-hotplug
From: Gautham R Shenoy @ 2007-02-14 14:40 UTC
  To: akpm, paulmck, mingo
  Cc: vatsa, dipankar, venkatesh.pallipadi, linux-kernel, oleg, rjw

Hello Everybody,

This is an experiment towards a process-freezer-based implementation
of cpu-hotplug. It is based mainly on ideas from Andrew Morton,
Ingo Molnar and Paul McKenney in the discussion at
http://lkml.org/lkml/2007/1/31/323.

This is a bare-minimum implementation, intended only to check the
feasibility of using the process freezer for cpu-hotplug.

The patchset comprises four patches:
o PATCH 1/4: Core implementation of freezer-based-hotplug.
o PATCH 2/4: Revert changes to workqueue to make it work with the
             freezer-cpu-hotplug.
o PATCH 3/4: Eliminate hotcpu subsystem mutexes from sched and slab.
o PATCH 4/4: Eliminate lock_cpu_hotplug from the kernel.

These patches have been stress-tested on an i386 SMP box, with
cpu-hotplug operations running in a tight loop while make -jN (a kernel
compile) ran in parallel. The cpu-hotplug operations have been largely
stable.
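The tight hotplug loop described above can be reproduced with a small userspace program. The sketch below is an assumption, not the author's actual test harness (which is not shown): it toggles one CPU through the standard sysfs "online" attribute, and the helper name hotplug_cycle() is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Offline and re-online one CPU `iterations` times by writing "0" and
 * then "1" to its sysfs "online" attribute, e.g.
 * /sys/devices/system/cpu/cpu1/online (needs root and
 * CONFIG_HOTPLUG_CPU). Any writable file can be used for a dry run.
 * Returns the number of completed cycles, or -1 on the first failure.
 * hotplug_cycle() is a hypothetical helper, not a kernel interface.
 */
static int hotplug_cycle(const char *online_path, int iterations)
{
	static const char *const states[2] = { "0\n", "1\n" };
	int done;

	assert(online_path != NULL && iterations >= 0);

	for (done = 0; done < iterations; done++) {
		int i;

		for (i = 0; i < 2; i++) {
			FILE *f = fopen(online_path, "w");

			if (!f)
				return -1;
			if (fputs(states[i], f) == EOF) {
				fclose(f);
				return -1;
			}
			/*
			 * stdio buffers the write, so an error returned by
			 * the kernel for the store (e.g. -EBUSY) is
			 * reported when fclose() flushes the buffer.
			 */
			if (fclose(f) == EOF)
				return -1;
		}
	}
	return done;
}
```

Run as root alongside make -jN, a call such as hotplug_cycle("/sys/devices/system/cpu/cpu1/online", 1000) approximates the described setup; an offline attempt that fails (for instance because freezing failed and -EBUSY was returned) should surface as a write error and end the loop.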

Observation:
-------------
Certain threads, such as ksoftirqd/1 and firmware.agent, were
occasionally not frozen during the hotplug operations. However, these
were rare occurrences.

This implementation is by no means perfect or complete. Things
that are yet to be done are as follows:

- Most of Paul's suggestions, including the introduction of new
  process-freezer states such as PFE_SUSPEND, PFE_KPROBES and
  PFE_HOTPLUG, so that processes which are not concerned with these
  events can be exempted from being frozen.

- Ensure that cpu-hotplug works on all architectures on which it is
  supported. At present, it has been tested only on i386.

- Update the documentation for cpu-hotplug.
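The per-event freezer states from Paul's suggestion could be modelled as a bitmask of the events each task must freeze for. The following userspace sketch is purely illustrative: only the PFE_* names come from the proposal; the bit encoding, struct model_task and the model_* helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding: one bit per freeze event. */
#define PFE_SUSPEND	(1u << 0)
#define PFE_KPROBES	(1u << 1)
#define PFE_HOTPLUG	(1u << 2)

struct model_task {
	const char *comm;
	unsigned int freeze_events;	/* events this task must freeze for */
	int frozen;
};

/* Freeze only the tasks concerned with @event; return how many froze. */
static size_t model_freeze(struct model_task *tasks, size_t n,
			   unsigned int event)
{
	size_t i, count = 0;

	assert(event != 0);
	for (i = 0; i < n; i++) {
		if (tasks[i].freeze_events & event) {
			tasks[i].frozen = 1;
			count++;
		}
	}
	return count;
}

/* Thawing is event-agnostic in this model: every task is released. */
static void model_thaw(struct model_task *tasks, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		tasks[i].frozen = 0;
}
```

With this encoding, a hotplug operation would freeze only the tasks carrying PFE_HOTPLUG, while a suspend would still freeze everything that opted into PFE_SUSPEND.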

Any feedback would be greatly appreciated.

Thanks and Regards
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

* [RFC PATCH(Experimental) 1/4] freezer-cpu-hotplug core
From: Gautham R Shenoy @ 2007-02-14 14:42 UTC
  To: akpm, paulmck, mingo
  Cc: vatsa, dipankar, venkatesh.pallipadi, linux-kernel, oleg, rjw

This patch implements the process-freezer-based cpu-hotplug
core. The salient features are:
o No more (un)lock_cpu_hotplug.
o No more CPU_LOCK_ACQUIRE and CPU_LOCK_RELEASE. Hence no per-subsystem
  hotcpu mutexes.
o Calls freeze_processes()/thaw_processes() at the beginning/end of
  the hotplug operation.
o Splits CPU_DEAD into two events, namely:
  - CPU_DEAD: handled while the processes are still frozen.

  - CPU_DEAD_KILL_THREADS: handled after thaw_processes() has been
    called.

 This split is required because per-cpu threads cannot be stopped
 with kthread_stop() while the system is in the frozen state: a frozen
 thread cannot run to notice kthread_should_stop() and exit, so
 kthread_stop() would wait for it indefinitely. Hence subsystems which
 have created per-cpu threads have to stop them only after
 thaw_processes() has been called.

o Handles CPU_DEAD and CPU_DEAD_KILL_THREADS for subsystems which
  create per-cpu threads.
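The ordering that this split imposes on _cpu_down() can be sketched as a userspace model. Only the event names and values come from this patch (include/linux/notifier.h); the model_* state and helpers are hypothetical stand-ins for the freezer and for a subsystem owning one per-cpu kthread. The asserts encode the rule above: CPU_DEAD work runs while frozen, thread teardown only after the thaw.

```c
#include <assert.h>
#include <errno.h>

/* Event values as defined in include/linux/notifier.h by this patch. */
#define CPU_DOWN_PREPARE	0x0005
#define CPU_DEAD		0x0007
#define CPU_DEAD_KILL_THREADS	0x0008

/* Hypothetical model state. */
static int model_frozen;		/* are processes frozen right now?  */
static int model_thread_alive = 1;	/* the subsystem's per-cpu kthread  */
static int model_pending_work = 2;	/* work still queued on that cpu    */
static int model_moved_work;		/* work taken over by a live cpu    */

/* A subsystem's hotplug callback, split the way this patch splits it. */
static void model_callback(unsigned long action)
{
	switch (action) {
	case CPU_DEAD:
		/* Runs while frozen: move data only, never stop threads. */
		assert(model_frozen);
		model_moved_work += model_pending_work;
		model_pending_work = 0;
		break;
	case CPU_DEAD_KILL_THREADS:
		/* kthread_stop() needs the thread to run, so thaw first. */
		assert(!model_frozen);
		model_thread_alive = 0;
		break;
	}
}

/*
 * Mirrors the order of operations in the patched _cpu_down();
 * freeze_fails simulates freeze_processes() returning non-zero.
 */
static int model_cpu_down(int freeze_fails)
{
	model_frozen = 1;			/* freeze_processes() */
	if (freeze_fails) {
		model_frozen = 0;		/* thaw_processes()   */
		return -EBUSY;
	}
	model_callback(CPU_DOWN_PREPARE);
	/* ... __stop_machine_run(take_cpu_down, ...) removes the cpu ... */
	model_callback(CPU_DEAD);
	model_frozen = 0;			/* thaw_processes()   */
	model_callback(CPU_DEAD_KILL_THREADS);
	return 0;
}
```

A failed freeze leaves the subsystem untouched and reports -EBUSY; a successful run migrates the pending work under the freeze and stops the thread only once everything has been thawed.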

Points to ponder:
o Can calling CPU_DOWN_PREPARE/CPU_UP_PREPARE in the frozen context
  create any dirty dependencies in the future?

o Can the SYSTEM_RUNNING hack in _cpu_up be avoided by some cleaner
  means?
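Regarding the second point, one possible way (a sketch only, not part of this patch) to keep freeze and thaw paired in _cpu_up() without re-testing system_state at every exit is to record locally whether processes were actually frozen. All names below (model_system_state, model_freeze_processes() and so on) are hypothetical stand-ins for system_state, freeze_processes() and thaw_processes().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for system_state and the freezer. */
enum model_state { MODEL_BOOTING, MODEL_RUNNING };

static enum model_state model_system_state = MODEL_BOOTING;
static int model_freeze_calls, model_thaw_calls;
static bool model_freeze_should_fail;

static int model_freeze_processes(void)
{
	model_freeze_calls++;
	return model_freeze_should_fail ? 1 : 0;
}

static void model_thaw_processes(void)
{
	model_thaw_calls++;
}

/*
 * Bring-up path: freeze only when the system is fully running, and
 * remember that decision so the matching thaw does not depend on
 * system_state being re-read later. arch_ret simulates __cpu_up().
 */
static int model_cpu_up(int arch_ret)
{
	bool froze = false;
	int ret;

	if (model_system_state == MODEL_RUNNING) {
		if (model_freeze_processes()) {
			model_thaw_processes();	/* undo a partial freeze */
			return -EBUSY;
		}
		froze = true;
	}

	/* CPU_UP_PREPARE, __cpu_up() and CPU_ONLINE would go here. */
	ret = arch_ret;

	if (froze)
		model_thaw_processes();
	return ret;
}
```

Boot-time bring-up then never touches the freezer, and every successful freeze is matched by exactly one thaw on both the success and failure paths.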

Signed-off-by : Srivatsa Vaddagiri <vatsa@in.ibm.com>
Signed-off-by : Gautham R Shenoy   <ego@in.ibm.com>
--
 include/linux/notifier.h |    3 --
 kernel/cpu.c             |   68 +++++++++++++++++++----------------------------
 kernel/sched.c           |    7 +++-
 kernel/softirq.c         |   10 ++++--
 kernel/softlockup.c      |    3 +-
 kernel/workqueue.c       |   16 ++++++++++-
 6 files changed, 58 insertions(+), 49 deletions(-)

Index: hotplug/kernel/cpu.c
===================================================================
--- hotplug.orig/kernel/cpu.c
+++ hotplug/kernel/cpu.c
@@ -14,6 +14,7 @@
 #include <linux/kthread.h>
 #include <linux/stop_machine.h>
 #include <linux/mutex.h>
+#include <linux/freezer.h>
 
 /* This protects CPUs going up and down... */
 static DEFINE_MUTEX(cpu_add_remove_lock);
@@ -28,38 +29,15 @@ static int cpu_hotplug_disabled;
 
 #ifdef CONFIG_HOTPLUG_CPU
 
-/* Crappy recursive lock-takers in cpufreq! Complain loudly about idiots */
-static struct task_struct *recursive;
-static int recursive_depth;
-
 void lock_cpu_hotplug(void)
 {
-	struct task_struct *tsk = current;
-
-	if (tsk == recursive) {
-		static int warnings = 10;
-		if (warnings) {
-			printk(KERN_ERR "Lukewarm IQ detected in hotplug locking\n");
-			WARN_ON(1);
-			warnings--;
-		}
-		recursive_depth++;
-		return;
-	}
-	mutex_lock(&cpu_bitmask_lock);
-	recursive = tsk;
+	return;
 }
 EXPORT_SYMBOL_GPL(lock_cpu_hotplug);
 
 void unlock_cpu_hotplug(void)
 {
-	WARN_ON(recursive != current);
-	if (recursive_depth) {
-		recursive_depth--;
-		return;
-	}
-	recursive = NULL;
-	mutex_unlock(&cpu_bitmask_lock);
+	return;
 }
 EXPORT_SYMBOL_GPL(unlock_cpu_hotplug);
 
@@ -133,7 +111,11 @@ static int _cpu_down(unsigned int cpu)
 	if (!cpu_online(cpu))
 		return -EINVAL;
 
-	raw_notifier_call_chain(&cpu_chain, CPU_LOCK_ACQUIRE, hcpu);
+	if (freeze_processes()) {
+		thaw_processes();
+		return -EBUSY;
+	}
+
 	err = __raw_notifier_call_chain(&cpu_chain, CPU_DOWN_PREPARE,
 					hcpu, -1, &nr_calls);
 	if (err == NOTIFY_BAD) {
@@ -141,8 +123,8 @@ static int _cpu_down(unsigned int cpu)
 					  nr_calls, NULL);
 		printk("%s: attempt to take down CPU %u failed\n",
 				__FUNCTION__, cpu);
-		err = -EINVAL;
-		goto out_release;
+		thaw_processes();
+		return -EINVAL;
 	}
 
 	/* Ensure that we are not runnable on dying cpu */
@@ -151,9 +133,7 @@ static int _cpu_down(unsigned int cpu)
 	cpu_clear(cpu, tmp);
 	set_cpus_allowed(current, tmp);
 
-	mutex_lock(&cpu_bitmask_lock);
 	p = __stop_machine_run(take_cpu_down, NULL, cpu);
-	mutex_unlock(&cpu_bitmask_lock);
 
 	if (IS_ERR(p) || cpu_online(cpu)) {
 		/* CPU didn't die: tell everyone.  Can't complain. */
@@ -161,9 +141,12 @@ static int _cpu_down(unsigned int cpu)
 					    hcpu) == NOTIFY_BAD)
 			BUG();
 
+		set_cpus_allowed(current, old_allowed);
+		thaw_processes();
+
 		if (IS_ERR(p)) {
 			err = PTR_ERR(p);
-			goto out_allowed;
+			return err;
 		}
 		goto out_thread;
 	}
@@ -185,13 +168,12 @@ static int _cpu_down(unsigned int cpu)
 
 	check_for_tasks(cpu);
 
+	thaw_processes();
+
+	if (raw_notifier_call_chain(&cpu_chain, CPU_DEAD_KILL_THREADS, hcpu) == NOTIFY_BAD)
+		BUG();
 out_thread:
 	err = kthread_stop(p);
-out_allowed:
-	set_cpus_allowed(current, old_allowed);
-out_release:
-	raw_notifier_call_chain(&cpu_chain, CPU_LOCK_RELEASE,
-						(void *)(long)cpu);
 	return err;
 }
 
@@ -219,7 +201,12 @@ static int __cpuinit _cpu_up(unsigned in
 	if (cpu_online(cpu) || !cpu_present(cpu))
 		return -EINVAL;
 
-	raw_notifier_call_chain(&cpu_chain, CPU_LOCK_ACQUIRE, hcpu);
+	if (system_state == SYSTEM_RUNNING && freeze_processes()) {
+		if (system_state == SYSTEM_RUNNING)
+			thaw_processes();
+		return -EBUSY;
+	}
+
 	ret = __raw_notifier_call_chain(&cpu_chain, CPU_UP_PREPARE, hcpu,
 							-1, &nr_calls);
 	if (ret == NOTIFY_BAD) {
@@ -229,10 +216,9 @@ static int __cpuinit _cpu_up(unsigned in
 		goto out_notify;
 	}
 
+
 	/* Arch-specific enabling code. */
-	mutex_lock(&cpu_bitmask_lock);
 	ret = __cpu_up(cpu);
-	mutex_unlock(&cpu_bitmask_lock);
 	if (ret != 0)
 		goto out_notify;
 	BUG_ON(!cpu_online(cpu));
@@ -241,10 +227,12 @@ static int __cpuinit _cpu_up(unsigned in
 	raw_notifier_call_chain(&cpu_chain, CPU_ONLINE, hcpu);
 
 out_notify:
+	if (system_state == SYSTEM_RUNNING)
+		thaw_processes();
+
 	if (ret != 0)
 		__raw_notifier_call_chain(&cpu_chain,
 				CPU_UP_CANCELED, hcpu, nr_calls, NULL);
-	raw_notifier_call_chain(&cpu_chain, CPU_LOCK_RELEASE, hcpu);
 
 	return ret;
 }
Index: hotplug/kernel/softirq.c
===================================================================
--- hotplug.orig/kernel/softirq.c
+++ hotplug/kernel/softirq.c
@@ -18,6 +18,7 @@
 #include <linux/rcupdate.h>
 #include <linux/smp.h>
 #include <linux/tick.h>
+#include <linux/freezer.h>
 
 #include <asm/irq.h>
 /*
@@ -489,7 +490,6 @@ void __init softirq_init(void)
 static int ksoftirqd(void * __bind_cpu)
 {
 	set_user_nice(current, 19);
-	current->flags |= PF_NOFREEZE;
 
 	set_current_state(TASK_INTERRUPTIBLE);
 
@@ -515,6 +515,9 @@ static int ksoftirqd(void * __bind_cpu)
 			preempt_disable();
 		}
 		preempt_enable();
+		try_to_freeze();
+		if (cpu_is_offline((long)__bind_cpu))
+			goto wait_to_die;
 		set_current_state(TASK_INTERRUPTIBLE);
 	}
 	__set_current_state(TASK_RUNNING);
@@ -612,11 +615,12 @@ static int __cpuinit cpu_callback(struct
 		kthread_bind(per_cpu(ksoftirqd, hotcpu),
 			     any_online_cpu(cpu_online_map));
 	case CPU_DEAD:
+		takeover_tasklets(hotcpu);
+		break;
+	case CPU_DEAD_KILL_THREADS:
 		p = per_cpu(ksoftirqd, hotcpu);
 		per_cpu(ksoftirqd, hotcpu) = NULL;
 		kthread_stop(p);
-		takeover_tasklets(hotcpu);
-		break;
 #endif /* CONFIG_HOTPLUG_CPU */
  	}
 	return NOTIFY_OK;
Index: hotplug/kernel/sched.c
===================================================================
--- hotplug.orig/kernel/sched.c
+++ hotplug/kernel/sched.c
@@ -5519,8 +5519,6 @@ migration_call(struct notifier_block *nf
 	case CPU_DEAD:
 		migrate_live_tasks(cpu);
 		rq = cpu_rq(cpu);
-		kthread_stop(rq->migration_thread);
-		rq->migration_thread = NULL;
 		/* Idle task back to normal (off runqueue, low prio) */
 		rq = task_rq_lock(rq->idle, &flags);
 		deactivate_task(rq->idle, rq);
@@ -5545,6 +5543,11 @@ migration_call(struct notifier_block *nf
 		}
 		spin_unlock_irq(&rq->lock);
 		break;
+	case CPU_DEAD_KILL_THREADS:
+		rq = cpu_rq(cpu);
+		kthread_stop(rq->migration_thread);
+		rq->migration_thread = NULL;
+		break;
 #endif
 	case CPU_LOCK_RELEASE:
 		mutex_unlock(&sched_hotcpu_mutex);
Index: hotplug/kernel/softlockup.c
===================================================================
--- hotplug.orig/kernel/softlockup.c
+++ hotplug/kernel/softlockup.c
@@ -13,6 +13,7 @@
 #include <linux/kthread.h>
 #include <linux/notifier.h>
 #include <linux/module.h>
+#include <linux/freezer.h>
 
 static DEFINE_SPINLOCK(print_lock);
 
@@ -132,7 +133,7 @@ cpu_callback(struct notifier_block *nfb,
 		/* Unbind so it can run.  Fall thru. */
 		kthread_bind(per_cpu(watchdog_task, hotcpu),
 			     any_online_cpu(cpu_online_map));
-	case CPU_DEAD:
+	case CPU_DEAD_KILL_THREADS:
 		p = per_cpu(watchdog_task, hotcpu);
 		per_cpu(watchdog_task, hotcpu) = NULL;
 		kthread_stop(p);
Index: hotplug/kernel/workqueue.c
===================================================================
--- hotplug.orig/kernel/workqueue.c
+++ hotplug/kernel/workqueue.c
@@ -368,6 +368,7 @@ static int worker_thread(void *__cwq)
 	DEFINE_WAIT(wait);
 	struct k_sigaction sa;
 	sigset_t blocked;
+	int bind_cpu = smp_processor_id();
 
 	if (!cwq->wq->freezeable)
 		current->flags |= PF_NOFREEZE;
@@ -392,8 +393,11 @@ static int worker_thread(void *__cwq)
 	do_sigaction(SIGCHLD, &sa, (struct k_sigaction *)0);
 
 	for (;;) {
-		if (cwq->wq->freezeable)
+		if (cwq->wq->freezeable) {
 			try_to_freeze();
+			if (cpu_is_offline(bind_cpu))
+				goto wait_to_die;
+		}
 
 		prepare_to_wait(&cwq->more_work, &wait, TASK_INTERRUPTIBLE);
 		if (!cwq->should_stop && list_empty(&cwq->worklist))
@@ -407,6 +411,16 @@ static int worker_thread(void *__cwq)
 	}
 
 	return 0;
+
+wait_to_die:
+	/* Wait for kthread_stop */
+	set_current_state(TASK_INTERRUPTIBLE);
+	while (!kthread_should_stop()) {
+		schedule();
+		set_current_state(TASK_INTERRUPTIBLE);
+	}
+	__set_current_state(TASK_RUNNING);
+	return 0;
 }
 
 struct wq_barrier {
Index: hotplug/include/linux/notifier.h
===================================================================
--- hotplug.orig/include/linux/notifier.h
+++ hotplug/include/linux/notifier.h
@@ -194,8 +194,7 @@ extern int __srcu_notifier_call_chain(st
 #define CPU_DOWN_PREPARE	0x0005 /* CPU (unsigned)v going down */
 #define CPU_DOWN_FAILED		0x0006 /* CPU (unsigned)v NOT going down */
 #define CPU_DEAD		0x0007 /* CPU (unsigned)v dead */
-#define CPU_LOCK_ACQUIRE	0x0008 /* Acquire all hotcpu locks */
-#define CPU_LOCK_RELEASE	0x0009 /* Release all hotcpu locks */
+#define CPU_DEAD_KILL_THREADS	0x0008 /* Kill per-cpu threads after dead */
 
 #endif /* __KERNEL__ */
 #endif /* _LINUX_NOTIFIER_H */

* [RFC PATCH(Experimental) 2/4] Revert changes to workqueue.c
From: Gautham R Shenoy @ 2007-02-14 14:43 UTC
  To: akpm, paulmck, mingo
  Cc: vatsa, dipankar, venkatesh.pallipadi, linux-kernel, oleg, rjw

This patch reverts all the recent workqueue hacks that were added to
make workqueues hotplug-safe.
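On CPU_DEAD, the restored code hands work still queued on the dead CPU to the CPU running the notifier (take_over_work() in the diff below): it detaches the whole list with list_replace_init() and requeues each item with __queue_work(). A userspace sketch of that splice-then-requeue pattern follows; it uses a hypothetical singly linked FIFO (struct model_work, struct model_cwq) instead of struct list_head.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a per-cpu work list. */
struct model_work {
	int id;
	struct model_work *next;
};

struct model_cwq {
	struct model_work *head, *tail;	/* FIFO of pending work */
};

/* Append @w at the tail, like queueing work on a cpu_workqueue. */
static void model_queue_work(struct model_cwq *cwq, struct model_work *w)
{
	w->next = NULL;
	if (cwq->tail)
		cwq->tail->next = w;
	else
		cwq->head = w;
	cwq->tail = w;
}

/*
 * Detach the dead cpu's whole list first (cf. list_replace_init()),
 * then requeue each item, in order, on the live cpu's list.
 */
static void model_take_over_work(struct model_cwq *dead,
				 struct model_cwq *live)
{
	struct model_work *list = dead->head;

	assert(dead != live);
	dead->head = dead->tail = NULL;
	while (list) {
		struct model_work *w = list;

		list = list->next;	/* read before requeue resets it */
		model_queue_work(live, w);
	}
}
```

Detaching the list before walking it means requeueing never iterates over a list that is being modified, and the original submission order is preserved behind any work already pending on the live CPU.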

Signed-off-by : Srivatsa Vaddagiri <vatsa@in.ibm.com>
Signed-off-by : Gautham R Shenoy <ego@in.ibm.com> 

 kernel/workqueue.c |  225 +++++++++++++++++++++++------------------------------
 1 files changed, 98 insertions(+), 127 deletions(-)

Index: hotplug/kernel/workqueue.c
===================================================================
--- hotplug.orig/kernel/workqueue.c
+++ hotplug/kernel/workqueue.c
@@ -47,7 +47,6 @@ struct cpu_workqueue_struct {
 
 	struct workqueue_struct *wq;
 	struct task_struct *thread;
-	int should_stop;
 
 	int run_depth;		/* Detect run_workqueue() recursion depth */
 } ____cacheline_aligned;
@@ -65,7 +64,7 @@ struct workqueue_struct {
 
 /* All the per-cpu workqueues on the system, for hotplug cpu to add/remove
    threads to each one as cpus come/go. */
-static DEFINE_MUTEX(workqueue_mutex);
+static DEFINE_SPINLOCK(workqueue_lock);
 static LIST_HEAD(workqueues);
 
 static int singlethread_cpu __read_mostly;
@@ -344,24 +343,6 @@ static void run_workqueue(struct cpu_wor
 	spin_unlock_irqrestore(&cwq->lock, flags);
 }
 
-/*
- * NOTE: the caller must not touch *cwq if this func returns true
- */
-static int cwq_should_stop(struct cpu_workqueue_struct *cwq)
-{
-	int should_stop = cwq->should_stop;
-
-	if (unlikely(should_stop)) {
-		spin_lock_irq(&cwq->lock);
-		should_stop = cwq->should_stop && list_empty(&cwq->worklist);
-		if (should_stop)
-			cwq->thread = NULL;
-		spin_unlock_irq(&cwq->lock);
-	}
-
-	return should_stop;
-}
-
 static int worker_thread(void *__cwq)
 {
 	struct cpu_workqueue_struct *cwq = __cwq;
@@ -392,7 +373,7 @@ static int worker_thread(void *__cwq)
 	siginitset(&sa.sa.sa_mask, sigmask(SIGCHLD));
 	do_sigaction(SIGCHLD, &sa, (struct k_sigaction *)0);
 
-	for (;;) {
+	while (!kthread_should_stop()) {
 		if (cwq->wq->freezeable) {
 			try_to_freeze();
 			if (cpu_is_offline(bind_cpu))
@@ -400,14 +381,12 @@ static int worker_thread(void *__cwq)
 		}
 
 		prepare_to_wait(&cwq->more_work, &wait, TASK_INTERRUPTIBLE);
-		if (!cwq->should_stop && list_empty(&cwq->worklist))
+		if (list_empty(&cwq->worklist))
 			schedule();
 		finish_wait(&cwq->more_work, &wait);
 
-		if (cwq_should_stop(cwq))
-			break;
-
-		run_workqueue(cwq);
+		if (!list_empty(&cwq->worklist))
+			run_workqueue(cwq);
 	}
 
 	return 0;
@@ -445,9 +424,6 @@ static void insert_wq_barrier(struct cpu
 	insert_work(cwq, &barr->work, tail);
 }
 
-/* optimization, we could use cpu_possible_map */
-static cpumask_t cpu_populated_map __read_mostly;
-
 static void flush_cpu_workqueue(struct cpu_workqueue_struct *cwq)
 {
 	if (cwq->thread == current) {
@@ -492,7 +468,7 @@ void fastcall flush_workqueue(struct wor
 	else {
 		int cpu;
 
-		for_each_cpu_mask(cpu, cpu_populated_map)
+		for_each_online_cpu(cpu)
 			flush_cpu_workqueue(per_cpu_ptr(wq->cpu_wq, cpu));
 	}
 }
@@ -552,7 +528,7 @@ void flush_work(struct workqueue_struct 
 	else {
 		int cpu;
 
-		for_each_cpu_mask(cpu, cpu_populated_map)
+		for_each_online_cpu(cpu)
 			wait_on_work(per_cpu_ptr(wq->cpu_wq, cpu), work);
 	}
 }
@@ -737,43 +713,25 @@ init_cpu_workqueue(struct workqueue_stru
 static int create_workqueue_thread(struct cpu_workqueue_struct *cwq, int cpu)
 {
 	struct task_struct *p;
+	struct workqueue_struct *wq = cwq->wq;
+	const char *fmt = is_single_threaded(wq) ? "%s" : "%s/%d";
 
-	spin_lock_irq(&cwq->lock);
-	cwq->should_stop = 0;
-	p = cwq->thread;
-	spin_unlock_irq(&cwq->lock);
-
-	if (!p) {
-		struct workqueue_struct *wq = cwq->wq;
-		const char *fmt = is_single_threaded(wq) ? "%s" : "%s/%d";
-
-		p = kthread_create(worker_thread, cwq, fmt, wq->name, cpu);
-		/*
-		 * Nobody can add the work_struct to this cwq,
-		 *	if (caller is __create_workqueue)
-		 *		nobody should see this wq
-		 *	else // caller is CPU_UP_PREPARE
-		 *		cpu is not on cpu_online_map
-		 * so we can abort safely.
-		 */
-		if (IS_ERR(p))
-			return PTR_ERR(p);
-
-		cwq->thread = p;
-		if (!is_single_threaded(wq))
-			kthread_bind(p, cpu);
-		/*
-		 * Cancels affinity if the caller is CPU_UP_PREPARE.
-		 * Needs a cleanup, but OK.
-		 */
-		wake_up_process(p);
-	}
+	p = kthread_create(worker_thread, cwq, fmt, wq->name, cpu);
+	/*
+	 * Nobody can add the work_struct to this cwq,
+	 *	if (caller is __create_workqueue)
+	 *		nobody should see this wq
+	 *	else // caller is CPU_UP_PREPARE
+	 *		cpu is not on cpu_online_map
+	 * so we can abort safely.
+	 */
+	if (IS_ERR(p))
+		return PTR_ERR(p);
 
+	cwq->thread = p;
 	return 0;
 }
 
-static int embryonic_cpu __read_mostly = -1;
-
 struct workqueue_struct *__create_workqueue(const char *name,
 					    int singlethread, int freezeable)
 {
@@ -798,17 +756,20 @@ struct workqueue_struct *__create_workqu
 		INIT_LIST_HEAD(&wq->list);
 		cwq = init_cpu_workqueue(wq, singlethread_cpu);
 		err = create_workqueue_thread(cwq, singlethread_cpu);
+		if (!err)
+			wake_up_process(cwq->thread);
 	} else {
-		mutex_lock(&workqueue_mutex);
+		spin_lock(&workqueue_lock);
 		list_add(&wq->list, &workqueues);
-
-		for_each_possible_cpu(cpu) {
+		spin_unlock(&workqueue_lock);
+		for_each_online_cpu(cpu) {
 			cwq = init_cpu_workqueue(wq, cpu);
-			if (err || !(cpu_online(cpu) || cpu == embryonic_cpu))
-				continue;
 			err = create_workqueue_thread(cwq, cpu);
+			if (err)
+				break;
+			kthread_bind(cwq->thread, cpu);
+			wake_up_process(cwq->thread);
 		}
-		mutex_unlock(&workqueue_mutex);
 	}
 
 	if (err) {
@@ -822,28 +783,10 @@ EXPORT_SYMBOL_GPL(__create_workqueue);
 static void cleanup_workqueue_thread(struct workqueue_struct *wq, int cpu)
 {
 	struct cpu_workqueue_struct *cwq = per_cpu_ptr(wq->cpu_wq, cpu);
-	struct wq_barrier barr;
-	int alive = 0;
 
-	spin_lock_irq(&cwq->lock);
 	if (cwq->thread != NULL) {
-		insert_wq_barrier(cwq, &barr, 1);
-		cwq->should_stop = 1;
-		alive = 1;
-	}
-	spin_unlock_irq(&cwq->lock);
-
-	if (alive) {
-		wait_for_completion(&barr.done);
-
-		while (unlikely(cwq->thread != NULL))
-			cpu_relax();
-		/*
-		 * Wait until cwq->thread unlocks cwq->lock,
-		 * it won't touch *cwq after that.
-		 */
-		smp_rmb();
-		spin_unlock_wait(&cwq->lock);
+		kthread_stop(cwq->thread);
+		cwq->thread = NULL;
 	}
 }
 
@@ -855,17 +798,18 @@ static void cleanup_workqueue_thread(str
  */
 void destroy_workqueue(struct workqueue_struct *wq)
 {
+	flush_workqueue(wq);
+
 	if (is_single_threaded(wq))
 		cleanup_workqueue_thread(wq, singlethread_cpu);
 	else {
 		int cpu;
 
-		mutex_lock(&workqueue_mutex);
-		list_del(&wq->list);
-		mutex_unlock(&workqueue_mutex);
-
-		for_each_cpu_mask(cpu, cpu_populated_map)
+		for_each_online_cpu(cpu)
 			cleanup_workqueue_thread(wq, cpu);
+		spin_lock(&workqueue_lock);
+		list_del(&wq->list);
+		spin_unlock(&workqueue_lock);
 	}
 
 	free_percpu(wq->cpu_wq);
@@ -873,55 +817,82 @@ void destroy_workqueue(struct workqueue_
 }
 EXPORT_SYMBOL_GPL(destroy_workqueue);
 
+/* Take the work from this (downed) CPU. */
+static void take_over_work(struct workqueue_struct *wq, unsigned int cpu)
+{
+	struct cpu_workqueue_struct *cwq = per_cpu_ptr(wq->cpu_wq, cpu);
+	struct list_head list;
+	struct work_struct *work;
+
+	spin_lock_irq(&cwq->lock);
+	list_replace_init(&cwq->worklist, &list);
+
+	while (!list_empty(&list)) {
+		work = list_entry(list.next,struct work_struct,entry);
+		list_del(&work->entry);
+		__queue_work(per_cpu_ptr(wq->cpu_wq, smp_processor_id()), work);
+	}
+
+	spin_unlock_irq(&cwq->lock);
+}
+
 static int __devinit workqueue_cpu_callback(struct notifier_block *nfb,
 						unsigned long action,
 						void *hcpu)
 {
 	struct workqueue_struct *wq;
 	struct cpu_workqueue_struct *cwq;
-	unsigned int cpu = (unsigned long)hcpu;
-	int ret = NOTIFY_OK;
+	unsigned int hotcpu = (unsigned long)hcpu;
+
+	switch (action) {
+	case CPU_UP_PREPARE:
+		/* Create a new workqueue thread for it. */
+		list_for_each_entry(wq, &workqueues, list) {
+			cwq = per_cpu_ptr(wq->cpu_wq, hotcpu);
+			if (create_workqueue_thread(cwq, hotcpu)) {
+				printk("workqueue for %i failed\n", hotcpu);
+				return NOTIFY_BAD;
+			}
+		}
+		break;
 
-	mutex_lock(&workqueue_mutex);
-	embryonic_cpu = -1;
-	if (action == CPU_UP_PREPARE) {
-		cpu_set(cpu, cpu_populated_map);
-		embryonic_cpu = cpu;
-	}
-
-	list_for_each_entry(wq, &workqueues, list) {
-		cwq = per_cpu_ptr(wq->cpu_wq, cpu);
-
-		switch (action) {
-		case CPU_UP_PREPARE:
-			if (create_workqueue_thread(cwq, cpu))
-				ret = NOTIFY_BAD;
-			break;
-
-		case CPU_ONLINE:
-			set_cpus_allowed(cwq->thread, cpumask_of_cpu(cpu));
-			break;
-
-		case CPU_UP_CANCELED:
-		case CPU_DEAD:
-			cwq->should_stop = 1;
-			wake_up(&cwq->more_work);
-			break;
+	case CPU_ONLINE:
+		/* Kick off worker threads. */
+		list_for_each_entry(wq, &workqueues, list) {
+			struct cpu_workqueue_struct *cwq;
+
+			cwq = per_cpu_ptr(wq->cpu_wq, hotcpu);
+			kthread_bind(cwq->thread, hotcpu);
+			wake_up_process(cwq->thread);
 		}
+		break;
 
-		if (ret != NOTIFY_OK) {
-			printk(KERN_ERR "workqueue for %i failed\n", cpu);
-			break;
+	case CPU_UP_CANCELED:
+		list_for_each_entry(wq, &workqueues, list) {
+			if (!per_cpu_ptr(wq->cpu_wq, hotcpu)->thread)
+				continue;
+			/* Unbind so it can run. */
+			kthread_bind(per_cpu_ptr(wq->cpu_wq, hotcpu)->thread,
+				any_online_cpu(cpu_online_map));
+			cleanup_workqueue_thread(wq, hotcpu);
 		}
+		break;
+
+	case CPU_DEAD:
+		list_for_each_entry(wq, &workqueues, list)
+			take_over_work(wq, hotcpu);
+		break;
+
+	case CPU_DEAD_KILL_THREADS:
+		list_for_each_entry(wq, &workqueues, list)
+			cleanup_workqueue_thread(wq, hotcpu);
 	}
-	mutex_unlock(&workqueue_mutex);
 
-	return ret;
+	return NOTIFY_OK;
 }
 
 void init_workqueues(void)
 {
-	cpu_populated_map = cpu_online_map;
 	singlethread_cpu = first_cpu(cpu_possible_map);
 	hotcpu_notifier(workqueue_cpu_callback, 0);
 	keventd_wq = create_workqueue("events");

* [RFC PATCH(Experimental) 3/4] Revert changes to sched.c and slab.c
From: Gautham R Shenoy @ 2007-02-14 14:43 UTC
  To: akpm, paulmck, mingo
  Cc: vatsa, dipankar, venkatesh.pallipadi, linux-kernel, oleg, rjw, kiran

This patch removes the per-subsystem hotcpu mutexes from sched and
slab subsystems.

Signed-off-by : Gautham R Shenoy <ego@in.ibm.com>
--
 kernel/sched.c |   16 ----------------
 mm/slab.c      |    6 ------
 2 files changed, 22 deletions(-)

Index: hotplug/kernel/sched.c
===================================================================
--- hotplug.orig/kernel/sched.c
+++ hotplug/kernel/sched.c
@@ -280,7 +280,6 @@ struct rq {
 };
 
 static DEFINE_PER_CPU(struct rq, runqueues);
-static DEFINE_MUTEX(sched_hotcpu_mutex);
 
 static inline int cpu_of(struct rq *rq)
 {
@@ -4454,13 +4453,11 @@ long sched_setaffinity(pid_t pid, cpumas
 	struct task_struct *p;
 	int retval;
 
-	mutex_lock(&sched_hotcpu_mutex);
 	read_lock(&tasklist_lock);
 
 	p = find_process_by_pid(pid);
 	if (!p) {
 		read_unlock(&tasklist_lock);
-		mutex_unlock(&sched_hotcpu_mutex);
 		return -ESRCH;
 	}
 
@@ -4487,7 +4484,6 @@ long sched_setaffinity(pid_t pid, cpumas
 
 out_unlock:
 	put_task_struct(p);
-	mutex_unlock(&sched_hotcpu_mutex);
 	return retval;
 }
 
@@ -4544,7 +4540,6 @@ long sched_getaffinity(pid_t pid, cpumas
 	struct task_struct *p;
 	int retval;
 
-	mutex_lock(&sched_hotcpu_mutex);
 	read_lock(&tasklist_lock);
 
 	retval = -ESRCH;
@@ -4560,7 +4555,6 @@ long sched_getaffinity(pid_t pid, cpumas
 
 out_unlock:
 	read_unlock(&tasklist_lock);
-	mutex_unlock(&sched_hotcpu_mutex);
 	if (retval)
 		return retval;
 
@@ -5483,9 +5477,6 @@ migration_call(struct notifier_block *nf
 	struct rq *rq;
 
 	switch (action) {
-	case CPU_LOCK_ACQUIRE:
-		mutex_lock(&sched_hotcpu_mutex);
-		break;
 
 	case CPU_UP_PREPARE:
 		p = kthread_create(migration_thread, hcpu, "migration/%d",cpu);
@@ -5549,9 +5540,6 @@ migration_call(struct notifier_block *nf
 		rq->migration_thread = NULL;
 		break;
 #endif
-	case CPU_LOCK_RELEASE:
-		mutex_unlock(&sched_hotcpu_mutex);
-		break;
 	}
 	return NOTIFY_OK;
 }
@@ -6895,10 +6883,8 @@ int arch_reinit_sched_domains(void)
 {
 	int err;
 
-	mutex_lock(&sched_hotcpu_mutex);
 	detach_destroy_domains(&cpu_online_map);
 	err = arch_init_sched_domains(&cpu_online_map);
-	mutex_unlock(&sched_hotcpu_mutex);
 
 	return err;
 }
@@ -7003,12 +6989,10 @@ void __init sched_init_smp(void)
 {
 	cpumask_t non_isolated_cpus;
 
-	mutex_lock(&sched_hotcpu_mutex);
 	arch_init_sched_domains(&cpu_online_map);
 	cpus_andnot(non_isolated_cpus, cpu_possible_map, cpu_isolated_map);
 	if (cpus_empty(non_isolated_cpus))
 		cpu_set(smp_processor_id(), non_isolated_cpus);
-	mutex_unlock(&sched_hotcpu_mutex);
 	/* XXX: Theoretical race here - CPU may be hotplugged now */
 	hotcpu_notifier(update_sched_domains, 0);
 
Index: hotplug/mm/slab.c
===================================================================
--- hotplug.orig/mm/slab.c
+++ hotplug/mm/slab.c
@@ -1179,9 +1179,6 @@ static int __cpuinit cpuup_callback(stru
 	int memsize = sizeof(struct kmem_list3);
 
 	switch (action) {
-	case CPU_LOCK_ACQUIRE:
-		mutex_lock(&cache_chain_mutex);
-		break;
 	case CPU_UP_PREPARE:
 		/*
 		 * We need to do this right in the beginning since
@@ -1342,9 +1339,6 @@ free_array_cache:
 			drain_freelist(cachep, l3, l3->free_objects);
 		}
 		break;
-	case CPU_LOCK_RELEASE:
-		mutex_unlock(&cache_chain_mutex);
-		break;
 	}
 	return NOTIFY_OK;
 bad:

* [RFC PATCH(Experimental) 4/4] Rip out lock_cpu_hotplug from linux.
From: Gautham R Shenoy @ 2007-02-14 14:44 UTC
  To: akpm, paulmck, mingo
  Cc: vatsa, dipankar, venkatesh.pallipadi, linux-kernel, oleg, rjw, kiran

This patch rips out lock_cpu_hotplug from the kernel.
Good Riddance!! (hopefully :) )

Signed-off-by : Gautham R Shenoy <ego@in.ibm.com>
--
 arch/i386/kernel/cpu/mtrr/main.c             |    6 ------
 arch/i386/kernel/microcode.c                 |    8 --------
 arch/mips/kernel/mips-mt.c                   |    5 -----
 arch/powerpc/platforms/pseries/hotplug-cpu.c |    5 -----
 arch/powerpc/platforms/pseries/rtasd.c       |    4 ----
 include/linux/cpu.h                          |   20 --------------------
 kernel/cpu.c                                 |   17 -----------------
 kernel/rcutorture.c                          |    3 ---
 kernel/stop_machine.c                        |    3 ---
 net/core/flow.c                              |    2 --
 10 files changed, 73 deletions(-)

Index: hotplug/include/linux/cpu.h
===================================================================
--- hotplug.orig/include/linux/cpu.h
+++ hotplug/include/linux/cpu.h
@@ -76,18 +76,6 @@ extern struct sysdev_class cpu_sysdev_cl
 #ifdef CONFIG_HOTPLUG_CPU
 /* Stop CPUs going up and down. */
 
-static inline void cpuhotplug_mutex_lock(struct mutex *cpu_hp_mutex)
-{
-	mutex_lock(cpu_hp_mutex);
-}
-
-static inline void cpuhotplug_mutex_unlock(struct mutex *cpu_hp_mutex)
-{
-	mutex_unlock(cpu_hp_mutex);
-}
-
-extern void lock_cpu_hotplug(void);
-extern void unlock_cpu_hotplug(void);
 #define hotcpu_notifier(fn, pri) {				\
 	static struct notifier_block fn##_nb =			\
 		{ .notifier_call = fn, .priority = pri };	\
@@ -100,14 +88,6 @@ int cpu_down(unsigned int cpu);
 
 #else		/* CONFIG_HOTPLUG_CPU */
 
-static inline void cpuhotplug_mutex_lock(struct mutex *cpu_hp_mutex)
-{ }
-static inline void cpuhotplug_mutex_unlock(struct mutex *cpu_hp_mutex)
-{ }
-
-#define lock_cpu_hotplug()	do { } while (0)
-#define unlock_cpu_hotplug()	do { } while (0)
-#define lock_cpu_hotplug_interruptible() 0
 #define hotcpu_notifier(fn, pri)	do { (void)(fn); } while (0)
 #define register_hotcpu_notifier(nb)	do { (void)(nb); } while (0)
 #define unregister_hotcpu_notifier(nb)	do { (void)(nb); } while (0)
Index: hotplug/kernel/cpu.c
===================================================================
--- hotplug.orig/kernel/cpu.c
+++ hotplug/kernel/cpu.c
@@ -18,7 +18,6 @@
 
 /* This protects CPUs going up and down... */
 static DEFINE_MUTEX(cpu_add_remove_lock);
-static DEFINE_MUTEX(cpu_bitmask_lock);
 
 static __cpuinitdata RAW_NOTIFIER_HEAD(cpu_chain);
 
@@ -27,22 +26,6 @@ static __cpuinitdata RAW_NOTIFIER_HEAD(c
  */
 static int cpu_hotplug_disabled;
 
-#ifdef CONFIG_HOTPLUG_CPU
-
-void lock_cpu_hotplug(void)
-{
-	return;
-}
-EXPORT_SYMBOL_GPL(lock_cpu_hotplug);
-
-void unlock_cpu_hotplug(void)
-{
-	return;
-}
-EXPORT_SYMBOL_GPL(unlock_cpu_hotplug);
-
-#endif	/* CONFIG_HOTPLUG_CPU */
-
 /* Need to know about CPUs going up/down? */
 int __cpuinit register_cpu_notifier(struct notifier_block *nb)
 {
Index: hotplug/arch/i386/kernel/cpu/mtrr/main.c
===================================================================
--- hotplug.orig/arch/i386/kernel/cpu/mtrr/main.c
+++ hotplug/arch/i386/kernel/cpu/mtrr/main.c
@@ -346,8 +346,6 @@ int mtrr_add_page(unsigned long base, un
 	error = -EINVAL;
 	replace = -1;
 
-	/* No CPU hotplug when we change MTRR entries */
-	lock_cpu_hotplug();
 	/*  Search for existing MTRR  */
 	mutex_lock(&mtrr_mutex);
 	for (i = 0; i < num_var_ranges; ++i) {
@@ -403,7 +401,6 @@ int mtrr_add_page(unsigned long base, un
 	error = i;
  out:
 	mutex_unlock(&mtrr_mutex);
-	unlock_cpu_hotplug();
 	return error;
 }
 
@@ -492,8 +489,6 @@ int mtrr_del_page(int reg, unsigned long
 		return -ENXIO;
 
 	max = num_var_ranges;
-	/* No CPU hotplug when we change MTRR entries */
-	lock_cpu_hotplug();
 	mutex_lock(&mtrr_mutex);
 	if (reg < 0) {
 		/*  Search for existing MTRR  */
@@ -534,7 +529,6 @@ int mtrr_del_page(int reg, unsigned long
 	error = reg;
  out:
 	mutex_unlock(&mtrr_mutex);
-	unlock_cpu_hotplug();
 	return error;
 }
 /**
Index: hotplug/arch/i386/kernel/microcode.c
===================================================================
--- hotplug.orig/arch/i386/kernel/microcode.c
+++ hotplug/arch/i386/kernel/microcode.c
@@ -435,7 +435,6 @@ static ssize_t microcode_write (struct f
 		return -EINVAL;
 	}
 
-	lock_cpu_hotplug();
 	mutex_lock(&microcode_mutex);
 
 	user_buffer = (void __user *) buf;
@@ -446,7 +445,6 @@ static ssize_t microcode_write (struct f
 		ret = (ssize_t)len;
 
 	mutex_unlock(&microcode_mutex);
-	unlock_cpu_hotplug();
 
 	return ret;
 }
@@ -609,14 +607,12 @@ static ssize_t reload_store(struct sys_d
 
 		old = current->cpus_allowed;
 
-		lock_cpu_hotplug();
 		set_cpus_allowed(current, cpumask_of_cpu(cpu));
 
 		mutex_lock(&microcode_mutex);
 		if (uci->valid)
 			err = cpu_request_microcode(cpu);
 		mutex_unlock(&microcode_mutex);
-		unlock_cpu_hotplug();
 		set_cpus_allowed(current, old);
 	}
 	if (err)
@@ -740,9 +736,7 @@ static int __init microcode_init (void)
 		return PTR_ERR(microcode_pdev);
 	}
 
-	lock_cpu_hotplug();
 	error = sysdev_driver_register(&cpu_sysdev_class, &mc_sysdev_driver);
-	unlock_cpu_hotplug();
 	if (error) {
 		microcode_dev_exit();
 		platform_device_unregister(microcode_pdev);
@@ -762,9 +756,7 @@ static void __exit microcode_exit (void)
 
 	unregister_hotcpu_notifier(&mc_cpu_notifier);
 
-	lock_cpu_hotplug();
 	sysdev_driver_unregister(&cpu_sysdev_class, &mc_sysdev_driver);
-	unlock_cpu_hotplug();
 
 	platform_device_unregister(microcode_pdev);
 }
Index: hotplug/arch/mips/kernel/mips-mt.c
===================================================================
--- hotplug.orig/arch/mips/kernel/mips-mt.c
+++ hotplug/arch/mips/kernel/mips-mt.c
@@ -71,13 +71,11 @@ asmlinkage long mipsmt_sys_sched_setaffi
 	if (copy_from_user(&new_mask, user_mask_ptr, sizeof(new_mask)))
 		return -EFAULT;
 
-	lock_cpu_hotplug();
 	read_lock(&tasklist_lock);
 
 	p = find_process_by_pid(pid);
 	if (!p) {
 		read_unlock(&tasklist_lock);
-		unlock_cpu_hotplug();
 		return -ESRCH;
 	}
 
@@ -115,7 +113,6 @@ asmlinkage long mipsmt_sys_sched_setaffi
 
 out_unlock:
 	put_task_struct(p);
-	unlock_cpu_hotplug();
 	return retval;
 }
 
@@ -134,7 +131,6 @@ asmlinkage long mipsmt_sys_sched_getaffi
 	if (len < real_len)
 		return -EINVAL;
 
-	lock_cpu_hotplug();
 	read_lock(&tasklist_lock);
 
 	retval = -ESRCH;
@@ -148,7 +144,6 @@ asmlinkage long mipsmt_sys_sched_getaffi
 
 out_unlock:
 	read_unlock(&tasklist_lock);
-	unlock_cpu_hotplug();
 	if (retval)
 		return retval;
 	if (copy_to_user(user_mask_ptr, &mask, real_len))
Index: hotplug/arch/powerpc/platforms/pseries/hotplug-cpu.c
===================================================================
--- hotplug.orig/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ hotplug/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -151,8 +151,6 @@ static int pseries_add_processor(struct 
 	for (i = 0; i < nthreads; i++)
 		cpu_set(i, tmp);
 
-	lock_cpu_hotplug();
-
 	BUG_ON(!cpus_subset(cpu_present_map, cpu_possible_map));
 
 	/* Get a bitmap of unoccupied slots. */
@@ -188,7 +186,6 @@ static int pseries_add_processor(struct 
 	}
 	err = 0;
 out_unlock:
-	unlock_cpu_hotplug();
 	return err;
 }
 
@@ -209,7 +206,6 @@ static void pseries_remove_processor(str
 
 	nthreads = len / sizeof(u32);
 
-	lock_cpu_hotplug();
 	for (i = 0; i < nthreads; i++) {
 		for_each_present_cpu(cpu) {
 			if (get_hard_smp_processor_id(cpu) != intserv[i])
@@ -223,7 +219,6 @@ static void pseries_remove_processor(str
 			printk(KERN_WARNING "Could not find cpu to remove "
 			       "with physical id 0x%x\n", intserv[i]);
 	}
-	unlock_cpu_hotplug();
 }
 
 static int pseries_smp_notifier(struct notifier_block *nb,
Index: hotplug/arch/powerpc/platforms/pseries/rtasd.c
===================================================================
--- hotplug.orig/arch/powerpc/platforms/pseries/rtasd.c
+++ hotplug/arch/powerpc/platforms/pseries/rtasd.c
@@ -404,7 +404,6 @@ static void do_event_scan_all_cpus(long 
 {
 	int cpu;
 
-	lock_cpu_hotplug();
 	cpu = first_cpu(cpu_online_map);
 	for (;;) {
 		set_cpus_allowed(current, cpumask_of_cpu(cpu));
@@ -412,15 +411,12 @@ static void do_event_scan_all_cpus(long 
 		set_cpus_allowed(current, CPU_MASK_ALL);
 
 		/* Drop hotplug lock, and sleep for the specified delay */
-		unlock_cpu_hotplug();
 		msleep_interruptible(delay);
-		lock_cpu_hotplug();
 
 		cpu = next_cpu(cpu, cpu_online_map);
 		if (cpu == NR_CPUS)
 			break;
 	}
-	unlock_cpu_hotplug();
 }
 
 static int rtasd(void *unused)
Index: hotplug/kernel/rcutorture.c
===================================================================
--- hotplug.orig/kernel/rcutorture.c
+++ hotplug/kernel/rcutorture.c
@@ -803,11 +803,9 @@ static void rcu_torture_shuffle_tasks(vo
 	cpumask_t tmp_mask = CPU_MASK_ALL;
 	int i;
 
-	lock_cpu_hotplug();
 
 	/* No point in shuffling if there is only one online CPU (ex: UP) */
 	if (num_online_cpus() == 1) {
-		unlock_cpu_hotplug();
 		return;
 	}
 
@@ -839,7 +837,6 @@ static void rcu_torture_shuffle_tasks(vo
 	else
 		rcu_idle_cpu--;
 
-	unlock_cpu_hotplug();
 }
 
 /* Shuffle tasks across CPUs, with the intent of allowing each CPU in the
Index: hotplug/kernel/stop_machine.c
===================================================================
--- hotplug.orig/kernel/stop_machine.c
+++ hotplug/kernel/stop_machine.c
@@ -197,14 +197,11 @@ int stop_machine_run(int (*fn)(void *), 
 	struct task_struct *p;
 	int ret;
 
-	/* No CPUs can come up or down during this. */
-	lock_cpu_hotplug();
 	p = __stop_machine_run(fn, data, cpu);
 	if (!IS_ERR(p))
 		ret = kthread_stop(p);
 	else
 		ret = PTR_ERR(p);
-	unlock_cpu_hotplug();
 
 	return ret;
 }
Index: hotplug/net/core/flow.c
===================================================================
--- hotplug.orig/net/core/flow.c
+++ hotplug/net/core/flow.c
@@ -296,7 +296,6 @@ void flow_cache_flush(void)
 	static DEFINE_MUTEX(flow_flush_sem);
 
 	/* Don't want cpus going down or up during this. */
-	lock_cpu_hotplug();
 	mutex_lock(&flow_flush_sem);
 	atomic_set(&info.cpuleft, num_online_cpus());
 	init_completion(&info.completion);
@@ -308,7 +307,6 @@ void flow_cache_flush(void)
 
 	wait_for_completion(&info.completion);
 	mutex_unlock(&flow_flush_sem);
-	unlock_cpu_hotplug();
 }
 
 static void __devinit flow_cache_cpu_prepare(int cpu)
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH(Experimental) 2/4] Revert changes to workqueue.c
  2007-02-14 14:43   ` [RFC PATCH(Experimental) 2/4] Revert changes to workqueue.c Gautham R Shenoy
  2007-02-14 14:43     ` [RFC PATCH(Experimental) 3/4] Revert changes to sched.c and slab.c Gautham R Shenoy
@ 2007-02-14 14:59     ` Srivatsa Vaddagiri
  2007-02-14 15:24     ` Srivatsa Vaddagiri
  2007-02-14 20:09     ` Oleg Nesterov
  3 siblings, 0 replies; 92+ messages in thread
From: Srivatsa Vaddagiri @ 2007-02-14 14:59 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: akpm, paulmck, mingo, dipankar, venkatesh.pallipadi,
	linux-kernel, oleg, rjw

On Wed, Feb 14, 2007 at 08:13:05PM +0530, Gautham R Shenoy wrote:
> This patch reverts all the recent workqueue hacks added to make it 
> hotplug safe. 

Oleg,
	This patch probably needs review for any races we may have
failed to account for. Also, we have considered only the workqueue.c
present in 2.6.20-rc6-mm3, which means some recent patches in the mm
tree that are yet to be published aren't accounted for.

Also I expect almost all worker threads to be frozen "for hotplug". The
only exception I have found so far is kthread workqueue, which needs to
remain unfrozen "for cpu hotplug" because kthread_create (in
CPU_UP_PREPARE and stop_machine) relies on its services (while everyone else 
is frozen). 

-- 
Regards,
vatsa

* Re: [RFC PATCH(Experimental) 2/4] Revert changes to workqueue.c
  2007-02-14 14:43   ` [RFC PATCH(Experimental) 2/4] Revert changes to workqueue.c Gautham R Shenoy
  2007-02-14 14:43     ` [RFC PATCH(Experimental) 3/4] Revert changes to sched.c and slab.c Gautham R Shenoy
  2007-02-14 14:59     ` [RFC PATCH(Experimental) 2/4] Revert changes to workqueue.c Srivatsa Vaddagiri
@ 2007-02-14 15:24     ` Srivatsa Vaddagiri
  2007-02-14 20:23       ` Oleg Nesterov
  2007-02-14 20:09     ` Oleg Nesterov
  3 siblings, 1 reply; 92+ messages in thread
From: Srivatsa Vaddagiri @ 2007-02-14 15:24 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: akpm, paulmck, mingo, dipankar, venkatesh.pallipadi,
	linux-kernel, oleg, rjw

On Wed, Feb 14, 2007 at 08:13:05PM +0530, Gautham R Shenoy wrote:
> +	switch (action) {
> +	case CPU_UP_PREPARE:
> +		/* Create a new workqueue thread for it. */
> +		list_for_each_entry(wq, &workqueues, list) {

It's probably safe to take the workqueue (spin) lock here (and other
notifiers as well), before traversing the list.

> +			cwq = per_cpu_ptr(wq->cpu_wq, hotcpu);
> +			if (create_workqueue_thread(cwq, hotcpu)) {
> +				printk("workqueue for %i failed\n", hotcpu);
> +				return NOTIFY_BAD;
> +			}
> +		}
> +		break;

-- 
Regards,
vatsa

* Re: [RFC PATCH(Experimental) 1/4] freezer-cpu-hotplug core
  2007-02-14 14:42 ` [RFC PATCH(Experimental) 1/4] freezer-cpu-hotplug core Gautham R Shenoy
  2007-02-14 14:43   ` [RFC PATCH(Experimental) 2/4] Revert changes to workqueue.c Gautham R Shenoy
@ 2007-02-14 15:31   ` Srivatsa Vaddagiri
  2007-02-14 19:47   ` Oleg Nesterov
  2007-02-14 20:22   ` Oleg Nesterov
  3 siblings, 0 replies; 92+ messages in thread
From: Srivatsa Vaddagiri @ 2007-02-14 15:31 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: akpm, paulmck, mingo, dipankar, venkatesh.pallipadi,
	linux-kernel, oleg, rjw

On Wed, Feb 14, 2007 at 08:12:29PM +0530, Gautham R Shenoy wrote:
> o Can the SYSTEM_RUNNING hack in _cpu_up be avoided by some cleaner means.

Basically, freeze_processes doesn't seem to work at the early stages of
bootup (during smp_init), hence the hack.

One option is to investigate why it didn't work and possibly make it work
at that early stage as well.


-- 
Regards,
vatsa

* Re: [RFC PATCH(Experimental) 1/4] freezer-cpu-hotplug core
  2007-02-14 14:42 ` [RFC PATCH(Experimental) 1/4] freezer-cpu-hotplug core Gautham R Shenoy
  2007-02-14 14:43   ` [RFC PATCH(Experimental) 2/4] Revert changes to workqueue.c Gautham R Shenoy
  2007-02-14 15:31   ` [RFC PATCH(Experimental) 1/4] freezer-cpu-hotplug core Srivatsa Vaddagiri
@ 2007-02-14 19:47   ` Oleg Nesterov
  2007-02-16  6:48     ` Srivatsa Vaddagiri
  2007-02-14 20:22   ` Oleg Nesterov
  3 siblings, 1 reply; 92+ messages in thread
From: Oleg Nesterov @ 2007-02-14 19:47 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, rjw

Gautham, I'll try to apply this patch and read the code on Sunday. For
now, a couple of comments about the workqueue.c changes.

On 02/14, Gautham R Shenoy wrote:
>
> --- hotplug.orig/kernel/workqueue.c
> +++ hotplug/kernel/workqueue.c
> @@ -368,6 +368,7 @@ static int worker_thread(void *__cwq)
>  	DEFINE_WAIT(wait);
>  	struct k_sigaction sa;
>  	sigset_t blocked;
> +	int bind_cpu = smp_processor_id();
>  
>  	if (!cwq->wq->freezeable)
>  		current->flags |= PF_NOFREEZE;
> @@ -392,8 +393,11 @@ static int worker_thread(void *__cwq)
>  	do_sigaction(SIGCHLD, &sa, (struct k_sigaction *)0);
>  
>  	for (;;) {
> -		if (cwq->wq->freezeable)
> +		if (cwq->wq->freezeable) {

Else? This is wrong. The change like this should start from making all
cwq->threads freezeable, otherwise it just doesn't work.

>  			try_to_freeze();
> +			if (cpu_is_offline(bind_cpu))
> +				goto wait_to_die;
> +		}
>
> ...
>
> +
> +wait_to_die:
> +	/* Wait for kthread_stop */
> +	set_current_state(TASK_INTERRUPTIBLE);
> +	while (!kthread_should_stop()) {
> +		schedule();
> +		set_current_state(TASK_INTERRUPTIBLE);
> +	}
> +	__set_current_state(TASK_RUNNING);
> +	return 0;
>  }

I believe this is not needed, see the comments for the next patch.

Oleg.

* Re: [RFC PATCH(Experimental) 2/4] Revert changes to workqueue.c
  2007-02-14 14:43   ` [RFC PATCH(Experimental) 2/4] Revert changes to workqueue.c Gautham R Shenoy
                       ` (2 preceding siblings ...)
  2007-02-14 15:24     ` Srivatsa Vaddagiri
@ 2007-02-14 20:09     ` Oleg Nesterov
  2007-02-16  5:26       ` Srivatsa Vaddagiri
  3 siblings, 1 reply; 92+ messages in thread
From: Oleg Nesterov @ 2007-02-14 20:09 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, rjw

On 02/14, Gautham R Shenoy wrote:
>
> This patch reverts all the recent workqueue hacks added to make it 
> hotplug safe. 

In my opinion these hacks are cleanups :)

Ok. If we use freezer then yes, we can remove cpu_populated_map and just
use for_each_online_cpu(). This is easy and good.

What else don't you like? Why do you want to remove cwq_should_stop() and
restore the ugly (ugly for workqueue.c) kthread_stop/kthread_should_stop()?

We can restore take_over_works(), although I don't see why this is needed.
But cwq_should_stop() will just work regardless, why do you want to add
this "wait_to_die" ... well, hack :)

> -static DEFINE_MUTEX(workqueue_mutex);
> +static DEFINE_SPINLOCK(workqueue_lock);

No. We can't do this. see below.

>  struct workqueue_struct *__create_workqueue(const char *name,
>  					    int singlethread, int freezeable)
>  {
> @@ -798,17 +756,20 @@ struct workqueue_struct *__create_workqu
>  		INIT_LIST_HEAD(&wq->list);
>  		cwq = init_cpu_workqueue(wq, singlethread_cpu);
>  		err = create_workqueue_thread(cwq, singlethread_cpu);
> +		if (!err)
> +			wake_up_process(cwq->thread);
>  	} else {
> -		mutex_lock(&workqueue_mutex);
> +		spin_lock(&workqueue_lock);
>  		list_add(&wq->list, &workqueues);
> -
> -		for_each_possible_cpu(cpu) {
> +		spin_unlock(&workqueue_lock);
> +		for_each_online_cpu(cpu) {
>  			cwq = init_cpu_workqueue(wq, cpu);
> -			if (err || !(cpu_online(cpu) || cpu == embryonic_cpu))
> -				continue;
>  			err = create_workqueue_thread(cwq, cpu);
> +			if (err)
> +				break;

No, we can't break. We are going to execute destroy_workqueue(), it will
iterate over all cwqs.
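
To make the constraint concrete, here is a small userspace C model of the
pattern the removed loop followed: every per-CPU slot is initialized even
after a thread-creation failure, and only further thread creation is skipped,
so a destroy path that walks all slots stays safe. The types and helpers
(struct wq_model, create_thread_model() and so on) are hypothetical stand-ins,
not the workqueue API.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical stand-in for one per-CPU slot of a workqueue. */
struct cwq_model {
	bool initialized;   /* init step has run for this slot */
	bool has_thread;    /* a worker thread was created for it */
};

struct wq_model {
	struct cwq_model cpu_wq[NR_CPUS];
};

/* Hypothetical thread creation that fails on fail_cpu (-1: never fails). */
static int create_thread_model(struct cwq_model *cwq, int cpu, int fail_cpu)
{
	if (cpu == fail_cpu)
		return -1;
	cwq->has_thread = true;
	return 0;
}

/*
 * Initialize every slot, even after an error, but stop creating threads
 * once one creation has failed. Breaking out of the loop instead would
 * leave later slots uninitialized for the destroy path below.
 */
static int create_wq_model(struct wq_model *wq, int fail_cpu)
{
	int err = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		wq->cpu_wq[cpu].initialized = true;
		wq->cpu_wq[cpu].has_thread = false;
		if (err)
			continue;
		err = create_thread_model(&wq->cpu_wq[cpu], cpu, fail_cpu);
	}
	return err;
}

/* The destroy path walks all slots and relies on each being initialized. */
static int destroy_wq_model(struct wq_model *wq)
{
	int stopped = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		assert(wq->cpu_wq[cpu].initialized);
		if (wq->cpu_wq[cpu].has_thread) {
			wq->cpu_wq[cpu].has_thread = false;
			stopped++;
		}
	}
	return stopped;
}
```

On failure the caller would run destroy_wq_model() on the half-built queue,
mirroring the destroy_workqueue() call mentioned above; with a failure on
CPU 2 it stops the two threads created on CPUs 0 and 1.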

> +static void take_over_work(struct workqueue_struct *wq, unsigned int cpu)
> +{
> +	struct cpu_workqueue_struct *cwq = per_cpu_ptr(wq->cpu_wq, cpu);
> +	struct list_head list;
> +	struct work_struct *work;
> +
> +	spin_lock_irq(&cwq->lock);
> +	list_replace_init(&cwq->worklist, &list);
> +
> +	while (!list_empty(&list)) {
> +		work = list_entry(list.next,struct work_struct,entry);
> +		list_del(&work->entry);
> +		__queue_work(per_cpu_ptr(wq->cpu_wq, smp_processor_id()), work);
> +	}
> +
> +	spin_unlock_irq(&cwq->lock);
> +}

I think this is unneeded complication, but ok, should work.

>  static int __devinit workqueue_cpu_callback(struct notifier_block *nfb,
>  						unsigned long action,
>  						void *hcpu)
> +	case CPU_UP_CANCELED:
> +		list_for_each_entry(wq, &workqueues, list) {
> +			if (!per_cpu_ptr(wq->cpu_wq, hotcpu)->thread)
> +				continue;
> +			/* Unbind so it can run. */
> +			kthread_bind(per_cpu_ptr(wq->cpu_wq, hotcpu)->thread,
> +				any_online_cpu(cpu_online_map));
> +			cleanup_workqueue_thread(wq, hotcpu);
>  		}
> +		break;
> +
> +	case CPU_DEAD:
> +		list_for_each_entry(wq, &workqueues, list)
> +			take_over_work(wq, hotcpu);
> +		break;
> +
> +	case CPU_DEAD_KILL_THREADS:
> +		list_for_each_entry(wq, &workqueues, list)
> +			cleanup_workqueue_thread(wq, hotcpu);
>  	}

Both CPU_UP_CANCELED and CPU_DEAD_KILL_THREADS run after thaw_processes(),
which means that workqueue_cpu_callback() is racy wrt create/destroy
workqueue. We should take the mutex, and it can't be a spinlock_t.
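
A userspace sketch of why the list walk needs a sleeping lock: the hotplug
callback holds the lock for the whole traversal and calls a creation routine
that, in the kernel, may sleep (kthread_create() has to fork a thread), which
is not allowed under a spinlock. Here a pthread mutex stands in for the
kernel's struct mutex; the registry and helper names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_WQ 8

/* Hypothetical registry of workqueues, guarded by a sleeping lock. */
static pthread_mutex_t workqueue_mutex = PTHREAD_MUTEX_INITIALIZER;
static int workqueue_ids[MAX_WQ];
static int nr_workqueues;

/* Stand-in for kthread_create(): in the kernel this can sleep, which is
 * why the walk below must not hold a spinlock. */
static int create_thread_may_sleep(int wq_id)
{
	return wq_id >= 0 ? 0 : -1;
}

/* create_workqueue() side: add to the list under the same mutex. */
static int register_wq(int id)
{
	int ret = -1;

	pthread_mutex_lock(&workqueue_mutex);
	if (nr_workqueues < MAX_WQ) {
		workqueue_ids[nr_workqueues++] = id;
		ret = 0;
	}
	pthread_mutex_unlock(&workqueue_mutex);
	return ret;
}

/* Hotplug callback side: hold the mutex across the whole walk, so the
 * list cannot change underneath us, while still being allowed to sleep. */
static int hotplug_walk(void)
{
	int created = 0;

	pthread_mutex_lock(&workqueue_mutex);
	for (int i = 0; i < nr_workqueues; i++)
		if (create_thread_may_sleep(workqueue_ids[i]) == 0)
			created++;
	pthread_mutex_unlock(&workqueue_mutex);
	return created;
}
```

Because both sides take the same lock, a concurrent register_wq() either
completes before the walk starts or waits until it has finished.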

Oleg.

* Re: [RFC PATCH(Experimental) 1/4] freezer-cpu-hotplug core
  2007-02-14 14:42 ` [RFC PATCH(Experimental) 1/4] freezer-cpu-hotplug core Gautham R Shenoy
                     ` (2 preceding siblings ...)
  2007-02-14 19:47   ` Oleg Nesterov
@ 2007-02-14 20:22   ` Oleg Nesterov
  2007-02-16  7:16     ` Srivatsa Vaddagiri
  3 siblings, 1 reply; 92+ messages in thread
From: Oleg Nesterov @ 2007-02-14 20:22 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, rjw

On 02/14, Gautham R Shenoy wrote:
>
> o Splits CPU_DEAD into two events namely
>   - CPU_DEAD: which will be handled while the processes are still
>               frozen.
> 
>   - CPU_DEAD_KILL_THREADS: To be handled after we thaw_processes.


Imho, this is not right. This changes the meaning of CPU_DEAD, and so
we would have to fix all users of CPU_DEAD as well.

How about

	CPU_DEAD_WHATEVER
		the processes are still frozen

	CPU_DEAD
		after we thaw_processes

This way we can add processing of the new CPU_DEAD_WHATEVER event where
it may help. We don't need to change (for example) workqueue.c with this
patch, we can do it in a separate patch.
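
A minimal model of the proposed ordering, assuming a single notifier that
records each event together with the freezer state at delivery time.
CPU_DEAD_WHATEVER is only the placeholder name used above, and the sequencing
function is hypothetical, not kernel/cpu.c.

```c
#include <assert.h>
#include <stdbool.h>

/* CPU_DEAD_WHATEVER is the placeholder name from the reply above. */
enum hotplug_event { CPU_DEAD_WHATEVER, CPU_DEAD };

struct event_log {
	int n;
	enum hotplug_event ev[4];
	bool tasks_frozen[4];   /* freezer state when the event was delivered */
};

static bool tasks_frozen;   /* models freeze_processes()/thaw_processes() */

/* A notifier that just records what it saw and when. */
static void notify(struct event_log *el, enum hotplug_event ev)
{
	if (el->n < 4) {
		el->ev[el->n] = ev;
		el->tasks_frozen[el->n] = tasks_frozen;
		el->n++;
	}
}

/*
 * Proposed ordering: the new event is delivered while tasks are still
 * frozen; the existing CPU_DEAD keeps firing after thawing, so current
 * CPU_DEAD users need no changes.
 */
static void cpu_down_model(struct event_log *el)
{
	tasks_frozen = true;                 /* freeze_processes() */
	/* ... the CPU is actually taken down here ... */
	notify(el, CPU_DEAD_WHATEVER);
	tasks_frozen = false;                /* thaw_processes() */
	notify(el, CPU_DEAD);
}
```

Subsystems that benefit from running while everything is frozen opt in by
handling the new event; everyone else keeps seeing CPU_DEAD exactly as before.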


CPU_UP_PREPARE is called after freeze_processes()... Probably this works,
but imho this is no good. Suppose for a moment that khelper were frozen
(yes, yes, it can't be); then we couldn't do kthread_create().

Oleg.

* Re: [RFC PATCH(Experimental) 2/4] Revert changes to workqueue.c
  2007-02-14 15:24     ` Srivatsa Vaddagiri
@ 2007-02-14 20:23       ` Oleg Nesterov
  0 siblings, 0 replies; 92+ messages in thread
From: Oleg Nesterov @ 2007-02-14 20:23 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Gautham R Shenoy, akpm, paulmck, mingo, dipankar,
	venkatesh.pallipadi, linux-kernel, rjw

On 02/14, Srivatsa Vaddagiri wrote:
>
> On Wed, Feb 14, 2007 at 08:13:05PM +0530, Gautham R Shenoy wrote:
> > +	switch (action) {
> > +	case CPU_UP_PREPARE:
> > +		/* Create a new workqueue thread for it. */
> > +		list_for_each_entry(wq, &workqueues, list) {
> 
> It's probably safe to take the workqueue (spin) lock here (and other
> notifiers as well), before traversing the list.

We can't fork() under spin lock.

> 
> > +			cwq = per_cpu_ptr(wq->cpu_wq, hotcpu);
> > +			if (create_workqueue_thread(cwq, hotcpu)) {
> > +				printk("workqueue for %i failed\n", hotcpu);
> > +				return NOTIFY_BAD;
> > +			}
> > +		}
> > +		break;

Oleg.

* Re: [RFC PATCH(Experimental) 0/4] Freezer based Cpu-hotplug
  2007-02-14 14:40 [RFC PATCH(Experimental) 0/4] Freezer based Cpu-hotplug Gautham R Shenoy
  2007-02-14 14:42 ` [RFC PATCH(Experimental) 1/4] freezer-cpu-hotplug core Gautham R Shenoy
@ 2007-02-14 21:43 ` Rafael J. Wysocki
  2007-02-15  6:34   ` Gautham R Shenoy
       [not found] ` <200702231041.17136.rjw@sisk.pl>
  2 siblings, 1 reply; 92+ messages in thread
From: Rafael J. Wysocki @ 2007-02-14 21:43 UTC (permalink / raw)
  To: ego
  Cc: akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, oleg

Hi,

On Wednesday, 14 February 2007 15:40, Gautham R Shenoy wrote:
> Hello Everybody,
> 
> This is an experiment towards process_freezer based implementation
> of cpu-hotplug. This is mainly based on ideas of Andrew Morton, 
> Ingo Molnar and Paul Mckenney featured in the discussion
> http://lkml.org/lkml/2007/1/31/323.
> 
> This is an absolute bare-minimal implementation to check the feasibility
> of using process freezer for cpu-hotplug. 
> 
> The patchset comprises of four patches.
> o PATCH 1/4: Core implementation of freezer-based-hotplug.
> o PATCH 2/4: Revert changes to workqueue to make it work with the
>              freezer-cpu-hotplug.
> o PATCH 3/4: Eliminate hotcpu subsystem mutexes from sched and slab.
> o PATCH 4/4: Eliminate lock_cpu_hotplug from the kernel.

I think two things are missing:

1) We should make sure there are no PF_NOFREEZE tasks running when a CPU
is removed (when one is added probably too).  For this purpose we can add a
parameter to freeze_processes() that will tell it to ignore PF_NOFREEZE, but
at the same time we'll have to change all kernel threads that set PF_NOFREEZE
to call try_to_freeze() anyway.  I can do that, but it will take me a couple of
days.

2) We have to change the PM code to stop using CPU hotplug for disabling
nonboot CPUs. ;-)

Greetings,
Rafael

* Re: [RFC PATCH(Experimental) 0/4] Freezer based Cpu-hotplug
  2007-02-14 21:43 ` [RFC PATCH(Experimental) 0/4] Freezer based Cpu-hotplug Rafael J. Wysocki
@ 2007-02-15  6:34   ` Gautham R Shenoy
  2007-02-15  8:09     ` Rafael J. Wysocki
  0 siblings, 1 reply; 92+ messages in thread
From: Gautham R Shenoy @ 2007-02-15  6:34 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, oleg

On Wed, Feb 14, 2007 at 10:43:35PM +0100, Rafael J. Wysocki wrote:
> Hi,
> 
> On Wednesday, 14 February 2007 15:40, Gautham R Shenoy wrote:
> > Hello Everybody,
> > 
> > This is an experiment towards process_freezer based implementation
> > of cpu-hotplug. This is mainly based on ideas of Andrew Morton, 
> > Ingo Molnar and Paul Mckenney featured in the discussion
> > http://lkml.org/lkml/2007/1/31/323.
> > 
> > This is an absolute bare-minimal implementation to check the feasibility
> > of using process freezer for cpu-hotplug. 
> > 
> > The patchset comprises of four patches.
> > o PATCH 1/4: Core implementation of freezer-based-hotplug.
> > o PATCH 2/4: Revert changes to workqueue to make it work with the
> >              freezer-cpu-hotplug.
> > o PATCH 3/4: Eliminate hotcpu subsystem mutexes from sched and slab.
> > o PATCH 4/4: Eliminate lock_cpu_hotplug from the kernel.
> 
> I think two things are missing:
> 
> 1) We should make sure there are no PF_NOFREEZE tasks running when a CPU
> is removed (when one is added probably too).  For this purpose we can add a
> parameter to freeze_processes() that will tell it to ignore PF_NOFREEZE, but
> at the same time we'll have to change all kernel threads that set PF_NOFREEZE
> to call try_to_freeze() anyway.  I can do that, but it will take me a couple of
> days.

Why should we make sure that PF_NOFREEZE tasks are also frozen for
cpu hotplug? Instead, we can create an infrastructure which allows threads to
specify the scenarios in which they would want to be exempted from freezing.
Something like what Paul has suggested in
http://lkml.org/lkml/2007/1/31/323. That way, threads which have nothing
to do with the online_cpu_map or with handling of cpu-hotplug events can
mark themselves to be exempted from being frozen for cpu hotplug.

Once this is achieved, it's all about classifying the threads
according to their NO_FREEZE needs :)

> 
> 2) We have to change the PM code to stop using CPU hotplug for disabling
> nonboot CPUs. ;-)

Just wondering, how hard is that?

thanks
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

* Re: [RFC PATCH(Experimental) 0/4] Freezer based Cpu-hotplug
  2007-02-15  6:34   ` Gautham R Shenoy
@ 2007-02-15  8:09     ` Rafael J. Wysocki
  2007-02-15 12:20       ` Gautham R Shenoy
  0 siblings, 1 reply; 92+ messages in thread
From: Rafael J. Wysocki @ 2007-02-15  8:09 UTC (permalink / raw)
  To: ego
  Cc: akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, oleg

On Thursday, 15 February 2007 07:34, Gautham R Shenoy wrote:
> On Wed, Feb 14, 2007 at 10:43:35PM +0100, Rafael J. Wysocki wrote:
> > Hi,
> > 
> > On Wednesday, 14 February 2007 15:40, Gautham R Shenoy wrote:
> > > Hello Everybody,
> > > 
> > > This is an experiment towards process_freezer based implementation
> > > of cpu-hotplug. This is mainly based on ideas of Andrew Morton, 
> > > Ingo Molnar and Paul Mckenney featured in the discussion
> > > http://lkml.org/lkml/2007/1/31/323.
> > > 
> > > This is an absolute bare-minimal implementation to check the feasibility
> > > of using process freezer for cpu-hotplug. 
> > > 
> > > The patchset comprises of four patches.
> > > o PATCH 1/4: Core implementation of freezer-based-hotplug.
> > > o PATCH 2/4: Revert changes to workqueue to make it work with the
> > >              freezer-cpu-hotplug.
> > > o PATCH 3/4: Eliminate hotcpu subsystem mutexes from sched and slab.
> > > o PATCH 4/4: Eliminate lock_cpu_hotplug from the kernel.
> > 
> > I think two things are missing:
> > 
> > 1) We should make sure there are no PF_NOFREEZE tasks running when a CPU
> > is removed (when one is added probably too).  For this purpose we can add a
> > parameter to freeze_processes() that will tell it to ignore PF_NOFREEZE, but
> > at the same time we'll have to change all kernel threads that set PF_NOFREEZE
> > to call try_to_freeze() anyway.  I can do that, but it will take me a couple of
> > days.
> 
> Why should we make sure that PF_NOFREEZE tasks are also frozen for
> cpu hotplug? Instead, we can create an infrastructure which allows threads to
> specify the scenarios in which they would want to be exempted from freezing.
> Something like what Paul has suggested in
> http://lkml.org/lkml/2007/1/31/323. That way, threads which have nothing
> to do with the online_cpu_map or with handling of cpu-hotplug events can
> mark themselves to be exempted from being frozen for cpu hotplug.

I think all kernel threads should call try_to_freeze() in suitable places
anyway if we are going to use the freezer for anything more than just the
suspend.  In other words, they all should be _able_ to freeze if necessary.
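
What "able to freeze" means for a thread's main loop can be sketched in
userspace C: the loop checks in with the freezer at a safe point on every
iteration and parks there until thawed. This is a pthreads model under stated
assumptions (one shared condition variable, hypothetical names such as
try_to_freeze_model()); the kernel's try_to_freeze() is implemented
differently.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace model of a freezer; every name here is hypothetical. */
struct freezer {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool freeze_requested;
	bool stop_requested;
	int nr_frozen;
};

static void freezer_init(struct freezer *f)
{
	pthread_mutex_init(&f->lock, NULL);
	pthread_cond_init(&f->cond, NULL);
	f->freeze_requested = false;
	f->stop_requested = false;
	f->nr_frozen = 0;
}

/* Model of try_to_freeze(): the thread parks here, at a point where it
 * holds no other locks, until the freeze request is withdrawn. */
static void try_to_freeze_model(struct freezer *f)
{
	pthread_mutex_lock(&f->lock);
	if (f->freeze_requested) {
		f->nr_frozen++;
		pthread_cond_broadcast(&f->cond);   /* report "frozen" */
		while (f->freeze_requested)
			pthread_cond_wait(&f->cond, &f->lock);
		f->nr_frozen--;
	}
	pthread_mutex_unlock(&f->lock);
}

static bool should_stop(struct freezer *f)
{
	pthread_mutex_lock(&f->lock);
	bool stop = f->stop_requested;
	pthread_mutex_unlock(&f->lock);
	return stop;
}

/* Main loop of a thread that is "able to freeze". */
static void *worker(void *arg)
{
	struct freezer *f = arg;

	while (!should_stop(f)) {
		try_to_freeze_model(f);
		/* ... one unit of work would go here ... */
	}
	return NULL;
}

/* Freezer side: raise the request and wait until nr threads have parked. */
static void freeze_threads(struct freezer *f, int nr)
{
	pthread_mutex_lock(&f->lock);
	f->freeze_requested = true;
	while (f->nr_frozen < nr)
		pthread_cond_wait(&f->cond, &f->lock);
	pthread_mutex_unlock(&f->lock);
}

static void thaw_threads(struct freezer *f)
{
	pthread_mutex_lock(&f->lock);
	f->freeze_requested = false;
	pthread_cond_broadcast(&f->cond);
	pthread_mutex_unlock(&f->lock);
}

static void stop_threads(struct freezer *f)
{
	pthread_mutex_lock(&f->lock);
	f->stop_requested = true;
	pthread_mutex_unlock(&f->lock);
}
```

The check sits at the top of the loop so the thread never parks halfway
through a unit of work or while holding a lock the freezing side might need.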

> Once this is achieved, it's all about classifying the threads
> according to their NO_FREEZE needs :)

Yes, but I think it's just a generalization of ignoring PF_NOFREEZE.
If all kernel threads are able to freeze, we can mark them as "freeze for CPU
hotplug" or "freeze for kprobes", or "freeze for suspend" etc. and call the
freezer with the appropriate parameter.

BTW, what happens to a process running on a CPU being removed?
 
> > 2) We have to change the PM code to stop using CPU hotplug for disabling
> > nonboot CPUs. ;-)
> 
> Just wondering, how hard is that?

Hmmm.  In fact the problem is that the suspend code freezes tasks and then
calls disable_nonboot_cpus() which uses (_)cpu_down/up().  In principle we
could make disable_nonboot_cpus() call some lower-level routines to avoid the
freezing of tasks, _but_ the suspend code may freeze too few tasks (i.e. we may
want to freeze more tasks for the CPU hotplug).  Thus I think we should do
something like this:

suspend:				CPU hotplug:
freeze_processes(SUSPEND)	...
...					freeze_processes(CPU_HOTPLUG)
...					...
...					thaw_processes(CPU_HOTPLUG)
thaw_processes(SUSPEND)	...

so freeze_processes() should be reentrant, at least for different values of
the argument.
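
One way to model a freeze_processes() that is reentrant for different
arguments, assuming each task carries a bitmask of the operations it must be
frozen for and the freezer keeps a mask of the freezes currently in effect.
All names are hypothetical; real freezing would also have to park and wake
the tasks, which this model leaves out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical freeze reasons, named after the cases discussed here. */
enum {
	FREEZE_SUSPEND     = 1 << 0,
	FREEZE_CPU_HOTPLUG = 1 << 1,
	FREEZE_KPROBES     = 1 << 2,
};

/* Each task records which operations it must be frozen for. */
struct task_model {
	unsigned int freeze_for;
};

static unsigned int active_reasons;   /* freezes currently in effect */

/* Reentrant for distinct reasons: each call only adds its own bit. */
static int freeze_processes_for(unsigned int reason)
{
	if (active_reasons & reason)
		return -1;                    /* same reason nested twice */
	active_reasons |= reason;
	return 0;
}

static void thaw_processes_for(unsigned int reason)
{
	assert(active_reasons & reason);
	active_reasons &= ~reason;
}

/* A task stays frozen while any reason it cares about is active. */
static bool task_is_frozen(const struct task_model *t)
{
	return (t->freeze_for & active_reasons) != 0;
}
```

In the nested sequence from the diagram, thawing CPU_HOTPLUG leaves tasks that
also freeze for SUSPEND frozen until the outer thaw_processes(SUSPEND).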

All in all, I think we should start from modifying the freezer and the
classification of processes with respect to the freezing.

Greetings,
Rafael

* Re: [RFC PATCH(Experimental) 0/4] Freezer based Cpu-hotplug
  2007-02-15  8:09     ` Rafael J. Wysocki
@ 2007-02-15 12:20       ` Gautham R Shenoy
  2007-02-15 13:31         ` Rafael J. Wysocki
  0 siblings, 1 reply; 92+ messages in thread
From: Gautham R Shenoy @ 2007-02-15 12:20 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, oleg

On Thu, Feb 15, 2007 at 09:09:51AM +0100, Rafael J. Wysocki wrote:
> > 
> > Why should we make sure that PF_NOFREEZE tasks are also frozen for
> > cpu hotplug? Instead, we can create an infrastructure which allows threads to
> > specify the scenarios in which they would want to be exempted from freezing.
> > Something like what Paul has suggested in
> > http://lkml.org/lkml/2007/1/31/323. That way, threads which have nothing
> > to do with the online_cpu_map or with handling of cpu-hotplug events can
> > mark themselves to be exempted from being frozen for cpu hotplug.
> 
> I think all kernel threads should call try_to_freeze() in suitable places
> anyway if we are going to use the freezer for anything more than just the
> suspend.  In other words, they all should be _able_ to freeze if necessary.

Yeah! I agree. I misunderstood your earlier point. I thought you were 
hinting at freezing *everyone* while doing a cpu hotplug.

> 
> > Once this is achieved, it's all about classifying the threads
> > according to their NO_FREEZE needs :)
> 
> Yes, but I think it's just a generalization of ignoring PF_NOFREEZE.
> If all kernel threads are able to freeze, we can mark them as "freeze for CPU
> hotplug" or "freeze for kprobes", or "freeze for suspend" etc. and call the
> freezer with the appropriate parameter.
> 
> BTW, what happens to a process running on a CPU being removed?
> 

We call stop_machine_run in _cpu_down which schedules an idle thread on 
the cpu to be removed. Once the idle thread runs, we call __cpu_die and
subsequently the scheduler performs task migration while handling 
the CPU_DEAD notification (see migration_call in sched.c)
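
A simplified userspace model of that last step, moving tasks off the dead
CPU. It assumes each task has an affinity bitmask and, when no allowed CPU
remains online, widens the mask to all CPUs before picking a destination. The
scheduler has a comparable last-resort fallback, but this is not its code and
it leaves out the scheduler's other placement heuristics.

```c
#include <assert.h>

#define NR_CPUS 4
#define ALL_CPUS ((1u << NR_CPUS) - 1)

struct task_model {
	int cpu;                    /* CPU the task is queued on */
	unsigned int cpus_allowed;  /* affinity mask, one bit per CPU */
};

/* First CPU that is both allowed and online, or -1 if none. */
static int pick_dest_cpu(unsigned int allowed, unsigned int online)
{
	for (int c = 0; c < NR_CPUS; c++)
		if (allowed & online & (1u << c))
			return c;
	return -1;
}

/*
 * After the CPU is gone, move every task still queued on it. A task whose
 * affinity allows no remaining online CPU gets its mask widened first.
 */
static int migrate_off_dead_cpu(struct task_model *t, int n, int dead,
				unsigned int *online)
{
	int moved = 0;

	*online &= ~(1u << dead);
	for (int i = 0; i < n; i++) {
		int dest;

		if (t[i].cpu != dead)
			continue;
		dest = pick_dest_cpu(t[i].cpus_allowed, *online);
		if (dest < 0) {
			t[i].cpus_allowed = ALL_CPUS;
			dest = pick_dest_cpu(t[i].cpus_allowed, *online);
		}
		assert(dest >= 0);
		t[i].cpu = dest;
		moved++;
	}
	return moved;
}
```

A task pinned only to the removed CPU is the case that forces the fallback:
its affinity is broken rather than leaving it stranded on a dead CPU.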

> > > 2) We have to change the PM code to stop using CPU hotplug for disabling
> > > nonboot CPUs. ;-)
> > 
> > Just wondering, how hard is that?
> 
> Hmmm.  In fact the problem is that the suspend code freezes tasks and then
> calls disable_nonboot_cpus() which uses (_)cpu_down/up().  In principle we
> could make disable_nonboot_cpus() call some lower-level routines to avoid the
> freezing of tasks, _but_ the suspend code may freeze too few tasks (i.e. we may
> want to freeze more tasks for the CPU hotplug).  Thus I think we should do
> something like this:
> 
> suspend:				CPU hotplug:
> freeze_processes(SUSPEND)	...
> ...					freeze_processes(CPU_HOTPLUG)
> ...					...
> ...					thaw_processes(CPU_HOTPLUG)
> thaw_processes(SUSPEND)	...
> 
> so freeze_processes() should be reentrant, at least for different values of
> the argument.
> 

That would still mean going over the task list twice. What if we have

freeze_processes(SUSPEND|CPU_HOTPLUG);
perform_pre_hotplug_suspend();
primitive_cpu_down/_up();
perform_post_hotplug_suspend();

Does this look like a good thing to you?
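
A small model of the single-pass alternative, assuming per-task reason masks
as discussed in this thread: passing the OR of both reasons freezes, in one
traversal, the same set of tasks that two separate calls would freeze in two.
freeze_tasks() and the walk counter are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum {
	FREEZE_SUSPEND     = 1 << 0,
	FREEZE_CPU_HOTPLUG = 1 << 1,
};

struct task_model {
	unsigned int freeze_for;   /* reasons this task must freeze for */
	bool frozen;
};

static int task_list_walks;    /* counts traversals of the task list */

/* One traversal freezes every task matching any bit of `mask`. */
static int freeze_tasks(struct task_model *t, int n, unsigned int mask)
{
	int newly_frozen = 0;

	task_list_walks++;
	for (int i = 0; i < n; i++) {
		if ((t[i].freeze_for & mask) && !t[i].frozen) {
			t[i].frozen = true;
			newly_frozen++;
		}
	}
	return newly_frozen;
}
```

The trade-off is that a combined call can only be thawed as a whole, whereas
the nested scheme lets the inner freeze be lifted on its own.
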
> All in all, I think we should start from modifying the freezer and the
> classification of processes with respect to the freezing.
> 

Cool! Let's get started then ;-)

> Greetings,
> Rafael

-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

* Re: [RFC PATCH(Experimental) 0/4] Freezer based Cpu-hotplug
  2007-02-15 12:20       ` Gautham R Shenoy
@ 2007-02-15 13:31         ` Rafael J. Wysocki
  2007-02-15 14:25           ` Gautham R Shenoy
  0 siblings, 1 reply; 92+ messages in thread
From: Rafael J. Wysocki @ 2007-02-15 13:31 UTC (permalink / raw)
  To: ego
  Cc: akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, oleg

On Thursday, 15 February 2007 13:20, Gautham R Shenoy wrote:
> On Thu, Feb 15, 2007 at 09:09:51AM +0100, Rafael J. Wysocki wrote:
> > > 
> > > Why should we make sure that PF_NOFREEZE tasks are also frozen for
> > > cpu hotplug? Instead, we can create an infrastructure which allows threads to
> > > specify the scenarios in which they want to be exempted from freezing.
> > > Something like what Paul has suggested in
> > > http://lkml.org/lkml/2007/1/31/323. That way, threads which have nothing
> > > to do with the online_cpu_map or with handling of cpu-hotplug events can
> > > mark themselves to be exempted from being frozen for cpu hotplug.
> > 
> > I think all kernel threads should call try_to_freeze() in suitable places
> > anyway if we are going to use the freezer for anything more than just the
> > suspend.  In other words, they all should be _able_ to freeze if necessary.
> 
> Yeah! I agree. I misunderstood your earlier point. I thought you were 
> hinting at freezing *everyone* while doing a cpu hotplug.

So I think tonight I'll start adding try_to_freeze() to the kernel threads that
set PF_NOFREEZE.

> 
> > 
> > > Once this is achieved, it's all about classifying the threads
> > > according to their NO_FREEZE needs :)
> > 
> > Yes, but I think it's just a generalization of ignoring PF_NOFREEZE.
> > If all kernel threads are able to freeze, we can mark them as "freeze for CPU
> > hotplug" or "freeze for kprobes", or "freeze for suspend" etc. and call the
> > freezer with the appropriate parameter.
> > 
> > BTW, what happens to a process running on a CPU being removed?
> > 
> 
> We call stop_machine_run in _cpu_down which schedules an idle thread on 
> the cpu to be removed. Once the idle thread runs, we call __cpu_die and
> subsequently the scheduler performs task migration while handling 
> the CPU_DEAD notification (see migration_call in sched.c)

Ah, thanks for the explanation.
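A toy model of the migration step described above, in userspace C: once a CPU is offline, every task left on its runqueue is moved to an online CPU. The runqueue arrays, toy_pick_dest() and its least-loaded policy are assumptions made for illustration only; the real scheduler code also has to respect each task's allowed-CPU mask.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS   4
#define TOY_MAX_TASKS 16

/* A runqueue reduced to an array of task ids. */
struct toy_rq {
	int tasks[TOY_MAX_TASKS];
	int nr;
};

static struct toy_rq rq[TOY_NR_CPUS];
static bool cpu_is_online[TOY_NR_CPUS];

static void toy_enqueue(int cpu, int task)
{
	assert(rq[cpu].nr < TOY_MAX_TASKS);
	rq[cpu].tasks[rq[cpu].nr++] = task;
}

/* Illustrative policy only: least-loaded online CPU, or -1 if none. */
static int toy_pick_dest(void)
{
	int best = -1;

	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		if (!cpu_is_online[cpu])
			continue;
		if (best < 0 || rq[cpu].nr < rq[best].nr)
			best = cpu;
	}
	return best;
}

/* Runs once the CPU is already offline, mirroring migration at
 * CPU_DEAD time: empty its runqueue onto the CPUs that remain. */
static void toy_migrate_dead_cpu(int dead)
{
	assert(!cpu_is_online[dead]);
	while (rq[dead].nr > 0) {
		int task = rq[dead].tasks[--rq[dead].nr];
		int dest = toy_pick_dest();

		assert(dest >= 0);
		toy_enqueue(dest, task);
	}
}
```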

> > > > 2) We have to change the PM code to stop using CPU hotplug for disabling
> > > > nonboot CPUs. ;-)
> > > 
> > > Just wondering, how hard is that?
> > 
> > Hmmm.  In fact the problem is that the suspend code freezes tasks and then
> > calls disable_nonboot_cpus() which uses (_)cpu_down/up().  In principle we
> > could make disable_nonboot_cpus() call some lower-level routines to avoid the
> > freezing of tasks, _but_ the suspend code may freeze too few tasks (i.e. we may
> > want to freeze more tasks for the CPU hotplug).  Thus I think we should do
> > something like this:
> > 
> > suspend:				CPU hotplug:
> > freeze_processes(SUSPEND)	...
> > ...					freeze_processes(CPU_HOTPLUG)
> > ...					...
> > ...					thaw_processes(CPU_HOTPLUG)
> > thaw_processes(SUSPEND)	...
> > 
> > so freeze_processes() should be reentrant, at least for different values of
> > the argument.
> > 
> 
> That would still mean going over the task list twice.

Yes, but I think this is inevitable anyway, because we have moved the
disabling of nonboot CPUs after the suspending of devices (for
ACPI-related reasons).

Currently, we have, roughly:

freeze_processes();
shrink_memory(); (swsusp only)
suspend_devices();
disable_nonboot_cpus();
suspend

and the reverse during the resume.

Still, the second pass will be quick, since the majority of tasks are frozen
when disable_nonboot_cpus() is called.
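The ordering outlined here, with each step undone in reverse on resume, can be sketched as a small table of paired actions. The step names are only strings copied from the outline (resume_devices is a made-up counterpart to suspend_devices), and the fail_at parameter is an assumption added to show that a partial failure unwinds only the steps that actually completed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct pm_step {
	const char *enter;	/* action on the way down */
	const char *leave;	/* reverse action on resume, or NULL */
};

static char trace[256];

static void record(const char *what)
{
	size_t room = sizeof(trace) - strlen(trace) - 1;

	strncat(trace, what, room);
	room = sizeof(trace) - strlen(trace) - 1;
	strncat(trace, ";", room);
}

/* Runs steps in order; step 'fail_at' (if < n) is treated as failing.
 * Completed steps are always undone in reverse order. Returns 0 if the
 * machine reached the "suspend" point. */
static int toy_suspend_cycle(const struct pm_step *steps, size_t n,
			     size_t fail_at)
{
	size_t done = 0;
	int ret = 0;

	trace[0] = '\0';
	for (; done < n; done++) {
		if (done == fail_at) {
			ret = -1;
			break;
		}
		record(steps[done].enter);
	}
	if (ret == 0)
		record("suspend");
	/* "...and the reverse during the resume" (or the error unwind) */
	while (done > 0) {
		done--;
		if (steps[done].leave)
			record(steps[done].leave);
	}
	return ret;
}
```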

> How about if we have 
> 
> freeze_processes(SUSPEND|CPU_HOTPLUG);
> perform_pre_hotplug_suspend();
> primitive_cpu_down/_up();
> perform_post_hotplug_suspend();
> 
> Does this look like a good thing to you?
> > All in all, I think we should start from modifying the freezer and the
> > classification of processes with respect to the freezing.
> > 
> 
> Cool! Let's get started then ;-)

No problem with that. ;-)

Speaking of the classification, do you think it would be practical to use
some kind of "freezing levels"?  I mean, for each task we can define the
"freezing level" at which it should be frozen and each user of the freezer
can call it with a specific "freezing level" as a parameter.  Of course for
this purpose the tasks frozen at level 1 have to be a subset of the tasks
frozen at level 2 etc. and I'm not sure if this requirement can be satisfied.

Rafael

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH(Experimental) 0/4] Freezer based Cpu-hotplug
  2007-02-15 13:31         ` Rafael J. Wysocki
@ 2007-02-15 14:25           ` Gautham R Shenoy
  2007-02-17 11:24             ` Rafael J. Wysocki
  0 siblings, 1 reply; 92+ messages in thread
From: Gautham R Shenoy @ 2007-02-15 14:25 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, oleg

On Thu, Feb 15, 2007 at 02:31:08PM +0100, Rafael J. Wysocki wrote:
> 
> So I think tonight I'll start adding try_to_freeze() to the kernel threads that
> set PF_NOFREEZE.

Cool! While you are at it, let me try to enhance the freezer APIs
to incorporate the PFE_* flags.

> > 
> > That would still mean going over the task list twice.
> 
> Yes, but I think this is inevitable anyway, because we have moved the
> disabling of nonboot CPUs after the suspending of devices (for
> ACPI-related reasons).
> 
> Currently, we have, roughly:
> 
> freeze_processes();
> shrink_memory(); (swsusp only)
> suspend_devices();
> disable_nonboot_cpus();
> suspend
> 
> and the reverse during the resume.
> 
> Still, the second pass will be quick, since the majority of tasks are frozen
> when disable_nonboot_cpus() is called.

Ok. That's fine by me. Let's get it working first. We can always optimize
it later :-)

> > 
> > Cool! Let's get started then ;-)
> 
> No problem with that. ;-)
> 
> Speaking of the classification, do you think it would be practical to use
> some kind of "freezing levels"?  I mean, for each task we can define the
> "freezing level" at which it should be frozen and each user of the freezer
> can call it with a specific "freezing level" as a parameter.  Of course for
> this purpose the tasks frozen at level 1 have to be a subset of the tasks
> frozen at level 2 etc. and I'm not sure if this requirement can be satisfied.

A freeze hierarchy! I hope this freeze_level parameter isn't meant as an
alternative to the PFE_* flags, because then a task would face a dilemma
about which freezing level it should be at if it wants to be frozen for,
let's say, kprobes but not for cpu-hotplug. Cpu-hotplug and kprobes
may not have a dependency like the one that exists between cpu-hotplug
and suspend. So, at this moment, even I am not sure if there is a
need for the hierarchy.
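One way to see the dilemma: a single freezing level per task can only reproduce per-event flags if, for every pair of events, the set of tasks frozen for one contains the set frozen for the other (the subset requirement Rafael mentions). The following sketch checks that condition for a table of per-task event masks; the EV_* bits are hypothetical stand-ins for the PFE_* flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event bits, in the spirit of the PFE_* idea. */
#define EV_HOTPLUG  (1u << 0)
#define EV_KPROBES  (1u << 1)
#define EV_SUSPEND  (1u << 2)
#define EV_COUNT    3u

/* Flag scheme: freezing for 'ev' freezes every task whose mask has it. */
static bool flag_frozen(unsigned int task_mask, unsigned int ev)
{
	return (task_mask & ev) != 0;
}

/* True if every task frozen for event e is also frozen for event f. */
static bool subset(const unsigned int *masks, size_t n,
		   unsigned int e, unsigned int f)
{
	for (size_t i = 0; i < n; i++)
		if (flag_frozen(masks[i], e) && !flag_frozen(masks[i], f))
			return false;
	return true;
}

/* Levels can express the same freezing sets only if those sets form a
 * chain: for each pair of events, one set contains the other. */
static bool levels_can_express(const unsigned int *masks, size_t n)
{
	assert(masks != NULL || n == 0);
	for (unsigned int a = 0; a < EV_COUNT; a++)
		for (unsigned int b = a + 1; b < EV_COUNT; b++) {
			unsigned int e = 1u << a, f = 1u << b;

			if (!subset(masks, n, e, f) && !subset(masks, n, f, e))
				return false;
		}
	return true;
}
```

A kprobes-but-not-hotplug task on its own is fine (order hotplug below kprobes), but adding a hotplug-but-not-kprobes task makes every level ordering wrong for one of the two.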

> 
> Rafael

Thanks and Regards
gautham.

-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH(Experimental) 2/4] Revert changes to workqueue.c
  2007-02-14 20:09     ` Oleg Nesterov
@ 2007-02-16  5:26       ` Srivatsa Vaddagiri
  2007-02-16 15:33         ` Oleg Nesterov
  0 siblings, 1 reply; 92+ messages in thread
From: Srivatsa Vaddagiri @ 2007-02-16  5:26 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Gautham R Shenoy, akpm, paulmck, mingo, dipankar,
	venkatesh.pallipadi, linux-kernel, rjw

On Wed, Feb 14, 2007 at 11:09:04PM +0300, Oleg Nesterov wrote:
> What else you don't like? Why do you want to remove cwq_should_stop() and
> restore an ugly (ugly for workqueue.c) kthread_stop/kthread_should_stop() ?

What is ugly about kthread_stop in workqueue.c?

I feel it is nice if the cleanup is synchronous i.e when cpu_down() is
complete, all the dead cpu's worker threads would have terminated.
Otherwise we expose races between CPU_UP_PREPARE/kthread_create and the
(old) thread exiting.
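The synchronous cleanup argued for here is what kthread_stop() provides: it flags the thread to stop, wakes it, and waits until it has exited, while the thread notices the request through kthread_should_stop() (in the patch, from its wait_to_die loop). A userspace analogue with pthreads, where every toy_* name is invented for illustration:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace stand-in for a per-CPU worker thread; not kernel code. */
struct toy_worker {
	pthread_t tid;
	pthread_mutex_t lock;
	pthread_cond_t wake;
	bool should_stop;	/* plays the role of kthread_should_stop() */
	int pending;		/* queued work items */
	int done;		/* items processed */
};

static void *toy_worker_fn(void *arg)
{
	struct toy_worker *w = arg;

	pthread_mutex_lock(&w->lock);
	for (;;) {
		while (w->pending > 0) {	/* run whatever is queued */
			w->pending--;
			w->done++;
		}
		if (w->should_stop)
			break;
		/* idle: sleep until new work or a stop request arrives */
		pthread_cond_wait(&w->wake, &w->lock);
	}
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

static int toy_worker_start(struct toy_worker *w, int pending)
{
	assert(pending >= 0);
	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->wake, NULL);
	w->should_stop = false;
	w->pending = pending;
	w->done = 0;
	return pthread_create(&w->tid, NULL, toy_worker_fn, w);
}

/* Synchronous stop, like kthread_stop(): returns only after the worker
 * has exited, so a replacement cannot overlap with it. */
static void toy_worker_stop(struct toy_worker *w)
{
	pthread_mutex_lock(&w->lock);
	w->should_stop = true;
	pthread_cond_signal(&w->wake);
	pthread_mutex_unlock(&w->lock);
	pthread_join(w->tid, NULL);
	pthread_cond_destroy(&w->wake);
	pthread_mutex_destroy(&w->lock);
}
```

Because toy_worker_stop() joins the thread, a replacement worker for the same CPU can be started right after it returns without racing against the old one.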

> We can restore take_over_works(), although I don't see why this is needed.
> But cwq_should_stop() will just work regardless, why do you want to add
> this "wait_to_die" ... well, hack :)

wait_to_die is not a new "hack"! It's already used in several other
places.

> > -static DEFINE_MUTEX(workqueue_mutex);
> > +static DEFINE_SPINLOCK(workqueue_lock);
> 
> No. We can't do this. see below.

Ok ..

> >  struct workqueue_struct *__create_workqueue(const char *name,
> >  					    int singlethread, int freezeable)
> >  {
> > @@ -798,17 +756,20 @@ struct workqueue_struct *__create_workqu
> >  		INIT_LIST_HEAD(&wq->list);
> >  		cwq = init_cpu_workqueue(wq, singlethread_cpu);
> >  		err = create_workqueue_thread(cwq, singlethread_cpu);
> > +		if (!err)
> > +			wake_up_process(cwq->thread);
> >  	} else {
> > -		mutex_lock(&workqueue_mutex);
> > +		spin_lock(&workqueue_lock);
> >  		list_add(&wq->list, &workqueues);
> > -
> > -		for_each_possible_cpu(cpu) {
> > +		spin_unlock(&workqueue_lock);
> > +		for_each_online_cpu(cpu) {
> >  			cwq = init_cpu_workqueue(wq, cpu);
> > -			if (err || !(cpu_online(cpu) || cpu == embryonic_cpu))
> > -				continue;
> >  			err = create_workqueue_thread(cwq, cpu);
> > +			if (err)
> > +				break;
> 
> No, we can't break. We are going to execute destroy_workqueue(), it will
> iterate over all cwqs.

and try to kthread_stop() uninitialized cwq->thread?

How about retaining the break above but setting cwq->thread = NULL in
create_workqueue_thread in the failure case?
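A toy version of this suggestion, assuming the per-CPU structures start out zeroed (calloc() here): if a failed thread creation leaves cwq->thread as NULL and the loop stops at the first error, a destroy path that visits every slot can simply skip the NULL ones. The types and helpers are hypothetical, and free() stands in for kthread_stop().

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_NR_CPUS 4

/* Hypothetical stand-in for struct cpu_workqueue_struct. */
struct toy_cwq {
	int *thread;		/* NULL means "no worker thread" */
};

static int toy_fail_cpu = -1;	/* CPU whose thread creation will fail */
static int toy_stopped;		/* threads torn down by toy_destroy() */

static int toy_create_thread(struct toy_cwq *cwq, int cpu)
{
	if (cpu == toy_fail_cpu) {
		cwq->thread = NULL;	/* "thread = NULL on failure" */
		return -1;
	}
	cwq->thread = malloc(sizeof(*cwq->thread));
	if (cwq->thread == NULL)
		return -1;
	*cwq->thread = cpu;
	return 0;
}

/* Visits every slot, as destroy_workqueue() does, skipping empty ones. */
static void toy_destroy(struct toy_cwq *cwqs)
{
	assert(cwqs != NULL);
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		if (cwqs[cpu].thread == NULL)
			continue;
		free(cwqs[cpu].thread);		/* stands in for kthread_stop() */
		cwqs[cpu].thread = NULL;
		toy_stopped++;
	}
}

static struct toy_cwq *toy_create_workqueue(void)
{
	/* calloc(): every slot starts with thread == NULL */
	struct toy_cwq *cwqs = calloc(TOY_NR_CPUS, sizeof(*cwqs));
	int err = 0;

	if (cwqs == NULL)
		return NULL;
	/* stop at the first failure, like the 'break' in the patch */
	for (int cpu = 0; cpu < TOY_NR_CPUS && !err; cpu++)
		err = toy_create_thread(&cwqs[cpu], cpu);
	if (err) {
		toy_destroy(cwqs);	/* safe: untouched slots are still NULL */
		free(cwqs);
		return NULL;
	}
	return cwqs;
}
```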

> > +static void take_over_work(struct workqueue_struct *wq, unsigned int cpu)
> > +{

<snip>

> > +}
> 
> I think this is unneeded complication, but ok, should work.

This is required if we want to stop per-cpu threads synchronously.
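The body of take_over_work() is snipped in the quote. Assuming it does what its name suggests, re-queueing the dead CPU's pending work items on a live CPU, its core is a detach-then-requeue over a FIFO. The list types here are a simplified stand-in for the kernel's list_head, not the real workqueue structures.

```c
#include <assert.h>
#include <stddef.h>

struct toy_work {
	int id;
	struct toy_work *next;
};

/* FIFO of pending work, one per CPU. */
struct toy_worklist {
	struct toy_work *head, *tail;
};

static void toy_queue_work(struct toy_worklist *wl, struct toy_work *w)
{
	w->next = NULL;
	if (wl->tail)
		wl->tail->next = w;
	else
		wl->head = w;
	wl->tail = w;
}

/* Move everything pending on the dead CPU to 'to', keeping FIFO order. */
static void toy_take_over_work(struct toy_worklist *dead,
			       struct toy_worklist *to)
{
	struct toy_work *w = dead->head;

	assert(dead != to);
	/* detach the whole list first, in the spirit of list_replace_init() */
	dead->head = dead->tail = NULL;
	while (w) {
		struct toy_work *next = w->next;

		toy_queue_work(to, w);
		w = next;
	}
}
```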

> > +	case CPU_DEAD:
> > +		list_for_each_entry(wq, &workqueues, list)
> > +			take_over_work(wq, hotcpu);
> > +		break;
> > +
> > +	case CPU_DEAD_KILL_THREADS:
> > +		list_for_each_entry(wq, &workqueues, list)
> > +			cleanup_workqueue_thread(wq, hotcpu);
> >  	}
> 
> Both CPU_UP_CANCELED and CPU_DEAD_KILL_THREADS run after thaw_processes(),
> this means that workqueue_cpu_callback() is racy wrt create/destroy workqueue,
> we should take the mutex, and it can't be spinlock_t.

Ok, yes. Thanks for pointing that out!

-- 
Regards,
vatsa

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH(Experimental) 1/4] freezer-cpu-hotplug core
  2007-02-14 19:47   ` Oleg Nesterov
@ 2007-02-16  6:48     ` Srivatsa Vaddagiri
  2007-02-16 15:47       ` Oleg Nesterov
  0 siblings, 1 reply; 92+ messages in thread
From: Srivatsa Vaddagiri @ 2007-02-16  6:48 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Gautham R Shenoy, akpm, paulmck, mingo, dipankar,
	venkatesh.pallipadi, linux-kernel, rjw

On Wed, Feb 14, 2007 at 10:47:42PM +0300, Oleg Nesterov wrote:
> >  	for (;;) {
> > -		if (cwq->wq->freezeable)
> > +		if (cwq->wq->freezeable) {
> 
> Else? This is wrong. A change like this should start from making all
> cwq->threads freezeable, otherwise it just doesn't work.

I agree we need to have all threads frozen for hotplug. The only exception I
have found is the kthread workqueue, which needs to be active after
freeze_processes(). stop_machine and CPU_UP_PREPARE/kthread_create()
depend on it to work.

A worker thread (like kthread workqueue), which has exempted itself from 
hotplug-freeze, should essentially be prepared to get preempted any time and 
made to run on any cpu. If that is the case, do you see any problems in having 
the if () statement above?

> > +wait_to_die:
> > +	/* Wait for kthread_stop */
> > +	set_current_state(TASK_INTERRUPTIBLE);
> > +	while (!kthread_should_stop()) {
> > +		schedule();
> > +		set_current_state(TASK_INTERRUPTIBLE);
> > +	}
> > +	__set_current_state(TASK_RUNNING);
> > +	return 0;
> >  }
> 
> I believe this is not needed, see the comments for the next patch.

Without this, thread cleanup (cwq->should_stop)/create(CPU_UP_PREPARE) becomes 
racy 

-- 
Regards,
vatsa

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH(Experimental) 1/4] freezer-cpu-hotplug core
  2007-02-14 20:22   ` Oleg Nesterov
@ 2007-02-16  7:16     ` Srivatsa Vaddagiri
  2007-02-16  8:12       ` Srivatsa Vaddagiri
  2007-02-16 16:06       ` Oleg Nesterov
  0 siblings, 2 replies; 92+ messages in thread
From: Srivatsa Vaddagiri @ 2007-02-16  7:16 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Gautham R Shenoy, akpm, paulmck, mingo, dipankar,
	venkatesh.pallipadi, linux-kernel, rjw

On Wed, Feb 14, 2007 at 11:22:09PM +0300, Oleg Nesterov wrote:
> > o Splits CPU_DEAD into two events namely
> >   - CPU_DEAD: which will be handled while the processes are still
> >               frozen.
> > 
> >   - CPU_DEAD_KILL_THREADS: To be handled after we thaw_processes.
> 
> 
> Imho, this is not right. This changes the meaning of CPU_DEAD, and so
> we should fix all users of CPU_DEAD as well.

Why should we fix all users? Only users who were doing a kthread_stop()
in CPU_DEAD need to be fixed. From my count, only 5 users (out of a
total of 35) need to be fixed to not do kthread_stop in CPU_DEAD.

> 
> How about
> 
> 	CPU_DEAD_WHATEVER
> 		the processes are still frozen
> 
> 	CPU_DEAD
> 		after we thaw_processes
> 
> This way we can add processing of the new CPU_DEAD_WHATEVER event where
> it may help. 

Well, -most- of the work needs to be done in a state when processes are
frozen. The only exception is cleaning up of per-cpu threads (which is
not possible with processes frozen - if we can find a way to make that
possible, then everything can be done in CPU_DEAD).

If we go by the change suggested above, then we need to fix all users of
CPU_DEAD to do what they are doing in CPU_DEAD_WHATEVER (when processes
are frozen). I would rather avoid this invasive change and let
CPU_DEAD be sent with processes frozen still.
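The ordering proposed for the tail of cpu_down() can be modelled as two passes over a notifier chain: CPU_DEAD while processes are still frozen, and CPU_DEAD_KILL_THREADS after thaw_processes(). This is a userspace toy; the callbacks and the processes_frozen flag are hypothetical and only show why a callback that stops a per-CPU thread has to wait for the second event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Event names follow the patch under discussion; values are arbitrary. */
enum toy_cpu_event { TOY_CPU_DEAD, TOY_CPU_DEAD_KILL_THREADS };

typedef void (*toy_cpu_cb)(enum toy_cpu_event ev, int cpu);

static bool processes_frozen;

/* A subsystem that only migrates state: fine while frozen. */
static int migrated_cpu = -1;
static void toy_sched_cb(enum toy_cpu_event ev, int cpu)
{
	if (ev == TOY_CPU_DEAD)
		migrated_cpu = cpu;
}

/* A subsystem with a per-CPU thread: stopping it needs that thread to
 * run, so it has to wait until processes are thawed. */
static int stopped_thread_cpu = -1;
static void toy_workqueue_cb(enum toy_cpu_event ev, int cpu)
{
	if (ev == TOY_CPU_DEAD_KILL_THREADS) {
		assert(!processes_frozen);	/* kthread_stop() would block */
		stopped_thread_cpu = cpu;
	}
}

static void toy_notify(toy_cpu_cb *chain, size_t n,
		       enum toy_cpu_event ev, int cpu)
{
	for (size_t i = 0; i < n; i++)
		chain[i](ev, cpu);
}

/* Tail of the proposed cpu_down() sequence. */
static void toy_cpu_down_tail(toy_cpu_cb *chain, size_t n, int cpu)
{
	processes_frozen = true;	/* freeze_processes() already ran */
	toy_notify(chain, n, TOY_CPU_DEAD, cpu);
	processes_frozen = false;	/* thaw_processes() */
	toy_notify(chain, n, TOY_CPU_DEAD_KILL_THREADS, cpu);
}
```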

> CPU_UP_PREPARE is called after freeze_processes()... Probably this works,
> but imho this is no good. Suppose for a moment that khelper will be frozen
> (yes, yes it can't be), then we can't do kthread_create().

Yes, I am worried about doing so many things with processes frozen.
Maybe time (and more testing) will tell us if this is a bad thing or
not. The only dependency I have found so far is that the kthread workqueue needs to
be up (and hence its worker thread needs to be exempted from hotplug
freeze). We should mark the kthread workqueue accordingly as not freezable
for hotplug.


-- 
Regards,
vatsa

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH(Experimental) 1/4] freezer-cpu-hotplug core
  2007-02-16  7:16     ` Srivatsa Vaddagiri
@ 2007-02-16  8:12       ` Srivatsa Vaddagiri
  2007-02-16  9:29         ` Rafael J. Wysocki
                           ` (2 more replies)
  2007-02-16 16:06       ` Oleg Nesterov
  1 sibling, 3 replies; 92+ messages in thread
From: Srivatsa Vaddagiri @ 2007-02-16  8:12 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Gautham R Shenoy, akpm, paulmck, mingo, dipankar,
	venkatesh.pallipadi, linux-kernel, rjw

On Fri, Feb 16, 2007 at 12:46:17PM +0530, Srivatsa Vaddagiri wrote:
> frozen. The only exception is cleaning up of per-cpu threads (which is
> not possible with processes frozen - if we can find a way to make that
> possible, then everything can be done in CPU_DEAD).

How about a patch like the one below?


--- process.c.org	2007-02-16 13:38:39.000000000 +0530
+++ process.c	2007-02-16 13:38:59.000000000 +0530
@@ -47,7 +47,7 @@ void refrigerator(void)
 	recalc_sigpending(); /* We sent fake signal, clean it up */
 	spin_unlock_irq(&current->sighand->siglock);
 
-	while (frozen(current)) {
+	while (frozen(current) && !kthread_should_stop()) {
 		current->state = TASK_UNINTERRUPTIBLE;
 		schedule();
 	}

This should let us do kthread_stop() in CPU_DEAD itself (while processes
are frozen)? That would allow us to do everything from CPU_DEAD itself
(and not have CPU_DEAD_KILL_THREADS).


-- 
Regards,
vatsa

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH(Experimental) 1/4] freezer-cpu-hotplug core
  2007-02-16  8:12       ` Srivatsa Vaddagiri
@ 2007-02-16  9:29         ` Rafael J. Wysocki
  2007-02-16  9:59           ` Srivatsa Vaddagiri
  2007-02-16 19:46         ` Oleg Nesterov
  2007-02-17  5:32         ` Gautham R Shenoy
  2 siblings, 1 reply; 92+ messages in thread
From: Rafael J. Wysocki @ 2007-02-16  9:29 UTC (permalink / raw)
  To: vatsa
  Cc: Oleg Nesterov, Gautham R Shenoy, akpm, paulmck, mingo, dipankar,
	venkatesh.pallipadi, linux-kernel

On Friday, 16 February 2007 09:12, Srivatsa Vaddagiri wrote:
> On Fri, Feb 16, 2007 at 12:46:17PM +0530, Srivatsa Vaddagiri wrote:
> > frozen. The only exception is cleaning up of per-cpu threads (which is
> > not possible with processes frozen - if we can find a way to make that
> > possible, then everything can be done in CPU_DEAD).
> 
> How about a patch like the one below?
> 
> 
> --- process.c.org	2007-02-16 13:38:39.000000000 +0530
> +++ process.c	2007-02-16 13:38:59.000000000 +0530
> @@ -47,7 +47,7 @@ void refrigerator(void)
>  	recalc_sigpending(); /* We sent fake signal, clean it up */
>  	spin_unlock_irq(&current->sighand->siglock);
>  
> -	while (frozen(current)) {
> +	while (frozen(current) && !kthread_should_stop()) {
>  		current->state = TASK_UNINTERRUPTIBLE;
>  		schedule();
>  	}
> 
> This should let us do kthread_stop() in CPU_DEAD itself (while processes
> are frozen)? That would allow us to do everything from CPU_DEAD itself
> (and not have CPU_DEAD_KILL_THREADS).

Well, the suspend code has been developed with the assumption that frozen
threads stay frozen until _we_ let them thaw by calling thaw_processes().  I'm
a bit afraid of this change.

Greetings,
Rafael

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH(Experimental) 1/4] freezer-cpu-hotplug core
  2007-02-16  9:29         ` Rafael J. Wysocki
@ 2007-02-16  9:59           ` Srivatsa Vaddagiri
  2007-02-16 11:06             ` Rafael J. Wysocki
  0 siblings, 1 reply; 92+ messages in thread
From: Srivatsa Vaddagiri @ 2007-02-16  9:59 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Oleg Nesterov, Gautham R Shenoy, akpm, paulmck, mingo, dipankar,
	venkatesh.pallipadi, linux-kernel

On Fri, Feb 16, 2007 at 10:29:20AM +0100, Rafael J. Wysocki wrote:
> Well, the suspend code has been developed with the assumption that frozen
> threads stay frozen until _we_ let them thaw by calling thaw_processes().  I'm
> a bit afraid of this change.

Note that only kernel threads created through kthread_create are allowed
to exit the refrigerator like this, and only when
(kthread_stop_info.k == current). So all other threads should be unaffected
by this change.
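The proposed change to refrigerator(), together with this restriction, can be modelled with pthreads: a frozen thread stays in the wait loop unless it is thawed or it is the single thread named by the stop request (the role kthread_stop_info.k plays). Every other frozen thread wakes, re-checks, and goes back to waiting. All names are invented for the sketch, and a condition variable stands in for TASK_UNINTERRUPTIBLE plus schedule().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

enum { IN_FRIDGE, LEFT_FOR_STOP, LEFT_THAWED };

struct toy_thread {
	pthread_t tid;
	bool frozen;
	int state;		/* IN_FRIDGE until the thread leaves */
};

/* One lock/condvar for the whole toy freezer. */
static pthread_mutex_t fr_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t fr_cond = PTHREAD_COND_INITIALIZER;
/* Thread currently asked to stop, like kthread_stop_info.k. */
static const struct toy_thread *stop_target;

/* Caller holds fr_lock. */
static bool toy_should_stop(const struct toy_thread *self)
{
	return stop_target == self;
}

static void *toy_refrigerator(void *arg)
{
	struct toy_thread *self = arg;

	pthread_mutex_lock(&fr_lock);
	/* was: while (frozen(current)) */
	while (self->frozen && !toy_should_stop(self))
		pthread_cond_wait(&fr_cond, &fr_lock);
	self->state = self->frozen ? LEFT_FOR_STOP : LEFT_THAWED;
	pthread_mutex_unlock(&fr_lock);
	return NULL;
}

/* Like kthread_stop(): name one target, wake all sleepers, wait for it. */
static void toy_stop(struct toy_thread *t)
{
	assert(t != NULL);
	pthread_mutex_lock(&fr_lock);
	stop_target = t;
	pthread_cond_broadcast(&fr_cond);
	pthread_mutex_unlock(&fr_lock);
	pthread_join(t->tid, NULL);
}

/* Like thaw_processes(): clear every frozen flag and wake everyone. */
static void toy_thaw_all(struct toy_thread *ts, int n)
{
	pthread_mutex_lock(&fr_lock);
	for (int i = 0; i < n; i++)
		ts[i].frozen = false;
	pthread_cond_broadcast(&fr_cond);
	pthread_mutex_unlock(&fr_lock);
}
```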

-- 
Regards,
vatsa

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH(Experimental) 1/4] freezer-cpu-hotplug core
  2007-02-16  9:59           ` Srivatsa Vaddagiri
@ 2007-02-16 11:06             ` Rafael J. Wysocki
  0 siblings, 0 replies; 92+ messages in thread
From: Rafael J. Wysocki @ 2007-02-16 11:06 UTC (permalink / raw)
  To: vatsa
  Cc: Oleg Nesterov, Gautham R Shenoy, akpm, paulmck, mingo, dipankar,
	venkatesh.pallipadi, linux-kernel

On Friday, 16 February 2007 10:59, Srivatsa Vaddagiri wrote:
> On Fri, Feb 16, 2007 at 10:29:20AM +0100, Rafael J. Wysocki wrote:
> > Well, the suspend code has been developed with the assumption that frozen
> > threads stay frozen until _we_ let them thaw by calling thaw_processes().  I'm
> > a bit afraid of this change.
> 
> Note that only kernel threads created through kthread_create are allowed
> to exit the refrigerator like this, and only when
> (kthread_stop_info.k == current). So all other threads should be unaffected
> by this change.

Yes, that's why I said "a bit". ;-)

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH(Experimental) 2/4] Revert changes to workqueue.c
  2007-02-16  5:26       ` Srivatsa Vaddagiri
@ 2007-02-16 15:33         ` Oleg Nesterov
  2007-02-16 16:47           ` Srivatsa Vaddagiri
  0 siblings, 1 reply; 92+ messages in thread
From: Oleg Nesterov @ 2007-02-16 15:33 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Gautham R Shenoy, akpm, paulmck, mingo, dipankar,
	venkatesh.pallipadi, linux-kernel, rjw

On 02/16, Srivatsa Vaddagiri wrote:
>
> On Wed, Feb 14, 2007 at 11:09:04PM +0300, Oleg Nesterov wrote:
> > What else you don't like? Why do you want to remove cwq_should_stop() and
> > restore an ugly (ugly for workqueue.c) kthread_stop/kthread_should_stop() ?
> 
> What is ugly abt kthread_stop in workqueue.c?

I take back my words. It is not "ugly" any longer because with this change
we don't do kthread_stop()->wake_up_process() while cwq->thread may sleep in
work->func(). Still I don't see (ok, I am biased and probably wrong, please
correct me) why kthread_stop+wait_to_die is better than cwq_should_stop(),
see below.

> I feel it is nice if the cleanup is synchronous i.e when cpu_down() is
> complete, all the dead cpu's worker threads would have terminated.
> Otherwise we expose races between CPU_UP_PREPARE/kthread_create and the
> (old) thread exiting.

Please look at 2.6.20-mm1, cleanup is synchronous. Probably we misunderstood
each other looking at different code.

> > > +			if (err)
> > > +				break;
> > 
> > No, we can't break. We are going to execute destroy_workqueue(), it will
> > iterate over all cwqs.
> 
> and try to kthread_stop() uninitialized cwq->thread?
> 
> How abt retaining the break above but setting cwq->thread = NULL in
> create_workqueue_thread in failure case?

Perhaps do it, but why? The failure should be rare, and it is a bit
dangerous to have a workqueue_struct which was not properly initialized.
Suppose we change CPU_UP_PREPARE so it is called before freeze_processes()
stage, then we have a problem.

> > > +static void take_over_work(struct workqueue_struct *wq, unsigned int cpu)
> > 
> > I think this is unneeded complication, but ok, should work.
> 
> This is required if we want to stop per-cpu threads synchronously.

See above.

Srivatsa, don't get me wrong. I can't judge the use of the freezer for cpu hotplug,
but yes, we can improve workqueue.c in this case! But these changes should be
small and understandable. When cpu hotplug is converted, we don't need _any_
changes in workqueue.c, it should work (except s/CPU_DEAD/CPU_DEAD_KILL_THREADS
if you insist). Then,

	[PATCH] make all multithreaded workqueues freezable

	[PATCH] remove cpu_populated_map
		just remove, very easy. Good change!

	[PATCH] restore take_over_works()
		This is not strictly needed! But ok, this can speedup cpu_down,
		and now we can't have a race with flush_workqueue (freezer).
		Just do

			case CPU_XXX:	// all tasks are frozen

			+	take_over_work(wq, hotcpu);

		No more changes are required, cwq_should_stop() just works
		because it is more flexible than kthread_should_stop().

	[PATCH] don't take workqueue_mutex in ...

	[PATCH] ...

Probably I missed something, and we should fix/improve/drop cwq_should_stop(),
but please do this in a separate patch, with a proper changelog which explains
why we are doing so.

Currently I believe that workqueue.c will need very minimal and simple changes
if we use freezer.

Oleg.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH(Experimental) 1/4] freezer-cpu-hotplug core
  2007-02-16  6:48     ` Srivatsa Vaddagiri
@ 2007-02-16 15:47       ` Oleg Nesterov
  0 siblings, 0 replies; 92+ messages in thread
From: Oleg Nesterov @ 2007-02-16 15:47 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Gautham R Shenoy, akpm, paulmck, mingo, dipankar,
	venkatesh.pallipadi, linux-kernel, rjw

On 02/16, Srivatsa Vaddagiri wrote:
>
> On Wed, Feb 14, 2007 at 10:47:42PM +0300, Oleg Nesterov wrote:
> > >  	for (;;) {
> > > -		if (cwq->wq->freezeable)
> > > +		if (cwq->wq->freezeable) {
> > 
> > Else? This is wrong. A change like this should start from making all
> > cwq->threads freezeable, otherwise it just doesn't work.
> 
> I agree we need to have all threads frozen for hotplug.

Well, only multithreaded, strictly speaking.

>                                                          Only exception I
> have found is kthread workqueue, which needs to be active after
> freeze_processes(). stop_machine and CPU_UP_PREPARE/kthread_create()
> depend on it to work.

Yes. That is why I worried about freeze_processes() before CPU_UP_PREPARE.

> A worker thread (like kthread workqueue), which has exempted itself from 
> hotplug-freeze, should essentially be prepared to get preempted any time and 
> made to run on any cpu. If that is the case, do you see any problems in having 
> the if () statement above?

helper_wq ("kthread") is singlethread (see above), but it is not nice to
rely on that. (I am not sure I understand you, though.)

> > > +wait_to_die:
> > > +	/* Wait for kthread_stop */
> > > +	set_current_state(TASK_INTERRUPTIBLE);
> > > +	while (!kthread_should_stop()) {
> > > +		schedule();
> > > +		set_current_state(TASK_INTERRUPTIBLE);
> > > +	}
> > > +	__set_current_state(TASK_RUNNING);
> > > +	return 0;
> > >  }
> > 
> > I believe this is not needed, see the comments for the next patch.
> 
> Without this, thread cleanup (cwq->should_stop)/create(CPU_UP_PREPARE) becomes 
> racy

Could you explain? (Again, perhaps you are talking about the old code).

Oleg.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH(Experimental) 1/4] freezer-cpu-hotplug core
  2007-02-16  7:16     ` Srivatsa Vaddagiri
  2007-02-16  8:12       ` Srivatsa Vaddagiri
@ 2007-02-16 16:06       ` Oleg Nesterov
  1 sibling, 0 replies; 92+ messages in thread
From: Oleg Nesterov @ 2007-02-16 16:06 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Gautham R Shenoy, akpm, paulmck, mingo, dipankar,
	venkatesh.pallipadi, linux-kernel, rjw

On 02/16, Srivatsa Vaddagiri wrote:
>
> On Wed, Feb 14, 2007 at 11:22:09PM +0300, Oleg Nesterov wrote:
> > > o Splits CPU_DEAD into two events namely
> > >   - CPU_DEAD: which will be handled while the processes are still
> > >               frozen.
> > > 
> > >   - CPU_DEAD_KILL_THREADS: To be handled after we thaw_processes.
> > 
> > 
> > Imho, this is not right. This changes the meaning of CPU_DEAD, and so
> > we should fix all users of CPU_DEAD as well.
> 
> Why should we fix all users? Only users who were doing a kthread_stop()
> in CPU_DEAD need to be fixed. From my count, only 5 users (out of a
> total of 35) need to be fixed to not do kthread_stop in CPU_DEAD.

But still we need to fix or at least check them.

> > How about
> > 
> > 	CPU_DEAD_WHATEVER
> > 		the processes are still frozen
> > 
> > 	CPU_DEAD
> > 		after we thaw_processes
> > 
> > This way we can add processing of the new CPU_DEAD_WHATEVER event where
> > it may help. 
> 
> Well, -most- of the work needs to be done in a state when processes are
> frozen. The only exception is cleaning up of per-cpu threads (which is
> not possible with processes frozen - if we can find a way to make that
> possible, then everything can be done in CPU_DEAD).
> 
> If we go by the change suggested above, then we need to fix all users of
> CPU_DEAD

Sorry, I can't understand you.

This patch adds the new state, why should we fix all users of CPU_DEAD
if they were correct? CPU_DEAD retains its old meaning, all users should
work as before?

>           to do what they are doing in CPU_DEAD_WHATEVER (when processes
> are frozen).

We don't have such users, because we don't have CPU_DEAD_WHATEVER yet.

IOW: I think this new state should have a new name, and CPU_DEAD should continue
to be called as the last step. Then we can teach cpu callbacks to take
advantage of CPU_DEAD_WHATEVER, and we can do this in separate patches.

No?

> > CPU_UP_PREPARE is called after freeze_processes()... Probably this works,
> > but imho this is no good. Suppose for a moment that khelper will be frozen
> > (yes, yes it can't be), then we can't do kthread_create().
> 
> Yes, I am worried about doing so many things with processes frozen.
> Maybe time (and more testing) will tell us if this is a bad thing or
> not. The only dependency I have found so far is that the kthread workqueue needs to
> be up (and hence its worker thread needs to be exempted from hotplug
> freeze). We should mark the kthread workqueue accordingly as not freezable
> for hotplug.

Yes, this is what I was talking about.

Oleg.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH(Experimental) 2/4] Revert changes to workqueue.c
  2007-02-16 15:33         ` Oleg Nesterov
@ 2007-02-16 16:47           ` Srivatsa Vaddagiri
  2007-02-16 18:45             ` Oleg Nesterov
  2007-02-16 23:59             ` Oleg Nesterov
  0 siblings, 2 replies; 92+ messages in thread
From: Srivatsa Vaddagiri @ 2007-02-16 16:47 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Gautham R Shenoy, akpm, paulmck, mingo, dipankar,
	venkatesh.pallipadi, linux-kernel, rjw

On Fri, Feb 16, 2007 at 06:33:21PM +0300, Oleg Nesterov wrote:
> I take back my words. It is not "ugly" any longer because with this change
> we don't do kthread_stop()->wake_up_process() while cwq->thread may sleep in
> work->func(). Still I don't see (ok, I am biased and probably wrong, please
> correct me) why kthread_stop+wait_to_die is better than cwq_should_stop(),
> see below.

I just like using existing code (kthread_stop) as much as possible rather than
adding new code (->should_stop). Also the 'while (cwq->thread != NULL)' loop in
cleanup_workqueue_thread is not neat IMHO, compared to kthread_stop+wait_to_die.

Please compare cleanup_workqueue_thread() in 2.6.20-mm1 with what is
proposed in the patch:

2.6.20-mm1 (cwq->should_stop)
=============================

static void cleanup_workqueue_thread(struct cpu_workqueue_struct *cwq, int cpu)
{
        struct wq_barrier barr;
        int alive = 0;

        spin_lock_irq(&cwq->lock);
        if (cwq->thread != NULL) {
                insert_wq_barrier(cwq, &barr, 1);
                cwq->should_stop = 1;
                alive = 1;
        }
        spin_unlock_irq(&cwq->lock);

        if (alive) {
                wait_for_completion(&barr.done);

                while (unlikely(cwq->thread != NULL))
                        cpu_relax();
                /*
                 * Wait until cwq->thread unlocks cwq->lock,
                 * it won't touch *cwq after that.
                 */
                smp_rmb();
                spin_unlock_wait(&cwq->lock);
        }
}



Patch (based on kthread_should_stop)
====================================

static void cleanup_workqueue_thread(struct workqueue_struct *wq, int cpu)
{
        struct cpu_workqueue_struct *cwq = per_cpu_ptr(wq->cpu_wq, cpu);

        if (cwq->thread != NULL) {
                kthread_stop(cwq->thread);
                cwq->thread = NULL;
        }
}


The version using kthread_should_stop() is simpler IMO.
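For comparison, the handshake in the 2.6.20-mm1 version (set should_stop, then spin until the worker clears cwq->thread) can be modelled in userspace with C11 atomics. This is a simplification under stated assumptions: there is no work barrier, and because the toy worker's NULL store is its very last access to *cwq, no equivalent of spin_unlock_wait() is needed. All names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU workqueue state. */
struct toy_cwq {
	atomic_bool should_stop;
	_Atomic(pthread_t *) thread;	/* non-NULL while the worker lives */
	pthread_t tid;
	int exits;			/* written by the worker before it leaves */
};

static void *toy_cwq_thread(void *arg)
{
	struct toy_cwq *cwq = arg;

	while (!atomic_load_explicit(&cwq->should_stop, memory_order_acquire))
		sched_yield();	/* a real worker would sleep and run works */
	cwq->exits++;
	/* Last access to *cwq: publish "gone" with release ordering. */
	atomic_store_explicit(&cwq->thread, NULL, memory_order_release);
	return NULL;
}

static int toy_start(struct toy_cwq *cwq)
{
	int rc;

	atomic_init(&cwq->should_stop, false);
	atomic_init(&cwq->thread, &cwq->tid);
	cwq->exits = 0;
	rc = pthread_create(&cwq->tid, NULL, toy_cwq_thread, cwq);
	if (rc == 0)
		pthread_detach(cwq->tid);	/* no join: the flag is the handshake */
	else
		atomic_store(&cwq->thread, NULL);
	return rc;
}

static void toy_cleanup(struct toy_cwq *cwq)
{
	assert(cwq != NULL);
	atomic_store_explicit(&cwq->should_stop, true, memory_order_release);
	/* like: while (unlikely(cwq->thread != NULL)) cpu_relax(); */
	while (atomic_load_explicit(&cwq->thread, memory_order_acquire) != NULL)
		sched_yield();
	/* The worker no longer touches *cwq, so the caller may reuse it. */
}
```

The release store in the worker paired with the acquire load in toy_cleanup() is what lets the caller reuse *cwq safely; kthread_stop() hides the equivalent synchronization behind a single call.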


> > I feel it is nice if the cleanup is synchronous i.e when cpu_down() is
> > complete, all the dead cpu's worker threads would have terminated.
> > Otherwise we expose races between CPU_UP_PREPARE/kthread_create and the
> > (old) thread exiting.
> 
> Please look at 2.6.20-mm1, cleanup is synchronous. Probably we misunderstood
> each other looking at different code.

Ok, I hadn't looked at 2.6.20-mm1 (it wasn't out when we posted the
patch). Nevertheless, I think most of our intended changes would apply to
2.6.20-mm1 as well. We will post a new version (breaking down workqueue changes
as you want) against 2.6.20-mm1.

> > How abt retaining the break above but setting cwq->thread = NULL in
> > create_workqueue_thread in failure case?
> 
> Perhaps do it, but why? The failure should be rare, and it is a bit
> dangerous to have workqueue_struct which was not properly initialized.
> Suppose we change CPU_UP_PREPARE so it is called before freeze_processes()
> stage, then we have a problem.

Ok, no problem. Will not add the 'break' there!

> Srivatsa, don't get me wrong. I can't judge the use of the freezer for cpu hotplug,
> but yes, we can improve workqueue.c in this case! But these changes should be
> small and understandable. When cpu hotplug is converted, we don't need _any_
> changes in workqueue.c, it should work (except s/CPU_DEAD/CPU_DEAD_KILL_THREADS
> if you insist). 

Note that with the proposed change to refrigerator(), we can avoid
CPU_DEAD_KILL_THREADS and do all the cleanup in CPU_DEAD itself.

> 		No more changes are required, cwq_should_stop() just works
> 		because it is more flexible than kthread_should_stop().

What is more flexible about cwq_should_stop()?


-- 
Regards,
vatsa


* Re: [RFC PATCH(Experimental) 2/4] Revert changes to workqueue.c
  2007-02-16 16:47           ` Srivatsa Vaddagiri
@ 2007-02-16 18:45             ` Oleg Nesterov
  2007-02-16 23:59             ` Oleg Nesterov
  1 sibling, 0 replies; 92+ messages in thread
From: Oleg Nesterov @ 2007-02-16 18:45 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Gautham R Shenoy, akpm, paulmck, mingo, dipankar,
	venkatesh.pallipadi, linux-kernel, rjw

On 02/16, Srivatsa Vaddagiri wrote:
>
> 2.6.20-mm1 (cwq->should_stop)
> =============================
> 
> static void cleanup_workqueue_thread(struct cpu_workqueue_struct *cwq, int cpu)
> {
>         struct wq_barrier barr;
>         int alive = 0;
> 
>         spin_lock_irq(&cwq->lock);
>         if (cwq->thread != NULL) {
>                 insert_wq_barrier(cwq, &barr, 1);
>                 cwq->should_stop = 1;
>                 alive = 1;
>         }
>         spin_unlock_irq(&cwq->lock);
> 
>         if (alive) {
>                 wait_for_completion(&barr.done);
> 
>                 while (unlikely(cwq->thread != NULL))
>                         cpu_relax();
>                 /*
>                  * Wait until cwq->thread unlocks cwq->lock,
>                  * it won't touch *cwq after that.
>                  */
>                 smp_rmb();
>                 spin_unlock_wait(&cwq->lock);
>         }
> }
> 
> Patch (based on kthread_should_stop)
> ====================================
> 
> static void cleanup_workqueue_thread(struct workqueue_struct *wq, int cpu)
> {
>         struct cpu_workqueue_struct *cwq = per_cpu_ptr(wq->cpu_wq, cpu);
> 
>         if (cwq->thread != NULL) {
>                 kthread_stop(cwq->thread);
>                 cwq->thread = NULL;
>         }
> }
> 
> > 		No more changes are required, cwq_should_stop() just works
> > 		because it is more flexible than kthread_should_stop().
> 
> What is more flexible abt cwq_should_stop()?

	- it doesn't use a global semaphore

	- it works with or without freezer

	- it works with or without take_over_work()

	- it doesn't require that we have no pending works when
	  cleanup_workqueue_thread() is called.

	- worker_thread() doesn't need to have 2 different conditions
	  to exit in 2 different ways.

	- it allows further improvements (not taking the workqueue
	  mutex for the whole cpu-hotplug event), but this needs more work
	  and is probably no longer valid if we use the freezer.
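The fourth point can be illustrated with a minimal single-threaded model. It assumes that cwq_should_stop() only lets the worker exit once the stop flag is set and the worklist is empty. The function body is not quoted in this thread, so this is an approximation. struct model_cwq and its helpers are hypothetical, and the mutex only mirrors the "re-check under cwq->lock" step.

```c
/*
 * Single-threaded model of the cwq_should_stop() idea. Names are
 * hypothetical; the mutex only mirrors the "re-check under cwq->lock" step.
 */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct model_cwq {
	pthread_mutex_t lock;
	bool should_stop;   /* set by the cleanup path */
	int pending;        /* items still on the worklist */
	bool thread_alive;  /* models cwq->thread != NULL */
};

static void model_cwq_init(struct model_cwq *cwq, int pending)
{
	pthread_mutex_init(&cwq->lock, NULL);
	cwq->should_stop = false;
	cwq->pending = pending;
	cwq->thread_alive = true;
}

/* The stop request only "wins" once the worklist is empty. */
static bool model_cwq_should_stop(struct model_cwq *cwq)
{
	bool stop;

	pthread_mutex_lock(&cwq->lock);
	stop = cwq->should_stop && cwq->pending == 0;
	if (stop)
		cwq->thread_alive = false;  /* the worker clears its own pointer */
	pthread_mutex_unlock(&cwq->lock);
	return stop;
}

/*
 * Stand-in for worker_thread() after cleanup has set should_stop:
 * it drains whatever is still queued, then exits. Returns items run.
 */
static int model_worker_drain(struct model_cwq *cwq)
{
	int ran = 0;

	while (!model_cwq_should_stop(cwq)) {
		pthread_mutex_lock(&cwq->lock);
		assert(cwq->should_stop && cwq->pending > 0);
		cwq->pending--;
		pthread_mutex_unlock(&cwq->lock);
		ran++;
	}
	return ran;
}
```

With this shape, the cleanup path can request a stop while works are still queued. The worker finishes them first.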

Ok. This is a matter of taste. I will not argue if you send a patch
to convert the code to use kthread_stop() again (if it is correct :),
but let it be a separate change, please.

Oleg.



* Re: [RFC PATCH(Experimental) 1/4] freezer-cpu-hotplug core
  2007-02-16  8:12       ` Srivatsa Vaddagiri
  2007-02-16  9:29         ` Rafael J. Wysocki
@ 2007-02-16 19:46         ` Oleg Nesterov
  2007-02-17  2:31           ` Srivatsa Vaddagiri
  2007-02-17  5:32         ` Gautham R Shenoy
  2 siblings, 1 reply; 92+ messages in thread
From: Oleg Nesterov @ 2007-02-16 19:46 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Gautham R Shenoy, akpm, paulmck, mingo, dipankar,
	venkatesh.pallipadi, linux-kernel, rjw

On 02/16, Srivatsa Vaddagiri wrote:
>
> On Fri, Feb 16, 2007 at 12:46:17PM +0530, Srivatsa Vaddagiri wrote:
> > frozen. The only exception is cleaning up of per-cpu threads (which is
> > not possible with processes frozen - if we can find a way to make that
> > possible, then everything can be done in CPU_DEAD).
> 
> How abt a patch like below?
>
> --- process.c.org	2007-02-16 13:38:39.000000000 +0530
> +++ process.c	2007-02-16 13:38:59.000000000 +0530
> @@ -47,7 +47,7 @@ void refrigerator(void)
>  	recalc_sigpending(); /* We sent fake signal, clean it up */
>  	spin_unlock_irq(&current->sighand->siglock);
>  
> -	while (frozen(current)) {
> +	while (frozen(current) && !kthread_should_stop()) {
>  		current->state = TASK_UNINTERRUPTIBLE;
>  		schedule();
>  	}

Instead, we can just clear PF_FROZEN before kthread_should_stop().
I don't claim this is better, but this way we don't need to add a
subtle change to process.c.
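Here is a rough userspace model of the refrigerator() change quoted above. The frozen task leaves the freezer loop either when it is thawed or when a stop is requested, so the stopper can join it without thawing it first. The names and the pthread-based structure are assumptions made for illustration only.

```c
/*
 * Userspace model of the proposed refrigerator() loop condition
 * "frozen(current) && !kthread_should_stop()". Names are hypothetical.
 */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct model_task {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool frozen;       /* models PF_FROZEN */
	bool in_fridge;    /* set while inside the freezer loop */
	bool should_stop;  /* models kthread_should_stop() */
	bool exited;
};

static void model_refrigerator(struct model_task *t)
{
	/* caller holds t->lock */
	t->in_fridge = true;
	pthread_cond_broadcast(&t->cond);         /* tell the freezer we are parked */
	while (t->frozen && !t->should_stop)      /* the proposed condition */
		pthread_cond_wait(&t->cond, &t->lock);
	t->in_fridge = false;
}

static void *model_kthread(void *arg)
{
	struct model_task *t = arg;

	pthread_mutex_lock(&t->lock);
	while (!t->should_stop) {
		if (t->frozen)
			model_refrigerator(t);            /* models try_to_freeze() */
		else
			pthread_cond_wait(&t->cond, &t->lock);
	}
	t->exited = true;
	pthread_mutex_unlock(&t->lock);
	return NULL;
}

/* Freeze the task, wait until it is parked, then stop it while frozen. */
static bool model_stop_frozen_task(void)
{
	struct model_task t = { .frozen = true };
	pthread_t tid;

	pthread_mutex_init(&t.lock, NULL);
	pthread_cond_init(&t.cond, NULL);
	if (pthread_create(&tid, NULL, model_kthread, &t) != 0)
		return false;

	pthread_mutex_lock(&t.lock);
	while (!t.in_fridge)                      /* models freeze_processes() */
		pthread_cond_wait(&t.cond, &t.lock);
	t.should_stop = true;                     /* models kthread_stop() ... */
	pthread_cond_broadcast(&t.cond);
	pthread_mutex_unlock(&t.lock);
	pthread_join(tid, NULL);                  /* ... which waits for the exit */

	assert(t.frozen);   /* never thawed: the stop check alone released it */
	pthread_mutex_destroy(&t.lock);
	pthread_cond_destroy(&t.cond);
	return t.exited;
}
```

Oleg's alternative corresponds to having the stopper clear the frozen flag before requesting the stop, leaving the loop condition untouched.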

> This should let us do kthread_stop() in CPU_DEAD itself (while processes
> are frozen)? That would allow us to do everything from CPU_DEAD itself
> (and not have CPU_DEAD_KILL_THREADS).

... and probably avoid many races, good.

Oleg.



* Re: [RFC PATCH(Experimental) 2/4] Revert changes to workqueue.c
  2007-02-16 16:47           ` Srivatsa Vaddagiri
  2007-02-16 18:45             ` Oleg Nesterov
@ 2007-02-16 23:59             ` Oleg Nesterov
  2007-02-17  2:29               ` Srivatsa Vaddagiri
  1 sibling, 1 reply; 92+ messages in thread
From: Oleg Nesterov @ 2007-02-16 23:59 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Gautham R Shenoy, akpm, paulmck, mingo, dipankar,
	venkatesh.pallipadi, linux-kernel, rjw

On 02/16, Srivatsa Vaddagiri wrote:
>
> Note with the change proposed in refrigerator, we can avoid
> CPU_DEAD_KILL_THREADS and do all cleanup in CPU_DEAD itself.

In that case (all processes are frozen when workqueue_cpu_callback()
calls cleanup_workqueue_thread()) I agree, it is better to just use
kthread_stop/kthread_should_stop.

This also means it probably won't be convenient to make this change
"by small steps" as I suggested before.

Oleg.



* Re: [RFC PATCH(Experimental) 2/4] Revert changes to workqueue.c
  2007-02-16 23:59             ` Oleg Nesterov
@ 2007-02-17  2:29               ` Srivatsa Vaddagiri
  2007-02-17 21:59                 ` Oleg Nesterov
  0 siblings, 1 reply; 92+ messages in thread
From: Srivatsa Vaddagiri @ 2007-02-17  2:29 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Gautham R Shenoy, akpm, paulmck, mingo, dipankar,
	venkatesh.pallipadi, linux-kernel, rjw

On Sat, Feb 17, 2007 at 02:59:39AM +0300, Oleg Nesterov wrote:
> In that case (all processes are frozen when workqueue_cpu_callback()
> calls cleanup_workqueue_thread()) I agree, it is better to just use
> kthread_stop/kthread_should_stop.

Great, thanks!

> This also means that probably it won't be convenient to do this change
> "by small steps" as I suggested before.

Yeah, that's what I thought. We will try to split it as far as
possible in the next iteration.

Thanks for your feedback/review!

-- 
Regards,
vatsa


* Re: [RFC PATCH(Experimental) 1/4] freezer-cpu-hotplug core
  2007-02-16 19:46         ` Oleg Nesterov
@ 2007-02-17  2:31           ` Srivatsa Vaddagiri
  0 siblings, 0 replies; 92+ messages in thread
From: Srivatsa Vaddagiri @ 2007-02-17  2:31 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Gautham R Shenoy, akpm, paulmck, mingo, dipankar,
	venkatesh.pallipadi, linux-kernel, rjw

On Fri, Feb 16, 2007 at 10:46:05PM +0300, Oleg Nesterov wrote:
> Instead, we can just clear PF_FROZEN before kthread_should_stop().

That should work too. Thanks!

> I don't claim this is better, but this way we don't need to add a
> subtle change to process.c.

-- 
Regards,
vatsa


* Re: [RFC PATCH(Experimental) 1/4] freezer-cpu-hotplug core
  2007-02-16  8:12       ` Srivatsa Vaddagiri
  2007-02-16  9:29         ` Rafael J. Wysocki
  2007-02-16 19:46         ` Oleg Nesterov
@ 2007-02-17  5:32         ` Gautham R Shenoy
  2007-02-17 11:19           ` Gautham R Shenoy
  2 siblings, 1 reply; 92+ messages in thread
From: Gautham R Shenoy @ 2007-02-17  5:32 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Oleg Nesterov, akpm, paulmck, mingo, dipankar,
	venkatesh.pallipadi, linux-kernel, rjw

On Fri, Feb 16, 2007 at 01:42:09PM +0530, Srivatsa Vaddagiri wrote:
> On Fri, Feb 16, 2007 at 12:46:17PM +0530, Srivatsa Vaddagiri wrote:
> > frozen. The only exception is cleaning up of per-cpu threads (which is
> > not possible with processes frozen - if we can find a way to make that
> > possible, then everything can be done in CPU_DEAD).
> 
> How abt a patch like below?
> 
> 
> --- process.c.org	2007-02-16 13:38:39.000000000 +0530
> +++ process.c	2007-02-16 13:38:59.000000000 +0530
> @@ -47,7 +47,7 @@ void refrigerator(void)
>  	recalc_sigpending(); /* We sent fake signal, clean it up */
>  	spin_unlock_irq(&current->sighand->siglock);
> 
> -	while (frozen(current)) {
> +	while (frozen(current) && !kthread_should_stop()) {
>  		current->state = TASK_UNINTERRUPTIBLE;
>  		schedule();
>  	}

This looks OK, but we could probably do it in a better way.
How about an API to thaw only a specific task, something like
thaw_process(struct task_struct *p)?

That way, the CPU_DEAD handler which wants to kthread_stop a thread
can selectively thaw the thread before it does kthread_stop.

Rafael, does this have any negative impact on the freezer design?

> This should let us do kthread_stop() in CPU_DEAD itself (while processes
> are frozen)? That would allow us to do everything from CPU_DEAD itself
> (and not have CPU_DEAD_KILL_THREADS).
> 
> 
> -- 
> Regards,
> vatsa

thanks
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"


* Re: [RFC PATCH(Experimental) 1/4] freezer-cpu-hotplug core
  2007-02-17  5:32         ` Gautham R Shenoy
@ 2007-02-17 11:19           ` Gautham R Shenoy
  0 siblings, 0 replies; 92+ messages in thread
From: Gautham R Shenoy @ 2007-02-17 11:19 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Oleg Nesterov, akpm, paulmck, mingo, dipankar,
	venkatesh.pallipadi, linux-kernel, rjw

On Sat, Feb 17, 2007 at 11:02:33AM +0530, Gautham R Shenoy wrote:

> This looks ok, but probably we could do it in a better way.
> How about an api to thaw only a specific task something like
> thaw_process(struct task_struct p). 

I see that thaw_process() already exists in freezer.h! Awesome!!
So let's make use of it instead of adding the kthread_should_stop()
check to the refrigerator :)
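The thaw-then-stop sequence can be sketched the same way. In this hypothetical userspace model the freezer loop is left unmodified and only waits for the frozen flag to clear. model_thaw_process() releases just the one task, after which model_kthread_stop() can join it as usual. Only thaw_process() and kthread_stop() are real kernel names. Everything below is a pthread stand-in.

```c
/*
 * Userspace model of "thaw one task, then kthread_stop() it" in a
 * CPU_DEAD-style handler. All names are hypothetical stand-ins.
 */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct model_task {
	pthread_t tid;
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool frozen;       /* models PF_FROZEN */
	bool in_fridge;
	bool should_stop;  /* models kthread_should_stop() */
	bool exited;
};

static void *model_kthread(void *arg)
{
	struct model_task *t = arg;

	pthread_mutex_lock(&t->lock);
	while (!t->should_stop) {
		if (t->frozen) {
			/* Unmodified refrigerator loop: waits for a thaw only. */
			t->in_fridge = true;
			pthread_cond_broadcast(&t->cond);
			while (t->frozen)
				pthread_cond_wait(&t->cond, &t->lock);
			t->in_fridge = false;
		} else {
			pthread_cond_wait(&t->cond, &t->lock);
		}
	}
	t->exited = true;
	pthread_mutex_unlock(&t->lock);
	return NULL;
}

/* Models thaw_process(): thaws exactly this one task. */
static void model_thaw_process(struct model_task *t)
{
	pthread_mutex_lock(&t->lock);
	t->frozen = false;
	pthread_cond_broadcast(&t->cond);
	pthread_mutex_unlock(&t->lock);
}

/* Models kthread_stop(): request the stop and wait for the exit. */
static void model_kthread_stop(struct model_task *t)
{
	pthread_mutex_lock(&t->lock);
	t->should_stop = true;
	pthread_cond_broadcast(&t->cond);
	pthread_mutex_unlock(&t->lock);
	pthread_join(t->tid, NULL);
}

/* The task starts frozen; thaw it, then stop it. */
static bool model_cpu_dead_cleanup(void)
{
	struct model_task t = { .frozen = true };

	pthread_mutex_init(&t.lock, NULL);
	pthread_cond_init(&t.cond, NULL);
	if (pthread_create(&t.tid, NULL, model_kthread, &t) != 0)
		return false;

	pthread_mutex_lock(&t.lock);
	while (!t.in_fridge)              /* wait until it is really parked */
		pthread_cond_wait(&t.cond, &t.lock);
	pthread_mutex_unlock(&t.lock);

	model_thaw_process(&t);
	model_kthread_stop(&t);
	assert(!t.frozen);

	pthread_mutex_destroy(&t.lock);
	pthread_cond_destroy(&t.cond);
	return t.exited;
}
```

The design choice here is that the freezer itself stays unaware of kthread stop requests. Only the CPU_DEAD path needs to know that it must thaw the thread first.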

thanks and regards
gautham.

-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"


* Re: [RFC PATCH(Experimental) 0/4] Freezer based Cpu-hotplug
  2007-02-15 14:25           ` Gautham R Shenoy
@ 2007-02-17 11:24             ` Rafael J. Wysocki
  2007-02-17 21:34               ` Oleg Nesterov
                                 ` (2 more replies)
  0 siblings, 3 replies; 92+ messages in thread
From: Rafael J. Wysocki @ 2007-02-17 11:24 UTC (permalink / raw)
  To: ego
  Cc: akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, oleg, Pavel Machek

On Thursday, 15 February 2007 15:25, Gautham R Shenoy wrote:
> On Thu, Feb 15, 2007 at 02:31:08PM +0100, Rafael J. Wysocki wrote:
> > 
> > So I think tonight I'll start adding try_to_freeze() to the kernel threads that
> > set PF_NOFREEZE.
> 
> cool! While you are at it, let me try to enhance the freezer api's
> to incorporate the PFE_* flags.

Here's a patch that adds try_to_freeze() to all kernel threads that didn't call
it before.  It shouldn't change the behavior of the threads in question, since
they won't be frozen as long as they are flagged PF_NOFREEZE (of course,
we are going to change this later).
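Why the added calls are harmless for now: below is a minimal single-threaded model. It assumes that the freezer never sets a freeze request on a PF_NOFREEZE task, so try_to_freeze() falls straight through. The flag names and helpers are hypothetical stand-ins, not the kernel's definitions.

```c
/*
 * Single-threaded model of try_to_freeze() in a PF_NOFREEZE thread.
 * Flag names are hypothetical stand-ins, not the kernel's definitions.
 */
#include <assert.h>
#include <stdbool.h>

#define MODEL_PF_NOFREEZE 0x1u
#define MODEL_PF_FROZEN   0x2u

struct model_task {
	unsigned int flags;
	bool freeze_requested;   /* models TIF_FREEZE */
};

/* Freezer side: exempt tasks never get a freeze request. */
static void model_freeze_process(struct model_task *t)
{
	if (!(t->flags & MODEL_PF_NOFREEZE))
		t->freeze_requested = true;
}

/* Thread side: models try_to_freeze(); returns true if the task froze. */
static bool model_try_to_freeze(struct model_task *t)
{
	if (!t->freeze_requested)
		return false;                     /* the common, no-op case */
	assert(!(t->flags & MODEL_PF_NOFREEZE));
	t->freeze_requested = false;
	t->flags |= MODEL_PF_FROZEN;          /* real code sleeps in refrigerator() */
	return true;
}
```

Once PF_NOFREEZE is dropped from a thread, the same call site becomes the point where that thread actually freezes.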

Compile-tested on x86_64 with allmodconfig.

Pavel, do you think we can remove the PF_NOFREEZE from bluetooth, BTW?

 arch/i386/kernel/apm.c              |    2 ++
 drivers/block/loop.c                |    2 ++
 drivers/char/apm-emulation.c        |    3 +++
 drivers/ieee1394/ieee1394_core.c    |    3 +++
 drivers/md/md.c                     |    2 ++
 drivers/mmc/mmc_queue.c             |    3 +++
 drivers/mtd/mtd_blkdevs.c           |    3 +++
 drivers/scsi/libsas/sas_scsi_host.c |    3 +++
 drivers/scsi/scsi_error.c           |    3 +++
 drivers/usb/storage/usb.c           |    2 ++
 kernel/rcutorture.c                 |    2 ++
 kernel/softirq.c                    |    2 ++
 kernel/softlockup.c                 |    2 ++
 kernel/workqueue.c                  |    3 +--
 net/bluetooth/bnep/core.c           |    5 ++++-
 net/bluetooth/cmtp/core.c           |    3 +++
 net/bluetooth/hidp/core.c           |    3 +++
 net/bluetooth/rfcomm/core.c         |    3 +++
 18 files changed, 46 insertions(+), 3 deletions(-)

Index: linux-2.6.20-mm1/arch/i386/kernel/apm.c
===================================================================
--- linux-2.6.20-mm1.orig/arch/i386/kernel/apm.c	2007-02-17 00:43:19.000000000 +0100
+++ linux-2.6.20-mm1/arch/i386/kernel/apm.c	2007-02-17 00:43:31.000000000 +0100
@@ -227,6 +227,7 @@
 #include <linux/dmi.h>
 #include <linux/suspend.h>
 #include <linux/kthread.h>
+#include <linux/freezer.h>
 
 #include <asm/system.h>
 #include <asm/uaccess.h>
@@ -1402,6 +1403,7 @@ static void apm_mainloop(void)
 	add_wait_queue(&apm_waitqueue, &wait);
 	set_current_state(TASK_INTERRUPTIBLE);
 	for (;;) {
+		try_to_freeze();
 		schedule_timeout(APM_CHECK_TIMEOUT);
 		if (kthread_should_stop())
 			break;
Index: linux-2.6.20-mm1/drivers/md/md.c
===================================================================
--- linux-2.6.20-mm1.orig/drivers/md/md.c	2007-02-17 00:43:19.000000000 +0100
+++ linux-2.6.20-mm1/drivers/md/md.c	2007-02-17 00:43:31.000000000 +0100
@@ -4513,6 +4513,8 @@ static int md_thread(void * arg)
 			 || kthread_should_stop(),
 			 thread->timeout);
 
+		try_to_freeze();
+
 		clear_bit(THREAD_WAKEUP, &thread->flags);
 
 		thread->run(thread->mddev);
Index: linux-2.6.20-mm1/drivers/mmc/mmc_queue.c
===================================================================
--- linux-2.6.20-mm1.orig/drivers/mmc/mmc_queue.c	2007-02-17 00:43:19.000000000 +0100
+++ linux-2.6.20-mm1/drivers/mmc/mmc_queue.c	2007-02-17 00:43:31.000000000 +0100
@@ -11,6 +11,7 @@
 #include <linux/module.h>
 #include <linux/blkdev.h>
 #include <linux/kthread.h>
+#include <linux/freezer.h>
 
 #include <linux/mmc/card.h>
 #include <linux/mmc/host.h>
@@ -70,6 +71,8 @@ static int mmc_queue_thread(void *d)
 	do {
 		struct request *req = NULL;
 
+		try_to_freeze();
+
 		spin_lock_irq(q->queue_lock);
 		set_current_state(TASK_INTERRUPTIBLE);
 		req = elv_next_request(q);
Index: linux-2.6.20-mm1/drivers/mtd/mtd_blkdevs.c
===================================================================
--- linux-2.6.20-mm1.orig/drivers/mtd/mtd_blkdevs.c	2007-02-17 00:43:19.000000000 +0100
+++ linux-2.6.20-mm1/drivers/mtd/mtd_blkdevs.c	2007-02-17 00:43:31.000000000 +0100
@@ -20,6 +20,7 @@
 #include <linux/hdreg.h>
 #include <linux/init.h>
 #include <linux/mutex.h>
+#include <linux/freezer.h>
 #include <asm/uaccess.h>
 
 static LIST_HEAD(blktrans_majors);
@@ -113,6 +114,8 @@ static int mtd_blktrans_thread(void *arg
 			schedule();
 			remove_wait_queue(&tr->blkcore_priv->thread_wq, &wait);
 
+			try_to_freeze();
+
 			spin_lock_irq(rq->queue_lock);
 
 			continue;
Index: linux-2.6.20-mm1/drivers/usb/storage/usb.c
===================================================================
--- linux-2.6.20-mm1.orig/drivers/usb/storage/usb.c	2007-02-17 00:43:19.000000000 +0100
+++ linux-2.6.20-mm1/drivers/usb/storage/usb.c	2007-02-17 11:39:00.000000000 +0100
@@ -304,6 +304,8 @@ static int usb_stor_control_thread(void 
 	current->flags |= PF_NOFREEZE;
 
 	for(;;) {
+		try_to_freeze();
+
 		US_DEBUGP("*** thread sleeping.\n");
 		if(down_interruptible(&us->sema))
 			break;
Index: linux-2.6.20-mm1/drivers/ieee1394/ieee1394_core.c
===================================================================
--- linux-2.6.20-mm1.orig/drivers/ieee1394/ieee1394_core.c	2007-02-15 23:57:01.000000000 +0100
+++ linux-2.6.20-mm1/drivers/ieee1394/ieee1394_core.c	2007-02-17 00:43:31.000000000 +0100
@@ -35,6 +35,7 @@
 #include <linux/kthread.h>
 #include <linux/preempt.h>
 #include <linux/time.h>
+#include <linux/freezer.h>
 
 #include <asm/system.h>
 #include <asm/byteorder.h>
@@ -1081,6 +1082,8 @@ static int hpsbpkt_thread(void *__hi)
 			complete_routine(complete_data);
 		}
 
+		try_to_freeze();
+
 		set_current_state(TASK_INTERRUPTIBLE);
 		if (!skb_peek(&hpsbpkt_queue))
 			schedule();
Index: linux-2.6.20-mm1/drivers/char/apm-emulation.c
===================================================================
--- linux-2.6.20-mm1.orig/drivers/char/apm-emulation.c	2007-02-15 23:57:00.000000000 +0100
+++ linux-2.6.20-mm1/drivers/char/apm-emulation.c	2007-02-17 00:43:31.000000000 +0100
@@ -27,6 +27,7 @@
 #include <linux/completion.h>
 #include <linux/kthread.h>
 #include <linux/delay.h>
+#include <linux/freezer.h>
 
 #include <asm/system.h>
 
@@ -539,6 +540,8 @@ static int kapmd(void *arg)
 		apm_event_t event;
 		int ret;
 
+		try_to_freeze();
+
 		wait_event_interruptible(kapmd_wait,
 				!queue_empty(&kapmd_queue) || kthread_should_stop());
 
Index: linux-2.6.20-mm1/drivers/block/loop.c
===================================================================
--- linux-2.6.20-mm1.orig/drivers/block/loop.c	2007-02-15 23:57:00.000000000 +0100
+++ linux-2.6.20-mm1/drivers/block/loop.c	2007-02-17 00:43:31.000000000 +0100
@@ -74,6 +74,7 @@
 #include <linux/highmem.h>
 #include <linux/gfp.h>
 #include <linux/kthread.h>
+#include <linux/freezer.h>
 
 #include <asm/uaccess.h>
 
@@ -580,6 +581,7 @@ static int loop_thread(void *data)
 	set_user_nice(current, -20);
 
 	while (!kthread_should_stop() || lo->lo_bio) {
+		try_to_freeze();
 
 		wait_event_interruptible(lo->lo_event,
 				lo->lo_bio || kthread_should_stop());
Index: linux-2.6.20-mm1/drivers/scsi/libsas/sas_scsi_host.c
===================================================================
--- linux-2.6.20-mm1.orig/drivers/scsi/libsas/sas_scsi_host.c	2007-02-15 23:57:03.000000000 +0100
+++ linux-2.6.20-mm1/drivers/scsi/libsas/sas_scsi_host.c	2007-02-17 00:43:31.000000000 +0100
@@ -39,6 +39,7 @@
 #include <linux/err.h>
 #include <linux/blkdev.h>
 #include <linux/scatterlist.h>
+#include <linux/freezer.h>
 
 /* ---------- SCSI Host glue ---------- */
 
@@ -875,6 +876,8 @@ static int sas_queue_thread(void *_sas_h
 	complete(&queue_th_comp);
 
 	while (1) {
+		try_to_freeze();
+
 		down_interruptible(&core->queue_thread_sema);
 		sas_queue(sas_ha);
 		if (core->queue_thread_kill)
Index: linux-2.6.20-mm1/drivers/scsi/scsi_error.c
===================================================================
--- linux-2.6.20-mm1.orig/drivers/scsi/scsi_error.c	2007-02-15 23:57:04.000000000 +0100
+++ linux-2.6.20-mm1/drivers/scsi/scsi_error.c	2007-02-17 00:43:31.000000000 +0100
@@ -24,6 +24,7 @@
 #include <linux/interrupt.h>
 #include <linux/blkdev.h>
 #include <linux/delay.h>
+#include <linux/freezer.h>
 
 #include <scsi/scsi.h>
 #include <scsi/scsi_cmnd.h>
@@ -1536,6 +1537,8 @@ int scsi_error_handler(void *data)
 	 */
 	set_current_state(TASK_INTERRUPTIBLE);
 	while (!kthread_should_stop()) {
+		try_to_freeze();
+
 		if ((shost->host_failed == 0 && shost->host_eh_scheduled == 0) ||
 		    shost->host_failed != shost->host_busy) {
 			SCSI_LOG_ERROR_RECOVERY(1,
Index: linux-2.6.20-mm1/kernel/softlockup.c
===================================================================
--- linux-2.6.20-mm1.orig/kernel/softlockup.c	2007-02-04 19:44:54.000000000 +0100
+++ linux-2.6.20-mm1/kernel/softlockup.c	2007-02-17 00:43:31.000000000 +0100
@@ -13,6 +13,7 @@
 #include <linux/kthread.h>
 #include <linux/notifier.h>
 #include <linux/module.h>
+#include <linux/freezer.h>
 
 static DEFINE_SPINLOCK(print_lock);
 
@@ -93,6 +94,7 @@ static int watchdog(void * __bind_cpu)
 	 * debug-printout triggers in softlockup_tick().
 	 */
 	while (!kthread_should_stop()) {
+		try_to_freeze();
 		set_current_state(TASK_INTERRUPTIBLE);
 		touch_softlockup_watchdog();
 		schedule();
Index: linux-2.6.20-mm1/kernel/rcutorture.c
===================================================================
--- linux-2.6.20-mm1.orig/kernel/rcutorture.c	2007-02-15 23:57:12.000000000 +0100
+++ linux-2.6.20-mm1/kernel/rcutorture.c	2007-02-17 00:43:31.000000000 +0100
@@ -46,6 +46,7 @@
 #include <linux/byteorder/swabb.h>
 #include <linux/stat.h>
 #include <linux/srcu.h>
+#include <linux/freezer.h>
 
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Paul E. McKenney <paulmck@us.ibm.com> and "
@@ -588,6 +589,7 @@ rcu_torture_writer(void *arg)
 	current->flags |= PF_NOFREEZE;
 
 	do {
+		try_to_freeze();
 		schedule_timeout_uninterruptible(1);
 		if ((rp = rcu_torture_alloc()) == NULL)
 			continue;
Index: linux-2.6.20-mm1/kernel/softirq.c
===================================================================
--- linux-2.6.20-mm1.orig/kernel/softirq.c	2007-02-15 23:57:12.000000000 +0100
+++ linux-2.6.20-mm1/kernel/softirq.c	2007-02-17 00:43:31.000000000 +0100
@@ -18,6 +18,7 @@
 #include <linux/rcupdate.h>
 #include <linux/smp.h>
 #include <linux/tick.h>
+#include <linux/freezer.h>
 
 #include <asm/irq.h>
 /*
@@ -494,6 +495,7 @@ static int ksoftirqd(void * __bind_cpu)
 	set_current_state(TASK_INTERRUPTIBLE);
 
 	while (!kthread_should_stop()) {
+		try_to_freeze();
 		preempt_disable();
 		if (!local_softirq_pending()) {
 			preempt_enable_no_resched();
Index: linux-2.6.20-mm1/kernel/workqueue.c
===================================================================
--- linux-2.6.20-mm1.orig/kernel/workqueue.c	2007-02-15 23:57:12.000000000 +0100
+++ linux-2.6.20-mm1/kernel/workqueue.c	2007-02-17 00:49:10.000000000 +0100
@@ -315,8 +315,7 @@ static int worker_thread(void *__cwq)
 	do_sigaction(SIGCHLD, &sa, (struct k_sigaction *)0);
 
 	for (;;) {
-		if (cwq->wq->freezeable)
-			try_to_freeze();
+		try_to_freeze();
 
 		prepare_to_wait(&cwq->more_work, &wait, TASK_INTERRUPTIBLE);
 		if (!cwq->should_stop && list_empty(&cwq->worklist))
Index: linux-2.6.20-mm1/net/bluetooth/bnep/core.c
===================================================================
--- linux-2.6.20-mm1.orig/net/bluetooth/bnep/core.c	2007-02-15 23:57:12.000000000 +0100
+++ linux-2.6.20-mm1/net/bluetooth/bnep/core.c	2007-02-17 00:52:14.000000000 +0100
@@ -39,6 +39,7 @@
 #include <linux/errno.h>
 #include <linux/smp_lock.h>
 #include <linux/net.h>
+#include <linux/freezer.h>
 #include <net/sock.h>
 
 #include <linux/socket.h>
@@ -478,6 +479,8 @@ static int bnep_session(void *arg)
 	init_waitqueue_entry(&wait, current);
 	add_wait_queue(sk->sk_sleep, &wait);
 	while (!atomic_read(&s->killed)) {
+		try_to_freeze();
+
 		set_current_state(TASK_INTERRUPTIBLE);
 
 		// RX
Index: linux-2.6.20-mm1/net/bluetooth/cmtp/core.c
===================================================================
--- linux-2.6.20-mm1.orig/net/bluetooth/cmtp/core.c	2007-02-15 23:57:12.000000000 +0100
+++ linux-2.6.20-mm1/net/bluetooth/cmtp/core.c	2007-02-17 00:53:09.000000000 +0100
@@ -34,6 +34,7 @@
 #include <linux/ioctl.h>
 #include <linux/file.h>
 #include <linux/init.h>
+#include <linux/freezer.h>
 #include <net/sock.h>
 
 #include <linux/isdn/capilli.h>
@@ -292,6 +293,8 @@ static int cmtp_session(void *arg)
 	init_waitqueue_entry(&wait, current);
 	add_wait_queue(sk->sk_sleep, &wait);
 	while (!atomic_read(&session->terminate)) {
+		try_to_freeze();
+
 		set_current_state(TASK_INTERRUPTIBLE);
 
 		if (sk->sk_state != BT_CONNECTED)
Index: linux-2.6.20-mm1/net/bluetooth/hidp/core.c
===================================================================
--- linux-2.6.20-mm1.orig/net/bluetooth/hidp/core.c	2007-02-15 23:57:12.000000000 +0100
+++ linux-2.6.20-mm1/net/bluetooth/hidp/core.c	2007-02-17 00:53:54.000000000 +0100
@@ -35,6 +35,7 @@
 #include <linux/file.h>
 #include <linux/init.h>
 #include <linux/wait.h>
+#include <linux/freezer.h>
 #include <net/sock.h>
 
 #include <linux/input.h>
@@ -480,6 +481,8 @@ static int hidp_session(void *arg)
 	add_wait_queue(ctrl_sk->sk_sleep, &ctrl_wait);
 	add_wait_queue(intr_sk->sk_sleep, &intr_wait);
 	while (!atomic_read(&session->terminate)) {
+		try_to_freeze();
+
 		set_current_state(TASK_INTERRUPTIBLE);
 
 		if (ctrl_sk->sk_state != BT_CONNECTED || intr_sk->sk_state != BT_CONNECTED)
Index: linux-2.6.20-mm1/net/bluetooth/rfcomm/core.c
===================================================================
--- linux-2.6.20-mm1.orig/net/bluetooth/rfcomm/core.c	2007-02-15 23:57:12.000000000 +0100
+++ linux-2.6.20-mm1/net/bluetooth/rfcomm/core.c	2007-02-17 00:55:35.000000000 +0100
@@ -37,6 +37,7 @@
 #include <linux/device.h>
 #include <linux/net.h>
 #include <linux/mutex.h>
+#include <linux/freezer.h>
 
 #include <net/sock.h>
 #include <asm/uaccess.h>
@@ -1851,6 +1852,8 @@ static void rfcomm_worker(void)
 	BT_DBG("");
 
 	while (!atomic_read(&terminate)) {
+		try_to_freeze();
+
 		if (!test_bit(RFCOMM_SCHED_WAKEUP, &rfcomm_event)) {
 			/* No pending events. Let's sleep.
 			 * Incoming connections and data will wake us up. */


* Re: [RFC PATCH(Experimental) 0/4] Freezer based Cpu-hotplug
  2007-02-17 11:24             ` Rafael J. Wysocki
@ 2007-02-17 21:34               ` Oleg Nesterov
  2007-02-17 22:24                 ` Rafael J. Wysocki
  2007-02-18 12:56               ` Pavel Machek
  2007-02-21 14:52               ` Gautham R Shenoy
  2 siblings, 1 reply; 92+ messages in thread
From: Oleg Nesterov @ 2007-02-17 21:34 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: ego, akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, Pavel Machek

Rafael, I am trying to understand try_to_freeze_tasks(), and I have a
couple of questions.

	static inline int is_user_space(struct task_struct *p)
	{
		return p->mm && !(p->flags & PF_BORROWED_MM);
	}

This doesn't look right. First, an exiting task has ->mm == NULL after
do_exit()->exit_mm(). Probably not a problem. However, the PF_BORROWED_MM
check is racy without task_lock(), so we can get a false positive as
well. Is that OK? We could freeze aio_wq prematurely.


	try_to_freeze_tasks:

		do_each_thread(g, p) {

			if (p->state == TASK_TRACED && frozen(p->parent)) {

Why are we doing this check outside of "if (is_user_space(p))"?
Not a bug, of course, but it looks strange.

				cancel_freezing(p);
				continue;

Is it right? Shouldn't we increment "todo" counter?

			}
			if (is_user_space(p)) {
				if (!freeze_user_space)
					continue;

				/* Freeze the task unless there is a vfork
				 * completion pending
				 */
				if (!p->vfork_done)
					freeze_process(p);


Racy. do_fork(CLONE_VFORK) first does copy_process() which puts 'p' on
the task list and unlocks tasklist_lock. This means that 'p' is visible
to try_to_freeze_tasks(), and p->vfork_done == NULL. try_to_freeze_tasks()
sets TIF_FREEZE.

Now, do_fork() continues, sets ->vfork_done, p goes to user space, notices
the fake signal and goes to refrigerator while its parent is blocked on
"struct completion vfork". Freezing failed.

So, shouldn't we do

	if (p->vfork_done)
		cancel_freezing(p);

instead?

Thanks,

Oleg.



* Re: [RFC PATCH(Experimental) 2/4] Revert changes to workqueue.c
  2007-02-17  2:29               ` Srivatsa Vaddagiri
@ 2007-02-17 21:59                 ` Oleg Nesterov
  2007-02-20 15:12                   ` Srivatsa Vaddagiri
  0 siblings, 1 reply; 92+ messages in thread
From: Oleg Nesterov @ 2007-02-17 21:59 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Gautham R Shenoy, akpm, paulmck, mingo, dipankar,
	venkatesh.pallipadi, linux-kernel, rjw

On 02/17, Srivatsa Vaddagiri wrote:
>
> Yeah, thats what I thought. We will try to split it to the extent
> possible in the next iteration.

One thing before you begin: you are doing CPU_DOWN_PREPARE after freeze_processes().
That is not good. It makes it impossible to call flush_workqueue() at the
CPU_DOWN_PREPARE stage, and we have callers that do so.
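Why the ordering matters can be shown in a small userspace model. It assumes the workqueue thread is among the frozen tasks. In that case a flush issued after freezing cannot make progress until the thread is thawed. The names below are hypothetical. The flush uses a timeout only so the demonstration cannot hang; a real flush_workqueue() would simply block.

```c
/*
 * Userspace model: a flush issued while the worker is frozen cannot
 * complete. All names are hypothetical; this is not kernel code.
 */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

struct model_wq {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int pending;
	bool frozen;
	bool stop;
};

static void *model_worker(void *arg)
{
	struct model_wq *wq = arg;

	pthread_mutex_lock(&wq->lock);
	while (!wq->stop) {
		if (!wq->frozen && wq->pending > 0) {
			wq->pending--;
			pthread_cond_broadcast(&wq->cond);   /* progress for flushers */
		} else {
			pthread_cond_wait(&wq->cond, &wq->lock);
		}
	}
	pthread_mutex_unlock(&wq->lock);
	return NULL;
}

/* Models flush_workqueue(), but gives up after timeout_ms. */
static bool model_flush(struct model_wq *wq, long timeout_ms)
{
	struct timespec deadline;
	bool ok = true;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&wq->lock);
	while (wq->pending > 0) {
		if (pthread_cond_timedwait(&wq->cond, &wq->lock, &deadline) != 0) {
			ok = wq->pending == 0;   /* timed out (or failed) */
			break;
		}
	}
	pthread_mutex_unlock(&wq->lock);
	return ok;
}

static bool model_flush_demo(bool *while_frozen, bool *after_thaw)
{
	struct model_wq wq = { .pending = 1, .frozen = true };
	pthread_t tid;

	pthread_mutex_init(&wq.lock, NULL);
	pthread_cond_init(&wq.cond, NULL);
	if (pthread_create(&tid, NULL, model_worker, &wq) != 0)
		return false;

	*while_frozen = model_flush(&wq, 100);   /* the freezer already ran */

	pthread_mutex_lock(&wq.lock);            /* thaw the worker */
	wq.frozen = false;
	pthread_cond_broadcast(&wq.cond);
	pthread_mutex_unlock(&wq.lock);

	*after_thaw = model_flush(&wq, 2000);

	pthread_mutex_lock(&wq.lock);            /* shut the worker down */
	wq.stop = true;
	pthread_cond_broadcast(&wq.cond);
	pthread_mutex_unlock(&wq.lock);
	pthread_join(tid, NULL);

	assert(!*after_thaw || wq.pending == 0);
	pthread_mutex_destroy(&wq.lock);
	pthread_cond_destroy(&wq.cond);
	return true;
}
```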

I'm afraid it won't be so easy to solve all locking/racing problems. Will
wait for the patch :)

Oleg.



* Re: [RFC PATCH(Experimental) 0/4] Freezer based Cpu-hotplug
  2007-02-17 21:34               ` Oleg Nesterov
@ 2007-02-17 22:24                 ` Rafael J. Wysocki
  2007-02-17 23:42                   ` Oleg Nesterov
  0 siblings, 1 reply; 92+ messages in thread
From: Rafael J. Wysocki @ 2007-02-17 22:24 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: ego, akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, Pavel Machek

On Saturday, 17 February 2007 22:34, Oleg Nesterov wrote:
> Rafael, I am trying to understand try_to_freeze_tasks(), and I have a
> couple of questions.
> 
> 	static inline int is_user_space(struct task_struct *p)
> 	{
> 		return p->mm && !(p->flags & PF_BORROWED_MM);
> 	}
> 
> This doesn't look right. First, an exiting task has ->mm == NULL after
> do_exit()->exit_mm(). Probably not a problem. However, PF_BORROWED_MM
> check is racy without task_lock(), so we can have a false positive as
> well. Is it ok? We can freeze aio_wq prematurely.

Right now aio_wq is not freezeable (PF_NOFREEZE).

> 
> 
> 	try_to_freeze_tasks:
> 
> 		do_each_thread(g, p) {
> 
> 			if (p->state == TASK_TRACED && frozen(p->parent)) {
> 
> Why we are doing this check outside of "if (is_user_space(p))" ?
> Not a bug of course, but looks strange.

For no particular reason.

> 
> 				cancel_freezing(p);
> 				continue;
> 
> Is it right? Shouldn't we increment "todo" counter?

No.  It would be wrong to do that, because TASK_TRACED tasks with frozen
parents cannot be frozen any further.

> 
> 			}
> 			if (is_user_space(p)) {
> 				if (!freeze_user_space)
> 					continue;
> 
> 				/* Freeze the task unless there is a vfork
> 				 * completion pending
> 				 */
> 				if (!p->vfork_done)
> 					freeze_process(p);
> 
> 
> Racy. do_fork(CLONE_VFORK) first does copy_process() which puts 'p' on
> the task list and unlocks tasklist_lock. This means that 'p' is visible
> to try_to_freeze_tasks(), and p->vfork_done == NULL. try_to_freeze_tasks()
> sets TIF_FREEZE.
> 
> Now, do_fork() continues, sets ->vfork_done, p goes to user space, notices
> the fake signal and goes to refrigerator while its parent is blocked on
> "struct completion vfork". Freezing failed.

You are right, but this has never happened, AFAICS.

> So, shouldn't we do
> 
> 	if (p->vfork_done)
> 		cancel_freezing(p);
> 
> instead?

I don't think so.  If p hasn't got TIF_FREEZE set yet or it has already been
frozen, cancel_freezing(p) is a noop.

Alternatively, we can move the check into refrigerator(), like this:

---
 kernel/power/process.c |   21 +++++++--------------
 1 file changed, 7 insertions(+), 14 deletions(-)

Index: linux-2.6.20-git13/kernel/power/process.c
===================================================================
--- linux-2.6.20-git13.orig/kernel/power/process.c
+++ linux-2.6.20-git13/kernel/power/process.c
@@ -39,6 +39,11 @@ void refrigerator(void)
 	/* Hmm, should we be allowed to suspend when there are realtime
 	   processes around? */
 	long save;
+
+	/* Freeze the task unless there is a vfork completion pending */
+	if (current->vfork_done)
+		return;
+
 	save = current->state;
 	pr_debug("%s entered refrigerator\n", current->comm);
 
@@ -112,22 +117,10 @@ static unsigned int try_to_freeze_tasks(
 				cancel_freezing(p);
 				continue;
 			}
-			if (is_user_space(p)) {
-				if (!freeze_user_space)
-					continue;
-
-				/* Freeze the task unless there is a vfork
-				 * completion pending
-				 */
-				if (!p->vfork_done)
-					freeze_process(p);
-			} else {
-				if (freeze_user_space)
-					continue;
-
+			if (is_user_space(p) == !!freeze_user_space) {
 				freeze_process(p);
+				todo++;
 			}
-			todo++;
 		} while_each_thread(g, p);
 		read_unlock(&tasklist_lock);
 		yield();			/* Yield is okay here */
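
The interleaving Oleg describes, and the effect of moving the vfork_done check into refrigerator(), can be replayed deterministically. This is a hypothetical single-threaded model, not the kernel's data structures. new_scheme selects the patch's behaviour: always request freezing and re-check vfork_done on entry to the freezer. The model deliberately ignores what happens to the freeze request after the early return.

```c
/*
 * Deterministic replay of the CLONE_VFORK interleaving.
 * Everything here is a hypothetical model, not kernel code.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_child {
	void *vfork_done;        /* non-NULL once do_fork() has set it */
	bool freeze_requested;   /* models TIF_FREEZE */
	bool frozen;
};

/* Freezer scan. Old scheme: skip tasks whose vfork_done is already set. */
static void model_scan(struct model_child *c, bool new_scheme)
{
	if (new_scheme || c->vfork_done == NULL)
		c->freeze_requested = true;
}

/* Task side. New scheme: re-check vfork_done on entry, as in the patch. */
static void model_refrigerator(struct model_child *c, bool new_scheme)
{
	if (new_scheme && c->vfork_done != NULL)
		return;
	c->frozen = true;
}

/* Returns true in the bad case: child frozen while its parent still waits. */
static bool model_replay_vfork_race(bool new_scheme)
{
	static int parent_completion;   /* stands in for "struct completion vfork" */
	struct model_child c = { NULL, false, false };

	/* 1. copy_process() made the child visible; vfork_done is still NULL. */
	model_scan(&c, new_scheme);
	assert(c.freeze_requested);     /* both schemes request freezing here */

	/* 2. do_fork() continues and sets ->vfork_done. */
	c.vfork_done = &parent_completion;

	/* 3. The child sees the fake signal and enters the freezer. */
	if (c.freeze_requested)
		model_refrigerator(&c, new_scheme);

	return c.frozen && c.vfork_done != NULL;
}
```

The old scan-time check misses the window between steps 1 and 2. The check at refrigerator entry sees the final value of vfork_done.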




* Re: [RFC PATCH(Experimental) 0/4] Freezer based Cpu-hotplug
  2007-02-17 22:24                 ` Rafael J. Wysocki
@ 2007-02-17 23:42                   ` Oleg Nesterov
  2007-02-17 23:47                     ` Oleg Nesterov
  2007-02-18 10:32                     ` Rafael J. Wysocki
  0 siblings, 2 replies; 92+ messages in thread
From: Oleg Nesterov @ 2007-02-17 23:42 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: ego, akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, Pavel Machek

On 02/17, Rafael J. Wysocki wrote:
>
> On Saturday, 17 February 2007 22:34, Oleg Nesterov wrote:
> > 
> > 	static inline int is_user_space(struct task_struct *p)
> > 	{
> > 		return p->mm && !(p->flags & PF_BORROWED_MM);
> > 	}
> > 
> > This doesn't look right. First, an exiting task has ->mm == NULL after
> > do_exit()->exit_mm(). Probably not a problem. However, PF_BORROWED_MM
> > check is racy without task_lock(), so we can have a false positive as
> > well. Is it ok? We can freeze aio_wq prematurely.
> 
> Right now aio_wq is not freezeable (PF_NOFREEZE).

Right now yes, but we are going to change this?

> > 				cancel_freezing(p);
> > 				continue;
> > 
> > Is it right? Shouldn't we increment "todo" counter?
> 
> No.  It would be wrong to do that, because TASK_TRACED tasks with frozen
> parents cannot be frozen any further.

A TASK_TRACED task could be woken by SIGKILL. cancel_freezing() clears TIF_FREEZE,
so the task may start do_exit() after try_to_freeze_tasks() has returned "success".
Probably not a problem.

> > 				if (!p->vfork_done)
> > 					freeze_process(p);
> > 
> > 
> > Racy. do_fork(CLONE_VFORK) first does copy_process() which puts 'p' on
> > the task list and unlocks tasklist_lock. This means that 'p' is visible
> > to try_to_freeze_tasks(), and p->vfork_done == NULL. try_to_freeze_tasks()
> > sets TIF_FREEZE.
> > 
> > Now, do_fork() continues, sets ->vfork_done, p goes to user space, notices
> > the fake signal and goes to refrigerator while its parent is blocked on
> > "struct completion vfork". Freezing failed.
> 
> You are right, but this has never happened, AFAICS.
> 
> > So, shouldn't we do
> > 
> > 	if (p->vfork_done)
> > 		cancel_freezing(p);
> > 
> > instead?
> 
> I don't think so.  If p hasn't got TIF_FREEZE set yet or it has already been
> frozen, cancel_freezing(p) is a noop.

Yes, I misread cancel_freezing(); it doesn't wake up the task if it is frozen.

> Alternatively, we can move the check into refrigerator(), like this:
> 
> --- linux-2.6.20-git13.orig/kernel/power/process.c
> +++ linux-2.6.20-git13/kernel/power/process.c
> @@ -39,6 +39,11 @@ void refrigerator(void)
>  	/* Hmm, should we be allowed to suspend when there are realtime
>  	   processes around? */
>  	long save;
> +
> +	/* Freeze the task unless there is a vfork completion pending */
> +	if (current->vfork_done)
> +		return;
> +

This means that "current" returns to user space (get_signal_to_deliver()
will clear TIF_SIGPENDING) and runs, while try_to_freeze_tasks() thinks
it is frozen.

Oleg.


* Re: [RFC PATCH(Experimental) 0/4] Freezer based Cpu-hotplug
  2007-02-17 23:42                   ` Oleg Nesterov
@ 2007-02-17 23:47                     ` Oleg Nesterov
  2007-02-18 10:43                       ` Rafael J. Wysocki
  2007-02-18 10:32                     ` Rafael J. Wysocki
  1 sibling, 1 reply; 92+ messages in thread
From: Oleg Nesterov @ 2007-02-17 23:47 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: ego, akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, Pavel Machek

On 02/18, Oleg Nesterov wrote:
>
> On 02/17, Rafael J. Wysocki wrote:
> >
> > Alternatively, we can move the check into refrigerator(), like this:
> > 
> > --- linux-2.6.20-git13.orig/kernel/power/process.c
> > +++ linux-2.6.20-git13/kernel/power/process.c
> > @@ -39,6 +39,11 @@ void refrigerator(void)
> >  	/* Hmm, should we be allowed to suspend when there are realtime
> >  	   processes around? */
> >  	long save;
> > +
> > +	/* Freeze the task unless there is a vfork completion pending */
> > +	if (current->vfork_done)
> > +		return;
> > +
> 
> This means that "current" returns to user space (get_signal_to_deliver()
> will clear TIF_SIGPENDING) and runs, while try_to_freeze_tasks() thinks
> it is frozen.

Ah, sorry, I am wrong: current has no PF_FROZEN yet.

However, this means that sys_vfork() makes it impossible to freeze processes
until the child exits or execs. Not good.

Oleg.


* Re: [RFC PATCH(Experimental) 0/4] Freezer based Cpu-hotplug
  2007-02-17 23:42                   ` Oleg Nesterov
  2007-02-17 23:47                     ` Oleg Nesterov
@ 2007-02-18 10:32                     ` Rafael J. Wysocki
  2007-02-18 11:32                       ` Oleg Nesterov
  1 sibling, 1 reply; 92+ messages in thread
From: Rafael J. Wysocki @ 2007-02-18 10:32 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: ego, akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, Pavel Machek

On Sunday, 18 February 2007 00:42, Oleg Nesterov wrote:
> On 02/17, Rafael J. Wysocki wrote:
> >
> > On Saturday, 17 February 2007 22:34, Oleg Nesterov wrote:
> > > 
> > > 	static inline int is_user_space(struct task_struct *p)
> > > 	{
> > > 		return p->mm && !(p->flags & PF_BORROWED_MM);
> > > 	}
> > > 
> > > This doesn't look right. First, an exiting task has ->mm == NULL after
> > > do_exit()->exit_mm(). Probably not a problem. However, PF_BORROWED_MM
> > > check is racy without task_lock(), so we can have a false positive as
> > > well. Is it ok? We can freeze aio_wq prematurely.
> > 
> > Right now aio_wq is not freezeable (PF_NOFREEZE).
> 
> Right now yes, but we are going to change this?

Well, is there any more reliable (and not racy) method of differentiating
between kernel threads and user space processes?

> > > 				cancel_freezing(p);
> > > 				continue;
> > > 
> > > Is it right? Shouldn't we increment "todo" counter?
> > 
> > No.  It would be wrong to do that, because TASK_TRACED tasks with frozen
> > parents cannot be frozen any further.
> 
> A TASK_TRACED task could be woken by SIGKILL. cancel_freezing() clears TIF_FREEZE,
> so the task may start do_exit() after try_to_freeze_tasks() has returned "success".
> Probably not a problem.

Yup.

Greetings,
Rafael

* Re: [RFC PATCH(Experimental) 0/4] Freezer based Cpu-hotplug
  2007-02-17 23:47                     ` Oleg Nesterov
@ 2007-02-18 10:43                       ` Rafael J. Wysocki
  2007-02-18 11:31                         ` Oleg Nesterov
  0 siblings, 1 reply; 92+ messages in thread
From: Rafael J. Wysocki @ 2007-02-18 10:43 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: ego, akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, Pavel Machek

On Sunday, 18 February 2007 00:47, Oleg Nesterov wrote:
> On 02/18, Oleg Nesterov wrote:
> >
> > On 02/17, Rafael J. Wysocki wrote:
> > >
> > > Alternatively, we can move the check into refrigerator(), like this:
> > > 
> > > --- linux-2.6.20-git13.orig/kernel/power/process.c
> > > +++ linux-2.6.20-git13/kernel/power/process.c
> > > @@ -39,6 +39,11 @@ void refrigerator(void)
> > >  	/* Hmm, should we be allowed to suspend when there are realtime
> > >  	   processes around? */
> > >  	long save;
> > > +
> > > +	/* Freeze the task unless there is a vfork completion pending */
> > > +	if (current->vfork_done)
> > > +		return;
> > > +
> > 
> > This means that "current" returns to user space (get_signal_to_deliver()
> > will clear TIF_SIGPENDING) and runs, while try_to_freeze_tasks() thinks
> > it is frozen.
> 
> Ah, sorry, I am wrong: current has no PF_FROZEN yet.
> 
> However, this means that sys_vfork() makes it impossible to freeze processes
> until the child exits or execs. Not good.

Yes, but this also is the current behavior.

I think the real solution would be to use an interruptible completion in the
vfork code.  It was discussed some time ago and, IIRC, Ingo had an experimental
patch that implemented it.  Still, for the suspend this really is not an issue
in practice, so it wasn't merged.

It may be a good time to solve this problem now. :-)

Greetings,
Rafael

* Re: [RFC PATCH(Experimental) 0/4] Freezer based Cpu-hotplug
  2007-02-18 10:43                       ` Rafael J. Wysocki
@ 2007-02-18 11:31                         ` Oleg Nesterov
  2007-02-18 12:14                           ` Rafael J. Wysocki
  0 siblings, 1 reply; 92+ messages in thread
From: Oleg Nesterov @ 2007-02-18 11:31 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: ego, akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, Pavel Machek

On 02/18, Rafael J. Wysocki wrote:
>
> On Sunday, 18 February 2007 00:47, Oleg Nesterov wrote:
> > 
> > However, this means that sys_vfork() makes it impossible to freeze processes
> > until the child exits or execs. Not good.
> 
> Yes, but this also is the current behavior.

Yes, yes, I see.

I forgot to say that we have another problem: coredumping.

A thread which does do_coredump() sends SIGKILL to ->mm users and sleeps
on ->mm->core_startup_done. Now it can't be frozen if a sub-thread goes to
the refrigerator. I think this could be solved easily if we add a check to
refrigerator() as you suggested for ->vfork_done.

> I think the real solution would be to use an interruptible completion in the
> vfork code.  It was discussed some time ago and, IIRC, Ingo had an experimental
> patch that implemented it.  Still, for the suspend this really is not an issue
> in practice, so it wasn't merged.

It is not (afaics) so trivial to do right, and with this change the parent
will be seen as TASK_INTERRUPTIBLE even when no freezing is in progress.

A very vague idea: what if the parent does

	current->flags |= PF_PLEASE_CONSIDER_ME_AS_FROZEN_BUT_SET_TIF_FREEZE
	wait_for_completion(&vfork);
	try_to_freeze();

?
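Here is a user-space reading of this idea, offered only as a sketch. The hypothetical SKETCH_FREEZE_SAFE bit stands in for the placeholder flag above. The freezer still sets TIF_FREEZE on a task carrying the bit but does not wait for it. The parent honours the request through a try_to_freeze() stand-in once the completion fires. None of these names are kernel API.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_FREEZE_SAFE 0x1u   /* hypothetical flag, not a real PF_* value */

struct vtask {
	unsigned int flags;
	bool tif_freeze;   /* models TIF_FREEZE */
	bool pf_frozen;    /* models PF_FROZEN */
};

/* One freezer pass: request freezing from every task not yet frozen and
 * count the tasks still to wait for. Tasks that promised to freeze
 * themselves later are asked but not counted. */
static int freezer_pass_count(struct vtask *tasks, int n)
{
	int todo = 0;

	for (int i = 0; i < n; i++) {
		struct vtask *p = &tasks[i];

		if (p->pf_frozen)
			continue;
		p->tif_freeze = true;
		if (!(p->flags & SKETCH_FREEZE_SAFE))
			todo++;
	}
	return todo;
}

/* Models try_to_freeze(): enter the "refrigerator" if asked to. */
static void try_to_freeze_m(struct vtask *p)
{
	if (p->tif_freeze) {
		p->tif_freeze = false;
		p->pf_frozen = true;
	}
}

/* Parent side: mark before blocking on the vfork completion ... */
static void parent_before_wait(struct vtask *parent)
{
	parent->flags |= SKETCH_FREEZE_SAFE;
}

/* ... then, once woken, unmark and honour any pending freeze request. */
static void parent_after_wait(struct vtask *parent)
{
	parent->flags &= ~SKETCH_FREEZE_SAFE;
	try_to_freeze_m(parent);
}
```

The thaw side is not modelled. A thaw that runs before the parent wakes would leave TIF_FREEZE set in this model, so the parent would freeze itself afterwards.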

> It may be a good time to solve this problem now. :-)

Yes, I think so :)

Oleg.


* Re: [RFC PATCH(Experimental) 0/4] Freezer based Cpu-hotplug
  2007-02-18 10:32                     ` Rafael J. Wysocki
@ 2007-02-18 11:32                       ` Oleg Nesterov
  2007-02-18 12:12                         ` Rafael J. Wysocki
  0 siblings, 1 reply; 92+ messages in thread
From: Oleg Nesterov @ 2007-02-18 11:32 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: ego, akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, Pavel Machek

On 02/18, Rafael J. Wysocki wrote:
>
> On Sunday, 18 February 2007 00:42, Oleg Nesterov wrote:
> > On 02/17, Rafael J. Wysocki wrote:
> > >
> > > On Saturday, 17 February 2007 22:34, Oleg Nesterov wrote:
> > > > 
> > > > 	static inline int is_user_space(struct task_struct *p)
> > > > 	{
> > > > 		return p->mm && !(p->flags & PF_BORROWED_MM);
> > > > 	}
> > > > 
> > > > This doesn't look right. First, an exiting task has ->mm == NULL after
> > > > do_exit()->exit_mm(). Probably not a problem. However, PF_BORROWED_MM
> > > > check is racy without task_lock(), so we can have a false positive as
> > > > well. Is it ok? We can freeze aio_wq prematurely.
> > > 
> > > Right now aio_wq is not freezeable (PF_NOFREEZE).
> > 
> > Right now yes, but we are going to change this?
> 
> Well, is there any more reliable (and not racy) method of differentiating
> between kernel threads and user space processes?

Not that I know of. At least, we can take task_lock() to really rule out
kernel threads at the FREEZER_USER_SPACE stage.

Oleg.


* Re: [RFC PATCH(Experimental) 0/4] Freezer based Cpu-hotplug
  2007-02-18 11:32                       ` Oleg Nesterov
@ 2007-02-18 12:12                         ` Rafael J. Wysocki
  2007-02-18 15:06                           ` Oleg Nesterov
  0 siblings, 1 reply; 92+ messages in thread
From: Rafael J. Wysocki @ 2007-02-18 12:12 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: ego, akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, Pavel Machek

On Sunday, 18 February 2007 12:32, Oleg Nesterov wrote:
> On 02/18, Rafael J. Wysocki wrote:
> >
> > On Sunday, 18 February 2007 00:42, Oleg Nesterov wrote:
> > > On 02/17, Rafael J. Wysocki wrote:
> > > >
> > > > On Saturday, 17 February 2007 22:34, Oleg Nesterov wrote:
> > > > > 
> > > > > 	static inline int is_user_space(struct task_struct *p)
> > > > > 	{
> > > > > 		return p->mm && !(p->flags & PF_BORROWED_MM);
> > > > > 	}
> > > > > 
> > > > > This doesn't look right. First, an exiting task has ->mm == NULL after
> > > > > do_exit()->exit_mm(). Probably not a problem. However, PF_BORROWED_MM
> > > > > check is racy without task_lock(), so we can have a false positive as
> > > > > well. Is it ok? We can freeze aio_wq prematurely.
> > > > 
> > > > Right now aio_wq is not freezeable (PF_NOFREEZE).
> > > 
> > > Right now yes, but we are going to change this?
> > 
> > Well, is there any more reliable (and not racy) method of differentiating
> > between kernel threads and user space processes?
> 
> Not that I know of. At least, we can take task_lock() to really rule out
> kernel threads at the FREEZER_USER_SPACE stage.

Something like this?

---
 kernel/power/process.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Index: linux-2.6.20-mm2/kernel/power/process.c
===================================================================
--- linux-2.6.20-mm2.orig/kernel/power/process.c
+++ linux-2.6.20-mm2/kernel/power/process.c
@@ -8,6 +8,7 @@
 
 #undef DEBUG
 
+#include <linux/sched.h>
 #include <linux/smp_lock.h>
 #include <linux/interrupt.h>
 #include <linux/suspend.h>
@@ -92,7 +93,12 @@ static void cancel_freezing(struct task_
 
 static inline int is_user_space(struct task_struct *p)
 {
-	return p->mm && !(p->flags & PF_BORROWED_MM);
+	int ret;
+
+	task_lock(p);
+	ret = p->mm && !(p->flags & PF_BORROWED_MM);
+	task_unlock(p);
+	return ret;
 }
 
 static unsigned int try_to_freeze_tasks(int freeze_user_space)
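The benefit of the lock can be sketched in user space. A writer that changes ->mm and the borrowed-mm flag in two steps under the lock exposes the intermediate state only to readers that skip the lock. Everything here is hypothetical: a C11 atomic_bool spinlock stands in for task_lock(), and the order of the two stores in unborrow_begin()/unborrow_end() is an assumption. Whatever the real order, an unlocked reader can observe one store without the other.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define PF_BORROWED_MM_M 0x1u   /* models PF_BORROWED_MM */

/* Hypothetical task model; alloc_lock stands in for task_lock(). */
struct ktask {
	atomic_bool alloc_lock;
	void *mm;
	unsigned int flags;
};

static void task_lock_m(struct ktask *p)
{
	while (atomic_exchange(&p->alloc_lock, true))
		;   /* spin until the holder releases it */
}

static bool task_trylock_m(struct ktask *p)
{
	return !atomic_exchange(&p->alloc_lock, true);
}

static void task_unlock_m(struct ktask *p)
{
	atomic_store(&p->alloc_lock, false);
}

/* The check as it was: no lock, so it may see a half-done update. */
static int is_user_space_racy(const struct ktask *p)
{
	return p->mm && !(p->flags & PF_BORROWED_MM_M);
}

/* The check as in the patch: both fields are read under the lock. */
static int is_user_space_locked(struct ktask *p)
{
	int ret;

	task_lock_m(p);
	ret = p->mm && !(p->flags & PF_BORROWED_MM_M);
	task_unlock_m(p);
	return ret;
}

/* A kernel thread stops borrowing an mm in two steps under the lock.
 * The store order is an assumption made for this sketch. */
static void unborrow_begin(struct ktask *p)
{
	task_lock_m(p);
	p->flags &= ~PF_BORROWED_MM_M;
}

static void unborrow_end(struct ktask *p)
{
	p->mm = NULL;
	task_unlock_m(p);
}
```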

* Re: [RFC PATCH(Experimental) 0/4] Freezer based Cpu-hotplug
  2007-02-18 11:31                         ` Oleg Nesterov
@ 2007-02-18 12:14                           ` Rafael J. Wysocki
  2007-02-18 14:52                             ` freezer problems Oleg Nesterov
  2007-02-18 15:09                             ` [RFC PATCH(Experimental) 0/4] Freezer based Cpu-hotplug Rafael J. Wysocki
  0 siblings, 2 replies; 92+ messages in thread
From: Rafael J. Wysocki @ 2007-02-18 12:14 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: ego, akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, Pavel Machek

On Sunday, 18 February 2007 12:31, Oleg Nesterov wrote:
> On 02/18, Rafael J. Wysocki wrote:
> >
> > On Sunday, 18 February 2007 00:47, Oleg Nesterov wrote:
> > > 
> > > However, this means that sys_vfork() makes it impossible to freeze processes
> > > until the child exits or execs. Not good.
> > 
> > Yes, but this also is the current behavior.
> 
> Yes, yes, I see.
> 
> I forgot to say that we have another problem: coredumping.
> 
> A thread which does do_coredump() sends SIGKILL to ->mm users and sleeps
> on ->mm->core_startup_done. Now it can't be frozen if a sub-thread goes to
> the refrigerator. I think this could be solved easily if we add a check to
> refrigerator() as you suggested for ->vfork_done.
> 
> > I think the real solution would be to use an interruptible completion in the
> > vfork code.  It was discussed some time ago and, IIRC, Ingo had an experimental
> > patch that implemented it.  Still, for the suspend this really is not an issue
> > in practice, so it wasn't merged.
> 
> It is not (afaics) so trivial to do right, and with this change the parent
> will be seen as TASK_INTERRUPTIBLE even when no freezing is in progress.
> 
> A very vague idea: what if the parent does
> 
> 	current->flags |= PF_PLEASE_CONSIDER_ME_AS_FROZEN_BUT_SET_TIF_FREEZE
> 	wait_for_completion(&vfork);
> 	try_to_freeze();
> 
> ?

This should work, but we'll need a separate process flag for it.  If that's
acceptable, I'd call it PF_VFORK_PARENT.

Greetings,
Rafael

* Re: [RFC PATCH(Experimental) 0/4] Freezer based Cpu-hotplug
  2007-02-17 11:24             ` Rafael J. Wysocki
  2007-02-17 21:34               ` Oleg Nesterov
@ 2007-02-18 12:56               ` Pavel Machek
  2007-02-21 14:52               ` Gautham R Shenoy
  2 siblings, 0 replies; 92+ messages in thread
From: Pavel Machek @ 2007-02-18 12:56 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: ego, akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, oleg

Hi!

> > > So I think tonight I'll start adding try_to_freeze() to the kernel threads that
> > > set PF_NOFREEZE.
> > 
> > cool! While you are at it, let me try to enhance the freezer api's
> > to incorporate the PFE_* flags.
> 
> Here's a patch that adds try_to_freeze() to all kernel threads that didn't call
> it before.  It shouldn't change the behavior of the threads in question, since
> they won't be frozen because they are flagged as PF_NOFREEZE (of course
> we are going to change this later).

Looks ok.

> Compile-tested on x86_64 with allmodconfig.
> 
> Pavel, do you think we can remove the PF_NOFREEZE from bluetooth,
> BTW?

Yes... bluetooth has no reason to play with NOFREEZE.
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* freezer problems
  2007-02-18 12:14                           ` Rafael J. Wysocki
@ 2007-02-18 14:52                             ` Oleg Nesterov
  2007-02-18 15:14                               ` Rafael J. Wysocki
  2007-02-18 18:56                               ` Rafael J. Wysocki
  2007-02-18 15:09                             ` [RFC PATCH(Experimental) 0/4] Freezer based Cpu-hotplug Rafael J. Wysocki
  1 sibling, 2 replies; 92+ messages in thread
From: Oleg Nesterov @ 2007-02-18 14:52 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: ego, akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, Pavel Machek

On 02/18, Rafael J. Wysocki wrote:
>
> On Sunday, 18 February 2007 12:31, Oleg Nesterov wrote:
> > > > 
> > > > However, this means that sys_vfork() makes it impossible to freeze processes
> > > > until the child exits or execs. Not good.
> > > 
> > I forgot to say that we have another problem: coredumping.
> > 
> > A thread which does do_coredump() sends SIGKILL to ->mm users and sleeps
> > on ->mm->core_startup_done. Now it can't be frozen if a sub-thread goes to
> > the refrigerator. I think this could be solved easily if we add a check to
> > refrigerator() as you suggested for ->vfork_done.
> > 
> > > I think the real solution would be to use an interruptible completion in the
> > > vfork code.  It was discussed some time ago and, IIRC, Ingo had an experimental
> > > patch that implemented it.  Still, for the suspend this really is not an issue
> > > in practice, so it wasn't merged.
> > 
> > It is not (afaics) so trivial to do right, and with this change the parent
> > will be seen as TASK_INTERRUPTIBLE even when no freezing is in progress.
> > 
> > A very vague idea: what if the parent does
> > 
> > 	current->flags |= PF_PLEASE_CONSIDER_ME_AS_FROZEN_BUT_SET_TIF_FREEZE
> > 	wait_for_completion(&vfork);
> > 	try_to_freeze();
> > 
> > ?
> 
> This should work,

Good. So try_to_freeze_tasks() can forget about the "if (!p->vfork_done)" check.
This needs more thinking, of course. For example, thaw_process() should be
careful to clear TIF_FREEZE if we have the new flag set, but not PF_FROZEN.
frozen() should be changed to return true if PF_NEW_FLAG && TIF_FREEZE, but
it is also called by refrigerator().
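The two adjustments listed above might look like this in a user-space model. PF_NEW_FLAG is the placeholder name from the mail, given an invented value here. frozen_m() and thaw_process_m() are hypothetical stand-ins. Memory ordering and wake-ups are ignored, and the refrigerator() caveat is not addressed.

```c
#include <assert.h>
#include <stdbool.h>

#define PF_FROZEN_M 0x1u   /* models PF_FROZEN */
#define PF_NEW_FLAG 0x2u   /* placeholder name from the mail, invented value */

struct ttask {
	unsigned int flags;
	bool tif_freeze;   /* models TIF_FREEZE */
};

/* Counts a task as frozen if it really is, or if it promised to freeze
 * itself and has already been asked to. */
static bool frozen_m(const struct ttask *p)
{
	if (p->flags & PF_FROZEN_M)
		return true;
	return (p->flags & PF_NEW_FLAG) && p->tif_freeze;
}

/* Thawing must also withdraw a pending request from a task that only
 * promised to freeze; otherwise it would freeze itself after the thaw. */
static void thaw_process_m(struct ttask *p)
{
	if (p->flags & PF_FROZEN_M)
		p->flags &= ~PF_FROZEN_M;   /* real code would also wake it up */
	else if (p->flags & PF_NEW_FLAG)
		p->tif_freeze = false;
}
```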

But IF we really can do this, it will be a general solution.

>                    but we'll need a separate process flag for it.  If that's
> acceptable,

I can't judge. I changed the subject to get more attention from the experts.

>              I'd call it PF_VFORK_PARENT

I'd suggest not putting "VFORK" into the name. We will probably find other
uses for this flag, which in fact means: "I am sleeping in TASK_UNINTERRUPTIBLE
at a safe place to freeze; I promise to do try_to_freeze() when I get the
CPU".

And now another problem: exec. de_thread() sleeps in TASK_UNINTERRUPTIBLE
waiting for all sub-threads to die, and we have the same "deadlock" if
one of them is frozen. This is nasty. We could probably change the ->state
to TASK_INTERRUPTIBLE and add try_to_freeze(), or play with the new PF_
flag, but I am not sure it is safe to freeze() a task which is deep
in the exec() path.
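The shape of this "deadlock" can be sketched as a freezer loop with a retry budget. A task that sleeps uninterruptibly until another task finishes never reaches the refrigerator once that other task is frozen, so the loop gives up with work left. The model is hypothetical (struct wtask, a single waits_for link, a fixed pass count), whereas try_to_freeze_tasks() bounds its loop with a timeout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a task for this sketch. */
struct wtask {
	bool frozen;
	bool done;                 /* the task has finished (exited) */
	struct wtask *waits_for;   /* uninterruptible wait target, or NULL */
};

/* A task can act on a freeze request only if it is not blocked, i.e. it
 * waits for nobody or the task it waits for has already finished. */
static bool can_run(const struct wtask *p)
{
	return p->waits_for == NULL || p->waits_for->done;
}

/* Returns the number of tasks still not frozen after at most max_passes
 * passes; 0 means freezing succeeded. */
static int freeze_all(struct wtask *t, int n, int max_passes)
{
	int todo = n;

	for (int pass = 0; pass < max_passes && todo; pass++) {
		todo = 0;
		for (int i = 0; i < n; i++) {
			if (t[i].frozen || t[i].done)
				continue;
			if (can_run(&t[i]))
				t[i].frozen = true;   /* enters the refrigerator */
			else
				todo++;
		}
	}
	return todo;
}
```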

Oleg.


* Re: [RFC PATCH(Experimental) 0/4] Freezer based Cpu-hotplug
  2007-02-18 12:12                         ` Rafael J. Wysocki
@ 2007-02-18 15:06                           ` Oleg Nesterov
  0 siblings, 0 replies; 92+ messages in thread
From: Oleg Nesterov @ 2007-02-18 15:06 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: ego, akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, Pavel Machek

On 02/18, Rafael J. Wysocki wrote:
>
> On Sunday, 18 February 2007 12:32, Oleg Nesterov wrote:
> > On 02/18, Rafael J. Wysocki wrote:
> > >
> > > On Sunday, 18 February 2007 00:42, Oleg Nesterov wrote:
> > > > On 02/17, Rafael J. Wysocki wrote:
> > > > >
> > > > > On Saturday, 17 February 2007 22:34, Oleg Nesterov wrote:
> > > > > > 
> > > > > > 	static inline int is_user_space(struct task_struct *p)
> > > > > > 	{
> > > > > > 		return p->mm && !(p->flags & PF_BORROWED_MM);
> > > > > > 	}
> > > > > > 
> > > > > > This doesn't look right. First, an exiting task has ->mm == NULL after
> > > > > > do_exit()->exit_mm(). Probably not a problem. However, PF_BORROWED_MM
> > > > > > check is racy without task_lock(), so we can have a false positive as
> > > > > > well. Is it ok? We can freeze aio_wq prematurely.
> > > > > 
> > > > > Right now aio_wq is not freezeable (PF_NOFREEZE).
> > > > 
> > > > Right now yes, but we are going to change this?
> > > 
> > > Well, is there any more reliable (and not racy) method of differentiating
> > > between kernel threads and user space processes?
> > 
> > Not that I know of. At least, we can take task_lock() to really rule out
> > kernel threads at the FREEZER_USER_SPACE stage.
> 
> Something like this?

I think yes, as a first step.

In the long term, I think it would be really nice to have CLONE_KERNEL_THREAD
(filtered out in sys_clone()). This would also allow us to clean up copy_process().
For example, we could then introduce CLONE_UNHASHED (currently denoted as pid==0)
and kill the ugly "int pid" parameter of copy_process().
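Here is a sketch of how such a flag could work, with invented names and values. sys_clone_m() strips the hypothetical CLONE_KERNEL_THREAD_M bit from user-supplied flags, so only in-kernel callers can set it. The freezer could then test a per-task bit (here PF_KTHREAD_M, also hypothetical) instead of inspecting ->mm under task_lock().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical values chosen for the sketch; not real kernel constants. */
#define CLONE_KERNEL_THREAD_M 0x80000000ul
#define PF_KTHREAD_M          0x1u

struct ctask {
	unsigned int flags;
};

/* Common path shared by both entry points below. */
static void copy_process_m(struct ctask *child, unsigned long clone_flags)
{
	child->flags = 0;
	if (clone_flags & CLONE_KERNEL_THREAD_M)
		child->flags |= PF_KTHREAD_M;
}

/* User-space entry: the kernel-only bit is filtered out. */
static void sys_clone_m(struct ctask *child, unsigned long clone_flags)
{
	copy_process_m(child, clone_flags & ~CLONE_KERNEL_THREAD_M);
}

/* In-kernel entry, standing in for kernel_thread(). */
static void kernel_thread_m(struct ctask *child, unsigned long clone_flags)
{
	copy_process_m(child, clone_flags | CLONE_KERNEL_THREAD_M);
}

/* No ->mm inspection and no locking needed. */
static bool is_user_space_m(const struct ctask *p)
{
	return !(p->flags & PF_KTHREAD_M);
}
```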

Oleg.

> ---
>  kernel/power/process.c |    8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> Index: linux-2.6.20-mm2/kernel/power/process.c
> ===================================================================
> --- linux-2.6.20-mm2.orig/kernel/power/process.c
> +++ linux-2.6.20-mm2/kernel/power/process.c
> @@ -8,6 +8,7 @@
>  
>  #undef DEBUG
>  
> +#include <linux/sched.h>
>  #include <linux/smp_lock.h>
>  #include <linux/interrupt.h>
>  #include <linux/suspend.h>
> @@ -92,7 +93,12 @@ static void cancel_freezing(struct task_
>  
>  static inline int is_user_space(struct task_struct *p)
>  {
> -	return p->mm && !(p->flags & PF_BORROWED_MM);
> +	int ret;
> +
> +	task_lock(p);
> +	ret = p->mm && !(p->flags & PF_BORROWED_MM);
> +	task_unlock(p);
> +	return ret;
>  }
>  
>  static unsigned int try_to_freeze_tasks(int freeze_user_space)


* Re: [RFC PATCH(Experimental) 0/4] Freezer based Cpu-hotplug
  2007-02-18 12:14                           ` Rafael J. Wysocki
  2007-02-18 14:52                             ` freezer problems Oleg Nesterov
@ 2007-02-18 15:09                             ` Rafael J. Wysocki
  2007-02-18 16:11                               ` Oleg Nesterov
  1 sibling, 1 reply; 92+ messages in thread
From: Rafael J. Wysocki @ 2007-02-18 15:09 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: ego, akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, Pavel Machek

On Sunday, 18 February 2007 13:14, Rafael J. Wysocki wrote:
> On Sunday, 18 February 2007 12:31, Oleg Nesterov wrote:
> > On 02/18, Rafael J. Wysocki wrote:
> > >
> > > On Sunday, 18 February 2007 00:47, Oleg Nesterov wrote:
> > > > 
> > > > However, this means that sys_vfork() makes it impossible to freeze processes
> > > > until the child exits or execs. Not good.
> > > 
> > > Yes, but this also is the current behavior.
> > 
> > Yes, yes, I see.
> > 
> > I forgot to say that we have another problem: coredumping.
> > 
> > A thread which does do_coredump() sends SIGKILL to ->mm users and sleeps
> > on ->mm->core_startup_done. Now it can't be frozen if a sub-thread goes to
> > the refrigerator. I think this could be solved easily if we add a check to
> > refrigerator() as you suggested for ->vfork_done.
> > 
> > > I think the real solution would be to use an interruptible completion in the
> > > vfork code.  It was discussed some time ago and, IIRC, Ingo had an experimental
> > > patch that implemented it.  Still, for the suspend this really is not an issue
> > > in practice, so it wasn't merged.
> > 
> > It is not (afaics) so trivial to do right, and with this change the parent
> > will be seen as TASK_INTERRUPTIBLE even when no freezing is in progress.
> > 
> > A very vague idea: what if the parent does
> > 
> > 	current->flags |= PF_PLEASE_CONSIDER_ME_AS_FROZEN_BUT_SET_TIF_FREEZE
> > 	wait_for_completion(&vfork);
> > 	try_to_freeze();
> > 
> > ?
> 
> This should work, but we'll need a separate process flag for it.  If that's
> acceptable, I'd call it PF_VFORK_PARENT.

Hm, what about the following patch instead?

The problem is that if the child enters the refrigerator, we can't freeze the
parent, because it's uninterruptible. But the child knows the parent will stay
uninterruptible until the child exits, so the child can mark the parent as frozen.

 kernel/power/process.c |   27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

Index: linux-2.6.20-mm2/kernel/power/process.c
===================================================================
--- linux-2.6.20-mm2.orig/kernel/power/process.c	2007-02-18 15:43:30.000000000 +0100
+++ linux-2.6.20-mm2/kernel/power/process.c	2007-02-18 16:09:53.000000000 +0100
@@ -39,6 +39,13 @@ void refrigerator(void)
 	/* Hmm, should we be allowed to suspend when there are realtime
 	   processes around? */
 	long save;
+
+	/* The parent is uninterruptible and will stay so until this task exits,
+	 * so we can mark it as frozen.
+	 */
+	if (current->vfork_done)
+		frozen_process(current->parent);
+
 	save = current->state;
 	pr_debug("%s entered refrigerator\n", current->comm);
 
@@ -53,6 +60,9 @@ void refrigerator(void)
 	}
 	pr_debug("%s left refrigerator\n", current->comm);
 	current->state = save;
+
+	if (current->vfork_done && frozen(current->parent))
+		current->parent->flags &= ~PF_FROZEN;
 }
 
 static inline void freeze_process(struct task_struct *p)
@@ -117,21 +127,10 @@ static unsigned int try_to_freeze_tasks(
 				cancel_freezing(p);
 				continue;
 			}
-			if (is_user_space(p)) {
-				if (!freeze_user_space)
-					continue;
-
-				/* Freeze the task unless there is a vfork
-				 * completion pending
-				 */
-				if (!p->vfork_done)
-					freeze_process(p);
-			} else {
-				if (freeze_user_space)
-					continue;
+			if (is_user_space(p) == !freeze_user_space)
+				continue;
 
-				freeze_process(p);
-			}
+			freeze_process(p);
 			todo++;
 		} while_each_thread(g, p);
 		read_unlock(&tasklist_lock);

* Re: freezer problems
  2007-02-18 14:52                             ` freezer problems Oleg Nesterov
@ 2007-02-18 15:14                               ` Rafael J. Wysocki
  2007-02-18 16:19                                 ` Oleg Nesterov
  2007-02-18 18:56                               ` Rafael J. Wysocki
  1 sibling, 1 reply; 92+ messages in thread
From: Rafael J. Wysocki @ 2007-02-18 15:14 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: ego, akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, Pavel Machek

On Sunday, 18 February 2007 15:52, Oleg Nesterov wrote:
> On 02/18, Rafael J. Wysocki wrote:
> >
> > On Sunday, 18 February 2007 12:31, Oleg Nesterov wrote:
> > > > > 
> > > > > However, this means that sys_vfork() makes it impossible to freeze processes
> > > > > until the child exits or execs. Not good.
> > > > 
> > > I forgot to say that we have another problem: coredumping.
> > > 
> > > A thread which does do_coredump() sends SIGKILL to ->mm users and sleeps
> > > on ->mm->core_startup_done. Now it can't be frozen if a sub-thread goes to
> > > the refrigerator. I think this could be solved easily if we add a check to
> > > refrigerator() as you suggested for ->vfork_done.
> > > 
> > > > I think the real solution would be to use an interruptible completion in the
> > > > vfork code.  It was discussed some time ago and, IIRC, Ingo had an experimental
> > > > patch that implemented it.  Still, for the suspend this really is not an issue
> > > > in practice, so it wasn't merged.
> > > 
> > > It is not (afaics) so trivial to do right, and with this change the parent
> > > will be seen as TASK_INTERRUPTIBLE even when no freezing is in progress.
> > > 
> > > A very vague idea: what if the parent does
> > > 
> > > 	current->flags |= PF_PLEASE_CONSIDER_ME_AS_FROZEN_BUT_SET_TIF_FREEZE
> > > 	wait_for_completion(&vfork);
> > > 	try_to_freeze();
> > > 
> > > ?
> > 
> > This should work,
> 
> Good. So try_to_freeze_tasks() can forget about the "if (!p->vfork_done)" check.
> This needs more thinking, of course. For example, thaw_process() should be
> careful to clear TIF_FREEZE if we have the new flag set, but not PF_FROZEN.
> frozen() should be changed to return true if PF_NEW_FLAG && TIF_FREEZE, but
> it is also called by refrigerator().
> 
> But IF we really can do this, it will be a general solution.
> 
> >                    but we'll need a separate process flag for it.  If that's
> > acceptable,
> 
> I can't judge. I changed the subject to get more attention from the experts.
> 
> >              I'd call it PF_VFORK_PARENT
> 
> I'd suggest not putting "VFORK" into the name. We will probably find other
> uses for this flag, which in fact means: "I am sleeping in TASK_UNINTERRUPTIBLE
> at a safe place to freeze; I promise to do try_to_freeze() when I get the
> CPU".
> 
> And now another problem: exec. de_thread() sleeps in TASK_UNINTERRUPTIBLE
> waiting for all sub-threads to die, and we have the same "deadlock" if
> one of them is frozen. This is nasty. We could probably change the ->state
> to TASK_INTERRUPTIBLE and add try_to_freeze(), or play with the new PF_
> flag, but I am not sure it is safe to freeze() a task which is deep
> in the exec() path.

Hm, I haven't been aware of this case.

Well, probably we can do something like in the patch that I've just sent: the
child that enters the refrigerator should know that the parent is
uninterruptible and will wait for it to exit.  Thus it can either mark the
parent as frozen or just exit the refrigerator without freezing itself.

Greetings,
Rafael

* Re: [RFC PATCH(Experimental) 0/4] Freezer based Cpu-hotplug
  2007-02-18 15:09                             ` [RFC PATCH(Experimental) 0/4] Freezer based Cpu-hotplug Rafael J. Wysocki
@ 2007-02-18 16:11                               ` Oleg Nesterov
  2007-02-18 18:51                                 ` Rafael J. Wysocki
  0 siblings, 1 reply; 92+ messages in thread
From: Oleg Nesterov @ 2007-02-18 16:11 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: ego, akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, Pavel Machek

On 02/18, Rafael J. Wysocki wrote:
>
> > On Sunday, 18 February 2007 12:31, Oleg Nesterov wrote:
> > > 
> > > A very vague idea: what if the parent does
> > > 
> > > 	current->flags |= PF_PLEASE_CONSIDER_ME_AS_FROZEN_BUT_SET_TIF_FREEZE
> > > 	wait_for_completion(&vfork);
> > > 	try_to_freeze();
> > > 
> > > ?
> 
> Hm, what about the following patch instead?
> 
> The problem is that if the child enters the refrigerator, we can't freeze the
> parent, because it's uninterruptible, but the child knows the parent will be
> uninterruptible until it exits, so the child can mark the parent as frozen.
> 
> --- linux-2.6.20-mm2.orig/kernel/power/process.c	2007-02-18 15:43:30.000000000 +0100
> +++ linux-2.6.20-mm2/kernel/power/process.c	2007-02-18 16:09:53.000000000 +0100
> @@ -39,6 +39,13 @@ void refrigerator(void)
>  	/* Hmm, should we be allowed to suspend when there are realtime
>  	   processes around? */
>  	long save;
> +
> +	/* The parent is uninterruptible and will stay so until this task exits,
> +	 * so we can mark it as frozen.
> +	 */
> +	if (current->vfork_done)
> +		frozen_process(current->parent);

This is not safe. task->flags is not atomic; we can change ->flags only
if we know the task won't touch it itself (ptrace, thaw_process).
The parent could be interrupted, and an irq may play with current->flags
(slab, for example).

Please note that ->parent may do things like ptrace_notify() before
it actually sleeps on ->vfork_done. This means that even if we could
set PF_FROZEN in a safe manner, this doesn't look like a good idea.
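To make the lost-update hazard concrete, here is a user-space model (not kernel code) of two non-atomic "flags |= bit" updates to the same word, interleaved by hand. The M_PF_* bits and m_* helpers are hypothetical. Which of the parent's own code paths performs the competing update is deliberately left abstract; the model only shows the general read-modify-write problem.

```c
#include <assert.h>

/* Hypothetical model bits; these are not the kernel's real PF_* values. */
#define M_PF_FROZEN 0x1u
#define M_PF_OTHER  0x2u

/* A non-atomic "flags |= bit" split into its load and store halves. */
struct m_rmw {
	unsigned int loaded;	/* value observed by the load step */
};

static void m_load(struct m_rmw *op, const unsigned int *flags)
{
	op->loaded = *flags;
}

static void m_store_or(const struct m_rmw *op, unsigned int *flags,
		       unsigned int bit)
{
	*flags = op->loaded | bit;	/* writes back a possibly stale word */
}

/* Interleaving: A loads, B loads, A stores, B stores. */
static unsigned int m_interleaved(unsigned int initial)
{
	unsigned int flags = initial;
	struct m_rmw a, b;

	m_load(&a, &flags);	/* child: about to set FROZEN on the parent */
	m_load(&b, &flags);	/* parent: updating one of its own bits */
	m_store_or(&a, &flags, M_PF_FROZEN);
	m_store_or(&b, &flags, M_PF_OTHER);	/* overwrites A's update */
	return flags;
}

/* The same two updates, each load/store pair run back to back. */
static unsigned int m_serialized(unsigned int initial)
{
	unsigned int flags = initial;
	struct m_rmw a, b;

	m_load(&a, &flags);
	m_store_or(&a, &flags, M_PF_FROZEN);
	m_load(&b, &flags);
	m_store_or(&b, &flags, M_PF_OTHER);
	return flags;
}
```

In the interleaved run the FROZEN bit written by the child is silently lost, because the parent's store was computed from the word it read before the child's store landed.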

> +
> +	if (current->vfork_done && frozen(current->parent))
> +		current->parent->flags &= ~PF_FROZEN;
>  }

Why? If the code above works, we shouldn't need to care about a frozen
->parent?

Oleg.



* Re: freezer problems
  2007-02-18 15:14                               ` Rafael J. Wysocki
@ 2007-02-18 16:19                                 ` Oleg Nesterov
  2007-02-18 18:14                                   ` Rafael J. Wysocki
  0 siblings, 1 reply; 92+ messages in thread
From: Oleg Nesterov @ 2007-02-18 16:19 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: ego, akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, Pavel Machek

On 02/18, Rafael J. Wysocki wrote:
>
> On Sunday, 18 February 2007 15:52, Oleg Nesterov wrote:
> > 
> > And now another problem: exec. de_thread() sleeps in TASK_UNINTERRUPTIBLE
> > waiting for all sub-threads to die, and we have the same "deadlock" if
> > one of them is frozen. This is nasty. Probably we can change the ->state
> > to TASK_INTERRUPTIBLE and add try_to_freeze(), or play with the new PF_
> > flag, but I am not sure it is safe to freeze() the task which is deep
> > in the exec() path.
> 
> Hm, I haven't been aware of this case.
> 
> Well, probably we can do something like in the patch that I've just sent: the
> child that enters the refrigerator should know that the parent is
> uninterruptible and will wait for it to exit.  Thus it can either mark the
> parent as frozen or just exit the refrigerator without freezing itself.

A sub-thread could already be sleeping in the refrigerator when another
thread does exec. So we have no choice but to somehow freeze the execer.
But again, I don't know if it is safe to freeze it here, at the de_thread()
stage. It is called from load_xxx_binary(), and we may hold some important
locks...

Oleg.



* Re: freezer problems
  2007-02-18 16:19                                 ` Oleg Nesterov
@ 2007-02-18 18:14                                   ` Rafael J. Wysocki
  0 siblings, 0 replies; 92+ messages in thread
From: Rafael J. Wysocki @ 2007-02-18 18:14 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: ego, akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, Pavel Machek

On Sunday, 18 February 2007 17:19, Oleg Nesterov wrote:
> On 02/18, Rafael J. Wysocki wrote:
> >
> > On Sunday, 18 February 2007 15:52, Oleg Nesterov wrote:
> > > 
> > > And now another problem: exec. de_thread() sleeps in TASK_UNINTERRUPTIBLE
> > > waiting for all sub-threads to die, and we have the same "deadlock" if
> > > one of them is frozen. This is nasty. Probably we can change the ->state
> > > to TASK_INTERRUPTIBLE and add try_to_freeze(), or play with the new PF_
> > > flag, but I am not sure it is safe to freeze() the task which is deep
> > > in the exec() path.
> > 
> > Hm, I haven't been aware of this case.
> > 
> > Well, probably we can do something like in the patch that I've just sent: the
> > child that enters the refrigerator should know that the parent is
> > uninterruptible and will wait for it to exit.  Thus it can either mark the
> > parent as frozen or just exit the refrigerator without freezing itself.
> 
> A sub-thread could already be sleeping in the refrigerator when another
> thread does exec. So we have no choice but to somehow freeze the execer.
> But again, I don't know if it is safe to freeze it here, at the de_thread()
> stage. It is called from load_xxx_binary(), and we may hold some important
> locks...

So it probably isn't safe.

Rafael



* Re: [RFC PATCH(Experimental) 0/4] Freezer based Cpu-hotplug
  2007-02-18 16:11                               ` Oleg Nesterov
@ 2007-02-18 18:51                                 ` Rafael J. Wysocki
  0 siblings, 0 replies; 92+ messages in thread
From: Rafael J. Wysocki @ 2007-02-18 18:51 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: ego, akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, Pavel Machek

On Sunday, 18 February 2007 17:11, Oleg Nesterov wrote:
> On 02/18, Rafael J. Wysocki wrote:
> >
> > > On Sunday, 18 February 2007 12:31, Oleg Nesterov wrote:
> > > > 
> > > > A very vague idea: what if the parent does
> > > > 
> > > > 	current->flags |= PF_PLEASE_CONSIDER_ME_AS_FROZEN_BUT_SET_TIF_FREEZE
> > > > 	wait_for_completion(&vfork);
> > > > 	try_to_freeze();
> > > > 
> > > > ?
> > 
> > Hm, what about the following patch instead?
> > 
> > The problem is that if the child enters the refrigerator, we can't freeze the
> > parent, because it's uninterruptible, but the child knows the parent will be
> > uninterruptible until it exits, so the child can mark the parent as frozen.
> > 
> > --- linux-2.6.20-mm2.orig/kernel/power/process.c	2007-02-18 15:43:30.000000000 +0100
> > +++ linux-2.6.20-mm2/kernel/power/process.c	2007-02-18 16:09:53.000000000 +0100
> > @@ -39,6 +39,13 @@ void refrigerator(void)
> >  	/* Hmm, should we be allowed to suspend when there are realtime
> >  	   processes around? */
> >  	long save;
> > +
> > +	/* The parent is uninterruptible and will stay so until this task exits,
> > +	 * so we can mark it as frozen.
> > +	 */
> > +	if (current->vfork_done)
> > +		frozen_process(current->parent);
> 
> This is not safe. task->flags is not atomic; we can change ->flags only
> if we know the task won't touch it itself (ptrace, thaw_process).
> The parent could be interrupted, and an irq may play with current->flags
> (slab, for example).
> 
> Please note that ->parent may do things like ptrace_notify() before
> it actually sleeps on ->vfork_done. This means that even if we could
> set PF_FROZEN in a safe manner, this doesn't look like a good idea.
> 
> > +
> > +	if (current->vfork_done && frozen(current->parent))
> > +		current->parent->flags &= ~PF_FROZEN;
> >  }
> 
> Why? If the code above works, we shouldn't need to care about a frozen
> ->parent?

I've added this for symmetry.  thaw_tasks() should reset PF_FROZEN for it
anyway. 

Okay, so I'll post the patch that implements your idea in the other thread.

Greetings,
Rafael


* Re: freezer problems
  2007-02-18 14:52                             ` freezer problems Oleg Nesterov
  2007-02-18 15:14                               ` Rafael J. Wysocki
@ 2007-02-18 18:56                               ` Rafael J. Wysocki
  2007-02-18 22:01                                 ` Oleg Nesterov
  1 sibling, 1 reply; 92+ messages in thread
From: Rafael J. Wysocki @ 2007-02-18 18:56 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: ego, akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, Pavel Machek

On Sunday, 18 February 2007 15:52, Oleg Nesterov wrote:
> On 02/18, Rafael J. Wysocki wrote:
> >
> > On Sunday, 18 February 2007 12:31, Oleg Nesterov wrote:
> > > > > 
> > > > > However, this means that sys_vfork() makes it impossible to freeze processes
> > > > > until the child exits/execs. Not good.
> > > > 
> > > I forgot to say that we have another problem: coredumping.
> > > 
> > > A thread which does do_coredump() sends SIGKILL to ->mm users, and sleeps
> > > on ->mm->core_startup_done. Now it can't be frozen if a sub-thread goes to
> > > the refrigerator. I think this could be solved easily if we add a check to
> > > refrigerator() as you suggested for ->vfork_done.
> > > 
> > > > I think the real solution would be to use an interruptible completion in the
> > > > vfork code.  It was discussed some time ago and, IIRC, Ingo had an experimental
> > > > patch that implemented it.  Still, for the suspend this really is not an issue
> > > > in practice, so it wasn't merged.
> > > 
> > > It is not (afaics) so trivial to do correctly, and with this change the parent
> > > will be seen as TASK_INTERRUPTIBLE even without freezer in progress.
> > > 
> > > A very vague idea: what if the parent does
> > > 
> > > 	current->flags |= PF_PLEASE_CONSIDER_ME_AS_FROZEN_BUT_SET_TIF_FREEZE
> > > 	wait_for_completion(&vfork);
> > > 	try_to_freeze();
> > > 
> > > ?
> > 
> > This should work,
> 
> Good. So try_to_freeze_tasks() can forget about the "if (!p->vfork_done)" check.
> This needs more thinking, of course. For example, thaw_process() should be
> careful to clear TIF_FREEZE if we have the new flag set, but not PF_FROZEN.
> frozen() should be changed to return true if PF_NEW_FLAG && TIF_FREEZE, but
> it is also called by refrigerator().
> 
> But IF we really can do this, it will be a general solution.

Appended is a patch that does something along these lines.  The necessary
thread_info flags are defined for i386 and x86_64, for now.

Greetings,
Rafael


 include/asm-i386/thread_info.h   |    2 ++
 include/asm-x86_64/thread_info.h |    2 ++
 include/linux/freezer.h          |   24 ++++++++++++++++++++++++
 kernel/fork.c                    |    4 ++++
 kernel/power/process.c           |   24 +++++++++---------------
 5 files changed, 41 insertions(+), 15 deletions(-)

Index: linux-2.6.20-mm2/include/asm-i386/thread_info.h
===================================================================
--- linux-2.6.20-mm2.orig/include/asm-i386/thread_info.h	2007-02-18 19:49:34.000000000 +0100
+++ linux-2.6.20-mm2/include/asm-i386/thread_info.h	2007-02-18 19:50:37.000000000 +0100
@@ -135,6 +135,7 @@ static inline struct thread_info *curren
 #define TIF_IO_BITMAP		18	/* uses I/O bitmap */
 #define TIF_FREEZE		19	/* is freezing for suspend */
 #define TIF_FORCED_TF		20	/* true if TF in eflags artificially */
+#define TIF_FREEZER_SKIP	21	/* task freezer should not count us */
 
 #define _TIF_SYSCALL_TRACE	(1<<TIF_SYSCALL_TRACE)
 #define _TIF_NOTIFY_RESUME	(1<<TIF_NOTIFY_RESUME)
@@ -149,6 +150,7 @@ static inline struct thread_info *curren
 #define _TIF_IO_BITMAP		(1<<TIF_IO_BITMAP)
 #define _TIF_FREEZE		(1<<TIF_FREEZE)
 #define _TIF_FORCED_TF		(1<<TIF_FORCED_TF)
+#define _TIF_FREEZER_SKIP	(1<<TIF_FREEZER_SKIP)
 
 /* work to do on interrupt/exception return */
 #define _TIF_WORK_MASK \
Index: linux-2.6.20-mm2/include/asm-x86_64/thread_info.h
===================================================================
--- linux-2.6.20-mm2.orig/include/asm-x86_64/thread_info.h	2007-02-18 19:49:34.000000000 +0100
+++ linux-2.6.20-mm2/include/asm-x86_64/thread_info.h	2007-02-18 19:50:37.000000000 +0100
@@ -123,6 +123,7 @@ static inline struct thread_info *stack_
 #define TIF_DEBUG		21	/* uses debug registers */
 #define TIF_IO_BITMAP		22	/* uses I/O bitmap */
 #define TIF_FREEZE		23	/* is freezing for suspend */
+#define TIF_FREEZER_SKIP	24	/* task freezer should not count us */
 
 #define _TIF_SYSCALL_TRACE	(1<<TIF_SYSCALL_TRACE)
 #define _TIF_NOTIFY_RESUME	(1<<TIF_NOTIFY_RESUME)
@@ -140,6 +141,7 @@ static inline struct thread_info *stack_
 #define _TIF_DEBUG		(1<<TIF_DEBUG)
 #define _TIF_IO_BITMAP		(1<<TIF_IO_BITMAP)
 #define _TIF_FREEZE		(1<<TIF_FREEZE)
+#define _TIF_FREEZER_SKIP	(1<<TIF_FREEZER_SKIP)
 
 /* work to do on interrupt/exception return */
 #define _TIF_WORK_MASK \
Index: linux-2.6.20-mm2/include/linux/freezer.h
===================================================================
--- linux-2.6.20-mm2.orig/include/linux/freezer.h	2007-02-18 19:49:34.000000000 +0100
+++ linux-2.6.20-mm2/include/linux/freezer.h	2007-02-18 19:50:37.000000000 +0100
@@ -36,6 +36,30 @@ static inline void do_not_freeze(struct 
 }
 
 /*
+ * Tell the freezer not to count this task as freezeable
+ */
+static inline void freezer_do_not_count(struct task_struct *p)
+{
+	set_tsk_thread_flag(p, TIF_FREEZER_SKIP);
+}
+
+/*
+ * Tell the freezer to count this task as freezeable
+ */
+static inline void freezer_count(struct task_struct *p)
+{
+	clear_tsk_thread_flag(p, TIF_FREEZER_SKIP);
+}
+
+/*
+ * Check if the task should be counted as freezeable by the freezer
+ */
+static inline int freezer_should_skip(struct task_struct *p)
+{
+	return test_tsk_thread_flag(p, TIF_FREEZER_SKIP);
+}
+
+/*
  * Wake up a frozen process
  */
 static inline int thaw_process(struct task_struct *p)
Index: linux-2.6.20-mm2/kernel/fork.c
===================================================================
--- linux-2.6.20-mm2.orig/kernel/fork.c	2007-02-18 19:49:34.000000000 +0100
+++ linux-2.6.20-mm2/kernel/fork.c	2007-02-18 19:50:37.000000000 +0100
@@ -50,6 +50,7 @@
 #include <linux/taskstats_kern.h>
 #include <linux/random.h>
 #include <linux/ptrace.h>
+#include <linux/freezer.h>
 
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
@@ -1393,7 +1394,10 @@ long do_fork(unsigned long clone_flags,
 		tracehook_report_clone_complete(clone_flags, nr, p);
 
 		if (clone_flags & CLONE_VFORK) {
+			freezer_do_not_count(current);
 			wait_for_completion(&vfork);
+			try_to_freeze();
+			freezer_count(current);
 			tracehook_report_vfork_done(p, nr);
 		}
 	} else {
Index: linux-2.6.20-mm2/kernel/power/process.c
===================================================================
--- linux-2.6.20-mm2.orig/kernel/power/process.c	2007-02-18 19:49:34.000000000 +0100
+++ linux-2.6.20-mm2/kernel/power/process.c	2007-02-18 19:50:37.000000000 +0100
@@ -117,22 +117,12 @@ static unsigned int try_to_freeze_tasks(
 				cancel_freezing(p);
 				continue;
 			}
-			if (is_user_space(p)) {
-				if (!freeze_user_space)
-					continue;
-
-				/* Freeze the task unless there is a vfork
-				 * completion pending
-				 */
-				if (!p->vfork_done)
-					freeze_process(p);
-			} else {
-				if (freeze_user_space)
-					continue;
+			if (is_user_space(p) == !freeze_user_space)
+				continue;
 
-				freeze_process(p);
-			}
-			todo++;
+			freeze_process(p);
+			if (!freezer_should_skip(p))
+				todo++;
 		} while_each_thread(g, p);
 		read_unlock(&tasklist_lock);
 		yield();			/* Yield is okay here */
@@ -199,6 +189,10 @@ static void thaw_tasks(int thaw_user_spa
 
 	read_lock(&tasklist_lock);
 	do_each_thread(g, p) {
+		if (freezer_should_skip(p))
+			cancel_freezing(p);
+	} while_each_thread(g, p);
+	do_each_thread(g, p) {
 		if (!freezeable(p))
 			continue;
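One way to convince oneself that the rewritten test in try_to_freeze_tasks() selects the same tasks as the removed branches (leaving aside the vfork_done check, which the patch drops on purpose) is to enumerate all inputs. The sketch below is a user-space model with hypothetical m_* names. It also shows an assumption the one-liner relies on: because !freeze_user_space is always 0 or 1, the comparison is only equivalent when is_user_space() also returns exactly 0 or 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Removed branches: is the task skipped in this pass? */
static bool m_old_skips(bool user_space, bool freeze_user_space)
{
	if (user_space)
		return !freeze_user_space;	/* user task: skipped unless freezing user space */
	return freeze_user_space;		/* kernel thread: skipped while freezing user space */
}

/* New one-line test from the patch, with int arguments. */
static bool m_new_skips(int user_space, int freeze_user_space)
{
	return user_space == !freeze_user_space;
}

/* Compares both forms over all 0/1 inputs; returns the mismatch count. */
static int m_mismatches(void)
{
	int bad = 0;

	for (int u = 0; u <= 1; u++)
		for (int f = 0; f <= 1; f++)
			if (m_old_skips(u, f) != m_new_skips(u, f))
				bad++;
	return bad;
}
```

With 0/1 inputs there are no mismatches, but a "true" value of 2 would make m_new_skips(2, 0) return false where the old branches would have skipped the task.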
 


* Re: freezer problems
  2007-02-18 18:56                               ` Rafael J. Wysocki
@ 2007-02-18 22:01                                 ` Oleg Nesterov
  2007-02-18 23:19                                   ` Rafael J. Wysocki
  0 siblings, 1 reply; 92+ messages in thread
From: Oleg Nesterov @ 2007-02-18 22:01 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: ego, akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, Pavel Machek

On 02/18, Rafael J. Wysocki wrote:
> 
> Appended is a patch that does something along these lines.  The necessary
> thread_info flags are defined for i386 and x86_64, for now.

I'll try to look at this patch when I am not sooooo sleepy ...

just one small nit right now,

> --- linux-2.6.20-mm2.orig/include/asm-i386/thread_info.h	2007-02-18 19:49:34.000000000 +0100
> +++ linux-2.6.20-mm2/include/asm-i386/thread_info.h	2007-02-18 19:50:37.000000000 +0100
> @@ -135,6 +135,7 @@ static inline struct thread_info *curren
>  #define TIF_IO_BITMAP		18	/* uses I/O bitmap */
>  #define TIF_FREEZE		19	/* is freezing for suspend */
>  #define TIF_FORCED_TF		20	/* true if TF in eflags artificially */
> +#define TIF_FREEZER_SKIP	21	/* task freezer should not count us */

Do we need to put this flag into thread_info? It is always modified by
"current", so it could live in task_struct->flags instead.

Oleg.



* Re: freezer problems
  2007-02-18 22:01                                 ` Oleg Nesterov
@ 2007-02-18 23:19                                   ` Rafael J. Wysocki
  2007-02-19 20:23                                     ` Oleg Nesterov
  0 siblings, 1 reply; 92+ messages in thread
From: Rafael J. Wysocki @ 2007-02-18 23:19 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: ego, akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, Pavel Machek

On Sunday, 18 February 2007 23:01, Oleg Nesterov wrote:
> On 02/18, Rafael J. Wysocki wrote:
> > 
> > Appended is a patch that does something along these lines.  The necessary
> > thread_info flags are defined for i386 and x86_64, for now.
> 
> I'll try to look at this patch when I am not sooooo sleepy ...
> 
> just one small nit right now,
> 
> > --- linux-2.6.20-mm2.orig/include/asm-i386/thread_info.h	2007-02-18 19:49:34.000000000 +0100
> > +++ linux-2.6.20-mm2/include/asm-i386/thread_info.h	2007-02-18 19:50:37.000000000 +0100
> > @@ -135,6 +135,7 @@ static inline struct thread_info *curren
> >  #define TIF_IO_BITMAP		18	/* uses I/O bitmap */
> >  #define TIF_FREEZE		19	/* is freezing for suspend */
> >  #define TIF_FORCED_TF		20	/* true if TF in eflags artificially */
> > +#define TIF_FREEZER_SKIP	21	/* task freezer should not count us */
> 
> Do we need to put this flag into thread_info? It is always modified by
> "current", so it could live in task_struct->flags instead.

I thought we were running low on the task_struct->flags bits. :-)

Apart from this, we may need to set it from somewhere else in the future.

Greetings,
Rafael


* Re: freezer problems
  2007-02-18 23:19                                   ` Rafael J. Wysocki
@ 2007-02-19 20:23                                     ` Oleg Nesterov
  2007-02-19 21:21                                       ` Rafael J. Wysocki
  0 siblings, 1 reply; 92+ messages in thread
From: Oleg Nesterov @ 2007-02-19 20:23 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: ego, akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, Pavel Machek

On 02/19, Rafael J. Wysocki wrote:
>
> On Sunday, 18 February 2007 23:01, Oleg Nesterov wrote:
> > > --- linux-2.6.20-mm2.orig/include/asm-i386/thread_info.h	2007-02-18 19:49:34.000000000 +0100
> > > +++ linux-2.6.20-mm2/include/asm-i386/thread_info.h	2007-02-18 19:50:37.000000000 +0100
> > > @@ -135,6 +135,7 @@ static inline struct thread_info *curren
> > >  #define TIF_IO_BITMAP		18	/* uses I/O bitmap */
> > >  #define TIF_FREEZE		19	/* is freezing for suspend */
> > >  #define TIF_FORCED_TF		20	/* true if TF in eflags artificially */
> > > +#define TIF_FREEZER_SKIP	21	/* task freezer should not count us */
> > 
> > Do we need to put this flag into thread_info? It is always modified by
> > "current", so it could live in task_struct->flags instead.
> 
> I thought we were running low on the task_struct->flags bits. :-)

Didn't think about that :)

> Apart from this, we may need to set it from somewhere else in the future.

I doubt it. In any case, since you provided the nice helpers, it would be very
easy to convert from thread to process flags. My main concern is that we
have 24 include/asm-*/thread_info.h files, but only 1 include/linux/sched.h.
It seems easier to start with PF_ flags at first.

> @@ -1393,7 +1394,10 @@ long do_fork(unsigned long clone_flags,
>
> 		if (clone_flags & CLONE_VFORK) {
> +                       freezer_do_not_count(current);
> 			  wait_for_completion(&vfork);
> +                       try_to_freeze();
> +                       freezer_count(current);

freezer_do_not_count() implies that we must do try_to_freeze() later; I'd
suggest moving try_to_freeze() into freezer_count(). Actually, I think that
freezer_do_not_count/freezer_count should be "(void)", like try_to_freeze().
IOW,

	freezer_do_not_count()
	... sleep in 'D' state ...
	freezer_count()

means that current doesn't hold any "important" locks and may be considered
frozen: it can do nothing except enter the refrigerator if it gets the CPU.

(Please feel free to ignore this; it is a matter of taste, of course.)
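The bracket proposed here can be modelled in user space as follows. The m_* names and the struct are hypothetical stand-ins: the kernel helpers would act on current and take no argument, whereas the model passes the task explicitly, and m_try_to_freeze() only records that the task would have entered the refrigerator instead of sleeping there.

```c
#include <assert.h>
#include <stdbool.h>

#define M_PF_FREEZER_SKIP 0x1u	/* hypothetical model bit */

struct m_task {
	unsigned int flags;
	bool freeze_pending;	/* stands in for TIF_FREEZE */
	bool frozen;		/* stands in for PF_FROZEN */
};

/* Records entry into the refrigerator instead of actually sleeping. */
static void m_try_to_freeze(struct m_task *cur)
{
	if (cur->freeze_pending) {
		cur->freeze_pending = false;
		cur->frozen = true;
	}
}

/* Called before sleeping in 'D' state with no "important" locks held. */
static void m_freezer_do_not_count(struct m_task *cur)
{
	cur->flags |= M_PF_FREEZER_SKIP;
}

/* Called after waking up: the freeze check lives inside the helper,
 * so a caller cannot forget it. */
static void m_freezer_count(struct m_task *cur)
{
	m_try_to_freeze(cur);
	cur->flags &= ~M_PF_FREEZER_SKIP;
}

/* The freezer's view: does this task still hold up freezing? */
static bool m_blocks_freezer(const struct m_task *t)
{
	return !t->frozen && !(t->flags & M_PF_FREEZER_SKIP);
}
```

While the vfork parent waits on the completion it does not hold up the freezer; as soon as it wakes, the same helper call that drops the skip bit also honours any pending freeze request.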

> @@ -199,6 +189,10 @@ static void thaw_tasks(int thaw_user_spa
>
>         read_lock(&tasklist_lock);
>         do_each_thread(g, p) {
> +               if (freezer_should_skip(p))
> +                       cancel_freezing(p);
> +       } while_each_thread(g, p);
> +       do_each_thread(g, p) {
>                 if (!freezeable(p))
>                         continue;

Any reason for 2 separate do_each_thread() loops ?

I think this patch is correct, but I still can't convince myself I really
understand freezer :)

Btw,

On 02/18, Some Idiot wrote:
>
> On 02/18, Rafael J. Wysocki wrote:
> >
> > +   /* The parent is uninterruptible and will stay so until this task exits,
> > +    * so we can mark it as frozen.
> > +    */
> > +   if (current->vfork_done)
> > +           frozen_process(current->parent);
>
> This is not safe. task->flags is not atomic; we can change ->flags only
> if we know the task won't touch it itself (ptrace, thaw_process).
> The parent could be interrupted, and an irq may play with current->flags
> (slab, for example).

Irq only does atomic allocations, so the slab won't play with task->flags.
Still I believe the concern was valid in general and I personally think
the new patch is better (and more generic).

Oleg.



* Re: freezer problems
  2007-02-19 20:23                                     ` Oleg Nesterov
@ 2007-02-19 21:21                                       ` Rafael J. Wysocki
  2007-02-19 22:41                                         ` Oleg Nesterov
  0 siblings, 1 reply; 92+ messages in thread
From: Rafael J. Wysocki @ 2007-02-19 21:21 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: ego, akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, Pavel Machek

On Monday, 19 February 2007 21:23, Oleg Nesterov wrote:
> On 02/19, Rafael J. Wysocki wrote:
> >
> > On Sunday, 18 February 2007 23:01, Oleg Nesterov wrote:
> > > > --- linux-2.6.20-mm2.orig/include/asm-i386/thread_info.h	2007-02-18 19:49:34.000000000 +0100
> > > > +++ linux-2.6.20-mm2/include/asm-i386/thread_info.h	2007-02-18 19:50:37.000000000 +0100
> > > > @@ -135,6 +135,7 @@ static inline struct thread_info *curren
> > > >  #define TIF_IO_BITMAP		18	/* uses I/O bitmap */
> > > >  #define TIF_FREEZE		19	/* is freezing for suspend */
> > > >  #define TIF_FORCED_TF		20	/* true if TF in eflags artificially */
> > > > +#define TIF_FREEZER_SKIP	21	/* task freezer should not count us */
> > > 
> > > Do we need to put this flag into thread_info? It is always modified by
> > > "current", so it could live in task_struct->flags instead.
> > 
> > I thought we were running low on the task_struct->flags bits. :-)
> 
> Didn't think about that :)

Seriously, I'm not sure.  There are 23 PF_* flags already defined, while,
for example, on x86_64 there are 17 TIF_* flags defined, which is not much
better.

> > Apart from this, we may need to set it from somewhere else in the future.
> 
> I doubt it. In any case, since you provided the nice helpers, it would be very
> easy to convert from thread to process flags. My main concern is that we
> have 24 include/asm-*/thread_info.h files, but only 1 include/linux/sched.h.
> It seems easier to start with PF_ flags at first.

OK

> > @@ -1393,7 +1394,10 @@ long do_fork(unsigned long clone_flags,
> >
> > 		if (clone_flags & CLONE_VFORK) {
> > +                       freezer_do_not_count(current);
> > 			  wait_for_completion(&vfork);
> > +                       try_to_freeze();
> > +                       freezer_count(current);
> 
> freezer_do_not_count() implies that we must do try_to_freeze() later; I'd
> suggest moving try_to_freeze() into freezer_count(). Actually, I think that
> freezer_do_not_count/freezer_count should be "(void)", like try_to_freeze().
> IOW,
> 
> 	freezer_do_not_count()
> 	... sleep in 'D' state ...
> 	freezer_count()
> 
> means that current doesn't hold any "important" locks and may be considered
> frozen: it can do nothing except enter the refrigerator if it gets the CPU.
>
> (Please feel free to ignore this; it is a matter of taste, of course.)

Well, if we use a PF_* flag for that, it's also a matter of correctness (only
current should be able to set its flags).

> > @@ -199,6 +189,10 @@ static void thaw_tasks(int thaw_user_spa
> >
> >         read_lock(&tasklist_lock);
> >         do_each_thread(g, p) {
> > +               if (freezer_should_skip(p))
> > +                       cancel_freezing(p);
> > +       } while_each_thread(g, p);
> > +       do_each_thread(g, p) {
> >                 if (!freezeable(p))
> >                         continue;
> 
> Any reason for 2 separate do_each_thread() loops ?

Yes.  If there is a "freeze" request pending for the vfork parent (TIF_FREEZE
set), we have to cancel it before the child is unfrozen, since otherwise the
parent may start freezing after we have tried to reset PF_FROZEN for it.
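This ordering argument can be checked with a small user-space model. Everything in it is a hypothetical simplification with m_* names: m_thaw_process() looks only at the frozen bit (matching the point that thaw_process() ignores TIF_FREEZE), and a thawed vfork child is assumed to run at once and wake its parent, which then goes through freezer_count(). With a single loop that visits the parent before the child, the parent ends up frozen after the thaw has already passed it; cancelling pending requests for skipped tasks first avoids that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct m_task {
	bool skip;			/* stands in for PF_FREEZER_SKIP */
	bool freeze_pending;		/* stands in for TIF_FREEZE */
	bool frozen;			/* stands in for PF_FROZEN */
	struct m_task *vfork_parent;	/* non-NULL for a vfork child */
};

/* The woken vfork parent runs freezer_count(): try_to_freeze(), then clear skip. */
static void m_parent_wakes(struct m_task *parent)
{
	if (parent->freeze_pending) {
		parent->freeze_pending = false;
		parent->frozen = true;	/* enters the refrigerator */
	}
	parent->skip = false;
}

/* Models thaw_process(): acts only on the frozen bit. Assumption: a thawed
 * vfork child runs immediately and completes the vfork, waking its parent. */
static void m_thaw_process(struct m_task *t)
{
	if (!t->frozen)
		return;
	t->frozen = false;
	if (t->vfork_parent != NULL)
		m_parent_wakes(t->vfork_parent);
}

static void m_thaw_one_loop(struct m_task **tasks, int n)
{
	for (int i = 0; i < n; i++)
		m_thaw_process(tasks[i]);
}

static void m_thaw_two_loops(struct m_task **tasks, int n)
{
	for (int i = 0; i < n; i++)
		if (tasks[i]->skip)
			tasks[i]->freeze_pending = false;	/* cancel_freezing() */
	for (int i = 0; i < n; i++)
		m_thaw_process(tasks[i]);
}

/* Returns true if the parent is left frozen once thawing has finished. */
static bool m_parent_stranded(bool two_loops)
{
	struct m_task parent = { .skip = true, .freeze_pending = true };
	struct m_task child = { .frozen = true, .vfork_parent = &parent };
	struct m_task *order[] = { &parent, &child };	/* parent visited first */

	if (two_loops)
		m_thaw_two_loops(order, 2);
	else
		m_thaw_one_loop(order, 2);
	return parent.frozen;
}
```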

> I think this patch is correct, but I still can't convince myself I really
> understand freezer :)

Oh, that takes time.  It took me a year or so. ;-)

Here's the updated patch.  It hasn't been tested yet, but at least it compiles
(on x86_64).

 include/linux/sched.h   |    1 +
 include/linux/freezer.h |   30 ++++++++++++++++++++++++++++--
 kernel/fork.c           |    3 +++
 kernel/power/process.c  |   27 ++++++++++++---------------
 4 files changed, 44 insertions(+), 17 deletions(-)

Index: linux-2.6.20-mm2/include/linux/sched.h
===================================================================
--- linux-2.6.20-mm2.orig/include/linux/sched.h
+++ linux-2.6.20-mm2/include/linux/sched.h
@@ -1189,6 +1189,7 @@ static inline void put_task_struct(struc
 #define PF_SPREAD_SLAB	0x02000000	/* Spread some slab caches over cpuset */
 #define PF_MEMPOLICY	0x10000000	/* Non-default NUMA mempolicy */
 #define PF_MUTEX_TESTER	0x20000000	/* Thread belongs to the rt mutex tester */
+#define PF_FREEZER_SKIP	0x40000000	/* Freezer should not count it as freezeable */
 
 /*
  * Only the _current_ task can read/write to tsk->flags, but other
Index: linux-2.6.20-mm2/include/linux/freezer.h
===================================================================
--- linux-2.6.20-mm2.orig/include/linux/freezer.h
+++ linux-2.6.20-mm2/include/linux/freezer.h
@@ -71,7 +71,31 @@ static inline int try_to_freeze(void)
 		return 0;
 }
 
-extern void thaw_some_processes(int all);
+/*
+ * Tell the freezer not to count current task as freezeable
+ */
+static inline void freezer_do_not_count(void)
+{
+	current->flags |= PF_FREEZER_SKIP;
+}
+
+/*
+ * Try to freeze the current task and tell the freezer to count it as freezeable
+ * again
+ */
+static inline void freezer_count(void)
+{
+	try_to_freeze();
+	current->flags &= ~PF_FREEZER_SKIP;
+}
+
+/*
+ * Check if the task should be counted as freezeable by the freezer
+ */
+static inline int freezer_should_skip(struct task_struct *p)
+{
+	return !!(p->flags & PF_FREEZER_SKIP);
+}
 
 #else
 static inline int frozen(struct task_struct *p) { return 0; }
@@ -86,5 +110,7 @@ static inline void thaw_processes(void) 
 
 static inline int try_to_freeze(void) { return 0; }
 
-
+static inline void freezer_do_not_count(void) {}
+static inline void freezer_count(void) {}
+static inline int freezer_should_skip(struct task_struct *p) { return 0; }
 #endif
Index: linux-2.6.20-mm2/kernel/fork.c
===================================================================
--- linux-2.6.20-mm2.orig/kernel/fork.c
+++ linux-2.6.20-mm2/kernel/fork.c
@@ -50,6 +50,7 @@
 #include <linux/taskstats_kern.h>
 #include <linux/random.h>
 #include <linux/ptrace.h>
+#include <linux/freezer.h>
 
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
@@ -1393,7 +1394,9 @@ long do_fork(unsigned long clone_flags,
 		tracehook_report_clone_complete(clone_flags, nr, p);
 
 		if (clone_flags & CLONE_VFORK) {
+			freezer_do_not_count();
 			wait_for_completion(&vfork);
+			freezer_count();
 			tracehook_report_vfork_done(p, nr);
 		}
 	} else {
Index: linux-2.6.20-mm2/kernel/power/process.c
===================================================================
--- linux-2.6.20-mm2.orig/kernel/power/process.c
+++ linux-2.6.20-mm2/kernel/power/process.c
@@ -117,22 +117,12 @@ static unsigned int try_to_freeze_tasks(
 				cancel_freezing(p);
 				continue;
 			}
-			if (is_user_space(p)) {
-				if (!freeze_user_space)
-					continue;
-
-				/* Freeze the task unless there is a vfork
-				 * completion pending
-				 */
-				if (!p->vfork_done)
-					freeze_process(p);
-			} else {
-				if (freeze_user_space)
-					continue;
+			if (is_user_space(p) == !freeze_user_space)
+				continue;
 
-				freeze_process(p);
-			}
-			todo++;
+			freeze_process(p);
+			if (!freezer_should_skip(p))
+				todo++;
 		} while_each_thread(g, p);
 		read_unlock(&tasklist_lock);
 		yield();			/* Yield is okay here */
@@ -199,6 +189,13 @@ static void thaw_tasks(int thaw_user_spa
 
 	read_lock(&tasklist_lock);
 	do_each_thread(g, p) {
+		if (is_user_space(p) == !thaw_user_space)
+			continue;
+
+		if (freezer_should_skip(p))
+			cancel_freezing(p);
+	} while_each_thread(g, p);
+	do_each_thread(g, p) {
 		if (!freezeable(p))
 			continue;
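The counting change in try_to_freeze_tasks() can be modelled as follows. This is a user-space sketch with hypothetical m_* names, covering only the lines visible in the hunk. A task with PF_FREEZER_SKIP set still receives its freeze request, so it will freeze when it wakes up, but it no longer keeps todo above zero.

```c
#include <assert.h>
#include <stdbool.h>

#define M_PF_FREEZER_SKIP 0x40000000u	/* same value as PF_FREEZER_SKIP in the patch */

struct m_task {
	int user_space;		/* what is_user_space() would report; must be 0 or 1 */
	unsigned int flags;
	bool freeze_pending;	/* stands in for TIF_FREEZE */
};

static int m_freezer_should_skip(const struct m_task *p)
{
	return !!(p->flags & M_PF_FREEZER_SKIP);	/* normalized to 0 or 1 */
}

/* One pass of the patched loop body; returns how many tasks must still
 * freeze before the freezer can declare success. */
static unsigned int m_one_pass(struct m_task *tasks, int n, int freeze_user_space)
{
	unsigned int todo = 0;

	for (int i = 0; i < n; i++) {
		struct m_task *p = &tasks[i];

		if (p->user_space == !freeze_user_space)
			continue;		/* wrong kind of task for this pass */
		p->freeze_pending = true;	/* freeze_process(p) */
		if (!m_freezer_should_skip(p))
			todo++;
	}
	return todo;
}
```

For example, with one ordinary user task, one vfork parent marked as skipped, and one kernel thread, the user-space pass requests a freeze from both user tasks but counts only the first.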
 


* Re: freezer problems
  2007-02-19 21:21                                       ` Rafael J. Wysocki
@ 2007-02-19 22:41                                         ` Oleg Nesterov
  2007-02-19 23:35                                           ` Rafael J. Wysocki
  0 siblings, 1 reply; 92+ messages in thread
From: Oleg Nesterov @ 2007-02-19 22:41 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: ego, akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, Pavel Machek

On 02/19, Rafael J. Wysocki wrote:
>
> On Monday, 19 February 2007 21:23, Oleg Nesterov wrote:
> 
> > > @@ -199,6 +189,10 @@ static void thaw_tasks(int thaw_user_spa
> > >
> > >         do_each_thread(g, p) {
> > > +               if (freezer_should_skip(p))
> > > +                       cancel_freezing(p);
> > > +       } while_each_thread(g, p);
> > > +       do_each_thread(g, p) {
> > >                 if (!freezeable(p))
> > >                         continue;
> > 
> > Any reason for 2 separate do_each_thread() loops ?
> 
> Yes.  If there is a "freeze" request pending for the vfork parent (TIF_FREEZE
> set), we have to cancel it before the child is unfrozen, since otherwise the
> parent may start freezing after we have tried to reset PF_FROZEN for it.

I see, thanks... thaw_process() doesn't take TIF_FREEZE into account.

But doesn't this mean we have a race?

Suppose that try_to_freeze_tasks() failed. It does cancel_freezing() for each
process before returning, but what if some thread has already checked TIF_FREEZE
and (for simplicity) is preempted before frozen_process() in refrigerator()?

thaw_tasks() runs, ignores this task (P), returns. P gets CPU, and becomes
frozen, but nobody will thaw it.

No?
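The interleaving described above can be replayed deterministically. This is a hypothetical userspace model that splits refrigerator() into a check step and a commit step, with the preemption point between them. None of these names are kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one task P; not kernel code. */
struct task {
	bool freeze_req;  /* models TIF_FREEZE */
	bool frozen;      /* models PF_FROZEN  */
};

/* First half of refrigerator(): the freezing(current) test. */
static bool fridge_check(const struct task *t)
{
	return t->freeze_req;
}

/* Second half: frozen_process() clears TIF_FREEZE and sets PF_FROZEN. */
static void fridge_commit(struct task *t)
{
	assert(!t->frozen);
	t->freeze_req = false;
	t->frozen = true;
}

/* Error path of try_to_freeze_tasks() followed by thaw_tasks(). */
static void cancel_then_thaw(struct task *t)
{
	t->freeze_req = false;  /* cancel_freezing()                   */
	if (t->frozen)          /* thaw_tasks() only thaws frozen tasks */
		t->frozen = false;
}

/* P checks, is preempted, the freezer cancels and thaws, then P
 * resumes and freezes. Returns true if P ends up frozen with nobody
 * left to thaw it. */
bool replay_unlocked_race(void)
{
	struct task p = { .freeze_req = true, .frozen = false };
	bool saw_request = fridge_check(&p);  /* P sees TIF_FREEZE   */

	/* ... P is preempted here ... */
	cancel_then_thaw(&p);                 /* freezer gives up     */
	if (saw_request)
		fridge_commit(&p);                /* stale check acted on */
	return p.frozen;
}
```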

Oleg.



* Re: freezer problems
  2007-02-19 22:41                                         ` Oleg Nesterov
@ 2007-02-19 23:35                                           ` Rafael J. Wysocki
  2007-02-20  0:12                                             ` Oleg Nesterov
  0 siblings, 1 reply; 92+ messages in thread
From: Rafael J. Wysocki @ 2007-02-19 23:35 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: ego, akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, Pavel Machek

On Monday, 19 February 2007 23:41, Oleg Nesterov wrote:
> On 02/19, Rafael J. Wysocki wrote:
> >
> > On Monday, 19 February 2007 21:23, Oleg Nesterov wrote:
> > 
> > > > @@ -199,6 +189,10 @@ static void thaw_tasks(int thaw_user_spa
> > > >
> > > >         do_each_thread(g, p) {
> > > > +               if (freezer_should_skip(p))
> > > > +                       cancel_freezing(p);
> > > > +       } while_each_thread(g, p);
> > > > +       do_each_thread(g, p) {
> > > >                 if (!freezeable(p))
> > > >                         continue;
> > > 
> > > Any reason for 2 separate do_each_thread() loops ?
> > 
> > Yes.  If there is a "freeze" request pending for the vfork parent (TIF_FREEZE
> > set), we have to cancel it before the child is unfrozen, since otherwise the
> > parent may go freezing after we try to reset PF_FROZEN for it.
> 
> I see, thanks... thaw_process() doesn't take TIF_FREEZE into account.
> 
> But doesn't this mean we have a race?
> 
> Suppose that try_to_freeze_tasks() failed. It does cancel_freezing() for each
> process before return, but what if some thread already checked TIF_FREEZE and
> (for simplicity) it is preempted before frozen_process() in refrigerator().
> 
> thaw_tasks() runs, ignores this task (P), returns. P gets CPU, and becomes
> frozen, but nobody will thaw it.
> 
> No?

Well, I think this is highly theoretical.  Namely, try_to_freeze_tasks() only
fails after the timeout that's currently set to 20 sec., and it yields the CPU
in each iteration of the main loop.  The task in question would have to refuse
being frozen for 20 sec. and then suddenly decide to freeze itself right before
try_to_freeze_tasks() checks the timeout for the very last time.  Then, it
would have to get preempted at this very moment and stay unfrozen at least
until thaw_tasks() starts running and in fact even longer.

I think we may avoid this by making try_to_freeze_tasks() sleep for some time
after it has reset TIF_FREEZE for all tasks in the error path, if anyone is
ever able to trigger it.

Greetings,
Rafael


* Re: freezer problems
  2007-02-19 23:35                                           ` Rafael J. Wysocki
@ 2007-02-20  0:12                                             ` Oleg Nesterov
  2007-02-20  0:32                                               ` Rafael J. Wysocki
  0 siblings, 1 reply; 92+ messages in thread
From: Oleg Nesterov @ 2007-02-20  0:12 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: ego, akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, Pavel Machek

On 02/20, Rafael J. Wysocki wrote:
>
> On Monday, 19 February 2007 23:41, Oleg Nesterov wrote:
> > On 02/19, Rafael J. Wysocki wrote:
> > >
> > > On Monday, 19 February 2007 21:23, Oleg Nesterov wrote:
> > > 
> > > > > @@ -199,6 +189,10 @@ static void thaw_tasks(int thaw_user_spa
> > > > >
> > > > >         do_each_thread(g, p) {
> > > > > +               if (freezer_should_skip(p))
> > > > > +                       cancel_freezing(p);
> > > > > +       } while_each_thread(g, p);
> > > > > +       do_each_thread(g, p) {
> > > > >                 if (!freezeable(p))
> > > > >                         continue;
> > > > 
> > > > Any reason for 2 separate do_each_thread() loops ?
> > > 
> > > Yes.  If there is a "freeze" request pending for the vfork parent (TIF_FREEZE
> > > set), we have to cancel it before the child is unfrozen, since otherwise the
> > > parent may go freezing after we try to reset PF_FROZEN for it.
> > 
> > I see, thanks... thaw_process() doesn't take TIF_FREEZE into account.
> > 
> > But doesn't this mean we have a race?
> > 
> > Suppose that try_to_freeze_tasks() failed. It does cancel_freezing() for each
> > process before return, but what if some thread already checked TIF_FREEZE and
> > (for simplicity) it is preempted before frozen_process() in refrigerator().
> > 
> > thaw_tasks() runs, ignores this task (P), returns. P gets CPU, and becomes
> > frozen, but nobody will thaw it.
> > 
> > No?
> 
> Well, I think this is highly theoretical.  Namely, try_to_freeze_tasks() only
> fails after the timeout that's currently set to 20 sec., and it yields the CPU
> in each iteration of the main loop.  The task in question would have to refuse
> being frozen for 20 sec. and then suddenly decide to freeze itself right before
> try_to_freeze_tasks() checks the timeout for the very last time.  Then, it
> would have to get preempted at this very moment and stay unfrozen at least
> until thaw_tasks() starts running and in fact even longer.

Yes, yes, it is purely theoretical,

> I think we may avoid this by making try_to_freeze_tasks() sleep for some time
> after it has reset TIF_FREEZE for all tasks in the error path, if anyone is
> ever able to trigger it.

This makes this race (purely theoretical) ** 2  :)

Still, maybe it makes sense to introduce a cancel_freezing_and_thaw() function
(not right now) which reliably stops the task from sleeping in the refrigerator.
I didn't think much about this, but it looks like we can fix the coredump/exec
problems. Of course, this is not so important; we can ignore them at least
for now (->vfork_done is different and IMHO should be solved, because any user
can block the freezer forever).

The fix:

	refrigerator:

	+	// we are going to call do_exit() really soon,
	+	// we have a pending SIGKILL
	+	if (current->signal->flags & SIGNAL_GROUP_EXIT)
	+		return;

		frozen_process(current);
		...


	zap_other_threads:

		for_each_subthread() {
			...

	+		// ---- SIGNAL_GROUP_EXIT is set ------
	+		// we can check sig->group_exit_task to detect de_thread,
	+		// but perhaps it doesn't hurt if the caller is do_group_exit
	+		cancel_freezing_and_thaw(t);
			sigaddset(&t->pending.signal, SIGKILL);
			signal_wake_up(t, 1);
		}

This way execer reliably kills all sub-threads and proceeds without blocking
try_to_freeze_tasks(). The same change could be done for zap_process() to fix
coredump.
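The proposal can be modelled in userspace to check the intended invariant: once SIGNAL_GROUP_EXIT is set, no other thread of the group stays in the refrigerator or re-enters it. This is a sketch under stated assumptions. cancel_freezing_and_thaw() does not exist yet, and the model_* functions and struct fields below are hypothetical stand-ins for TIF_FREEZE, PF_FROZEN and the signal flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; field and function names are illustrative. */
struct signal_model {
	bool group_exit;           /* models SIGNAL_GROUP_EXIT */
};

struct task {
	bool freeze_req;           /* models TIF_FREEZE */
	bool frozen;               /* models PF_FROZEN  */
	bool sigkill;              /* SIGKILL queued    */
	struct signal_model *signal;
};

/* The proposed helper: drop a pending request and leave the fridge. */
static void cancel_freezing_and_thaw(struct task *t)
{
	t->freeze_req = false;
	t->frozen = false;
}

/* refrigerator() with the proposed early return. Returns 1 if frozen. */
int model_refrigerator(struct task *t)
{
	assert(t->signal != NULL);
	if (!t->freeze_req)
		return 0;
	if (t->signal->group_exit) /* do_exit() is imminent: don't freeze */
		return 0;
	t->freeze_req = false;
	t->frozen = true;
	return 1;
}

/* zap_other_threads(): mark the group exiting, then release and kill
 * every other thread so the exec'ing thread never waits on a frozen one. */
void model_zap_other_threads(struct task *self, struct task *threads, size_t n)
{
	self->signal->group_exit = true;
	for (size_t i = 0; i < n; i++) {
		struct task *t = &threads[i];

		if (t == self)
			continue;
		cancel_freezing_and_thaw(t);
		t->sigkill = true;
	}
}
```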

Oleg.



* Re: freezer problems
  2007-02-20  0:12                                             ` Oleg Nesterov
@ 2007-02-20  0:32                                               ` Rafael J. Wysocki
  2007-02-20  0:50                                                 ` Oleg Nesterov
  2007-02-20 18:29                                                 ` Rafael J. Wysocki
  0 siblings, 2 replies; 92+ messages in thread
From: Rafael J. Wysocki @ 2007-02-20  0:32 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: ego, akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, Pavel Machek

On Tuesday, 20 February 2007 01:12, Oleg Nesterov wrote:
> On 02/20, Rafael J. Wysocki wrote:
> >
> > On Monday, 19 February 2007 23:41, Oleg Nesterov wrote:
> > > On 02/19, Rafael J. Wysocki wrote:
> > > >
> > > > On Monday, 19 February 2007 21:23, Oleg Nesterov wrote:
> > > > 
> > > > > > @@ -199,6 +189,10 @@ static void thaw_tasks(int thaw_user_spa
> > > > > >
> > > > > >         do_each_thread(g, p) {
> > > > > > +               if (freezer_should_skip(p))
> > > > > > +                       cancel_freezing(p);
> > > > > > +       } while_each_thread(g, p);
> > > > > > +       do_each_thread(g, p) {
> > > > > >                 if (!freezeable(p))
> > > > > >                         continue;
> > > > > 
> > > > > Any reason for 2 separate do_each_thread() loops ?
> > > > 
> > > > Yes.  If there is a "freeze" request pending for the vfork parent (TIF_FREEZE
> > > > set), we have to cancel it before the child is unfrozen, since otherwise the
> > > > parent may go freezing after we try to reset PF_FROZEN for it.
> > > 
> > > I see, thanks... thaw_process() doesn't take TIF_FREEZE into account.
> > > 
> > > But doesn't this mean we have a race?
> > > 
> > > Suppose that try_to_freeze_tasks() failed. It does cancel_freezing() for each
> > > process before return, but what if some thread already checked TIF_FREEZE and
> > > (for simplicity) it is preempted before frozen_process() in refrigerator().
> > > 
> > > thaw_tasks() runs, ignores this task (P), returns. P gets CPU, and becomes
> > > frozen, but nobody will thaw it.
> > > 
> > > No?
> > 
> > Well, I think this is highly theoretical.  Namely, try_to_freeze_tasks() only
> > fails after the timeout that's currently set to 20 sec., and it yields the CPU
> > in each iteration of the main loop.  The task in question would have to refuse
> > being frozen for 20 sec. and then suddenly decide to freeze itself right before
> > try_to_freeze_tasks() checks the timeout for the very last time.  Then, it
> > would have to get preempted at this very moment and stay unfrozen at least
> > until thaw_tasks() starts running and in fact even longer.
> 
> Yes, yes, it is purely theoretical,
> 
> > I think we may avoid this by making try_to_freeze_tasks() sleep for some time
> > after it has reset TIF_FREEZE for all tasks in the error path, if anyone is
> > ever able to trigger it.
> 
> This makes this race (purely theoretical) ** 2  :)
> 
> Still, maybe it makes sense to introduce a cancel_freezing_and_thaw() function
> (not right now) which reliably stops the task from sleeping in the refrigerator.

Hm.  In the case discussed above we have a task that's right before calling
frozen_process(), so we can't thaw it, because it's not frozen.  It will be
frozen just in a while, but try_to_freeze_tasks() and thaw_tasks() have no
way to check this.

I think to close this race the refrigerator should check TIF_FREEZE and set
PF_FROZEN _and_ reset TIF_FREEZE under a lock that would also have to be
taken by try_to_freeze_tasks() in the beginning of the error path.  This will
ensure that all tasks either freeze themselves before the error path in
try_to_freeze_tasks() is executed, or remain unfrozen.

I'll try to prepare a patch to illustrate this, but right now I'm too tired to
do it. :-)
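A rough userspace model of this locking idea follows. It uses a C11 atomic_flag spinlock as a stand-in for the proposed refrigerator lock, and all names are hypothetical. The task tests its request and marks itself frozen inside one critical section, and the error path cancels requests under the same lock, so a check can no longer go stale before PF_FROZEN is set.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct task {
	bool freeze_req;  /* models TIF_FREEZE */
	bool frozen;      /* models PF_FROZEN  */
};

/* Hypothetical stand-in for the proposed refrigerator lock. */
static atomic_flag refrigerator_lock = ATOMIC_FLAG_INIT;

static void fridge_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&refrigerator_lock,
						 memory_order_acquire))
		;  /* spin */
}

static void fridge_unlock(void)
{
	atomic_flag_clear_explicit(&refrigerator_lock, memory_order_release);
}

/* Task side: test TIF_FREEZE and set PF_FROZEN in one critical section.
 * Returns 1 if the task froze, 0 if there was no request. */
int refrigerator_enter(struct task *t)
{
	int froze = 0;

	fridge_lock();
	if (t->freeze_req) {
		t->freeze_req = false;
		t->frozen = true;
		froze = 1;
	}
	fridge_unlock();
	return froze;
}

/* Freezer error path: cancel every pending request under the same lock. */
void cancel_all_locked(struct task *tasks, int n)
{
	fridge_lock();
	for (int i = 0; i < n; i++)
		tasks[i].freeze_req = false;
	fridge_unlock();
}

/* thaw_tasks(): each task is either already frozen or can no longer freeze. */
void thaw_all(struct task *tasks, int n)
{
	for (int i = 0; i < n; i++) {
		assert(!tasks[i].freeze_req);
		tasks[i].frozen = false;
	}
}
```

A task that wins the lock before the error path is frozen and then thawed. A task that gets it afterwards finds no request and returns without freezing.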

> I didn't think much about this, but it looks like we can fix the coredump/exec
> problems. Of course, this is not so important; we can ignore them at least
> for now (->vfork_done is different and IMHO should be solved, because any user
> can block the freezer forever).
> 
> The fix:
> 
> 	refrigerator:
> 
> 	+	// we are going to call do_exit() really soon,
> 	+	// we have a pending SIGKILL
> 	+	if (current->signal->flags & SIGNAL_GROUP_EXIT)
> 	+		return;
> 
> 		frozen_process(current);
> 		...
> 
> 
> 	zap_other_threads:
> 
> 		for_each_subthread() {
> 			...
> 
> 	+		// ---- SIGNAL_GROUP_EXIT is set ------
> 	+		// we can check sig->group_exit_task to detect de_thread,
> 	+		// but perhaps it doesn't hurt if the caller is do_group_exit
> 	+		cancel_freezing_and_thaw(t);
> 			sigaddset(&t->pending.signal, SIGKILL);
> 			signal_wake_up(t, 1);
> 		}
> 
> This way execer reliably kills all sub-threads and proceeds without blocking
> try_to_freeze_tasks(). The same change could be done for zap_process() to fix
> coredump.

Yes, at first sight it looks good.

BTW, what do you think of the updated patch I sent two messages ago?

Rafael


* Re: freezer problems
  2007-02-20  0:32                                               ` Rafael J. Wysocki
@ 2007-02-20  0:50                                                 ` Oleg Nesterov
  2007-02-20 18:28                                                   ` Rafael J. Wysocki
  2007-02-20 18:29                                                 ` Rafael J. Wysocki
  1 sibling, 1 reply; 92+ messages in thread
From: Oleg Nesterov @ 2007-02-20  0:50 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: ego, akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, Pavel Machek

On 02/20, Rafael J. Wysocki wrote:
>
> BTW, what do you think of the updated patch I sent two messages ago?

Ah, sorry, I just forgot... I think it is nice.

Oleg.



* Re: [RFC PATCH(Experimental) 2/4] Revert changes to workqueue.c
  2007-02-17 21:59                 ` Oleg Nesterov
@ 2007-02-20 15:12                   ` Srivatsa Vaddagiri
  2007-02-20 20:09                     ` Oleg Nesterov
  0 siblings, 1 reply; 92+ messages in thread
From: Srivatsa Vaddagiri @ 2007-02-20 15:12 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Gautham R Shenoy, akpm, paulmck, mingo, dipankar,
	venkatesh.pallipadi, linux-kernel, rjw

On Sun, Feb 18, 2007 at 12:59:28AM +0300, Oleg Nesterov wrote:
> Before you begin. You are doing CPU_DOWN_PREPARE after freeze_processes().
> Not good. This makes it impossible to do flush_workqueue() at CPU_DOWN_PREPARE
> stage, and we have callers.

We have a few solutions to deal with this:

a. Mark such workqueues not freezable for hotplug
b. If the above is not possible, don't call flush_workqueue() in DOWN_PREPARE
c. If the above is not possible, send DOWN_PREPARE before freeze_processes()

I would prefer these solutions in the order listed above.

Which caller are you referring to here? Maybe we can decide on the
option after we see the users of flush_workqueue() in DOWN_PREPARE.

> I'm afraid it won't be so easy to solve all locking/racing problems. Will
> wait for the patch :)

I don't see problems for workqueue.c even if we follow option c. Do you
see any?

-- 
Regards,
vatsa


* Re: freezer problems
  2007-02-20  0:50                                                 ` Oleg Nesterov
@ 2007-02-20 18:28                                                   ` Rafael J. Wysocki
  0 siblings, 0 replies; 92+ messages in thread
From: Rafael J. Wysocki @ 2007-02-20 18:28 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: ego, akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, Pavel Machek

On Tuesday, 20 February 2007 01:50, Oleg Nesterov wrote:
> On 02/20, Rafael J. Wysocki wrote:
> >
> > BTW, what do you think of the updated patch I sent two messages ago?
> 
> Ah, sorry, I just forgot... I think it is nice.

Thanks. :-)

I've started to collect the refrigerator-related patches posted recently
and I'm going to post them again soon as a series.

Greetings,
Rafael


* Re: freezer problems
  2007-02-20  0:32                                               ` Rafael J. Wysocki
  2007-02-20  0:50                                                 ` Oleg Nesterov
@ 2007-02-20 18:29                                                 ` Rafael J. Wysocki
  2007-02-21 18:14                                                   ` Paul E. McKenney
  1 sibling, 1 reply; 92+ messages in thread
From: Rafael J. Wysocki @ 2007-02-20 18:29 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: ego, akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, Pavel Machek

On Tuesday, 20 February 2007 01:32, Rafael J. Wysocki wrote:
> On Tuesday, 20 February 2007 01:12, Oleg Nesterov wrote:
> > On 02/20, Rafael J. Wysocki wrote:
> > >
> > > On Monday, 19 February 2007 23:41, Oleg Nesterov wrote:
> > > > On 02/19, Rafael J. Wysocki wrote:
> > > > >
> > > > > On Monday, 19 February 2007 21:23, Oleg Nesterov wrote:
> > > > > 
> > > > > > > @@ -199,6 +189,10 @@ static void thaw_tasks(int thaw_user_spa
> > > > > > >
> > > > > > >         do_each_thread(g, p) {
> > > > > > > +               if (freezer_should_skip(p))
> > > > > > > +                       cancel_freezing(p);
> > > > > > > +       } while_each_thread(g, p);
> > > > > > > +       do_each_thread(g, p) {
> > > > > > >                 if (!freezeable(p))
> > > > > > >                         continue;
> > > > > > 
> > > > > > Any reason for 2 separate do_each_thread() loops ?
> > > > > 
> > > > > Yes.  If there is a "freeze" request pending for the vfork parent (TIF_FREEZE
> > > > > set), we have to cancel it before the child is unfrozen, since otherwise the
> > > > > parent may go freezing after we try to reset PF_FROZEN for it.
> > > > 
> > > > I see, thanks... thaw_process() doesn't take TIF_FREEZE into account.
> > > > 
> > > > But doesn't this mean we have a race?
> > > > 
> > > > Suppose that try_to_freeze_tasks() failed. It does cancel_freezing() for each
> > > > process before return, but what if some thread already checked TIF_FREEZE and
> > > > (for simplicity) it is preempted before frozen_process() in refrigerator().
> > > > 
> > > > thaw_tasks() runs, ignores this task (P), returns. P gets CPU, and becomes
> > > > frozen, but nobody will thaw it.
> > > > 
> > > > No?
> > > 
> > > Well, I think this is highly theoretical.  Namely, try_to_freeze_tasks() only
> > > fails after the timeout that's currently set to 20 sec., and it yields the CPU
> > > in each iteration of the main loop.  The task in question would have to refuse
> > > being frozen for 20 sec. and then suddenly decide to freeze itself right before
> > > try_to_freeze_tasks() checks the timeout for the very last time.  Then, it
> > > would have to get preempted at this very moment and stay unfrozen at least
> > > until thaw_tasks() starts running and in fact even longer.
> > 
> > Yes, yes, it is purely theoretical,
> > 
> > > I think we may avoid this by making try_to_freeze_tasks() sleep for some time
> > > after it has reset TIF_FREEZE for all tasks in the error path, if anyone is
> > > ever able to trigger it.
> > 
> > This makes this race (purely theoretical) ** 2  :)
> > 
> > Still, maybe it makes sense to introduce a cancel_freezing_and_thaw() function
> > (not right now) which reliably stops the task from sleeping in the refrigerator.
> 
> Hm.  In the case discussed above we have a task that's right before calling
> frozen_process(), so we can't thaw it, because it's not frozen.  It will be
> frozen just in a while, but try_to_freeze_tasks() and thaw_tasks() have no
> way to check this.
> 
> I think to close this race the refrigerator should check TIF_FREEZE and set
> PF_FROZEN _and_ reset TIF_FREEZE under a lock that would also have to be
> taken by try_to_freeze_tasks() in the beginning of the error path.  This will
> ensure that all tasks either freeze themselves before the error path in
> try_to_freeze_tasks() is executed, or remain unfrozen.
> 
> I'll try to prepare a patch to illustrate this, but right now I'm too tired to
> do it. :-)

Something like this, perhaps:

---
 include/linux/freezer.h |   10 +++-------
 kernel/power/process.c  |   18 ++++++++++++++++--
 2 files changed, 19 insertions(+), 9 deletions(-)

Index: linux-2.6.20-mm2/include/linux/freezer.h
===================================================================
--- linux-2.6.20-mm2.orig/include/linux/freezer.h
+++ linux-2.6.20-mm2/include/linux/freezer.h
@@ -58,17 +58,13 @@ static inline void frozen_process(struct
 	clear_tsk_thread_flag(p, TIF_FREEZE);
 }
 
-extern void refrigerator(void);
+extern int refrigerator(void);
 extern int freeze_processes(void);
 extern void thaw_processes(void);
 
 static inline int try_to_freeze(void)
 {
-	if (freezing(current)) {
-		refrigerator();
-		return 1;
-	} else
-		return 0;
+	return refrigerator();
 }
 
 /*
@@ -104,7 +100,7 @@ static inline void freeze(struct task_st
 static inline int thaw_process(struct task_struct *p) { return 1; }
 static inline void frozen_process(struct task_struct *p) { BUG(); }
 
-static inline void refrigerator(void) {}
+static inline int refrigerator(void) { return 0; }
 static inline int freeze_processes(void) { BUG(); return 0; }
 static inline void thaw_processes(void) {}
 
Index: linux-2.6.20-mm2/kernel/power/process.c
===================================================================
--- linux-2.6.20-mm2.orig/kernel/power/process.c
+++ linux-2.6.20-mm2/kernel/power/process.c
@@ -24,6 +24,8 @@
 #define FREEZER_KERNEL_THREADS 0
 #define FREEZER_USER_SPACE 1
 
+spinlock_t refrigerator_lock;
+
 static inline int freezeable(struct task_struct * p)
 {
 	if ((p == current) ||
@@ -34,15 +36,23 @@ static inline int freezeable(struct task
 }
 
 /* Refrigerator is place where frozen processes are stored :-). */
-void refrigerator(void)
+int refrigerator(void)
 {
 	/* Hmm, should we be allowed to suspend when there are realtime
 	   processes around? */
 	long save;
+
+	spin_lock(&refrigerator_lock);
+	if (freezing(current)) {
+		frozen_process(current);
+		spin_unlock(&refrigerator_lock);
+	} else {
+		spin_unlock(&refrigerator_lock);
+		return 0;
+	}
 	save = current->state;
 	pr_debug("%s entered refrigerator\n", current->comm);
 
-	frozen_process(current);
 	spin_lock_irq(&current->sighand->siglock);
 	recalc_sigpending(); /* We sent fake signal, clean it up */
 	spin_unlock_irq(&current->sighand->siglock);
@@ -53,6 +63,7 @@ void refrigerator(void)
 	}
 	pr_debug("%s left refrigerator\n", current->comm);
 	current->state = save;
+	return 1;
 }
 
 static inline void freeze_process(struct task_struct *p)
@@ -143,6 +154,7 @@ static unsigned int try_to_freeze_tasks(
 					"kernel threads",
 				TIMEOUT / HZ, todo);
 		read_lock(&tasklist_lock);
+		spin_lock(&refrigerator_lock);
 		do_each_thread(g, p) {
 			if (is_user_space(p) == !freeze_user_space)
 				continue;
@@ -152,6 +164,7 @@ static unsigned int try_to_freeze_tasks(
 
 			cancel_freezing(p);
 		} while_each_thread(g, p);
+		spin_unlock(&refrigerator_lock);
 		read_unlock(&tasklist_lock);
 	}
 
@@ -169,6 +182,7 @@ int freeze_processes(void)
 	unsigned int nr_unfrozen;
 
 	printk("Stopping tasks ... ");
+	spin_lock_init(&refrigerator_lock);
 	nr_unfrozen = try_to_freeze_tasks(FREEZER_USER_SPACE);
 	if (nr_unfrozen)
 		return nr_unfrozen;


* Re: [RFC PATCH(Experimental) 2/4] Revert changes to workqueue.c
  2007-02-20 15:12                   ` Srivatsa Vaddagiri
@ 2007-02-20 20:09                     ` Oleg Nesterov
  2007-02-21  6:29                       ` Srivatsa Vaddagiri
  0 siblings, 1 reply; 92+ messages in thread
From: Oleg Nesterov @ 2007-02-20 20:09 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Gautham R Shenoy, akpm, paulmck, mingo, dipankar,
	venkatesh.pallipadi, linux-kernel, rjw

On 02/20, Srivatsa Vaddagiri wrote:
>
> On Sun, Feb 18, 2007 at 12:59:28AM +0300, Oleg Nesterov wrote:
> > Before you begin. You are doing CPU_DOWN_PREPARE after freeze_processes().
> > Not good. This makes it impossible to do flush_workqueue() at CPU_DOWN_PREPARE
> > stage, and we have callers.
> 
> We have a few solutions to deal with this:
> 
> a. Mark such workqueues not freezable for hotplug
> b. If the above is not possible, don't call flush_workqueue() in DOWN_PREPARE
> c. If the above is not possible, send DOWN_PREPARE before freeze_processes()
> 
> I would prefer these solutions in the order listed above.
> 
> Which caller are you referring to here? Maybe we can decide on the
> option after we see the users of flush_workqueue() in DOWN_PREPARE.

mm/slab.c:cpuup_callback()

> > I'm afraid it won't be so easy to solve all locking/racing problems. Will
> > wait for the patch :)
> 
> I don't see problems for workqueue.c even if we follow option c. Do you
> see any?

I don't right now... Oh, I already forgot everything :) I'll try to recall
when I see the patch.

Oleg.



* Re: [RFC PATCH(Experimental) 2/4] Revert changes to workqueue.c
  2007-02-20 20:09                     ` Oleg Nesterov
@ 2007-02-21  6:29                       ` Srivatsa Vaddagiri
  2007-02-21 14:30                         ` Oleg Nesterov
  0 siblings, 1 reply; 92+ messages in thread
From: Srivatsa Vaddagiri @ 2007-02-21  6:29 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Gautham R Shenoy, akpm, paulmck, mingo, dipankar,
	venkatesh.pallipadi, linux-kernel, rjw

On Tue, Feb 20, 2007 at 11:09:36PM +0300, Oleg Nesterov wrote:
> > Which caller are you referring to here? Maybe we can decide on the
> > option after we see the users of flush_workqueue() in DOWN_PREPARE.
> 
> mm/slab.c:cpuup_callback()

The cancel_rearming_delayed_work() call, as it is used in cpuup_callback,
will require that we send DOWN_PREPARE before freeze_processes().

But... I am wondering if we can avoid doing cancel_rearming_delayed_work
(and thus flush_workqueue) in CPU_DOWN_PREPARE of slab.c. Basically,

mm/slab.c:

	CPU_DOWN_PREPARE:	/* All processes frozen now */
		cancel_delayed_work(&per_cpu(reap_work, cpu).timer);
		del_work(&per_cpu(reap_work, cpu).work);
		break;


At the point of CPU_DOWN_PREPARE, keventd should be frozen and hence
del_work() is a matter of just deleting the work from cwq->worklist.


-- 
Regards,
vatsa


* Re: [RFC PATCH(Experimental) 2/4] Revert changes to workqueue.c
  2007-02-21  6:29                       ` Srivatsa Vaddagiri
@ 2007-02-21 14:30                         ` Oleg Nesterov
  2007-02-21 14:37                           ` Gautham R Shenoy
  2007-02-21 15:53                           ` Srivatsa Vaddagiri
  0 siblings, 2 replies; 92+ messages in thread
From: Oleg Nesterov @ 2007-02-21 14:30 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Gautham R Shenoy, akpm, paulmck, mingo, dipankar,
	venkatesh.pallipadi, linux-kernel, rjw

On 02/21, Srivatsa Vaddagiri wrote:
>
> On Tue, Feb 20, 2007 at 11:09:36PM +0300, Oleg Nesterov wrote:
> > > Which caller are you referring to here? Maybe we can decide on the
> > > option after we see the users of flush_workqueue() in DOWN_PREPARE.
> > 
> > mm/slab.c:cpuup_callback()
> 
> The cancel_rearming_delayed_work() call, as it is used in cpuup_callback,
> will require that we send DOWN_PREPARE before freeze_processes().
> 
> But... I am wondering if we can avoid doing cancel_rearming_delayed_work
> (and thus flush_workqueue) in CPU_DOWN_PREPARE of slab.c. Basically,
> 
> mm/slab.c:
> 
> 	CPU_DOWN_PREPARE:	/* All processes frozen now */
> 		cancel_delayed_work(&per_cpu(reap_work, cpu).timer);
> 		del_work(&per_cpu(reap_work, cpu).work);
> 		break;
> 
> 
> At the point of CPU_DOWN_PREPARE, keventd should be frozen and hence
> del_work() is a matter of just deleting the work from cwq->worklist.

Agreed. Note that we don't need the new "del_work". It is always safe to
use cancel_work_sync() if we know that the workqueue is frozen, it won't
block. We can also do

	if (!cancel_delayed_work())
		cancel_work_sync();

but it is ok to do cancel_work_sync() unconditionally.
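Here is a minimal model of this CPU_DOWN_PREPARE cancellation. It assumes the per-CPU reap work is either still waiting on its timer or already queued on the worklist, and that keventd is frozen, so no instance is running. The model_* names are hypothetical stand-ins, not the workqueue API. The point is that with the worker frozen, cancelling the queued item reduces to unlinking it and never has to wait.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one per-CPU delayed work item. */
struct delayed_work_model {
	bool timer_pending;   /* timer armed, work not yet queued     */
	bool queued;          /* on cwq->worklist, waiting for keventd */
};

/* Like cancel_delayed_work(): returns true if the timer was still
 * pending, which means the work never reached the worklist. */
bool model_cancel_delayed_work(struct delayed_work_model *dw)
{
	bool was_pending = dw->timer_pending;

	dw->timer_pending = false;
	return was_pending;
}

/* Like cancel_work_sync() while keventd is frozen: nothing is running
 * the work, so cancelling is just unlinking it from the worklist. */
void model_cancel_work_frozen(struct delayed_work_model *dw, bool worker_frozen)
{
	assert(worker_frozen); /* the no-blocking argument needs this */
	dw->queued = false;
}

/* The CPU_DOWN_PREPARE step for the slab reap work. */
void model_slab_down_prepare(struct delayed_work_model *reap, bool worker_frozen)
{
	if (!model_cancel_delayed_work(reap))
		model_cancel_work_frozen(reap, worker_frozen);
}
```

Calling model_cancel_work_frozen() unconditionally would also be correct in this model, which matches the remark above.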

Oleg.



* Re: [RFC PATCH(Experimental) 2/4] Revert changes to workqueue.c
  2007-02-21 14:30                         ` Oleg Nesterov
@ 2007-02-21 14:37                           ` Gautham R Shenoy
  2007-02-21 15:53                           ` Srivatsa Vaddagiri
  1 sibling, 0 replies; 92+ messages in thread
From: Gautham R Shenoy @ 2007-02-21 14:37 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Srivatsa Vaddagiri, akpm, paulmck, mingo, dipankar,
	venkatesh.pallipadi, linux-kernel, rjw

On Wed, Feb 21, 2007 at 05:30:10PM +0300, Oleg Nesterov wrote:
> On 02/21, Srivatsa Vaddagiri wrote:
> >
> > On Tue, Feb 20, 2007 at 11:09:36PM +0300, Oleg Nesterov wrote:
> > > > Which caller are you referring to here? Maybe we can decide on the
> > > > option after we see the users of flush_workqueue() in DOWN_PREPARE.
> > > 
> > > mm/slab.c:cpuup_callback()
> > 
> > The cancel_rearming_delayed_work() call, as it is used in cpuup_callback,
> > will require that we send DOWN_PREPARE before freeze_processes().
> > 
> > But... I am wondering if we can avoid doing cancel_rearming_delayed_work
> > (and thus flush_workqueue) in CPU_DOWN_PREPARE of slab.c. Basically,
> > 
> > mm/slab.c:
> > 
> > 	CPU_DOWN_PREPARE:	/* All processes frozen now */
> > 		cancel_delayed_work(&per_cpu(reap_work, cpu).timer);
> > 		del_work(&per_cpu(reap_work, cpu).work);
> > 		break;
> > 
> > 
> > At the point of CPU_DOWN_PREPARE, keventd should be frozen and hence
> > del_work() is a matter of just deleting the work from cwq->worklist.
> 
> Agreed. Note that we don't need the new "del_work". It is always safe to
> use cancel_work_sync() if we know that the workqueue is frozen, it won't
> block. We can also do
> 
> 	if (!cancel_delayed_work())
> 		cancel_work_sync();
> 
> but it is ok to do cancel_work_sync() unconditionally.

True. But this might be a one-off solution for slab. However, if someone
in the future needs to do a flush_workqueue() from
CPU_DOWN_PREPARE context, we would need to find a new workaround.
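Here is a toy model of that ordering problem. The names are hypothetical and this is not kernel code. flush_workqueue() can only complete while the worker thread is running, so a CPU_DOWN_PREPARE callback that flushes succeeds if it is sent before freeze_processes() and would block forever if it is sent after. The model reports "would block" as a false return instead of hanging.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a per-CPU workqueue and its worker thread. */
struct wq_model {
	int pending;        /* work items on the worklist            */
	bool worker_frozen; /* worker thread sits in the refrigerator */
};

/* flush_workqueue(): wait until the worker has drained the list.
 * Returns false where the real call would block forever. */
bool model_flush(struct wq_model *wq)
{
	assert(wq->pending >= 0);
	if (wq->pending > 0 && wq->worker_frozen)
		return false;            /* nobody left to run the work */
	wq->pending = 0;             /* worker drains the list      */
	return true;
}

/* One hotplug sequence. notify_first == true sends CPU_DOWN_PREPARE
 * before freezing (option c); false sends it after freeze_processes(). */
bool model_cpu_down(struct wq_model *wq, bool notify_first)
{
	bool ok;

	if (notify_first) {
		ok = model_flush(wq);     /* DOWN_PREPARE callback flushes */
		wq->worker_frozen = true; /* freeze_processes()            */
	} else {
		wq->worker_frozen = true;
		ok = model_flush(wq);
	}
	wq->worker_frozen = false;    /* thaw once the CPU is gone */
	return ok;
}
```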

So, I'll try running CPU_DOWN_PREPARE and CPU_UP_PREPARE from
a non-frozen context to check if there are any potential problems.
Hopefully there shouldn't be (m)any!
> 
> Oleg.
> 
thanks and regards
gautham.

-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"


* Re: [RFC PATCH(Experimental) 0/4] Freezer based Cpu-hotplug
  2007-02-17 11:24             ` Rafael J. Wysocki
  2007-02-17 21:34               ` Oleg Nesterov
  2007-02-18 12:56               ` Pavel Machek
@ 2007-02-21 14:52               ` Gautham R Shenoy
  2007-02-21 19:42                 ` Pavel Machek
  2 siblings, 1 reply; 92+ messages in thread
From: Gautham R Shenoy @ 2007-02-21 14:52 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: akpm, paulmck, mingo, vatsa, dipankar, venkatesh.pallipadi,
	linux-kernel, oleg, Pavel Machek

Rafael,
On Sat, Feb 17, 2007 at 12:24:45PM +0100, Rafael J. Wysocki wrote:
> 
> Pavel, do you think we can remove the PF_NOFREEZE from bluetooth, BTW?

By default, create_workqueue() marks the worker threads as
non-freezable. For CPU hotplug, all workqueues can be frozen
except the "kthread" workqueue (which is single-threaded, so it won't
be frozen anyway).

A quick cscope scan shows that "xfslogd" and "xfsdatad"
are the only freezable workqueues. Is there any particular reason
for not marking the rest of the non-single-threaded workqueues freezeable?

thanks and regards
gautham
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"


* Re: [RFC PATCH(Experimental) 2/4] Revert changes to workqueue.c
  2007-02-21 14:30                         ` Oleg Nesterov
  2007-02-21 14:37                           ` Gautham R Shenoy
@ 2007-02-21 15:53                           ` Srivatsa Vaddagiri
  1 sibling, 0 replies; 92+ messages in thread
From: Srivatsa Vaddagiri @ 2007-02-21 15:53 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Gautham R Shenoy, akpm, paulmck, mingo, dipankar,
	venkatesh.pallipadi, linux-kernel, rjw

On Wed, Feb 21, 2007 at 05:30:10PM +0300, Oleg Nesterov wrote:
> Agreed. Note that we don't need the new "del_work". It is always safe to
> use cancel_work_sync() if we know that the workqueue is frozen, it won't
> block. We can also do
> 
> 	if (!cancel_delayed_work())
> 		cancel_work_sync();
> 
> but it is ok to do cancel_work_sync() unconditionally.

Argh... I should keep referring to recent sources. I didn't see
cancel_work_sync() in my sources (2.6.20-rc4) and hence invented that
del_work()! Anyway, thanks for pointing it out.

This change will probably let us do CPU_DOWN_PREPARE after
freeze_processes(). However, I will keep my fingers crossed on whether it
is really a good idea to send CPU_DOWN/UP_PREPARE after
freeze_processes() until we get more review/testing results.

-- 
Regards,
vatsa

* Re: freezer problems
  2007-02-21 18:14                                                   ` Paul E. McKenney
@ 2007-02-21 18:13                                                     ` Rafael J. Wysocki
  2007-02-21 18:27                                                       ` Paul E. McKenney
  2007-02-21 20:03                                                       ` Oleg Nesterov
  0 siblings, 2 replies; 92+ messages in thread
From: Rafael J. Wysocki @ 2007-02-21 18:13 UTC (permalink / raw)
  To: paulmck
  Cc: Oleg Nesterov, ego, akpm, paulmck, mingo, vatsa, dipankar,
	venkatesh.pallipadi, linux-kernel, Pavel Machek

On Wednesday, 21 February 2007 19:14, Paul E. McKenney wrote:
> On Tue, Feb 20, 2007 at 07:29:01PM +0100, Rafael J. Wysocki wrote:
> > On Tuesday, 20 February 2007 01:32, Rafael J. Wysocki wrote:
> > > On Tuesday, 20 February 2007 01:12, Oleg Nesterov wrote:
> > > Hm.  In the case discussed above we have a task that's right before calling
> > > frozen_process(), so we can't thaw it, because it's not frozen.  It will be
> > > frozen just in a while, but try_to_freeze_tasks() and thaw_tasks() have no
> > > way to check this.
> > > 
> > > I think to close this race the refrigerator should check TIF_FREEZE and set
> > > PF_FROZEN _and_ reset TIF_FREEZE under a lock that would also have to be
> > > taken by try_to_freeze_tasks() in the beginning of the error path.  This will
> > > ensure that all tasks either freeze themselves before the error path in
> > > try_to_freeze_tasks() is executed, or remain unfrozen.
> > > 
> > > I'll try to prepare a patch to illustrate this, but right now I'm too tired to
> > > do it. :-)
> > 
> > Something like this, perhaps:
> > 
> > ---
> >  include/linux/freezer.h |   10 +++-------
> >  kernel/power/process.c  |   18 ++++++++++++++++--
> >  2 files changed, 19 insertions(+), 9 deletions(-)
> > 
> > Index: linux-2.6.20-mm2/include/linux/freezer.h
> > ===================================================================
> > --- linux-2.6.20-mm2.orig/include/linux/freezer.h
> > +++ linux-2.6.20-mm2/include/linux/freezer.h
> > @@ -58,17 +58,13 @@ static inline void frozen_process(struct
> >  	clear_tsk_thread_flag(p, TIF_FREEZE);
> >  }
> > 
> > -extern void refrigerator(void);
> > +extern int refrigerator(void);
> >  extern int freeze_processes(void);
> >  extern void thaw_processes(void);
> > 
> >  static inline int try_to_freeze(void)
> >  {
> > -	if (freezing(current)) {
> > -		refrigerator();
> > -		return 1;
> > -	} else
> > -		return 0;
> > +	return refrigerator();
> >  }
> > 
> >  /*
> > @@ -104,7 +100,7 @@ static inline void freeze(struct task_st
> >  static inline int thaw_process(struct task_struct *p) { return 1; }
> >  static inline void frozen_process(struct task_struct *p) { BUG(); }
> > 
> > -static inline void refrigerator(void) {}
> > +static inline int refrigerator(void) { return 0; }
> >  static inline int freeze_processes(void) { BUG(); return 0; }
> >  static inline void thaw_processes(void) {}
> > 
> > Index: linux-2.6.20-mm2/kernel/power/process.c
> > ===================================================================
> > --- linux-2.6.20-mm2.orig/kernel/power/process.c
> > +++ linux-2.6.20-mm2/kernel/power/process.c
> > @@ -24,6 +24,8 @@
> >  #define FREEZER_KERNEL_THREADS 0
> >  #define FREEZER_USER_SPACE 1
> > 
> > +spinlock_t refrigerator_lock;
> > +
> >  static inline int freezeable(struct task_struct * p)
> >  {
> >  	if ((p == current) ||
> > @@ -34,15 +36,23 @@ static inline int freezeable(struct task
> >  }
> > 
> >  /* Refrigerator is place where frozen processes are stored :-). */
> > -void refrigerator(void)
> > +int refrigerator(void)
> >  {
> >  	/* Hmm, should we be allowed to suspend when there are realtime
> >  	   processes around? */
> >  	long save;
> > +
> > +	spin_lock(&refrigerator_lock);
> 
> I hope we can do this without a global lock that is acquired on each
> try_to_freeze() call!

Yes.  Here's the current version (try_to_freeze() is unchanged, so the lock
is only taken by the tasks that are going to freeze, or so they think):

---
 kernel/power/process.c |   15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

Index: linux-2.6.20-mm2/kernel/power/process.c
===================================================================
--- linux-2.6.20-mm2.orig/kernel/power/process.c
+++ linux-2.6.20-mm2/kernel/power/process.c
@@ -24,6 +24,8 @@
 #define FREEZER_KERNEL_THREADS 0
 #define FREEZER_USER_SPACE 1
 
+static spinlock_t refrigerator_lock;
+
 static inline int freezeable(struct task_struct * p)
 {
 	if ((p == current) ||
@@ -39,10 +41,18 @@ void refrigerator(void)
 	/* Hmm, should we be allowed to suspend when there are realtime
 	   processes around? */
 	long save;
+
+	spin_lock(&refrigerator_lock);
+	if (freezing(current)) {
+		frozen_process(current);
+		spin_unlock(&refrigerator_lock);
+	} else {
+		spin_unlock(&refrigerator_lock);
+		return;
+	}
 	save = current->state;
 	pr_debug("%s entered refrigerator\n", current->comm);
 
-	frozen_process(current);
 	spin_lock_irq(&current->sighand->siglock);
 	recalc_sigpending(); /* We sent fake signal, clean it up */
 	spin_unlock_irq(&current->sighand->siglock);
@@ -143,6 +153,7 @@ static unsigned int try_to_freeze_tasks(
 					"kernel threads",
 				TIMEOUT / HZ, todo);
 		read_lock(&tasklist_lock);
+		spin_lock(&refrigerator_lock);
 		do_each_thread(g, p) {
 			if (is_user_space(p) == !freeze_user_space)
 				continue;
@@ -152,6 +163,7 @@ static unsigned int try_to_freeze_tasks(
 
 			cancel_freezing(p);
 		} while_each_thread(g, p);
+		spin_unlock(&refrigerator_lock);
 		read_unlock(&tasklist_lock);
 	}
 
@@ -169,6 +181,7 @@ int freeze_processes(void)
 	unsigned int nr_unfrozen;
 
 	printk("Stopping tasks ... ");
+	spin_lock_init(&refrigerator_lock);
 	nr_unfrozen = try_to_freeze_tasks(FREEZER_USER_SPACE);
 	if (nr_unfrozen)
 		return nr_unfrozen;
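The locking rule proposed above can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: it uses pthreads, and `model_task`, `model_refrigerator()`, `model_cancel_freezing()` and `model_refrigerator_lock` are hypothetical stand-ins for the task flags, refrigerator(), the cancel_freezing() loop in try_to_freeze_tasks() and refrigerator_lock. Because the task tests and clears its request in the same critical section that the error path uses to cancel requests, no task can be left halfway between "asked to freeze" and "frozen".

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins: 'freeze' ~ TIF_FREEZE, 'frozen' ~ PF_FROZEN. */
struct model_task {
	bool freeze;
	bool frozen;
};

/* One global lock, playing the role of refrigerator_lock. */
static pthread_mutex_t model_refrigerator_lock = PTHREAD_MUTEX_INITIALIZER;

/* Task side: check the request and commit to being frozen in a single
 * critical section. Returns true if the task froze. */
static bool model_refrigerator(struct model_task *t)
{
	bool froze = false;

	pthread_mutex_lock(&model_refrigerator_lock);
	if (t->freeze) {
		t->frozen = true;
		t->freeze = false;
		froze = true;
	}
	pthread_mutex_unlock(&model_refrigerator_lock);
	return froze;
}

/* Freezer error path: withdraw the request from every task that has not
 * frozen yet. While the lock is held, each task is either already frozen
 * or still unfrozen with its request about to be cancelled. */
static void model_cancel_freezing(struct model_task *tasks, int n)
{
	pthread_mutex_lock(&model_refrigerator_lock);
	for (int i = 0; i < n; i++)
		if (!tasks[i].frozen)
			tasks[i].freeze = false;
	pthread_mutex_unlock(&model_refrigerator_lock);
}
```

In the model, a task that froze before the error path ran keeps `frozen` set, while a task that reaches the refrigerator afterwards finds its request withdrawn and returns false.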

* Re: freezer problems
  2007-02-20 18:29                                                 ` Rafael J. Wysocki
@ 2007-02-21 18:14                                                   ` Paul E. McKenney
  2007-02-21 18:13                                                     ` Rafael J. Wysocki
  0 siblings, 1 reply; 92+ messages in thread
From: Paul E. McKenney @ 2007-02-21 18:14 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Oleg Nesterov, ego, akpm, paulmck, mingo, vatsa, dipankar,
	venkatesh.pallipadi, linux-kernel, Pavel Machek

On Tue, Feb 20, 2007 at 07:29:01PM +0100, Rafael J. Wysocki wrote:
> On Tuesday, 20 February 2007 01:32, Rafael J. Wysocki wrote:
> > On Tuesday, 20 February 2007 01:12, Oleg Nesterov wrote:
> > Hm.  In the case discussed above we have a task that's right before calling
> > frozen_process(), so we can't thaw it, because it's not frozen.  It will be
> > frozen just in a while, but try_to_freeze_tasks() and thaw_tasks() have no
> > way to check this.
> > 
> > I think to close this race the refrigerator should check TIF_FREEZE and set
> > PF_FROZEN _and_ reset TIF_FREEZE under a lock that would also have to be
> > taken by try_to_freeze_tasks() in the beginning of the error path.  This will
> > ensure that all tasks either freeze themselves before the error path in
> > try_to_freeze_tasks() is executed, or remain unfrozen.
> > 
> > I'll try to prepare a patch to illustrate this, but right now I'm too tired to
> > do it. :-)
> 
> Something like this, perhaps:
> 
> ---
>  include/linux/freezer.h |   10 +++-------
>  kernel/power/process.c  |   18 ++++++++++++++++--
>  2 files changed, 19 insertions(+), 9 deletions(-)
> 
> Index: linux-2.6.20-mm2/include/linux/freezer.h
> ===================================================================
> --- linux-2.6.20-mm2.orig/include/linux/freezer.h
> +++ linux-2.6.20-mm2/include/linux/freezer.h
> @@ -58,17 +58,13 @@ static inline void frozen_process(struct
>  	clear_tsk_thread_flag(p, TIF_FREEZE);
>  }
> 
> -extern void refrigerator(void);
> +extern int refrigerator(void);
>  extern int freeze_processes(void);
>  extern void thaw_processes(void);
> 
>  static inline int try_to_freeze(void)
>  {
> -	if (freezing(current)) {
> -		refrigerator();
> -		return 1;
> -	} else
> -		return 0;
> +	return refrigerator();
>  }
> 
>  /*
> @@ -104,7 +100,7 @@ static inline void freeze(struct task_st
>  static inline int thaw_process(struct task_struct *p) { return 1; }
>  static inline void frozen_process(struct task_struct *p) { BUG(); }
> 
> -static inline void refrigerator(void) {}
> +static inline int refrigerator(void) { return 0; }
>  static inline int freeze_processes(void) { BUG(); return 0; }
>  static inline void thaw_processes(void) {}
> 
> Index: linux-2.6.20-mm2/kernel/power/process.c
> ===================================================================
> --- linux-2.6.20-mm2.orig/kernel/power/process.c
> +++ linux-2.6.20-mm2/kernel/power/process.c
> @@ -24,6 +24,8 @@
>  #define FREEZER_KERNEL_THREADS 0
>  #define FREEZER_USER_SPACE 1
> 
> +spinlock_t refrigerator_lock;
> +
>  static inline int freezeable(struct task_struct * p)
>  {
>  	if ((p == current) ||
> @@ -34,15 +36,23 @@ static inline int freezeable(struct task
>  }
> 
>  /* Refrigerator is place where frozen processes are stored :-). */
> -void refrigerator(void)
> +int refrigerator(void)
>  {
>  	/* Hmm, should we be allowed to suspend when there are realtime
>  	   processes around? */
>  	long save;
> +
> +	spin_lock(&refrigerator_lock);

I hope we can do this without a global lock that is acquired on each
try_to_freeze() call!

> +	if (freezing(current)) {

Would it be possible to acquire the lock here instead, then recheck here?
Or use a per-thread lock?  (Yes, this would make the error checking path
have to acquire a very large number of locks, but...)

							Thanx, Paul

> +		frozen_process(current);
> +		spin_unlock(&refrigerator_lock);
> +	} else {
> +		spin_unlock(&refrigerator_lock);
> +		return 0;
> +	}
>  	save = current->state;
>  	pr_debug("%s entered refrigerator\n", current->comm);
> 
> -	frozen_process(current);
>  	spin_lock_irq(&current->sighand->siglock);
>  	recalc_sigpending(); /* We sent fake signal, clean it up */
>  	spin_unlock_irq(&current->sighand->siglock);
> @@ -53,6 +63,7 @@ void refrigerator(void)
>  	}
>  	pr_debug("%s left refrigerator\n", current->comm);
>  	current->state = save;
> +	return 1;
>  }
> 
>  static inline void freeze_process(struct task_struct *p)
> @@ -143,6 +154,7 @@ static unsigned int try_to_freeze_tasks(
>  					"kernel threads",
>  				TIMEOUT / HZ, todo);
>  		read_lock(&tasklist_lock);
> +		spin_lock(&refrigerator_lock);
>  		do_each_thread(g, p) {
>  			if (is_user_space(p) == !freeze_user_space)
>  				continue;
> @@ -152,6 +164,7 @@ static unsigned int try_to_freeze_tasks(
> 
>  			cancel_freezing(p);
>  		} while_each_thread(g, p);
> +		spin_unlock(&refrigerator_lock);
>  		read_unlock(&tasklist_lock);
>  	}
> 
> @@ -169,6 +182,7 @@ int freeze_processes(void)
>  	unsigned int nr_unfrozen;
> 
>  	printk("Stopping tasks ... ");
> +	spin_lock_init(&refrigerator_lock);
>  	nr_unfrozen = try_to_freeze_tasks(FREEZER_USER_SPACE);
>  	if (nr_unfrozen)
>  		return nr_unfrozen;
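Paul's question amounts to double-checked locking: test the request without the lock, and take the lock (then recheck) only when a freeze seems to be pending. Rafael's reply earlier in this thread gets the same effect by keeping try_to_freeze()'s unlocked freezing() test. The sketch below assumes C11 atomics and pthreads; `model_try_to_freeze()`, `model_lock` and `model_slow_path_count` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins: 'freeze' ~ TIF_FREEZE, 'frozen' ~ PF_FROZEN. */
struct model_task {
	atomic_bool freeze;	/* read without the lock on the fast path */
	bool frozen;		/* only written under model_lock */
};

static pthread_mutex_t model_lock = PTHREAD_MUTEX_INITIALIZER;
static int model_slow_path_count;	/* how often the lock was taken */

/* Returns true if the task froze. */
static bool model_try_to_freeze(struct model_task *t)
{
	bool froze;

	/* Fast path: nothing requested, so no lock traffic at all. */
	if (!atomic_load(&t->freeze))
		return false;

	pthread_mutex_lock(&model_lock);
	model_slow_path_count++;
	/* Recheck under the lock: the request may have been cancelled
	 * between the unlocked test and the lock acquisition. */
	froze = atomic_load(&t->freeze);
	if (froze) {
		t->frozen = true;
		atomic_store(&t->freeze, false);
	}
	pthread_mutex_unlock(&model_lock);
	return froze;
}
```

Tasks that are not being frozen never touch the lock, which addresses the contention concern, while the recheck keeps the check-and-commit step atomic with respect to anyone cancelling the request under the same lock.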

* Re: freezer problems
  2007-02-21 18:13                                                     ` Rafael J. Wysocki
@ 2007-02-21 18:27                                                       ` Paul E. McKenney
  2007-02-21 20:03                                                       ` Oleg Nesterov
  1 sibling, 0 replies; 92+ messages in thread
From: Paul E. McKenney @ 2007-02-21 18:27 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Oleg Nesterov, ego, akpm, paulmck, mingo, vatsa, dipankar,
	venkatesh.pallipadi, linux-kernel, Pavel Machek

On Wed, Feb 21, 2007 at 07:13:40PM +0100, Rafael J. Wysocki wrote:
> On Wednesday, 21 February 2007 19:14, Paul E. McKenney wrote:
> > On Tue, Feb 20, 2007 at 07:29:01PM +0100, Rafael J. Wysocki wrote:
> > > On Tuesday, 20 February 2007 01:32, Rafael J. Wysocki wrote:
> > > > On Tuesday, 20 February 2007 01:12, Oleg Nesterov wrote:
> > > > Hm.  In the case discussed above we have a task that's right before calling
> > > > frozen_process(), so we can't thaw it, because it's not frozen.  It will be
> > > > frozen just in a while, but try_to_freeze_tasks() and thaw_tasks() have no
> > > > way to check this.
> > > > 
> > > > I think to close this race the refrigerator should check TIF_FREEZE and set
> > > > PF_FROZEN _and_ reset TIF_FREEZE under a lock that would also have to be
> > > > taken by try_to_freeze_tasks() in the beginning of the error path.  This will
> > > > ensure that all tasks either freeze themselves before the error path in
> > > > try_to_freeze_tasks() is executed, or remain unfrozen.
> > > > 
> > > > I'll try to prepare a patch to illustrate this, but right now I'm too tired to
> > > > do it. :-)
> > > 
> > > Something like this, perhaps:
> > > 
> > > ---
> > >  include/linux/freezer.h |   10 +++-------
> > >  kernel/power/process.c  |   18 ++++++++++++++++--
> > >  2 files changed, 19 insertions(+), 9 deletions(-)
> > > 
> > > Index: linux-2.6.20-mm2/include/linux/freezer.h
> > > ===================================================================
> > > --- linux-2.6.20-mm2.orig/include/linux/freezer.h
> > > +++ linux-2.6.20-mm2/include/linux/freezer.h
> > > @@ -58,17 +58,13 @@ static inline void frozen_process(struct
> > >  	clear_tsk_thread_flag(p, TIF_FREEZE);
> > >  }
> > > 
> > > -extern void refrigerator(void);
> > > +extern int refrigerator(void);
> > >  extern int freeze_processes(void);
> > >  extern void thaw_processes(void);
> > > 
> > >  static inline int try_to_freeze(void)
> > >  {
> > > -	if (freezing(current)) {
> > > -		refrigerator();
> > > -		return 1;
> > > -	} else
> > > -		return 0;
> > > +	return refrigerator();
> > >  }
> > > 
> > >  /*
> > > @@ -104,7 +100,7 @@ static inline void freeze(struct task_st
> > >  static inline int thaw_process(struct task_struct *p) { return 1; }
> > >  static inline void frozen_process(struct task_struct *p) { BUG(); }
> > > 
> > > -static inline void refrigerator(void) {}
> > > +static inline int refrigerator(void) { return 0; }
> > >  static inline int freeze_processes(void) { BUG(); return 0; }
> > >  static inline void thaw_processes(void) {}
> > > 
> > > Index: linux-2.6.20-mm2/kernel/power/process.c
> > > ===================================================================
> > > --- linux-2.6.20-mm2.orig/kernel/power/process.c
> > > +++ linux-2.6.20-mm2/kernel/power/process.c
> > > @@ -24,6 +24,8 @@
> > >  #define FREEZER_KERNEL_THREADS 0
> > >  #define FREEZER_USER_SPACE 1
> > > 
> > > +spinlock_t refrigerator_lock;
> > > +
> > >  static inline int freezeable(struct task_struct * p)
> > >  {
> > >  	if ((p == current) ||
> > > @@ -34,15 +36,23 @@ static inline int freezeable(struct task
> > >  }
> > > 
> > >  /* Refrigerator is place where frozen processes are stored :-). */
> > > -void refrigerator(void)
> > > +int refrigerator(void)
> > >  {
> > >  	/* Hmm, should we be allowed to suspend when there are realtime
> > >  	   processes around? */
> > >  	long save;
> > > +
> > > +	spin_lock(&refrigerator_lock);
> > 
> > I hope we can do this without a global lock that is acquired on each
> > try_to_freeze() call!
> 
> Yes.  Here's the current version (try_to_freeze() is unchanged, so the lock
> is only taken by the tasks that are going to freeze, or so they think):

Ah, OK, that should work much better from a lock-contention viewpoint!

							Thanx, Paul

> ---
>  kernel/power/process.c |   15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> Index: linux-2.6.20-mm2/kernel/power/process.c
> ===================================================================
> --- linux-2.6.20-mm2.orig/kernel/power/process.c
> +++ linux-2.6.20-mm2/kernel/power/process.c
> @@ -24,6 +24,8 @@
>  #define FREEZER_KERNEL_THREADS 0
>  #define FREEZER_USER_SPACE 1
> 
> +static spinlock_t refrigerator_lock;
> +
>  static inline int freezeable(struct task_struct * p)
>  {
>  	if ((p == current) ||
> @@ -39,10 +41,18 @@ void refrigerator(void)
>  	/* Hmm, should we be allowed to suspend when there are realtime
>  	   processes around? */
>  	long save;
> +
> +	spin_lock(&refrigerator_lock);
> +	if (freezing(current)) {
> +		frozen_process(current);
> +		spin_unlock(&refrigerator_lock);
> +	} else {
> +		spin_unlock(&refrigerator_lock);
> +		return;
> +	}
>  	save = current->state;
>  	pr_debug("%s entered refrigerator\n", current->comm);
> 
> -	frozen_process(current);
>  	spin_lock_irq(&current->sighand->siglock);
>  	recalc_sigpending(); /* We sent fake signal, clean it up */
>  	spin_unlock_irq(&current->sighand->siglock);
> @@ -143,6 +153,7 @@ static unsigned int try_to_freeze_tasks(
>  					"kernel threads",
>  				TIMEOUT / HZ, todo);
>  		read_lock(&tasklist_lock);
> +		spin_lock(&refrigerator_lock);
>  		do_each_thread(g, p) {
>  			if (is_user_space(p) == !freeze_user_space)
>  				continue;
> @@ -152,6 +163,7 @@ static unsigned int try_to_freeze_tasks(
> 
>  			cancel_freezing(p);
>  		} while_each_thread(g, p);
> +		spin_unlock(&refrigerator_lock);
>  		read_unlock(&tasklist_lock);
>  	}
> 
> @@ -169,6 +181,7 @@ int freeze_processes(void)
>  	unsigned int nr_unfrozen;
> 
>  	printk("Stopping tasks ... ");
> +	spin_lock_init(&refrigerator_lock);
>  	nr_unfrozen = try_to_freeze_tasks(FREEZER_USER_SPACE);
>  	if (nr_unfrozen)
>  		return nr_unfrozen;

* Re: [RFC PATCH(Experimental) 0/4] Freezer based Cpu-hotplug
  2007-02-21 14:52               ` Gautham R Shenoy
@ 2007-02-21 19:42                 ` Pavel Machek
  0 siblings, 0 replies; 92+ messages in thread
From: Pavel Machek @ 2007-02-21 19:42 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: Rafael J. Wysocki, akpm, paulmck, mingo, vatsa, dipankar,
	venkatesh.pallipadi, linux-kernel, oleg

Hi!

> Rafael,
> On Sat, Feb 17, 2007 at 12:24:45PM +0100, Rafael J. Wysocki wrote:
> > 
> > Pavel, do you think we can remove the PF_NOFREEZE from bluetooth, BTW?
> 
> By default, create_workqueue() marks the worker threads as
> non-freezable. For cpu hotplug, all workqueues can be frozen
> except the "kthread" workqueue (which is single-threaded, so it won't
> be frozen anyway).
> 
> And a quick cscope scan shows that "xfslogd" and "xfsdatad"
> are the only freezable workqueues. Is there any particular reason
> for not marking the rest of the non-single-threaded workqueues freezeable?

As I said, go ahead.

bluetooth has absolutely no business running while we are writing
the suspend image to disk.

(First person suggesting
suspend-to-file-on-nfs-filesystem-mounted-over-GPRS-line-connected-over-bluetooth
will be punished by only getting bread and water till he implements
it).
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: freezer problems
  2007-02-21 18:13                                                     ` Rafael J. Wysocki
  2007-02-21 18:27                                                       ` Paul E. McKenney
@ 2007-02-21 20:03                                                       ` Oleg Nesterov
  2007-02-21 20:47                                                         ` Rafael J. Wysocki
  2007-02-21 21:06                                                         ` Paul E. McKenney
  1 sibling, 2 replies; 92+ messages in thread
From: Oleg Nesterov @ 2007-02-21 20:03 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: paulmck, ego, akpm, paulmck, mingo, vatsa, dipankar,
	venkatesh.pallipadi, linux-kernel, Pavel Machek

On 02/21, Rafael J. Wysocki wrote:
>
> On Wednesday, 21 February 2007 19:14, Paul E. McKenney wrote:
> > On Tue, Feb 20, 2007 at 07:29:01PM +0100, Rafael J. Wysocki wrote:
> > > On Tuesday, 20 February 2007 01:32, Rafael J. Wysocki wrote:
> > > > On Tuesday, 20 February 2007 01:12, Oleg Nesterov wrote:
> > > > Hm.  In the case discussed above we have a task that's right before calling
> > > > frozen_process(), so we can't thaw it, because it's not frozen.  It will be
> > > > frozen just in a while, but try_to_freeze_tasks() and thaw_tasks() have no
> > > > way to check this.
> > > > 
> > > > I think to close this race the refrigerator should check TIF_FREEZE and set
> > > > PF_FROZEN _and_ reset TIF_FREEZE under a lock

I personally think this is good. Not only does this allow us to close the race,
but I think we can do more.

>                                                      that would also have to be
> > > > taken by try_to_freeze_tasks() in the beginning of the error path.  This will
> > > > ensure that all tasks either freeze themselves before the error path in
> > > > try_to_freeze_tasks() is executed, or remain unfrozen.

How about taking this lock in thaw_tasks() instead (or as well)?

Currently we need a separate loop in thaw_tasks() to handle PF_FREEZER_SKIP. This
means that PF_FREEZER_SKIP is not so generic: thaw_tasks() can't tolerate such
a task being woken in between. What if we change thaw_process() to clear TIF_FREEZE?

Note also that we can use task_lock() instead of the global refrigerator_lock. This
means that thaw_process() should take it too; this is probably a slowdown, but I
think not much of one, because thaw_process() is going to write to p->flags anyway.
In this case thaw_process() works perfectly as cancel_freezing_and_thaw() and
can be used to fix exec/coredump in the future.

Thoughts?

Oleg.
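Oleg's two suggestions, a per-task lock instead of the global refrigerator_lock and a thaw_process() that clears a pending TIF_FREEZE, can be sketched together in userspace. Assumptions: pthreads, a mutex embedded in the hypothetical `model_task` in place of task_lock(), and booleans in place of TIF_FREEZE and PF_FROZEN; all `model_*` names are illustrative only. Freezing and thawing serialize on the same per-task lock, so a task preempted just before freezing cannot freeze after it has been "thawed".

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins: 'lock' ~ task_lock(), 'freeze' ~ TIF_FREEZE,
 * 'frozen' ~ PF_FROZEN. */
struct model_task {
	pthread_mutex_t lock;
	bool freeze;
	bool frozen;
};

static void model_task_init(struct model_task *t, bool freeze)
{
	pthread_mutex_init(&t->lock, NULL);
	t->freeze = freeze;
	t->frozen = false;
}

/* Task side: freeze only if the request is still pending. */
static bool model_refrigerator(struct model_task *t)
{
	bool froze;

	pthread_mutex_lock(&t->lock);
	froze = t->freeze;
	if (froze) {
		t->frozen = true;
		t->freeze = false;
	}
	pthread_mutex_unlock(&t->lock);
	return froze;
}

/* Thaw side: thaw a frozen task, or cancel a pending request so the task
 * cannot freeze later. Returns true if the task had been frozen (the
 * kernel would then wake it up). */
static bool model_thaw_process(struct model_task *t)
{
	bool was_frozen;

	pthread_mutex_lock(&t->lock);
	was_frozen = t->frozen;
	if (was_frozen)
		t->frozen = false;
	else
		t->freeze = false;
	pthread_mutex_unlock(&t->lock);
	return was_frozen;
}
```

The thaw_process() change in Rafael's patch further down (take task_lock(), then either clear PF_FROZEN or clear TIF_FREEZE) has the same shape.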


* Re: freezer problems
  2007-02-21 20:03                                                       ` Oleg Nesterov
@ 2007-02-21 20:47                                                         ` Rafael J. Wysocki
  2007-02-21 21:06                                                         ` Paul E. McKenney
  1 sibling, 0 replies; 92+ messages in thread
From: Rafael J. Wysocki @ 2007-02-21 20:47 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: paulmck, ego, akpm, paulmck, mingo, vatsa, dipankar,
	venkatesh.pallipadi, linux-kernel, Pavel Machek

On Wednesday, 21 February 2007 21:03, Oleg Nesterov wrote:
> On 02/21, Rafael J. Wysocki wrote:
> >
> > On Wednesday, 21 February 2007 19:14, Paul E. McKenney wrote:
> > > On Tue, Feb 20, 2007 at 07:29:01PM +0100, Rafael J. Wysocki wrote:
> > > > On Tuesday, 20 February 2007 01:32, Rafael J. Wysocki wrote:
> > > > > On Tuesday, 20 February 2007 01:12, Oleg Nesterov wrote:
> > > > > Hm.  In the case discussed above we have a task that's right before calling
> > > > > frozen_process(), so we can't thaw it, because it's not frozen.  It will be
> > > > > frozen just in a while, but try_to_freeze_tasks() and thaw_tasks() have no
> > > > > way to check this.
> > > > > 
> > > > > I think to close this race the refrigerator should check TIF_FREEZE and set
> > > > > PF_FROZEN _and_ reset TIF_FREEZE under a lock
> 
> I personally think this is good. Not only does this allow us to close the race,
> but I think we can do more.
> 
> >                                                      that would also have to be
> > > > > taken by try_to_freeze_tasks() in the beginning of the error path.  This will
> > > > > ensure that all tasks either freeze themselves before the error path in
> > > > > try_to_freeze_tasks() is executed, or remain unfrozen.
> 
> How about taking this lock in thaw_tasks() instead (or as well)?

Good point.  If we take it in thaw_tasks(), then the tasks that have
TIF_FREEZE set, but haven't entered the refrigerator yet, won't be able to
enter the refrigerator until thaw_tasks() releases the lock ...
> 
> Currently we need a separate loop in thaw_tasks() to handle PF_FREEZER_SKIP. This
> means that PF_FREEZER_SKIP is not so generic: thaw_tasks() can't tolerate such
> a task being woken in between. What if we change thaw_process() to clear TIF_FREEZE?

... and then we can drop the PF_FREEZER_SKIP-handling loop and change
thaw_process() to clear TIF_FREEZE.

> Note also that we can use task_lock() instead of the global refrigerator_lock. This
> means that thaw_process() should take it too; this is probably a slowdown, but I
> think not much of one, because thaw_process() is going to write to p->flags anyway.
> In this case thaw_process() works perfectly as cancel_freezing_and_thaw() and
> can be used to fix exec/coredump in the future.

Hm, I think we can try doing this too.

I'll try to prepare a patch later today.

Greetings,
Rafael

* Re: freezer problems
  2007-02-21 20:03                                                       ` Oleg Nesterov
  2007-02-21 20:47                                                         ` Rafael J. Wysocki
@ 2007-02-21 21:06                                                         ` Paul E. McKenney
  2007-02-21 23:10                                                           ` Rafael J. Wysocki
  1 sibling, 1 reply; 92+ messages in thread
From: Paul E. McKenney @ 2007-02-21 21:06 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Rafael J. Wysocki, ego, akpm, paulmck, mingo, vatsa, dipankar,
	venkatesh.pallipadi, linux-kernel, Pavel Machek

On Wed, Feb 21, 2007 at 11:03:14PM +0300, Oleg Nesterov wrote:
> On 02/21, Rafael J. Wysocki wrote:
> >
> > On Wednesday, 21 February 2007 19:14, Paul E. McKenney wrote:
> > > On Tue, Feb 20, 2007 at 07:29:01PM +0100, Rafael J. Wysocki wrote:
> > > > On Tuesday, 20 February 2007 01:32, Rafael J. Wysocki wrote:
> > > > > On Tuesday, 20 February 2007 01:12, Oleg Nesterov wrote:
> > > > > Hm.  In the case discussed above we have a task that's right before calling
> > > > > frozen_process(), so we can't thaw it, because it's not frozen.  It will be
> > > > > frozen just in a while, but try_to_freeze_tasks() and thaw_tasks() have no
> > > > > way to check this.
> > > > > 
> > > > > I think to close this race the refrigerator should check TIF_FREEZE and set
> > > > > PF_FROZEN _and_ reset TIF_FREEZE under a lock
> 
> I personally think this is good. Not only does this allow us to close the race,
> but I think we can do more.
> 
> >                                                      that would also have to be
> > > > > taken by try_to_freeze_tasks() in the beginning of the error path.  This will
> > > > > ensure that all tasks either freeze themselves before the error path in
> > > > > try_to_freeze_tasks() is executed, or remain unfrozen.
> 
> How about taking this lock in thaw_tasks() instead (or as well)?
> 
> Currently we need a separate loop in thaw_tasks() to handle PF_FREEZER_SKIP. This
> means that PF_FREEZER_SKIP is not so generic: thaw_tasks() can't tolerate such
> a task being woken in between. What if we change thaw_process() to clear TIF_FREEZE?
> 
> Note also that we can use task_lock() instead of the global refrigerator_lock. This
> means that thaw_process() should take it too; this is probably a slowdown, but I
> think not much of one, because thaw_process() is going to write to p->flags anyway.
> In this case thaw_process() works perfectly as cancel_freezing_and_thaw() and
> can be used to fix exec/coredump in the future.

This sounds much better than a global lock to me!  ;-)

							Thanx, Paul

* Re: freezer problems
  2007-02-21 21:06                                                         ` Paul E. McKenney
@ 2007-02-21 23:10                                                           ` Rafael J. Wysocki
  2007-02-22 10:47                                                             ` Oleg Nesterov
  0 siblings, 1 reply; 92+ messages in thread
From: Rafael J. Wysocki @ 2007-02-21 23:10 UTC (permalink / raw)
  To: paulmck
  Cc: Oleg Nesterov, ego, akpm, paulmck, mingo, vatsa, dipankar,
	venkatesh.pallipadi, linux-kernel, Pavel Machek

On Wednesday, 21 February 2007 22:06, Paul E. McKenney wrote:
> On Wed, Feb 21, 2007 at 11:03:14PM +0300, Oleg Nesterov wrote:
> > On 02/21, Rafael J. Wysocki wrote:
> > >
> > > On Wednesday, 21 February 2007 19:14, Paul E. McKenney wrote:
> > > > On Tue, Feb 20, 2007 at 07:29:01PM +0100, Rafael J. Wysocki wrote:
> > > > > On Tuesday, 20 February 2007 01:32, Rafael J. Wysocki wrote:
> > > > > > On Tuesday, 20 February 2007 01:12, Oleg Nesterov wrote:
> > > > > > Hm.  In the case discussed above we have a task that's right before calling
> > > > > > frozen_process(), so we can't thaw it, because it's not frozen.  It will be
> > > > > > frozen just in a while, but try_to_freeze_tasks() and thaw_tasks() have no
> > > > > > way to check this.
> > > > > > 
> > > > > > I think to close this race the refrigerator should check TIF_FREEZE and set
> > > > > > PF_FROZEN _and_ reset TIF_FREEZE under a lock
> > 
> > I personally think this is good. Not only does this allow us to close the race,
> > but I think we can do more.
> > 
> > >                                                      that would also have to be
> > > > > > taken by try_to_freeze_tasks() in the beginning of the error path.  This will
> > > > > > ensure that all tasks either freeze themselves before the error path in
> > > > > > try_to_freeze_tasks() is executed, or remain unfrozen.
> > 
> > How about taking this lock in thaw_tasks() instead (or as well)?
> > 
> > Currently we need a separate loop in thaw_tasks() to handle PF_FREEZER_SKIP. This
> > means that PF_FREEZER_SKIP is not so generic: thaw_tasks() can't tolerate such
> > a task being woken in between. What if we change thaw_process() to clear TIF_FREEZE?
> > 
> > Note also that we can use task_lock() instead of the global refrigerator_lock. This
> > means that thaw_process() should take it too; this is probably a slowdown, but I
> > think not much of one, because thaw_process() is going to write to p->flags anyway.
> > In this case thaw_process() works perfectly as cancel_freezing_and_thaw() and
> > can be used to fix exec/coredump in the future.
> 
> This sounds much better than a global lock to me!  ;-)

Okay, below is what I have right now (compilation tested on x86_64):

This patch fixes the vfork problem by adding the PF_FREEZER_SKIP flag that
can be used by tasks to tell the freezer not to count them as freezeable and
making the vfork parents set this flag before they call wait_for_completion().

Secondly, it fixes the race that happens if a task with TIF_FREEZE set is
preempted right before calling frozen_process() in refrigerator() and stays
unfrozen until after thaw_tasks() runs and checks its status.  For this purpose
task_lock() is used.
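The effect of PF_FREEZER_SKIP on the freezer's bookkeeping can be sketched as a userspace model. This is not the patch's kernel/power/process.c code: it assumes a counting loop that ignores tasks which are either frozen or flagged to be skipped, and the `MODEL_*` flag values and `model_*` functions are hypothetical. It shows why a vfork parent blocked in wait_for_completion() no longer holds up the freeze: while its skip bit is set, it is simply not counted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits standing in for PF_FROZEN and PF_FREEZER_SKIP. */
#define MODEL_PF_FROZEN		0x1u
#define MODEL_PF_FREEZER_SKIP	0x2u

struct model_task {
	unsigned int flags;
};

/* Before a long wait (e.g. a vfork parent's wait_for_completion()),
 * the task marks itself as not to be counted... */
static void model_freezer_do_not_count(struct model_task *self)
{
	self->flags |= MODEL_PF_FREEZER_SKIP;
}

/* ...and clears the mark when it resumes. In the patch, freezer_count()
 * also calls try_to_freeze() at this point; the model omits that. */
static void model_freezer_count(struct model_task *self)
{
	self->flags &= ~MODEL_PF_FREEZER_SKIP;
}

/* Freezer side: how many tasks are still holding up the freeze. */
static size_t model_count_unfrozen(const struct model_task *tasks, size_t n)
{
	size_t todo = 0;

	for (size_t i = 0; i < n; i++) {
		if (tasks[i].flags & (MODEL_PF_FROZEN | MODEL_PF_FREEZER_SKIP))
			continue;
		todo++;
	}
	return todo;
}
```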

 include/linux/freezer.h |   34 ++++++++++++++++++++++++++++++++--
 include/linux/sched.h   |    1 +
 kernel/fork.c           |    3 +++
 kernel/power/process.c  |   36 +++++++++++++++++++-----------------
 4 files changed, 55 insertions(+), 19 deletions(-)

Index: linux-2.6.20-mm2/include/linux/sched.h
===================================================================
--- linux-2.6.20-mm2.orig/include/linux/sched.h
+++ linux-2.6.20-mm2/include/linux/sched.h
@@ -1189,6 +1189,7 @@ static inline void put_task_struct(struc
 #define PF_SPREAD_SLAB	0x02000000	/* Spread some slab caches over cpuset */
 #define PF_MEMPOLICY	0x10000000	/* Non-default NUMA mempolicy */
 #define PF_MUTEX_TESTER	0x20000000	/* Thread belongs to the rt mutex tester */
+#define PF_FREEZER_SKIP	0x40000000	/* Freezer should not count it as freezeable */
 
 /*
  * Only the _current_ task can read/write to tsk->flags, but other
Index: linux-2.6.20-mm2/include/linux/freezer.h
===================================================================
--- linux-2.6.20-mm2.orig/include/linux/freezer.h
+++ linux-2.6.20-mm2/include/linux/freezer.h
@@ -40,11 +40,15 @@ static inline void do_not_freeze(struct 
  */
 static inline int thaw_process(struct task_struct *p)
 {
+	task_lock(p);
 	if (frozen(p)) {
 		p->flags &= ~PF_FROZEN;
+		task_unlock(p);
 		wake_up_process(p);
 		return 1;
 	}
+	clear_tsk_thread_flag(p, TIF_FREEZE);
+	task_unlock(p);
 	return 0;
 }
 
@@ -71,7 +75,31 @@ static inline int try_to_freeze(void)
 		return 0;
 }
 
-extern void thaw_some_processes(int all);
+/*
+ * Tell the freezer not to count current task as freezeable
+ */
+static inline void freezer_do_not_count(void)
+{
+	current->flags |= PF_FREEZER_SKIP;
+}
+
+/*
+ * Try to freeze the current task and tell the freezer to count it as freezeable
+ * again
+ */
+static inline void freezer_count(void)
+{
+	current->flags &= ~PF_FREEZER_SKIP;
+	try_to_freeze();
+}
+
+/*
+ * Check if the task should be counted as freezeable by the freezer
+ */
+static inline int freezer_should_skip(struct task_struct *p)
+{
+	return !!(p->flags & PF_FREEZER_SKIP);
+}
 
 #else
 static inline int frozen(struct task_struct *p) { return 0; }
@@ -86,5 +114,7 @@ static inline void thaw_processes(void) 
 
 static inline int try_to_freeze(void) { return 0; }
 
-
+static inline void freezer_do_not_count(void) {}
+static inline void freezer_count(void) {}
+static inline int freezer_should_skip(struct task_struct *p) { return 0; }
 #endif
Index: linux-2.6.20-mm2/kernel/fork.c
===================================================================
--- linux-2.6.20-mm2.orig/kernel/fork.c
+++ linux-2.6.20-mm2/kernel/fork.c
@@ -50,6 +50,7 @@
 #include <linux/taskstats_kern.h>
 #include <linux/random.h>
 #include <linux/ptrace.h>
+#include <linux/freezer.h>
 
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
@@ -1393,7 +1394,9 @@ long do_fork(unsigned long clone_flags,
 		tracehook_report_clone_complete(clone_flags, nr, p);
 
 		if (clone_flags & CLONE_VFORK) {
+			freezer_do_not_count();
 			wait_for_completion(&vfork);
+			freezer_count();
 			tracehook_report_vfork_done(p, nr);
 		}
 	} else {
Index: linux-2.6.20-mm2/kernel/power/process.c
===================================================================
--- linux-2.6.20-mm2.orig/kernel/power/process.c
+++ linux-2.6.20-mm2/kernel/power/process.c
@@ -39,10 +39,18 @@ void refrigerator(void)
 	/* Hmm, should we be allowed to suspend when there are realtime
 	   processes around? */
 	long save;
+
+	task_lock(current);
+	if (freezing(current)) {
+		frozen_process(current);
+		task_unlock(current);
+	} else {
+		task_unlock(current);
+		return;
+	}
 	save = current->state;
 	pr_debug("%s entered refrigerator\n", current->comm);
 
-	frozen_process(current);
 	spin_lock_irq(&current->sighand->siglock);
 	recalc_sigpending(); /* We sent fake signal, clean it up */
 	spin_unlock_irq(&current->sighand->siglock);
@@ -79,12 +87,16 @@ static void cancel_freezing(struct task_
 {
 	unsigned long flags;
 
+	task_lock(p);
 	if (freezing(p)) {
 		pr_debug("  clean up: %s\n", p->comm);
 		do_not_freeze(p);
+		task_unlock(p);
 		spin_lock_irqsave(&p->sighand->siglock, flags);
 		recalc_sigpending_tsk(p);
 		spin_unlock_irqrestore(&p->sighand->siglock, flags);
+	} else {
+		task_unlock(p);
 	}
 }
 
@@ -119,22 +131,12 @@ static unsigned int try_to_freeze_tasks(
 				cancel_freezing(p);
 				continue;
 			}
-			if (is_user_space(p)) {
-				if (!freeze_user_space)
-					continue;
-
-				/* Freeze the task unless there is a vfork
-				 * completion pending
-				 */
-				if (!p->vfork_done)
-					freeze_process(p);
-			} else {
-				if (freeze_user_space)
-					continue;
+			if (is_user_space(p) == !freeze_user_space)
+				continue;
 
-				freeze_process(p);
-			}
-			todo++;
+			freeze_process(p);
+			if (!freezer_should_skip(p))
+				todo++;
 		} while_each_thread(g, p);
 		read_unlock(&tasklist_lock);
 		yield();			/* Yield is okay here */
@@ -207,7 +209,7 @@ static void thaw_tasks(int thaw_user_spa
 		if (is_user_space(p) == !thaw_user_space)
 			continue;
 
-		if (!thaw_process(p))
+		if (!thaw_process(p) && !freezer_should_skip(p))
 			printk(KERN_WARNING " Strange, %s not stopped\n",
 				p->comm );
 	} while_each_thread(g, p);



^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: freezer problems
  2007-02-21 23:10                                                           ` Rafael J. Wysocki
@ 2007-02-22 10:47                                                             ` Oleg Nesterov
  2007-02-22 11:33                                                               ` Oleg Nesterov
  2007-02-22 17:03                                                               ` Rafael J. Wysocki
  0 siblings, 2 replies; 92+ messages in thread
From: Oleg Nesterov @ 2007-02-22 10:47 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: paulmck, ego, akpm, paulmck, mingo, vatsa, dipankar,
	venkatesh.pallipadi, linux-kernel, Pavel Machek

On 02/22, Rafael J. Wysocki wrote:
>
> Okay, below is what I have right now (compilation tested on x86_64):
> 
> This patch fixes the vfork problem by adding the PF_FREEZER_SKIP flag that
> can be used by tasks to tell the freezer not to count them as freezeable and
> making the vfork parents set this flag before they call wait_for_completion().
> 
> Secondly, it fixes the race which happens if a task with TIF_FREEZE set is
> preempted right before calling frozen_process() in refrigerator() and stays
> unfrozen until after thaw_tasks() runs and checks its status.  For this purpose
> task_lock() is used.

Great! But please be kind to those of us who read the source control history
trying to understand the code. Could you make 2 separate patches?

> @@ -207,7 +209,7 @@ static void thaw_tasks(int thaw_user_spa
>  		if (is_user_space(p) == !thaw_user_space)
>  			continue;
>  
> -		if (!thaw_process(p))
> +		if (!thaw_process(p) && !freezer_should_skip(p))
>  			printk(KERN_WARNING " Strange, %s not stopped\n",

This is racy, the warning could be false. We wake up the task, testing
its ->flags is not reliable.

Damn. A PF_FREEZER_SKIP task could be woken before, clear PF_FREEZER_SKIP,
but not be frozen yet.

We can change freezer_count() to clear PF_FREEZER_SKIP after try_to_freeze(),
not before. Now thaw_process() can take PF_FREEZER_SKIP into account and
return "true".

But this means the task may be PF_FREEZER_SKIP | PF_FROZEN. What if we
call try_to_freeze_tasks() soon after thaw_tasks()? We may hit the task which
leaves the refrigerator, but didn't clear PF_FREEZER_SKIP yet. This means
that thaw_process() should clear PF_FREEZER_SKIP as well. This is messy :(
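The false warning can be replayed step by step in a small user-space model, assuming freezer_count() clears PF_FREEZER_SKIP before calling try_to_freeze(), as in the posted patch. The M_* bits, struct mtask and the helpers are hypothetical; thaw_tasks_warns() reproduces only the "!thaw_process(p) && !freezer_should_skip(p)" test.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model flags (not the kernel's definitions). */
#define M_TIF_FREEZE      0x1u
#define M_PF_FROZEN       0x2u
#define M_PF_FREEZER_SKIP 0x4u

struct mtask { unsigned flags; };

/* thaw_process() as in the patch: true if the task was frozen. */
static bool thaw(struct mtask *t)
{
	if (t->flags & M_PF_FROZEN) {
		t->flags &= ~M_PF_FROZEN;
		return true;
	}
	t->flags &= ~M_TIF_FREEZE;
	return false;
}

/* The check added to thaw_tasks(): true means
 * "Strange, %s not stopped" would be printed. */
static bool thaw_tasks_warns(struct mtask *t)
{
	bool was_frozen = thaw(t);

	return !was_frozen && !(t->flags & M_PF_FREEZER_SKIP);
}

/* First half of freezer_count() as posted: clear SKIP.  The call to
 * try_to_freeze() is deliberately left out, so the caller can
 * "preempt" the task between the two steps. */
static void freezer_count_step1(struct mtask *t)
{
	assert(t->flags & M_PF_FREEZER_SKIP);     /* only SKIP tasks get here */
	t->flags &= ~M_PF_FREEZER_SKIP;
}
```

Usage: a vfork parent that is woken, clears SKIP and is preempted before try_to_freeze() is reported by thaw_tasks() although it did nothing wrong.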

Any other ideas? In any case we should imho avoid a separate loop for
PF_FREEZER_SKIP tasks to just fix debug messages. In fact it can't help
anyway.

Oleg.



* Re: freezer problems
  2007-02-22 10:47                                                             ` Oleg Nesterov
@ 2007-02-22 11:33                                                               ` Oleg Nesterov
  2007-02-22 17:03                                                               ` Rafael J. Wysocki
  1 sibling, 0 replies; 92+ messages in thread
From: Oleg Nesterov @ 2007-02-22 11:33 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: paulmck, ego, akpm, paulmck, mingo, vatsa, dipankar,
	venkatesh.pallipadi, linux-kernel, Pavel Machek

On 02/22, Oleg Nesterov wrote:
>
> On 02/22, Rafael J. Wysocki wrote:
> >
> > @@ -207,7 +209,7 @@ static void thaw_tasks(int thaw_user_spa
> >  		if (is_user_space(p) == !thaw_user_space)
> >  			continue;
> >  
> > -		if (!thaw_process(p))
> > +		if (!thaw_process(p) && !freezer_should_skip(p))
> >  			printk(KERN_WARNING " Strange, %s not stopped\n",
> 
> This is racy, the warning could be false. We wake up the task, testing
> its ->flags is not reliable.
> 
> Damn. A PF_FREEZER_SKIP task could be woken before, clear PF_FREEZER_SKIP,
> but not be frozen yet.
> 
> We can change freezer_count() to clear PF_FREEZER_SKIP after try_to_freeze(),
> not before. Now thaw_process() can take PF_FREEZER_SKIP into account and
> return "true".
> 
> But this means the task may be PF_FREEZER_SKIP | PF_FROZEN. What if we
> call try_to_freeze_tasks() soon after thaw_tasks()? We may hit the task which
> leaves the refrigerator, but didn't clear PF_FREEZER_SKIP yet. This means
> that thaw_process() should clear PF_FREEZER_SKIP as well. This is messy :(
> 
> Any other ideas? In any case we should imho avoid a separate loop for
> PF_FREEZER_SKIP tasks to just fix debug messages. In fact it can't help
> anyway.

Probably: current clears PF_FREEZER_SKIP along with TIF_FREEZE "atomically"
under task_lock in refrigerator(). thaw_process() takes PF_FREEZER_SKIP into
account.
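A sketch of this proposal in the same kind of user-space model, assuming refrigerator entry and thaw_process() are each one task_lock() critical section. The M_* bits, struct mtask, refrigerator_enter() and thaw() are hypothetical. Because SKIP is dropped in the same step that sets FROZEN, thaw sees the task either as skipped or as frozen, wherever the vfork parent happens to be preempted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model flags (not the kernel's definitions). */
#define M_TIF_FREEZE      0x1u
#define M_PF_FROZEN       0x2u
#define M_PF_FREEZER_SKIP 0x4u

struct mtask { unsigned flags; };

/* refrigerator() entry, one task_lock() critical section: SKIP and
 * TIF_FREEZE are cleared in the same step that marks the task frozen. */
static bool refrigerator_enter(struct mtask *t)
{
	assert(!(t->flags & M_PF_FROZEN));
	if (!(t->flags & M_TIF_FREEZE))
		return false;                  /* request already withdrawn */
	t->flags &= ~(M_PF_FREEZER_SKIP | M_TIF_FREEZE);
	t->flags |= M_PF_FROZEN;
	return true;
}

/* thaw_process(), also under task_lock(): a SKIP task counts as OK.
 * SKIP itself is left alone; the vfork parent owns that bit. */
static bool thaw(struct mtask *t)
{
	bool ok = (t->flags & (M_PF_FROZEN | M_PF_FREEZER_SKIP)) != 0;

	t->flags &= ~(M_PF_FROZEN | M_TIF_FREEZE);
	return ok;
}
```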

Oleg.



* Re: freezer problems
  2007-02-22 10:47                                                             ` Oleg Nesterov
  2007-02-22 11:33                                                               ` Oleg Nesterov
@ 2007-02-22 17:03                                                               ` Rafael J. Wysocki
  2007-02-22 17:44                                                                 ` Oleg Nesterov
  2007-02-23  3:02                                                                 ` Gautham R Shenoy
  1 sibling, 2 replies; 92+ messages in thread
From: Rafael J. Wysocki @ 2007-02-22 17:03 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: paulmck, ego, akpm, paulmck, mingo, vatsa, dipankar,
	venkatesh.pallipadi, linux-kernel, Pavel Machek

[-- Attachment #1: Type: text/plain, Size: 2318 bytes --]

On Thursday, 22 February 2007 11:47, Oleg Nesterov wrote:
> On 02/22, Rafael J. Wysocki wrote:
> >
> > Okay, below is what I have right now (compilation tested on x86_64):
> > 
> > This patch fixes the vfork problem by adding the PF_FREEZER_SKIP flag that
> > can be used by tasks to tell the freezer not to count them as freezeable and
> > making the vfork parents set this flag before they call wait_for_completion().
> > 
> > Secondly, it fixes the race which happens if a task with TIF_FREEZE set is
> > preempted right before calling frozen_process() in refrigerator() and stays
> > unfrozen until after thaw_tasks() runs and checks its status.  For this purpose
> > task_lock() is used.
> 
> Great! But please be kind to those of us who read the source control history
> trying to understand the code. Could you make 2 separate patches?

Okay, attached.  The first one closes the race between thaw_tasks() and the
refrigerator that can occur if the freezing fails.  The second one fixes the
vfork problem (should go on top of the first one).

> > @@ -207,7 +209,7 @@ static void thaw_tasks(int thaw_user_spa
> >  		if (is_user_space(p) == !thaw_user_space)
> >  			continue;
> >  
> > -		if (!thaw_process(p))
> > +		if (!thaw_process(p) && !freezer_should_skip(p))
> >  			printk(KERN_WARNING " Strange, %s not stopped\n",
> 
> This is racy, the warning could be false. We wake up the task, testing
> its ->flags is not reliable.
> 
> Damn. A PF_FREEZER_SKIP task could be woken before, clear PF_FREEZER_SKIP,
> but not be frozen yet.
> 
> We can change freezer_count() to clear PF_FREEZER_SKIP after try_to_freeze(),
> not before. Now thaw_process() can take PF_FREEZER_SKIP into account and
> return "true".
> 
> But this means the task may be PF_FREEZER_SKIP | PF_FROZEN. What if we
> call try_to_freeze_tasks() soon after thaw_tasks()? We may hit the task which
> leaves the refrigerator, but didn't clear PF_FREEZER_SKIP yet. This means
> that thaw_process() should clear PF_FREEZER_SKIP as well. This is messy :(
> 
> Any other ideas? In any case we should imho avoid a separate loop for
> PF_FREEZER_SKIP tasks to just fix debug messages. In fact it can't help
> anyway.

Why don't we just drop the warning?  try_to_freeze_tasks() should give us a
warning if there's anything wrong anyway.

Greetings,
Rafael

[-- Attachment #2: freezer-fix-theoretical-race.patch --]
[-- Type: text/x-diff, Size: 1825 bytes --]

 include/linux/freezer.h |    4 ++++
 kernel/power/process.c  |   14 +++++++++++++-
 2 files changed, 17 insertions(+), 1 deletion(-)

Index: linux-2.6.20-mm2/include/linux/freezer.h
===================================================================
--- linux-2.6.20-mm2.orig/include/linux/freezer.h
+++ linux-2.6.20-mm2/include/linux/freezer.h
@@ -40,11 +40,15 @@ static inline void do_not_freeze(struct 
  */
 static inline int thaw_process(struct task_struct *p)
 {
+	task_lock(p);
 	if (frozen(p)) {
 		p->flags &= ~PF_FROZEN;
+		task_unlock(p);
 		wake_up_process(p);
 		return 1;
 	}
+	clear_tsk_thread_flag(p, TIF_FREEZE);
+	task_unlock(p);
 	return 0;
 }
 
Index: linux-2.6.20-mm2/kernel/power/process.c
===================================================================
--- linux-2.6.20-mm2.orig/kernel/power/process.c
+++ linux-2.6.20-mm2/kernel/power/process.c
@@ -39,10 +39,18 @@ void refrigerator(void)
 	/* Hmm, should we be allowed to suspend when there are realtime
 	   processes around? */
 	long save;
+
+	task_lock(current);
+	if (freezing(current)) {
+		frozen_process(current);
+		task_unlock(current);
+	} else {
+		task_unlock(current);
+		return;
+	}
 	save = current->state;
 	pr_debug("%s entered refrigerator\n", current->comm);
 
-	frozen_process(current);
 	spin_lock_irq(&current->sighand->siglock);
 	recalc_sigpending(); /* We sent fake signal, clean it up */
 	spin_unlock_irq(&current->sighand->siglock);
@@ -79,12 +87,16 @@ static void cancel_freezing(struct task_
 {
 	unsigned long flags;
 
+	task_lock(p);
 	if (freezing(p)) {
 		pr_debug("  clean up: %s\n", p->comm);
 		do_not_freeze(p);
+		task_unlock(p);
 		spin_lock_irqsave(&p->sighand->siglock, flags);
 		recalc_sigpending_tsk(p);
 		spin_unlock_irqrestore(&p->sighand->siglock, flags);
+	} else {
+		task_unlock(p);
 	}
 }
 

[-- Attachment #3: freezer-fix-vfork-problem.patch --]
[-- Type: text/x-diff, Size: 4348 bytes --]

 include/linux/freezer.h |   30 ++++++++++++++++++++++++++++--
 include/linux/sched.h   |    1 +
 kernel/fork.c           |    3 +++
 kernel/power/process.c  |   29 +++++++++--------------------
 4 files changed, 41 insertions(+), 22 deletions(-)

Index: linux-2.6.20-mm2/include/linux/sched.h
===================================================================
--- linux-2.6.20-mm2.orig/include/linux/sched.h
+++ linux-2.6.20-mm2/include/linux/sched.h
@@ -1189,6 +1189,7 @@ static inline void put_task_struct(struc
 #define PF_SPREAD_SLAB	0x02000000	/* Spread some slab caches over cpuset */
 #define PF_MEMPOLICY	0x10000000	/* Non-default NUMA mempolicy */
 #define PF_MUTEX_TESTER	0x20000000	/* Thread belongs to the rt mutex tester */
+#define PF_FREEZER_SKIP	0x40000000	/* Freezer should not count it as freezeable */
 
 /*
  * Only the _current_ task can read/write to tsk->flags, but other
Index: linux-2.6.20-mm2/include/linux/freezer.h
===================================================================
--- linux-2.6.20-mm2.orig/include/linux/freezer.h
+++ linux-2.6.20-mm2/include/linux/freezer.h
@@ -75,7 +75,31 @@ static inline int try_to_freeze(void)
 		return 0;
 }
 
-extern void thaw_some_processes(int all);
+/*
+ * Tell the freezer not to count current task as freezeable
+ */
+static inline void freezer_do_not_count(void)
+{
+	current->flags |= PF_FREEZER_SKIP;
+}
+
+/*
+ * Try to freeze the current task and tell the freezer to count it as freezeable
+ * again
+ */
+static inline void freezer_count(void)
+{
+	current->flags &= ~PF_FREEZER_SKIP;
+	try_to_freeze();
+}
+
+/*
+ * Check if the task should be counted as freezeable by the freezer
+ */
+static inline int freezer_should_skip(struct task_struct *p)
+{
+	return !!(p->flags & PF_FREEZER_SKIP);
+}
 
 #else
 static inline int frozen(struct task_struct *p) { return 0; }
@@ -90,5 +114,7 @@ static inline void thaw_processes(void) 
 
 static inline int try_to_freeze(void) { return 0; }
 
-
+static inline void freezer_do_not_count(void) {}
+static inline void freezer_count(void) {}
+static inline int freezer_should_skip(struct task_struct *p) { return 0; }
 #endif
Index: linux-2.6.20-mm2/kernel/fork.c
===================================================================
--- linux-2.6.20-mm2.orig/kernel/fork.c
+++ linux-2.6.20-mm2/kernel/fork.c
@@ -50,6 +50,7 @@
 #include <linux/taskstats_kern.h>
 #include <linux/random.h>
 #include <linux/ptrace.h>
+#include <linux/freezer.h>
 
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
@@ -1393,7 +1394,9 @@ long do_fork(unsigned long clone_flags,
 		tracehook_report_clone_complete(clone_flags, nr, p);
 
 		if (clone_flags & CLONE_VFORK) {
+			freezer_do_not_count();
 			wait_for_completion(&vfork);
+			freezer_count();
 			tracehook_report_vfork_done(p, nr);
 		}
 	} else {
Index: linux-2.6.20-mm2/kernel/power/process.c
===================================================================
--- linux-2.6.20-mm2.orig/kernel/power/process.c
+++ linux-2.6.20-mm2/kernel/power/process.c
@@ -131,22 +131,12 @@ static unsigned int try_to_freeze_tasks(
 				cancel_freezing(p);
 				continue;
 			}
-			if (is_user_space(p)) {
-				if (!freeze_user_space)
-					continue;
-
-				/* Freeze the task unless there is a vfork
-				 * completion pending
-				 */
-				if (!p->vfork_done)
-					freeze_process(p);
-			} else {
-				if (freeze_user_space)
-					continue;
+			if (is_user_space(p) == !freeze_user_space)
+				continue;
 
-				freeze_process(p);
-			}
-			todo++;
+			freeze_process(p);
+			if (!freezer_should_skip(p))
+				todo++;
 		} while_each_thread(g, p);
 		read_unlock(&tasklist_lock);
 		yield();			/* Yield is okay here */
@@ -171,8 +161,9 @@ static unsigned int try_to_freeze_tasks(
 			if (is_user_space(p) == !freeze_user_space)
 				continue;
 
-			if (freezeable(p) && !frozen(p))
-				printk(KERN_ERR " %s\n", p->comm);
+			if (freezeable(p) && !frozen(p) &&
+			    !freezer_should_skip(p))
+				printk(KERN_WARNING " %s\n", p->comm);
 
 			cancel_freezing(p);
 		} while_each_thread(g, p);
@@ -219,9 +210,7 @@ static void thaw_tasks(int thaw_user_spa
 		if (is_user_space(p) == !thaw_user_space)
 			continue;
 
-		if (!thaw_process(p))
-			printk(KERN_WARNING " Strange, %s not stopped\n",
-				p->comm );
+		thaw_process(p);
 	} while_each_thread(g, p);
 	read_unlock(&tasklist_lock);
 }


* Re: freezer problems
  2007-02-22 17:03                                                               ` Rafael J. Wysocki
@ 2007-02-22 17:44                                                                 ` Oleg Nesterov
  2007-02-22 21:56                                                                   ` Rafael J. Wysocki
  2007-02-23  3:02                                                                 ` Gautham R Shenoy
  1 sibling, 1 reply; 92+ messages in thread
From: Oleg Nesterov @ 2007-02-22 17:44 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: paulmck, ego, akpm, paulmck, mingo, vatsa, dipankar,
	venkatesh.pallipadi, linux-kernel, Pavel Machek

On 02/22, Rafael J. Wysocki wrote:
> 
> Okay, attached.  The first one closes the race between thaw_tasks() and the
> refrigerator that can occur if the freezing fails.  The second one fixes the
> vfork problem (should go on top of the first one).

Looks good to me.

> > Any other ideas? In any case we should imho avoid a separate loop for
> > PF_FREEZER_SKIP tasks to just fix debug messages. In fact it can't help
> > anyway.
> 
> Why don't we just drop the warning?  try_to_freeze_tasks() should give us a
> warning if there's anything wrong anyway.

Indeed :)

Oleg.



* Re: freezer problems
  2007-02-22 17:44                                                                 ` Oleg Nesterov
@ 2007-02-22 21:56                                                                   ` Rafael J. Wysocki
  2007-02-23 18:15                                                                     ` Oleg Nesterov
  0 siblings, 1 reply; 92+ messages in thread
From: Rafael J. Wysocki @ 2007-02-22 21:56 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: paulmck, ego, akpm, paulmck, mingo, vatsa, dipankar,
	venkatesh.pallipadi, linux-kernel, Pavel Machek

On Thursday, 22 February 2007 18:44, Oleg Nesterov wrote:
> On 02/22, Rafael J. Wysocki wrote:
> > 
> > Okay, attached.  The first one closes the race between thaw_tasks() and the
> > refrigerator that can occur if the freezing fails.  The second one fixes the
> > vfork problem (should go on top of the first one).
> 
> Looks good to me.
> 
> > > Any other ideas? In any case we should imho avoid a separate loop for
> > > PF_FREEZER_SKIP tasks to just fix debug messages. In fact it can't help
> > > anyway.
> > 
> > Why don't we just drop the warning?  try_to_freeze_tasks() should give us a
> > warning if there's anything wrong anyway.
> 
> Indeed :)

Still, there is a tiny race in the error path of try_to_freeze_tasks(), where
a vfork parent process can be preempted after clearing PF_FREEZER_SKIP and
before entering refrigerator(), so try_to_freeze_tasks() will mistakenly report
that this process has caused a problem.

I think this race can be closed by (1) clearing PF_FREEZER_SKIP after calling
try_to_freeze() in freezer_count(), (2) clearing PF_FREEZER_SKIP in
refrigerator() before calling frozen_process() and (3) taking task_lock()
around the warning check in the error path of try_to_freeze_tasks().
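A user-space sketch of steps (2) and (3), assuming each task_lock() region is one atomic step; the M_* bits, struct mtask, refrigerator_enter() and error_path_reports() are hypothetical. Step (1) is left out of the model because it only takes effect once try_to_freeze() has returned. The check never blames the vfork parent: before the refrigerator it still has SKIP set, and inside it the task is frozen.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model flags (not the kernel's definitions). */
#define M_TIF_FREEZE      0x1u
#define M_PF_FROZEN       0x2u
#define M_PF_FREEZER_SKIP 0x4u

struct mtask { unsigned flags; };

/* (2) refrigerator(): under task_lock, drop SKIP just before marking
 * the task frozen.  Returns true if the task froze. */
static bool refrigerator_enter(struct mtask *t)
{
	assert(!(t->flags & M_PF_FROZEN));
	if (!(t->flags & M_TIF_FREEZE))
		return false;
	t->flags &= ~(M_PF_FREEZER_SKIP | M_TIF_FREEZE);
	t->flags |= M_PF_FROZEN;
	return true;
}

/* (3) Error-path check in try_to_freeze_tasks(), taken under
 * task_lock: true means the task would be reported as the culprit.
 * freezeable(p) is assumed true for the modeled tasks. */
static bool error_path_reports(const struct mtask *t)
{
	return !(t->flags & M_PF_FROZEN) && !(t->flags & M_PF_FREEZER_SKIP);
}
```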

I have modified freezer-fix-vfork-problem.patch to implement this (appended;
it assumes that freezer-fix-theoretical-race.patch has already been applied).

If this is the right thing to do, I think there's a reason to additionally move
task_lock/unlock() from cancel_freezing() to the error path in
try_to_freeze_tasks().

Greetings,
Rafael


 include/linux/freezer.h |   30 ++++++++++++++++++++++++++++--
 include/linux/sched.h   |    1 +
 kernel/fork.c           |    3 +++
 kernel/power/process.c  |   32 +++++++++++++-------------------
 4 files changed, 45 insertions(+), 21 deletions(-)

Index: linux-2.6.20-mm2/include/linux/sched.h
===================================================================
--- linux-2.6.20-mm2.orig/include/linux/sched.h
+++ linux-2.6.20-mm2/include/linux/sched.h
@@ -1189,6 +1189,7 @@ static inline void put_task_struct(struc
 #define PF_SPREAD_SLAB	0x02000000	/* Spread some slab caches over cpuset */
 #define PF_MEMPOLICY	0x10000000	/* Non-default NUMA mempolicy */
 #define PF_MUTEX_TESTER	0x20000000	/* Thread belongs to the rt mutex tester */
+#define PF_FREEZER_SKIP	0x40000000	/* Freezer should not count it as freezeable */
 
 /*
  * Only the _current_ task can read/write to tsk->flags, but other
Index: linux-2.6.20-mm2/include/linux/freezer.h
===================================================================
--- linux-2.6.20-mm2.orig/include/linux/freezer.h
+++ linux-2.6.20-mm2/include/linux/freezer.h
@@ -75,7 +75,31 @@ static inline int try_to_freeze(void)
 		return 0;
 }
 
-extern void thaw_some_processes(int all);
+/*
+ * Tell the freezer not to count current task as freezeable
+ */
+static inline void freezer_do_not_count(void)
+{
+	current->flags |= PF_FREEZER_SKIP;
+}
+
+/*
+ * Try to freeze the current task and tell the freezer to count it as freezeable
+ * again
+ */
+static inline void freezer_count(void)
+{
+	try_to_freeze();
+	current->flags &= ~PF_FREEZER_SKIP;
+}
+
+/*
+ * Check if the task should be counted as freezeable by the freezer
+ */
+static inline int freezer_should_skip(struct task_struct *p)
+{
+	return !!(p->flags & PF_FREEZER_SKIP);
+}
 
 #else
 static inline int frozen(struct task_struct *p) { return 0; }
@@ -90,5 +114,7 @@ static inline void thaw_processes(void) 
 
 static inline int try_to_freeze(void) { return 0; }
 
-
+static inline void freezer_do_not_count(void) {}
+static inline void freezer_count(void) {}
+static inline int freezer_should_skip(struct task_struct *p) { return 0; }
 #endif
Index: linux-2.6.20-mm2/kernel/fork.c
===================================================================
--- linux-2.6.20-mm2.orig/kernel/fork.c
+++ linux-2.6.20-mm2/kernel/fork.c
@@ -50,6 +50,7 @@
 #include <linux/taskstats_kern.h>
 #include <linux/random.h>
 #include <linux/ptrace.h>
+#include <linux/freezer.h>
 
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
@@ -1393,7 +1394,9 @@ long do_fork(unsigned long clone_flags,
 		tracehook_report_clone_complete(clone_flags, nr, p);
 
 		if (clone_flags & CLONE_VFORK) {
+			freezer_do_not_count();
 			wait_for_completion(&vfork);
+			freezer_count();
 			tracehook_report_vfork_done(p, nr);
 		}
 	} else {
Index: linux-2.6.20-mm2/kernel/power/process.c
===================================================================
--- linux-2.6.20-mm2.orig/kernel/power/process.c
+++ linux-2.6.20-mm2/kernel/power/process.c
@@ -42,6 +42,7 @@ void refrigerator(void)
 
 	task_lock(current);
 	if (freezing(current)) {
+		current->flags &= ~PF_FREEZER_SKIP;
 		frozen_process(current);
 		task_unlock(current);
 	} else {
@@ -131,22 +132,12 @@ static unsigned int try_to_freeze_tasks(
 				cancel_freezing(p);
 				continue;
 			}
-			if (is_user_space(p)) {
-				if (!freeze_user_space)
-					continue;
-
-				/* Freeze the task unless there is a vfork
-				 * completion pending
-				 */
-				if (!p->vfork_done)
-					freeze_process(p);
-			} else {
-				if (freeze_user_space)
-					continue;
+			if (is_user_space(p) == !freeze_user_space)
+				continue;
 
-				freeze_process(p);
-			}
-			todo++;
+			freeze_process(p);
+			if (!freezer_should_skip(p))
+				todo++;
 		} while_each_thread(g, p);
 		read_unlock(&tasklist_lock);
 		yield();			/* Yield is okay here */
@@ -171,9 +162,14 @@ static unsigned int try_to_freeze_tasks(
 			if (is_user_space(p) == !freeze_user_space)
 				continue;
 
-			if (freezeable(p) && !frozen(p))
+			task_lock(p);
+
+			if (freezeable(p) && !frozen(p) &&
+			    !freezer_should_skip(p))
 				printk(KERN_ERR " %s\n", p->comm);
 
+			task_unlock(p);
+
 			cancel_freezing(p);
 		} while_each_thread(g, p);
 		read_unlock(&tasklist_lock);
@@ -219,9 +215,7 @@ static void thaw_tasks(int thaw_user_spa
 		if (is_user_space(p) == !thaw_user_space)
 			continue;
 
-		if (!thaw_process(p))
-			printk(KERN_WARNING " Strange, %s not stopped\n",
-				p->comm );
+		thaw_process(p);
 	} while_each_thread(g, p);
 	read_unlock(&tasklist_lock);
 }



* Re: freezer problems
  2007-02-22 17:03                                                               ` Rafael J. Wysocki
  2007-02-22 17:44                                                                 ` Oleg Nesterov
@ 2007-02-23  3:02                                                                 ` Gautham R Shenoy
  1 sibling, 0 replies; 92+ messages in thread
From: Gautham R Shenoy @ 2007-02-23  3:02 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Oleg Nesterov, paulmck, akpm, paulmck, mingo, vatsa, dipankar,
	venkatesh.pallipadi, linux-kernel, Pavel Machek

On Thu, Feb 22, 2007 at 06:03:37PM +0100, Rafael J. Wysocki wrote:
> 
> Why don't we just drop the warning?  try_to_freeze_tasks() should give us a
> warning if there's anything wrong anyway.

The patches look good. I will add my hotplug changes on top of
these.

And yeah, removing the warnings from thaw_processes
sounds like a good thing to me. I was constantly getting warnings like
"Strange, but ksoftirqd/1 was not frozen" when ksoftirqd was created in 
the CPU_UP path from the frozen context! 
I was wondering if freezing the processes created from the
frozen context is a good thing to silence these warnings, but I guess that
would open up some new races.


> 
> Greetings,
> Rafael

regards
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"


* Re: freezer problems
  2007-02-22 21:56                                                                   ` Rafael J. Wysocki
@ 2007-02-23 18:15                                                                     ` Oleg Nesterov
  0 siblings, 0 replies; 92+ messages in thread
From: Oleg Nesterov @ 2007-02-23 18:15 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: paulmck, ego, akpm, paulmck, mingo, vatsa, dipankar,
	venkatesh.pallipadi, linux-kernel, Pavel Machek

On 02/22, Rafael J. Wysocki wrote:
>
> On Thursday, 22 February 2007 18:44, Oleg Nesterov wrote:
> > On 02/22, Rafael J. Wysocki wrote:
> > > 
> > > Okay, attached.  The first one closes the race between thaw_tasks() and the
> > > refrigerator that can occur if the freezing fails.  The second one fixes the
> > > vfork problem (should go on top of the first one).
> > 
> > Looks good to me.
> > 
> > > > Any other ideas? In any case we should imho avoid a separate loop for
> > > > PF_FREEZER_SKIP tasks to just fix debug messages. In fact it can't help
> > > > anyway.
> > > 
> > > Why don't we just drop the warning?  try_to_freeze_tasks() should give us a
> > > warning if there's anything wrong anyway.
> > 
> > Indeed :)
> 
> Still, there is a tiny race in the error path of try_to_freeze_tasks(), where
> a vfork parent process can be preempted after clearing PF_FREEZER_SKIP and
> before entering refrigerator(), so try_to_freeze_tasks() will mistakenly report
> that this process has caused a problem.

Yes, basically the same race. But unlike thaw_tasks(), this is the error path,
so it is not so critical to filter out the false warnings. We can't filter them
all out anyway: since try_to_freeze_tasks() failed, new processes can be created,
and we don't filter out "TASK_TRACED && frozen(p->parent)", etc.

> I think this race can be closed by (1) clearing PF_FREEZER_SKIP after calling
> try_to_freeze() in freezer_count(), (2) clearing PF_FREEZER_SKIP in
> refrigerator() before calling frozen_process() and (3) taking task_lock()
> around the warning check in the error path of try_to_freeze_tasks().

I am a bit confused by this patch. Which method does it use?

> +static inline void freezer_count(void)
> +{
> +	try_to_freeze();
> +	current->flags &= ~PF_FREEZER_SKIP;
> +}
>
> ...
>
> @@ -42,6 +42,7 @@ void refrigerator(void)
>  
>  	task_lock(current);
>  	if (freezing(current)) {
> +		current->flags &= ~PF_FREEZER_SKIP;

Isn't it better to clear PF_FREEZER_SKIP unconditionally? freezer_count() will
do try_to_freeze(), nothing more.

It is not safe to clear PF_FREEZER_SKIP in freezer_count() after try_to_freeze().
PF_FREEZER_SKIP is a promise to try_to_freeze(). try_to_freeze_tasks() fails,
does cancel_freezing(), a CLONE_VFORK parent returns from try_to_freeze() with
PF_FREEZER_SKIP set, and now comes another try_to_freeze_tasks() ?

Surely, this can't happen in practice, but still.
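The scenario can be replayed in a small user-space model, assuming the ordering in the patch under review (try_to_freeze() first, then clear PF_FREEZER_SKIP). The M_* bits, struct mtask, freeze_round(), m_cancel_freezing() and m_try_to_freeze() are hypothetical stand-ins that only mirror the kernel names. The second freezing attempt ends with todo == 0 while the vfork parent is still running; it would only freeze at its next try_to_freeze().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model flags (not the kernel's definitions). */
#define M_TIF_FREEZE      0x1u
#define M_PF_FROZEN       0x2u
#define M_PF_FREEZER_SKIP 0x4u

struct mtask { unsigned flags; };

/* One pass of try_to_freeze_tasks(): set the request on every unfrozen
 * task and return how many it still waits for (SKIP tasks excluded). */
static int freeze_round(struct mtask *t, int n)
{
	int todo = 0;

	for (int i = 0; i < n; i++) {
		if (t[i].flags & M_PF_FROZEN)
			continue;
		t[i].flags |= M_TIF_FREEZE;
		if (!(t[i].flags & M_PF_FREEZER_SKIP))
			todo++;
	}
	return todo;
}

/* cancel_freezing() after a failed attempt: withdraw the request. */
static void m_cancel_freezing(struct mtask *t)
{
	t->flags &= ~M_TIF_FREEZE;
}

/* try_to_freeze(): freeze only if a request is pending; entering the
 * refrigerator drops SKIP, as in the modified refrigerator(). */
static bool m_try_to_freeze(struct mtask *t)
{
	assert(!(t->flags & M_PF_FROZEN));
	if (!(t->flags & M_TIF_FREEZE))
		return false;
	t->flags = (t->flags & ~(M_TIF_FREEZE | M_PF_FREEZER_SKIP)) | M_PF_FROZEN;
	return true;
}
```

Usage: t[0] is the vfork parent and t[1] an ordinary task; the first attempt fails and is cancelled, the parent returns from try_to_freeze() with SKIP still set, and a new attempt then succeeds without it.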

Oleg.



* Re: freezer problems
       [not found]         ` <20070223110201.GC10973@in.ibm.com>
@ 2007-02-23 19:03           ` Gautham R Shenoy
  0 siblings, 0 replies; 92+ messages in thread
From: Gautham R Shenoy @ 2007-02-23 19:03 UTC (permalink / raw)
  To: Srivatsa Vaddagiri; +Cc: Rafael J. Wysocki, Aneesh Kumar, oleg, linux-kernel

On Fri, Feb 23, 2007 at 04:32:01PM +0530, Srivatsa Vaddagiri wrote:
> On Fri, Feb 23, 2007 at 04:17:23PM +0530, Srivatsa Vaddagiri wrote:
> > Ok that was my point of concern. For hotplug we would ideally like
> > everyone to be frozen. If we are not freezing some (like vfork parents),
> > (rather if we dont -wait- for them to get frozen) before offlining a
> > cpu, then it may expose some hotplug unsafe code in the caller of vfork
> > in kernel - hopefully that is not an issue practically speaking.
> 
> I notice that __call_usermodehelper() work function calls kernel_thread with
> CLONE_VFORK set. __call_usermodehelper() is usually called in the
> context of a worker thread (kevent).

But I see __call_usermodehelper being called from the context of
khelper_wq which is a singlethreaded workqueue.

I thought we were not planning to freeze singlethreaded workqueues for
hotplug, since we are not kthread_stopping them anywhere.

So the case where kthread_stop() waits for the parent (khelper_wq), which is
blocked in wait_for_completion(child->vfork_done), shouldn't occur. No?

Thanks and Regards
gautham.

-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"
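The wait chain discussed above can be modelled in user space. In it, a stopper waits for khelper, and khelper waits for its vfork child. This is a sketch under assumptions. The completion helpers are pthread re-implementations that only borrow the kernel's names. The functions child_fn(), parent_fn() and run_wait_chain() are hypothetical. In the kernel, a CLONE_VFORK parent sleeps until the child releases its mm on exec or exit. Here the child simply calls complete() on a completion that lives on the parent's stack.

```c
#include <assert.h>   /* for the checks in the usage example */
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Pthread stand-in for struct completion; borrows the kernel names only. */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)                 /* tolerate spurious wakeups */
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Global step counter so the order of events can be checked afterwards. */
static pthread_mutex_t step_lock = PTHREAD_MUTEX_INITIALIZER;
static int next_step;
static int child_step = -1, parent_step = -1;

static int take_step(void)
{
	int s;

	pthread_mutex_lock(&step_lock);
	s = next_step++;
	pthread_mutex_unlock(&step_lock);
	return s;
}

/* Hypothetical vfork child: signals the parent when it "execs or exits". */
static void *child_fn(void *arg)
{
	struct completion *vfork_done = arg;

	child_step = take_step();
	complete(vfork_done);
	return NULL;
}

/* Hypothetical CLONE_VFORK parent (khelper's role): sleeps until the child is done. */
static void *parent_fn(void *arg)
{
	struct completion vfork;         /* lives on the parent's stack */
	pthread_t child;

	(void)arg;
	init_completion(&vfork);
	if (pthread_create(&child, NULL, child_fn, &vfork) != 0)
		return NULL;
	wait_for_completion(&vfork);
	parent_step = take_step();
	pthread_join(child, NULL);       /* child is done touching vfork */
	pthread_cond_destroy(&vfork.cond);
	pthread_mutex_destroy(&vfork.lock);
	return NULL;
}

/* Plays kthread_stop()'s role: waits for the parent thread to finish. */
static int run_wait_chain(void)
{
	pthread_t parent;

	if (pthread_create(&parent, NULL, parent_fn, NULL) != 0)
		return -1;
	pthread_join(parent, NULL);
	return take_step();
}
```

Build with `-pthread`. The synchronization makes the order deterministic on a single call: the child takes step 0, the parent step 1 and the stopper step 2. If the child could never reach complete(), for instance because it was frozen, every waiter up the chain would block. That is why it matters whether anyone kthread_stop()s khelper during hotplug.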


end of thread, other threads:[~2007-02-27  4:10 UTC | newest]

Thread overview: 92+ messages
2007-02-14 14:40 [RFC PATCH(Experimental) 0/4] Freezer based Cpu-hotplug Gautham R Shenoy
2007-02-14 14:42 ` [RFC PATCH(Experimental) 1/4] freezer-cpu-hotplug core Gautham R Shenoy
2007-02-14 14:43   ` [RFC PATCH(Experimental) 2/4] Revert changes to workqueue.c Gautham R Shenoy
2007-02-14 14:43     ` [RFC PATCH(Experimental) 3/4] Revert changes to sched.c and slab.c Gautham R Shenoy
2007-02-14 14:44       ` [RFC PATCH(Experimental) 4/4] Rip out lock_cpu_hotplug from linux Gautham R Shenoy
2007-02-14 14:59     ` [RFC PATCH(Experimental) 2/4] Revert changes to workqueue.c Srivatsa Vaddagiri
2007-02-14 15:24     ` Srivatsa Vaddagiri
2007-02-14 20:23       ` Oleg Nesterov
2007-02-14 20:09     ` Oleg Nesterov
2007-02-16  5:26       ` Srivatsa Vaddagiri
2007-02-16 15:33         ` Oleg Nesterov
2007-02-16 16:47           ` Srivatsa Vaddagiri
2007-02-16 18:45             ` Oleg Nesterov
2007-02-16 23:59             ` Oleg Nesterov
2007-02-17  2:29               ` Srivatsa Vaddagiri
2007-02-17 21:59                 ` Oleg Nesterov
2007-02-20 15:12                   ` Srivatsa Vaddagiri
2007-02-20 20:09                     ` Oleg Nesterov
2007-02-21  6:29                       ` Srivatsa Vaddagiri
2007-02-21 14:30                         ` Oleg Nesterov
2007-02-21 14:37                           ` Gautham R Shenoy
2007-02-21 15:53                           ` Srivatsa Vaddagiri
2007-02-14 15:31   ` [RFC PATCH(Experimental) 1/4] freezer-cpu-hotplug core Srivatsa Vaddagiri
2007-02-14 19:47   ` Oleg Nesterov
2007-02-16  6:48     ` Srivatsa Vaddagiri
2007-02-16 15:47       ` Oleg Nesterov
2007-02-14 20:22   ` Oleg Nesterov
2007-02-16  7:16     ` Srivatsa Vaddagiri
2007-02-16  8:12       ` Srivatsa Vaddagiri
2007-02-16  9:29         ` Rafael J. Wysocki
2007-02-16  9:59           ` Srivatsa Vaddagiri
2007-02-16 11:06             ` Rafael J. Wysocki
2007-02-16 19:46         ` Oleg Nesterov
2007-02-17  2:31           ` Srivatsa Vaddagiri
2007-02-17  5:32         ` Gautham R Shenoy
2007-02-17 11:19           ` Gautham R Shenoy
2007-02-16 16:06       ` Oleg Nesterov
2007-02-14 21:43 ` [RFC PATCH(Experimental) 0/4] Freezer based Cpu-hotplug Rafael J. Wysocki
2007-02-15  6:34   ` Gautham R Shenoy
2007-02-15  8:09     ` Rafael J. Wysocki
2007-02-15 12:20       ` Gautham R Shenoy
2007-02-15 13:31         ` Rafael J. Wysocki
2007-02-15 14:25           ` Gautham R Shenoy
2007-02-17 11:24             ` Rafael J. Wysocki
2007-02-17 21:34               ` Oleg Nesterov
2007-02-17 22:24                 ` Rafael J. Wysocki
2007-02-17 23:42                   ` Oleg Nesterov
2007-02-17 23:47                     ` Oleg Nesterov
2007-02-18 10:43                       ` Rafael J. Wysocki
2007-02-18 11:31                         ` Oleg Nesterov
2007-02-18 12:14                           ` Rafael J. Wysocki
2007-02-18 14:52                             ` freezer problems Oleg Nesterov
2007-02-18 15:14                               ` Rafael J. Wysocki
2007-02-18 16:19                                 ` Oleg Nesterov
2007-02-18 18:14                                   ` Rafael J. Wysocki
2007-02-18 18:56                               ` Rafael J. Wysocki
2007-02-18 22:01                                 ` Oleg Nesterov
2007-02-18 23:19                                   ` Rafael J. Wysocki
2007-02-19 20:23                                     ` Oleg Nesterov
2007-02-19 21:21                                       ` Rafael J. Wysocki
2007-02-19 22:41                                         ` Oleg Nesterov
2007-02-19 23:35                                           ` Rafael J. Wysocki
2007-02-20  0:12                                             ` Oleg Nesterov
2007-02-20  0:32                                               ` Rafael J. Wysocki
2007-02-20  0:50                                                 ` Oleg Nesterov
2007-02-20 18:28                                                   ` Rafael J. Wysocki
2007-02-20 18:29                                                 ` Rafael J. Wysocki
2007-02-21 18:14                                                   ` Paul E. McKenney
2007-02-21 18:13                                                     ` Rafael J. Wysocki
2007-02-21 18:27                                                       ` Paul E. McKenney
2007-02-21 20:03                                                       ` Oleg Nesterov
2007-02-21 20:47                                                         ` Rafael J. Wysocki
2007-02-21 21:06                                                         ` Paul E. McKenney
2007-02-21 23:10                                                           ` Rafael J. Wysocki
2007-02-22 10:47                                                             ` Oleg Nesterov
2007-02-22 11:33                                                               ` Oleg Nesterov
2007-02-22 17:03                                                               ` Rafael J. Wysocki
2007-02-22 17:44                                                                 ` Oleg Nesterov
2007-02-22 21:56                                                                   ` Rafael J. Wysocki
2007-02-23 18:15                                                                     ` Oleg Nesterov
2007-02-23  3:02                                                                 ` Gautham R Shenoy
2007-02-18 15:09                             ` [RFC PATCH(Experimental) 0/4] Freezer based Cpu-hotplug Rafael J. Wysocki
2007-02-18 16:11                               ` Oleg Nesterov
2007-02-18 18:51                                 ` Rafael J. Wysocki
2007-02-18 10:32                     ` Rafael J. Wysocki
2007-02-18 11:32                       ` Oleg Nesterov
2007-02-18 12:12                         ` Rafael J. Wysocki
2007-02-18 15:06                           ` Oleg Nesterov
2007-02-18 12:56               ` Pavel Machek
2007-02-21 14:52               ` Gautham R Shenoy
2007-02-21 19:42                 ` Pavel Machek
     [not found] ` <200702231041.17136.rjw@sisk.pl>
     [not found]   ` <20070223100817.GA10973@in.ibm.com>
     [not found]     ` <200702231115.00718.rjw@sisk.pl>
     [not found]       ` <20070223104723.GB10973@in.ibm.com>
     [not found]         ` <20070223110201.GC10973@in.ibm.com>
2007-02-23 19:03           ` freezer problems Gautham R Shenoy
