All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] Cpu-hotplug: Using the Process Freezer (try2)
@ 2007-04-02  5:34 Gautham R Shenoy
  2007-04-02  5:37 ` [PATCH 1/8] Enhance process freezer interface for usage beyond software suspend Gautham R Shenoy
                   ` (8 more replies)
  0 siblings, 9 replies; 70+ messages in thread
From: Gautham R Shenoy @ 2007-04-02  5:34 UTC (permalink / raw)
  To: akpm, paulmck, torvalds
  Cc: linux-kernel, vatsa, Oleg Nesterov, Rafael J. Wysocki, mingo,
	dipankar, dino, masami.hiramatsu.pt

Hello Everybody, 

This is another attempt towards process-freezer based cpu-hotplug.
This patchset covers just about everything that was discussed on
the LKML with respect to the freezer-based cpu-hotplug.

Following are new features from the version I last posted:

- Enhancements to the freezer interface so that
  it can be used for events other than Software suspend.

- A reentrant process freezer. Software suspend needs since it uses
  cpu-hotplug.

- All non-singlethreaded workqueues freezeable by default.

And yes, this time around, I have some numbers to give an idea 
of the performance.

Test m/c configuration: 4-way i386 Xeon Box with 2.5G Memory.
Test case: 'make -jN' running parallely with cpu-offline/online operations
	    in a tight loop.

With N=256.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Kernel: linux-2.6.21-rc5-mm3 vanilla
---------------------------------------------------------
Maximum time taken for a hotplug operation	:  1m5.357s
Minimum time taken for a hotplug operation	:  0m0.288s
Average number of attempts to offline AND Online
cpu1, cpu2 and cpu3				: 6
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Kernel: linux-2.6.21-rc5-mm3 + the patchset.
---------------------------------------------------------
Maximum time taken for a hotplug operation	: 0m1s.261s
Minimum time taken for a hotplug operation	: 0m0s.808s
Average number of attempts to offline and online
cpu1, cpu2 and cpu3				: 14
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The numbers look ok.

However, on an average it took around 14 attempts to offline AND online
cpu1,cpu2, cpu3 when otherwise it should have taken only 6.

The failures were due to the freezer failing to freeze all the tasks
within the timeout period (20sec).

As N increased (512, 1024, 2048 and 8192), even the average number
of attempts to online AND offline the 3 cpus increased.

At this point of time, I worked towards ensuring that cpu-hotplug works
reliably with process freezer. The patchset has behaved reliably 
(no oopses/hangs/freezes) whenever the process freezer has successfully frozen 
all the tasks in the system, thereby resulting in a successful cpu-hotplug
operation.

The tasks which failed to freeze, in each of the failed cases, were all
'make' tasks which had been forked out by the make -jN test case.

I believe that the reasons for freezer failing as N increases are :
- 'make -jN' keeps forking new tasks every now and then, thereby resulting
  in a never-ending catching up game in the do_while loop inside
  try_to_freeze_tasks (kernel/power/process.c)

- Many of the threads might be sleeping on some mutex/semaphore and thus
  never calling try_to_freeze() within the timeout period.

Instead of waiting for all the tasks to call try_to_freeze 
in the above mentioned do_while loop, I wonder if we can put some hooks in 
sched.c so asto not schedule the task marked PF_FREEZING/PF_FROZEN. 
That way we would only have to wait for the currently running thread to call 
try_to_freeze for each online cpu. 

I'm sure there are better explainations/solutions to this problem.

And yeah, trying a cpu-hotplug operation with a plain 'make -j' might be a 
little too ambitious at this point of time. Atleast in my case it was :-)

The patchset is against 2.6.21-rc5-mm3.

Awaiting your feedback.

Thanks and Regards
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH 1/8] Enhance process freezer interface for usage beyond software suspend
  2007-04-02  5:34 [RFC] Cpu-hotplug: Using the Process Freezer (try2) Gautham R Shenoy
@ 2007-04-02  5:37 ` Gautham R Shenoy
  2007-04-02 13:56   ` Pavel Machek
  2007-04-05  9:46   ` Oleg Nesterov
  2007-04-02  5:37 ` [PATCH 2/8] Make process freezer reentrant Gautham R Shenoy
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 70+ messages in thread
From: Gautham R Shenoy @ 2007-04-02  5:37 UTC (permalink / raw)
  To: akpm, paulmck, torvalds
  Cc: linux-kernel, vatsa, Oleg Nesterov, Rafael J. Wysocki, mingo,
	dipankar, dino, masami.hiramatsu.pt

 This patch provides an interface to extend the use of the process
 freezer beyond Suspend.

The tasks can selectively mark themselves to be exempted from specific
freeze events like SUSPEND /KPROBES/CPU_HOTPLUG.

This patch however, *does not* sort non freezable threads into
different categories based on the freeze events. Thus all 
tasks which were previously marked PF_NOFREEZE are now
exempted from freezer using 
	freezer_exempt(FE_ALL);
which means exempt from all kinds of freezes.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

--
 arch/i386/kernel/apm.c              |    2 -
 drivers/block/loop.c                |    2 -
 drivers/char/apm-emulation.c        |    6 ++--
 drivers/ieee1394/ieee1394_core.c    |    2 -
 drivers/md/md.c                     |    2 -
 drivers/mmc/card/queue.c            |    3 +-
 drivers/mtd/mtd_blkdevs.c           |    3 +-
 drivers/scsi/libsas/sas_scsi_host.c |    2 -
 drivers/scsi/scsi_error.c           |    2 -
 drivers/usb/storage/usb.c           |    2 -
 include/linux/freezer.h             |   52 ++++++++++++++++++++++++++++++++----
 include/linux/sched.h               |   15 +++++++++-
 init/do_mounts_initrd.c             |    2 -
 kernel/fork.c                       |    2 -
 kernel/kprobes.c                    |    4 +-
 kernel/power/disk.c                 |    4 +-
 kernel/power/main.c                 |    6 ++--
 kernel/power/process.c              |   29 +++++++++++---------
 kernel/power/user.c                 |    8 ++---
 kernel/rcutorture.c                 |    4 +-
 kernel/sched.c                      |    2 -
 kernel/softirq.c                    |    2 -
 kernel/softlockup.c                 |    2 -
 kernel/workqueue.c                  |    2 -
 24 files changed, 110 insertions(+), 50 deletions(-)

Index: linux-2.6.21-rc5/include/linux/sched.h
===================================================================
--- linux-2.6.21-rc5.orig/include/linux/sched.h
+++ linux-2.6.21-rc5/include/linux/sched.h
@@ -1186,8 +1186,21 @@ static inline void put_task_struct(struc
 #define PF_MEMALLOC	0x00000800	/* Allocating memory */
 #define PF_FLUSHER	0x00001000	/* responsible for disk writeback */
 #define PF_USED_MATH	0x00002000	/* if unset the fpu must be initialized before use */
-#define PF_NOFREEZE	0x00008000	/* this thread should not be frozen */
+
+/* Per process freezer specific flags */
+#define PF_FE_SUSPEND	0x00008000	/* This thread should not be frozen
+					 * for suspend
+					 */
+
+#define PF_FE_KPROBES	0x00000010	/* This thread should not be frozen
+					 * for Kprobes
+					 */
+
 #define PF_FROZEN	0x00010000	/* frozen for system suspend */
+
+#define PF_FE_ALL	(PF_FE_SUSPEND | PF_FE_KPROBES)
+					/* Exempt from all kinds freeze chills*/
+
 #define PF_FSTRANS	0x00020000	/* inside a filesystem transaction */
 #define PF_KSWAPD	0x00040000	/* I am kswapd */
 #define PF_SWAPOFF	0x00080000	/* I am in swapoff */
Index: linux-2.6.21-rc5/include/linux/freezer.h
===================================================================
--- linux-2.6.21-rc5.orig/include/linux/freezer.h
+++ linux-2.6.21-rc5/include/linux/freezer.h
@@ -2,7 +2,49 @@
 
 #include <linux/sched.h>
 
-#ifdef CONFIG_PM
+#if defined(CONFIG_PM) || defined(CONFIG_HOTPLUG_CPU) || \
+					defined(CONFIG_KPROBES)
+
+/* These are the events which make use of the process freezer */
+#define FE_NONE		0
+#define FE_ALL		PF_FE_ALL
+#define FE_SUSPEND	PF_FE_SUSPEND
+#define FE_KPROBES	PF_FE_KPROBES
+
+/*
+ * Exempt the current process from being frozen for a certain event
+ */
+static inline void freezer_exempt(unsigned long exempt_freeze_event)
+{
+	if (exempt_freeze_event == FE_NONE)
+		current->flags &= ~PF_FE_ALL;
+	else
+		current->flags |= exempt_freeze_event;
+}
+
+/*
+ * Consider the current process to be frozen for a certain event
+ */
+static inline void freezer_consider(unsigned long freeze_event)
+{
+	current->flags &= ~freeze_event;
+}
+
+/*
+ * Check if the process is exempted from freeze for the particular event.
+ */
+static inline int freezer_is_exempted(struct task_struct *p,
+					unsigned long freeze_event)
+{
+	return p->flags & freeze_event;
+}
+
+/* Returns the mask of the events for which this process is freezeable */
+static inline unsigned long freezeable_event_mask(struct task_struct *p)
+{
+	return ~p->flags & PF_FE_ALL;
+}
+
 /*
  * Check if a process has been frozen
  */
@@ -63,8 +105,8 @@ static inline void frozen_process(struct
 }
 
 extern void refrigerator(void);
-extern int freeze_processes(void);
-extern void thaw_processes(void);
+extern int freeze_processes(unsigned long freeze_event);
+extern void thaw_processes(unsigned long freeze_event);
 
 static inline int try_to_freeze(void)
 {
@@ -127,8 +169,8 @@ static inline int thaw_process(struct ta
 static inline void frozen_process(struct task_struct *p) { BUG(); }
 
 static inline void refrigerator(void) {}
-static inline int freeze_processes(void) { BUG(); return 0; }
-static inline void thaw_processes(void) {}
+static inline int freeze_processes(int a) { BUG(); return 1; }
+static inline void thaw_processes(int a) {}
 
 static inline int try_to_freeze(void) { return 0; }
 
Index: linux-2.6.21-rc5/kernel/power/process.c
===================================================================
--- linux-2.6.21-rc5.orig/kernel/power/process.c
+++ linux-2.6.21-rc5/kernel/power/process.c
@@ -24,10 +24,10 @@
 #define FREEZER_KERNEL_THREADS 0
 #define FREEZER_USER_SPACE 1
 
-static inline int freezeable(struct task_struct * p)
+static inline int freezeable(struct task_struct * p, unsigned long freeze_event)
 {
 	if ((p == current) ||
-	    (p->flags & PF_NOFREEZE) ||
+	    (freezer_is_exempted(p, freeze_event)) ||
 	    (p->exit_state != 0))
 		return 0;
 	return 1;
@@ -106,7 +106,8 @@ static inline int is_user_space(struct t
 	return ret;
 }
 
-static unsigned int try_to_freeze_tasks(int freeze_user_space)
+static unsigned int try_to_freeze_tasks(int freeze_user_space,
+					unsigned long freeze_event)
 {
 	struct task_struct *g, *p;
 	unsigned long end_time;
@@ -117,7 +118,7 @@ static unsigned int try_to_freeze_tasks(
 		todo = 0;
 		read_lock(&tasklist_lock);
 		do_each_thread(g, p) {
-			if (!freezeable(p))
+			if (!freezeable(p, freeze_event))
 				continue;
 
 			if (frozen(p))
@@ -158,7 +159,7 @@ static unsigned int try_to_freeze_tasks(
 				continue;
 
 			task_lock(p);
-			if (freezeable(p) && !frozen(p) &&
+			if (freezeable(p, freeze_event) && !frozen(p) &&
 			    !freezer_should_skip(p))
 				printk(KERN_ERR " %s\n", p->comm);
 
@@ -173,21 +174,23 @@ static unsigned int try_to_freeze_tasks(
 
 /**
  *	freeze_processes - tell processes to enter the refrigerator
+ *	@freeze_event: The system event for which processes are being
+ *			frozen.
  *
  *	Returns 0 on success, or the number of processes that didn't freeze,
  *	although they were told to.
  */
-int freeze_processes(void)
+int freeze_processes(unsigned long freeze_event)
 {
 	unsigned int nr_unfrozen;
 
 	printk("Stopping tasks ... ");
-	nr_unfrozen = try_to_freeze_tasks(FREEZER_USER_SPACE);
+	nr_unfrozen = try_to_freeze_tasks(FREEZER_USER_SPACE, freeze_event);
 	if (nr_unfrozen)
 		return nr_unfrozen;
 
 	sys_sync();
-	nr_unfrozen = try_to_freeze_tasks(FREEZER_KERNEL_THREADS);
+	nr_unfrozen = try_to_freeze_tasks(FREEZER_KERNEL_THREADS, freeze_event);
 	if (nr_unfrozen)
 		return nr_unfrozen;
 
@@ -196,13 +199,13 @@ int freeze_processes(void)
 	return 0;
 }
 
-static void thaw_tasks(int thaw_user_space)
+static void thaw_tasks(int thaw_user_space, unsigned long freeze_event)
 {
 	struct task_struct *g, *p;
 
 	read_lock(&tasklist_lock);
 	do_each_thread(g, p) {
-		if (!freezeable(p))
+		if (!freezeable(p, freeze_event))
 			continue;
 
 		if (is_user_space(p) == !thaw_user_space)
@@ -213,11 +216,11 @@ static void thaw_tasks(int thaw_user_spa
 	read_unlock(&tasklist_lock);
 }
 
-void thaw_processes(void)
+void thaw_processes(unsigned long freeze_event)
 {
 	printk("Restarting tasks ... ");
-	thaw_tasks(FREEZER_KERNEL_THREADS);
-	thaw_tasks(FREEZER_USER_SPACE);
+	thaw_tasks(FREEZER_KERNEL_THREADS, freeze_event);
+	thaw_tasks(FREEZER_USER_SPACE, freeze_event);
 	schedule();
 	printk("done.\n");
 }
Index: linux-2.6.21-rc5/kernel/sched.c
===================================================================
--- linux-2.6.21-rc5.orig/kernel/sched.c
+++ linux-2.6.21-rc5/kernel/sched.c
@@ -5057,6 +5057,7 @@ static int migration_thread(void *data)
 	BUG_ON(rq->migration_thread != current);
 
 	set_current_state(TASK_INTERRUPTIBLE);
+	freezer_exempt(FE_ALL);
 	while (!kthread_should_stop()) {
 		struct migration_req *req;
 		struct list_head *head;
@@ -5407,7 +5408,6 @@ migration_call(struct notifier_block *nf
 		p = kthread_create(migration_thread, hcpu, "migration/%d",cpu);
 		if (IS_ERR(p))
 			return NOTIFY_BAD;
-		p->flags |= PF_NOFREEZE;
 		kthread_bind(p, cpu);
 		/* Must be high prio: stop_machine expects to yield to it. */
 		rq = task_rq_lock(p, &flags);
Index: linux-2.6.21-rc5/kernel/softlockup.c
===================================================================
--- linux-2.6.21-rc5.orig/kernel/softlockup.c
+++ linux-2.6.21-rc5/kernel/softlockup.c
@@ -96,7 +96,7 @@ static int watchdog(void * __bind_cpu)
 	struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
 
 	sched_setscheduler(current, SCHED_FIFO, &param);
-	current->flags |= PF_NOFREEZE;
+	freezer_exempt(FE_ALL);
 
 	/*
 	 * Run briefly once per second to reset the softlockup timestamp.
Index: linux-2.6.21-rc5/kernel/workqueue.c
===================================================================
--- linux-2.6.21-rc5.orig/kernel/workqueue.c
+++ linux-2.6.21-rc5/kernel/workqueue.c
@@ -292,7 +292,7 @@ static int worker_thread(void *__cwq)
 	struct k_sigaction sa;
 
 	if (!cwq->wq->freezeable)
-		current->flags |= PF_NOFREEZE;
+		freezer_exempt(FE_ALL);
 
 	/*
 	 * We inherited MPOL_INTERLEAVE from the booting kernel.
Index: linux-2.6.21-rc5/kernel/rcutorture.c
===================================================================
--- linux-2.6.21-rc5.orig/kernel/rcutorture.c
+++ linux-2.6.21-rc5/kernel/rcutorture.c
@@ -559,7 +559,7 @@ rcu_torture_fakewriter(void *arg)
 
 	VERBOSE_PRINTK_STRING("rcu_torture_fakewriter task started");
 	set_user_nice(current, 19);
-	current->flags |= PF_NOFREEZE;
+	freezer_exempt(FE_ALL);
 
 	do {
 		schedule_timeout_uninterruptible(1 + rcu_random(&rand)%10);
@@ -590,7 +590,7 @@ rcu_torture_reader(void *arg)
 
 	VERBOSE_PRINTK_STRING("rcu_torture_reader task started");
 	set_user_nice(current, 19);
-	current->flags |= PF_NOFREEZE;
+	freezer_exempt(FE_ALL);
 
 	do {
 		idx = cur_ops->readlock();
Index: linux-2.6.21-rc5/arch/i386/kernel/apm.c
===================================================================
--- linux-2.6.21-rc5.orig/arch/i386/kernel/apm.c
+++ linux-2.6.21-rc5/arch/i386/kernel/apm.c
@@ -1395,6 +1395,7 @@ static void apm_mainloop(void)
 
 	add_wait_queue(&apm_waitqueue, &wait);
 	set_current_state(TASK_INTERRUPTIBLE);
+	freezer_exempt(FE_ALL);
 	for (;;) {
 		try_to_freeze();
 		schedule_timeout(APM_CHECK_TIMEOUT);
@@ -2315,7 +2316,6 @@ static int __init apm_init(void)
 		remove_proc_entry("apm", NULL);
 		return err;
 	}
-	kapmd_task->flags |= PF_NOFREEZE;
 	wake_up_process(kapmd_task);
 
 	if (num_online_cpus() > 1 && !smp ) {
Index: linux-2.6.21-rc5/drivers/char/apm-emulation.c
===================================================================
--- linux-2.6.21-rc5.orig/drivers/char/apm-emulation.c
+++ linux-2.6.21-rc5/drivers/char/apm-emulation.c
@@ -336,7 +336,7 @@ apm_ioctl(struct inode * inode, struct f
 			 * threads.
 			 */
 			flags = current->flags;
-			current->flags |= PF_NOFREEZE;
+			freezer_exempt(FE_ALL);
 
 			wait_event(apm_suspend_waitqueue,
 				   as->suspend_state == SUSPEND_DONE);
@@ -372,7 +372,7 @@ apm_ioctl(struct inode * inode, struct f
 			 * threads.
 			 */
 			flags = current->flags;
-			current->flags |= PF_NOFREEZE;
+			freezer_exempt(FE_ALL);
 
 			wait_event_interruptible(apm_suspend_waitqueue,
 					 as->suspend_state == SUSPEND_DONE);
@@ -536,6 +536,7 @@ static int apm_get_info(char *buf, char 
 
 static int kapmd(void *arg)
 {
+	freezer_exempt(FE_ALL);
 	do {
 		apm_event_t event;
 		int ret;
@@ -601,7 +602,6 @@ static int __init apm_init(void)
 		kapmd_tsk = NULL;
 		return ret;
 	}
-	kapmd_tsk->flags |= PF_NOFREEZE;
 	wake_up_process(kapmd_tsk);
 
 #ifdef CONFIG_PROC_FS
Index: linux-2.6.21-rc5/drivers/block/loop.c
===================================================================
--- linux-2.6.21-rc5.orig/drivers/block/loop.c
+++ linux-2.6.21-rc5/drivers/block/loop.c
@@ -587,7 +587,7 @@ static int loop_thread(void *data)
 	 * hence, it mustn't be stopped at all
 	 * because it could be indirectly used during suspension
 	 */
-	current->flags |= PF_NOFREEZE;
+	freezer_exempt(FE_ALL);
 
 	set_user_nice(current, -20);
 
Index: linux-2.6.21-rc5/drivers/ieee1394/ieee1394_core.c
===================================================================
--- linux-2.6.21-rc5.orig/drivers/ieee1394/ieee1394_core.c
+++ linux-2.6.21-rc5/drivers/ieee1394/ieee1394_core.c
@@ -1134,7 +1134,7 @@ static int hpsbpkt_thread(void *__hi)
 	struct list_head tmp;
 	int may_schedule;
 
-	current->flags |= PF_NOFREEZE;
+	freezer_exempt(FE_ALL);
 
 	while (!kthread_should_stop()) {
 
Index: linux-2.6.21-rc5/drivers/md/md.c
===================================================================
--- linux-2.6.21-rc5.orig/drivers/md/md.c
+++ linux-2.6.21-rc5/drivers/md/md.c
@@ -4527,7 +4527,7 @@ static int md_thread(void * arg)
 	 * many dirty RAID5 blocks.
 	 */
 
-	current->flags |= PF_NOFREEZE;
+	freezer_exempt(FE_ALL);
 	allow_signal(SIGKILL);
 	while (!kthread_should_stop()) {
 
Index: linux-2.6.21-rc5/drivers/mtd/mtd_blkdevs.c
===================================================================
--- linux-2.6.21-rc5.orig/drivers/mtd/mtd_blkdevs.c
+++ linux-2.6.21-rc5/drivers/mtd/mtd_blkdevs.c
@@ -82,7 +82,8 @@ static int mtd_blktrans_thread(void *arg
 	struct request_queue *rq = tr->blkcore_priv->rq;
 
 	/* we might get involved when memory gets low, so use PF_MEMALLOC */
-	current->flags |= PF_MEMALLOC | PF_NOFREEZE;
+	current->flags |= PF_MEMALLOC;
+	freezer_exempt(FE_ALL);
 
 	daemonize("%sd", tr->name);
 
Index: linux-2.6.21-rc5/drivers/scsi/libsas/sas_scsi_host.c
===================================================================
--- linux-2.6.21-rc5.orig/drivers/scsi/libsas/sas_scsi_host.c
+++ linux-2.6.21-rc5/drivers/scsi/libsas/sas_scsi_host.c
@@ -871,7 +871,7 @@ static int sas_queue_thread(void *_sas_h
 	struct scsi_core *core = &sas_ha->core;
 
 	daemonize("sas_queue_%d", core->shost->host_no);
-	current->flags |= PF_NOFREEZE;
+	freezer_exempt(FE_ALL);
 
 	complete(&queue_th_comp);
 
Index: linux-2.6.21-rc5/drivers/scsi/scsi_error.c
===================================================================
--- linux-2.6.21-rc5.orig/drivers/scsi/scsi_error.c
+++ linux-2.6.21-rc5/drivers/scsi/scsi_error.c
@@ -1536,7 +1536,7 @@ int scsi_error_handler(void *data)
 {
 	struct Scsi_Host *shost = data;
 
-	current->flags |= PF_NOFREEZE;
+	freezer_exempt(FE_ALL);
 
 	/*
 	 * We use TASK_INTERRUPTIBLE so that the thread is not
Index: linux-2.6.21-rc5/drivers/usb/storage/usb.c
===================================================================
--- linux-2.6.21-rc5.orig/drivers/usb/storage/usb.c
+++ linux-2.6.21-rc5/drivers/usb/storage/usb.c
@@ -301,7 +301,7 @@ static int usb_stor_control_thread(void 
 	struct us_data *us = (struct us_data *)__us;
 	struct Scsi_Host *host = us_to_host(us);
 
-	current->flags |= PF_NOFREEZE;
+	freezer_exempt(FE_ALL);
 
 	for(;;) {
 		try_to_freeze();
Index: linux-2.6.21-rc5/init/do_mounts_initrd.c
===================================================================
--- linux-2.6.21-rc5.orig/init/do_mounts_initrd.c
+++ linux-2.6.21-rc5/init/do_mounts_initrd.c
@@ -55,7 +55,7 @@ static void __init handle_initrd(void)
 	sys_mount(".", "/", NULL, MS_MOVE, NULL);
 	sys_chroot(".");
 
-	current->flags |= PF_NOFREEZE;
+	freezer_exempt(FE_ALL);
 	pid = kernel_thread(do_linuxrc, "/linuxrc", SIGCHLD);
 	if (pid > 0) {
 		while (pid != sys_wait4(-1, NULL, 0, NULL))
Index: linux-2.6.21-rc5/kernel/softirq.c
===================================================================
--- linux-2.6.21-rc5.orig/kernel/softirq.c
+++ linux-2.6.21-rc5/kernel/softirq.c
@@ -490,7 +490,7 @@ void __init softirq_init(void)
 static int ksoftirqd(void * __bind_cpu)
 {
 	set_user_nice(current, 19);
-	current->flags |= PF_NOFREEZE;
+	freezer_exempt(FE_ALL);
 
 	set_current_state(TASK_INTERRUPTIBLE);
 
Index: linux-2.6.21-rc5/kernel/kprobes.c
===================================================================
--- linux-2.6.21-rc5.orig/kernel/kprobes.c
+++ linux-2.6.21-rc5/kernel/kprobes.c
@@ -104,7 +104,7 @@ static int __kprobes check_safety(void)
 {
 	int ret = 0;
 #if defined(CONFIG_PREEMPT) && defined(CONFIG_PM)
-	ret = freeze_processes();
+	ret = freeze_processes(FE_KPROBES);
 	if (ret == 0) {
 		struct task_struct *p, *q;
 		do_each_thread(p, q) {
@@ -117,7 +117,7 @@ static int __kprobes check_safety(void)
 		} while_each_thread(p, q);
 	}
 loop_end:
-	thaw_processes();
+	thaw_processes(FE_KPROBES);
 #else
 	synchronize_sched();
 #endif
Index: linux-2.6.21-rc5/kernel/power/disk.c
===================================================================
--- linux-2.6.21-rc5.orig/kernel/power/disk.c
+++ linux-2.6.21-rc5/kernel/power/disk.c
@@ -104,7 +104,7 @@ static inline void platform_finish(void)
 
 static void unprepare_processes(void)
 {
-	thaw_processes();
+	thaw_processes(FE_SUSPEND);
 	pm_restore_console();
 }
 
@@ -113,7 +113,7 @@ static int prepare_processes(void)
 	int error = 0;
 
 	pm_prepare_console();
-	if (freeze_processes()) {
+	if (freeze_processes(FE_SUSPEND)) {
 		error = -EBUSY;
 		unprepare_processes();
 	}
Index: linux-2.6.21-rc5/kernel/power/main.c
===================================================================
--- linux-2.6.21-rc5.orig/kernel/power/main.c
+++ linux-2.6.21-rc5/kernel/power/main.c
@@ -86,7 +86,7 @@ static int suspend_prepare(suspend_state
 
 	pm_prepare_console();
 
-	if (freeze_processes()) {
+	if (freeze_processes(FE_SUSPEND)) {
 		error = -EAGAIN;
 		goto Thaw;
 	}
@@ -123,7 +123,7 @@ static int suspend_prepare(suspend_state
 	device_resume();
 	resume_console();
  Thaw:
-	thaw_processes();
+	thaw_processes(FE_SUSPEND);
 	pm_restore_console();
 	return error;
 }
@@ -162,7 +162,7 @@ static void suspend_finish(suspend_state
 	pm_finish(state);
 	device_resume();
 	resume_console();
-	thaw_processes();
+	thaw_processes(FE_SUSPEND);
 	pm_restore_console();
 }
 
Index: linux-2.6.21-rc5/kernel/power/user.c
===================================================================
--- linux-2.6.21-rc5.orig/kernel/power/user.c
+++ linux-2.6.21-rc5/kernel/power/user.c
@@ -88,7 +88,7 @@ static int snapshot_release(struct inode
 	free_bitmap(data->bitmap);
 	if (data->frozen) {
 		mutex_lock(&pm_mutex);
-		thaw_processes();
+		thaw_processes(FE_SUSPEND);
 		enable_nonboot_cpus();
 		mutex_unlock(&pm_mutex);
 	}
@@ -239,8 +239,8 @@ static int snapshot_ioctl(struct inode *
 		if (data->frozen)
 			break;
 		mutex_lock(&pm_mutex);
-		if (freeze_processes()) {
-			thaw_processes();
+		if (freeze_processes(FE_SUSPEND)) {
+			thaw_processes(FE_SUSPEND);
 			error = -EBUSY;
 		}
 		mutex_unlock(&pm_mutex);
@@ -252,7 +252,7 @@ static int snapshot_ioctl(struct inode *
 		if (!data->frozen)
 			break;
 		mutex_lock(&pm_mutex);
-		thaw_processes();
+		thaw_processes(FE_SUSPEND);
 		mutex_unlock(&pm_mutex);
 		data->frozen = 0;
 		break;
Index: linux-2.6.21-rc5/drivers/mmc/card/queue.c
===================================================================
--- linux-2.6.21-rc5.orig/drivers/mmc/card/queue.c
+++ linux-2.6.21-rc5/drivers/mmc/card/queue.c
@@ -66,7 +66,8 @@ static int mmc_queue_thread(void *d)
 	 * Set iothread to ensure that we aren't put to sleep by
 	 * the process freezing.  We handle suspension ourselves.
 	 */
-	current->flags |= PF_MEMALLOC|PF_NOFREEZE;
+	current->flags |= PF_MEMALLOC;
+	freezer_exempt(FE_ALL);
 
 	down(&mq->thread_sem);
 	do {
Index: linux-2.6.21-rc5/kernel/fork.c
===================================================================
--- linux-2.6.21-rc5.orig/kernel/fork.c
+++ linux-2.6.21-rc5/kernel/fork.c
@@ -921,7 +921,7 @@ static inline void copy_flags(unsigned l
 {
 	unsigned long new_flags = p->flags;
 
-	new_flags &= ~(PF_SUPERPRIV | PF_NOFREEZE);
+	new_flags &= ~(PF_SUPERPRIV | PF_FE_ALL);
 	new_flags |= PF_FORKNOEXEC;
 	new_flags |= PF_STARTING;
 	p->flags = new_flags;
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH 2/8] Make process freezer reentrant
  2007-04-02  5:34 [RFC] Cpu-hotplug: Using the Process Freezer (try2) Gautham R Shenoy
  2007-04-02  5:37 ` [PATCH 1/8] Enhance process freezer interface for usage beyond software suspend Gautham R Shenoy
@ 2007-04-02  5:37 ` Gautham R Shenoy
  2007-04-05  9:53   ` Oleg Nesterov
  2007-04-02  5:38 ` [PATCH 3/8] Use process freezer for cpu-hotplug Gautham R Shenoy
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 70+ messages in thread
From: Gautham R Shenoy @ 2007-04-02  5:37 UTC (permalink / raw)
  To: akpm, paulmck, torvalds
  Cc: linux-kernel, vatsa, Oleg Nesterov, Rafael J. Wysocki, mingo,
	dipankar, dino, masami.hiramatsu.pt

This patch adds provision to make the process freezer reentrant 
for different kinds of freeze events.

Credit to Rafael Wysocki for the system_freeze_event_mask idea.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Rafael J. Wysocki <rjw@sisk.pl>

--
 kernel/power/process.c |   57 ++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 50 insertions(+), 7 deletions(-)

Index: linux-2.6.21-rc3/kernel/power/process.c
===================================================================
--- linux-2.6.21-rc3.orig/kernel/power/process.c
+++ linux-2.6.21-rc3/kernel/power/process.c
@@ -23,6 +23,13 @@
 
 #define FREEZER_KERNEL_THREADS 0
 #define FREEZER_USER_SPACE 1
+/*
+ * Global mask of the events (think suspend, cpu-hotplug, etc) for
+ * which the services of the processes freezer have been
+ * currently employed.
+ */
+static unsigned long system_freeze_event_mask;
+static DEFINE_MUTEX(freezer_mutex);
 
 static inline int freezeable(struct task_struct * p, unsigned long freeze_event)
 {
@@ -33,6 +40,25 @@ static inline int freezeable(struct task
 	return 1;
 }
 
+/* Returns the mask of the events for which p is currently frozen */
+static inline int process_frozen_event_mask(struct task_struct *p)
+{
+	return freezeable_event_mask(p) & system_freeze_event_mask;
+}
+
+static inline int thawable(struct task_struct *p, unsigned long freeze_event)
+{
+	if (!freezeable(p, freeze_event))
+		return 0;
+
+	/* p can be thawed iff the process p has been
+	 * frozen for freeze_event alone.
+	 */
+	if (process_frozen_event_mask(p) & ~freeze_event)
+		return 0;
+
+	return 1;
+}
 /* Refrigerator is place where frozen processes are stored :-). */
 void refrigerator(void)
 {
@@ -183,20 +209,30 @@ static unsigned int try_to_freeze_tasks(
 int freeze_processes(unsigned long freeze_event)
 {
 	unsigned int nr_unfrozen;
-
+	int ret = 0;
+	mutex_lock(&freezer_mutex);
+	if (system_freeze_event_mask & freeze_event)
+		goto done;
+	system_freeze_event_mask |= freeze_event;
 	printk("Stopping tasks ... ");
 	nr_unfrozen = try_to_freeze_tasks(FREEZER_USER_SPACE, freeze_event);
-	if (nr_unfrozen)
-		return nr_unfrozen;
+	if (nr_unfrozen) {
+		ret = nr_unfrozen;
+		goto done;
+	}
 
 	sys_sync();
 	nr_unfrozen = try_to_freeze_tasks(FREEZER_KERNEL_THREADS, freeze_event);
-	if (nr_unfrozen)
-		return nr_unfrozen;
+	if (nr_unfrozen){
+		ret = nr_unfrozen;
+		goto done;
+	}
 
 	printk("done.\n");
+done:
+	mutex_unlock(&freezer_mutex);
 	BUG_ON(in_atomic());
-	return 0;
+	return ret;
 }
 
 static void thaw_tasks(int thaw_user_space, unsigned long freeze_event)
@@ -205,7 +241,7 @@ static void thaw_tasks(int thaw_user_spa
 
 	read_lock(&tasklist_lock);
 	do_each_thread(g, p) {
-		if (!freezeable(p, freeze_event))
+		if (!thawable(p, freeze_event))
 			continue;
 
 		if (is_user_space(p) == !thaw_user_space)
@@ -218,11 +254,18 @@ static void thaw_tasks(int thaw_user_spa
 
 void thaw_processes(unsigned long freeze_event)
 {
+	mutex_lock(&freezer_mutex);
+	/* Thaw for freeze_event, only if we've frozen for freeze_event */
+	if (!(system_freeze_event_mask & freeze_event))
+		goto done;
 	printk("Restarting tasks ... ");
 	thaw_tasks(FREEZER_KERNEL_THREADS, freeze_event);
 	thaw_tasks(FREEZER_USER_SPACE, freeze_event);
 	schedule();
 	printk("done.\n");
+	system_freeze_event_mask &= ~freeze_event;
+done:
+	mutex_unlock(&freezer_mutex);
 }
 
 EXPORT_SYMBOL(refrigerator);
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH 3/8] Use process freezer for cpu-hotplug
  2007-04-02  5:34 [RFC] Cpu-hotplug: Using the Process Freezer (try2) Gautham R Shenoy
  2007-04-02  5:37 ` [PATCH 1/8] Enhance process freezer interface for usage beyond software suspend Gautham R Shenoy
  2007-04-02  5:37 ` [PATCH 2/8] Make process freezer reentrant Gautham R Shenoy
@ 2007-04-02  5:38 ` Gautham R Shenoy
  2007-04-05 10:53   ` Oleg Nesterov
  2007-04-06 17:27   ` Nathan Lynch
  2007-04-02  5:39 ` [PATCH 4/8] Rip out lock_cpu_hotplug() Gautham R Shenoy
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 70+ messages in thread
From: Gautham R Shenoy @ 2007-04-02  5:38 UTC (permalink / raw)
  To: akpm, paulmck, torvalds
  Cc: linux-kernel, vatsa, Oleg Nesterov, Rafael J. Wysocki, mingo,
	dipankar, dino, masami.hiramatsu.pt

This patch implements process_freezer based cpu-hotplug
core. 
The sailent features are:
o No more (un)lock_cpu_hotplug.
o No more CPU_LOCK_ACQUIRE and CPU_LOCK_RELEASE. Hence no per-subsystem
  hotcpu mutexes.
o Calls freeze_process/thaw_processes at the beginning/end of
  the hotplug operation.

Lessons Learnt:
o CPU_UP_PREPARE has to be called with the processes frozen.
  If implemented otherwise, the kernel threads which we create
  during UP_PREPARE will never be frozen during freeze_processes
  since we them up only during CPU_ONLIN.

Points to ponder: 
o Can the SYSTEM_RUNNING hack in _cpu_up be avoided by some cleaner means.

Signed-off-by : Srivatsa Vaddagiri <vatsa@in.ibm.com>
Signed-off-by : Gautham R Shenoy   <ego@in.ibm.com>
Signed-off-by : Rafael J. Wysocki  <rjw@sisk.pl> 
--
 include/linux/freezer.h  |   10 ++++--
 include/linux/notifier.h |    2 -
 include/linux/sched.h    |    5 ++-
 kernel/cpu.c             |   70 ++++++++++++++++++-----------------------------
 kernel/sched.c           |   20 +------------
 kernel/softirq.c         |   13 ++++----
 kernel/softlockup.c      |    1 
 kernel/workqueue.c       |   33 ++++++++++------------
 mm/slab.c                |    6 ----
 9 files changed, 63 insertions(+), 97 deletions(-)

Index: linux-2.6.21-rc5/include/linux/sched.h
===================================================================
--- linux-2.6.21-rc5.orig/include/linux/sched.h
+++ linux-2.6.21-rc5/include/linux/sched.h
@@ -1195,10 +1195,13 @@ static inline void put_task_struct(struc
 #define PF_FE_KPROBES	0x00000010	/* This thread should not be frozen
 					 * for Kprobes
 					 */
+#define PF_FE_HOTPLUG_CPU 0x00000020	/* Thread should not be frozen for
+					 * cpu hotplug.
+					 */
 
 #define PF_FROZEN	0x00010000	/* frozen for system suspend */
 
-#define PF_FE_ALL	(PF_FE_SUSPEND | PF_FE_KPROBES)
+#define PF_FE_ALL	(PF_FE_SUSPEND | PF_FE_KPROBES | PF_FE_HOTPLUG_CPU)
 					/* Exempt from all kinds freeze chills*/
 
 #define PF_FSTRANS	0x00020000	/* inside a filesystem transaction */
Index: linux-2.6.21-rc5/include/linux/freezer.h
===================================================================
--- linux-2.6.21-rc5.orig/include/linux/freezer.h
+++ linux-2.6.21-rc5/include/linux/freezer.h
@@ -6,10 +6,12 @@
 					defined(CONFIG_KPROBES)
 
 /* These are the events which make use of the process freezer */
-#define FE_NONE		0
-#define FE_ALL		PF_FE_ALL
-#define FE_SUSPEND	PF_FE_SUSPEND
-#define FE_KPROBES	PF_FE_KPROBES
+#define FE_NONE			0
+#define FE_ALL			PF_FE_ALL
+#define FE_SUSPEND		PF_FE_SUSPEND
+#define FE_KPROBES		PF_FE_KPROBES
+#define FE_HOTPLUG_CPU		PF_FE_HOTPLUG_CPU
+#define FE_SUSPEND_KPROBES	PF_FE_SUSPEND | PF_FE_KPROBES
 
 /*
  * Exempt the current process from being frozen for a certain event
Index: linux-2.6.21-rc5/kernel/cpu.c
===================================================================
--- linux-2.6.21-rc5.orig/kernel/cpu.c
+++ linux-2.6.21-rc5/kernel/cpu.c
@@ -14,6 +14,7 @@
 #include <linux/kthread.h>
 #include <linux/stop_machine.h>
 #include <linux/mutex.h>
+#include <linux/freezer.h>
 
 /* This protects CPUs going up and down... */
 static DEFINE_MUTEX(cpu_add_remove_lock);
@@ -28,38 +29,15 @@ static int cpu_hotplug_disabled;
 
 #ifdef CONFIG_HOTPLUG_CPU
 
-/* Crappy recursive lock-takers in cpufreq! Complain loudly about idiots */
-static struct task_struct *recursive;
-static int recursive_depth;
-
 void lock_cpu_hotplug(void)
 {
-	struct task_struct *tsk = current;
-
-	if (tsk == recursive) {
-		static int warnings = 10;
-		if (warnings) {
-			printk(KERN_ERR "Lukewarm IQ detected in hotplug locking\n");
-			WARN_ON(1);
-			warnings--;
-		}
-		recursive_depth++;
-		return;
-	}
-	mutex_lock(&cpu_bitmask_lock);
-	recursive = tsk;
+	return;
 }
 EXPORT_SYMBOL_GPL(lock_cpu_hotplug);
 
 void unlock_cpu_hotplug(void)
 {
-	WARN_ON(recursive != current);
-	if (recursive_depth) {
-		recursive_depth--;
-		return;
-	}
-	recursive = NULL;
-	mutex_unlock(&cpu_bitmask_lock);
+	return;
 }
 EXPORT_SYMBOL_GPL(unlock_cpu_hotplug);
 
@@ -133,7 +111,11 @@ static int _cpu_down(unsigned int cpu)
 	if (!cpu_online(cpu))
 		return -EINVAL;
 
-	raw_notifier_call_chain(&cpu_chain, CPU_LOCK_ACQUIRE, hcpu);
+	if (freeze_processes(FE_HOTPLUG_CPU)) {
+		thaw_processes(FE_HOTPLUG_CPU);
+		return -EBUSY;
+	}
+
 	err = __raw_notifier_call_chain(&cpu_chain, CPU_DOWN_PREPARE,
 					hcpu, -1, &nr_calls);
 	if (err == NOTIFY_BAD) {
@@ -142,7 +124,7 @@ static int _cpu_down(unsigned int cpu)
 		printk("%s: attempt to take down CPU %u failed\n",
 				__FUNCTION__, cpu);
 		err = -EINVAL;
-		goto out_release;
+		goto out_thread;
 	}
 
 	/* Ensure that we are not runnable on dying cpu */
@@ -151,9 +133,7 @@ static int _cpu_down(unsigned int cpu)
 	cpu_clear(cpu, tmp);
 	set_cpus_allowed(current, tmp);
 
-	mutex_lock(&cpu_bitmask_lock);
 	p = __stop_machine_run(take_cpu_down, NULL, cpu);
-	mutex_unlock(&cpu_bitmask_lock);
 
 	if (IS_ERR(p) || cpu_online(cpu)) {
 		/* CPU didn't die: tell everyone.  Can't complain. */
@@ -161,10 +141,13 @@ static int _cpu_down(unsigned int cpu)
 					    hcpu) == NOTIFY_BAD)
 			BUG();
 
-		if (IS_ERR(p)) {
+		set_cpus_allowed(current, old_allowed);
+
+		if (IS_ERR(p))
 			err = PTR_ERR(p);
-			goto out_allowed;
-		}
+		else
+			err = kthread_stop(p);
+
 		goto out_thread;
 	}
 
@@ -184,14 +167,11 @@ static int _cpu_down(unsigned int cpu)
 		BUG();
 
 	check_for_tasks(cpu);
+	set_cpus_allowed(current, old_allowed);
+	err = kthread_stop(p);
 
 out_thread:
-	err = kthread_stop(p);
-out_allowed:
-	set_cpus_allowed(current, old_allowed);
-out_release:
-	raw_notifier_call_chain(&cpu_chain, CPU_LOCK_RELEASE,
-						(void *)(long)cpu);
+	thaw_processes(FE_HOTPLUG_CPU);
 	return err;
 }
 
@@ -219,7 +199,12 @@ static int __cpuinit _cpu_up(unsigned in
 	if (cpu_online(cpu) || !cpu_present(cpu))
 		return -EINVAL;
 
-	raw_notifier_call_chain(&cpu_chain, CPU_LOCK_ACQUIRE, hcpu);
+	if (system_state == SYSTEM_RUNNING &&
+		freeze_processes(FE_HOTPLUG_CPU)) {
+		thaw_processes(FE_HOTPLUG_CPU);
+		return -EBUSY;
+	}
+
 	ret = __raw_notifier_call_chain(&cpu_chain, CPU_UP_PREPARE, hcpu,
 							-1, &nr_calls);
 	if (ret == NOTIFY_BAD) {
@@ -229,10 +214,9 @@ static int __cpuinit _cpu_up(unsigned in
 		goto out_notify;
 	}
 
+
 	/* Arch-specific enabling code. */
-	mutex_lock(&cpu_bitmask_lock);
 	ret = __cpu_up(cpu);
-	mutex_unlock(&cpu_bitmask_lock);
 	if (ret != 0)
 		goto out_notify;
 	BUG_ON(!cpu_online(cpu));
@@ -244,7 +228,9 @@ out_notify:
 	if (ret != 0)
 		__raw_notifier_call_chain(&cpu_chain,
 				CPU_UP_CANCELED, hcpu, nr_calls, NULL);
-	raw_notifier_call_chain(&cpu_chain, CPU_LOCK_RELEASE, hcpu);
+
+	if (system_state == SYSTEM_RUNNING)
+		thaw_processes(FE_HOTPLUG_CPU);
 
 	return ret;
 }
Index: linux-2.6.21-rc5/include/linux/notifier.h
===================================================================
--- linux-2.6.21-rc5.orig/include/linux/notifier.h
+++ linux-2.6.21-rc5/include/linux/notifier.h
@@ -194,8 +194,6 @@ extern int __srcu_notifier_call_chain(st
 #define CPU_DOWN_PREPARE	0x0005 /* CPU (unsigned)v going down */
 #define CPU_DOWN_FAILED		0x0006 /* CPU (unsigned)v NOT going down */
 #define CPU_DEAD		0x0007 /* CPU (unsigned)v dead */
-#define CPU_LOCK_ACQUIRE	0x0008 /* Acquire all hotcpu locks */
-#define CPU_LOCK_RELEASE	0x0009 /* Release all hotcpu locks */
 
 #endif /* __KERNEL__ */
 #endif /* _LINUX_NOTIFIER_H */
Index: linux-2.6.21-rc5/kernel/softirq.c
===================================================================
--- linux-2.6.21-rc5.orig/kernel/softirq.c
+++ linux-2.6.21-rc5/kernel/softirq.c
@@ -490,12 +490,16 @@ void __init softirq_init(void)
 static int ksoftirqd(void * __bind_cpu)
 {
 	set_user_nice(current, 19);
-	freezer_exempt(FE_ALL);
+	freezer_exempt(FE_SUSPEND_KPROBES);
 
 	set_current_state(TASK_INTERRUPTIBLE);
 
 	while (!kthread_should_stop()) {
 		try_to_freeze();
+		/* When thawed, if cpu is no longer online, die */
+		if (cpu_is_offline((long)__bind_cpu))
+			goto wait_to_die;
+
 		preempt_disable();
 		if (!local_softirq_pending()) {
 			preempt_enable_no_resched();
@@ -506,11 +510,6 @@ static int ksoftirqd(void * __bind_cpu)
 		__set_current_state(TASK_RUNNING);
 
 		while (local_softirq_pending()) {
-			/* Preempt disable stops cpu going offline.
-			   If already offline, we'll be on wrong CPU:
-			   don't process */
-			if (cpu_is_offline((long)__bind_cpu))
-				goto wait_to_die;
 			do_softirq();
 			preempt_enable_no_resched();
 			cond_resched();
@@ -523,7 +522,6 @@ static int ksoftirqd(void * __bind_cpu)
 	return 0;
 
 wait_to_die:
-	preempt_enable();
 	/* Wait for kthread_stop */
 	set_current_state(TASK_INTERRUPTIBLE);
 	while (!kthread_should_stop()) {
@@ -616,6 +614,7 @@ static int __cpuinit cpu_callback(struct
 	case CPU_DEAD:
 		p = per_cpu(ksoftirqd, hotcpu);
 		per_cpu(ksoftirqd, hotcpu) = NULL;
+		thaw_process(p);
 		kthread_stop(p);
 		takeover_tasklets(hotcpu);
 		break;
Index: linux-2.6.21-rc5/kernel/sched.c
===================================================================
--- linux-2.6.21-rc5.orig/kernel/sched.c
+++ linux-2.6.21-rc5/kernel/sched.c
@@ -255,7 +255,6 @@ struct rq {
 };
 
 static DEFINE_PER_CPU(struct rq, runqueues);
-static DEFINE_MUTEX(sched_hotcpu_mutex);
 
 static inline int cpu_of(struct rq *rq)
 {
@@ -4400,13 +4399,11 @@ long sched_setaffinity(pid_t pid, cpumas
 	struct task_struct *p;
 	int retval;
 
-	mutex_lock(&sched_hotcpu_mutex);
 	read_lock(&tasklist_lock);
 
 	p = find_process_by_pid(pid);
 	if (!p) {
 		read_unlock(&tasklist_lock);
-		mutex_unlock(&sched_hotcpu_mutex);
 		return -ESRCH;
 	}
 
@@ -4433,7 +4430,6 @@ long sched_setaffinity(pid_t pid, cpumas
 
 out_unlock:
 	put_task_struct(p);
-	mutex_unlock(&sched_hotcpu_mutex);
 	return retval;
 }
 
@@ -4490,7 +4486,6 @@ long sched_getaffinity(pid_t pid, cpumas
 	struct task_struct *p;
 	int retval;
 
-	mutex_lock(&sched_hotcpu_mutex);
 	read_lock(&tasklist_lock);
 
 	retval = -ESRCH;
@@ -4506,7 +4501,6 @@ long sched_getaffinity(pid_t pid, cpumas
 
 out_unlock:
 	read_unlock(&tasklist_lock);
-	mutex_unlock(&sched_hotcpu_mutex);
 	if (retval)
 		return retval;
 
@@ -5400,9 +5394,6 @@ migration_call(struct notifier_block *nf
 	struct rq *rq;
 
 	switch (action) {
-	case CPU_LOCK_ACQUIRE:
-		mutex_lock(&sched_hotcpu_mutex);
-		break;
 
 	case CPU_UP_PREPARE:
 		p = kthread_create(migration_thread, hcpu, "migration/%d",cpu);
@@ -5435,6 +5426,7 @@ migration_call(struct notifier_block *nf
 	case CPU_DEAD:
 		migrate_live_tasks(cpu);
 		rq = cpu_rq(cpu);
+		thaw_process(rq->migration_thread);
 		kthread_stop(rq->migration_thread);
 		rq->migration_thread = NULL;
 		/* Idle task back to normal (off runqueue, low prio) */
@@ -5447,8 +5439,7 @@ migration_call(struct notifier_block *nf
 		migrate_nr_uninterruptible(rq);
 		BUG_ON(rq->nr_running != 0);
 
-		/* No need to migrate the tasks: it was best-effort if
-		 * they didn't take sched_hotcpu_mutex.  Just wake up
+		/* No need to migrate the tasks: Just wake up
 		 * the requestors. */
 		spin_lock_irq(&rq->lock);
 		while (!list_empty(&rq->migration_queue)) {
@@ -5462,9 +5453,6 @@ migration_call(struct notifier_block *nf
 		spin_unlock_irq(&rq->lock);
 		break;
 #endif
-	case CPU_LOCK_RELEASE:
-		mutex_unlock(&sched_hotcpu_mutex);
-		break;
 	}
 	return NOTIFY_OK;
 }
@@ -6813,10 +6801,8 @@ int arch_reinit_sched_domains(void)
 {
 	int err;
 
-	mutex_lock(&sched_hotcpu_mutex);
 	detach_destroy_domains(&cpu_online_map);
 	err = arch_init_sched_domains(&cpu_online_map);
-	mutex_unlock(&sched_hotcpu_mutex);
 
 	return err;
 }
@@ -6921,12 +6907,10 @@ void __init sched_init_smp(void)
 {
 	cpumask_t non_isolated_cpus;
 
-	mutex_lock(&sched_hotcpu_mutex);
 	arch_init_sched_domains(&cpu_online_map);
 	cpus_andnot(non_isolated_cpus, cpu_possible_map, cpu_isolated_map);
 	if (cpus_empty(non_isolated_cpus))
 		cpu_set(smp_processor_id(), non_isolated_cpus);
-	mutex_unlock(&sched_hotcpu_mutex);
 	/* XXX: Theoretical race here - CPU may be hotplugged now */
 	hotcpu_notifier(update_sched_domains, 0);
 
Index: linux-2.6.21-rc5/kernel/softlockup.c
===================================================================
--- linux-2.6.21-rc5.orig/kernel/softlockup.c
+++ linux-2.6.21-rc5/kernel/softlockup.c
@@ -147,6 +147,7 @@ cpu_callback(struct notifier_block *nfb,
 	case CPU_DEAD:
 		p = per_cpu(watchdog_task, hotcpu);
 		per_cpu(watchdog_task, hotcpu) = NULL;
+		thaw_process(p);
 		kthread_stop(p);
 		break;
 #endif /* CONFIG_HOTPLUG_CPU */
Index: linux-2.6.21-rc5/kernel/workqueue.c
===================================================================
--- linux-2.6.21-rc5.orig/kernel/workqueue.c
+++ linux-2.6.21-rc5/kernel/workqueue.c
@@ -47,7 +47,10 @@ struct cpu_workqueue_struct {
 
 	struct workqueue_struct *wq;
 	struct task_struct *thread;
-	int should_stop;
+	int status;
+	#define CWQ_RUNNING	0
+	#define CWQ_SHOULD_STOP	1
+	#define CWQ_STOPPED	2
 
 	int run_depth;		/* Detect run_workqueue() recursion depth */
 } ____cacheline_aligned;
@@ -272,13 +275,13 @@ static void run_workqueue(struct cpu_wor
  */
 static int cwq_should_stop(struct cpu_workqueue_struct *cwq)
 {
-	int should_stop = cwq->should_stop;
+	int should_stop = cwq->status;
 
-	if (unlikely(should_stop)) {
+	if (unlikely(should_stop == CWQ_SHOULD_STOP)) {
 		spin_lock_irq(&cwq->lock);
-		should_stop = cwq->should_stop && list_empty(&cwq->worklist);
+		should_stop = cwq->status && list_empty(&cwq->worklist);
 		if (should_stop)
-			cwq->thread = NULL;
+			cwq->status= CWQ_STOPPED;
 		spin_unlock_irq(&cwq->lock);
 	}
 
@@ -311,7 +314,7 @@ static int worker_thread(void *__cwq)
 			try_to_freeze();
 
 		prepare_to_wait(&cwq->more_work, &wait, TASK_INTERRUPTIBLE);
-		if (!cwq->should_stop && list_empty(&cwq->worklist))
+		if (cwq->status == CWQ_RUNNING && list_empty(&cwq->worklist))
 			schedule();
 		finish_wait(&cwq->more_work, &wait);
 
@@ -643,7 +646,7 @@ static int create_workqueue_thread(struc
 		return PTR_ERR(p);
 
 	cwq->thread = p;
-	cwq->should_stop = 0;
+	cwq->status = CWQ_RUNNING;
 	if (!is_single_threaded(wq))
 		kthread_bind(p, cpu);
 
@@ -707,21 +710,23 @@ static void cleanup_workqueue_thread(str
 	spin_lock_irq(&cwq->lock);
 	if (cwq->thread != NULL) {
 		insert_wq_barrier(cwq, &barr, 1);
-		cwq->should_stop = 1;
+		cwq->status = CWQ_SHOULD_STOP;
 		alive = 1;
 	}
 	spin_unlock_irq(&cwq->lock);
 
 	if (alive) {
+		thaw_process(cwq->thread);
 		wait_for_completion(&barr.done);
 
-		while (unlikely(cwq->thread != NULL))
+		while (unlikely(cwq->status != CWQ_STOPPED))
 			cpu_relax();
 		/*
 		 * Wait until cwq->thread unlocks cwq->lock,
 		 * it won't touch *cwq after that.
 		 */
 		smp_rmb();
+		cwq->thread = NULL;
 		spin_unlock_wait(&cwq->lock);
 	}
 }
@@ -761,18 +766,11 @@ static int __devinit workqueue_cpu_callb
 	struct workqueue_struct *wq;
 
 	switch (action) {
-	case CPU_LOCK_ACQUIRE:
-		mutex_lock(&workqueue_mutex);
-		return NOTIFY_OK;
-
-	case CPU_LOCK_RELEASE:
-		mutex_unlock(&workqueue_mutex);
-		return NOTIFY_OK;
-
 	case CPU_UP_PREPARE:
 		cpu_set(cpu, cpu_populated_map);
 	}
 
+	mutex_lock(&workqueue_mutex);
 	list_for_each_entry(wq, &workqueues, list) {
 		cwq = per_cpu_ptr(wq->cpu_wq, cpu);
 
@@ -795,6 +793,7 @@ static int __devinit workqueue_cpu_callb
 			break;
 		}
 	}
+	mutex_unlock(&workqueue_mutex);
 
 	return NOTIFY_OK;
 }
Index: linux-2.6.21-rc5/mm/slab.c
===================================================================
--- linux-2.6.21-rc5.orig/mm/slab.c
+++ linux-2.6.21-rc5/mm/slab.c
@@ -1186,9 +1186,6 @@ static int __cpuinit cpuup_callback(stru
 	int memsize = sizeof(struct kmem_list3);
 
 	switch (action) {
-	case CPU_LOCK_ACQUIRE:
-		mutex_lock(&cache_chain_mutex);
-		break;
 	case CPU_UP_PREPARE:
 		/*
 		 * We need to do this right in the beginning since
@@ -1364,9 +1361,6 @@ free_array_cache:
 			drain_freelist(cachep, l3, l3->free_objects);
 		}
 		break;
-	case CPU_LOCK_RELEASE:
-		mutex_unlock(&cache_chain_mutex);
-		break;
 	}
 	return NOTIFY_OK;
 bad:
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH 4/8] Rip out lock_cpu_hotplug()
  2007-04-02  5:34 [RFC] Cpu-hotplug: Using the Process Freezer (try2) Gautham R Shenoy
                   ` (2 preceding siblings ...)
  2007-04-02  5:38 ` [PATCH 3/8] Use process freezer for cpu-hotplug Gautham R Shenoy
@ 2007-04-02  5:39 ` Gautham R Shenoy
  2007-04-02  5:40 ` [PATCH 5/8] __cpu_up: use singlethreaded workqueue Gautham R Shenoy
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 70+ messages in thread
From: Gautham R Shenoy @ 2007-04-02  5:39 UTC (permalink / raw)
  To: akpm, paulmck, torvalds
  Cc: linux-kernel, vatsa, Oleg Nesterov, Rafael J. Wysocki, mingo,
	dipankar, dino, masami.hiramatsu.pt

This patch rips out lock_cpu_hotplug from the kernel.
Good Riddance!! (hopefully :) )

Signed-off-by : Gautham R Shenoy <ego@in.ibm.com>
--
 arch/i386/kernel/cpu/mtrr/main.c             |    6 ------
 arch/i386/kernel/microcode.c                 |    8 --------
 arch/mips/kernel/mips-mt.c                   |    5 -----
 arch/powerpc/platforms/pseries/hotplug-cpu.c |    5 -----
 arch/powerpc/platforms/pseries/rtasd.c       |    4 ----
 include/linux/cpu.h                          |   20 --------------------
 kernel/cpu.c                                 |   17 -----------------
 kernel/rcutorture.c                          |    3 ---
 kernel/stop_machine.c                        |    3 ---
 net/core/flow.c                              |    2 --
 10 files changed, 73 deletions(-)

Index: linux-2.6.21-rc4/include/linux/cpu.h
===================================================================
--- linux-2.6.21-rc4.orig/include/linux/cpu.h
+++ linux-2.6.21-rc4/include/linux/cpu.h
@@ -89,18 +89,6 @@ extern struct sysdev_class cpu_sysdev_cl
 #ifdef CONFIG_HOTPLUG_CPU
 /* Stop CPUs going up and down. */
 
-static inline void cpuhotplug_mutex_lock(struct mutex *cpu_hp_mutex)
-{
-	mutex_lock(cpu_hp_mutex);
-}
-
-static inline void cpuhotplug_mutex_unlock(struct mutex *cpu_hp_mutex)
-{
-	mutex_unlock(cpu_hp_mutex);
-}
-
-extern void lock_cpu_hotplug(void);
-extern void unlock_cpu_hotplug(void);
 #define hotcpu_notifier(fn, pri) {				\
 	static struct notifier_block fn##_nb =			\
 		{ .notifier_call = fn, .priority = pri };	\
@@ -113,14 +101,6 @@ int cpu_down(unsigned int cpu);
 
 #else		/* CONFIG_HOTPLUG_CPU */
 
-static inline void cpuhotplug_mutex_lock(struct mutex *cpu_hp_mutex)
-{ }
-static inline void cpuhotplug_mutex_unlock(struct mutex *cpu_hp_mutex)
-{ }
-
-#define lock_cpu_hotplug()	do { } while (0)
-#define unlock_cpu_hotplug()	do { } while (0)
-#define lock_cpu_hotplug_interruptible() 0
 #define hotcpu_notifier(fn, pri)	do { (void)(fn); } while (0)
 #define register_hotcpu_notifier(nb)	do { (void)(nb); } while (0)
 #define unregister_hotcpu_notifier(nb)	do { (void)(nb); } while (0)
Index: linux-2.6.21-rc4/kernel/cpu.c
===================================================================
--- linux-2.6.21-rc4.orig/kernel/cpu.c
+++ linux-2.6.21-rc4/kernel/cpu.c
@@ -18,7 +18,6 @@
 
 /* This protects CPUs going up and down... */
 static DEFINE_MUTEX(cpu_add_remove_lock);
-static DEFINE_MUTEX(cpu_bitmask_lock);
 
 static __cpuinitdata RAW_NOTIFIER_HEAD(cpu_chain);
 
@@ -27,22 +26,6 @@ static __cpuinitdata RAW_NOTIFIER_HEAD(c
  */
 static int cpu_hotplug_disabled;
 
-#ifdef CONFIG_HOTPLUG_CPU
-
-void lock_cpu_hotplug(void)
-{
-	return;
-}
-EXPORT_SYMBOL_GPL(lock_cpu_hotplug);
-
-void unlock_cpu_hotplug(void)
-{
-	return;
-}
-EXPORT_SYMBOL_GPL(unlock_cpu_hotplug);
-
-#endif	/* CONFIG_HOTPLUG_CPU */
-
 /* Need to know about CPUs going up/down? */
 int __cpuinit register_cpu_notifier(struct notifier_block *nb)
 {
Index: linux-2.6.21-rc4/arch/i386/kernel/cpu/mtrr/main.c
===================================================================
--- linux-2.6.21-rc4.orig/arch/i386/kernel/cpu/mtrr/main.c
+++ linux-2.6.21-rc4/arch/i386/kernel/cpu/mtrr/main.c
@@ -346,8 +346,6 @@ int mtrr_add_page(unsigned long base, un
 	error = -EINVAL;
 	replace = -1;
 
-	/* No CPU hotplug when we change MTRR entries */
-	lock_cpu_hotplug();
 	/*  Search for existing MTRR  */
 	mutex_lock(&mtrr_mutex);
 	for (i = 0; i < num_var_ranges; ++i) {
@@ -403,7 +401,6 @@ int mtrr_add_page(unsigned long base, un
 	error = i;
  out:
 	mutex_unlock(&mtrr_mutex);
-	unlock_cpu_hotplug();
 	return error;
 }
 
@@ -492,8 +489,6 @@ int mtrr_del_page(int reg, unsigned long
 		return -ENXIO;
 
 	max = num_var_ranges;
-	/* No CPU hotplug when we change MTRR entries */
-	lock_cpu_hotplug();
 	mutex_lock(&mtrr_mutex);
 	if (reg < 0) {
 		/*  Search for existing MTRR  */
@@ -534,7 +529,6 @@ int mtrr_del_page(int reg, unsigned long
 	error = reg;
  out:
 	mutex_unlock(&mtrr_mutex);
-	unlock_cpu_hotplug();
 	return error;
 }
 /**
Index: linux-2.6.21-rc4/arch/i386/kernel/microcode.c
===================================================================
--- linux-2.6.21-rc4.orig/arch/i386/kernel/microcode.c
+++ linux-2.6.21-rc4/arch/i386/kernel/microcode.c
@@ -435,7 +435,6 @@ static ssize_t microcode_write (struct f
 		return -EINVAL;
 	}
 
-	lock_cpu_hotplug();
 	mutex_lock(&microcode_mutex);
 
 	user_buffer = (void __user *) buf;
@@ -446,7 +445,6 @@ static ssize_t microcode_write (struct f
 		ret = (ssize_t)len;
 
 	mutex_unlock(&microcode_mutex);
-	unlock_cpu_hotplug();
 
 	return ret;
 }
@@ -609,14 +607,12 @@ static ssize_t reload_store(struct sys_d
 
 		old = current->cpus_allowed;
 
-		lock_cpu_hotplug();
 		set_cpus_allowed(current, cpumask_of_cpu(cpu));
 
 		mutex_lock(&microcode_mutex);
 		if (uci->valid)
 			err = cpu_request_microcode(cpu);
 		mutex_unlock(&microcode_mutex);
-		unlock_cpu_hotplug();
 		set_cpus_allowed(current, old);
 	}
 	if (err)
@@ -740,9 +736,7 @@ static int __init microcode_init (void)
 		return PTR_ERR(microcode_pdev);
 	}
 
-	lock_cpu_hotplug();
 	error = sysdev_driver_register(&cpu_sysdev_class, &mc_sysdev_driver);
-	unlock_cpu_hotplug();
 	if (error) {
 		microcode_dev_exit();
 		platform_device_unregister(microcode_pdev);
@@ -762,9 +756,7 @@ static void __exit microcode_exit (void)
 
 	unregister_hotcpu_notifier(&mc_cpu_notifier);
 
-	lock_cpu_hotplug();
 	sysdev_driver_unregister(&cpu_sysdev_class, &mc_sysdev_driver);
-	unlock_cpu_hotplug();
 
 	platform_device_unregister(microcode_pdev);
 }
Index: linux-2.6.21-rc4/arch/mips/kernel/mips-mt.c
===================================================================
--- linux-2.6.21-rc4.orig/arch/mips/kernel/mips-mt.c
+++ linux-2.6.21-rc4/arch/mips/kernel/mips-mt.c
@@ -73,13 +73,11 @@ asmlinkage long mipsmt_sys_sched_setaffi
 	if (copy_from_user(&new_mask, user_mask_ptr, sizeof(new_mask)))
 		return -EFAULT;
 
-	lock_cpu_hotplug();
 	read_lock(&tasklist_lock);
 
 	p = find_process_by_pid(pid);
 	if (!p) {
 		read_unlock(&tasklist_lock);
-		unlock_cpu_hotplug();
 		return -ESRCH;
 	}
 
@@ -121,7 +119,6 @@ asmlinkage long mipsmt_sys_sched_setaffi
 
 out_unlock:
 	put_task_struct(p);
-	unlock_cpu_hotplug();
 	return retval;
 }
 
@@ -140,7 +137,6 @@ asmlinkage long mipsmt_sys_sched_getaffi
 	if (len < real_len)
 		return -EINVAL;
 
-	lock_cpu_hotplug();
 	read_lock(&tasklist_lock);
 
 	retval = -ESRCH;
@@ -155,7 +151,6 @@ asmlinkage long mipsmt_sys_sched_getaffi
 
 out_unlock:
 	read_unlock(&tasklist_lock);
-	unlock_cpu_hotplug();
 	if (retval)
 		return retval;
 	if (copy_to_user(user_mask_ptr, &mask, real_len))
Index: linux-2.6.21-rc4/arch/powerpc/platforms/pseries/hotplug-cpu.c
===================================================================
--- linux-2.6.21-rc4.orig/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ linux-2.6.21-rc4/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -151,8 +151,6 @@ static int pseries_add_processor(struct 
 	for (i = 0; i < nthreads; i++)
 		cpu_set(i, tmp);
 
-	lock_cpu_hotplug();
-
 	BUG_ON(!cpus_subset(cpu_present_map, cpu_possible_map));
 
 	/* Get a bitmap of unoccupied slots. */
@@ -188,7 +186,6 @@ static int pseries_add_processor(struct 
 	}
 	err = 0;
 out_unlock:
-	unlock_cpu_hotplug();
 	return err;
 }
 
@@ -209,7 +206,6 @@ static void pseries_remove_processor(str
 
 	nthreads = len / sizeof(u32);
 
-	lock_cpu_hotplug();
 	for (i = 0; i < nthreads; i++) {
 		for_each_present_cpu(cpu) {
 			if (get_hard_smp_processor_id(cpu) != intserv[i])
@@ -223,7 +219,6 @@ static void pseries_remove_processor(str
 			printk(KERN_WARNING "Could not find cpu to remove "
 			       "with physical id 0x%x\n", intserv[i]);
 	}
-	unlock_cpu_hotplug();
 }
 
 static int pseries_smp_notifier(struct notifier_block *nb,
Index: linux-2.6.21-rc4/arch/powerpc/platforms/pseries/rtasd.c
===================================================================
--- linux-2.6.21-rc4.orig/arch/powerpc/platforms/pseries/rtasd.c
+++ linux-2.6.21-rc4/arch/powerpc/platforms/pseries/rtasd.c
@@ -404,7 +404,6 @@ static void do_event_scan_all_cpus(long 
 {
 	int cpu;
 
-	lock_cpu_hotplug();
 	cpu = first_cpu(cpu_online_map);
 	for (;;) {
 		set_cpus_allowed(current, cpumask_of_cpu(cpu));
@@ -412,15 +411,12 @@ static void do_event_scan_all_cpus(long 
 		set_cpus_allowed(current, CPU_MASK_ALL);
 
 		/* Drop hotplug lock, and sleep for the specified delay */
-		unlock_cpu_hotplug();
 		msleep_interruptible(delay);
-		lock_cpu_hotplug();
 
 		cpu = next_cpu(cpu, cpu_online_map);
 		if (cpu == NR_CPUS)
 			break;
 	}
-	unlock_cpu_hotplug();
 }
 
 static int rtasd(void *unused)
Index: linux-2.6.21-rc4/kernel/rcutorture.c
===================================================================
--- linux-2.6.21-rc4.orig/kernel/rcutorture.c
+++ linux-2.6.21-rc4/kernel/rcutorture.c
@@ -733,11 +733,9 @@ static void rcu_torture_shuffle_tasks(vo
 	cpumask_t tmp_mask = CPU_MASK_ALL;
 	int i;
 
-	lock_cpu_hotplug();
 
 	/* No point in shuffling if there is only one online CPU (ex: UP) */
 	if (num_online_cpus() == 1) {
-		unlock_cpu_hotplug();
 		return;
 	}
 
@@ -769,7 +767,6 @@ static void rcu_torture_shuffle_tasks(vo
 	else
 		rcu_idle_cpu--;
 
-	unlock_cpu_hotplug();
 }
 
 /* Shuffle tasks across CPUs, with the intent of allowing each CPU in the
Index: linux-2.6.21-rc4/kernel/stop_machine.c
===================================================================
--- linux-2.6.21-rc4.orig/kernel/stop_machine.c
+++ linux-2.6.21-rc4/kernel/stop_machine.c
@@ -198,14 +198,11 @@ int stop_machine_run(int (*fn)(void *), 
 	struct task_struct *p;
 	int ret;
 
-	/* No CPUs can come up or down during this. */
-	lock_cpu_hotplug();
 	p = __stop_machine_run(fn, data, cpu);
 	if (!IS_ERR(p))
 		ret = kthread_stop(p);
 	else
 		ret = PTR_ERR(p);
-	unlock_cpu_hotplug();
 
 	return ret;
 }
Index: linux-2.6.21-rc4/net/core/flow.c
===================================================================
--- linux-2.6.21-rc4.orig/net/core/flow.c
+++ linux-2.6.21-rc4/net/core/flow.c
@@ -296,7 +296,6 @@ void flow_cache_flush(void)
 	static DEFINE_MUTEX(flow_flush_sem);
 
 	/* Don't want cpus going down or up during this. */
-	lock_cpu_hotplug();
 	mutex_lock(&flow_flush_sem);
 	atomic_set(&info.cpuleft, num_online_cpus());
 	init_completion(&info.completion);
@@ -308,7 +307,6 @@ void flow_cache_flush(void)
 
 	wait_for_completion(&info.completion);
 	mutex_unlock(&flow_flush_sem);
-	unlock_cpu_hotplug();
 }
 
 static void __devinit flow_cache_cpu_prepare(int cpu)
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH 5/8] __cpu_up: use singlethreaded workqueue
  2007-04-02  5:34 [RFC] Cpu-hotplug: Using the Process Freezer (try2) Gautham R Shenoy
                   ` (3 preceding siblings ...)
  2007-04-02  5:39 ` [PATCH 4/8] Rip out lock_cpu_hotplug() Gautham R Shenoy
@ 2007-04-02  5:40 ` Gautham R Shenoy
  2007-04-05 12:08   ` Oleg Nesterov
  2007-04-02  5:41 ` [PATCH 6/8] Make non-singlethreaded workqueues freezeable by default Gautham R Shenoy
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 70+ messages in thread
From: Gautham R Shenoy @ 2007-04-02  5:40 UTC (permalink / raw)
  To: akpm, paulmck, torvalds
  Cc: linux-kernel, vatsa, Oleg Nesterov, Rafael J. Wysocki, mingo,
	dipankar, dino, masami.hiramatsu.pt

Currently i386 and x86_64 __cpu_up uses the services of the kevents
workqueue to bring the cpu up. Change this and use kthread workqueue
instead which is single_threaded and won't be frozen during 
CPU_HOTPLUG.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
--
 arch/i386/kernel/smpboot.c   |    3 ++-
 arch/x86_64/kernel/smpboot.c |    5 +++--
 include/linux/kthread.h      |    5 +++--
 kernel/kthread.c             |   16 +++++++++++++---
 4 files changed, 21 insertions(+), 8 deletions(-)

Index: linux-2.6.21-rc5/include/linux/kthread.h
===================================================================
--- linux-2.6.21-rc5.orig/include/linux/kthread.h
+++ linux-2.6.21-rc5/include/linux/kthread.h
@@ -3,7 +3,7 @@
 /* Simple interface for creating and stopping kernel threads without mess. */
 #include <linux/err.h>
 #include <linux/sched.h>
-
+#include <linux/workqueue.h>
 struct task_struct *kthread_create(int (*threadfn)(void *data),
 				   void *data,
 				   const char namefmt[], ...);
@@ -29,5 +29,6 @@ struct task_struct *kthread_create(int (
 void kthread_bind(struct task_struct *k, unsigned int cpu);
 int kthread_stop(struct task_struct *k);
 int kthread_should_stop(void);
-
+extern void kthreadwq_queue_work(struct work_struct *w);
+extern int kthreadwq_up(void);
 #endif /* _LINUX_KTHREAD_H */
Index: linux-2.6.21-rc5/kernel/kthread.c
===================================================================
--- linux-2.6.21-rc5.orig/kernel/kthread.c
+++ linux-2.6.21-rc5/kernel/kthread.c
@@ -112,7 +112,7 @@ static int kthread(void *_create)
 	return 0;
 }
 
-/* We are keventd: create a thread. */
+/* We are keventd: create a thread. Hmm, Are we?? */
 static void keventd_create_kthread(struct work_struct *work)
 {
 	struct kthread_create_info *create =
@@ -132,6 +132,16 @@ static void keventd_create_kthread(struc
 	complete(&create->done);
 }
 
+void kthreadwq_queue_work(struct work_struct *work)
+{
+	queue_work(helper_wq, work);
+}
+
+int kthreadwq_up()
+{
+	return (helper_wq != NULL);
+}
+
 /**
  * kthread_create - create a kthread.
  * @threadfn: the function to run until signal_pending(current).
@@ -167,10 +177,10 @@ struct task_struct *kthread_create(int (
 	/*
 	 * The workqueue needs to start up first:
 	 */
-	if (!helper_wq)
+	if (!kthreadwq_up())
 		create.work.func(&create.work);
 	else {
-		queue_work(helper_wq, &create.work);
+		kthreadwq_queue_work(&create.work);
 		wait_for_completion(&create.done);
 	}
 	if (!IS_ERR(create.result)) {
Index: linux-2.6.21-rc5/arch/i386/kernel/smpboot.c
===================================================================
--- linux-2.6.21-rc5.orig/arch/i386/kernel/smpboot.c
+++ linux-2.6.21-rc5/arch/i386/kernel/smpboot.c
@@ -46,6 +46,7 @@
 #include <linux/cpu.h>
 #include <linux/percpu.h>
 #include <linux/nmi.h>
+#include <linux/kthread.h>
 
 #include <linux/delay.h>
 #include <linux/mc146818rtc.h>
@@ -968,7 +969,7 @@ static int __cpuinit __smp_prepare_cpu(i
 	clone_pgd_range(swapper_pg_dir, swapper_pg_dir + USER_PGD_PTRS,
 			min_t(unsigned long, KERNEL_PGD_PTRS, USER_PGD_PTRS));
 	flush_tlb_all();
-	schedule_work(&info.task);
+	kthreadwq_queue_work(&info.task);
 	wait_for_completion(&done);
 
 	zap_low_mappings();
Index: linux-2.6.21-rc5/arch/x86_64/kernel/smpboot.c
===================================================================
--- linux-2.6.21-rc5.orig/arch/x86_64/kernel/smpboot.c
+++ linux-2.6.21-rc5/arch/x86_64/kernel/smpboot.c
@@ -50,6 +50,7 @@
 #include <linux/mc146818rtc.h>
 #include <linux/smp.h>
 #include <linux/kdebug.h>
+#include <linux/kthread.h>
 
 #include <asm/mtrr.h>
 #include <asm/pgalloc.h>
@@ -613,10 +614,10 @@ static int __cpuinit do_boot_cpu(int cpu
 	 * in context of keventd(), we would end up with locking up the keventd
 	 * thread.
 	 */
-	if (!keventd_up() || current_is_keventd())
+	if (!kthreadwq_up())
 		c_idle.work.func(&c_idle.work);
 	else {
-		schedule_work(&c_idle.work);
+		kthreadwq_queue_work(&c_idle.work);
 		wait_for_completion(&c_idle.done);
 	}
 
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH 6/8] Make non-singlethreaded workqueues freezeable by default
  2007-04-02  5:34 [RFC] Cpu-hotplug: Using the Process Freezer (try2) Gautham R Shenoy
                   ` (4 preceding siblings ...)
  2007-04-02  5:40 ` [PATCH 5/8] __cpu_up: use singlethreaded workqueue Gautham R Shenoy
@ 2007-04-02  5:41 ` Gautham R Shenoy
  2007-04-05 11:57   ` Oleg Nesterov
  2007-04-02  5:42 ` [PATCH 7/8] Clean up workqueue.c with respect to the freezer based cpu-hotplug Gautham R Shenoy
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 70+ messages in thread
From: Gautham R Shenoy @ 2007-04-02  5:41 UTC (permalink / raw)
  To: akpm, paulmck, torvalds
  Cc: linux-kernel, vatsa, Oleg Nesterov, Rafael J. Wysocki, mingo,
	dipankar, dino, masami.hiramatsu.pt

This patch
o Makes all non-singlethreaded workqueues freezeable by default.
o Introduces a new API for creating freeze_exempted workqueues.
o Uses the combination of cancel_delayed_work and cancel_work_sync
  in Slab during DOWN_PREPARE instead of cancel_rearming_delayed work, 
  which tries to flush_workqueue in vain, in a frozen context.

Note:
o The singlethreaded workqueues will not be frozen.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Johannes Berg <johannes@sipsolutions.net>
--
 include/linux/workqueue.h |   12 +++++-------
 kernel/workqueue.c        |   35 ++++++++++++++++++++++++++---------
 mm/slab.c                 |   10 +++-------
 3 files changed, 34 insertions(+), 23 deletions(-)

Index: linux-2.6.21-rc5/include/linux/workqueue.h
===================================================================
--- linux-2.6.21-rc5.orig/include/linux/workqueue.h
+++ linux-2.6.21-rc5/include/linux/workqueue.h
@@ -9,7 +9,6 @@
 #include <linux/linkage.h>
 #include <linux/bitops.h>
 #include <asm/atomic.h>
-
 struct workqueue_struct;
 
 struct work_struct;
@@ -112,12 +111,11 @@ struct execute_work {
 	clear_bit(WORK_STRUCT_PENDING, work_data_bits(work))
 
 
-extern struct workqueue_struct *__create_workqueue(const char *name,
-						    int singlethread,
-						    int freezeable);
-#define create_workqueue(name) __create_workqueue((name), 0, 0)
-#define create_freezeable_workqueue(name) __create_workqueue((name), 0, 1)
-#define create_singlethread_workqueue(name) __create_workqueue((name), 1, 0)
+extern struct workqueue_struct *create_workqueue(const char *name);
+
+extern struct workqueue_struct *create_freeze_exempted_workqueue(const char *name, int freeze_exempt_events);
+
+extern struct workqueue_struct *create_singlethread_workqueue(const char *name);
 
 extern void destroy_workqueue(struct workqueue_struct *wq);
 
Index: linux-2.6.21-rc5/kernel/workqueue.c
===================================================================
--- linux-2.6.21-rc5.orig/kernel/workqueue.c
+++ linux-2.6.21-rc5/kernel/workqueue.c
@@ -64,7 +64,7 @@ struct workqueue_struct {
 	struct list_head list;
 	const char *name;
 	int singlethread;
-	int freezeable;		/* Freeze threads during suspend */
+	int freeze_exempt_events;	/* Events not to freeze! */
 };
 
 /* All the per-cpu workqueues on the system, for hotplug cpu to add/remove
@@ -294,8 +294,7 @@ static int worker_thread(void *__cwq)
 	DEFINE_WAIT(wait);
 	struct k_sigaction sa;
 
-	if (!cwq->wq->freezeable)
-		freezer_exempt(FE_ALL);
+	freezer_exempt(cwq->wq->freeze_exempt_events);
 
 	/*
 	 * We inherited MPOL_INTERLEAVE from the booting kernel.
@@ -310,8 +309,7 @@ static int worker_thread(void *__cwq)
 	do_sigaction(SIGCHLD, &sa, (struct k_sigaction *)0);
 
 	for (;;) {
-		if (cwq->wq->freezeable)
-			try_to_freeze();
+		try_to_freeze();
 
 		prepare_to_wait(&cwq->more_work, &wait, TASK_INTERRUPTIBLE);
 		if (cwq->status == CWQ_RUNNING && list_empty(&cwq->worklist))
@@ -656,8 +654,9 @@ static int create_workqueue_thread(struc
 	return 0;
 }
 
-struct workqueue_struct *__create_workqueue(const char *name,
-					    int singlethread, int freezeable)
+static struct workqueue_struct *__create_workqueue(const char *name,
+					    int singlethread,
+					    int freeze_exempt_events)
 {
 	struct workqueue_struct *wq;
 	struct cpu_workqueue_struct *cwq;
@@ -675,7 +674,7 @@ struct workqueue_struct *__create_workqu
 
 	wq->name = name;
 	wq->singlethread = singlethread;
-	wq->freezeable = freezeable;
+	wq->freeze_exempt_events = freeze_exempt_events;
 	INIT_LIST_HEAD(&wq->list);
 
 	if (singlethread) {
@@ -700,7 +699,25 @@ struct workqueue_struct *__create_workqu
 	}
 	return wq;
 }
-EXPORT_SYMBOL_GPL(__create_workqueue);
+
+struct workqueue_struct *create_workqueue(const char *name)
+{
+	return __create_workqueue(name, 0, FE_NONE);
+}
+EXPORT_SYMBOL_GPL(create_workqueue);
+
+struct workqueue_struct *create_freeze_exempted_workqueue(const char *name,
+							int freeze_exempt_events)
+{
+	return __create_workqueue(name, 0, freeze_exempt_events);
+}
+EXPORT_SYMBOL_GPL(create_freeze_exempted_workqueue);
+
+struct workqueue_struct *create_singlethread_workqueue(const char *name)
+{
+	return __create_workqueue(name, 1, FE_ALL);
+}
+EXPORT_SYMBOL_GPL(create_singlethread_workqueue);
 
 static void cleanup_workqueue_thread(struct cpu_workqueue_struct *cwq, int cpu)
 {
Index: linux-2.6.21-rc5/mm/slab.c
===================================================================
--- linux-2.6.21-rc5.orig/mm/slab.c
+++ linux-2.6.21-rc5/mm/slab.c
@@ -1277,13 +1277,9 @@ static int __cpuinit cpuup_callback(stru
 		break;
 #ifdef CONFIG_HOTPLUG_CPU
   	case CPU_DOWN_PREPARE:
-		/*
-		 * Shutdown cache reaper. Note that the cache_chain_mutex is
-		 * held so that if cache_reap() is invoked it cannot do
-		 * anything expensive but will only modify reap_work
-		 * and reschedule the timer.
-		*/
-		cancel_rearming_delayed_work(&per_cpu(reap_work, cpu));
+		/* we are in a frozen context. So this should be safe.*/
+		cancel_delayed_work(&per_cpu(reap_work, cpu));
+		cancel_work_sync(&per_cpu(reap_work, cpu).work);
 		/* Now the cache_reaper is guaranteed to be not running. */
 		per_cpu(reap_work, cpu).work.func = NULL;
   		break;
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH 7/8] Clean up workqueue.c with respect to the freezer based cpu-hotplug
  2007-04-02  5:34 [RFC] Cpu-hotplug: Using the Process Freezer (try2) Gautham R Shenoy
                   ` (5 preceding siblings ...)
  2007-04-02  5:41 ` [PATCH 6/8] Make non-singlethreaded workqueues freezeable by default Gautham R Shenoy
@ 2007-04-02  5:42 ` Gautham R Shenoy
  2007-04-03 11:47   ` Oleg Nesterov
  2007-04-02  5:42 ` [PATCH 8/8] Make kernel threads freezeable for cpu-hotplug Gautham R Shenoy
  2007-04-02  6:16 ` [RFC] Cpu-hotplug: Using the Process Freezer (try2) Ingo Molnar
  8 siblings, 1 reply; 70+ messages in thread
From: Gautham R Shenoy @ 2007-04-02  5:42 UTC (permalink / raw)
  To: akpm, paulmck, torvalds
  Cc: linux-kernel, vatsa, Oleg Nesterov, Rafael J. Wysocki, mingo,
	dipankar, dino, masami.hiramatsu.pt

Clean up workqueue.c from the perspective of freezer-based cpu-hotplug.
This patch
o Removes cpu_populated_map as cpu_online_map is safe to use.
o removes cwq_should_stop and uses kthread_should_stop instead.
o Reintroduces take_over_work from the worker_thread of a downed
  cpu. This means that all non-singlethreaded workqueues *have* to
  be frozen to avoid any races.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
--
 kernel/workqueue.c |  118 ++++++++++++++++++++++-------------------------------
 1 files changed, 50 insertions(+), 68 deletions(-)

Index: linux-2.6.21-rc4/kernel/workqueue.c
===================================================================
--- linux-2.6.21-rc4.orig/kernel/workqueue.c
+++ linux-2.6.21-rc4/kernel/workqueue.c
@@ -47,10 +47,6 @@ struct cpu_workqueue_struct {
 
 	struct workqueue_struct *wq;
 	struct task_struct *thread;
-	int status;
-	#define CWQ_RUNNING	0
-	#define CWQ_SHOULD_STOP	1
-	#define CWQ_STOPPED	2
 
 	int run_depth;		/* Detect run_workqueue() recursion depth */
 } ____cacheline_aligned;
@@ -75,7 +71,6 @@ static LIST_HEAD(workqueues);
 static int singlethread_cpu __read_mostly;
 static cpumask_t cpu_singlethread_map __read_mostly;
 /* optimization, we could use cpu_possible_map */
-static cpumask_t cpu_populated_map __read_mostly;
 
 /* If it's single threaded, it isn't in the list of workqueues. */
 static inline int is_single_threaded(struct workqueue_struct *wq)
@@ -86,7 +81,7 @@ static inline int is_single_threaded(str
 static const cpumask_t *wq_cpu_map(struct workqueue_struct *wq)
 {
 	return is_single_threaded(wq)
-		? &cpu_singlethread_map : &cpu_populated_map;
+		? &cpu_singlethread_map : &cpu_online_map;
 }
 
 static
@@ -270,32 +265,16 @@ static void run_workqueue(struct cpu_wor
 	spin_unlock_irq(&cwq->lock);
 }
 
-/*
- * NOTE: the caller must not touch *cwq if this func returns true
- */
-static int cwq_should_stop(struct cpu_workqueue_struct *cwq)
-{
-	int should_stop = cwq->status;
-
-	if (unlikely(should_stop == CWQ_SHOULD_STOP)) {
-		spin_lock_irq(&cwq->lock);
-		should_stop = cwq->status && list_empty(&cwq->worklist);
-		if (should_stop)
-			cwq->status= CWQ_STOPPED;
-		spin_unlock_irq(&cwq->lock);
-	}
-
-	return should_stop;
-}
 
 static int worker_thread(void *__cwq)
 {
 	struct cpu_workqueue_struct *cwq = __cwq;
+	int bind_cpu;
 	DEFINE_WAIT(wait);
 	struct k_sigaction sa;
 
 	freezer_exempt(cwq->wq->freeze_exempt_events);
-
+	bind_cpu = smp_processor_id();
 	/*
 	 * We inherited MPOL_INTERLEAVE from the booting kernel.
 	 * Set MPOL_DEFAULT to insure node local allocations.
@@ -308,20 +287,28 @@ static int worker_thread(void *__cwq)
 	siginitset(&sa.sa.sa_mask, sigmask(SIGCHLD));
 	do_sigaction(SIGCHLD, &sa, (struct k_sigaction *)0);
 
-	for (;;) {
+	while (!kthread_should_stop()) {
 		try_to_freeze();
-
+
+		if (cpu_is_offline(bind_cpu) && !is_single_threaded(cwq->wq))
+			goto wait_to_die;
+
 		prepare_to_wait(&cwq->more_work, &wait, TASK_INTERRUPTIBLE);
-		if (cwq->status == CWQ_RUNNING && list_empty(&cwq->worklist))
+		if (list_empty(&cwq->worklist))
 			schedule();
 		finish_wait(&cwq->more_work, &wait);
 
-		if (cwq_should_stop(cwq))
-			break;
-
 		run_workqueue(cwq);
 	}
 
+wait_to_die:
+	set_current_state(TASK_INTERRUPTIBLE);
+	while(!kthread_should_stop()) {
+		schedule();
+		set_current_state(TASK_INTERRUPTIBLE);
+	}
+	__set_current_state(TASK_RUNNING);
+
 	return 0;
 }
 
@@ -386,11 +373,10 @@ static void flush_cpu_workqueue(struct c
  */
 void fastcall flush_workqueue(struct workqueue_struct *wq)
 {
-	const cpumask_t *cpu_map = wq_cpu_map(wq);
 	int cpu;
 
 	might_sleep();
-	for_each_cpu_mask(cpu, *cpu_map)
+	for_each_online_cpu(cpu)
 		flush_cpu_workqueue(per_cpu_ptr(wq->cpu_wq, cpu));
 }
 EXPORT_SYMBOL_GPL(flush_workqueue);
@@ -644,13 +630,6 @@ static int create_workqueue_thread(struc
 		return PTR_ERR(p);
 
 	cwq->thread = p;
-	cwq->status = CWQ_RUNNING;
-	if (!is_single_threaded(wq))
-		kthread_bind(p, cpu);
-
-	if (is_single_threaded(wq) || cpu_online(cpu))
-		wake_up_process(p);
-
 	return 0;
 }
 
@@ -680,15 +659,21 @@ static struct workqueue_struct *__create
 	if (singlethread) {
 		cwq = init_cpu_workqueue(wq, singlethread_cpu);
 		err = create_workqueue_thread(cwq, singlethread_cpu);
+		wake_up_process(cwq->thread);
 	} else {
 		mutex_lock(&workqueue_mutex);
 		list_add(&wq->list, &workqueues);
 
-		for_each_possible_cpu(cpu) {
+		for_each_online_cpu(cpu) {
 			cwq = init_cpu_workqueue(wq, cpu);
 			if (err || !cpu_online(cpu))
 				continue;
 			err = create_workqueue_thread(cwq, cpu);
+			if (err)
+				continue;
+
+			kthread_bind(cwq->thread, cpu);
+			wake_up_process(cwq->thread);
 		}
 		mutex_unlock(&workqueue_mutex);
 	}
@@ -721,30 +706,11 @@ EXPORT_SYMBOL_GPL(create_singlethread_wo
 
 static void cleanup_workqueue_thread(struct cpu_workqueue_struct *cwq, int cpu)
 {
-	struct wq_barrier barr;
-	int alive = 0;
 
-	spin_lock_irq(&cwq->lock);
 	if (cwq->thread != NULL) {
-		insert_wq_barrier(cwq, &barr, 1);
-		cwq->status = CWQ_SHOULD_STOP;
-		alive = 1;
-	}
-	spin_unlock_irq(&cwq->lock);
-
-	if (alive) {
 		thaw_process(cwq->thread);
-		wait_for_completion(&barr.done);
-
-		while (unlikely(cwq->status != CWQ_STOPPED))
-			cpu_relax();
-		/*
-		 * Wait until cwq->thread unlocks cwq->lock,
-		 * it won't touch *cwq after that.
-		 */
-		smp_rmb();
+		kthread_stop(cwq->thread);
 		cwq->thread = NULL;
-		spin_unlock_wait(&cwq->lock);
 	}
 }
 
@@ -756,15 +722,16 @@ static void cleanup_workqueue_thread(str
  */
 void destroy_workqueue(struct workqueue_struct *wq)
 {
-	const cpumask_t *cpu_map = wq_cpu_map(wq);
 	struct cpu_workqueue_struct *cwq;
 	int cpu;
 
+	flush_workqueue(wq);
+
 	mutex_lock(&workqueue_mutex);
 	list_del(&wq->list);
 	mutex_unlock(&workqueue_mutex);
 
-	for_each_cpu_mask(cpu, *cpu_map) {
+	for_each_online_cpu(cpu) {
 		cwq = per_cpu_ptr(wq->cpu_wq, cpu);
 		cleanup_workqueue_thread(cwq, cpu);
 	}
@@ -774,6 +741,25 @@ void destroy_workqueue(struct workqueue_
 }
 EXPORT_SYMBOL_GPL(destroy_workqueue);
 
+/*Take the work from this (downed) CPU. */
+static void take_over_work(struct workqueue_struct *wq, unsigned int cpu)
+{
+	struct cpu_workqueue_struct *cwq = per_cpu_ptr(wq->cpu_wq, cpu);
+	struct list_head list;
+	struct work_struct *work;
+
+	spin_lock_irq(&cwq->lock);
+	list_replace_init(&cwq->worklist, &list);
+
+	while (!list_empty(&list)) {
+		work = list_entry(list.next,struct work_struct,entry);
+		list_del(&work->entry);
+		__queue_work(per_cpu_ptr(wq->cpu_wq, smp_processor_id()), work);
+	}
+	spin_unlock_irq(&cwq->lock);
+}
+
+
 static int __devinit workqueue_cpu_callback(struct notifier_block *nfb,
 						unsigned long action,
 						void *hcpu)
@@ -782,11 +768,6 @@ static int __devinit workqueue_cpu_callb
 	struct cpu_workqueue_struct *cwq;
 	struct workqueue_struct *wq;
 
-	switch (action) {
-	case CPU_UP_PREPARE:
-		cpu_set(cpu, cpu_populated_map);
-	}
-
 	mutex_lock(&workqueue_mutex);
 	list_for_each_entry(wq, &workqueues, list) {
 		cwq = per_cpu_ptr(wq->cpu_wq, cpu);
@@ -799,6 +780,7 @@ static int __devinit workqueue_cpu_callb
 			return NOTIFY_BAD;
 
 		case CPU_ONLINE:
+			kthread_bind(cwq->thread, cpu);
 			wake_up_process(cwq->thread);
 			break;
 
@@ -806,6 +788,7 @@ static int __devinit workqueue_cpu_callb
 			if (cwq->thread)
 				wake_up_process(cwq->thread);
 		case CPU_DEAD:
+			take_over_work(wq, cpu);
 			cleanup_workqueue_thread(cwq, cpu);
 			break;
 		}
@@ -817,7 +800,6 @@ static int __devinit workqueue_cpu_callb
 
 void __init init_workqueues(void)
 {
-	cpu_populated_map = cpu_online_map;
 	singlethread_cpu = first_cpu(cpu_possible_map);
 	cpu_singlethread_map = cpumask_of_cpu(singlethread_cpu);
 	hotcpu_notifier(workqueue_cpu_callback, 0);
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH 8/8] Make kernel threads freezeable for cpu-hotplug
  2007-04-02  5:34 [RFC] Cpu-hotplug: Using the Process Freezer (try2) Gautham R Shenoy
                   ` (6 preceding siblings ...)
  2007-04-02  5:42 ` [PATCH 7/8] Clean up workqueue.c with respect to the freezer based cpu-hotplug Gautham R Shenoy
@ 2007-04-02  5:42 ` Gautham R Shenoy
  2007-04-02  6:16 ` [RFC] Cpu-hotplug: Using the Process Freezer (try2) Ingo Molnar
  8 siblings, 0 replies; 70+ messages in thread
From: Gautham R Shenoy @ 2007-04-02  5:42 UTC (permalink / raw)
  To: akpm, paulmck, torvalds
  Cc: linux-kernel, vatsa, Oleg Nesterov, Rafael J. Wysocki, mingo,
	dipankar, dino, masami.hiramatsu.pt

This patch makes all the kernel_threads (except the migration thread)
freezeable for cpu hotplug.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>

-- 
 arch/i386/kernel/apm.c              |    2 +-
 drivers/block/loop.c                |    2 +-
 drivers/char/apm-emulation.c        |    6 +++---
 drivers/ieee1394/ieee1394_core.c    |    2 +-
 drivers/md/md.c                     |    2 +-
 drivers/mmc/card/queue.c            |    2 +-
 drivers/mtd/mtd_blkdevs.c           |    2 +-
 drivers/scsi/libsas/sas_scsi_host.c |    2 +-
 drivers/scsi/scsi_error.c           |    2 +-
 drivers/usb/storage/usb.c           |    2 +-
 10 files changed, 12 insertions(+), 12 deletions(-)

Index: linux-2.6.21-rc5/arch/i386/kernel/apm.c
===================================================================
--- linux-2.6.21-rc5.orig/arch/i386/kernel/apm.c
+++ linux-2.6.21-rc5/arch/i386/kernel/apm.c
@@ -1395,7 +1395,7 @@ static void apm_mainloop(void)
 
 	add_wait_queue(&apm_waitqueue, &wait);
 	set_current_state(TASK_INTERRUPTIBLE);
-	freezer_exempt(FE_ALL);
+	freezer_exempt(FE_SUSPEND_KPROBES);
 	for (;;) {
 		try_to_freeze();
 		schedule_timeout(APM_CHECK_TIMEOUT);
Index: linux-2.6.21-rc5/drivers/block/loop.c
===================================================================
--- linux-2.6.21-rc5.orig/drivers/block/loop.c
+++ linux-2.6.21-rc5/drivers/block/loop.c
@@ -587,7 +587,7 @@ static int loop_thread(void *data)
 	 * hence, it mustn't be stopped at all
 	 * because it could be indirectly used during suspension
 	 */
-	freezer_exempt(FE_ALL);
+	freezer_exempt(FE_SUSPEND_KPROBES);
 
 	set_user_nice(current, -20);
 
Index: linux-2.6.21-rc5/drivers/char/apm-emulation.c
===================================================================
--- linux-2.6.21-rc5.orig/drivers/char/apm-emulation.c
+++ linux-2.6.21-rc5/drivers/char/apm-emulation.c
@@ -336,7 +336,7 @@ apm_ioctl(struct inode * inode, struct f
 			 * threads.
 			 */
 			flags = current->flags;
-			freezer_exempt(FE_ALL);
+			freezer_exempt(FE_SUSPEND_KPROBES);
 
 			wait_event(apm_suspend_waitqueue,
 				   as->suspend_state == SUSPEND_DONE);
@@ -372,7 +372,7 @@ apm_ioctl(struct inode * inode, struct f
 			 * threads.
 			 */
 			flags = current->flags;
-			freezer_exempt(FE_ALL);
+			freezer_exempt(FE_SUSPEND_KPROBES);
 
 			wait_event_interruptible(apm_suspend_waitqueue,
 					 as->suspend_state == SUSPEND_DONE);
@@ -536,7 +536,7 @@ static int apm_get_info(char *buf, char 
 
 static int kapmd(void *arg)
 {
-	freezer_exempt(FE_ALL);
+	freezer_exempt(FE_SUSPEND_KPROBES);
 	do {
 		apm_event_t event;
 		int ret;
Index: linux-2.6.21-rc5/drivers/ieee1394/ieee1394_core.c
===================================================================
--- linux-2.6.21-rc5.orig/drivers/ieee1394/ieee1394_core.c
+++ linux-2.6.21-rc5/drivers/ieee1394/ieee1394_core.c
@@ -1134,7 +1134,7 @@ static int hpsbpkt_thread(void *__hi)
 	struct list_head tmp;
 	int may_schedule;
 
-	freezer_exempt(FE_ALL);
+	freezer_exempt(FE_SUSPEND_KPROBES);
 
 	while (!kthread_should_stop()) {
 
Index: linux-2.6.21-rc5/drivers/md/md.c
===================================================================
--- linux-2.6.21-rc5.orig/drivers/md/md.c
+++ linux-2.6.21-rc5/drivers/md/md.c
@@ -4527,7 +4527,7 @@ static int md_thread(void * arg)
 	 * many dirty RAID5 blocks.
 	 */
 
-	freezer_exempt(FE_ALL);
+	freezer_exempt(FE_SUSPEND_KPROBES);
 	allow_signal(SIGKILL);
 	while (!kthread_should_stop()) {
 
Index: linux-2.6.21-rc5/drivers/mtd/mtd_blkdevs.c
===================================================================
--- linux-2.6.21-rc5.orig/drivers/mtd/mtd_blkdevs.c
+++ linux-2.6.21-rc5/drivers/mtd/mtd_blkdevs.c
@@ -83,7 +83,7 @@ static int mtd_blktrans_thread(void *arg
 
 	/* we might get involved when memory gets low, so use PF_MEMALLOC */
 	current->flags |= PF_MEMALLOC;
-	freezer_exempt(FE_ALL);
+	freezer_exempt(FE_SUSPEND_KPROBES);
 
 	daemonize("%sd", tr->name);
 
Index: linux-2.6.21-rc5/drivers/scsi/libsas/sas_scsi_host.c
===================================================================
--- linux-2.6.21-rc5.orig/drivers/scsi/libsas/sas_scsi_host.c
+++ linux-2.6.21-rc5/drivers/scsi/libsas/sas_scsi_host.c
@@ -871,7 +871,7 @@ static int sas_queue_thread(void *_sas_h
 	struct scsi_core *core = &sas_ha->core;
 
 	daemonize("sas_queue_%d", core->shost->host_no);
-	freezer_exempt(FE_ALL);
+	freezer_exempt(FE_SUSPEND_KPROBES);
 
 	complete(&queue_th_comp);
 
Index: linux-2.6.21-rc5/drivers/scsi/scsi_error.c
===================================================================
--- linux-2.6.21-rc5.orig/drivers/scsi/scsi_error.c
+++ linux-2.6.21-rc5/drivers/scsi/scsi_error.c
@@ -1536,7 +1536,7 @@ int scsi_error_handler(void *data)
 {
 	struct Scsi_Host *shost = data;
 
-	freezer_exempt(FE_ALL);
+	freezer_exempt(FE_SUSPEND_KPROBES);
 
 	/*
 	 * We use TASK_INTERRUPTIBLE so that the thread is not
Index: linux-2.6.21-rc5/drivers/usb/storage/usb.c
===================================================================
--- linux-2.6.21-rc5.orig/drivers/usb/storage/usb.c
+++ linux-2.6.21-rc5/drivers/usb/storage/usb.c
@@ -301,7 +301,7 @@ static int usb_stor_control_thread(void 
 	struct us_data *us = (struct us_data *)__us;
 	struct Scsi_Host *host = us_to_host(us);
 
-	freezer_exempt(FE_ALL);
+	freezer_exempt(FE_SUSPEND_KPROBES);
 
 	for(;;) {
 		try_to_freeze();
Index: linux-2.6.21-rc5/drivers/mmc/card/queue.c
===================================================================
--- linux-2.6.21-rc5.orig/drivers/mmc/card/queue.c
+++ linux-2.6.21-rc5/drivers/mmc/card/queue.c
@@ -67,7 +67,7 @@ static int mmc_queue_thread(void *d)
 	 * the process freezing.  We handle suspension ourselves.
 	 */
 	current->flags |= PF_MEMALLOC;
-	freezer_exempt(FE_ALL);
+	freezer_exempt(FE_SUSPEND_KPROBES);
 
 	down(&mq->thread_sem);
 	do {
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] Cpu-hotplug: Using the Process Freezer (try2)
  2007-04-02  5:34 [RFC] Cpu-hotplug: Using the Process Freezer (try2) Gautham R Shenoy
                   ` (7 preceding siblings ...)
  2007-04-02  5:42 ` [PATCH 8/8] Make kernel threads freezeable for cpu-hotplug Gautham R Shenoy
@ 2007-04-02  6:16 ` Ingo Molnar
  2007-04-02  9:28   ` Srivatsa Vaddagiri
                     ` (4 more replies)
  8 siblings, 5 replies; 70+ messages in thread
From: Ingo Molnar @ 2007-04-02  6:16 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: akpm, paulmck, torvalds, linux-kernel, vatsa, Oleg Nesterov,
	Rafael J. Wysocki, dipankar, dino, masami.hiramatsu.pt


* Gautham R Shenoy <ego@in.ibm.com> wrote:

> Hello Everybody,
> 
> This is another attempt towards process-freezer based cpu-hotplug. 
> This patchset covers just about everything that was discussed on the 
> LKML with respect to the freezer-based cpu-hotplug.

wow - you have made really nice progress!

> I believe that the reasons for freezer failing as N increases are :
> - 'make -jN' keeps forking new tasks every now and then, thereby resulting
>   in a never-ending catching up game in the do_while loop inside
>   try_to_freeze_tasks (kernel/power/process.c)

hm, shouldnt the make be frozen immediately?

doesnt the 'please freeze ASAP' flag get propagated to all tasks, 
immediately? After that point any cloning activity should duplicate that 
flag too, resulting in any new child freezing immediately too.

> Instead of waiting for all the tasks to call try_to_freeze in the 
> above mentioned do_while loop, I wonder if we can put some hooks in 
> sched.c so asto not schedule the task marked PF_FREEZING/PF_FROZEN.

we could definitely do that - but i think it should be unnecessary: if 
we mark all tasks as PF_FREEZING atomically, that should result in 
_every_ task immediately dropping dead (once they get back from 
TASK_UNINTERRUPTIBLE). No excuses. If there's some longer delay then 
that can only be explained by some new cloned task/thread slipping 
through the net somehow. (i.e. the PF_FREEZING flag not being duplicated 
across fork?)

i'm wondering about how TASK_UNINTERRUPTIBLE tasks are handled by the 
freezer: are they assumed frozen immediately, or do we wait until they 
notice their PF_FREEZING and go into try_to_freeze()? I'd expect 
TASK_UNINTERRUPTIBLE to be the largest source of latency. (and hence be 
the primary source for freezing 'failures')

	Ingo

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] Cpu-hotplug: Using the Process Freezer (try2)
  2007-04-02  6:16 ` [RFC] Cpu-hotplug: Using the Process Freezer (try2) Ingo Molnar
@ 2007-04-02  9:28   ` Srivatsa Vaddagiri
  2007-04-02 11:18     ` Ingo Molnar
  2007-04-02 11:19   ` Gautham R Shenoy
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 70+ messages in thread
From: Srivatsa Vaddagiri @ 2007-04-02  9:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Gautham R Shenoy, akpm, paulmck, torvalds, linux-kernel,
	Oleg Nesterov, Rafael J. Wysocki, dipankar, dino,
	masami.hiramatsu.pt

On Mon, Apr 02, 2007 at 08:16:12AM +0200, Ingo Molnar wrote:
> hm, shouldnt the make be frozen immediately?
> 
> doesnt the 'please freeze ASAP' flag get propagated to all tasks, 
> immediately? After that point any cloning activity should duplicate that 
> flag too, resulting in any new child freezing immediately too.

afaics, setting the 'please freeze asap' flag is racy wrt dup_task_struct
(where the child's tsk->thread_info->flags are copied from its parent?). 
Secondly, from what I understand, it takes a 'flag to be set + signal marked
pending' for the child task to be frozen. If that is the case, then
copy_process may not propogae the signal to the child, which could mean
mean that we can be in a catch-up game in freeze_processes, trying to
freeze processes we didnt see in earlier passes.

I think copy_process() can check for something like this:

	write_lock_irq(&tasklist_lock);

	...

	if (freezing(current))
		freeze_process(p);	/* function exported by freezer */

	...

	write_unlock_irq(&tasklist_lock);


-- 
Regards,
vatsa

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] Cpu-hotplug: Using the Process Freezer (try2)
  2007-04-02  9:28   ` Srivatsa Vaddagiri
@ 2007-04-02 11:18     ` Ingo Molnar
  2007-04-02 12:42       ` Srivatsa Vaddagiri
  0 siblings, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2007-04-02 11:18 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Gautham R Shenoy, akpm, paulmck, torvalds, linux-kernel,
	Oleg Nesterov, Rafael J. Wysocki, dipankar, dino,
	masami.hiramatsu.pt


* Srivatsa Vaddagiri <vatsa@in.ibm.com> wrote:

> On Mon, Apr 02, 2007 at 08:16:12AM +0200, Ingo Molnar wrote:
> > hm, shouldnt the make be frozen immediately?
> > 
> > doesnt the 'please freeze ASAP' flag get propagated to all tasks, 
> > immediately? After that point any cloning activity should duplicate 
> > that flag too, resulting in any new child freezing immediately too.
> 
> afaics, setting the 'please freeze asap' flag is racy wrt 
> dup_task_struct (where the child's tsk->thread_info->flags are copied 
> from its parent?). Secondly, from what I understand, it takes a 'flag 
> to be set + signal marked pending' for the child task to be frozen. If 
> that is the case, then copy_process may not propogae the signal to the 
> child, which could mean mean that we can be in a catch-up game in 
> freeze_processes, trying to freeze processes we didnt see in earlier 
> passes.
> 
> I think copy_process() can check for something like this:
> 
> 	write_lock_irq(&tasklist_lock);
> 
> 	...
> 
> 	if (freezing(current))
> 		freeze_process(p);	/* function exported by freezer */

yeah. (is that safe with tasklist_lock held?)

i'm wondering whether we could do even better than the signal approach. 
I _think_ the best approach would be to only wait for tasks that are _on 
the runqueue_. I.e. any task that has scheduled away with 
TASK_UNINTERRUPTIBLE (and might not be able to process signal events for 
a long time) is still freezable because it scheduled away.

the only freeze-unsafe task is one that is on the runqueue, executing 
some unknown kernel code. But the number of those is typically pretty 
low, even with very large make -j task-counts.

now, the current approach approximates that set of tasks, but not 
completely: in particular TASK_UNINTERRUPTIBLE sleeping threads can 
introduce arbitrary long delays (and hence freezing failures). [in 
addition to any fork-related 'leaks' of freeze-notification]

	Ingo

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] Cpu-hotplug: Using the Process Freezer (try2)
  2007-04-02  6:16 ` [RFC] Cpu-hotplug: Using the Process Freezer (try2) Ingo Molnar
  2007-04-02  9:28   ` Srivatsa Vaddagiri
@ 2007-04-02 11:19   ` Gautham R Shenoy
  2007-04-02 11:27     ` Ingo Molnar
  2007-04-02 13:22   ` Pavel Machek
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 70+ messages in thread
From: Gautham R Shenoy @ 2007-04-02 11:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: akpm, paulmck, torvalds, linux-kernel, vatsa, Oleg Nesterov,
	Rafael J. Wysocki, dipankar, dino, masami.hiramatsu.pt

On Mon, Apr 02, 2007 at 08:16:12AM +0200, Ingo Molnar wrote:
> 
> * Gautham R Shenoy <ego@in.ibm.com> wrote:
> 
> > Hello Everybody,
> > 
> > This is another attempt towards process-freezer based cpu-hotplug. 
> > This patchset covers just about everything that was discussed on the 
> > LKML with respect to the freezer-based cpu-hotplug.
> 
> wow - you have made really nice progress!

The discussions on the list helped clear up a lot of issues.

> 
> > I believe that the reasons for freezer failing as N increases are :
> > - 'make -jN' keeps forking new tasks every now and then, thereby resulting
> >   in a never-ending catching up game in the do_while loop inside
> >   try_to_freeze_tasks (kernel/power/process.c)
> 
> hm, shouldnt the make be frozen immediately?
> 
> doesnt the 'please freeze ASAP' flag get propagated to all tasks, 
> immediately? After that point any cloning activity should duplicate that 
> flag too, resulting in any new child freezing immediately too.
> 
> > Instead of waiting for all the tasks to call try_to_freeze in the 
> > above mentioned do_while loop, I wonder if we can put some hooks in 
> > sched.c so asto not schedule the task marked PF_FREEZING/PF_FROZEN.
> 
> we could definitely do that - but i think it should be unnecessary: if 
> we mark all tasks as PF_FREEZING atomically, that should result in 
> _every_ task immediately dropping dead (once they get back from 
> TASK_UNINTERRUPTIBLE). No excuses. If there's some longer delay then 
> that can only be explained by some new cloned task/thread slipping 
> through the net somehow. (i.e. the PF_FREEZING flag not being duplicated 
> across fork?)
> 

I will try again  Vatsa's suggestion of having a 

if (freezing(current))
	freeze_process(p);

in copy processes() and check if we can do away with the fork race. 
That sounds lot simpler than the scheduler hooks.

> i'm wondering about how TASK_UNINTERRUPTIBLE tasks are handled by the 
> freezer: are they assumed frozen immediately, or do we wait until they 
> notice their PF_FREEZING and go into try_to_freeze()? I'd expect 
> TASK_UNINTERRUPTIBLE to be the largest source of latency. (and hence be 
> the primary source for freezing 'failures')

>From what I can make out, we fail to freeze if we have some task in
the TASK_UNINTERRUPTIBLE state for more than the timeout period. 

The kernel threads have to call try_to_freeze() explicitly and for the
userspace tasks, try_to_freeze() is called in get_signal_to_deliver().
The system is considered frozen only when *all* the freezeable tasks
call try_to_freeze() one way or the other. This is unlikely in case of
a TASK_UNINTERRUPTIBLE task.

Question is can we have some task in TASK_UNINTERRUPTIBLE state for such
a long duration (20sec) ??

> 
> 	Ingo

Thanks and Regards
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] Cpu-hotplug: Using the Process Freezer (try2)
  2007-04-02 11:19   ` Gautham R Shenoy
@ 2007-04-02 11:27     ` Ingo Molnar
  2007-04-02 22:12       ` Rafael J. Wysocki
  0 siblings, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2007-04-02 11:27 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: akpm, paulmck, torvalds, linux-kernel, vatsa, Oleg Nesterov,
	Rafael J. Wysocki, dipankar, dino, masami.hiramatsu.pt


* Gautham R Shenoy <ego@in.ibm.com> wrote:

> From what I can make out, we fail to freeze if we have some task in 
> the TASK_UNINTERRUPTIBLE state for more than the timeout period.

> Question is can we have some task in TASK_UNINTERRUPTIBLE state for 
> such a long duration (20sec) ??

yes, easily so - just have a really long disk queue. Or really heavy 
mutex contention.

i really think we should add a freezing hook to schedule too (no need to 
change anything else - just add a PF_FREEZE check into the schedule() 
function) - and add a wakeup method that moves TASK_UNINTERRUPTIBLE 
tasks to the runqueue but does not touch their task->state.

( the copy_process() handling is still needed, so that no new tasks
  without PF_FREEZE get created that could slip out of control. )

Such a wakeup will cause them to execute again but without disturbing 
their task->state value, at which point a second hook in schedule() 
could catch and freeze them. (and can restart the sleep afterwards, if 
the task is still TASK_UNINTERRUPTIBLE)

i.e. two easy hooks in schedule() plus a try_to_wake_up() variant that 
does not touch p->state. In fact the second hook could be used instead 
of the first one so one might be enough. (I can code up the scheduler 
bits for you if that would be helpful.)

	Ingo

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] Cpu-hotplug: Using the Process Freezer (try2)
  2007-04-02 11:18     ` Ingo Molnar
@ 2007-04-02 12:42       ` Srivatsa Vaddagiri
  2007-04-02 14:16         ` Gautham R Shenoy
  2007-04-02 18:56         ` Ingo Molnar
  0 siblings, 2 replies; 70+ messages in thread
From: Srivatsa Vaddagiri @ 2007-04-02 12:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Gautham R Shenoy, akpm, paulmck, torvalds, linux-kernel,
	Oleg Nesterov, Rafael J. Wysocki, dipankar, dino,
	masami.hiramatsu.pt

On Mon, Apr 02, 2007 at 01:18:28PM +0200, Ingo Molnar wrote:
> > 	if (freezing(current))
> > 		freeze_process(p);	/* function exported by freezer */
> 
> yeah. (is that safe with tasklist_lock held?)

from my scan of the code, it appears to be safe ..

> i'm wondering whether we could do even better than the signal approach. 
> I _think_ the best approach would be to only wait for tasks that are _on 
> the runqueue_. I.e. any task that has scheduled away with 
> TASK_UNINTERRUPTIBLE (and might not be able to process signal events for 
> a long time) is still freezable because it scheduled away.

I am slightly uncomfortable with "not waiting for tasks inside the
kernel to get out" part, even if it that is done only for
TASK_UNINTERRUPTIBLE tasks. For ex: consider this:

flush_workqueue() <- One of biggest offenders of lock_cpu_hotplug() to date
	for_each_online_cpu(cpu)
		flush_cpu_workqueue
			TASK_UNINTERRUPTIBLE sleep

If we don't wait for this thread from being frozen "voluntarily" (because it is 
in TASK_UNINTERRUPTIBLE sleep), then flush_workqueue is clearly racy wrt
cpu hotplug.

I would imagine other situations like this are possible where "not waiting
for everyone to /voluntarily/ quiece" can break cpu hotplug. In fact,
the biggest reason why we are moving to freezer based hotplug is the
fact that it quiesces everyone, leading to (hopefully) zero race conditions.

-- 
Regards,
vatsa

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] Cpu-hotplug: Using the Process Freezer (try2)
  2007-04-02  6:16 ` [RFC] Cpu-hotplug: Using the Process Freezer (try2) Ingo Molnar
  2007-04-02  9:28   ` Srivatsa Vaddagiri
  2007-04-02 11:19   ` Gautham R Shenoy
@ 2007-04-02 13:22   ` Pavel Machek
  2007-04-03 12:01   ` Gautham R Shenoy
  2007-04-03 14:01   ` [RFC] Cpu-hotplug: Using the Process Freezer (try2) Gautham R Shenoy
  4 siblings, 0 replies; 70+ messages in thread
From: Pavel Machek @ 2007-04-02 13:22 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Gautham R Shenoy, akpm, paulmck, torvalds, linux-kernel, vatsa,
	Oleg Nesterov, Rafael J. Wysocki, dipankar, dino,
	masami.hiramatsu.pt

Hi!


> i'm wondering about how TASK_UNINTERRUPTIBLE tasks are handled by the 
> freezer: are they assumed frozen immediately, or do we wait until they 
> notice their PF_FREEZING and go into try_to_freeze()? I'd expect 

They have to wait. We need all tasks frozen _with no locks held_.

							Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 1/8] Enhance process freezer interface for usage beyond software suspend
  2007-04-02  5:37 ` [PATCH 1/8] Enhance process freezer interface for usage beyond software suspend Gautham R Shenoy
@ 2007-04-02 13:56   ` Pavel Machek
  2007-04-02 20:48     ` Rafael J. Wysocki
  2007-04-05  9:46   ` Oleg Nesterov
  1 sibling, 1 reply; 70+ messages in thread
From: Pavel Machek @ 2007-04-02 13:56 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: akpm, paulmck, torvalds, linux-kernel, vatsa, Oleg Nesterov,
	Rafael J. Wysocki, mingo, dipankar, dino, masami.hiramatsu.pt

Hi!

>  This patch provides an interface to extend the use of the process
>  freezer beyond Suspend.
> 
> The tasks can selectively mark themselves to be exempted from specific
> freeze events like SUSPEND /KPROBES/CPU_HOTPLUG.
> 
> This patch however, *does not* sort non freezable threads into
> different categories based on the freeze events. Thus all 
> tasks which were previously marked PF_NOFREEZE are now
> exempted from freezer using 
> 	freezer_exempt(FE_ALL);
> which means exempt from all kinds of freezes.
> 
> Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
> Cc: Pavel Machek <pavel@ucw.cz>
> Cc: Rafael J. Wysocki <rjw@sisk.pl>
> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

Actually no, I was not in cc. 

> +/* Per process freezer specific flags */
> +#define PF_FE_SUSPEND	0x00008000	/* This thread should not be frozen
> +					 * for suspend
> +					 */
> +
> +#define PF_FE_KPROBES	0x00000010	/* This thread should not be frozen
> +					 * for Kprobes
> +					 */

Just put the comment before the define for long comments?


> -#ifdef CONFIG_PM
> +#if defined(CONFIG_PM) || defined(CONFIG_HOTPLUG_CPU) || \
> +					defined(CONFIG_KPROBES)

Should we create CONFIG_FREEZER?

> Index: linux-2.6.21-rc5/kernel/softlockup.c
> ===================================================================
> --- linux-2.6.21-rc5.orig/kernel/softlockup.c
> +++ linux-2.6.21-rc5/kernel/softlockup.c
> @@ -96,7 +96,7 @@ static int watchdog(void * __bind_cpu)
>  	struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
>  
>  	sched_setscheduler(current, SCHED_FIFO, &param);
> -	current->flags |= PF_NOFREEZE;
> +	freezer_exempt(FE_ALL);
>  
>  	/*
>  	 * Run briefly once per second to reset the softlockup timestamp.

Hmmm, I do not really like softlockup watchdog running during suspend.
Can we make this freezeable and make watchdog shut itself off while
suspending?

> Index: linux-2.6.21-rc5/kernel/rcutorture.c
> ===================================================================
> --- linux-2.6.21-rc5.orig/kernel/rcutorture.c
> +++ linux-2.6.21-rc5/kernel/rcutorture.c
> @@ -559,7 +559,7 @@ rcu_torture_fakewriter(void *arg)
>  
>  	VERBOSE_PRINTK_STRING("rcu_torture_fakewriter task started");
>  	set_user_nice(current, 19);
> -	current->flags |= PF_NOFREEZE;
> +	freezer_exempt(FE_ALL);


Fix rcutorture instead. It has no business running while suspending.

>  
>  	do {
>  		schedule_timeout_uninterruptible(1 + rcu_random(&rand)%10);
> @@ -590,7 +590,7 @@ rcu_torture_reader(void *arg)
>  
>  	VERBOSE_PRINTK_STRING("rcu_torture_reader task started");
>  	set_user_nice(current, 19);
> -	current->flags |= PF_NOFREEZE;
> +	freezer_exempt(FE_ALL);
>  

Same here.

Eventually, we should fix apm, too.

> Index: linux-2.6.21-rc5/init/do_mounts_initrd.c
> ===================================================================
> --- linux-2.6.21-rc5.orig/init/do_mounts_initrd.c
> +++ linux-2.6.21-rc5/init/do_mounts_initrd.c
> @@ -55,7 +55,7 @@ static void __init handle_initrd(void)
>  	sys_mount(".", "/", NULL, MS_MOVE, NULL);
>  	sys_chroot(".");
>  
> -	current->flags |= PF_NOFREEZE;
> +	freezer_exempt(FE_ALL);
>  	pid = kernel_thread(do_linuxrc, "/linuxrc", SIGCHLD);
>  	if (pid > 0) {
>  		while (pid != sys_wait4(-1, NULL, 0, NULL))

Does this mean we have userland /linuxrc running with PF_NOFREEZE?
That would be very bad...

> --- linux-2.6.21-rc5.orig/kernel/kprobes.c
> +++ linux-2.6.21-rc5/kernel/kprobes.c
> @@ -104,7 +104,7 @@ static int __kprobes check_safety(void)
>  {
>  	int ret = 0;
>  #if defined(CONFIG_PREEMPT) && defined(CONFIG_PM)

Eh? Why does kprobes code depend on config_pm?

						Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] Cpu-hotplug: Using the Process Freezer (try2)
  2007-04-02 12:42       ` Srivatsa Vaddagiri
@ 2007-04-02 14:16         ` Gautham R Shenoy
  2007-04-02 18:56         ` Ingo Molnar
  1 sibling, 0 replies; 70+ messages in thread
From: Gautham R Shenoy @ 2007-04-02 14:16 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Ingo Molnar, akpm, paulmck, torvalds, linux-kernel,
	Oleg Nesterov, Rafael J. Wysocki, dipankar, dino,
	masami.hiramatsu.pt

On Mon, Apr 02, 2007 at 06:12:00PM +0530, Srivatsa Vaddagiri wrote:
> On Mon, Apr 02, 2007 at 01:18:28PM +0200, Ingo Molnar wrote:
> > > 	if (freezing(current))
> > > 		freeze_process(p);	/* function exported by freezer */
> > 
> > yeah. (is that safe with tasklist_lock held?)
> 
> from my scan of the code, it appears to be safe ..

I quickly coded this up and ran my tests again. Unfortunately, the
results are negative. I printk'd the state of the unfrozen task and 
this is how the serial console output looks like:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Stopping user space processes timed out after 20 seconds (7 tasks
refusing to freeze)
make:TASK_UNINTERRUPTIBLE
make:TASK_UNINTERRUPTIBLE
make:TASK_UNINTERRUPTIBLE
make:TASK_UNINTERRUPTIBLE
make:TASK_UNINTERRUPTIBLE
make:TASK_UNINTERRUPTIBLE
make:TASK_UNINTERRUPTIBLE
Restarting tasks ... done.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I couldn't see any reduction in the number of
freeze-failures. Thus Ingo right.  These failures are due
to the TASK_UNINTERRUPTIBLE tasks which fail to freeze within the
timeout period and not because of the fork race.

> 
> > i'm wondering whether we could do even better than the signal approach. 
> > I _think_ the best approach would be to only wait for tasks that are _on 
> > the runqueue_. I.e. any task that has scheduled away with 
> > TASK_UNINTERRUPTIBLE (and might not be able to process signal events for 
> > a long time) is still freezable because it scheduled away.
> 
> I am slightly uncomfortable with "not waiting for tasks inside the
> kernel to get out" part, even if it that is done only for
> TASK_UNINTERRUPTIBLE tasks. For ex: consider this:
> 
> flush_workqueue() <- One of biggest offenders of lock_cpu_hotplug() to date
> 	for_each_online_cpu(cpu)
> 		flush_cpu_workqueue
> 			TASK_UNINTERRUPTIBLE sleep
> 
> If we don't wait for this thread from being frozen "voluntarily" (because it is 
> in TASK_UNINTERRUPTIBLE sleep), then flush_workqueue is clearly racy wrt
> cpu hotplug.
> 

The other option would be to have another state equivalent to 
TASK_UNINTERRUPTIBLE_NOFREEZE, so that Ingo's solution can be applied
for regular TASK_UNINTERRUPTIBLE tasks. Thus TASK_UNINTERRUPTBLE tasks
from a for_each_online_cpu context should now do a
TASK_UNINTERRUPTIBLE_NOFREEZE instead. 

Hmm, lots of audit and hence defeats the purpose.

> I would imagine other situations like this are possible where "not waiting
> for everyone to /voluntarily/ quiece" can break cpu hotplug. In fact,
> the biggest reason why we are moving to freezer based hotplug is the
> fact that it quiesces everyone, leading to (hopefully) zero race conditions.
> 
> -- 
> Regards,
> vatsa

Thanks and Regards
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] Cpu-hotplug: Using the Process Freezer (try2)
  2007-04-02 12:42       ` Srivatsa Vaddagiri
  2007-04-02 14:16         ` Gautham R Shenoy
@ 2007-04-02 18:56         ` Ingo Molnar
  2007-04-03 12:56           ` Srivatsa Vaddagiri
  1 sibling, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2007-04-02 18:56 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Gautham R Shenoy, akpm, paulmck, torvalds, linux-kernel,
	Oleg Nesterov, Rafael J. Wysocki, dipankar, dino,
	masami.hiramatsu.pt


* Srivatsa Vaddagiri <vatsa@in.ibm.com> wrote:

> flush_workqueue() <- One of biggest offenders of lock_cpu_hotplug() to date
> 	for_each_online_cpu(cpu)
> 		flush_cpu_workqueue
> 			TASK_UNINTERRUPTIBLE sleep
> 
> If we don't wait for this thread from being frozen "voluntarily" 
> (because it is in TASK_UNINTERRUPTIBLE sleep), then flush_workqueue is 
> clearly racy wrt cpu hotplug.

ok. But the only real problem would be for_each_online_cpu() loops that 
might sleep, correct? I did a quick audit and those seem to be in the 
minority by a factor of 1:10.

So ... to make the audit obviously safe, how about mechanically 
converting 100% of the for_each_online_cpu() loops to something like:

	mask = get_each_online_cpu_mask();
	for_each_cpu_mask(mask) {
		...
	}
	put_each_online_cpu_mask(mask);

where get_each_online_cpu_mask() also does a preempt_disable() 
implicitly, and put_each_online_cpu_mask() does a preempt_enable(). 
(Note that no locking is needed - only preemption-disabling.)

the 10% loops that _can_ schedule would trigger the __might_sleep() 
atomicity test in schedule()), and those would have to be converted a 
bit more cleverly, on a case by case basis. (for example a number of 
them might not even have to sleep on the for_each_online_cpu() loop)

hm?

	Ingo

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 1/8] Enhance process freezer interface for usage beyond software suspend
  2007-04-02 13:56   ` Pavel Machek
@ 2007-04-02 20:48     ` Rafael J. Wysocki
  2007-04-02 20:51       ` Pavel Machek
  2007-04-03  7:59       ` Gautham R Shenoy
  0 siblings, 2 replies; 70+ messages in thread
From: Rafael J. Wysocki @ 2007-04-02 20:48 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Gautham R Shenoy, akpm, paulmck, torvalds, linux-kernel, vatsa,
	Oleg Nesterov, mingo, dipankar, dino, masami.hiramatsu.pt

On Monday, 2 April 2007 15:56, Pavel Machek wrote:
> Hi!
> 
> >  This patch provides an interface to extend the use of the process
> >  freezer beyond Suspend.
> > 
> > The tasks can selectively mark themselves to be exempted from specific
> > freeze events like SUSPEND /KPROBES/CPU_HOTPLUG.
> > 
> > This patch however, *does not* sort non freezable threads into
> > different categories based on the freeze events. Thus all 
> > tasks which were previously marked PF_NOFREEZE are now
> > exempted from freezer using 
> > 	freezer_exempt(FE_ALL);
> > which means exempt from all kinds of freezes.
> > 
> > Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
> > Cc: Pavel Machek <pavel@ucw.cz>
> > Cc: Rafael J. Wysocki <rjw@sisk.pl>
> > Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> 
> Actually no, I was not in cc. 
> 
> > +/* Per process freezer specific flags */
> > +#define PF_FE_SUSPEND	0x00008000	/* This thread should not be frozen
> > +					 * for suspend
> > +					 */
> > +
> > +#define PF_FE_KPROBES	0x00000010	/* This thread should not be frozen
> > +					 * for Kprobes
> > +					 */
> 
> Just put the comment before the define for long comments?

Agreed.

> > -#ifdef CONFIG_PM
> > +#if defined(CONFIG_PM) || defined(CONFIG_HOTPLUG_CPU) || \
> > +					defined(CONFIG_KPROBES)
> 
> Should we create CONFIG_FREEZER?

Why do you think so?  I think the freezer should be compiled automatically
if any of the above is set, which is what this directive really means.

> > Index: linux-2.6.21-rc5/kernel/softlockup.c
> > ===================================================================
> > --- linux-2.6.21-rc5.orig/kernel/softlockup.c
> > +++ linux-2.6.21-rc5/kernel/softlockup.c
> > @@ -96,7 +96,7 @@ static int watchdog(void * __bind_cpu)
> >  	struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
> >  
> >  	sched_setscheduler(current, SCHED_FIFO, &param);
> > -	current->flags |= PF_NOFREEZE;
> > +	freezer_exempt(FE_ALL);
> >  
> >  	/*
> >  	 * Run briefly once per second to reset the softlockup timestamp.
> 
> Hmmm, I do not really like softlockup watchdog running during suspend.
> Can we make this freezeable and make watchdog shut itself off while
> suspending?

Generally, I agree, but this patch only replaces the existing instances
of PF_NOFREEZE with the new mechanism.  The changes you're talking about
require a separate patch series (or at least one separate patch), I think, and
they need not be so simple to make.

> > Index: linux-2.6.21-rc5/kernel/rcutorture.c
> > ===================================================================
> > --- linux-2.6.21-rc5.orig/kernel/rcutorture.c
> > +++ linux-2.6.21-rc5/kernel/rcutorture.c
> > @@ -559,7 +559,7 @@ rcu_torture_fakewriter(void *arg)
> >  
> >  	VERBOSE_PRINTK_STRING("rcu_torture_fakewriter task started");
> >  	set_user_nice(current, 19);
> > -	current->flags |= PF_NOFREEZE;
> > +	freezer_exempt(FE_ALL);
> 
> 
> Fix rcutorture instead. It has no business running while suspending.
> 
> >  
> >  	do {
> >  		schedule_timeout_uninterruptible(1 + rcu_random(&rand)%10);
> > @@ -590,7 +590,7 @@ rcu_torture_reader(void *arg)
> >  
> >  	VERBOSE_PRINTK_STRING("rcu_torture_reader task started");
> >  	set_user_nice(current, 19);
> > -	current->flags |= PF_NOFREEZE;
> > +	freezer_exempt(FE_ALL);
> >  
> 
> Same here.
> 
> Eventually, we should fix apm, too.
> 
> > Index: linux-2.6.21-rc5/init/do_mounts_initrd.c
> > ===================================================================
> > --- linux-2.6.21-rc5.orig/init/do_mounts_initrd.c
> > +++ linux-2.6.21-rc5/init/do_mounts_initrd.c
> > @@ -55,7 +55,7 @@ static void __init handle_initrd(void)
> >  	sys_mount(".", "/", NULL, MS_MOVE, NULL);
> >  	sys_chroot(".");
> >  
> > -	current->flags |= PF_NOFREEZE;
> > +	freezer_exempt(FE_ALL);
> >  	pid = kernel_thread(do_linuxrc, "/linuxrc", SIGCHLD);
> >  	if (pid > 0) {
> >  		while (pid != sys_wait4(-1, NULL, 0, NULL))
> 
> Does this mean we have userland /linuxrc running with PF_NOFREEZE?
> That would be very bad...

No, actually it is _required_ for the userland resume to work.  Well, perhaps
I should place a comment in there so that I don't have to explain this again
and again. :-)

> > --- linux-2.6.21-rc5.orig/kernel/kprobes.c
> > +++ linux-2.6.21-rc5/kernel/kprobes.c
> > @@ -104,7 +104,7 @@ static int __kprobes check_safety(void)
> >  {
> >  	int ret = 0;
> >  #if defined(CONFIG_PREEMPT) && defined(CONFIG_PM)
> 
> Eh? Why does kprobes code depend on config_pm?

Because it uses the freezer? ;-)

Greetings,
Rafael

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 1/8] Enhance process freezer interface for usage beyond software suspend
  2007-04-02 20:48     ` Rafael J. Wysocki
@ 2007-04-02 20:51       ` Pavel Machek
  2007-04-06 14:34         ` Rafael J. Wysocki
  2007-04-09  3:04         ` Gautham R Shenoy
  2007-04-03  7:59       ` Gautham R Shenoy
  1 sibling, 2 replies; 70+ messages in thread
From: Pavel Machek @ 2007-04-02 20:51 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Gautham R Shenoy, akpm, paulmck, torvalds, linux-kernel, vatsa,
	Oleg Nesterov, mingo, dipankar, dino, masami.hiramatsu.pt

Hi!

> > > +/* Per process freezer specific flags */
> > > +#define PF_FE_SUSPEND	0x00008000	/* This thread should not be frozen
> > > +					 * for suspend
> > > +					 */
> > > +
> > > +#define PF_FE_KPROBES	0x00000010	/* This thread should not be frozen
> > > +					 * for Kprobes
> > > +					 */
> > 
> > Just put the comment before the define for long comments?
> 
> Agreed.

(Actually it would be nice to say

/* This thread should not be frozen for suspend, becuase it is needed
   for getting image saved to disk */

> > > -#ifdef CONFIG_PM
> > > +#if defined(CONFIG_PM) || defined(CONFIG_HOTPLUG_CPU) || \
> > > +					defined(CONFIG_KPROBES)
> > 
> > Should we create CONFIG_FREEZER?
> 
> Why do you think so?  I think the freezer should be compiled automatically
> if any of the above is set, which is what this directive really means.

Kconfig can do that. ("select statement"). If we have one such ifdef,
it is okay, but if it would be more of them.

> > Hmmm, I do not really like softlockup watchdog running during suspend.
> > Can we make this freezeable and make watchdog shut itself off while
> > suspending?
> 
> Generally, I agree, but this patch only replaces the existing instances
> of PF_NOFREEZE with the new mechanism.  The changes you're talking about
> require a separate patch series (or at least one separate patch), I think, and
> they need not be so simple to make.

Agreed about separate patch series.

> > > -	current->flags |= PF_NOFREEZE;
> > > +	freezer_exempt(FE_ALL);
> > >  	pid = kernel_thread(do_linuxrc, "/linuxrc", SIGCHLD);
> > >  	if (pid > 0) {
> > >  		while (pid != sys_wait4(-1, NULL, 0, NULL))
> > 
> > Does this mean we have userland /linuxrc running with PF_NOFREEZE?
> > That would be very bad...
> 
> No, actually it is _required_ for the userland resume to work.  Well, perhaps
> I should place a comment in there so that I don't have to explain this again
> and again. :-)

Can you put big bold comment there?

Why is it needed? Freezer never freezes _current_ task.

> > > @@ -104,7 +104,7 @@ static int __kprobes check_safety(void)
> > >  {
> > >  	int ret = 0;
> > >  #if defined(CONFIG_PREEMPT) && defined(CONFIG_PM)
> > 
> > Eh? Why does kprobes code depend on config_pm?
> 
> Because it uses the freezer? ;-)

That is no longer true after this patch... Ugly ifdef above makes sure
freezer is there for kprobes. I'm trying to say that #if above is
now broken. Actually it was probably always broken, but it just became
more so.
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] Cpu-hotplug: Using the Process Freezer (try2)
  2007-04-02 11:27     ` Ingo Molnar
@ 2007-04-02 22:12       ` Rafael J. Wysocki
  0 siblings, 0 replies; 70+ messages in thread
From: Rafael J. Wysocki @ 2007-04-02 22:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Gautham R Shenoy, akpm, paulmck, torvalds, linux-kernel, vatsa,
	Oleg Nesterov, dipankar, dino, masami.hiramatsu.pt, Pavel Machek

On Monday, 2 April 2007 13:27, Ingo Molnar wrote:
> 
> * Gautham R Shenoy <ego@in.ibm.com> wrote:
> 
> > From what I can make out, we fail to freeze if we have some task in 
> > the TASK_UNINTERRUPTIBLE state for more than the timeout period.
> 
> > Question is can we have some task in TASK_UNINTERRUPTIBLE state for 
> > such a long duration (20sec) ??
> 
> yes, easily so - just have a really long disk queue. Or really heavy 
> mutex contention.
> 
> i really think we should add a freezing hook to schedule too (no need to 
> change anything else - just add a PF_FREEZE check into the schedule() 
> function) - and add a wakeup method that moves TASK_UNINTERRUPTIBLE 
> tasks to the runqueue but does not touch their task->state.

Yes, something like this would be necessary to handle TASK_UNINTERRUPTIBLE
tasks.  _Still_, currently we just fail the freezing if there are
uninterruptible tasks, because they can hold locks and may potentially
deadlock with device drivers' .suspend() or .resume() routines (please remember
that the freezer is used for suspending in the first place :-)).

I think we can freeze uninterruptible tasks for the CPU hotplug, but we should
avoid freezing them for the suspend.
 
> ( the copy_process() handling is still needed, so that no new tasks
>   without PF_FREEZE get created that could slip out of control. )

We can catch them while we're freezing kernel threads (provided that the kernel
threads themselves aren't forking like mad, but this doesn't seem to happen in
practice).
 
Greetings,
Rafael

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 1/8] Enhance process freezer interface for usage beyond software suspend
  2007-04-02 20:48     ` Rafael J. Wysocki
  2007-04-02 20:51       ` Pavel Machek
@ 2007-04-03  7:59       ` Gautham R Shenoy
  1 sibling, 0 replies; 70+ messages in thread
From: Gautham R Shenoy @ 2007-04-03  7:59 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Pavel Machek, akpm, paulmck, torvalds, linux-kernel, vatsa,
	Oleg Nesterov, mingo, dipankar, dino, masami.hiramatsu.pt

On Mon, Apr 02, 2007 at 10:48:24PM +0200, Rafael J. Wysocki wrote:
> On Monday, 2 April 2007 15:56, Pavel Machek wrote:
> > Hi!
> > 
> > >  This patch provides an interface to extend the use of the process
> > >  freezer beyond Suspend.
> > > 
> > > The tasks can selectively mark themselves to be exempted from specific
> > > freeze events like SUSPEND /KPROBES/CPU_HOTPLUG.
> > > 
> > > This patch however, *does not* sort non freezable threads into
> > > different categories based on the freeze events. Thus all 
> > > tasks which were previously marked PF_NOFREEZE are now
> > > exempted from freezer using 
> > > 	freezer_exempt(FE_ALL);
> > > which means exempt from all kinds of freezes.
> > > 
> > > Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
> > > Cc: Pavel Machek <pavel@ucw.cz>
> > > Cc: Rafael J. Wysocki <rjw@sisk.pl>
> > > Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> > 
> > Actually no, I was not in cc. 

Oops! Sorry. I knew I had missed something.

> > 
> > > +/* Per process freezer specific flags */
> > > +#define PF_FE_SUSPEND	0x00008000	/* This thread should not be frozen
> > > +					 * for suspend
> > > +					 */
> > > +
> > > +#define PF_FE_KPROBES	0x00000010	/* This thread should not be frozen
> > > +					 * for Kprobes
> > > +					 */
> > 
> > Just put the comment before the define for long comments?
> 
> Agreed.

ok, Will do.

> 
> > > -#ifdef CONFIG_PM
> > > +#if defined(CONFIG_PM) || defined(CONFIG_HOTPLUG_CPU) || \
> > > +					defined(CONFIG_KPROBES)
> > 
> > Should we create CONFIG_FREEZER?
> 
> Why do you think so?  I think the freezer should be compiled automatically
> if any of the above is set, which is what this directive really means.
> 
> > > Index: linux-2.6.21-rc5/kernel/softlockup.c
> > > ===================================================================
> > > --- linux-2.6.21-rc5.orig/kernel/softlockup.c
> > > +++ linux-2.6.21-rc5/kernel/softlockup.c
> > > @@ -96,7 +96,7 @@ static int watchdog(void * __bind_cpu)
> > >  	struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
> > >  
> > >  	sched_setscheduler(current, SCHED_FIFO, &param);
> > > -	current->flags |= PF_NOFREEZE;
> > > +	freezer_exempt(FE_ALL);
> > >  
> > >  	/*
> > >  	 * Run briefly once per second to reset the softlockup timestamp.
> > 
> > Hmmm, I do not really like softlockup watchdog running during suspend.
> > Can we make this freezeable and make watchdog shut itself off while
> > suspending?
> 
> Generally, I agree, but this patch only replaces the existing instances
> of PF_NOFREEZE with the new mechanism.  The changes you're talking about
> require a separate patch series (or at least one separate patch), I think, and
> they need not be so simple to make.

Yes.

> 
> > > Index: linux-2.6.21-rc5/kernel/rcutorture.c
> > > ===================================================================
> > > --- linux-2.6.21-rc5.orig/kernel/rcutorture.c
> > > +++ linux-2.6.21-rc5/kernel/rcutorture.c
> > > @@ -559,7 +559,7 @@ rcu_torture_fakewriter(void *arg)
> > >  
> > >  	VERBOSE_PRINTK_STRING("rcu_torture_fakewriter task started");
> > >  	set_user_nice(current, 19);
> > > -	current->flags |= PF_NOFREEZE;
> > > +	freezer_exempt(FE_ALL);
> > 
> > 
> > Fix rcutorture instead. It has no business running while suspending.
> > 
> > >  
> > >  	do {
> > >  		schedule_timeout_uninterruptible(1 + rcu_random(&rand)%10);
> > > @@ -590,7 +590,7 @@ rcu_torture_reader(void *arg)
> > >  
> > >  	VERBOSE_PRINTK_STRING("rcu_torture_reader task started");
> > >  	set_user_nice(current, 19);
> > > -	current->flags |= PF_NOFREEZE;
> > > +	freezer_exempt(FE_ALL);
> > >  
> > 
> > Same here.
> > 
> > Eventually, we should fix apm, too.
> > 
> > > Index: linux-2.6.21-rc5/init/do_mounts_initrd.c
> > > ===================================================================
> > > --- linux-2.6.21-rc5.orig/init/do_mounts_initrd.c
> > > +++ linux-2.6.21-rc5/init/do_mounts_initrd.c
> > > @@ -55,7 +55,7 @@ static void __init handle_initrd(void)
> > >  	sys_mount(".", "/", NULL, MS_MOVE, NULL);
> > >  	sys_chroot(".");
> > >  
> > > -	current->flags |= PF_NOFREEZE;
> > > +	freezer_exempt(FE_ALL);
> > >  	pid = kernel_thread(do_linuxrc, "/linuxrc", SIGCHLD);
> > >  	if (pid > 0) {
> > >  		while (pid != sys_wait4(-1, NULL, 0, NULL))
> > 
> > Does this mean we have userland /linuxrc running with PF_NOFREEZE?
> > That would be very bad...
> 
> No, actually it is _required_ for the userland resume to work.  Well, perhaps
> I should place a comment in there so that I don't have to explain this again
> and again. :-)
> 
> > > --- linux-2.6.21-rc5.orig/kernel/kprobes.c
> > > +++ linux-2.6.21-rc5/kernel/kprobes.c
> > > @@ -104,7 +104,7 @@ static int __kprobes check_safety(void)
> > >  {
> > >  	int ret = 0;
> > >  #if defined(CONFIG_PREEMPT) && defined(CONFIG_PM)
> > 
> > Eh? Why does kprobes code depend on config_pm?
> 
> Because it uses the freezer? ;-)

Is that why?! Then I guess we can remove it. Because the freezer is
going to be compiled in if CONFIG_KPROBES is set.

> 
> Greetings,
> Rafael

thanks
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 7/8] Clean up workqueue.c with respect to the freezer based cpu-hotplug
  2007-04-02  5:42 ` [PATCH 7/8] Clean up workqueue.c with respect to the freezer based cpu-hotplug Gautham R Shenoy
@ 2007-04-03 11:47   ` Oleg Nesterov
  2007-04-03 13:59     ` Srivatsa Vaddagiri
  0 siblings, 1 reply; 70+ messages in thread
From: Oleg Nesterov @ 2007-04-03 11:47 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: akpm, paulmck, torvalds, linux-kernel, vatsa, Rafael J. Wysocki,
	mingo, dipankar, dino, masami.hiramatsu.pt

> On 04/02, Gautham R Shenoy wrote:
>
> Clean up workqueue.c from the perspective of freezer-based cpu-hotplug.
> This patch

I'll study these patches later, a couple of comments after the quick reading.

>   This means that all non-singlethreaded workqueues *have* to
>   be frozen to avoid any races.

We can't freeze workqueue if some work_struct continuously re-queues itself.
Perhaps this is not a problem (I don't see a good reason for such a behaviour),
but this should be documented.

>  static int worker_thread(void *__cwq)
>  {
>  	struct cpu_workqueue_struct *cwq = __cwq;
> +	int bind_cpu;
>  	DEFINE_WAIT(wait);
>  	struct k_sigaction sa;
>
>  	freezer_exempt(cwq->wq->freeze_exempt_events);
> -
> +	bind_cpu = smp_processor_id();
>  	/*
>  	 * We inherited MPOL_INTERLEAVE from the booting kernel.
>  	 * Set MPOL_DEFAULT to insure node local allocations.
> @@ -308,20 +287,28 @@ static int worker_thread(void *__cwq)
>  	siginitset(&sa.sa.sa_mask, sigmask(SIGCHLD));
>  	do_sigaction(SIGCHLD, &sa, (struct k_sigaction *)0);
>
> -	for (;;) {
> +	while (!kthread_should_stop()) {
>  		try_to_freeze();
> -
> +
> +		if (cpu_is_offline(bind_cpu) && !is_single_threaded(cwq->wq))
> +			goto wait_to_die;
> +
>  		prepare_to_wait(&cwq->more_work, &wait, TASK_INTERRUPTIBLE);
> -		if (cwq->status == CWQ_RUNNING && list_empty(&cwq->worklist))
> +		if (list_empty(&cwq->worklist))
>  			schedule();
>  		finish_wait(&cwq->more_work, &wait);
>
> -		if (cwq_should_stop(cwq))
> -			break;
> -
>  		run_workqueue(cwq);
>  	}
>
> +wait_to_die:
> +	set_current_state(TASK_INTERRUPTIBLE);
> +	while(!kthread_should_stop()) {
> +		schedule();
> +		set_current_state(TASK_INTERRUPTIBLE);
> +	}
> +	__set_current_state(TASK_RUNNING);
> +
>  	return 0;
>  }

I still think that wait_to_die + bind_cpu is unneeded complication.
Why can't we do the following:

	static int worker_thread(void *__cwq)
	{
		...

		for (;;) {
			try_to_freeze();

			prepare_to_wait(&cwq->more_work, &wait, TASK_INTERRUPTIBLE);
			if (!kthread_should_stop() && list_empty(&cwq->worklist))
				schedule();
			finish_wait(&cwq->more_work, &wait);

			if (kthread_should_stop(cwq))
				break;

			run_workqueue(cwq);
		}

		return 0;
	}

?

>  void fastcall flush_workqueue(struct workqueue_struct *wq)
>  {
> -	const cpumask_t *cpu_map = wq_cpu_map(wq);
>  	int cpu;
>
>  	might_sleep();
> -	for_each_cpu_mask(cpu, *cpu_map)
> +	for_each_online_cpu(cpu)
>  		flush_cpu_workqueue(per_cpu_ptr(wq->cpu_wq, cpu));
>  }

Hm... I can't understand this change. I believe it is wrong.

> @@ -644,13 +630,6 @@ static int create_workqueue_thread(struc
>  		return PTR_ERR(p);
>
>  	cwq->thread = p;
> -	cwq->status = CWQ_RUNNING;
> -	if (!is_single_threaded(wq))
> -		kthread_bind(p, cpu);
> -
> -	if (is_single_threaded(wq) || cpu_online(cpu))
> -		wake_up_process(p);
> -
>  	return 0;
>  }

Well, this is a matter of taste, but I don't like this change. Now we
should add kthread_bind/wake_up_process calls to __create_workqueue()
and workqueue_cpu_callback(). I won't persist though.

> @@ -680,15 +659,21 @@ static struct workqueue_struct *__create
>  	if (singlethread) {
>  		cwq = init_cpu_workqueue(wq, singlethread_cpu);
>  		err = create_workqueue_thread(cwq, singlethread_cpu);
> +		wake_up_process(cwq->thread);
>  	} else {
>  		mutex_lock(&workqueue_mutex);
>  		list_add(&wq->list, &workqueues);
>
> -		for_each_possible_cpu(cpu) {
> +		for_each_online_cpu(cpu) {

This is wrong. CPU_UP_PREPARE doesn't call init_cpu_workqueue().
Easy to fix, but I personally think is is better to initialize
the whole cpu_possible_map.

>  static void cleanup_workqueue_thread(struct cpu_workqueue_struct *cwq, int cpu)
>  {
> -	struct wq_barrier barr;
> -	int alive = 0;
>
> -	spin_lock_irq(&cwq->lock);
>  	if (cwq->thread != NULL) {
> -		insert_wq_barrier(cwq, &barr, 1);
> -		cwq->status = CWQ_SHOULD_STOP;
> -		alive = 1;
> -	}
> -	spin_unlock_irq(&cwq->lock);
> -
> -	if (alive) {
>  		thaw_process(cwq->thread);
> -		wait_for_completion(&barr.done);
> -
> -		while (unlikely(cwq->status != CWQ_STOPPED))
> -			cpu_relax();
> -		/*
> -		 * Wait until cwq->thread unlocks cwq->lock,
> -		 * it won't touch *cwq after that.
> -		 */
> -		smp_rmb();
> +		kthread_stop(cwq->thread);
>  		cwq->thread = NULL;
> -		spin_unlock_wait(&cwq->lock);
>  	}
>  }

Deadlockable. Suppose that the freezing is in progress, cwq->thread is not
frozen yet. cleanup_workqueue_thread() calls thaw_process(cwq->thread),
then cwq->thread() goes to refrigerator, kthread_stop() blocks forever.

> +static void take_over_work(struct workqueue_struct *wq, unsigned int cpu)
> +{
> +	struct cpu_workqueue_struct *cwq = per_cpu_ptr(wq->cpu_wq, cpu);
> +	struct list_head list;
> +	struct work_struct *work;
> +
> +	spin_lock_irq(&cwq->lock);

This CPU is dead (or cancelled), we don't need cwq->lock.

>  static int __devinit workqueue_cpu_callback(struct notifier_block *nfb,
>  						unsigned long action,
>  						void *hcpu)
> @@ -782,11 +768,6 @@ static int __devinit workqueue_cpu_callb
>  	struct cpu_workqueue_struct *cwq;
>  	struct workqueue_struct *wq;
>
> -	switch (action) {
> -	case CPU_UP_PREPARE:
> -		cpu_set(cpu, cpu_populated_map);
> -	}
> -
>  	mutex_lock(&workqueue_mutex);
>  	list_for_each_entry(wq, &workqueues, list) {
>  		cwq = per_cpu_ptr(wq->cpu_wq, cpu);
> @@ -799,6 +780,7 @@ static int __devinit workqueue_cpu_callb
>  			return NOTIFY_BAD;
>
>  		case CPU_ONLINE:
> +			kthread_bind(cwq->thread, cpu);
>  			wake_up_process(cwq->thread);
>  			break;
>
> @@ -806,6 +788,7 @@ static int __devinit workqueue_cpu_callb
>  			if (cwq->thread)
>  				wake_up_process(cwq->thread);
>  		case CPU_DEAD:
> +			take_over_work(wq, cpu);
>  			cleanup_workqueue_thread(cwq, cpu);
>  			break;
>  		}

This means that the work_struct on single_threaded wq can't use any of

	__create_workqueue()
	destroy_workqueue()
	flush_workqueue()
	cancel_work_sync()

, they are all racy wrt workqueue_cpu_callback(), and we don't freeze
single_threaded workqueues. This is bad.

Probaly we should:

	- freeze all workqueues, even the single_threaded ones.

	- helper_init() explicitely does __create_workqueue(FE_ALL).
	  this means that we should never use the functions above
	  with this workqueue.

Oleg.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] Cpu-hotplug: Using the Process Freezer (try2)
  2007-04-02  6:16 ` [RFC] Cpu-hotplug: Using the Process Freezer (try2) Ingo Molnar
                     ` (2 preceding siblings ...)
  2007-04-02 13:22   ` Pavel Machek
@ 2007-04-03 12:01   ` Gautham R Shenoy
  2007-04-03 19:34     ` Rafael J. Wysocki
  2007-04-03 14:01   ` [RFC] Cpu-hotplug: Using the Process Freezer (try2) Gautham R Shenoy
  4 siblings, 1 reply; 70+ messages in thread
From: Gautham R Shenoy @ 2007-04-03 12:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: akpm, paulmck, torvalds, linux-kernel, vatsa, Oleg Nesterov,
	Rafael J. Wysocki, dipankar, dino, masami.hiramatsu.pt

On Mon, Apr 02, 2007 at 08:16:12AM +0200, Ingo Molnar wrote:
> 
> i'm wondering about how TASK_UNINTERRUPTIBLE tasks are handled by the 
> freezer: are they assumed frozen immediately, or do we wait until they 
> notice their PF_FREEZING and go into try_to_freeze()? I'd expect 
> TASK_UNINTERRUPTIBLE to be the largest source of latency. (and hence be 
> the primary source for freezing 'failures')

Ok, we might be in some luck. I panic()ed on freezer fail and checked
the stacktrace of the unfrozen tasks. The stacktrace of each one looks
like:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PID: 7697   TASK: cc354a70  CPU: 7   COMMAND: "make"
#0 [cc37fe50] schedule at c0431752
#1 [cc37fec4] wait_for_completion at c04318d0
#2 [cc37ff24] do_fork at c01249a6
#3 [cc37ff94] sys_vfork at c0103c1f
#4 [cc37ffb4] system_call at c0104d8d
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Rafael had sent out a patch to fix the vfork race, which can be found at
http://lkml.org/lkml/2007/3/1/212

However, the hunk

@@ -1393,7 +1394,9 @@ long do_fork(unsigned long clone_flags,
		tracehook_report_clone_complete(clone_flags, nr, p);

		if (clone_flags & CLONE_VFORK) {
+			freezer_do_not_count();
			wait_for_completion(&vfork);
+			freezer_count();
			tracehook_report_vfork_done(p, nr);
		}
	} else {

Seems to be missing in the latest -mm's.

Rafael / Andrew, 
	Any reasons for leaving this hunk out?

I will rerun my tests with this hunk applied and report back.

> 
> 	Ingo

Thanks and Regards
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] Cpu-hotplug: Using the Process Freezer (try2)
  2007-04-02 18:56         ` Ingo Molnar
@ 2007-04-03 12:56           ` Srivatsa Vaddagiri
  2007-04-03 14:15             ` Gautham R Shenoy
  0 siblings, 1 reply; 70+ messages in thread
From: Srivatsa Vaddagiri @ 2007-04-03 12:56 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Gautham R Shenoy, akpm, paulmck, torvalds, linux-kernel,
	Oleg Nesterov, Rafael J. Wysocki, dipankar, dino,
	masami.hiramatsu.pt

On Mon, Apr 02, 2007 at 08:56:07PM +0200, Ingo Molnar wrote:
> ok. But the only real problem would be for_each_online_cpu() loops that 
> might sleep, correct? 

I would shy from saying that "that's the only problem". It could also be

	for_each_cpu_mask(some_mask)
		do_something_which_sleeps();

where some_mask is suppsoed to represent a subset of online cpus.

For ex: policy->cpus is a mask maintained by cpufreq, which it iterates thr' 
whenever changing cpu frequency. The cpus in that mask are supposed to be 
a subset of online cpus (with CPU_ONLINE/DEAD handlers in cpufreq.c adjusting 
the mask upon hotplug).

We could adopt a similar trick (get/put_each_cpu_mask) that you describe
below in those extended cases as well.

> the 10% loops that _can_ schedule would trigger the __might_sleep() 
> atomicity test in schedule()), and those would have to be converted a 
> bit more cleverly, on a case by case basis.

The real question is how do we convert over those sleeping
for_each_cpu_mask users (for ex: flush_workqueue) such that they don't
block freezer/hotplug for long periods?

One option is to probably rewrite them to understand that
online[or a derived]_map could have changed everytime they come out of a 
(un)interruptible sleep and deal with arising races appropriately. That
would mean a bit of maintenance headache unfortunately (I was hoping
freezer will lead to zero maintenance headache :)

Besides, how problematic is this in practise (that threads sleep for
extended durations in TASK_INTERRUPTIBLE state breaking
freezer/suspend/hotplug)?

Should we ignore this for the timebeing and take up later as and when
users report problems?

-- 
Regards,
vatsa

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 7/8] Clean up workqueue.c with respect to the freezer based cpu-hotplug
  2007-04-03 11:47   ` Oleg Nesterov
@ 2007-04-03 13:59     ` Srivatsa Vaddagiri
  2007-04-03 15:03       ` Oleg Nesterov
  0 siblings, 1 reply; 70+ messages in thread
From: Srivatsa Vaddagiri @ 2007-04-03 13:59 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Gautham R Shenoy, akpm, paulmck, torvalds, linux-kernel,
	Rafael J. Wysocki, mingo, dipankar, dino, masami.hiramatsu.pt

On Tue, Apr 03, 2007 at 03:47:29PM +0400, Oleg Nesterov wrote:
> I still think that wait_to_die + bind_cpu is unneeded complication.
> Why can't we do the following:
> 
> 	static int worker_thread(void *__cwq)
> 	{
> 		...
> 
> 		for (;;) {
> 			try_to_freeze();
> 
> 			prepare_to_wait(&cwq->more_work, &wait, TASK_INTERRUPTIBLE);
> 			if (!kthread_should_stop() && list_empty(&cwq->worklist))
> 				schedule();
> 			finish_wait(&cwq->more_work, &wait);
> 
> 			if (kthread_should_stop(cwq))
> 				break;
> 
> 			run_workqueue(cwq);
> 		}
> 
> 		return 0;
> 	}
> 
> ?

cleanup_workqueue_thread (in Gautham's patches) does this:

	thaw_process()
	kthread_stop()

There is a chance that after thaw_process() [but before we have posted
the kthread_stop], worker thread can come out of the refrigerator and start
running run_workqueue() - that will simply prolong the subsequent
kthread_stop() and the system freeze time.

We could do what you are suggesting if the thaw_process() part was
integrated into kthread_stop() code [basically thaw_process after
setting the kthread_stop_info.k flag].

> >  void fastcall flush_workqueue(struct workqueue_struct *wq)
> >  {
> > -	const cpumask_t *cpu_map = wq_cpu_map(wq);
> >  	int cpu;
> >
> >  	might_sleep();
> > -	for_each_cpu_mask(cpu, *cpu_map)
> > +	for_each_online_cpu(cpu)
> >  		flush_cpu_workqueue(per_cpu_ptr(wq->cpu_wq, cpu));
> >  }
> 
> Hm... I can't understand this change. I believe it is wrong.

Why?

> > -		for_each_possible_cpu(cpu) {
> > +		for_each_online_cpu(cpu) {
> 
> This is wrong. CPU_UP_PREPARE doesn't call init_cpu_workqueue().
> Easy to fix, but I personally think is is better to initialize
> the whole cpu_possible_map.

I tend to agree yes.

> >  static void cleanup_workqueue_thread(struct cpu_workqueue_struct *cwq, int cpu)

[snip]

> > -	if (alive) {
> >  		thaw_process(cwq->thread);
> > -		wait_for_completion(&barr.done);
> > -
> > -		while (unlikely(cwq->status != CWQ_STOPPED))
> > -			cpu_relax();
> > -		/*
> > -		 * Wait until cwq->thread unlocks cwq->lock,
> > -		 * it won't touch *cwq after that.
> > -		 */
> > -		smp_rmb();
> > +		kthread_stop(cwq->thread);
> >  		cwq->thread = NULL;
> > -		spin_unlock_wait(&cwq->lock);
> >  	}
> >  }
> 
> Deadlockable. Suppose that the freezing is in progress, cwq->thread is not
> frozen yet. cleanup_workqueue_thread() calls thaw_process(cwq->thread),
> then cwq->thread() goes to refrigerator, kthread_stop() blocks forever.

Good catch! Can cleanup_workqueue_thread take some mutex to serialize
with freezer here (say freezer_mutex)?

Or better, since this seems to be a general problem for anyone who wants to do a
kthread_stop, how abt modifying kthread_stop like below:

kthread_stop(p)
{
	int old_exempt_flags;

	task_lock(p);
	old_exempt_flags = p->flags;
	p->flags |= PFE_ALL;	/* Exempt 'p' from being frozen? */
	task_unlock(p);

	kthread_stop_info.k = p;
	thaw_process(p);

	wait_for_completion();

}

Marking 'p' as exempt shouldn't be a problem because freezer would wait
on the thread doing kthread_stop() anyway before declaring system as
frozen.

> > +static void take_over_work(struct workqueue_struct *wq, unsigned int cpu)
> > +{
> > +	struct cpu_workqueue_struct *cwq = per_cpu_ptr(wq->cpu_wq, cpu);
> > +	struct list_head list;
> > +	struct work_struct *work;
> > +
> > +	spin_lock_irq(&cwq->lock);
> 
> This CPU is dead (or cancelled), we don't need cwq->lock.

yeah ..


> 
> >  static int __devinit workqueue_cpu_callback(struct notifier_block *nfb,
> >  						unsigned long action,
> >  						void *hcpu)
> > @@ -782,11 +768,6 @@ static int __devinit workqueue_cpu_callb
> >  	struct cpu_workqueue_struct *cwq;
> >  	struct workqueue_struct *wq;
> >
> > -	switch (action) {
> > -	case CPU_UP_PREPARE:
> > -		cpu_set(cpu, cpu_populated_map);
> > -	}
> > -
> >  	mutex_lock(&workqueue_mutex);
> >  	list_for_each_entry(wq, &workqueues, list) {
> >  		cwq = per_cpu_ptr(wq->cpu_wq, cpu);
> > @@ -799,6 +780,7 @@ static int __devinit workqueue_cpu_callb
> >  			return NOTIFY_BAD;
> >
> >  		case CPU_ONLINE:
> > +			kthread_bind(cwq->thread, cpu);
> >  			wake_up_process(cwq->thread);
> >  			break;
> >
> > @@ -806,6 +788,7 @@ static int __devinit workqueue_cpu_callb
> >  			if (cwq->thread)
> >  				wake_up_process(cwq->thread);
> >  		case CPU_DEAD:
> > +			take_over_work(wq, cpu);
> >  			cleanup_workqueue_thread(cwq, cpu);
> >  			break;
> >  		}
> 
> This means that the work_struct on single_threaded wq can't use any of
> 
> 	__create_workqueue()
> 	destroy_workqueue()
> 	flush_workqueue()
> 	cancel_work_sync()

The workqueue_mutex() should serialize these with workqueue_cpu_callback() to 
an extent, but  ..

> , they are all racy wrt workqueue_cpu_callback(), and we don't freeze
> single_threaded workqueues. This is bad.
> 
> Probaly we should:
> 
> 	- freeze all workqueues, even the single_threaded ones.

Yes I agree, we should target freezing everybody here. It feels much
safer that way!

The only dependency I have seen is stop_machine() called after processes
are frozen. It needs the services of a workqueue to create kthreads. We
need to atleast exempt that worker thread from being frozen. Hopefully
the list of such non-freezable singlethreaded workqueues will be tiny
enough for us to audit time-to-time.

> 	- helper_init() explicitely does __create_workqueue(FE_ALL).
> 	  this means that we should never use the functions above
> 	  with this workqueue.

Ok you seem to have covered what I went out to say above! 

Thx for your review so far ..

-- 
Regards,
vatsa

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] Cpu-hotplug: Using the Process Freezer (try2)
  2007-04-02  6:16 ` [RFC] Cpu-hotplug: Using the Process Freezer (try2) Ingo Molnar
                     ` (3 preceding siblings ...)
  2007-04-03 12:01   ` Gautham R Shenoy
@ 2007-04-03 14:01   ` Gautham R Shenoy
  4 siblings, 0 replies; 70+ messages in thread
From: Gautham R Shenoy @ 2007-04-03 14:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: akpm, paulmck, torvalds, linux-kernel, vatsa, Oleg Nesterov,
	Rafael J. Wysocki, dipankar, dino, masami.hiramatsu.pt

On Mon, Apr 02, 2007 at 08:16:12AM +0200, Ingo Molnar wrote:
> i'm wondering about how TASK_UNINTERRUPTIBLE tasks are handled by the 
> freezer: are they assumed frozen immediately, or do we wait until they 
> notice their PF_FREEZING and go into try_to_freeze()? I'd expect 
> TASK_UNINTERRUPTIBLE to be the largest source of latency. (and hence be 
> the primary source for freezing 'failures')

Ok, are some luck. I panic()ed on freezer fail and checked
the stacktrace of the unfrozen tasks. The stacktrace of each one looks
like:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PID: 7697   TASK: cc354a70  CPU: 7   COMMAND: "make"
#0 [cc37fe50] schedule at c0431752
#1 [cc37fec4] wait_for_completion at c04318d0
#2 [cc37ff24] do_fork at c01249a6
#3 [cc37ff94] sys_vfork at c0103c1f
#4 [cc37ffb4] system_call at c0104d8d
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Rafael had sent out a patch to fix the vfork race, which can be found at
http://lkml.org/lkml/2007/3/1/212

However, the following hunk seems to be missing from the latest -mm.

@@ -1393,7 +1394,9 @@ long do_fork(unsigned long clone_flags,
		tracehook_report_clone_complete(clone_flags, nr, p);
		if (clone_flags & CLONE_VFORK) {
+			freezer_do_not_count();
			wait_for_completion(&vfork);
+			freezer_count();
			tracehook_report_vfork_done(p, nr);
		}
		} else {



Rafael / Andrew,
        Any reasons for leaving this hunk out?

I reran my tests with this hunk applied, and it work just fine.
Even 'make -j' the maximum time taken to hotplug a cpu was 0m2.16s.

> 
> 	Ingo

Thanks and Regards
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] Cpu-hotplug: Using the Process Freezer (try2)
  2007-04-03 12:56           ` Srivatsa Vaddagiri
@ 2007-04-03 14:15             ` Gautham R Shenoy
  2007-04-03 19:25               ` Rafael J. Wysocki
  2007-04-04  3:15               ` Srivatsa Vaddagiri
  0 siblings, 2 replies; 70+ messages in thread
From: Gautham R Shenoy @ 2007-04-03 14:15 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Ingo Molnar, akpm, paulmck, torvalds, linux-kernel,
	Oleg Nesterov, Rafael J. Wysocki, dipankar, dino,
	masami.hiramatsu.pt

On Tue, Apr 03, 2007 at 06:26:19PM +0530, Srivatsa Vaddagiri wrote:
> 
> Besides, how problematic is this in practise (that threads sleep for
> extended durations in TASK_INTERRUPTIBLE state breaking
> freezer/suspend/hotplug)?
> 
> Should we ignore this for the timebeing and take up later as and when
> users report problems?

In my case, the problem of freezer failing was due to the vfork freezer race
in do_fork. The parent task was waiting_for_completion() while the child was
frozen.

Rafael has already sent out the fix for that, but for some reason I
don't see it in the -mm.

With that fix, freezer and hence hotplug succeeds even when I am running
a 'make -j' test.

> 
> -- 
> Regards,
> vatsa

Thanks and Regards
gautham.

-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 7/8] Clean up workqueue.c with respect to the freezer based cpu-hotplug
  2007-04-03 13:59     ` Srivatsa Vaddagiri
@ 2007-04-03 15:03       ` Oleg Nesterov
  2007-04-03 17:18         ` Srivatsa Vaddagiri
  0 siblings, 1 reply; 70+ messages in thread
From: Oleg Nesterov @ 2007-04-03 15:03 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Gautham R Shenoy, akpm, paulmck, torvalds, linux-kernel,
	Rafael J. Wysocki, mingo, dipankar, dino, masami.hiramatsu.pt

On 04/03, Srivatsa Vaddagiri wrote:
>
> On Tue, Apr 03, 2007 at 03:47:29PM +0400, Oleg Nesterov wrote:
> > 
> > 		for (;;) {
> > 			try_to_freeze();
> > 
> > 			prepare_to_wait(&cwq->more_work, &wait, TASK_INTERRUPTIBLE);
> > 			if (!kthread_should_stop() && list_empty(&cwq->worklist))
> > 				schedule();
> > 			finish_wait(&cwq->more_work, &wait);
> > 
> > 			if (kthread_should_stop(cwq))
> > 				break;
> > 
> > 			run_workqueue(cwq);
> > 		}
> 
> cleanup_workqueue_thread (in Gautham's patches) does this:
> 
> 	thaw_process()
> 	kthread_stop()

Yes, thanks.

> We could do what you are suggesting if the thaw_process() part was
> integrated into kthread_stop() code [basically thaw_process after
> setting the kthread_stop_info.k flag].

I think it would be nice to do. I believe we can cleanup ksoftirqd()
and migration_thread() as well (kill wait_to_die: loop). Probably it
is better to introduce a new helper for that, kthread_thaw_stop() or
something.

> > >  void fastcall flush_workqueue(struct workqueue_struct *wq)
> > >  {
> > > -	const cpumask_t *cpu_map = wq_cpu_map(wq);
> > >  	int cpu;
> > >
> > >  	might_sleep();
> > > -	for_each_cpu_mask(cpu, *cpu_map)
> > > +	for_each_online_cpu(cpu)
> > >  		flush_cpu_workqueue(per_cpu_ptr(wq->cpu_wq, cpu));
> > >  }
> > 
> > Hm... I can't understand this change. I believe it is wrong.
> 
> Why?

What if is_single_threaded(wq) == true? In that case we should call
flush_cpu_workqueue(cpu) only if cpu == singlethread_cpu, otherwise
this is unneeded and wrong, because per_cpu_ptr(wq->cpu_wq, cpu) was
not initialized.

> > > +		kthread_stop(cwq->thread);
> > >  		cwq->thread = NULL;
> > > -		spin_unlock_wait(&cwq->lock);
> > >  	}
> > >  }
> > 
> > Deadlockable. Suppose that the freezing is in progress, cwq->thread is not
> > frozen yet. cleanup_workqueue_thread() calls thaw_process(cwq->thread),
> > then cwq->thread() goes to refrigerator, kthread_stop() blocks forever.
> 
> Good catch! Can cleanup_workqueue_thread take some mutex to serialize
> with freezer here (say freezer_mutex)?
> 
> Or better, since this seems to be a general problem for anyone who wants to do a
> kthread_stop, how abt modifying kthread_stop like below:
> 
> kthread_stop(p)
> {
> 	int old_exempt_flags;
> 
> 	task_lock(p);
> 	old_exempt_flags = p->flags;
> 	p->flags |= PFE_ALL;	/* Exempt 'p' from being frozen? */

I agree, we should mark this thread as non-freezable, but we can't modify
p->flags, this is racy. "current" owns its ->flags and it is not atomic.
Note that thaw_process() checks frozen(p) when it clears PF_FROZEN.

Actually, we should do this before destroy_workqueue() calls flush_workqueue().
Otherwise flush_cpu_workqueue() can hang forever in a similar manner.

Needs more thinking, I guess.

> > This means that the work_struct on single_threaded wq can't use any of
> > 
> > 	__create_workqueue()
> > 	destroy_workqueue()
> > 	flush_workqueue()
> > 	cancel_work_sync()
> 
> The workqueue_mutex() should serialize these with workqueue_cpu_callback() to 
> an extent, but  ..

No, no, workqueue_mutex can't help. Just for example: CPU_UP_PREPARE completes
and drops workqueue_mutex. __create_workqueue(wq) doesn't see the new cpu, it
is not on cpu_online_map, so it doesn't create cwq->thread. CPU_ONLINE oopses.

> Yes I agree, we should target freezing everybody here. It feels much
> safer that way!

Good!

Oleg.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 7/8] Clean up workqueue.c with respect to the freezer based cpu-hotplug
  2007-04-03 15:03       ` Oleg Nesterov
@ 2007-04-03 17:18         ` Srivatsa Vaddagiri
  2007-04-04 15:28           ` Oleg Nesterov
  2007-04-12  2:22           ` Srivatsa Vaddagiri
  0 siblings, 2 replies; 70+ messages in thread
From: Srivatsa Vaddagiri @ 2007-04-03 17:18 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Gautham R Shenoy, akpm, paulmck, torvalds, linux-kernel,
	Rafael J. Wysocki, mingo, dipankar, dino, masami.hiramatsu.pt

On Tue, Apr 03, 2007 at 07:03:36PM +0400, Oleg Nesterov wrote:
> I think it would be nice to do. I believe we can cleanup ksoftirqd()
> and migration_thread() as well (kill wait_to_die: loop). Probably it

I doubt whether we can kill it in migration_thread, since that is
another thread which is unfrozen for hotplug (stop_machine relies on its
services while rest of the world is frozen).

> is better to introduce a new helper for that, kthread_thaw_stop() or
> something.

Will think of that.


> > Why?
> 
> What if is_single_threaded(wq) == true? In that case we should call
> flush_cpu_workqueue(cpu) only if cpu == singlethread_cpu, otherwise
> this is unneeded and wrong, because per_cpu_ptr(wq->cpu_wq, cpu) was
> not initialized.

Ah yes ..

> > kthread_stop(p)
> > {
> > 	int old_exempt_flags;
> > 
> > 	task_lock(p);
> > 	old_exempt_flags = p->flags;
> > 	p->flags |= PFE_ALL;	/* Exempt 'p' from being frozen? */
> 
> I agree, we should mark this thread as non-freezable, but we can't modify
> p->flags, this is racy. "current" owns its ->flags and it is not atomic.
> Note that thaw_process() checks frozen(p) when it clears PF_FROZEN.

I suspected that we cannot modify p->flags just like that. How abt
moving freezer exemption bits to a separate field, which is protected by
task_lock?

> Actually, we should do this before destroy_workqueue() calls flush_workqueue().
> Otherwise flush_cpu_workqueue() can hang forever in a similar manner.

Yep. I guess these are a class of freezer deadlocks very similar to vfork
parent waiting on child case. I get a feeling these should become common
outside of kthread too (A waits on B for something, B gets frozen, which
means A won't freeze causing freezer to fail). Can freezer detect this
dependency somehow and thaw B automatically? Probably not that easy ..

> Needs more thinking, I guess.

[snip]

> No, no, workqueue_mutex can't help. Just for example: CPU_UP_PREPARE completes
> and drops workqueue_mutex. __create_workqueue(wq) doesn't see the new cpu, it
> is not on cpu_online_map, so it doesn't create cwq->thread. CPU_ONLINE oopses.

Ok ..sure.

-- 
Regards,
vatsa

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] Cpu-hotplug: Using the Process Freezer (try2)
  2007-04-03 14:15             ` Gautham R Shenoy
@ 2007-04-03 19:25               ` Rafael J. Wysocki
  2007-04-04  3:15               ` Srivatsa Vaddagiri
  1 sibling, 0 replies; 70+ messages in thread
From: Rafael J. Wysocki @ 2007-04-03 19:25 UTC (permalink / raw)
  To: ego
  Cc: Srivatsa Vaddagiri, Ingo Molnar, akpm, paulmck, torvalds,
	linux-kernel, Oleg Nesterov, dipankar, dino, masami.hiramatsu.pt

On Tuesday, 3 April 2007 16:15, Gautham R Shenoy wrote:
> On Tue, Apr 03, 2007 at 06:26:19PM +0530, Srivatsa Vaddagiri wrote:
> > 
> > Besides, how problematic is this in practise (that threads sleep for
> > extended durations in TASK_INTERRUPTIBLE state breaking
> > freezer/suspend/hotplug)?
> > 
> > Should we ignore this for the timebeing and take up later as and when
> > users report problems?
> 
> In my case, the problem of freezer failing was due to the vfork freezer race
> in do_fork. The parent task was waiting_for_completion() while the child was
> frozen.
> 
> Rafael has already sent out the fix for that, but for some reason I
> don't see it in the -mm.

Hm, freezer-fix-vfork-problem.patch is in -rc5-mm3.  Is it the patch you're
talking about or is there another one?

Rafael

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] Cpu-hotplug: Using the Process Freezer (try2)
  2007-04-03 12:01   ` Gautham R Shenoy
@ 2007-04-03 19:34     ` Rafael J. Wysocki
  2007-04-03 20:24       ` Andrew Morton
  0 siblings, 1 reply; 70+ messages in thread
From: Rafael J. Wysocki @ 2007-04-03 19:34 UTC (permalink / raw)
  To: ego, akpm
  Cc: Ingo Molnar, paulmck, torvalds, linux-kernel, vatsa,
	Oleg Nesterov, dipankar, dino, masami.hiramatsu.pt

On Tuesday, 3 April 2007 14:01, Gautham R Shenoy wrote:
> On Mon, Apr 02, 2007 at 08:16:12AM +0200, Ingo Molnar wrote:
> > 
> > i'm wondering about how TASK_UNINTERRUPTIBLE tasks are handled by the 
> > freezer: are they assumed frozen immediately, or do we wait until they 
> > notice their PF_FREEZING and go into try_to_freeze()? I'd expect 
> > TASK_UNINTERRUPTIBLE to be the largest source of latency. (and hence be 
> > the primary source for freezing 'failures')
> 
> Ok, we might be in some luck. I panic()ed on freezer fail and checked
> the stacktrace of the unfrozen tasks. The stacktrace of each one looks
> like:
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> PID: 7697   TASK: cc354a70  CPU: 7   COMMAND: "make"
> #0 [cc37fe50] schedule at c0431752
> #1 [cc37fec4] wait_for_completion at c04318d0
> #2 [cc37ff24] do_fork at c01249a6
> #3 [cc37ff94] sys_vfork at c0103c1f
> #4 [cc37ffb4] system_call at c0104d8d
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> Rafael had sent out a patch to fix the vfork race, which can be found at
> http://lkml.org/lkml/2007/3/1/212
> 
> However, the hunk
> 
> @@ -1393,7 +1394,9 @@ long do_fork(unsigned long clone_flags,
> 		tracehook_report_clone_complete(clone_flags, nr, p);
> 
> 		if (clone_flags & CLONE_VFORK) {
> +			freezer_do_not_count();
> 			wait_for_completion(&vfork);
> +			freezer_count();
> 			tracehook_report_vfork_done(p, nr);
> 		}
> 	} else {
> 
> Seems to be missing in the latest -mm's.

Good catch!

> Rafael / Andrew, 
> 	Any reasons for leaving this hunk out?

No, absolutely not.  It's needed.

Moreover, freezer-fix-vfork-problem.patch from the broken-out -rc5-mm3
contains it, so some other patch must have reverted this change.

[looks]

Ah, it's utrace-prep-2.patch .  Andrew?

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] Cpu-hotplug: Using the Process Freezer (try2)
  2007-04-03 19:34     ` Rafael J. Wysocki
@ 2007-04-03 20:24       ` Andrew Morton
  2007-04-04 10:06         ` utrace merge Ingo Molnar
  0 siblings, 1 reply; 70+ messages in thread
From: Andrew Morton @ 2007-04-03 20:24 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: ego, Ingo Molnar, paulmck, torvalds, linux-kernel, vatsa,
	Oleg Nesterov, dipankar, dino, masami.hiramatsu.pt

On Tue, 3 Apr 2007 21:34:28 +0200
"Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> > However, the hunk
> > 
> > @@ -1393,7 +1394,9 @@ long do_fork(unsigned long clone_flags,
> > 		tracehook_report_clone_complete(clone_flags, nr, p);
> > 
> > 		if (clone_flags & CLONE_VFORK) {
> > +			freezer_do_not_count();
> > 			wait_for_completion(&vfork);
> > +			freezer_count();
> > 			tracehook_report_vfork_done(p, nr);
> > 		}
> > 	} else {
> > 
> > Seems to be missing in the latest -mm's.
> 
> Good catch!
> 
> > Rafael / Andrew, 
> > 	Any reasons for leaving this hunk out?
> 
> No, absolutely not.  It's needed.
> 
> Moreover, freezer-fix-vfork-problem.patch from the broken-out -rc5-mm3
> contains it, so some other patch must have reverted this change.
> 
> [looks]
> 
> Ah, it's utrace-prep-2.patch .  Andrew?

urgh, I screwed up, sorry.

utrace-prep-2 reverts a bit of the underlying tree so that the utrace
patches (which are against mainline) don't throw a tremendous reject which
has to be fixed each time I pull Roland's tree.  I'm supposed to reapply
that change after the utrace patches but forgot.

--- a/kernel/fork.c~undo-utrace-prep-2
+++ a/kernel/fork.c
@@ -1387,7 +1387,9 @@ long do_fork(unsigned long clone_flags,
 			tracehook_report_clone_complete(clone_flags, nr, p);
 
 		if (clone_flags & CLONE_VFORK) {
+			freezer_do_not_count();
 			wait_for_completion(&vfork);
+			freezer_count();
 			if (likely(is_user))
 				tracehook_report_vfork_done(p, nr);
 		}
_


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] Cpu-hotplug: Using the Process Freezer (try2)
  2007-04-03 14:15             ` Gautham R Shenoy
  2007-04-03 19:25               ` Rafael J. Wysocki
@ 2007-04-04  3:15               ` Srivatsa Vaddagiri
  2007-04-04 10:04                 ` Ingo Molnar
  1 sibling, 1 reply; 70+ messages in thread
From: Srivatsa Vaddagiri @ 2007-04-04  3:15 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: Ingo Molnar, akpm, paulmck, torvalds, linux-kernel,
	Oleg Nesterov, Rafael J. Wysocki, dipankar, dino,
	masami.hiramatsu.pt

On Tue, Apr 03, 2007 at 07:45:16PM +0530, Gautham R Shenoy wrote:
> > Should we ignore this for the timebeing and take up later as and when
> > users report problems?
> 
> In my case, the problem of freezer failing was due to the vfork freezer race
> in do_fork. The parent task was waiting_for_completion() while the child was
> frozen.

Good catch and ...

> Rafael has already sent out the fix for that, but for some reason I
> don't see it in the -mm.
> 
> With that fix, freezer and hence hotplug succeeds even when I am running
> a 'make -j' test.

Good to know that!

So Ingo/Rafael, should we ignore the problem of "TASK_UNINTERRUPTIBLE
sleepers can break freezer" for the timebeing? Mainly because its not
trivial to solve and we need to tackle it case by case basis as and when
users report specific problems.

-- 
Regards,
vatsa

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] Cpu-hotplug: Using the Process Freezer (try2)
  2007-04-04  3:15               ` Srivatsa Vaddagiri
@ 2007-04-04 10:04                 ` Ingo Molnar
  2007-04-04 10:41                   ` Gautham R Shenoy
  0 siblings, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2007-04-04 10:04 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Gautham R Shenoy, akpm, paulmck, torvalds, linux-kernel,
	Oleg Nesterov, Rafael J. Wysocki, dipankar, dino,
	masami.hiramatsu.pt


* Srivatsa Vaddagiri <vatsa@in.ibm.com> wrote:

> > Rafael has already sent out the fix for that, but for some reason I 
> > don't see it in the -mm.
> > 
> > With that fix, freezer and hence hotplug succeeds even when I am 
> > running a 'make -j' test.

nice!

> Good to know that!
> 
> So Ingo/Rafael, should we ignore the problem of "TASK_UNINTERRUPTIBLE 
> sleepers can break freezer" for the timebeing? Mainly because its not 
> trivial to solve and we need to tackle it case by case basis as and 
> when users report specific problems.

yeah, i think you are right - and the TASK_UNINTERRUPTIBLE thing is not 
only quite complex, it also breaks the symmetry of freezer use 
(sw-suspend obviously cannot freeze uninterruptible tasks). We should 
watch whether the current latency of freezing is good enough in 
practice.

	Ingo

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: utrace merge
  2007-04-03 20:24       ` Andrew Morton
@ 2007-04-04 10:06         ` Ingo Molnar
  2007-04-04 10:36           ` Christoph Hellwig
  0 siblings, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2007-04-04 10:06 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rafael J. Wysocki, ego, paulmck, torvalds, linux-kernel, vatsa,
	Oleg Nesterov, dipankar, dino, masami.hiramatsu.pt,
	Roland McGrath


* Andrew Morton <akpm@linux-foundation.org> wrote:

> urgh, I screwed up, sorry.
> 
> utrace-prep-2 reverts a bit of the underlying tree so that the utrace 
> patches (which are against mainline) don't throw a tremendous reject 
> which has to be fixed each time I pull Roland's tree.  I'm supposed to 
> reapply that change after the utrace patches but forgot.

btw., how about pushing utrace to v2.6.22, is anything big holding that 
up? (other than Roland's modesty, which we should gently but firmly 
ignore in this case =B-)

	Ingo

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: utrace merge
  2007-04-04 10:06         ` utrace merge Ingo Molnar
@ 2007-04-04 10:36           ` Christoph Hellwig
  2007-04-04 18:41             ` Andrew Morton
  0 siblings, 1 reply; 70+ messages in thread
From: Christoph Hellwig @ 2007-04-04 10:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, Rafael J. Wysocki, ego, paulmck, torvalds,
	linux-kernel, vatsa, Oleg Nesterov, dipankar, dino,
	masami.hiramatsu.pt, Roland McGrath

On Wed, Apr 04, 2007 at 12:06:44PM +0200, Ingo Molnar wrote:
> 
> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > urgh, I screwed up, sorry.
> > 
> > utrace-prep-2 reverts a bit of the underlying tree so that the utrace 
> > patches (which are against mainline) don't throw a tremendous reject 
> > which has to be fixed each time I pull Roland's tree.  I'm supposed to 
> > reapply that change after the utrace patches but forgot.
> 
> btw., how about pushing utrace to v2.6.22, is anything big holding that 
> up? (other than Roland's modesty, which we should gently but firmly 
> ignore in this case =B-)

a) lots of architecture support
b) lots of overengineered and obsfucated code
c) missing public information

I've also not gotten any feedback to my ages old review and the issues
don't seem to be addressed either.




^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] Cpu-hotplug: Using the Process Freezer (try2)
  2007-04-04 10:04                 ` Ingo Molnar
@ 2007-04-04 10:41                   ` Gautham R Shenoy
  2007-04-04 11:49                     ` Ingo Molnar
  0 siblings, 1 reply; 70+ messages in thread
From: Gautham R Shenoy @ 2007-04-04 10:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Srivatsa Vaddagiri, akpm, paulmck, torvalds, linux-kernel,
	Oleg Nesterov, Rafael J. Wysocki, dipankar, dino,
	masami.hiramatsu.pt

On Wed, Apr 04, 2007 at 12:04:31PM +0200, Ingo Molnar wrote:
> 
> yeah, i think you are right - and the TASK_UNINTERRUPTIBLE thing is not 
> only quite complex, it also breaks the symmetry of freezer use 
> (sw-suspend obviously cannot freeze uninterruptible tasks). We should 
> watch whether the current latency of freezing is good enough in 
> practice.

Ok. Do you have any specific tests in mind which I can run and post the
numbers? 

As of now, I've been stressing the system with (kernbench + ondemand governor)
and timing the hotplug operation. 

> 
> 	Ingo

Thanks and Regards
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] Cpu-hotplug: Using the Process Freezer (try2)
  2007-04-04 10:41                   ` Gautham R Shenoy
@ 2007-04-04 11:49                     ` Ingo Molnar
  2007-04-04 12:24                       ` Gautham R Shenoy
  0 siblings, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2007-04-04 11:49 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: Srivatsa Vaddagiri, akpm, paulmck, torvalds, linux-kernel,
	Oleg Nesterov, Rafael J. Wysocki, dipankar, dino,
	masami.hiramatsu.pt


* Gautham R Shenoy <ego@in.ibm.com> wrote:

> On Wed, Apr 04, 2007 at 12:04:31PM +0200, Ingo Molnar wrote:
> > 
> > yeah, i think you are right - and the TASK_UNINTERRUPTIBLE thing is 
> > not only quite complex, it also breaks the symmetry of freezer use 
> > (sw-suspend obviously cannot freeze uninterruptible tasks). We 
> > should watch whether the current latency of freezing is good enough 
> > in practice.
> 
> Ok. Do you have any specific tests in mind which I can run and post 
> the numbers?
> 
> As of now, I've been stressing the system with (kernbench + ondemand 
> governor) and timing the hotplug operation.

i suspect a fork-intensive application like kernbench should be close to 
the worst-case already. A more IO-intensive workload would maximize the 
uninterruptible-sleep latencies perhaps?

	Ingo

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] Cpu-hotplug: Using the Process Freezer (try2)
  2007-04-04 11:49                     ` Ingo Molnar
@ 2007-04-04 12:24                       ` Gautham R Shenoy
  0 siblings, 0 replies; 70+ messages in thread
From: Gautham R Shenoy @ 2007-04-04 12:24 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Srivatsa Vaddagiri, akpm, paulmck, torvalds, linux-kernel,
	Oleg Nesterov, Rafael J. Wysocki, dipankar, dino,
	masami.hiramatsu.pt

On Wed, Apr 04, 2007 at 01:49:01PM +0200, Ingo Molnar wrote:
> 
> i suspect a fork-intensive application like kernbench should be close to 
> the worst-case already. A more IO-intensive workload would maximize the 
> uninterruptible-sleep latencies perhaps?

Ok. I hope the NFS read/write in a tight loop should do the trick.
Will get back soon.

> 
> 	Ingo

Thanks and Regards
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 7/8] Clean up workqueue.c with respect to the freezer based cpu-hotplug
  2007-04-03 17:18         ` Srivatsa Vaddagiri
@ 2007-04-04 15:28           ` Oleg Nesterov
  2007-04-04 17:49             ` Srivatsa Vaddagiri
  2007-04-12  2:22           ` Srivatsa Vaddagiri
  1 sibling, 1 reply; 70+ messages in thread
From: Oleg Nesterov @ 2007-04-04 15:28 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Gautham R Shenoy, akpm, paulmck, torvalds, linux-kernel,
	Rafael J. Wysocki, mingo, dipankar, dino, masami.hiramatsu.pt

On 04/03, Srivatsa Vaddagiri wrote:
>
> On Tue, Apr 03, 2007 at 07:03:36PM +0400, Oleg Nesterov wrote:
> 
> > is better to introduce a new helper for that, kthread_thaw_stop() or
> > something.
> 
> Will think of that.

I changed my mind :) The problem is general, I am starting to believe
it is better to change kthread_stop().

> > > kthread_stop(p)
> > > {
> > > 	int old_exempt_flags;
> > > 
> > > 	task_lock(p);
> > > 	old_exempt_flags = p->flags;
> > > 	p->flags |= PFE_ALL;	/* Exempt 'p' from being frozen? */
> > 
> > I agree, we should mark this thread as non-freezable, but we can't modify
> > p->flags, this is racy. "current" owns its ->flags and it is not atomic.
> > Note that thaw_process() checks frozen(p) when it clears PF_FROZEN.
> 
> I suspected that we cannot modify p->flags just like that. How abt
> moving freezer exemption bits to a separate field, which is protected by
> task_lock?

Probably yes... In that case it makes sense to move PF_FREEZER_SKIP/PF_FROZEN
to the new field as well.

Perhaps we can ignore this problem for now. Freezer is not 100% reliable
anyway. For example,

	worker_thread:

		for (;;) {
			try_to_freeze();

			prepare_to_wait();
			if (...)
				schedule();
			finish_wait();
		}

This is racy, we can miss freeze_process()->signal_wake_up() if it happens
between try_to_freeze() and prepare_to_wait(). We have to check TIF_FREEZE
before entering schedule() if we want to fix this race.

Should we? I don't know. This will uglify the code, and the probability
of this race is very low.

Oleg.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 7/8] Clean up workqueue.c with respect to the freezer based cpu-hotplug
  2007-04-04 15:28           ` Oleg Nesterov
@ 2007-04-04 17:49             ` Srivatsa Vaddagiri
  2007-04-05 12:20               ` Oleg Nesterov
  0 siblings, 1 reply; 70+ messages in thread
From: Srivatsa Vaddagiri @ 2007-04-04 17:49 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Gautham R Shenoy, akpm, paulmck, torvalds, linux-kernel,
	Rafael J. Wysocki, mingo, dipankar, dino, masami.hiramatsu.pt

On Wed, Apr 04, 2007 at 07:28:28PM +0400, Oleg Nesterov wrote:
> I changed my mind :) The problem is general, I am starting to believe
> it is better to change kthread_stop().

yes i agree. Although is some cases like destroy_workqueue, we need to
mark the target thread non-freezable way before we call kthread_stop (as
you pointed out).

> > I suspected that we cannot modify p->flags just like that. How abt
> > moving freezer exemption bits to a separate field, which is protected by
> > task_lock?
> 
> Probably yes... In that case it makes sense to move PF_FREEZER_SKIP/PF_FROZEN
> to the new field as well.

I wonder if there are some reserved fields in task_struct which we can
reuse here ..

> Perhaps we can ignore this problem for now. Freezer is not 100% reliable
> anyway. For example,
> 
> 	worker_thread:
> 
> 		for (;;) {
> 			try_to_freeze();
> 
> 			prepare_to_wait();
> 			if (...)
> 				schedule();
> 			finish_wait();
> 		}
> 
> This is racy, we can miss freeze_process()->signal_wake_up() if it happens
> between try_to_freeze() and prepare_to_wait(). We have to check TIF_FREEZE
> before entering schedule() if we want to fix this race.

Yes that needs a fix as well. Oh dear, freezer is so fragile to break!

> Should we? I don't know. This will uglify the code, and the probability
> of this race is very low.

Would be nice to fix IMO. Atleast serves to show "how to make your code
freezer friendly".

-- 
Regards,
vatsa

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: utrace merge
  2007-04-04 10:36           ` Christoph Hellwig
@ 2007-04-04 18:41             ` Andrew Morton
  0 siblings, 0 replies; 70+ messages in thread
From: Andrew Morton @ 2007-04-04 18:41 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ingo Molnar, Rafael J. Wysocki, ego, paulmck, torvalds,
	linux-kernel, vatsa, Oleg Nesterov, dipankar, dino,
	masami.hiramatsu.pt, Roland McGrath

On Wed, 4 Apr 2007 11:36:24 +0100 Christoph Hellwig <hch@infradead.org> wrote:

> On Wed, Apr 04, 2007 at 12:06:44PM +0200, Ingo Molnar wrote:
> > 
> > * Andrew Morton <akpm@linux-foundation.org> wrote:
> > 
> > > urgh, I screwed up, sorry.
> > > 
> > > utrace-prep-2 reverts a bit of the underlying tree so that the utrace 
> > > patches (which are against mainline) don't throw a tremendous reject 
> > > which has to be fixed each time I pull Roland's tree.  I'm supposed to 
> > > reapply that change after the utrace patches but forgot.
> > 
> > btw., how about pushing utrace to v2.6.22, is anything big holding that 
> > up? (other than Roland's modesty, which we should gently but firmly 
> > ignore in this case =B-)
> 
> a) lots of architecture support
> b) lots of overengineered and obsfucated code
> c) missing public information
> 
> I've also not gotten any feedback to my ages old review and the issues
> don't seem to be addressed either.
> 

And rmk had showstopper problems with ARM support which I never
saw resolved.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 1/8] Enhance process freezer interface for usage beyond software suspend
  2007-04-02  5:37 ` [PATCH 1/8] Enhance process freezer interface for usage beyond software suspend Gautham R Shenoy
  2007-04-02 13:56   ` Pavel Machek
@ 2007-04-05  9:46   ` Oleg Nesterov
  2007-04-05 10:59     ` Gautham R Shenoy
  1 sibling, 1 reply; 70+ messages in thread
From: Oleg Nesterov @ 2007-04-05  9:46 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: akpm, paulmck, torvalds, linux-kernel, vatsa, Rafael J. Wysocki,
	mingo, dipankar, dino, masami.hiramatsu.pt

On 04/02, Gautham R Shenoy wrote:
>
> +/*
> + * Exempt the current process from being frozen for a certain event
> + */
> +static inline void freezer_exempt(unsigned long exempt_freeze_event)
> +{
> +	if (exempt_freeze_event == FE_NONE)
> +		current->flags &= ~PF_FE_ALL;
> +	else
> +		current->flags |= exempt_freeze_event;
> +}

So, a kernel_thread should call freezer_exempt(FE_XXX) somewhere at the
beginning if it doesn't want to be considered as freezeable. But what if
the freezing is already in progress? In that case freezer_exempt() should
somehow clear TIF_FREEZE (if exempt_freeze_event doesn't match freeze_event
parameter of freeze_processes()), otherwise we may hit a nasty bug, much
worse than a freezing failure (which could be restarted).

try_to_freeze_tasks() succeeds because the task is !freezeable(), the
task goes to refrigerator (while it should not), thaw_tasks() ignores
process and it stays frozen.

Alternatively, we can do a re-check in refrigerator() to fix this race.
In any case, it looks like freeze_event should be stored in a global var.

> --- linux-2.6.21-rc5.orig/kernel/sched.c
> +++ linux-2.6.21-rc5/kernel/sched.c
> @@ -5057,6 +5057,7 @@ static int migration_thread(void *data)
>  	BUG_ON(rq->migration_thread != current);
>
>  	set_current_state(TASK_INTERRUPTIBLE);
> +	freezer_exempt(FE_ALL);

This is a real nitpick, but it was hard to me to understand this change.
Because it looks as if we have a subtle reason to set TASK_INTERRUPTIBLE
before freezer_exempt(). Unless I missed something, I'd suggest to move
freezer_exempt() up, before set_current_state().

The same for apm_mainloop().

Oleg.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/8] Make process freezer reentrant
  2007-04-02  5:37 ` [PATCH 2/8] Make process freezer reentrant Gautham R Shenoy
@ 2007-04-05  9:53   ` Oleg Nesterov
  2007-04-05 10:19     ` Gautham R Shenoy
  0 siblings, 1 reply; 70+ messages in thread
From: Oleg Nesterov @ 2007-04-05  9:53 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: akpm, paulmck, torvalds, linux-kernel, vatsa, Rafael J. Wysocki,
	mingo, dipankar, dino, masami.hiramatsu.pt

On 04/02, Gautham R Shenoy wrote:
>
>  int freeze_processes(unsigned long freeze_event)
>  {
>  	unsigned int nr_unfrozen;
> -
> +	int ret = 0;
> +	mutex_lock(&freezer_mutex);
> +	if (system_freeze_event_mask & freeze_event)
> +		goto done;

I am not sure this is correct. Suppose that system_freeze_event_mask == FE_SUSPEND,
now freeze_processes(FE_ALL) returns success. Shouldn't we return an error?

Oleg.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/8] Make process freezer reentrant
  2007-04-05  9:53   ` Oleg Nesterov
@ 2007-04-05 10:19     ` Gautham R Shenoy
  0 siblings, 0 replies; 70+ messages in thread
From: Gautham R Shenoy @ 2007-04-05 10:19 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: akpm, paulmck, torvalds, linux-kernel, vatsa, Rafael J. Wysocki,
	mingo, dipankar, dino, masami.hiramatsu.pt

On Thu, Apr 05, 2007 at 01:53:01PM +0400, Oleg Nesterov wrote:
> On 04/02, Gautham R Shenoy wrote:
> >
> >  int freeze_processes(unsigned long freeze_event)
> >  {
> >  	unsigned int nr_unfrozen;
> > -
> > +	int ret = 0;
> > +	mutex_lock(&freezer_mutex);
> > +	if (system_freeze_event_mask & freeze_event)
> > +		goto done;
> 
> I am not sure this is correct. Suppose that system_freeze_event_mask == FE_SUSPEND,
> now freeze_processes(FE_ALL) returns success. Shouldn't we return an error?

Hmm, may be we are getting confused with the word 'reentrant' here.
The idea behind this API was to freeze the system for one event
only at any given time.

However, we can have nested freeze_processes() call for different events.

Something like 
freeze_processes(FE_SUSPEND);

/* Do something */

	freeze_processes(FE_HOTPLUG_CPU);

	/* hotplug cpus */

	thaw_processes(FE_HOTPLUG_CPU);

/* Do something more */

thaw_processes(FE_SUSPEND);

So ideally no one is supposed to make a call like
freeze_processes(FE_SUSPEND | FE_HOTPLUG_CPU);
OR
freeze_processes(FE_ALL);

I guess I should also put a check for a valid freeze event.

> 
> Oleg.
> 

Thanks
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 3/8] Use process freezer for cpu-hotplug
  2007-04-02  5:38 ` [PATCH 3/8] Use process freezer for cpu-hotplug Gautham R Shenoy
@ 2007-04-05 10:53   ` Oleg Nesterov
  2007-04-05 12:14     ` Gautham R Shenoy
  2007-04-06 17:27   ` Nathan Lynch
  1 sibling, 1 reply; 70+ messages in thread
From: Oleg Nesterov @ 2007-04-05 10:53 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: akpm, paulmck, torvalds, linux-kernel, vatsa, Rafael J. Wysocki,
	mingo, dipankar, dino, masami.hiramatsu.pt

On 04/02, Gautham R Shenoy wrote:
>
> +	if (freeze_processes(FE_HOTPLUG_CPU)) {
> +		thaw_processes(FE_HOTPLUG_CPU);
> +		return -EBUSY;
> +	}

Off-topic. This is a common pattern. Perhaps it makes sense to
introduce a try_to_freeze_or_thaw_and_return_an_error() helper.

> @@ -161,10 +141,13 @@ static int _cpu_down(unsigned int cpu)
>  					    hcpu) == NOTIFY_BAD)
>  			BUG();
>
> -		if (IS_ERR(p)) {
> +		set_cpus_allowed(current, old_allowed);
> +
> +		if (IS_ERR(p))
>  			err = PTR_ERR(p);
> -			goto out_allowed;
> -		}
> +		else
> +			err = kthread_stop(p);
> +
>  		goto out_thread;
>  	}

Why this change? We are doing kthread_stop() + set_cpus_allowed() on
return. Imho,

		if (IS_ERR(p))
			goto out_allowed;
		goto out_thread;

looks a bit better. Yes we need a couple of error labels at the end.

> --- linux-2.6.21-rc5.orig/kernel/softlockup.c
> +++ linux-2.6.21-rc5/kernel/softlockup.c
> @@ -147,6 +147,7 @@ cpu_callback(struct notifier_block *nfb,
>  	case CPU_DEAD:
>  		p = per_cpu(watchdog_task, hotcpu);
>  		per_cpu(watchdog_task, hotcpu) = NULL;
> +		thaw_process(p);
>  		kthread_stop(p);

As it was already discussed, this is racy. As Srivatsa (imho rightly)
suggested, kthread_stop(p) should thaw process itself. This also allows
us to kill at least some of wait_for_die loops.

However, the change in kthread_stop(p) in not enough to close the race.
We can check kthread_should_stop() in refrigerator(), this looks like
a most simple approach for now.

Alternatively, Srivatsa suggests to introduce a new task_lock() protected
task_struct->freezer_state (so we can reliably set FE_ALL). Surely this is
more poweful, but needs more changes. I am not sure. Perhaps we can do
this later.

In any case, imho "try3" should add thaw_process() to kthread_stop().
Gautham, Srivatsa, do you agree?

Oleg.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 1/8] Enhance process freezer interface for usage beyond software suspend
  2007-04-05  9:46   ` Oleg Nesterov
@ 2007-04-05 10:59     ` Gautham R Shenoy
  2007-04-05 11:30       ` Oleg Nesterov
  0 siblings, 1 reply; 70+ messages in thread
From: Gautham R Shenoy @ 2007-04-05 10:59 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: akpm, paulmck, torvalds, linux-kernel, vatsa, Rafael J. Wysocki,
	mingo, dipankar, dino, masami.hiramatsu.pt

On Thu, Apr 05, 2007 at 01:46:33PM +0400, Oleg Nesterov wrote:
> On 04/02, Gautham R Shenoy wrote:
> >
> > +/*
> > + * Exempt the current process from being frozen for a certain event
> > + */
> > +static inline void freezer_exempt(unsigned long exempt_freeze_event)
> > +{
> > +	if (exempt_freeze_event == FE_NONE)
> > +		current->flags &= ~PF_FE_ALL;
> > +	else
> > +		current->flags |= exempt_freeze_event;
> > +}
> 
> So, a kernel_thread should call freezer_exempt(FE_XXX) somewhere at the
> beginning if it doesn't want to be considered as freezeable. But what if
> the freezing is already in progress? In that case freezer_exempt() should
> somehow clear TIF_FREEZE (if exempt_freeze_event doesn't match freeze_event
> parameter of freeze_processes()), otherwise we may hit a nasty bug, much
> worse than a freezing failure (which could be restarted).
>

That's a nice catch! The problem still exists in the current -mm with
tasks marking themselves PF_NOFREEZE and then calling try_to_freeze_tasks().

> try_to_freeze_tasks() succeeds because the task is !freezeable(), the
> task goes to refrigerator (while it should not), thaw_tasks() ignores
> process and it stays frozen.

Yup! That's really nasty.
> 
> Alternatively, we can do a re-check in refrigerator() to fix this race.
> In any case, it looks like freeze_event should be stored in a global var.

In the patch 2/8, I am maintaining a global system_freeze_event_mask.
The reads and writes to this mask are serialized by freezer_mutex.
But the mutex is held only in the freeze_processes() and
thaw_processes() path.

So the check in refrigerator() which is not in the freeze_processes()/
thaw_processes() path might need some more thought. Will cook up
something.

> 
> > --- linux-2.6.21-rc5.orig/kernel/sched.c
> > +++ linux-2.6.21-rc5/kernel/sched.c
> > @@ -5057,6 +5057,7 @@ static int migration_thread(void *data)
> >  	BUG_ON(rq->migration_thread != current);
> >
> >  	set_current_state(TASK_INTERRUPTIBLE);
> > +	freezer_exempt(FE_ALL);
> 
> This is a real nitpick, but it was hard to me to understand this change.
> Because it looks as if we have a subtle reason to set TASK_INTERRUPTIBLE
> before freezer_exempt(). Unless I missed something, I'd suggest to move
> freezer_exempt() up, before set_current_state().
> 
> The same for apm_mainloop().

Ok, no subtle reasons for freezer_exempt()ing after set_current_state().
So no problems changing the order. But (just curious), is there any specific
problem with this particular order ?

> 
> Oleg.
> 

Thanks for the review.
Regards
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 1/8] Enhance process freezer interface for usage beyond software suspend
  2007-04-05 10:59     ` Gautham R Shenoy
@ 2007-04-05 11:30       ` Oleg Nesterov
  0 siblings, 0 replies; 70+ messages in thread
From: Oleg Nesterov @ 2007-04-05 11:30 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: akpm, paulmck, torvalds, linux-kernel, vatsa, Rafael J. Wysocki,
	mingo, dipankar, dino, masami.hiramatsu.pt

On 04/05, Gautham R Shenoy wrote:
>
> On Thu, Apr 05, 2007 at 01:46:33PM +0400, Oleg Nesterov wrote:
> > 
> > > --- linux-2.6.21-rc5.orig/kernel/sched.c
> > > +++ linux-2.6.21-rc5/kernel/sched.c
> > > @@ -5057,6 +5057,7 @@ static int migration_thread(void *data)
> > >  	BUG_ON(rq->migration_thread != current);
> > >
> > >  	set_current_state(TASK_INTERRUPTIBLE);
> > > +	freezer_exempt(FE_ALL);
> > 
> > This is a real nitpick, but it was hard to me to understand this change.
> > Because it looks as if we have a subtle reason to set TASK_INTERRUPTIBLE
> > before freezer_exempt(). Unless I missed something, I'd suggest to move
> > freezer_exempt() up, before set_current_state().
> > 
> > The same for apm_mainloop().
> 
> Ok, no subtle reasons for freezer_exempt()ing after set_current_state().
> So no problems changing the order. But (just curious), is there any specific
> problem with this particular order ?

No, no, it was just a nitpick :) May be this is just me, but when I see the
code like

	set_current_state(TASK_XXX);
	something_which_doesnt_need_TASK_XXX();

, I can't read the code further, trying to understand where I was wrong
and why do we need to change task->state here.

Oleg.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 6/8] Make non-singlethreaded workqueues freezeable by default
  2007-04-02  5:41 ` [PATCH 6/8] Make non-singlethreaded workqueues freezeable by default Gautham R Shenoy
@ 2007-04-05 11:57   ` Oleg Nesterov
  2007-04-05 20:06     ` Andrew Morton
  0 siblings, 1 reply; 70+ messages in thread
From: Oleg Nesterov @ 2007-04-05 11:57 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: akpm, paulmck, torvalds, linux-kernel, vatsa, Rafael J. Wysocki,
	mingo, dipankar, dino, masami.hiramatsu.pt

On 04/02, Gautham R Shenoy wrote:
>
> +extern struct workqueue_struct *create_freeze_exempted_workqueue(const char *name, int freeze_exempt_events);
> +
> +extern struct workqueue_struct *create_singlethread_workqueue(const char *name);

I bet akpm will say something about 110-column declaration...

> +struct workqueue_struct *create_workqueue(const char *name)
> +{
> +	return __create_workqueue(name, 0, FE_NONE);
> +}
> +EXPORT_SYMBOL_GPL(create_workqueue);
> +
> +struct workqueue_struct *create_freeze_exempted_workqueue(const char *name,
> +							int freeze_exempt_events)
> +{
> +	return __create_workqueue(name, 0, freeze_exempt_events);
> +}
> +EXPORT_SYMBOL_GPL(create_freeze_exempted_workqueue);
> +
> +struct workqueue_struct *create_singlethread_workqueue(const char *name)
> +{
> +	return __create_workqueue(name, 1, FE_ALL);
> +}
> +EXPORT_SYMBOL_GPL(create_singlethread_workqueue);

EXPORT_SYMBOL() has a cost, and these functions are simple wrappers. How about
static inline?

Oleg.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 5/8] __cpu_up: use singlethreaded workqueue
  2007-04-02  5:40 ` [PATCH 5/8] __cpu_up: use singlethreaded workqueue Gautham R Shenoy
@ 2007-04-05 12:08   ` Oleg Nesterov
  0 siblings, 0 replies; 70+ messages in thread
From: Oleg Nesterov @ 2007-04-05 12:08 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: akpm, paulmck, torvalds, linux-kernel, vatsa, Rafael J. Wysocki,
	mingo, dipankar, dino, masami.hiramatsu.pt

On 04/02, Gautham R Shenoy wrote:
>
> Currently i386 and x86_64 __cpu_up uses the services of the kevents
> workqueue to bring the cpu up. Change this and use kthread workqueue
> instead which is single_threaded and won't be frozen during
> CPU_HOTPLUG.
>
> [...snip...]
>
> +void kthreadwq_queue_work(struct work_struct *work)
> +{
> +	queue_work(helper_wq, work);
> +}
> +
> +int kthreadwq_up()
> +{
> +	return (helper_wq != NULL);
> +}

Off-topic question: can't kernel/kmod.c use this workqueue too instead
of its own khelper_wq?

Oleg.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 3/8] Use process freezer for cpu-hotplug
  2007-04-05 10:53   ` Oleg Nesterov
@ 2007-04-05 12:14     ` Gautham R Shenoy
  2007-04-05 13:34       ` Oleg Nesterov
  0 siblings, 1 reply; 70+ messages in thread
From: Gautham R Shenoy @ 2007-04-05 12:14 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: akpm, paulmck, torvalds, linux-kernel, vatsa, Rafael J. Wysocki,
	mingo, dipankar, dino, masami.hiramatsu.pt

On Thu, Apr 05, 2007 at 02:53:56PM +0400, Oleg Nesterov wrote:
> On 04/02, Gautham R Shenoy wrote:
> >
> > +	if (freeze_processes(FE_HOTPLUG_CPU)) {
> > +		thaw_processes(FE_HOTPLUG_CPU);
> > +		return -EBUSY;
> > +	}
> 
> Off-topic. This is a common pattern. Perhaps it makes sense to
> introduce a try_to_freeze_or_thaw_and_return_an_error() helper.

Not a bad idea.
> 
> > @@ -161,10 +141,13 @@ static int _cpu_down(unsigned int cpu)
> >  					    hcpu) == NOTIFY_BAD)
> >  			BUG();
> >
> > -		if (IS_ERR(p)) {
> > +		set_cpus_allowed(current, old_allowed);
> > +
> > +		if (IS_ERR(p))
> >  			err = PTR_ERR(p);
> > -			goto out_allowed;
> > -		}
> > +		else
> > +			err = kthread_stop(p);
> > +
> >  		goto out_thread;
> >  	}
> 
> Why this change? We are doing kthread_stop() + set_cpus_allowed() on
> return. Imho,
> 
> 		if (IS_ERR(p))
> 			goto out_allowed;
> 		goto out_thread;
> 
> looks a bit better. Yes we need a couple of error labels at the end.

Yes, that looks feasible and nice. But I remember making this change for
some subtle reason which I cannot recollect now.

> 
> > --- linux-2.6.21-rc5.orig/kernel/softlockup.c
> > +++ linux-2.6.21-rc5/kernel/softlockup.c
> > @@ -147,6 +147,7 @@ cpu_callback(struct notifier_block *nfb,
> >  	case CPU_DEAD:
> >  		p = per_cpu(watchdog_task, hotcpu);
> >  		per_cpu(watchdog_task, hotcpu) = NULL;
> > +		thaw_process(p);
> >  		kthread_stop(p);
> 
> As it was already discussed, this is racy. As Srivatsa (imho rightly)
> suggested, kthread_stop(p) should thaw process itself. This also allows
> us to kill at least some of wait_for_die loops.
> 

Well, in this case this is not racy. Remember, we're doing a
thaw_process(p) in CPU_DEAD where p *is* frozen for cpu hotplug. So
the where we might call a freeze_process(p) after we do a thaw_process
doesn't seem to be feasible.

But I agree, we should definitely all thaw_process within kthread_stop.

> However, the change in kthread_stop(p) in not enough to close the race.
> We can check kthread_should_stop() in refrigerator(), this looks like
> a most simple approach for now.
> 

Why the check kthread_should_stop() refrigerator() ?
As vatsa mentioned, we would be doing 

task_lock(p);
freezer_exempt(p, FE_ALL); /* Doesn't exist as of now, but we can work
				it out */
thaw_process(p);
task_unlock(p);

wait_for_completion();

So we are serializing the whole thing with task_lock() right?

> Alternatively, Srivatsa suggests to introduce a new task_lock() protected
> task_struct->freezer_state (so we can reliably set FE_ALL). Surely this is
> more poweful, but needs more changes. I am not sure. Perhaps we can do
> this later.

This needs an extra field! We're supposed to be miserly when it comes to
adding new fields to task_struct, now aren't we :-)
> 
> In any case, imho "try3" should add thaw_process() to kthread_stop().
> Gautham, Srivatsa, do you agree?
> 

Completely. Working on it now.
> Oleg.
> 

-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 7/8] Clean up workqueue.c with respect to the freezer based cpu-hotplug
  2007-04-04 17:49             ` Srivatsa Vaddagiri
@ 2007-04-05 12:20               ` Oleg Nesterov
  0 siblings, 0 replies; 70+ messages in thread
From: Oleg Nesterov @ 2007-04-05 12:20 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Gautham R Shenoy, akpm, paulmck, torvalds, linux-kernel,
	Rafael J. Wysocki, mingo, dipankar, dino, masami.hiramatsu.pt

On 04/04, Srivatsa Vaddagiri wrote:
>
> On Wed, Apr 04, 2007 at 07:28:28PM +0400, Oleg Nesterov wrote:
> > 
> > 	worker_thread:
> > 
> > 		for (;;) {
> > 			try_to_freeze();
> > 
> > 			prepare_to_wait();
> > 			if (...)
> > 				schedule();
> > 			finish_wait();
> > 		}
> > 
> > This is racy, we can miss freeze_process()->signal_wake_up() if it happens
> > between try_to_freeze() and prepare_to_wait(). We have to check TIF_FREEZE
> > before entering schedule() if we want to fix this race.
> 
> Yes that needs a fix as well. Oh dear, freezer is so fragile to break!
> 
> > Should we? I don't know. This will uglify the code, and the probability
> > of this race is very low.
> 
> Would be nice to fix IMO. Atleast serves to show "how to make your code
> freezer friendly".

This is funny. I "noticed" this race a long ago, when the ->freezeable flag
was introduced. However, looking at 2.6.20 I see that the patch was correct,
and this race was in fact introduced by me in

	[PATCH 1/1] workqueue: don't migrate pending works from the dead CPU
	http://marc.info/?l=linux-kernel&m=117062192709871

I'll send a fix on weekend.

Oleg.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 3/8] Use process freezer for cpu-hotplug
  2007-04-05 12:14     ` Gautham R Shenoy
@ 2007-04-05 13:34       ` Oleg Nesterov
  0 siblings, 0 replies; 70+ messages in thread
From: Oleg Nesterov @ 2007-04-05 13:34 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: akpm, paulmck, torvalds, linux-kernel, vatsa, Rafael J. Wysocki,
	mingo, dipankar, dino, masami.hiramatsu.pt

On 04/05, Gautham R Shenoy wrote:
>
> > > @@ -147,6 +147,7 @@ cpu_callback(struct notifier_block *nfb,
> > >  	case CPU_DEAD:
> > >  		p = per_cpu(watchdog_task, hotcpu);
> > >  		per_cpu(watchdog_task, hotcpu) = NULL;
> > > +		thaw_process(p);
> > >  		kthread_stop(p);
> > 
> > As it was already discussed, this is racy. As Srivatsa (imho rightly)
> > suggested, kthread_stop(p) should thaw process itself. This also allows
> > us to kill at least some of wait_for_die loops.
> > 
> 
> Well, in this case this is not racy. Remember, we're doing a
> thaw_process(p) in CPU_DEAD where p *is* frozen for cpu hotplug. So
> the where we might call a freeze_process(p) after we do a thaw_process
> doesn't seem to be feasible.

Oops, yes.

> > However, the change in kthread_stop(p) in not enough to close the race.
> > We can check kthread_should_stop() in refrigerator(), this looks like
> > a most simple approach for now.
> > 
> 
> Why the check kthread_should_stop() refrigerator() ?
> As vatsa mentioned, we would be doing 
> 
> task_lock(p);
> freezer_exempt(p, FE_ALL); /* Doesn't exist as of now, but we can work
> 				it out */
> thaw_process(p);
> task_unlock(p);

Please look at http://marc.info/?l=linux-kernel&m=117562018530190, we can't
change p->flags unless we know it is frozen.

> > Alternatively, Srivatsa suggests to introduce a new task_lock() protected
> > task_struct->freezer_state (so we can reliably set FE_ALL). Surely this is
> > more poweful, but needs more changes. I am not sure. Perhaps we can do
> > this later.
> 
> This needs an extra field! We're supposed to be miserly when it comes to
> adding new fields to task_struct, now aren't we :-)

That is why "Perhaps we can do this later" :)

> > In any case, imho "try3" should add thaw_process() to kthread_stop().
> > Gautham, Srivatsa, do you agree?
> > 
> 
> Completely. Working on it now.

Great!

Oleg.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 6/8] Make non-singlethreaded workqueues freezeable by default
  2007-04-05 11:57   ` Oleg Nesterov
@ 2007-04-05 20:06     ` Andrew Morton
  0 siblings, 0 replies; 70+ messages in thread
From: Andrew Morton @ 2007-04-05 20:06 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Gautham R Shenoy, paulmck, torvalds, linux-kernel, vatsa,
	Rafael J. Wysocki, mingo, dipankar, dino, masami.hiramatsu.pt

On Thu, 5 Apr 2007 15:57:14 +0400
Oleg Nesterov <oleg@tv-sign.ru> wrote:

> On 04/02, Gautham R Shenoy wrote:
> >
> > +extern struct workqueue_struct *create_freeze_exempted_workqueue(const char *name, int freeze_exempt_events);
> > +
> > +extern struct workqueue_struct *create_singlethread_workqueue(const char *name);
> 
> I bet akpm will say something about 110-column declaration...

something.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 1/8] Enhance process freezer interface for usage beyond software suspend
  2007-04-02 20:51       ` Pavel Machek
@ 2007-04-06 14:34         ` Rafael J. Wysocki
  2007-04-06 22:20           ` Nigel Cunningham
  2007-04-09  3:04         ` Gautham R Shenoy
  1 sibling, 1 reply; 70+ messages in thread
From: Rafael J. Wysocki @ 2007-04-06 14:34 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Gautham R Shenoy, akpm, paulmck, torvalds, linux-kernel, vatsa,
	Oleg Nesterov, mingo, dipankar, dino, masami.hiramatsu.pt

On Monday, 2 April 2007 22:51, Pavel Machek wrote:
> Hi!
> 
> > > > +/* Per process freezer specific flags */
> > > > +#define PF_FE_SUSPEND	0x00008000	/* This thread should not be frozen
> > > > +					 * for suspend
> > > > +					 */
> > > > +
> > > > +#define PF_FE_KPROBES	0x00000010	/* This thread should not be frozen
> > > > +					 * for Kprobes
> > > > +					 */
> > > 
> > > Just put the comment before the define for long comments?
> > 
> > Agreed.
> 
> (Actually it would be nice to say
> 
> /* This thread should not be frozen for suspend, becuase it is needed
>    for getting image saved to disk */
> 
> > > > -#ifdef CONFIG_PM
> > > > +#if defined(CONFIG_PM) || defined(CONFIG_HOTPLUG_CPU) || \
> > > > +					defined(CONFIG_KPROBES)
> > > 
> > > Should we create CONFIG_FREEZER?
> > 
> > Why do you think so?  I think the freezer should be compiled automatically
> > if any of the above is set, which is what this directive really means.
> 
> Kconfig can do that. ("select statement"). If we have one such ifdef,
> it is okay, but if it would be more of them.
> 
> > > Hmmm, I do not really like softlockup watchdog running during suspend.
> > > Can we make this freezeable and make watchdog shut itself off while
> > > suspending?
> > 
> > Generally, I agree, but this patch only replaces the existing instances
> > of PF_NOFREEZE with the new mechanism.  The changes you're talking about
> > require a separate patch series (or at least one separate patch), I think, and
> > they need not be so simple to make.
> 
> Agreed about separate patch series.
> 
> > > > -	current->flags |= PF_NOFREEZE;
> > > > +	freezer_exempt(FE_ALL);
> > > >  	pid = kernel_thread(do_linuxrc, "/linuxrc", SIGCHLD);
> > > >  	if (pid > 0) {
> > > >  		while (pid != sys_wait4(-1, NULL, 0, NULL))
> > > 
> > > Does this mean we have userland /linuxrc running with PF_NOFREEZE?
> > > That would be very bad...
> > 
> > No, actually it is _required_ for the userland resume to work.  Well, perhaps
> > I should place a comment in there so that I don't have to explain this again
> > and again. :-)
> 
> Can you put big bold comment there?
>
> Why is it needed? Freezer never freezes _current_ task.

No, it doesn't, but this task spawns linuxrc and then calls sys_wait4() in a
loop.

Well, actually, I'll try to plant try_to_freeze() in this loop and see if that
works.  If it doesn't, I'll add a comment.

> > > > @@ -104,7 +104,7 @@ static int __kprobes check_safety(void)
> > > >  {
> > > >  	int ret = 0;
> > > >  #if defined(CONFIG_PREEMPT) && defined(CONFIG_PM)
> > > 
> > > Eh? Why does kprobes code depend on config_pm?
> > 
> > Because it uses the freezer? ;-)
> 
> That is no longer true after this patch... Ugly ifdef above makes sure
> freezer is there for kprobes. I'm trying to say that #if above is
> now broken. Actually it was probably always broken, but it just became
> more so.

Agreed.

Greetings,
Rafael

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 3/8] Use process freezer for cpu-hotplug
  2007-04-02  5:38 ` [PATCH 3/8] Use process freezer for cpu-hotplug Gautham R Shenoy
  2007-04-05 10:53   ` Oleg Nesterov
@ 2007-04-06 17:27   ` Nathan Lynch
  2007-04-06 17:34     ` Ingo Molnar
  1 sibling, 1 reply; 70+ messages in thread
From: Nathan Lynch @ 2007-04-06 17:27 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: akpm, paulmck, torvalds, linux-kernel, vatsa, Oleg Nesterov,
	Rafael J. Wysocki, mingo, dipankar, dino, masami.hiramatsu.pt

Gautham R Shenoy wrote:
> This patch implements process_freezer based cpu-hotplug
> core. 
> The sailent features are:
> o No more (un)lock_cpu_hotplug.
> o No more CPU_LOCK_ACQUIRE and CPU_LOCK_RELEASE. Hence no per-subsystem
>   hotcpu mutexes.
> o Calls freeze_process/thaw_processes at the beginning/end of
>   the hotplug operation.

...

> @@ -133,7 +111,11 @@ static int _cpu_down(unsigned int cpu)
>  	if (!cpu_online(cpu))
>  		return -EINVAL;
>  
> -	raw_notifier_call_chain(&cpu_chain, CPU_LOCK_ACQUIRE, hcpu);
> +	if (freeze_processes(FE_HOTPLUG_CPU)) {
> +		thaw_processes(FE_HOTPLUG_CPU);
> +		return -EBUSY;
> +	}
> +

If I'm understanding correctly, this will cause

# echo 0 > /sys/devices/system/cpu/cpuX/online

to sometimes fail, and userspace is expected to try again?  This will
break existing applications.

Perhaps drivers/base/cpu.c:store_online should retry as long as
cpu_up/down return -EBUSY.  That would avoid a userspace-visible
interface change.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 3/8] Use process freezer for cpu-hotplug
  2007-04-06 17:27   ` Nathan Lynch
@ 2007-04-06 17:34     ` Ingo Molnar
  2007-04-06 17:47       ` Nathan Lynch
  2007-04-14 18:48       ` Pavel Machek
  0 siblings, 2 replies; 70+ messages in thread
From: Ingo Molnar @ 2007-04-06 17:34 UTC (permalink / raw)
  To: Nathan Lynch
  Cc: Gautham R Shenoy, akpm, paulmck, torvalds, linux-kernel, vatsa,
	Oleg Nesterov, Rafael J. Wysocki, dipankar, dino,
	masami.hiramatsu.pt


* Nathan Lynch <ntl@pobox.com> wrote:

> > -	raw_notifier_call_chain(&cpu_chain, CPU_LOCK_ACQUIRE, hcpu);
> > +	if (freeze_processes(FE_HOTPLUG_CPU)) {
> > +		thaw_processes(FE_HOTPLUG_CPU);
> > +		return -EBUSY;
> > +	}
> > +
> 
> If I'm understanding correctly, this will cause
> 
> # echo 0 > /sys/devices/system/cpu/cpuX/online
> 
> to sometimes fail, and userspace is expected to try again?  This will 
> break existing applications.
> 
> Perhaps drivers/base/cpu.c:store_online should retry as long as 
> cpu_up/down return -EBUSY.  That would avoid a userspace-visible 
> interface change.

yeah. I'd even suggest a freeze_processes_nofail() API instead, that 
does this internally, without burdening the callsites. (and once the 
freezer becomes complete then freeze_processes_nofail() == 
freeze_processes())

	Ingo

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 3/8] Use process freezer for cpu-hotplug
  2007-04-06 17:34     ` Ingo Molnar
@ 2007-04-06 17:47       ` Nathan Lynch
  2007-04-06 22:22         ` Nigel Cunningham
  2007-04-14 18:48       ` Pavel Machek
  1 sibling, 1 reply; 70+ messages in thread
From: Nathan Lynch @ 2007-04-06 17:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Gautham R Shenoy, akpm, paulmck, torvalds, linux-kernel, vatsa,
	Oleg Nesterov, Rafael J. Wysocki, dipankar, dino,
	masami.hiramatsu.pt

Ingo Molnar wrote:
> 
> * Nathan Lynch <ntl@pobox.com> wrote:
> 
> > > -	raw_notifier_call_chain(&cpu_chain, CPU_LOCK_ACQUIRE, hcpu);
> > > +	if (freeze_processes(FE_HOTPLUG_CPU)) {
> > > +		thaw_processes(FE_HOTPLUG_CPU);
> > > +		return -EBUSY;
> > > +	}
> > > +
> > 
> > If I'm understanding correctly, this will cause
> > 
> > # echo 0 > /sys/devices/system/cpu/cpuX/online
> > 
> > to sometimes fail, and userspace is expected to try again?  This will 
> > break existing applications.
> > 
> > Perhaps drivers/base/cpu.c:store_online should retry as long as 
> > cpu_up/down return -EBUSY.  That would avoid a userspace-visible 
> > interface change.
> 
> yeah. I'd even suggest a freeze_processes_nofail() API instead, that 
> does this internally, without burdening the callsites. (and once the 
> freezer becomes complete then freeze_processes_nofail() == 
> freeze_processes())

Yeah, I just realized that an implementation of my proposal would busy
loop in the kernel forever if a silly admin tried to offline the last
cpu (we're already using -EBUSY for that case), so
freeze_processes_nofail is a better idea :-)


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 1/8] Enhance process freezer interface for usage beyond software suspend
  2007-04-06 14:34         ` Rafael J. Wysocki
@ 2007-04-06 22:20           ` Nigel Cunningham
  2007-04-07  9:33             ` Rafael J. Wysocki
  0 siblings, 1 reply; 70+ messages in thread
From: Nigel Cunningham @ 2007-04-06 22:20 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Pavel Machek, Gautham R Shenoy, akpm, paulmck, torvalds,
	linux-kernel, vatsa, Oleg Nesterov, mingo, dipankar, dino,
	masami.hiramatsu.pt

Hi.

On Fri, 2007-04-06 at 16:34 +0200, Rafael J. Wysocki wrote:
> On Monday, 2 April 2007 22:51, Pavel Machek wrote:
> > Hi!
> > 
> > > > > +/* Per process freezer specific flags */
> > > > > +#define PF_FE_SUSPEND	0x00008000	/* This thread should not be frozen
> > > > > +					 * for suspend
> > > > > +					 */
> > > > > +
> > > > > +#define PF_FE_KPROBES	0x00000010	/* This thread should not be frozen
> > > > > +					 * for Kprobes
> > > > > +					 */
> > > > 
> > > > Just put the comment before the define for long comments?
> > > 
> > > Agreed.
> > 
> > (Actually it would be nice to say
> > 
> > /* This thread should not be frozen for suspend, becuase it is needed
> >    for getting image saved to disk */
> > 
> > > > > -#ifdef CONFIG_PM
> > > > > +#if defined(CONFIG_PM) || defined(CONFIG_HOTPLUG_CPU) || \
> > > > > +					defined(CONFIG_KPROBES)
> > > > 
> > > > Should we create CONFIG_FREEZER?
> > > 
> > > Why do you think so?  I think the freezer should be compiled automatically
> > > if any of the above is set, which is what this directive really means.
> > 
> > Kconfig can do that. ("select statement"). If we have one such ifdef,
> > it is okay, but if it would be more of them.
> > 
> > > > Hmmm, I do not really like softlockup watchdog running during suspend.
> > > > Can we make this freezeable and make watchdog shut itself off while
> > > > suspending?
> > > 
> > > Generally, I agree, but this patch only replaces the existing instances
> > > of PF_NOFREEZE with the new mechanism.  The changes you're talking about
> > > require a separate patch series (or at least one separate patch), I think, and
> > > they need not be so simple to make.
> > 
> > Agreed about separate patch series.
> > 
> > > > > -	current->flags |= PF_NOFREEZE;
> > > > > +	freezer_exempt(FE_ALL);
> > > > >  	pid = kernel_thread(do_linuxrc, "/linuxrc", SIGCHLD);
> > > > >  	if (pid > 0) {
> > > > >  		while (pid != sys_wait4(-1, NULL, 0, NULL))
> > > > 
> > > > Does this mean we have userland /linuxrc running with PF_NOFREEZE?
> > > > That would be very bad...
> > > 
> > > No, actually it is _required_ for the userland resume to work.  Well, perhaps
> > > I should place a comment in there so that I don't have to explain this again
> > > and again. :-)
> > 
> > Can you put big bold comment there?
> >
> > Why is it needed? Freezer never freezes _current_ task.
> 
> No, it doesn't, but this task spawns linuxrc and then calls sys_wait4() in a
> loop.
> 
> Well, actually, I'll try to plant try_to_freeze() in this loop and see if that
> works.  If it doesn't, I'll add a comment.

It works. I've had:

                while (pid != sys_wait4(-1, NULL, 0, NULL)) {
                        yield();
                        try_to_freeze();
                }

there for ages for Suspend2.

Regards,

Nigel


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 3/8] Use process freezer for cpu-hotplug
  2007-04-06 17:47       ` Nathan Lynch
@ 2007-04-06 22:22         ` Nigel Cunningham
  0 siblings, 0 replies; 70+ messages in thread
From: Nigel Cunningham @ 2007-04-06 22:22 UTC (permalink / raw)
  To: Nathan Lynch
  Cc: Ingo Molnar, Gautham R Shenoy, akpm, paulmck, torvalds,
	linux-kernel, vatsa, Oleg Nesterov, Rafael J. Wysocki, dipankar,
	dino, masami.hiramatsu.pt

Hi.

On Fri, 2007-04-06 at 12:47 -0500, Nathan Lynch wrote:
> Ingo Molnar wrote:
> > 
> > * Nathan Lynch <ntl@pobox.com> wrote:
> > 
> > > > -	raw_notifier_call_chain(&cpu_chain, CPU_LOCK_ACQUIRE, hcpu);
> > > > +	if (freeze_processes(FE_HOTPLUG_CPU)) {
> > > > +		thaw_processes(FE_HOTPLUG_CPU);
> > > > +		return -EBUSY;
> > > > +	}
> > > > +
> > > 
> > > If I'm understanding correctly, this will cause
> > > 
> > > # echo 0 > /sys/devices/system/cpu/cpuX/online
> > > 
> > > to sometimes fail, and userspace is expected to try again?  This will 
> > > break existing applications.
> > > 
> > > Perhaps drivers/base/cpu.c:store_online should retry as long as 
> > > cpu_up/down return -EBUSY.  That would avoid a userspace-visible 
> > > interface change.
> > 
> > yeah. I'd even suggest a freeze_processes_nofail() API instead, that 
> > does this internally, without burdening the callsites. (and once the 
> > freezer becomes complete then freeze_processes_nofail() == 
> > freeze_processes())
> 
> Yeah, I just realized that an implementation of my proposal would busy
> loop in the kernel forever if a silly admin tried to offline the last
> cpu (we're already using -EBUSY for that case), so
> freeze_processes_nofail is a better idea :-)

If there's only one online cpu, shouldn't it return -EINVAL?

Regards,

Nigel


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 1/8] Enhance process freezer interface for usage beyond software suspend
  2007-04-06 22:20           ` Nigel Cunningham
@ 2007-04-07  9:33             ` Rafael J. Wysocki
  2007-04-07  9:47               ` Nigel Cunningham
  0 siblings, 1 reply; 70+ messages in thread
From: Rafael J. Wysocki @ 2007-04-07  9:33 UTC (permalink / raw)
  To: nigel
  Cc: Pavel Machek, Gautham R Shenoy, akpm, paulmck, torvalds,
	linux-kernel, vatsa, Oleg Nesterov, mingo, dipankar, dino,
	masami.hiramatsu.pt

On Saturday, 7 April 2007 00:20, Nigel Cunningham wrote:
> Hi.
> 
> On Fri, 2007-04-06 at 16:34 +0200, Rafael J. Wysocki wrote:
> > On Monday, 2 April 2007 22:51, Pavel Machek wrote:
> > > Hi!
> > > 
> > > > > > +/* Per process freezer specific flags */
> > > > > > +#define PF_FE_SUSPEND	0x00008000	/* This thread should not be frozen
> > > > > > +					 * for suspend
> > > > > > +					 */
> > > > > > +
> > > > > > +#define PF_FE_KPROBES	0x00000010	/* This thread should not be frozen
> > > > > > +					 * for Kprobes
> > > > > > +					 */
> > > > > 
> > > > > Just put the comment before the define for long comments?
> > > > 
> > > > Agreed.
> > > 
> > > (Actually it would be nice to say
> > > 
> > > /* This thread should not be frozen for suspend, becuase it is needed
> > >    for getting image saved to disk */
> > > 
> > > > > > -#ifdef CONFIG_PM
> > > > > > +#if defined(CONFIG_PM) || defined(CONFIG_HOTPLUG_CPU) || \
> > > > > > +					defined(CONFIG_KPROBES)
> > > > > 
> > > > > Should we create CONFIG_FREEZER?
> > > > 
> > > > Why do you think so?  I think the freezer should be compiled automatically
> > > > if any of the above is set, which is what this directive really means.
> > > 
> > > Kconfig can do that. ("select statement"). If we have one such ifdef,
> > > it is okay, but if it would be more of them.
> > > 
> > > > > Hmmm, I do not really like softlockup watchdog running during suspend.
> > > > > Can we make this freezeable and make watchdog shut itself off while
> > > > > suspending?
> > > > 
> > > > Generally, I agree, but this patch only replaces the existing instances
> > > > of PF_NOFREEZE with the new mechanism.  The changes you're talking about
> > > > require a separate patch series (or at least one separate patch), I think, and
> > > > they need not be so simple to make.
> > > 
> > > Agreed about separate patch series.
> > > 
> > > > > > -	current->flags |= PF_NOFREEZE;
> > > > > > +	freezer_exempt(FE_ALL);
> > > > > >  	pid = kernel_thread(do_linuxrc, "/linuxrc", SIGCHLD);
> > > > > >  	if (pid > 0) {
> > > > > >  		while (pid != sys_wait4(-1, NULL, 0, NULL))
> > > > > 
> > > > > Does this mean we have userland /linuxrc running with PF_NOFREEZE?
> > > > > That would be very bad...
> > > > 
> > > > No, actually it is _required_ for the userland resume to work.  Well, perhaps
> > > > I should place a comment in there so that I don't have to explain this again
> > > > and again. :-)
> > > 
> > > Can you put big bold comment there?
> > >
> > > Why is it needed? Freezer never freezes _current_ task.
> > 
> > No, it doesn't, but this task spawns linuxrc and then calls sys_wait4() in a
> > loop.
> > 
> > Well, actually, I'll try to plant try_to_freeze() in this loop and see if that
> > works.  If it doesn't, I'll add a comment.
> 
> It works. I've had:
> 
>                 while (pid != sys_wait4(-1, NULL, 0, NULL)) {
>                         yield();
>                         try_to_freeze();
>                 }
> 
> there for ages for Suspend2.

OK, thanks.  Is there any particular reason to place try_to_freeze() after
yield()?

Greetings,
Rafael

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 1/8] Enhance process freezer interface for usage beyond software suspend
  2007-04-07  9:33             ` Rafael J. Wysocki
@ 2007-04-07  9:47               ` Nigel Cunningham
  0 siblings, 0 replies; 70+ messages in thread
From: Nigel Cunningham @ 2007-04-07  9:47 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Pavel Machek, Gautham R Shenoy, akpm, paulmck, torvalds,
	linux-kernel, vatsa, Oleg Nesterov, mingo, dipankar, dino,
	masami.hiramatsu.pt

Hi.

On Sat, 2007-04-07 at 11:33 +0200, Rafael J. Wysocki wrote:
> On Saturday, 7 April 2007 00:20, Nigel Cunningham wrote:
> > > > > > > -	current->flags |= PF_NOFREEZE;
> > > > > > > +	freezer_exempt(FE_ALL);
> > > > > > >  	pid = kernel_thread(do_linuxrc, "/linuxrc", SIGCHLD);
> > > > > > >  	if (pid > 0) {
> > > > > > >  		while (pid != sys_wait4(-1, NULL, 0, NULL))
> > > > > > 
> > > > > > Does this mean we have userland /linuxrc running with PF_NOFREEZE?
> > > > > > That would be very bad...
> > > > > 
> > > > > No, actually it is _required_ for the userland resume to work.  Well, perhaps
> > > > > I should place a comment in there so that I don't have to explain this again
> > > > > and again. :-)
> > > > 
> > > > Can you put big bold comment there?
> > > >
> > > > Why is it needed? Freezer never freezes _current_ task.
> > > 
> > > No, it doesn't, but this task spawns linuxrc and then calls sys_wait4() in a
> > > loop.
> > > 
> > > Well, actually, I'll try to plant try_to_freeze() in this loop and see if that
> > > works.  If it doesn't, I'll add a comment.
> > 
> > It works. I've had:
> > 
> >                 while (pid != sys_wait4(-1, NULL, 0, NULL)) {
> >                         yield();
> >                         try_to_freeze();
> >                 }
> > 
> > there for ages for Suspend2.
> 
> OK, thanks.  Is there any particular reason to place try_to_freeze() after
> yield()?

Not that I remember. I haven't touched that for years :)

Nigel


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 1/8] Enhance process freezer interface for usage beyond software suspend
  2007-04-02 20:51       ` Pavel Machek
  2007-04-06 14:34         ` Rafael J. Wysocki
@ 2007-04-09  3:04         ` Gautham R Shenoy
  1 sibling, 0 replies; 70+ messages in thread
From: Gautham R Shenoy @ 2007-04-09  3:04 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Rafael J. Wysocki, akpm, paulmck, torvalds, linux-kernel, vatsa,
	Oleg Nesterov, mingo, dipankar, dino, masami.hiramatsu.pt

On Mon, Apr 02, 2007 at 10:51:27PM +0200, Pavel Machek wrote:
> > > 
> > > Should we create CONFIG_FREEZER?
> > 
> > Why do you think so?  I think the freezer should be compiled automatically
> > if any of the above is set, which is what this directive really means.
> 
> Kconfig can do that. ("select statement"). If we have one such ifdef,
> it is okay, but if it would be more of them.
> 

Ok.

> > > Eh? Why does kprobes code depend on config_pm?
> > 
> > Because it uses the freezer? ;-)
> 
> That is no longer true after this patch... Ugly ifdef above makes sure
> freezer is there for kprobes. I'm trying to say that #if above is
> now broken. Actually it was probably always broken, but it just became
> more so.

I have already removed it from in my version 3.

Thanks and Regards
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 7/8] Clean up workqueue.c with respect to the freezer based cpu-hotplug
  2007-04-03 17:18         ` Srivatsa Vaddagiri
  2007-04-04 15:28           ` Oleg Nesterov
@ 2007-04-12  2:22           ` Srivatsa Vaddagiri
  2007-04-12 10:01             ` Gautham R Shenoy
  2007-04-12 16:00             ` Oleg Nesterov
  1 sibling, 2 replies; 70+ messages in thread
From: Srivatsa Vaddagiri @ 2007-04-12  2:22 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Gautham R Shenoy, akpm, paulmck, torvalds, linux-kernel,
	Rafael J. Wysocki, mingo, dipankar, dino, masami.hiramatsu.pt

On Tue, Apr 03, 2007 at 10:48:20PM +0530, Srivatsa Vaddagiri wrote:
> > Actually, we should do this before destroy_workqueue() calls flush_workqueue().
> > Otherwise flush_cpu_workqueue() can hang forever in a similar manner.
> 
> Yep. I guess these are a class of freezer deadlocks very similar to vfork
> parent waiting on child case. I get a feeling these should become common
> outside of kthread too (A waits on B for something, B gets frozen, which
> means A won't freeze causing freezer to fail). Can freezer detect this
> dependency somehow and thaw B automatically? Probably not that easy ..

I wonder if there is some value in "enforcing" an order in which
processes get frozen i.e freeze A first before B. That may solve the
deadlocks we have been discussing wrt kthread_stop and flush_workqueue
as well.

The idea is similar to how deadlock wrt multiple locks are solved -
where a ordering is enforced. Take Lock A first before Lock B. 

If process A waits on B (like in kthread_stop or flush_workqueue), then if we:

	1. Insert A and B in a list (freeze_me_first_list)
	2. Have freezer scan freeze_me_first_list before the master
	   task-list, so that it:
		2a. "freezes A and waits for A to get frozen" first
		2b. "freezes B and waits for B to get frozen" next

then we would avoid the nastiness of "B getting frozen first and A doesnt
freeze because of that" with lesser code changes?

-- 
Regards,
vatsa

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 7/8] Clean up workqueue.c with respect to the freezer based cpu-hotplug
  2007-04-12  2:22           ` Srivatsa Vaddagiri
@ 2007-04-12 10:01             ` Gautham R Shenoy
  2007-04-12 16:00             ` Oleg Nesterov
  1 sibling, 0 replies; 70+ messages in thread
From: Gautham R Shenoy @ 2007-04-12 10:01 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Oleg Nesterov, akpm, paulmck, torvalds, linux-kernel,
	Rafael J. Wysocki, mingo, dipankar, dino, masami.hiramatsu.pt

On Thu, Apr 12, 2007 at 07:52:20AM +0530, Srivatsa Vaddagiri wrote:
> I wonder if there is some value in "enforcing" an order in which
> processes get frozen i.e freeze A first before B. That may solve the
> deadlocks we have been discussing wrt kthread_stop and flush_workqueue
> as well.
>
> The idea is similar to how deadlock wrt multiple locks are solved -
> where a ordering is enforced. Take Lock A first before Lock B. 
> 
> If process A waits on B (like in kthread_stop or flush_workqueue), then if we:
> 
> 	1. Insert A and B in a list (freeze_me_first_list)

In that case, A should insert the dependency into the
freeze_me_first_list as B is unaware of the dependency yet.

What if by the time A has inserted the dependency B is already frozen?

Can very well happen right?

A:			B:			freezer
-------------------------------------------------------------------------------
						*Check the list. don't find  A/B
						*Mark A freezeable.
						*Mark B freezeable.
			*try_to_freeze();
*insert A, B		.		
 into the		.	
 list.			.
			.		
		  	.
* wait_for_		.
  completion(done);	.			/* Freezer fails at this
  			.			 *  point 
			.			 */
		  	.
		  	.
		  	*complete(done);

*try_to_freeze();

Example: A = a thread doing flush_workqueue.
	 B = worker thread.


Of course, we can always use the freezer_skip around this
wait_for_completion as well as long as the thread A is not marked
PF_NOFREEZE. But with multiple freeze events, it won't be as simple as
that.

> -- 
> Regards,
> vatsa

Thanks and Regards
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 7/8] Clean up workqueue.c with respect to the freezer based cpu-hotplug
  2007-04-12  2:22           ` Srivatsa Vaddagiri
  2007-04-12 10:01             ` Gautham R Shenoy
@ 2007-04-12 16:00             ` Oleg Nesterov
  2007-04-13  9:46               ` Gautham R Shenoy
  1 sibling, 1 reply; 70+ messages in thread
From: Oleg Nesterov @ 2007-04-12 16:00 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Gautham R Shenoy, akpm, paulmck, torvalds, linux-kernel,
	Rafael J. Wysocki, mingo, dipankar, dino, masami.hiramatsu.pt

On 04/12, Srivatsa Vaddagiri wrote:
>
> On Tue, Apr 03, 2007 at 10:48:20PM +0530, Srivatsa Vaddagiri wrote:
> > > Actually, we should do this before destroy_workqueue() calls flush_workqueue().
> > > Otherwise flush_cpu_workqueue() can hang forever in a similar manner.
> > 
> > Yep. I guess these are a class of freezer deadlocks very similar to vfork
> > parent waiting on child case. I get a feeling these should become common
> > outside of kthread too (A waits on B for something, B gets frozen, which
> > means A won't freeze causing freezer to fail). Can freezer detect this
> > dependency somehow and thaw B automatically? Probably not that easy ..
> 
> I wonder if there is some value in "enforcing" an order in which
> processes get frozen i.e freeze A first before B. That may solve the
> deadlocks we have been discussing wrt kthread_stop and flush_workqueue
> as well.

Perhaps we can add "atomic_t xxx" to task_struct.

	int freezing(struct task_struct *p)
	{
		return test_tsk_thread_flag(p, TIF_FREEZE)
			&& atomic_read(&p->xxx) == 0;
	}

	void xxx_start(struct task_struct *p)
	{
		atomic_inc(p->xxx);
		thaw_process(p);
	}

	xxx_end(struct task_struct *p)
	{
		atomic_dec(p->xxx);
	}

Now,

	xxx_start(p);
	... wait for something which depends on p...
	xxx_end(p);

Of course we need other changes, freeze_process() should check ->xxx, etc.
I am not sure this makes sense.

Oleg.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 7/8] Clean up workqueue.c with respect to the freezer based cpu-hotplug
  2007-04-12 16:00             ` Oleg Nesterov
@ 2007-04-13  9:46               ` Gautham R Shenoy
  0 siblings, 0 replies; 70+ messages in thread
From: Gautham R Shenoy @ 2007-04-13  9:46 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Srivatsa Vaddagiri, akpm, paulmck, torvalds, linux-kernel,
	Rafael J. Wysocki, mingo, dipankar, dino, masami.hiramatsu.pt

On Thu, Apr 12, 2007 at 08:00:04PM +0400, Oleg Nesterov wrote:
> On 04/12, Srivatsa Vaddagiri wrote:
> >
> > On Tue, Apr 03, 2007 at 10:48:20PM +0530, Srivatsa Vaddagiri wrote:
> > > > Actually, we should do this before destroy_workqueue() calls flush_workqueue().
> > > > Otherwise flush_cpu_workqueue() can hang forever in a similar manner.
> > > 
> > > Yep. I guess these are a class of freezer deadlocks very similar to vfork
> > > parent waiting on child case. I get a feeling these should become common
> > > outside of kthread too (A waits on B for something, B gets frozen, which
> > > means A won't freeze causing freezer to fail). Can freezer detect this
> > > dependency somehow and thaw B automatically? Probably not that easy ..
> > 
> > I wonder if there is some value in "enforcing" an order in which
> > processes get frozen i.e freeze A first before B. That may solve the
> > deadlocks we have been discussing wrt kthread_stop and flush_workqueue
> > as well.
> 
> Perhaps we can add "atomic_t xxx" to task_struct.
> 
> 	int freezing(struct task_struct *p)
> 	{
> 		return test_tsk_thread_flag(p, TIF_FREEZE)
> 			&& atomic_read(&p->xxx) == 0;
> 	}
> 
> 	void xxx_start(struct task_struct *p)
> 	{
> 		atomic_inc(p->xxx);
> 		thaw_process(p);
> 	}
> 
> 	xxx_end(struct task_struct *p)
> 	{
> 		atomic_dec(p->xxx);
> 	}
> 
> Now,
> 
> 	xxx_start(p);
> 	... wait for something which depends on p...
> 	xxx_end(p);
> 
> Of course we need other changes, freeze_process() should check ->xxx, etc.
> I am not sure this makes sense.

I think this is racy.
Say, if the task which does xxx_start(p) [let's call it task q] is not freezeable, and task p is
already frozen when q  calls xxx_start, then we might be in a situation
where 

- Freezer has declared the whole system to be frozen. Hence the thread
  issuing the 'freeze'(suspend/hotplug) can go ahead and do whatever it wants to.

- Task 'p' which was supposed to be frozen , is now running and
  *surprise* We have a thread running on a cpu which ain't there any more!

I feel we need some kind of a global rwlock. 


DEFINE_RWLOCK(freezer_status_lock);
int  xxx_start(struct task_struct *p)
{
	int ret = 0;
	ret = read_trylock(&freezer_status_lock);
	if(ret) 
	 /* We've succeeded. So lets thaw the chap */
	 thaw_process(p);
	/* If we've failed to acquire trylock, that means freezer doesn't 
	 * depend on us.
	 * So lets simply wait without thawing p
	 */
	
	return ret;

}


void xxx_end(int state)
{
	if(state)
		read_unlock(&freezer_status_lock);
}


int try_to_freeze_tasks()
{
	do {
		/* The regular freezer code */

		if (!todo && !write_trylock(&freezer_status_lock));
			continue;
		/* When the freezer goes back, it will find task 'p'
		 * woken up and hence wait for it to get frozen
		 */
	}while(todo);
}

void try_to_thaw_tasks()
{
	.
	.
	.
	write_unlock(&freezer_status_lock);
}


	int state = xxx_start(p);
	... wait for something which depends on p...
 	xxx_end(state);

However, now that I look at it again, I see that it will fail in our case
where we might need to nest the try_to_freeze_tasks call.

Hmm, we don't have a rwlock variant that allows multiple writers, now do
we?!


> 
> Oleg.
> 

Thanks and Regards
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 3/8] Use process freezer for cpu-hotplug
  2007-04-06 17:34     ` Ingo Molnar
  2007-04-06 17:47       ` Nathan Lynch
@ 2007-04-14 18:48       ` Pavel Machek
  1 sibling, 0 replies; 70+ messages in thread
From: Pavel Machek @ 2007-04-14 18:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Nathan Lynch, Gautham R Shenoy, akpm, paulmck, torvalds,
	linux-kernel, vatsa, Oleg Nesterov, Rafael J. Wysocki, dipankar,
	dino, masami.hiramatsu.pt

Hi!
> > > -	raw_notifier_call_chain(&cpu_chain, CPU_LOCK_ACQUIRE, hcpu);
> > > +	if (freeze_processes(FE_HOTPLUG_CPU)) {
> > > +		thaw_processes(FE_HOTPLUG_CPU);
> > > +		return -EBUSY;
> > > +	}
> > > +
> > 
> > If I'm understanding correctly, this will cause
> > 
> > # echo 0 > /sys/devices/system/cpu/cpuX/online
> > 
> > to sometimes fail, and userspace is expected to try again?  This will 
> > break existing applications.
> > 
> > Perhaps drivers/base/cpu.c:store_online should retry as long as 
> > cpu_up/down return -EBUSY.  That would avoid a userspace-visible 
> > interface change.
> 
> yeah. I'd even suggest a freeze_processes_nofail() API instead, that 
> does this internally, without burdening the callsites. (and once the 
> freezer becomes complete then freeze_processes_nofail() == 
> freeze_processes())

Not sure if we _can_ do freeze_processes_nofail(). If something is
wrong (process in D state forever because of driver bug?), it looks
better to return error to userspace than looping forever.

You may want to pass higher timeout than 20sec. But if you can't
freeze everything in 1hour, it is unlikely to ever succeed.

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 70+ messages in thread

end of thread, other threads:[~2007-04-15  9:37 UTC | newest]

Thread overview: 70+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-04-02  5:34 [RFC] Cpu-hotplug: Using the Process Freezer (try2) Gautham R Shenoy
2007-04-02  5:37 ` [PATCH 1/8] Enhance process freezer interface for usage beyond software suspend Gautham R Shenoy
2007-04-02 13:56   ` Pavel Machek
2007-04-02 20:48     ` Rafael J. Wysocki
2007-04-02 20:51       ` Pavel Machek
2007-04-06 14:34         ` Rafael J. Wysocki
2007-04-06 22:20           ` Nigel Cunningham
2007-04-07  9:33             ` Rafael J. Wysocki
2007-04-07  9:47               ` Nigel Cunningham
2007-04-09  3:04         ` Gautham R Shenoy
2007-04-03  7:59       ` Gautham R Shenoy
2007-04-05  9:46   ` Oleg Nesterov
2007-04-05 10:59     ` Gautham R Shenoy
2007-04-05 11:30       ` Oleg Nesterov
2007-04-02  5:37 ` [PATCH 2/8] Make process freezer reentrant Gautham R Shenoy
2007-04-05  9:53   ` Oleg Nesterov
2007-04-05 10:19     ` Gautham R Shenoy
2007-04-02  5:38 ` [PATCH 3/8] Use process freezer for cpu-hotplug Gautham R Shenoy
2007-04-05 10:53   ` Oleg Nesterov
2007-04-05 12:14     ` Gautham R Shenoy
2007-04-05 13:34       ` Oleg Nesterov
2007-04-06 17:27   ` Nathan Lynch
2007-04-06 17:34     ` Ingo Molnar
2007-04-06 17:47       ` Nathan Lynch
2007-04-06 22:22         ` Nigel Cunningham
2007-04-14 18:48       ` Pavel Machek
2007-04-02  5:39 ` [PATCH 4/8] Rip out lock_cpu_hotplug() Gautham R Shenoy
2007-04-02  5:40 ` [PATCH 5/8] __cpu_up: use singlethreaded workqueue Gautham R Shenoy
2007-04-05 12:08   ` Oleg Nesterov
2007-04-02  5:41 ` [PATCH 6/8] Make non-singlethreaded workqueues freezeable by default Gautham R Shenoy
2007-04-05 11:57   ` Oleg Nesterov
2007-04-05 20:06     ` Andrew Morton
2007-04-02  5:42 ` [PATCH 7/8] Clean up workqueue.c with respect to the freezer based cpu-hotplug Gautham R Shenoy
2007-04-03 11:47   ` Oleg Nesterov
2007-04-03 13:59     ` Srivatsa Vaddagiri
2007-04-03 15:03       ` Oleg Nesterov
2007-04-03 17:18         ` Srivatsa Vaddagiri
2007-04-04 15:28           ` Oleg Nesterov
2007-04-04 17:49             ` Srivatsa Vaddagiri
2007-04-05 12:20               ` Oleg Nesterov
2007-04-12  2:22           ` Srivatsa Vaddagiri
2007-04-12 10:01             ` Gautham R Shenoy
2007-04-12 16:00             ` Oleg Nesterov
2007-04-13  9:46               ` Gautham R Shenoy
2007-04-02  5:42 ` [PATCH 8/8] Make kernel threads freezeable for cpu-hotplug Gautham R Shenoy
2007-04-02  6:16 ` [RFC] Cpu-hotplug: Using the Process Freezer (try2) Ingo Molnar
2007-04-02  9:28   ` Srivatsa Vaddagiri
2007-04-02 11:18     ` Ingo Molnar
2007-04-02 12:42       ` Srivatsa Vaddagiri
2007-04-02 14:16         ` Gautham R Shenoy
2007-04-02 18:56         ` Ingo Molnar
2007-04-03 12:56           ` Srivatsa Vaddagiri
2007-04-03 14:15             ` Gautham R Shenoy
2007-04-03 19:25               ` Rafael J. Wysocki
2007-04-04  3:15               ` Srivatsa Vaddagiri
2007-04-04 10:04                 ` Ingo Molnar
2007-04-04 10:41                   ` Gautham R Shenoy
2007-04-04 11:49                     ` Ingo Molnar
2007-04-04 12:24                       ` Gautham R Shenoy
2007-04-02 11:19   ` Gautham R Shenoy
2007-04-02 11:27     ` Ingo Molnar
2007-04-02 22:12       ` Rafael J. Wysocki
2007-04-02 13:22   ` Pavel Machek
2007-04-03 12:01   ` Gautham R Shenoy
2007-04-03 19:34     ` Rafael J. Wysocki
2007-04-03 20:24       ` Andrew Morton
2007-04-04 10:06         ` utrace merge Ingo Molnar
2007-04-04 10:36           ` Christoph Hellwig
2007-04-04 18:41             ` Andrew Morton
2007-04-03 14:01   ` [RFC] Cpu-hotplug: Using the Process Freezer (try2) Gautham R Shenoy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.