linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/3] cpu freq: fix problems with work_on_cpu usage in acpi-cpufreq
@ 2009-01-16 19:11 Mike Travis
  2009-01-16 19:11 ` [PATCH 1/3] work_on_cpu: dont try to get_online_cpus() in work_on_cpu Mike Travis
                   ` (3 more replies)
  0 siblings, 4 replies; 70+ messages in thread
From: Mike Travis @ 2009-01-16 19:11 UTC (permalink / raw)
  To: Ingo Molnar, Dave Jones; +Cc: Rusty Russell, cpufreq, linux-kernel


This set of patches improves the work_on_cpu() function, eliminating
circular lock dependencies by requiring the caller to ensure the cpu
does not go offline.  This effectively models the previous behaviour
of set_cpus_allowed().

The other fix is to use a separate work queue for work_on_cpu so
it does not clash with kevent items that might already be on the
queue from the same caller.

This fixes the boot up and suspend/resume to disk problems previously
seen.
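
The caller-side contract described above might look like the following
sketch (illustrative kernel-style pseudocode, not part of the series;
cpu, fn and arg stand for caller-supplied values):

```c
long ret = -EINVAL;

get_online_cpus();                      /* pin cpu hotplug state */
if (cpu_online(cpu))
        ret = work_on_cpu(cpu, fn, arg);
put_online_cpus();
```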

Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Dieter Ries <clip2@gmx.de>
Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: <cpufreq@vger.kernel.org>

-- 

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH 1/3] work_on_cpu: dont try to get_online_cpus() in work_on_cpu.
  2009-01-16 19:11 [PATCH 0/3] cpu freq: fix problems with work_on_cpu usage in acpi-cpufreq Mike Travis
@ 2009-01-16 19:11 ` Mike Travis
  2009-01-16 19:11 ` [PATCH 2/3] work_on_cpu: Use our own workqueue Mike Travis
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 70+ messages in thread
From: Mike Travis @ 2009-01-16 19:11 UTC (permalink / raw)
  To: Ingo Molnar, Dave Jones; +Cc: Rusty Russell, cpufreq, linux-kernel

[-- Attachment #1: remove-get_online_cpus-from-work_on_cpu.patch --]
[-- Type: text/plain, Size: 1472 bytes --]

From: Rusty Russell <rusty@rustcorp.com.au>

Impact: remove potential circular lock dependency with cpu hotplug lock

This has caused more problems than it solved, with a pile of cpu
hotplug locking issues.

Followup patches will take get_online_cpus() in the callers that need
it; callers that don't are no worse off than before, when they were
using set_cpus_allowed() without locking.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
---
 kernel/workqueue.c |   14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

--- linux-2.6-for-ingo.orig/kernel/workqueue.c
+++ linux-2.6-for-ingo/kernel/workqueue.c
@@ -991,8 +991,8 @@ static void do_work_for_cpu(struct work_
  * @fn: the function to run
  * @arg: the function arg
  *
- * This will return -EINVAL in the cpu is not online, or the return value
- * of @fn otherwise.
+ * This will return the value @fn returns.
+ * It is up to the caller to ensure that the cpu doesn't go offline.
  */
 long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
 {
@@ -1001,14 +1001,8 @@ long work_on_cpu(unsigned int cpu, long 
 	INIT_WORK(&wfc.work, do_work_for_cpu);
 	wfc.fn = fn;
 	wfc.arg = arg;
-	get_online_cpus();
-	if (unlikely(!cpu_online(cpu)))
-		wfc.ret = -EINVAL;
-	else {
-		schedule_work_on(cpu, &wfc.work);
-		flush_work(&wfc.work);
-	}
-	put_online_cpus();
+	schedule_work_on(cpu, &wfc.work);
+	flush_work(&wfc.work);
 
 	return wfc.ret;
 }

-- 


* [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-16 19:11 [PATCH 0/3] cpu freq: fix problems with work_on_cpu usage in acpi-cpufreq Mike Travis
  2009-01-16 19:11 ` [PATCH 1/3] work_on_cpu: dont try to get_online_cpus() in work_on_cpu Mike Travis
@ 2009-01-16 19:11 ` Mike Travis
  2009-01-24  8:15   ` Andrew Morton
  2009-01-16 19:11 ` [PATCH 3/3] cpufreq: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write Mike Travis
  2009-01-16 23:38 ` [PATCH 0/3] cpu freq: fix problems with work_on_cpu usage in acpi-cpufreq [PULL request] Mike Travis
  3 siblings, 1 reply; 70+ messages in thread
From: Mike Travis @ 2009-01-16 19:11 UTC (permalink / raw)
  To: Ingo Molnar, Dave Jones; +Cc: Rusty Russell, cpufreq, linux-kernel

[-- Attachment #1: separate-work-queue.patch --]
[-- Type: text/plain, Size: 1274 bytes --]

From: Rusty Russell <rusty@rustcorp.com.au>

Impact: remove potential clashes with generic kevent workqueue

Annoyingly, some places we want to use work_on_cpu are already in
workqueues.  As per Ingo's suggestion, we create a different workqueue
for work_on_cpu.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
---
 kernel/workqueue.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

--- linux-2.6-for-ingo.orig/kernel/workqueue.c
+++ linux-2.6-for-ingo/kernel/workqueue.c
@@ -971,6 +971,8 @@ undo:
 }
 
 #ifdef CONFIG_SMP
+static struct workqueue_struct *work_on_cpu_wq __read_mostly;
+
 struct work_for_cpu {
 	struct work_struct work;
 	long (*fn)(void *);
@@ -1001,7 +1003,7 @@ long work_on_cpu(unsigned int cpu, long 
 	INIT_WORK(&wfc.work, do_work_for_cpu);
 	wfc.fn = fn;
 	wfc.arg = arg;
-	schedule_work_on(cpu, &wfc.work);
+	queue_work_on(cpu, work_on_cpu_wq, &wfc.work);
 	flush_work(&wfc.work);
 
 	return wfc.ret;
@@ -1019,4 +1021,8 @@ void __init init_workqueues(void)
 	hotcpu_notifier(workqueue_cpu_callback, 0);
 	keventd_wq = create_workqueue("events");
 	BUG_ON(!keventd_wq);
+#ifdef CONFIG_SMP
+	work_on_cpu_wq = create_workqueue("work_on_cpu");
+	BUG_ON(!work_on_cpu_wq);
+#endif
 }

-- 


* [PATCH 3/3] cpufreq: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write
  2009-01-16 19:11 [PATCH 0/3] cpu freq: fix problems with work_on_cpu usage in acpi-cpufreq Mike Travis
  2009-01-16 19:11 ` [PATCH 1/3] work_on_cpu: dont try to get_online_cpus() in work_on_cpu Mike Travis
  2009-01-16 19:11 ` [PATCH 2/3] work_on_cpu: Use our own workqueue Mike Travis
@ 2009-01-16 19:11 ` Mike Travis
  2009-01-16 23:38 ` [PATCH 0/3] cpu freq: fix problems with work_on_cpu usage in acpi-cpufreq [PULL request] Mike Travis
  3 siblings, 0 replies; 70+ messages in thread
From: Mike Travis @ 2009-01-16 19:11 UTC (permalink / raw)
  To: Ingo Molnar, Dave Jones; +Cc: Rusty Russell, cpufreq, linux-kernel

[-- Attachment #1: 7503bfba.patch --]
[-- Type: text/plain, Size: 2478 bytes --]

Impact: use new work_on_cpu function to reduce stack usage

Replace the saving of current->cpus_allowed and set_cpus_allowed_ptr() with
a work_on_cpu function for drv_read() and drv_write().

Basically converts do_drv_{read,write} into "work_on_cpu" functions that
are now called by drv_read and drv_write.

Note: This patch basically reverts 50c668d6 which reverted 7503bfba, now
that the work_on_cpu() function is more stable.

Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Tested-by: Dieter Ries <clip2@gmx.de>
Tested-by: Maciej Rutecki <maciej.rutecki@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: <cpufreq@vger.kernel.org>
---
 arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c |   22 +++++++++-------------
 1 file changed, 9 insertions(+), 13 deletions(-)

--- linux-2.6-for-ingo.orig/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
+++ linux-2.6-for-ingo/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
@@ -150,8 +150,9 @@ struct drv_cmd {
 	u32 val;
 };
 
-static void do_drv_read(struct drv_cmd *cmd)
+static long do_drv_read(void *_cmd)
 {
+	struct drv_cmd *cmd = _cmd;
 	u32 h;
 
 	switch (cmd->type) {
@@ -166,10 +167,12 @@ static void do_drv_read(struct drv_cmd *
 	default:
 		break;
 	}
+	return 0;
 }
 
-static void do_drv_write(struct drv_cmd *cmd)
+static long do_drv_write(void *_cmd)
 {
+	struct drv_cmd *cmd = _cmd;
 	u32 lo, hi;
 
 	switch (cmd->type) {
@@ -186,30 +189,23 @@ static void do_drv_write(struct drv_cmd 
 	default:
 		break;
 	}
+	return 0;
 }
 
 static void drv_read(struct drv_cmd *cmd)
 {
-	cpumask_t saved_mask = current->cpus_allowed;
 	cmd->val = 0;
 
-	set_cpus_allowed_ptr(current, cmd->mask);
-	do_drv_read(cmd);
-	set_cpus_allowed_ptr(current, &saved_mask);
+	work_on_cpu(cpumask_any(cmd->mask), do_drv_read, cmd);
 }
 
 static void drv_write(struct drv_cmd *cmd)
 {
-	cpumask_t saved_mask = current->cpus_allowed;
 	unsigned int i;
 
 	for_each_cpu(i, cmd->mask) {
-		set_cpus_allowed_ptr(current, cpumask_of(i));
-		do_drv_write(cmd);
+		work_on_cpu(i, do_drv_write, cmd);
 	}
-
-	set_cpus_allowed_ptr(current, &saved_mask);
-	return;
 }
 
 static u32 get_cur_val(const struct cpumask *mask)
@@ -367,7 +363,7 @@ static unsigned int get_cur_freq_on_cpu(
 	return freq;
 }
 
-static unsigned int check_freqs(const cpumask_t *mask, unsigned int freq,
+static unsigned int check_freqs(const struct cpumask *mask, unsigned int freq,
 				struct acpi_cpufreq_data *data)
 {
 	unsigned int cur_freq;

-- 


* Re: [PATCH 0/3] cpu freq: fix problems with work_on_cpu usage in acpi-cpufreq [PULL request]
  2009-01-16 19:11 [PATCH 0/3] cpu freq: fix problems with work_on_cpu usage in acpi-cpufreq Mike Travis
                   ` (2 preceding siblings ...)
  2009-01-16 19:11 ` [PATCH 3/3] cpufreq: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write Mike Travis
@ 2009-01-16 23:38 ` Mike Travis
  2009-01-17 22:08   ` Ingo Molnar
  3 siblings, 1 reply; 70+ messages in thread
From: Mike Travis @ 2009-01-16 23:38 UTC (permalink / raw)
  To: Ingo Molnar, Dave Jones; +Cc: Rusty Russell, cpufreq, linux-kernel

Mike Travis wrote:
> This set of patches improves the work_on_cpu() function, eliminating
> circular lock dependencies by requiring the caller to ensure the cpu
> does not go offline.  This effectively models the previous behaviour
> of set_cpus_allowed().
> 
> The other fix is to use a separate work queue for work_on_cpu so
> it does not clash with kevent items that might already be on the
> queue from the same caller.
> 
> This fixes the boot up and suspend/resume to disk problems previously
> seen.
> 
> Signed-off-by: Mike Travis <travis@sgi.com>
> Cc: Dieter Ries <clip2@gmx.de>
> Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
> Cc: Dave Jones <davej@redhat.com>
> Cc: <cpufreq@vger.kernel.org>
> 

Hi Ingo,

If you are ready for these, I've pushed them to the cpus4096-for-ingo
tree.  (Note it's been renamed.)

Thanks!
Mike
--- 

The following changes since commit c99dbbe9f8f6b3e9383e64710217e873431d1c31:
  Mike Travis (1):
        sched: fix warning on ia64

are available in the git repository at:

  ssh://master.kernel.org/pub/scm/linux/kernel/git/travis/linux-2.6-cpus4096-for-ingo.git master

Mike Travis (1):
      cpufreq: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write

Rusty Russell (2):
      work_on_cpu: don't try to get_online_cpus() in work_on_cpu.
      work_on_cpu: Use our own workqueue.

 arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c |   22 +++++++++-------------
 kernel/workqueue.c                         |   20 ++++++++++----------
 2 files changed, 19 insertions(+), 23 deletions(-)


* Re: [PATCH 0/3] cpu freq: fix problems with work_on_cpu usage in acpi-cpufreq [PULL request]
  2009-01-16 23:38 ` [PATCH 0/3] cpu freq: fix problems with work_on_cpu usage in acpi-cpufreq [PULL request] Mike Travis
@ 2009-01-17 22:08   ` Ingo Molnar
  2009-01-19 17:11     ` Mike Travis
  0 siblings, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2009-01-17 22:08 UTC (permalink / raw)
  To: Mike Travis; +Cc: Ingo Molnar, Dave Jones, Rusty Russell, cpufreq, linux-kernel


* Mike Travis <travis@sgi.com> wrote:

> Hi Ingo,
> 
> If you are ready for these, I've pushed them to the cpus4096-for-ingo 
> tree.  (Note it's been renamed.)

that might be fine for v2.6.30, but we need separate patches in a separate 
branch, for all -git relevant fixes - in as safe a way as possible.

	Ingo


* Re: [PATCH 0/3] cpu freq: fix problems with work_on_cpu usage in acpi-cpufreq [PULL request]
  2009-01-17 22:08   ` Ingo Molnar
@ 2009-01-19 17:11     ` Mike Travis
  2009-01-19 17:26       ` Ingo Molnar
  0 siblings, 1 reply; 70+ messages in thread
From: Mike Travis @ 2009-01-19 17:11 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Ingo Molnar, Dave Jones, Rusty Russell, cpufreq, linux-kernel

Ingo Molnar wrote:
> * Mike Travis <travis@sgi.com> wrote:
> 
>> Hi Ingo,
>>
>> If you are ready for these, I've pushed them to the cpus4096-for-ingo 
>> tree.  (Note it's been renamed.)
> 
> that might be fine for v2.6.30, but we need separate patches in a separate 
> branch, for all -git relevant fixes - in as safe a way as possible.
> 
> 	Ingo

So you want me to create a separate branch for each patch?

And don't you think the fix to work_on_cpu should be put into 2.6.29?

Thanks,
Mike



* Re: [PATCH 0/3] cpu freq: fix problems with work_on_cpu usage in acpi-cpufreq [PULL request]
  2009-01-19 17:11     ` Mike Travis
@ 2009-01-19 17:26       ` Ingo Molnar
  0 siblings, 0 replies; 70+ messages in thread
From: Ingo Molnar @ 2009-01-19 17:26 UTC (permalink / raw)
  To: Mike Travis; +Cc: Ingo Molnar, Dave Jones, Rusty Russell, cpufreq, linux-kernel


* Mike Travis <travis@sgi.com> wrote:

> Ingo Molnar wrote:
> > * Mike Travis <travis@sgi.com> wrote:
> > 
> >> Hi Ingo,
> >>
> >> If you are ready for these, I've pushed them to the cpus4096-for-ingo 
> >> tree.  (Note it's been renamed.)
> > 
> > that might be fine for v2.6.30, but we need separate patches in a 
> > separate branch, for all -git relevant fixes - in as safe a way as 
> > possible.
> > 
> > 	Ingo
> 
> So you want me to create a separate branch for each patch?
> 
> And don't you think the fix to work_on_cpu should be put into 2.6.29?

Getting it into v2.6.29 is precisely the point: please put the essential 
fixes (and only the fixes) into a separate (single), -git based (i.e. not 
tip/cpus4096 based) branch that i can pull. The work_on_cpu() commit(s) 
would be part of that branch.

	Ingo


* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-16 19:11 ` [PATCH 2/3] work_on_cpu: Use our own workqueue Mike Travis
@ 2009-01-24  8:15   ` Andrew Morton
       [not found]     ` <200901261711.43943.rusty@rustcorp.com.au>
  0 siblings, 1 reply; 70+ messages in thread
From: Andrew Morton @ 2009-01-24  8:15 UTC (permalink / raw)
  To: Mike Travis; +Cc: Ingo Molnar, Dave Jones, Rusty Russell, cpufreq, linux-kernel

On Fri, 16 Jan 2009 11:11:10 -0800 Mike Travis <travis@sgi.com> wrote:

> From: Rusty Russell <rusty@rustcorp.com.au>
> 
> Impact: remove potential clashes with generic kevent workqueue
> 
> Annoyingly, some places we want to use work_on_cpu are already in
> workqueues.  As per Ingo's suggestion, we create a different workqueue
> for work_on_cpu.
> 
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> Signed-off-by: Mike Travis <travis@sgi.com>
> ---
>  kernel/workqueue.c |    8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> --- linux-2.6-for-ingo.orig/kernel/workqueue.c
> +++ linux-2.6-for-ingo/kernel/workqueue.c
> @@ -971,6 +971,8 @@ undo:
>  }
>  
>  #ifdef CONFIG_SMP
> +static struct workqueue_struct *work_on_cpu_wq __read_mostly;

Pity the poor reader who comes along trying to work out why this exists.

>  struct work_for_cpu {
>  	struct work_struct work;
>  	long (*fn)(void *);
> @@ -1001,7 +1003,7 @@ long work_on_cpu(unsigned int cpu, long 
>  	INIT_WORK(&wfc.work, do_work_for_cpu);
>  	wfc.fn = fn;
>  	wfc.arg = arg;
> -	schedule_work_on(cpu, &wfc.work);
> +	queue_work_on(cpu, work_on_cpu_wq, &wfc.work);
>  	flush_work(&wfc.work);
>  
>  	return wfc.ret;
> @@ -1019,4 +1021,8 @@ void __init init_workqueues(void)
>  	hotcpu_notifier(workqueue_cpu_callback, 0);
>  	keventd_wq = create_workqueue("events");
>  	BUG_ON(!keventd_wq);
> +#ifdef CONFIG_SMP
> +	work_on_cpu_wq = create_workqueue("work_on_cpu");
> +	BUG_ON(!work_on_cpu_wq);
> +#endif

Yet another kernel thread for each CPU.  All because of some dung way
down in arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c.

Is there no other way?


* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
       [not found]     ` <200901261711.43943.rusty@rustcorp.com.au>
@ 2009-01-26  7:01       ` Andrew Morton
  2009-01-26 17:16         ` Ingo Molnar
  2009-01-27  7:05         ` Rusty Russell
  0 siblings, 2 replies; 70+ messages in thread
From: Andrew Morton @ 2009-01-26  7:01 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Mike Travis, Ingo Molnar, Dave Jones, cpufreq, linux-kernel

On Mon, 26 Jan 2009 17:11:43 +1030 Rusty Russell <rusty@rustcorp.com.au> wrote:

> On Saturday 24 January 2009 18:45:37 Andrew Morton wrote:
> > On Fri, 16 Jan 2009 11:11:10 -0800 Mike Travis <travis@sgi.com> wrote:
> > 
> > > From: Rusty Russell <rusty@rustcorp.com.au>
> > > 
> > > Impact: remove potential clashes with generic kevent workqueue
> > > 
> > > Annoyingly, some places we want to use work_on_cpu are already in
> > > workqueues.  As per Ingo's suggestion, we create a different workqueue
> > > for work_on_cpu.
> > > 
> > > Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> > > Signed-off-by: Mike Travis <travis@sgi.com>
> > > ---
> > >  kernel/workqueue.c |    8 +++++++-
> > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > > 
> > > --- linux-2.6-for-ingo.orig/kernel/workqueue.c
> > > +++ linux-2.6-for-ingo/kernel/workqueue.c
> > > @@ -971,6 +971,8 @@ undo:
> > >  }
> > >  
> > >  #ifdef CONFIG_SMP
> > > +static struct workqueue_struct *work_on_cpu_wq __read_mostly;
> > 
> > Pity the poor reader who comes along trying to work out why this exists.

(chirp, chirp)

> > >  struct work_for_cpu {
> > >  	struct work_struct work;
> > >  	long (*fn)(void *);
> > > @@ -1001,7 +1003,7 @@ long work_on_cpu(unsigned int cpu, long 
> > >  	INIT_WORK(&wfc.work, do_work_for_cpu);
> > >  	wfc.fn = fn;
> > >  	wfc.arg = arg;
> > > -	schedule_work_on(cpu, &wfc.work);
> > > +	queue_work_on(cpu, work_on_cpu_wq, &wfc.work);
> > >  	flush_work(&wfc.work);
> > >  
> > >  	return wfc.ret;
> > > @@ -1019,4 +1021,8 @@ void __init init_workqueues(void)
> > >  	hotcpu_notifier(workqueue_cpu_callback, 0);
> > >  	keventd_wq = create_workqueue("events");
> > >  	BUG_ON(!keventd_wq);
> > > +#ifdef CONFIG_SMP
> > > +	work_on_cpu_wq = create_workqueue("work_on_cpu");
> > > +	BUG_ON(!work_on_cpu_wq);
> > > +#endif
> > 
> > Yet another kernel thread for each CPU.  All because of some dung way
> > down in arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c.
> > 
> > Is there no other way?
> 
> Perhaps, but this works.  Trying to be clever got me into this mess
> in the first place.
> 
> We could stop using workqueues and change work_on_cpu to create a
> thread every time, which would give it a new failure mode, so I don't
> know that everyone could use it any more.  Or we could keep a single
> thread around to do all the cpus, and duplicate much of the workqueue
> code.
> 
> None of these options are appealing...

Can we try harder please?  10 screenfuls of kernel threads in the ps
output is just irritating.

How about banning the use of work_on_cpu() from schedule_work()
handlers and then fixing that driver somehow?


What _is_ the bug anyway?  The only description we were given was

  Impact: remove potential clashes with generic kevent workqueue

  Annoyingly, some places we want to use work_on_cpu are already in
  workqueues.  As per Ingo's suggestion, we create a different
  workqueue for work_on_cpu.

which didn't bother telling anyone squat.


When was this bug added?  Was it added into that driver or was it due
to infrastructural changes?



* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-26  7:01       ` Andrew Morton
@ 2009-01-26 17:16         ` Ingo Molnar
  2009-01-26 18:35           ` Andrew Morton
  2009-01-27  7:05         ` Rusty Russell
  1 sibling, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2009-01-26 17:16 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rusty Russell, Mike Travis, Ingo Molnar, Dave Jones, cpufreq,
	linux-kernel


* Andrew Morton <akpm@linux-foundation.org> wrote:

> > > Yet another kernel thread for each CPU.  All because of some dung 
> > > way down in arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c.
> > > 
> > > Is there no other way?
> > 
> > Perhaps, but this works.  Trying to be clever got me into this mess in 
> > the first place.
> > 
> > We could stop using workqueues and change work_on_cpu to create a 
> > thread every time, which would give it a new failure mode so I don't 
> > know that everyone could use it any more.  Or we could keep a single 
> > thread around to do all the cpus, and duplicate much of the workqueue 
> > code.
> > 
> > None of these options are appealing...
> 
> Can we try harder please?  10 screenfuls of kernel threads in the ps 
> output is just irritating.
> 
> How about banning the use of work_on_cpu() from schedule_work() handlers 
> and then fixing that driver somehow?

Yes, but that's fundamentally fragile: anyone who happens to stick the 
wrong thing into keventd (and it's dead easy because schedule_work() is 
easy to use) will lock up work_on_cpu() users.

work_on_cpu() is an important (and lowlevel enough) facility to be 
isolated from casual interaction like that.

> What _is_ the bug anyway?  The only description we were given was
> 
>   Impact: remove potential clashes with generic kevent workqueue
> 
>   Annoyingly, some places we want to use work_on_cpu are already in
>   workqueues.  As per Ingo's suggestion, we create a different
>   workqueue for work_on_cpu.
> 
> which didn't bother telling anyone squat.
> 
> When was this bug added?  Was it added into that driver or was it due to 
> infrastructural changes?

This fixes lockups during bootup caused by the cpumask changes/cleanups 
which changed set_cpus_allowed()+on-kernel-stack-cpumask_t to 
work_on_cpu().

Which was fine, except it didn't take into account the interaction with
the kevents workqueue and the very wide cross-section of worklet
dependencies that this brings with it. work_on_cpu() was rarely used
before, so this didn't show up.

	Ingo


* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-26 17:16         ` Ingo Molnar
@ 2009-01-26 18:35           ` Andrew Morton
  2009-01-26 20:20             ` Ingo Molnar
  0 siblings, 1 reply; 70+ messages in thread
From: Andrew Morton @ 2009-01-26 18:35 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: rusty, travis, mingo, davej, cpufreq, linux-kernel

On Mon, 26 Jan 2009 18:16:18 +0100
Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > > > Yet another kernel thread for each CPU.  All because of some dung 
> > > > way down in arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c.
> > > > 
> > > > Is there no other way?
> > > 
> > > Perhaps, but this works.  Trying to be clever got me into this mess in 
> > > the first place.
> > > 
> > > We could stop using workqueues and change work_on_cpu to create a 
> > > thread every time, which would give it a new failure mode so I don't 
> > > know that everyone could use it any more.  Or we could keep a single 
> > > thread around to do all the cpus, and duplicate much of the workqueue 
> > > code.
> > > 
> > > None of these options are appealing...
> > 
> > Can we try harder please?  10 screenfuls of kernel threads in the ps 
> > output is just irritating.
> > 
> > How about banning the use of work_on_cpu() from schedule_work() handlers 
> > and then fixing that driver somehow?
> 
> Yes, but that's fundamentally fragile: anyone who happens to stick the 
> wrong thing into keventd (and it's dead easy because schedule_work() is 
> easy to use) will lock up work_on_cpu() users.
> 

--- a/kernel/workqueue.c~a
+++ a/kernel/workqueue.c
@@ -998,6 +998,8 @@ long work_on_cpu(unsigned int cpu, long 
 {
 	struct work_for_cpu wfc;
 
+	BUG_ON(current_is_keventd());
+
 	INIT_WORK(&wfc.work, do_work_for_cpu);
 	wfc.fn = fn;
 	wfc.arg = arg;
_


That wasn't so hard.

> work_on_cpu() is an important (and lowlevel enough) facility to be 
> isolated from casual interaction like that.

We have one single (known) caller in the whole kernel.  This is not
worth adding another great pile of kernel threads for!

> > What _is_ the bug anyway?  The only description we were given was
> > 
> >   Impact: remove potential clashes with generic kevent workqueue
> > 
> >   Annoyingly, some places we want to use work_on_cpu are already in
> >   workqueues.  As per Ingo's suggestion, we create a different
> >   workqueue for work_on_cpu.
> > 
> > which didn't bother telling anyone squat.
> > 
> > When was this bug added?  Was it added into that driver or was it due to 
> > infrastructural changes?
> 
> This fixes lockups during bootup caused by the cpumask changes/cleanups 
> which changed set_cpus_allowed()+on-kernel-stack-cpumask_t to 
> work_on_cpu().
> 
> Which was fine, except it didn't take into account the interaction with 
> the kevents workqueue and the very wide cross-section of worklet 
> dependencies that this brings with it. work_on_cpu() was rarely used 
> before, so this didn't show up.

Am still awaiting an understandable description of this alleged bug :(

If someone can be persuaded to cough up this information (which should
have been in the changelog from day #1) then perhaps someone will be
able to think of a more pleasing fix.  That's one of the reasons for
writing changelogs.


* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-26 18:35           ` Andrew Morton
@ 2009-01-26 20:20             ` Ingo Molnar
  2009-01-26 20:43               ` Mike Travis
  2009-01-26 21:00               ` Andrew Morton
  0 siblings, 2 replies; 70+ messages in thread
From: Ingo Molnar @ 2009-01-26 20:20 UTC (permalink / raw)
  To: Andrew Morton; +Cc: rusty, travis, mingo, davej, cpufreq, linux-kernel


* Andrew Morton <akpm@linux-foundation.org> wrote:

> On Mon, 26 Jan 2009 18:16:18 +0100
> Ingo Molnar <mingo@elte.hu> wrote:
> 
> > 
> > * Andrew Morton <akpm@linux-foundation.org> wrote:
> > 
> > > > > Yet another kernel thread for each CPU.  All because of some dung 
> > > > > way down in arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c.
> > > > > 
> > > > > Is there no other way?
> > > > 
> > > > Perhaps, but this works.  Trying to be clever got me into this mess in 
> > > > the first place.
> > > > 
> > > > We could stop using workqueues and change work_on_cpu to create a 
> > > > thread every time, which would give it a new failure mode so I don't 
> > > > know that everyone could use it any more.  Or we could keep a single 
> > > > thread around to do all the cpus, and duplicate much of the workqueue 
> > > > code.
> > > > 
> > > > None of these options are appealing...
> > > 
> > > Can we try harder please?  10 screenfuls of kernel threads in the ps 
> > > output is just irritating.
> > > 
> > > How about banning the use of work_on_cpu() from schedule_work() handlers 
> > > and then fixing that driver somehow?
> > 
> > Yes, but that's fundamentally fragile: anyone who happens to stick the 
> > wrong thing into keventd (and it's dead easy because schedule_work() is 
> > easy to use) will lock up work_on_cpu() users.
> > 
> 
> --- a/kernel/workqueue.c~a
> +++ a/kernel/workqueue.c
> @@ -998,6 +998,8 @@ long work_on_cpu(unsigned int cpu, long 
>  {
>  	struct work_for_cpu wfc;
>  
> +	BUG_ON(current_is_keventd());
> +
>  	INIT_WORK(&wfc.work, do_work_for_cpu);
>  	wfc.fn = fn;
>  	wfc.arg = arg;
> _
> 
> 
> That wasn't so hard.

What is the purpose of your change? I'm not sure you understood the 
problem. The problem is not with work_on_cpu() usage. The problem is:

 1) holding locks while calling work_on_cpu()

 2) same locks being taken by a worklet used by some other code

work_on_cpu() really wants to serialize on its own workload only, not on
the other stuff that might sometimes be queued up in the keventd
workqueue.
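
A sketch of the resulting wait cycle (hypothetical lock and worklet
names, reconstructed from the description above):

```c
/* caller (holds some_lock):          keventd thread:
 *
 *   mutex_lock(&some_lock);            run_workqueue():
 *   work_on_cpu(cpu, fn, arg):           other_worklet():
 *     schedule_work_on(cpu, &wfc.work)     mutex_lock(&some_lock);  <- blocks
 *     flush_work(&wfc.work)  <- waits on keventd, which is stuck in
 *                               other_worklet: deadlock
 */
```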

> > work_on_cpu() is an important (and lowlevel enough) facility to be 
> > isolated from casual interaction like that.
> 
> We have one single (known) caller in the whole kernel.  This is not 
> worth adding another great pile of kernel threads for!

i'd expect there to be more as part of the cpumask stack reduction 
patches that Rusty and Mike are working on.

in any case it's a correctness issue: work_on_cpu() is just as generic a
facility as on_each_cpu() - with the difference that it can handle 
blocking contexts too.

So if it's generic it ought to be implemented in a generic way - not as 
"don't use from any codepath that has a lock held that might occasionally 
also be held in a keventd worklet". (which is a totally unmaintainable 
proposition and which would just cause repeat bugs again and again.)

	Ingo


* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-26 20:20             ` Ingo Molnar
@ 2009-01-26 20:43               ` Mike Travis
  2009-01-26 21:00               ` Andrew Morton
  1 sibling, 0 replies; 70+ messages in thread
From: Mike Travis @ 2009-01-26 20:43 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, rusty, mingo, davej, cpufreq, linux-kernel

...
>> We have one single (known) caller in the whole kernel.  This is not 
>> worth adding another great pile of kernel threads for!
> 
> i'd expect there to be more as part of the cpumask stack reduction 
> patches that Rusty and Mike are working on.

There are currently 10 instances queued up in tip/cpus4096 along with
another four "FIXME:  use work_on_cpu()"  comments.



* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-26 20:20             ` Ingo Molnar
  2009-01-26 20:43               ` Mike Travis
@ 2009-01-26 21:00               ` Andrew Morton
  2009-01-26 21:27                 ` Ingo Molnar
  1 sibling, 1 reply; 70+ messages in thread
From: Andrew Morton @ 2009-01-26 21:00 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: rusty, travis, mingo, davej, cpufreq, linux-kernel, Oleg Nesterov

On Mon, 26 Jan 2009 21:20:22 +0100
Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > On Mon, 26 Jan 2009 18:16:18 +0100
> > Ingo Molnar <mingo@elte.hu> wrote:
> > 
> > > 
> > > * Andrew Morton <akpm@linux-foundation.org> wrote:
> > > 
> > > > > > Yet another kernel thread for each CPU.  All because of some dung 
> > > > > > way down in arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c.
> > > > > > 
> > > > > > Is there no other way?
> > > > > 
> > > > > Perhaps, but this works.  Trying to be clever got me into this mess in 
> > > > > the first place.
> > > > > 
> > > > > We could stop using workqueues and change work_on_cpu to create a 
> > > > > thread every time, which would give it a new failure mode so I don't 
> > > > > know that everyone could use it any more.  Or we could keep a single 
> > > > > thread around to do all the cpus, and duplicate much of the workqueue 
> > > > > code.
> > > > > 
> > > > > None of these options are appealing...
> > > > 
> > > > Can we try harder please?  10 screenfuls of kernel threads in the ps 
> > > > output is just irritating.
> > > > 
> > > > How about banning the use of work_on_cpu() from schedule_work() handlers 
> > > > and then fixing that driver somehow?
> > > 
> > > Yes, but that's fundamentally fragile: anyone who happens to stick the 
> > > wrong thing into keventd (and it's dead easy because schedule_work() is 
> > > easy to use) will lock up work_on_cpu() users.
> > > 
> > 
> > --- a/kernel/workqueue.c~a
> > +++ a/kernel/workqueue.c
> > @@ -998,6 +998,8 @@ long work_on_cpu(unsigned int cpu, long 
> >  {
> >  	struct work_for_cpu wfc;
> >  
> > +	BUG_ON(current_is_keventd());
> > +
> >  	INIT_WORK(&wfc.work, do_work_for_cpu);
> >  	wfc.fn = fn;
> >  	wfc.arg = arg;
> > _
> > 
> > 
> > That wasn't so hard.
> 
> What is the purpose of your change? I'm not sure you understood the 
> problem.

Well.  That's because I was forced to resort to guesswork.

> The problem is not with work_on_cpu() usage. The problem is:
> 
>  1) holding locks while calling work_on_cpu()
> 
>  2) same locks being taken by a worklet used by some other code
> 
> work_on_cpu() really wants to serialize on its own workload only, not on 
> the other stuff that might sometimes be queued up in the keventd 
> workqueue.

but but but, we fixed that ages ago, I think.  But I don't see the code
there.

If we want to wait on a *particular* keventd work item then we
shouldn't wait on all the other queued ones.

- If it's currently running, wait on it

- If it isn't yet running, detach it from the queue and run it directly.

Maybe I'm thinking of a different subsystem, but I don't think so. 
Maybe Oleg recalls what happened to that?

> > > work_on_cpu() is an important (and lowlevel enough) facility to be 
> > > isolated from casual interaction like that.
> > 
> > We have one single (known) caller in the whole kernel.  This is not 
> > worth adding another great pile of kernel threads for!
> 
> i'd expect there to be more as part of the cpumask stack reduction 
> patches that Rusty and Mike are working on.
> 
> in any case it's a correctness issue: work_on_cpu() is a just as generic 
> facility as on_each_cpu() - with the difference that it can handle 
> blocking contexts too.

Well, on_each_cpu() has restrictions.  Can't call it with local
interrupts disabled.  Can't call it (synchronously) while holding locks
which the callback takes.

> So if it's generic it ought to be implemented in a generic way - not a 
> "dont use from any codepath that has a lock held that might occasionally 
> also be held in a keventd worklet". (which is a totally unmaintainable 
> proposition and which would just cause repeat bugs again and again.)

That's different.  The core fault here lies in the keventd workqueue
handling code.  If we're flushing work A then we shouldn't go and block
behind unrelated work B.



* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-26 21:00               ` Andrew Morton
@ 2009-01-26 21:27                 ` Ingo Molnar
  2009-01-26 21:35                   ` Andrew Morton
  0 siblings, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2009-01-26 21:27 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov, Peter Zijlstra
  Cc: rusty, travis, mingo, davej, cpufreq, linux-kernel, Oleg Nesterov


* Andrew Morton <akpm@linux-foundation.org> wrote:

> > So if it's generic it ought to be implemented in a generic way - not a 
> > "dont use from any codepath that has a lock held that might 
> > occasionally also be held in a keventd worklet". (which is a totally 
> > unmaintainable proposition and which would just cause repeat bugs 
> > again and again.)
> 
> That's different.  The core fault here lies in the keventd workqueue 
> handling code.  If we're flushing work A then we shouldn't go and block 
> behind unrelated work B.

the blocking is inherent in the concept of "a queue of worklets handled by 
a single thread".

If a worklet is blocked then all other work performed by that thread is 
blocked as well. So by waiting on a piece of work in the queue, we wait 
for all prior work queued up there as well.

The only way to decouple that and to make them independent (and hence 
independently flushable) is to create more parallel flows of execution: 
i.e. by creating another thread (another workqueue).

	Ingo


* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-26 21:27                 ` Ingo Molnar
@ 2009-01-26 21:35                   ` Andrew Morton
  2009-01-26 21:45                     ` Ingo Molnar
  2009-01-26 21:50                     ` Oleg Nesterov
  0 siblings, 2 replies; 70+ messages in thread
From: Andrew Morton @ 2009-01-26 21:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: oleg, a.p.zijlstra, rusty, travis, mingo, davej, cpufreq, linux-kernel

On Mon, 26 Jan 2009 22:27:27 +0100
Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > > So if it's generic it ought to be implemented in a generic way - not a 
> > > "dont use from any codepath that has a lock held that might 
> > > occasionally also be held in a keventd worklet". (which is a totally 
> > > unmaintainable proposition and which would just cause repeat bugs 
> > > again and again.)
> > 
> > That's different.  The core fault here lies in the keventd workqueue 
> > handling code.  If we're flushing work A then we shouldn't go and block 
> > behind unrelated work B.
> 
> the blocking is inherent in the concept of "a queue of worklets handled by 
> a single thread".
> 
> If a worklet is blocked then all other work performed by that thread is 
> blocked as well. So by waiting on a piece of work in the queue, we wait 
> for all prior work queued up there as well.
> 
> The only way to decouple that and to make them independent (and hence 
> independently flushable) is to create more parallel flows of execution: 
> i.e. by creating another thread (another workqueue).
> 

Nope.  As I said, the caller of flush_work() can detach the work item
and run it directly.


* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-26 21:35                   ` Andrew Morton
@ 2009-01-26 21:45                     ` Ingo Molnar
  2009-01-26 22:01                       ` Andrew Morton
  2009-01-26 21:50                     ` Oleg Nesterov
  1 sibling, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2009-01-26 21:45 UTC (permalink / raw)
  To: Andrew Morton
  Cc: oleg, a.p.zijlstra, rusty, travis, mingo, davej, cpufreq, linux-kernel


* Andrew Morton <akpm@linux-foundation.org> wrote:

> On Mon, 26 Jan 2009 22:27:27 +0100
> Ingo Molnar <mingo@elte.hu> wrote:
> 
> > 
> > * Andrew Morton <akpm@linux-foundation.org> wrote:
> > 
> > > > So if it's generic it ought to be implemented in a generic way - not a 
> > > > "dont use from any codepath that has a lock held that might 
> > > > occasionally also be held in a keventd worklet". (which is a totally 
> > > > unmaintainable proposition and which would just cause repeat bugs 
> > > > again and again.)
> > > 
> > > That's different.  The core fault here lies in the keventd workqueue 
> > > handling code.  If we're flushing work A then we shouldn't go and 
> > > block behind unrelated work B.
> > 
> > the blocking is inherent in the concept of "a queue of worklets 
> > handled by a single thread".
> > 
> > If a worklet is blocked then all other work performed by that thread 
> > is blocked as well. So by waiting on a piece of work in the queue, we 
> > wait for all prior work queued up there as well.
> > 
> > The only way to decouple that and to make them independent (and hence 
> > independently flushable) is to create more parallel flows of 
> > execution: i.e. by creating another thread (another workqueue).
> > 
> 
> Nope.  As I said, the caller of flush_work() can detach the work item 
> and run it directly.

that would change the concept of execution but indeed it would be 
interesting to try. It's outside the scope of late -rcs i guess, but 
worthwhile nevertheless.

	Ingo


* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-26 21:35                   ` Andrew Morton
  2009-01-26 21:45                     ` Ingo Molnar
@ 2009-01-26 21:50                     ` Oleg Nesterov
  2009-01-26 22:17                       ` Ingo Molnar
  1 sibling, 1 reply; 70+ messages in thread
From: Oleg Nesterov @ 2009-01-26 21:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, a.p.zijlstra, rusty, travis, mingo, davej, cpufreq,
	linux-kernel

On 01/26, Andrew Morton wrote:
>
> On Mon, 26 Jan 2009 22:27:27 +0100
> Ingo Molnar <mingo@elte.hu> wrote:
> 
> > 
> > * Andrew Morton <akpm@linux-foundation.org> wrote:
> > 
> > > > So if it's generic it ought to be implemented in a generic way - not a 
> > > > "dont use from any codepath that has a lock held that might 
> > > > occasionally also be held in a keventd worklet". (which is a totally 
> > > > unmaintainable proposition and which would just cause repeat bugs 
> > > > again and again.)
> > > 
> > > That's different.  The core fault here lies in the keventd workqueue 
> > > handling code.  If we're flushing work A then we shouldn't go and block 
> > > behind unrelated work B.
> > 
> > the blocking is inherent in the concept of "a queue of worklets handled by 
> > a single thread".
> > 
> > If a worklet is blocked then all other work performed by that thread is 
> > blocked as well. So by waiting on a piece of work in the queue, we wait 
> > for all prior work queued up there as well.
> > 
> > The only way to decouple that and to make them independent (and hence 
> > independently flushable) is to create more parallel flows of execution: 
> > i.e. by creating another thread (another workqueue).
> > 
> 
> Nope.  As I said, the caller of flush_work() can detach the work item
> and run it directly.

I am totally confused, just can't understand this thread...

I guess we can't detach and execute the work because we may be
running on a different CPU.

But "[PATCH 1/3] work_on_cpu: dont try to get_online_cpus() in work_on_cpu."
removes get_online_cpus/put_online_cpus, this means the work can run on
the wrong CPU anyway. Or work_on_cpu() can hang forever if CPU has already
gone away before queue_work_on().

Confused.

Oleg.



* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-26 21:45                     ` Ingo Molnar
@ 2009-01-26 22:01                       ` Andrew Morton
  2009-01-26 22:05                         ` Ingo Molnar
  2009-01-26 22:15                         ` Oleg Nesterov
  0 siblings, 2 replies; 70+ messages in thread
From: Andrew Morton @ 2009-01-26 22:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: oleg, a.p.zijlstra, rusty, travis, mingo, davej, cpufreq, linux-kernel

On Mon, 26 Jan 2009 22:45:16 +0100
Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > On Mon, 26 Jan 2009 22:27:27 +0100
> > Ingo Molnar <mingo@elte.hu> wrote:
> > 
> > > 
> > > * Andrew Morton <akpm@linux-foundation.org> wrote:
> > > 
> > > > > So if it's generic it ought to be implemented in a generic way - not a 
> > > > > "dont use from any codepath that has a lock held that might 
> > > > > occasionally also be held in a keventd worklet". (which is a totally 
> > > > > unmaintainable proposition and which would just cause repeat bugs 
> > > > > again and again.)
> > > > 
> > > > That's different.  The core fault here lies in the keventd workqueue 
> > > > handling code.  If we're flushing work A then we shouldn't go and 
> > > > block behind unrelated work B.
> > > 
> > > the blocking is inherent in the concept of "a queue of worklets 
> > > handled by a single thread".
> > > 
> > > If a worklet is blocked then all other work performed by that thread 
> > > is blocked as well. So by waiting on a piece of work in the queue, we 
> > > wait for all prior work queued up there as well.
> > > 
> > > The only way to decouple that and to make them independent (and hence 
> > > independently flushable) is to create more parallel flows of 
> > > execution: i.e. by creating another thread (another workqueue).
> > > 
> > 
> > Nope.  As I said, the caller of flush_work() can detach the work item 
> > and run it directly.
> 
> that would change the concept of execution but indeed it would be 
> interesting to try. It's outside the scope of late -rcs i guess, but 
> worthwhile nevertheless.
> 

Well it turns out that I was having a less-than-usually-senile moment:

: commit b89deed32ccc96098bd6bc953c64bba6b847774f
: Author:     Oleg Nesterov <oleg@tv-sign.ru>
: AuthorDate: Wed May 9 02:33:52 2007 -0700
: Commit:     Linus Torvalds <torvalds@woody.linux-foundation.org>
: CommitDate: Wed May 9 12:30:50 2007 -0700
: 
:     implement flush_work()
:     
:     A basic problem with flush_scheduled_work() is that it blocks behind _all_
:     presently-queued works, rather than just the work which the caller wants to
:     flush.  If the caller holds some lock, and if one of the queued work happens
:     to want that lock as well then accidental deadlocks can occur.
:     
:     One example of this is the phy layer: it wants to flush work while holding
:     rtnl_lock().  But if a linkwatch event happens to be queued, the phy code will
:     deadlock because the linkwatch callback function takes rtnl_lock.
:     
:     So we implement a new function which will flush a *single* work - just the one
:     which the caller wants to free up.  Thus we avoid the accidental deadlocks
:     which can arise from unrelated subsystems' callbacks taking shared locks.
:     
:     flush_work() non-blockingly dequeues the work_struct which we want to kill,
:     then it waits for its handler to complete on all CPUs.
:     
:     Add ->current_work to the "struct cpu_workqueue_struct", it points to
:     currently running "struct work_struct". When flush_work(work) detects
:     ->current_work == work, it inserts a barrier at the _head_ of ->worklist
:     (and thus right _after_ that work) and waits for completion. This means
:     that the next work fired on that CPU will be this barrier, or another
:     barrier queued by concurrent flush_work(), so the caller of flush_work()
:     will be woken before any "regular" work has a chance to run.
:     
:     When wait_on_work() unlocks workqueue_mutex (or whatever we choose to protect
:     against CPU hotplug), CPU may go away. But in that case take_over_work() will
:     move a barrier we queued to another CPU, it will be fired sometime, and
:     wait_on_work() will be woken.
:     
:     Actually, we are doing cleanup_workqueue_thread()->kthread_stop() before
:     take_over_work(), so cwq->thread should complete its ->worklist (and thus
:     the barrier), because currently we don't check kthread_should_stop() in
:     run_workqueue(). But even if we did, everything should be ok.


Why isn't that working in this case??



* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-26 22:01                       ` Andrew Morton
@ 2009-01-26 22:05                         ` Ingo Molnar
  2009-01-26 22:16                           ` Andrew Morton
  2009-01-26 22:15                         ` Oleg Nesterov
  1 sibling, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2009-01-26 22:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: oleg, a.p.zijlstra, rusty, travis, mingo, davej, cpufreq, linux-kernel


* Andrew Morton <akpm@linux-foundation.org> wrote:

> Well it turns out that I was having a less-than-usually-senile moment:
> 
> :     implement flush_work()

> Why isn't that working in this case??

how would that work in this case? We defer processing into the workqueue 
exactly because we want its per-CPU properties. We want work_on_cpu() to 
be done in the workqueue context on the CPUs that were specified, not in 
the local CPU context.

	Ingo


* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-26 22:01                       ` Andrew Morton
  2009-01-26 22:05                         ` Ingo Molnar
@ 2009-01-26 22:15                         ` Oleg Nesterov
  2009-01-26 22:24                           ` Ingo Molnar
  1 sibling, 1 reply; 70+ messages in thread
From: Oleg Nesterov @ 2009-01-26 22:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, a.p.zijlstra, rusty, travis, mingo, davej, cpufreq,
	linux-kernel

On 01/26, Andrew Morton wrote:
>
> On Mon, 26 Jan 2009 22:45:16 +0100
> Ingo Molnar <mingo@elte.hu> wrote:
>
> > that would change the concept of execution but indeed it would be 
> > interesting to try. It's outside the scope of late -rcs i guess, but 
> > worthwhile nevertheless.
> >
>
> Well it turns out that I was having a less-than-usually-senile moment:
>
> : commit b89deed32ccc96098bd6bc953c64bba6b847774f
> : Author:     Oleg Nesterov <oleg@tv-sign.ru>
> : AuthorDate: Wed May 9 02:33:52 2007 -0700
> : Commit:     Linus Torvalds <torvalds@woody.linux-foundation.org>
> : CommitDate: Wed May 9 12:30:50 2007 -0700
> : 
> :     implement flush_work()
> :     
> :     A basic problem with flush_scheduled_work() is that it blocks behind _all_
> :     presently-queued works, rather than just the work which the caller wants to
> :     flush.  If the caller holds some lock, and if one of the queued work happens
> :     to want that lock as well then accidental deadlocks can occur.
> :     
> :     One example of this is the phy layer: it wants to flush work while holding
> :     rtnl_lock().  But if a linkwatch event happens to be queued, the phy code will
> :     deadlock because the linkwatch callback function takes rtnl_lock.
> :     
> :     So we implement a new function which will flush a *single* work - just the one
> :     which the caller wants to free up.  Thus we avoid the accidental deadlocks
> :     which can arise from unrelated subsystems' callbacks taking shared locks.
> :     
> :     flush_work() non-blockingly dequeues the work_struct which we want to kill,
> :     then it waits for its handler to complete on all CPUs.
> :     
> :     Add ->current_work to the "struct cpu_workqueue_struct", it points to
> :     currently running "struct work_struct". When flush_work(work) detects
> :     ->current_work == work, it inserts a barrier at the _head_ of ->worklist
> :     (and thus right _after_ that work) and waits for completion. This means
> :     that the next work fired on that CPU will be this barrier, or another
> :     barrier queued by concurrent flush_work(), so the caller of flush_work()
> :     will be woken before any "regular" work has a chance to run.
> :     
> :     When wait_on_work() unlocks workqueue_mutex (or whatever we choose to protect
> :     against CPU hotplug), CPU may go away. But in that case take_over_work() will
> :     move a barrier we queued to another CPU, it will be fired sometime, and
> :     wait_on_work() will be woken.
> :     
> :     Actually, we are doing cleanup_workqueue_thread()->kthread_stop() before
> :     take_over_work(), so cwq->thread should complete its ->worklist (and thus
> :     the barrier), because currently we don't check kthread_should_stop() in
> :     run_workqueue(). But even if we did, everything should be ok.
> 
> 
> Why isn't that working in this case??

Cough. Because that "flush_work()" was renamed to cancel_work_sync(), because
it really cancels the work_struct if it can.

Now we have flush_work() which does not cancel, but waits for completion of
the single work_struct. Of course, it can hang if the caller holds the lock
which can be taken by another work in that workqueue.
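[ The hang Oleg describes is the classic pattern (pseudocode, not a
compilable patch; shared_lock, my_work and unrelated_fn() are illustrative
names): ]

```c
/* Pseudocode of the deadlock: the caller holds a lock and flushes
 * its work on keventd, but an earlier, unrelated work item on the
 * same single-threaded queue wants that lock too. */

static void unrelated_fn(struct work_struct *w)
{
	mutex_lock(&shared_lock);      /* blocks: the flusher holds it */
	/* ... */
	mutex_unlock(&shared_lock);
}

/* the flusher: */
mutex_lock(&shared_lock);
schedule_work(&my_work);
flush_work(&my_work);                  /* keventd is stuck inside
					* unrelated_fn(), so my_work
					* never runs -> deadlock */
mutex_unlock(&shared_lock);
```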

Oleg.



* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-26 22:05                         ` Ingo Molnar
@ 2009-01-26 22:16                           ` Andrew Morton
  2009-01-26 22:20                             ` Ingo Molnar
  2009-01-26 22:31                             ` Oleg Nesterov
  0 siblings, 2 replies; 70+ messages in thread
From: Andrew Morton @ 2009-01-26 22:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: oleg, a.p.zijlstra, rusty, travis, mingo, davej, cpufreq, linux-kernel

On Mon, 26 Jan 2009 23:05:37 +0100
Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > Well it turns out that I was having a less-than-usually-senile moment:
> > 
> > :     implement flush_work()
> 
> > Why isn't that working in this case??
> 
> how would that work in this case? We defer processing into the workqueue 
> exactly because we want its per-CPU properties.

It detaches the work item, moves it to head-of-queue, reinserts it, then
waits on it.  I think.

This might have a race+hole.  If a currently-running "unrelated"
work item tries to take the lock which the flush_work() caller is holding
then there's no way in which keventd will come back to execute
the work item which we just put on the head of queue.

> We want work_on_cpu() to 
> be done in the workqueue context on the CPUs that were specified, not in 
> the local CPU context.

flush_work() is supposed to work in the way which you describe.

But Oleg's "we may be running on a different CPU" comment has me all
confused.



* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-26 21:50                     ` Oleg Nesterov
@ 2009-01-26 22:17                       ` Ingo Molnar
  2009-01-26 23:01                         ` Mike Travis
  2009-01-27  7:15                         ` Rusty Russell
  0 siblings, 2 replies; 70+ messages in thread
From: Ingo Molnar @ 2009-01-26 22:17 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, a.p.zijlstra, rusty, travis, mingo, davej,
	cpufreq, linux-kernel


* Oleg Nesterov <oleg@redhat.com> wrote:

> But "[PATCH 1/3] work_on_cpu: dont try to get_online_cpus() in 
> work_on_cpu." removes get_online_cpus/put_online_cpus, this means the 
> work can run on the wrong CPU anyway. Or work_on_cpu() can hang forever 
> if CPU has already gone away before queue_work_on().
> 
> Confused.

The idea was to require work_on_cpu() users to be CPU hotplug-safe. But 
... Rusty pointed out in the past that this might be fragile, and we 
could put back the get_online_cpus()/put_online_cpus() calls.

Rusty, what do you think?

	Ingo


* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-26 22:16                           ` Andrew Morton
@ 2009-01-26 22:20                             ` Ingo Molnar
  2009-01-26 22:50                               ` Andrew Morton
  2009-01-26 22:31                             ` Oleg Nesterov
  1 sibling, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2009-01-26 22:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: oleg, a.p.zijlstra, rusty, travis, mingo, davej, cpufreq, linux-kernel


* Andrew Morton <akpm@linux-foundation.org> wrote:

> On Mon, 26 Jan 2009 23:05:37 +0100
> Ingo Molnar <mingo@elte.hu> wrote:
> 
> > 
> > * Andrew Morton <akpm@linux-foundation.org> wrote:
> > 
> > > Well it turns out that I was having a less-than-usually-senile moment:
> > > 
> > > :     implement flush_work()
> > 
> > > Why isn't that working in this case??
> > 
> > how would that work in this case? We defer processing into the workqueue 
> > exactly because we want its per-CPU properties.
> 
> It detaches the work item, moves it to head-of-queue, reinserts it then 
> waits on it.  I think.
> 
> This might have a race+hole.  If a currently-running "unrelated" work 
> item tries to take the lock which the flush_work() caller is holding 
> then there's no way in which keventd will come back to execute the work 
> item which we just put on the head of queue.

Correct - or the unrelated worklet might also be blocked on something - so 
the window is rather large.

> > We want work_on_cpu() to be done in the workqueue context on the CPUs 
> > that were specified, not in the local CPU context.
> 
> flush_work() is supposed to work in the way which you describe.
> 
> But Oleg's "we may be running on a different CPU" comment has me all 
> confused.

well, we call this on any arbitrary CPU:

   long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)

To execute fn() on 'cpu'. We converted wacky callers that did direct 
p->cpus_allowed twiddling (and on-stack saving) and set_cpus_allowed() 
calls to this elegant-looking work_on_cpu() call, which promised 
exactly this functionality, but cleanly.
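[ For context, the conversion looks roughly like this (pseudocode with
illustrative names; do_percpu_thing() is not from a real driver): ]

```c
/* Old pattern: migrate *ourselves* to the target CPU and back. */
cpumask_t saved_mask = current->cpus_allowed;       /* on-stack save */
set_cpus_allowed_ptr(current, cpumask_of(cpu));
ret = do_percpu_thing(arg);                         /* now on 'cpu'  */
set_cpus_allowed_ptr(current, &saved_mask);         /* migrate back  */

/* New pattern: let the per-CPU workqueue thread run it for us. */
ret = work_on_cpu(cpu, do_percpu_thing, arg);
```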

	Ingo


* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-26 22:15                         ` Oleg Nesterov
@ 2009-01-26 22:24                           ` Ingo Molnar
  2009-01-26 22:37                             ` Oleg Nesterov
  0 siblings, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2009-01-26 22:24 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, a.p.zijlstra, rusty, travis, mingo, davej,
	cpufreq, linux-kernel


* Oleg Nesterov <oleg@redhat.com> wrote:

> On 01/26, Andrew Morton wrote:
> >
> > On Mon, 26 Jan 2009 22:45:16 +0100
> > Ingo Molnar <mingo@elte.hu> wrote:
> >
> > > that would change the concept of execution but indeed it would be 
> > > interesting to try. It's outside the scope of late -rcs i guess, but 
> > > worthwile nevertheless.
> > >
> >
> > Well it turns out that I was having a less-than-usually-senile moment:
> >
> > : commit b89deed32ccc96098bd6bc953c64bba6b847774f
> > : Author:     Oleg Nesterov <oleg@tv-sign.ru>
> > : AuthorDate: Wed May 9 02:33:52 2007 -0700
> > : Commit:     Linus Torvalds <torvalds@woody.linux-foundation.org>
> > : CommitDate: Wed May 9 12:30:50 2007 -0700
> > : 
> > :     implement flush_work()
> > :     
> > :     A basic problem with flush_scheduled_work() is that it blocks behind _all_
> > :     presently-queued works, rather than just the work which the caller wants to
> > :     flush.  If the caller holds some lock, and if one of the queued work happens
> > :     to want that lock as well then accidental deadlocks can occur.
> > :     
> > :     One example of this is the phy layer: it wants to flush work while holding
> > :     rtnl_lock().  But if a linkwatch event happens to be queued, the phy code will
> > :     deadlock because the linkwatch callback function takes rtnl_lock.
> > :     
> > :     So we implement a new function which will flush a *single* work - just the one
> > :     which the caller wants to free up.  Thus we avoid the accidental deadlocks
> > :     which can arise from unrelated subsystems' callbacks taking shared locks.
> > :     
> > :     flush_work() non-blockingly dequeues the work_struct which we want to kill,
> > :     then it waits for its handler to complete on all CPUs.
> > :     
> > :     Add ->current_work to the "struct cpu_workqueue_struct", it points to
> > :     currently running "struct work_struct". When flush_work(work) detects
> > :     ->current_work == work, it inserts a barrier at the _head_ of ->worklist
> > :     (and thus right _after_ that work) and waits for completition. This means
> > :     that the next work fired on that CPU will be this barrier, or another
> > :     barrier queued by concurrent flush_work(), so the caller of flush_work()
> > :     will be woken before any "regular" work has a chance to run.
> > :     
> > :     When wait_on_work() unlocks workqueue_mutex (or whatever we choose to protect
> > :     against CPU hotplug), CPU may go away. But in that case take_over_work() will
> > :     move a barrier we queued to another CPU, it will be fired sometime, and
> > :     wait_on_work() will be woken.
> > :     
> > :     Actually, we are doing cleanup_workqueue_thread()->kthread_stop() before
> > :     take_over_work(), so cwq->thread should complete its ->worklist (and thus
> > :     the barrier), because currently we don't check kthread_should_stop() in
> > :     run_workqueue(). But even if we did, everything should be ok.
> > 
> > 
> > Why isn't that working in this case??
> 
> Cough. Because that "flush_work()" was renamed to cancel_work_sync(). 
> Because it really cancels the work_struct if it can.
> 
> Now we have flush_work() which does not cancel, but waits for completion 
> of the single work_struct. Of course, it can hang if the caller holds 
> the lock which can be taken by another work in that workqueue.
> 
> Oleg.

Andrew's suggestion does make sense though: for any not-in-progress 
worklet we can dequeue that worklet and execute it in the flushing 
context. [ And if that worklet cannot be dequeued because it's being 
processed then that's fine and we can wait on that single worklet, without 
waiting on any other 'unrelated' worklets. ]

That does not help work_on_cpu() though: that facility really uses the 
fact that workqueues are implemented via per CPU threads - hence we cannot 
remove the worklet from the queue and execute it in the flushing context.

	Ingo


* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-26 22:16                           ` Andrew Morton
  2009-01-26 22:20                             ` Ingo Molnar
@ 2009-01-26 22:31                             ` Oleg Nesterov
  1 sibling, 0 replies; 70+ messages in thread
From: Oleg Nesterov @ 2009-01-26 22:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, a.p.zijlstra, rusty, travis, mingo, davej, cpufreq,
	linux-kernel

On 01/26, Andrew Morton wrote:
>
> On Mon, 26 Jan 2009 23:05:37 +0100
> Ingo Molnar <mingo@elte.hu> wrote:
>
> >
> > * Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > > Well it turns out that I was having a less-than-usually-senile moment:
> > >
> > > :     implement flush_work()
> >
> > > Why isn't that working in this case??
> >
> > how would that work in this case? We defer processing into the workqueue 
> > exactly because we want its per-CPU properties.
>
> It detaches the work item, moves it to head-of-queue, reinserts it then
> waits on it.  I think.

No, no helper works this way.

The reinsert doesn't make sense for cancel_work.

As for flush_work(), I think it is possible to do, but it can't help
avoid the deadlocks, because we still have to wait for ->current_work.

> This might have a race+hole.  If a currently-running "unrelated"
> work item tries to take the lock which the flush_work() caller is holding
> then there's no way in which keventd will come back to execute
> the work item which we just put on the head of queue.

Yes.

> > We want work_on_cpu() to
> > be done in the workqueue context on the CPUs that were specified, not in
> > the local CPU context.
>
> flush_work() is supposed to work in the way which you describe.

Yes,

> But Oleg's "we may be running on a different CPU" comment has me all
> confused.

I meant, that

	> the caller of flush_work() can detach the work item
	> and run it directly.

this is not possible in the work_on_cpu() case: we can't run it directly,
because we want it to run on the target CPU.

Oleg.



* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-26 22:24                           ` Ingo Molnar
@ 2009-01-26 22:37                             ` Oleg Nesterov
  2009-01-26 22:42                               ` Ingo Molnar
  0 siblings, 1 reply; 70+ messages in thread
From: Oleg Nesterov @ 2009-01-26 22:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, a.p.zijlstra, rusty, travis, mingo, davej,
	cpufreq, linux-kernel

On 01/26, Ingo Molnar wrote:
>
> Andrew's suggestion does make sense though: for any not-in-progress
> worklet we can dequeue that worklet and execute it in the flushing
> context. [ And if that worklet cannot be dequeued because it's being
> processed then that's fine and we can wait on that single worklet, without
> waiting on any other 'unrelated' worklets. ]

Yes, sure. This is easy, and I am not sure we need a special helper.
If the caller wants this behaviour, it can do:

	if (cancel_work_sync(work))
		work->func(work);

But flush_work() was specially introduced for the case when we can't
do the above,

> That does not help work_on_cpu() though: that facility really uses the
> fact that workqueues are implemented via per CPU threads - hence we cannot
> remove the worklet from the queue and execute it in the flushing context.

Yes.

Oleg.
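The "cancel it and run it yourself" pattern quoted above can also be sketched in userspace: a flush-by-cancel dequeues the item if it has not started and runs it in the flushing context, which is legal for an ordinary work item but, as the thread notes, not for work_on_cpu(). All names here are hypothetical; this models the pattern, not the kernel API.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* One-slot pending queue for this sketch. */
struct witem {
    void (*fn)(struct witem *);
    pthread_t ran_by;        /* which thread ran fn */
};

static struct witem *slot;
static pthread_mutex_t slot_lock = PTHREAD_MUTEX_INITIALIZER;

static void witem_noop(struct witem *w)
{
    (void)w;                 /* placeholder payload for demonstration */
}

static void queue_item(struct witem *w)
{
    pthread_mutex_lock(&slot_lock);
    slot = w;
    pthread_mutex_unlock(&slot_lock);
}

/* Dequeue iff still pending: the cancel_work_sync() half of the pattern.
 * Returns 1 when we won the race and the caller now owns the item. */
static int cancel_item(struct witem *w)
{
    int got = 0;

    pthread_mutex_lock(&slot_lock);
    if (slot == w) {
        slot = NULL;
        got = 1;
    }
    pthread_mutex_unlock(&slot_lock);
    return got;
}

/* The "if (cancel_work_sync(work)) work->func(work);" idea from the mail:
 * run the item here only when it never started on a worker. */
int flush_by_cancel(struct witem *w)
{
    if (cancel_item(w)) {
        w->ran_by = pthread_self();   /* runs in the flushing context */
        w->fn(w);
        return 1;
    }
    return 0;    /* already running elsewhere; a real flush must wait */
}
```

The return-0 path is where flush_work() earns its keep: once the item is in flight, the only option left is to wait for it, and for work_on_cpu() that in-flight execution on the target CPU is the whole point.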


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-26 22:37                             ` Oleg Nesterov
@ 2009-01-26 22:42                               ` Ingo Molnar
  0 siblings, 0 replies; 70+ messages in thread
From: Ingo Molnar @ 2009-01-26 22:42 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, a.p.zijlstra, rusty, travis, mingo, davej,
	cpufreq, linux-kernel


* Oleg Nesterov <oleg@redhat.com> wrote:

> On 01/26, Ingo Molnar wrote:
> >
> > Andrew's suggestion does make sense though: for any not-in-progress
> > worklet we can dequeue that worklet and execute it in the flushing
> > context. [ And if that worklet cannot be dequeued because it's being
> > processed then that's fine and we can wait on that single worklet, without
> > waiting on any other 'unrelated' worklets. ]
> 
> Yes sure. This is easy, and I am not sure we need the special handler.
> If the caller wants this behaviour, it can do:
> 
> 	if (cancel_work_sync(work))
> 		work->func(work);
> 
> But flush_work() was specially introduced for the case when we can't
> do the above,

it would be better to have this implicit in some wait_for_work() facility 
(which flush_work() really is) - because it is not intuitive to code a 
serialization as a 'cancel + execute open-coded' sequence.

> > That does not help work_on_cpu() though: that facility really uses the 
> > fact that workqueues are implemented via per CPU threads - hence we 
> > cannot remove the worklet from the queue and execute it in the 
> > flushing context.
> 
> Yes.

it's arguably a (mild, albeit elegant) abuse of workqueue internals 
though.

	Ingo

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-26 22:20                             ` Ingo Molnar
@ 2009-01-26 22:50                               ` Andrew Morton
  2009-01-26 22:59                                 ` Ingo Molnar
  0 siblings, 1 reply; 70+ messages in thread
From: Andrew Morton @ 2009-01-26 22:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: oleg, a.p.zijlstra, rusty, travis, mingo, davej, cpufreq, linux-kernel

On Mon, 26 Jan 2009 23:20:02 +0100
Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > On Mon, 26 Jan 2009 23:05:37 +0100
> > Ingo Molnar <mingo@elte.hu> wrote:
> > 
> > > 
> > > * Andrew Morton <akpm@linux-foundation.org> wrote:
> > > 
> > > > Well it turns out that I was having a less-than-usually-senile moment:
> > > > 
> > > > :     implement flush_work()
> > > 
> > > > Why isn't that working in this case??
> > > 
> > > how would that work in this case? We defer processing into the workqueue 
> > > exactly because we want its per-CPU properties.
> > 
> > It detaches the work item, moves it to head-of-queue, reinserts it then 
> > waits on it.  I think.
> > 
> > This might have a race+hole.  If a currently-running "unrelated" work 
> > item tries to take the lock which the flush_work() caller is holding 
> > then there's no way in which keventd will come back to execute the work 
> > item which we just put on the head of queue.
> 
> Correct - or the unrelated worklet might also be blocked on something - so 
> the window is rather large.
> 

hm, OK, that sucks.

But the deadlock still exists with Rusty's patches, doesn't it?  We
still have a single kernel thread per CPU processing all the unrelated
work_on_cpu() callers.  All we've done is to decouple work_on_cpu()
from the keventd queue.

If correct, we'd need to create a gaggle of kernel threads on each call
to work_on_cpu(), which doesn't sound nice.

A more efficient but trickier approach would be to create kernel
threads within flush_work(), with which to run the CPU-specific
worklet.  We only need to do that in the case where the CPU's keventd
thread was off doing something and might deadlock, which will be rare. 
If the keventd was just parked waiting for something to do then we can
safely feed it the to-be-flushed work item for immediate processing.

It'd be saner to just say "don't call work_on_cpu() while holding locks" :(
I bet there's some lockdep infrastructure which we could peek into to
add the assertion check...

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-26 22:50                               ` Andrew Morton
@ 2009-01-26 22:59                                 ` Ingo Molnar
  2009-01-26 23:42                                   ` Andrew Morton
  0 siblings, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2009-01-26 22:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: oleg, a.p.zijlstra, rusty, travis, mingo, davej, cpufreq, linux-kernel


* Andrew Morton <akpm@linux-foundation.org> wrote:

> On Mon, 26 Jan 2009 23:20:02 +0100
> Ingo Molnar <mingo@elte.hu> wrote:
> 
> > 
> > * Andrew Morton <akpm@linux-foundation.org> wrote:
> > 
> > > On Mon, 26 Jan 2009 23:05:37 +0100
> > > Ingo Molnar <mingo@elte.hu> wrote:
> > > 
> > > > 
> > > > * Andrew Morton <akpm@linux-foundation.org> wrote:
> > > > 
> > > > > Well it turns out that I was having a less-than-usually-senile moment:
> > > > > 
> > > > > :     implement flush_work()
> > > > 
> > > > > Why isn't that working in this case??
> > > > 
> > > > how would that work in this case? We defer processing into the workqueue 
> > > > exactly because we want its per-CPU properties.
> > > 
> > > It detaches the work item, moves it to head-of-queue, reinserts it then 
> > > waits on it.  I think.
> > > 
> > > This might have a race+hole.  If a currently-running "unrelated" work 
> > > item tries to take the lock which the flush_work() caller is holding 
> > > then there's no way in which keventd will come back to execute the work 
> > > item which we just put on the head of queue.
> > 
> > Correct - or the unrelated worklet might also be blocked on something - so 
> > the window is rather large.
> > 
> 
> hm, OK, that sucks.
> 
> But the deadlock still exists with Rusty's patches, doesn't it?  We 
> still have a single kernel thread per CPU processing all the unrelated 
> work_on_cpu() callers.  All we've done is to decouple work_on_cpu() from 
> the keventd queue.

This particular deadlock does not exist - but you are indeed right that 
similar types of 'unrelated' interactions might exist in the future, as 
the usage of this facility is extended.

> If correct, we'd need to create a gaggle of kernel threads on each call 
> to work_on_cpu(), which doesn't sound nice.
> 
> A more efficient but trickier approach would be to create kernel threads 
> within flush_work(), with which to run the CPU-specific worklet.  We 
> only need to do that in the case where the CPU's keventd thread was off 
> doing something and might deadlock, which will be rare. If the keventd 
> was just parked waiting for something to do then we can safely feed it 
> the to-be-flushed work item for immediate processing.

i think what you describe is a variant of the syslet thread pool ;-)

> It'd be saner to just say "don't call work_on_cpu() while holding locks"
> :( I bet there's some lockdep infrastructure which we could peek into to
> add the assertion check...

The problem isn't doing the assertions - lockdep already covers workqueue 
dependencies very efficiently.

The problem is the intrinsic utility of work_on_cpu(): we _really_ want 
such a generic facility to be usable from any (blockable) context, just 
like on_each_cpu(func, info) does for atomic functions, without 
restrictions on locking context.

	Ingo

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-26 22:17                       ` Ingo Molnar
@ 2009-01-26 23:01                         ` Mike Travis
  2009-01-27  0:09                           ` Oleg Nesterov
  2009-01-27  7:15                         ` Rusty Russell
  1 sibling, 1 reply; 70+ messages in thread
From: Mike Travis @ 2009-01-26 23:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Oleg Nesterov, Andrew Morton, a.p.zijlstra, rusty, mingo, davej,
	cpufreq, linux-kernel

Ingo Molnar wrote:
> * Oleg Nesterov <oleg@redhat.com> wrote:
> 
>> But "[PATCH 1/3] work_on_cpu: dont try to get_online_cpus() in 
>> work_on_cpu." removes get_online_cpus/put_online_cpus, this means the 
>> work can run on the wrong CPU anyway. Or work_on_cpu() can hang forever 
>> if CPU has already gone away before queue_work_on().
>>
>> Confused.
> 
> The idea was to require work_on_cpu() users to be CPU hotplug-safe. But 
... Rusty pointed out in the past that this might be fragile, and we 
> could put back the get_online_cpus()/put_online_cpus() calls.
> 
> Rusty, what do you think?
> 
> 	Ingo


I believe that is the intention, in that the caller should ensure that
the cpu does not go offline.  But also as Rusty stated, the previous usages
of set_cpus_allowed did not always ensure this, so it's at least not a
regression.

I'll put it on my todo list to check the references in tip/cpus4096 to see
where they stand on the get_online_cpus() issue.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-26 22:59                                 ` Ingo Molnar
@ 2009-01-26 23:42                                   ` Andrew Morton
  2009-01-26 23:53                                     ` Ingo Molnar
  0 siblings, 1 reply; 70+ messages in thread
From: Andrew Morton @ 2009-01-26 23:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: oleg, a.p.zijlstra, rusty, travis, mingo, davej, cpufreq, linux-kernel

On Mon, 26 Jan 2009 23:59:57 +0100
Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > On Mon, 26 Jan 2009 23:20:02 +0100
> > Ingo Molnar <mingo@elte.hu> wrote:
> > 
> > > 
> > > * Andrew Morton <akpm@linux-foundation.org> wrote:
> > > 
> > > > On Mon, 26 Jan 2009 23:05:37 +0100
> > > > Ingo Molnar <mingo@elte.hu> wrote:
> > > > 
> > > > > 
> > > > > * Andrew Morton <akpm@linux-foundation.org> wrote:
> > > > > 
> > > > > > Well it turns out that I was having a less-than-usually-senile moment:
> > > > > > 
> > > > > > :     implement flush_work()
> > > > > 
> > > > > > Why isn't that working in this case??
> > > > > 
> > > > > how would that work in this case? We defer processing into the workqueue 
> > > > > exactly because we want its per-CPU properties.
> > > > 
> > > > It detaches the work item, moves it to head-of-queue, reinserts it then 
> > > > waits on it.  I think.
> > > > 
> > > > This might have a race+hole.  If a currently-running "unrelated" work 
> > > > item tries to take the lock which the flush_work() caller is holding 
> > > > then there's no way in which keventd will come back to execute the work 
> > > > item which we just put on the head of queue.
> > > 
> > > Correct - or the unrelated worklet might also be blocked on something - so 
> > > the window is rather large.
> > > 
> > 
> > hm, OK, that sucks.
> > 
> > But the deadlock still exists with Rusty's patches, doesn't it?  We 
> > still have a single kernel thread per CPU processing all the unrelated 
> > work_on_cpu() callers.  All we've done is to decouple work_on_cpu() from 
> > the keventd queue.
> 
> This particular deadlock does not exist - but you are indeed right that 
> similar types of 'unrelated' interactions might exist in the future, as 
> the usage of this facility is extended.

So... can we call the new kernel threads [kbandaidd]?

> > If correct, we'd need to create a gaggle of kernel threads on each call 
> > to work_on_cpu(), which doesn't sound nice.
> > 
> > A more efficient but trickier approach would be to create kernel threads 
> > within flush_work(), with which to run the CPU-specific worklet.  We 
> > only need to do that in the case where the CPU's keventd thread was off 
> > doing something and might deadlock, which will be rare. If the keventd 
> > was just parked waiting for something to do then we can safely feed it 
> > the to-be-flushed work item for immediate processing.
> 
> i think what you describe is a variant of the syslet thread pool ;-)

It's fairly simple.  The slightly tricky bit will be ensuring that
keventd will reliably swallow the work item which we just fed it,
before anything else.  But I guess the spinlock will be sufficient there.
But I don't think we should do any of this.

> > It'd be saner to just say "don't call work_on_cpu() while holding locks"
> > :( I bet there's some lockdep infrastructure which we could peek into to
> > add the assertion check...
> 
> The problem isn't doing the assertions - lockdep already covers workqueue 
> dependencies very efficiently.
> 
> The problem is the intrinsic utility of work_on_cpu(): we _really_ want 
> such a generic facility to be usable from any (blockable) context, just 
> like on_each_cpu(func, info) does for atomic functions, without 
> restrictions on locking context.

Do we?  work_on_cpu() is some last-gasp oh-i-screwed-my-code-up thing. 
We _really_ want people to use on_each_cpu()!

We should bust a gut to keep the number of callers to the
resource-intensive (deadlocky!) work_on_cpu() to a minimum.

(And to think that adding add_timer_on() creeped me out).


hm.  None of that was very helpful.  How to move forward?

I think I disagree that work_on_cpu() should be made into some robust,
smiled-upon core kernel facility.  It _is_ slow, it _is_ deadlockable. 
It should be positioned as something which is only used as a last
resort.  And if you _have_ to use it, sort out your locking!

Plus the number of code sites which want to fiddle with other CPUs in
this manner will always be small.  cpufreq, MCE, irq-affinity, things
like that.

What is the deadlock in acpi-cpufreq?  Which lock, and who is the
"other" holder of that lock?

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-26 23:42                                   ` Andrew Morton
@ 2009-01-26 23:53                                     ` Ingo Molnar
  2009-01-27  0:42                                       ` Andrew Morton
  0 siblings, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2009-01-26 23:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: oleg, a.p.zijlstra, rusty, travis, mingo, davej, cpufreq, linux-kernel


* Andrew Morton <akpm@linux-foundation.org> wrote:

> > The problem is the intrinsic utility of work_on_cpu(): we _really_ 
> > want such a generic facility to be usable from any (blockable) 
> > context, just like on_each_cpu(func, info) does for atomic functions, 
> > without restrictions on locking context.
> 
> Do we?  work_on_cpu() is some last-gasp oh-i-screwed-my-code-up thing. 
> We _really_ want people to use on_each_cpu()!

why? on_each_cpu() is limited and runs in IRQ context. Is there a 
requirement that worklets need to be atomic?

> We should bust a gut to keep the number of callers to the 
> resource-intensive (deadlocky!) work_on_cpu() to a minimum.

i wouldn't call +10K 'resource intensive'.

> (And to think that adding add_timer_on() creeped me out).
> 
> hm.  None of that was very helpful.  How to move forward?
> 
> I think I disagree that work_on_cpu() should be made into some robust, 
> smiled-upon core kernel facility.  It _is_ slow, it _is_ deadlockable. 

uhm, why is it slow? It could be faster in fact in some cases: the main 
overhead in on_each_cpu() is having to wait for the IPIs - with a thread 
based approach if the other CPUs are idle we can get an IPI-less wakeup.

> It should be positioned as something which is only used as a last 
> resort.  And if you _have_ to use it, sort out your locking!
> 
> Plus the number of code sites which want to fiddle with other CPUs in 
> this manner will always be small.  cpufreq, MCE, irq-affinity, things 
> like that.
> 
> What is the deadlock in acpi-cpufreq?  Which lock, and who is the 
> "other" holder of that lock?

a quick look suggests that it's dbs_mutex.

	Ingo

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-26 23:01                         ` Mike Travis
@ 2009-01-27  0:09                           ` Oleg Nesterov
  0 siblings, 0 replies; 70+ messages in thread
From: Oleg Nesterov @ 2009-01-27  0:09 UTC (permalink / raw)
  To: Mike Travis
  Cc: Ingo Molnar, Andrew Morton, a.p.zijlstra, rusty, mingo, davej,
	cpufreq, linux-kernel

On 01/26, Mike Travis wrote:
>
> Ingo Molnar wrote:
> > * Oleg Nesterov <oleg@redhat.com> wrote:
> >
> >> But "[PATCH 1/3] work_on_cpu: dont try to get_online_cpus() in
> >> work_on_cpu." removes get_online_cpus/put_online_cpus, this means the
> >> work can run on the wrong CPU anyway. Or work_on_cpu() can hang forever
> >> if CPU has already gone away before queue_work_on().
> >>
> >> Confused.
> >
> > The idea was to require work_on_cpu() users to be CPU hotplug-safe. But
> > ... Rusty pointed it out in the past that this might be fragile, and we
> > could put back the get_online_cpus()/put_online_cpus() calls.
> >
> > Rusty, what do you think?
> >
> > 	Ingo
>
>
> I believe that is the intention, in that the caller should ensure that
> the cpu does not go offline.  But also as Rusty stated, the previous usages
> of set_cpus_allowed did not always ensure this, so it's at least not a
> regression.

Not sure I understand.

arch/x86/kernel/cpu/mcheck/mce_amd_64.c:store_interrupt_enable() can
race with cpu_down(), but at worst work_on_cpu() returns -EINVAL.

However, after the 1/3 patch we can hang forever.

Yes, afaics the code was not correct before it was converted to use
work_on_cpu(). But now it becomes wrong again?

Oleg.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-26 23:53                                     ` Ingo Molnar
@ 2009-01-27  0:42                                       ` Andrew Morton
  0 siblings, 0 replies; 70+ messages in thread
From: Andrew Morton @ 2009-01-27  0:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: oleg, a.p.zijlstra, rusty, travis, mingo, davej, cpufreq, linux-kernel

On Tue, 27 Jan 2009 00:53:31 +0100
Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > > The problem is the intrinsic utility of work_on_cpu(): we _really_ 
> > > want such a generic facility to be usable from any (blockable) 
> > > context, just like on_each_cpu(func, info) does for atomic functions, 
> > > without restrictions on locking context.
> > 
> > Do we?  work_on_cpu() is some last-gasp oh-i-screwed-my-code-up thing. 
> > We _really_ want people to use on_each_cpu()!
> 
> why? on_each_cpu() is limited and runs in IRQ context.

It's worked OK for the great majority of callers.

> Is there a 
> requirement that worklets need to be atomic?

Blocking leads to deadlocks.

> > We should bust a gut to keep the number of callers to the 
> > resource-intensive (deadlocky!) work_on_cpu() to a minimum.
> 
> i wouldn't call +10K 'resource intensive'.

per CPU.  Plus there's the `ps aux | wth?' effect.

We've busted a gut over far, far less.

Plus the bugfixed, undeadlockable version will be more expensive still.

> > (And to think that adding add_timer_on() creeped me out).
> > 
> > hm.  None of that was very helpful.  How to move forward?
> > 
> > I think I disagree that work_on_cpu() should be made into some robust, 
> > smiled-upon core kernel facility.  It _is_ slow, it _is_ deadlockable. 
> 
> uhm, why is it slow? It could be faster in fact in some cases: the main 
> overhead in on_each_cpu() is having to wait for the IPIs - with a thread 
> based approach if the other CPUs are idle we can get an IPI-less wakeup.

spose so, if the CPU can do mwait?  If the CPU was idle, etc.  If a CPU
was busy then the call could take a long time.

> > It should be positioned as something which is only used as a last 
> > resort.  And if you _have_ to use it, sort out your locking!
> > 
> > Plus the number of code sites which want to fiddle with other CPUs in 
> > this manner will always be small.  cpufreq, MCE, irq-affinity, things 
> > like that.
> > 
> > What is the deadlock in acpi-cpufreq?  Which lock, and who is the 
> > "other" holder of that lock?
> 
> a quick look suggests that it's dbs_mutex.
> 

Can't see it.

In fact all work_on_cpu() handlers in
arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c appear to be atomic. 
Couldn't the whole thing be converted to use smp_call_function_many()?


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-26  7:01       ` Andrew Morton
  2009-01-26 17:16         ` Ingo Molnar
@ 2009-01-27  7:05         ` Rusty Russell
  2009-01-27  7:25           ` Andrew Morton
  1 sibling, 1 reply; 70+ messages in thread
From: Rusty Russell @ 2009-01-27  7:05 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Mike Travis, Ingo Molnar, Dave Jones, cpufreq, linux-kernel

On Monday 26 January 2009 17:31:30 Andrew Morton wrote:
> On Mon, 26 Jan 2009 17:11:43 +1030 Rusty Russell <rusty@rustcorp.com.au> wrote:
> 
> > On Saturday 24 January 2009 18:45:37 Andrew Morton wrote:
> > > Pity the poor reader who comes along trying to work out why this exists.
> 
> (chirp, chirp)

I disagree.  It's simple; we create a workqueue and we use it.  There's no
confusion here.

> > None of these options are appealing...
> 
> Can we try harder please?  10 screenfuls of kernel threads in the ps
> output is just irritating.
> 
> How about banning the use of work_on_cpu() from schedule_work()
> handlers and then fixing that driver somehow?

Again, that's how we got here in the first place.  I didn't realize the
twisty path by which the acpi cpufreq code could be called.  And there
may well be others.  So I want work_on_cpu to be completely generic.

Using the existing infrastructure seemed like the right thing.

But since you asked nicely, I'll come up with something cleverer.  It'll
take some serious testing though.

> What _is_ the bug anyway?

There is no "bug".  See the commit message.

> The only description we were given was
> 
>   Impact: remove potential clashes with generic kevent workqueue
> 
>   Annoyingly, some places we want to use work_on_cpu are already in
>   workqueues.  As per Ingo's suggestion, we create a different
>   workqueue for work_on_cpu.
> 
> which didn't bother telling anyone squat.

Well, it could have referred to the currently-known case:
arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c should be using work_on_cpu,
but we reverted the change because of this problem.

But it's a general comment about fixing a general issue.  The currently
known case is not directly relevant; that it can happen and it's restricting
the use of this otherwise-general API is.

> When was this bug added?  Was it added into that driver or was it due
> to infrastructural changes?

I'm not hiding a bug from you.  Really.

A little confused at all this vitriol,
Rusty.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-26 22:17                       ` Ingo Molnar
  2009-01-26 23:01                         ` Mike Travis
@ 2009-01-27  7:15                         ` Rusty Russell
  2009-01-27 17:55                           ` Oleg Nesterov
  1 sibling, 1 reply; 70+ messages in thread
From: Rusty Russell @ 2009-01-27  7:15 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Oleg Nesterov, Andrew Morton, a.p.zijlstra, travis, mingo, davej,
	cpufreq, linux-kernel

On Tuesday 27 January 2009 08:47:29 Ingo Molnar wrote:
> 
> * Oleg Nesterov <oleg@redhat.com> wrote:
> 
> > But "[PATCH 1/3] work_on_cpu: dont try to get_online_cpus() in 
> > work_on_cpu." removes get_online_cpus/put_online_cpus, this means the 
> > work can run on the wrong CPU anyway. Or work_on_cpu() can hang forever 
> > if CPU has already gone away before queue_work_on().
> > 
> > Confused.
> 
> The idea was to require work_on_cpu() users to be CPU hotplug-safe. But 
> ... Rusty pointed it out in the past that this might be fragile, and we 
> could put back the get_online_cpus()/put_online_cpus() calls.

Old code used to do:

	tmp = current->cpus_allowed;
	set_cpus_allowed(current, cpumask_of_cpu(cpu));
	function(arg);
	set_cpus_allowed(current, tmp);

We replaced it with:

	work_on_cpu(cpu, function, arg);

I thought I'd be clever and reliably check that the cpu they asked for
was online inside work_on_cpu.  Leading to locking problems.  But if they
didn't previously ensure cpu hotplug didn't happen, they were buggy already,
so I took out the check and hence the hotplug lock.

So we're no *worse* than we were before, but yes, an audit would probably
lead to fixes.

Hope that clarifies?
Rusty.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-27  7:05         ` Rusty Russell
@ 2009-01-27  7:25           ` Andrew Morton
  2009-01-27 15:28             ` Ingo Molnar
  2009-01-28 13:02             ` Rusty Russell
  0 siblings, 2 replies; 70+ messages in thread
From: Andrew Morton @ 2009-01-27  7:25 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Mike Travis, Ingo Molnar, Dave Jones, cpufreq, linux-kernel

On Tue, 27 Jan 2009 17:35:11 +1030 Rusty Russell <rusty@rustcorp.com.au> wrote:

> On Monday 26 January 2009 17:31:30 Andrew Morton wrote:
> > On Mon, 26 Jan 2009 17:11:43 +1030 Rusty Russell <rusty@rustcorp.com.au> wrote:
> > 
> > > On Saturday 24 January 2009 18:45:37 Andrew Morton wrote:
> > > > Pity the poor reader who comes along trying to work out why this exists.
> > 
> > (chirp, chirp)
> 
> I disagree.  It's simple; we create a workqueue and we use it.  There's no
> confusion here.

Reader's first and most important question is "why does this exist".

> > > None of these options are appealing...
> > 
> > Can we try harder please?  10 screenfuls of kernel threads in the ps
> > output is just irritating.
> > 
> > How about banning the use of work_on_cpu() from schedule_work()
> > handlers and then fixing that driver somehow?
> 
> Again, that's how we got here in the first place.  I didn't realize the
> twisty path by which the acpi cpufreq code could be called.  And there
> may well be others.  So I want work_on_cpu to be completely generic.

But it isn't generic.  The patch just moved the deadlock from one queue
to another.  Making work_on_cpu() truly generic is quite hard!

> But it's a general comment about fixing a general issue.  The currently
> known case is not directly relevant; that it can happen and it's restricting
> the use of this otherwise-general API is.

I think we should switch acpi-cpufreq to smp_call_function(), revert
this stuff and ban the calling of work_on_cpu() under locks.

> A little confused at all this vitriol,

Well let's see.

- it was badly changelogged

- it was badly commented

- it's slow.  In many ways, including the unnecessary serialisation
  of each cross-cpu call in acpi-cpufreq.

- it consumes a tremendous amount of resources just to fix some
  acpi locking snafu

- it adds yet another zillion kernel threads

- it's still deadlockable

- it got sent to Linus while still under active discussion

- and it got merged

- Oleg is the usual workqueue developer and wasn't even cc'ed.

- I am the usual workqueue reviewer/merger (and would prefer to remain
  thus, please) and I wasn't cc'ed either.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-27  7:25           ` Andrew Morton
@ 2009-01-27 15:28             ` Ingo Molnar
  2009-01-27 16:51               ` Andrew Morton
  2009-01-28 13:02             ` Rusty Russell
  1 sibling, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2009-01-27 15:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rusty Russell, Mike Travis, Ingo Molnar, Dave Jones, cpufreq,
	linux-kernel


* Andrew Morton <akpm@linux-foundation.org> wrote:

> > But it's a general comment about fixing a general issue.  The 
> > currently known case is not directly relevant; that it can happen and 
> > it's restricting the use of this otherwise-general API is.
> 
> I think we should switch acpi-cpufreq to smp_call_function(), revert 
> this stuff and ban the calling of work_on_cpu() under locks.

I agree that do_drv_read()/write() should be converted to 
smp_call_function() (what it does is atomic: msr or PIO cycles).

Then work_on_cpu() can be removed for good, to not lure people into using 
it. You seem to agree that work_on_cpu() is unfixable so it's far better 
to offer nothing than to offer such a deceivingly named but fundamentally 
limited facility.

	Ingo

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-27 15:28             ` Ingo Molnar
@ 2009-01-27 16:51               ` Andrew Morton
  0 siblings, 0 replies; 70+ messages in thread
From: Andrew Morton @ 2009-01-27 16:51 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rusty Russell, Mike Travis, Ingo Molnar, Dave Jones, cpufreq,
	linux-kernel

On Tue, 27 Jan 2009 16:28:30 +0100 Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > > But it's a general comment about fixing a general issue.  The 
> > > currently known case is not directly relevant; that it can happen and 
> > > it's restricting the use of this otherwise-general API is.
> > 
> > I think we should switch acpi-cpufreq to smp_call_function(), revert 
> > this stuff and ban the calling of work_on_cpu() under locks.
> 
> I agree that do_drv_read()/write() should be converted to 
> smp_call_function() (what it does is atomic: msr or PIO cycles).
> 
> Then work_on_cpu() can be removed for good, to not lure people into using 
> it. You seem to agree that work_on_cpu() is unfixable so it's far better 
> to offer nothing than to offer such a deceivingly named but fundamentally 
> limited facility.
> 

Well, I don't think it's unfixable.  But a full fix would, I think,
require a kernel thread for each callback invocation.  As discussed
earlier, this could be optimised to only create the new kernel thread
if the keventd thread is presently off doing something else.

Is work_on_cpu() valuable enough to justify doing all that?  Dunno.
It appears to have six callers in three drivers at present, which is
quite a large number.

Or perhaps there's a smarter way of fixing it all.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-27  7:15                         ` Rusty Russell
@ 2009-01-27 17:55                           ` Oleg Nesterov
  0 siblings, 0 replies; 70+ messages in thread
From: Oleg Nesterov @ 2009-01-27 17:55 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Ingo Molnar, Andrew Morton, a.p.zijlstra, travis, mingo, davej,
	cpufreq, linux-kernel

On 01/27, Rusty Russell wrote:
>
> On Tuesday 27 January 2009 08:47:29 Ingo Molnar wrote:
> >
> > * Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > > But "[PATCH 1/3] work_on_cpu: dont try to get_online_cpus() in
> > > work_on_cpu." removes get_online_cpus/put_online_cpus, this means the
> > > work can run on the wrong CPU anyway. Or work_on_cpu() can hang forever
> > > if CPU has already gone away before queue_work_on().
> > >
> > > Confused.
> >
> > The idea was to require work_on_cpu() users to be CPU hotplug-safe. But
> > ... Rusty pointed it out in the past that this might be fragile, and we
> > could put back the get_online_cpus()/put_online_cpus() calls.
>
> Old code used to do:
>
> 	tmp = current->cpus_allowed;
> 	set_cpus_allowed(current, cpumask_of_cpu(cpu));
> 	function(arg);
> 	set_cpus_allowed(current, tmp);
>
> We replaced it with:
>
> 	work_on_cpu(cpu, function, arg);
>
> I thought I'd be clever and reliably check that the cpu they asked for
> was online inside work_on_cpu.  Leading to locking problems.  But if they
> didn't previously ensure cpu hotplug didn't happen, they were buggy already,
> so I took out the check and hence the hotplug lock.
>
> So we're no *worse* than we were before, but yes, an audit would probably
> lead to fixes.

I agree, we are no worse than we were before.

But I can't understand why we can't be better ;)

If we add the special workqueue for work_on_cpu() (patch 2/3), then why
do we need [PATCH 1/3] ?

IOW, given that the 2nd patch adds the special wq, which locking problems
solves the 1st patch?

Perhaps I missed something, and the patches are questionable anyway,
but I think these 2 patches are "conflicting".

If we use a special wq, then work_on_cpu()->get_online_cpus() is fine,
it can't deadlock with cpu_hotplug_begin().

No?

Oleg.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-27  7:25           ` Andrew Morton
  2009-01-27 15:28             ` Ingo Molnar
@ 2009-01-28 13:02             ` Rusty Russell
  2009-01-28 17:19               ` Mike Travis
  2009-01-28 19:44               ` Andrew Morton
  1 sibling, 2 replies; 70+ messages in thread
From: Rusty Russell @ 2009-01-28 13:02 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Mike Travis, Ingo Molnar, Dave Jones, cpufreq, linux-kernel

On Tuesday 27 January 2009 17:55:19 Andrew Morton wrote:
[ lots of good stuff ]

> Making work_on_cpu() truly generic is quite hard!

Yes.  What do you think of this workqueue-less approach?

Subject: work_on_cpu: non-workqueue solution.

work_on_cpu was designed to replace the meme of:

	cpumask_t saved = current->cpus_allowed;
	set_cpus_allowed(cpumask_of(dst));
	    ... do something on cpu dst ...
	set_cpus_allowed(saved);

It should have been a trivial routine, and a trivial conversion.

The original version did a get_online_cpus(), then put the work on the
kevent queue, called flush_work(), then put_online_cpus().  See
2d3854a37e8b767a51aba38ed6d22817b0631e33.

Several locking warnings resulted from different users being converted
over the months this patch was used leading up to 2.6.29.  One was
fixed to use smp_call_function_single
(b2bb85549134c005e997e5a7ed303bda6a1ae738), but it's always risky to
convert code which was running in user context into interrupt context.
When another one was reported, we dropped the get_online_cpus() and
relied on the callers to do that (as they should have been doing
before) as seen in 31ad9081200c06ccc350625d41d1f8b2d1cef29f.

But there was still a locking issue with
arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c, so that conversion was
reverted in Ingo's tree.  Turns out that the conversion caused
work_on_cpu to be called from inside keventd in some paths.

The obvious solution was to change to using our own workqueue, so we
could use it in ACPI and so it would actually be a generically useful
function as intended.

Andrew Morton complained about NR_CPUS new threads.  Which I happen to
agree with.  (He also bitched about the comment being too short, so
this one is a fucking novel).

While I was at linux.conf.au and avoiding reading my mail, a long
discussion ensued (See lkml "[PATCH 2/3] work_on_cpu: Use our own
workqueue."), including useful advice such as not ever grabbing a lock
inside a work function.  Or something; it gets a bit confusing.

To make the pain stop, this attempts to create work_on_cpu without
using workqueues at all.  It does create one new thread.  But as long
as you don't call work_on_cpu inside work_on_cpu, or create normal
locking inversions, it should work just peachy.

FIXME: It includes work_on_cpu.h from workqueue.h to avoid chasing all
the users, but they should include it directly.

I tested it lightly (as seen in this patch), but my test machine
doesn't hit any work_on_cpu paths.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

diff --git a/include/linux/work_on_cpu.h b/include/linux/work_on_cpu.h
new file mode 100644
--- /dev/null
+++ b/include/linux/work_on_cpu.h
@@ -0,0 +1,13 @@
+#ifndef _LINUX_WORK_ON_CPU_H
+#define _LINUX_WORK_ON_CPU_H
+
+#ifndef CONFIG_SMP
+static inline long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
+{
+	return fn(arg);
+}
+#else
+long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg);
+#endif /* CONFIG_SMP */
+
+#endif /* _LINUX_WORK_ON_CPU_H */
diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -9,6 +9,7 @@
 #include <linux/linkage.h>
 #include <linux/bitops.h>
 #include <linux/lockdep.h>
+#include <linux/work_on_cpu.h>
 #include <asm/atomic.h>
 
 struct workqueue_struct;
@@ -251,13 +252,4 @@ void cancel_rearming_delayed_work(struct
 {
 	cancel_delayed_work_sync(work);
 }
-
-#ifndef CONFIG_SMP
-static inline long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
-{
-	return fn(arg);
-}
-#else
-long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg);
-#endif /* CONFIG_SMP */
 #endif
diff --git a/kernel/Makefile b/kernel/Makefile
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -92,6 +92,7 @@ obj-$(CONFIG_FUNCTION_TRACER) += trace/
 obj-$(CONFIG_FUNCTION_TRACER) += trace/
 obj-$(CONFIG_TRACING) += trace/
 obj-$(CONFIG_SMP) += sched_cpupri.o
+obj-$(CONFIG_SMP) += work_on_cpu.o
 
 ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y)
 # According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is
diff --git a/kernel/work_on_cpu.c b/kernel/work_on_cpu.c
new file mode 100644
--- /dev/null
+++ b/kernel/work_on_cpu.c
@@ -0,0 +1,108 @@
+#include <linux/work_on_cpu.h>
+#include <linux/kthread.h>
+#include <linux/mutex.h>
+#include <linux/completion.h>
+#include <linux/cpumask.h>
+#include <linux/module.h>
+
+/* The thread waits for new work on this waitqueue. */
+static DECLARE_WAIT_QUEUE_HEAD(woc_wq);
+/* The lock ensures only one job is done at a time. */
+static DEFINE_MUTEX(woc_mutex);
+
+/* The details of the current job. */
+struct work_for_cpu {
+	unsigned int cpu;
+	long (*fn)(void *);
+	void *arg;
+	long ret;
+	struct completion done;
+};
+
+/* The pointer to the current job.  NULL if nothing pending. */
+static struct work_for_cpu *current_work;
+
+/* We leave our thread on whatever cpu it was on last.  We can get
+ * batted onto another CPU by move_task_off_dead_cpu if that cpu goes
+ * down, but the caller of work_on_cpu() was supposed to ensure that
+ * doesn't happen, so it should only happen when idle. */
+static int do_work_on_cpu(void *unused)
+{
+	for (;;) {
+		struct completion *done;
+
+		wait_event(woc_wq, current_work != NULL);
+
+		set_cpus_allowed_ptr(current, cpumask_of(current_work->cpu));
+		WARN_ON(smp_processor_id() != current_work->cpu);
+
+		current_work->ret = current_work->fn(current_work->arg);
+		/* Make sure ret is set before we complete().  Paranoia. */
+		wmb();
+
+		/* Reset current_work so we don't spin. */
+		done = &current_work->done;
+		current_work = NULL;
+
+		/* Reset current_work for next work_on_cpu(). */
+		complete(done);
+	}
+}
+
+/**
+ * work_on_cpu - run a function in user context on a particular cpu
+ * @cpu: the cpu to run on
+ * @fn: the function to run
+ * @arg: the function arg
+ *
+ * This will return the value @fn returns.
+ * It is up to the caller to ensure that the cpu doesn't go offline.
+ */
+long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
+{
+	struct work_for_cpu work;
+
+	work.cpu = cpu;
+	work.fn = fn;
+	work.arg = arg;
+	init_completion(&work.done);
+
+	mutex_lock(&woc_mutex);
+	/* Make sure all is in place before it sees fn set. */
+	wmb();
+	current_work = &work;
+	wake_up(&woc_wq);
+
+	wait_for_completion(&work.done);
+	BUG_ON(current_work);
+	mutex_unlock(&woc_mutex);
+
+	return work.ret;
+}
+EXPORT_SYMBOL_GPL(work_on_cpu);
+
+#if 1
+static long test_fn(void *arg)
+{
+	printk("%u: %lu\n", smp_processor_id(), (long)arg);
+	return (long)arg + 100;
+}
+#endif
+
+static int __init init(void)
+{
+	unsigned int i;
+
+	kthread_run(do_work_on_cpu, NULL, "kwork_on_cpu");
+
+#if 1
+	for_each_online_cpu(i) {
+		long ret = work_on_cpu(i, test_fn, (void *)i);
+		printk("CPU %i returned %li\n", i, ret);
+		BUG_ON(ret != i + 100);
+	}
+#endif
+
+	return 0;
+}
+module_init(init);
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -970,47 +970,6 @@ undo:
 	return ret;
 }
 
-#ifdef CONFIG_SMP
-static struct workqueue_struct *work_on_cpu_wq __read_mostly;
-
-struct work_for_cpu {
-	struct work_struct work;
-	long (*fn)(void *);
-	void *arg;
-	long ret;
-};
-
-static void do_work_for_cpu(struct work_struct *w)
-{
-	struct work_for_cpu *wfc = container_of(w, struct work_for_cpu, work);
-
-	wfc->ret = wfc->fn(wfc->arg);
-}
-
-/**
- * work_on_cpu - run a function in user context on a particular cpu
- * @cpu: the cpu to run on
- * @fn: the function to run
- * @arg: the function arg
- *
- * This will return the value @fn returns.
- * It is up to the caller to ensure that the cpu doesn't go offline.
- */
-long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
-{
-	struct work_for_cpu wfc;
-
-	INIT_WORK(&wfc.work, do_work_for_cpu);
-	wfc.fn = fn;
-	wfc.arg = arg;
-	queue_work_on(cpu, work_on_cpu_wq, &wfc.work);
-	flush_work(&wfc.work);
-
-	return wfc.ret;
-}
-EXPORT_SYMBOL_GPL(work_on_cpu);
-#endif /* CONFIG_SMP */
-
 void __init init_workqueues(void)
 {
 	alloc_cpumask_var(&cpu_populated_map, GFP_KERNEL);
@@ -1021,8 +980,4 @@ void __init init_workqueues(void)
 	hotcpu_notifier(workqueue_cpu_callback, 0);
 	keventd_wq = create_workqueue("events");
 	BUG_ON(!keventd_wq);
-#ifdef CONFIG_SMP
-	work_on_cpu_wq = create_workqueue("work_on_cpu");
-	BUG_ON(!work_on_cpu_wq);
-#endif
 }

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-28 13:02             ` Rusty Russell
@ 2009-01-28 17:19               ` Mike Travis
  2009-01-28 17:32                 ` Mike Travis
  2009-01-28 19:44               ` Andrew Morton
  1 sibling, 1 reply; 70+ messages in thread
From: Mike Travis @ 2009-01-28 17:19 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Andrew Morton, Ingo Molnar, Dave Jones, cpufreq, linux-kernel

Hi Rusty,

I'm testing this now on x86_64 and one question comes up.  The
initialization of the woc_wq thread happens quite late.  Might it
be better to initialize it earlier?

(I haven't tested all work_on_cpu callers yet, plus there are a
couple that are waiting in the queue that I'm also testing.)

Oops, one problem just popped up.  After running a couple of
offline/online tests, this came up:

[ 1080.185022] INFO: task kwork_on_cpu:491 blocked for more than 480 seconds.
[ 1080.205849] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1080.229745] kwork_on_cpu  D 0000000000000000  3432   491      2
[ 1080.229750]  ffff88012e043ec0 0000000000000046 000000002e043e60 ffffffff81d5ef00
[ 1080.229754]  ffffffff81d5ef00 ffffffff81d5ef00 ffffffff81d5ef00 ffffffff81d5ef00
[ 1080.229757]  ffffffff81d5ef00 ffffffff81d58580 ffffffff81d5ef00 ffff88012e03ab40
[ 1080.229760] Call Trace:
[ 1080.229770]  [<ffffffff8106e241>] ? trace_hardirqs_on+0xd/0xf
[ 1080.229774]  [<ffffffff810a6bc5>] do_work_on_cpu+0x72/0x13d
[ 1080.229778]  [<ffffffff8105d994>] ? autoremove_wake_function+0x0/0x3d
[ 1080.229781]  [<ffffffff810a6b53>] ? do_work_on_cpu+0x0/0x13d
[ 1080.229783]  [<ffffffff810a6b53>] ? do_work_on_cpu+0x0/0x13d
[ 1080.229786]  [<ffffffff8105d809>] kthread+0x4e/0x7d
[ 1080.229790]  [<ffffffff8100d7ba>] child_rip+0xa/0x20
[ 1080.229793]  [<ffffffff8100d1bc>] ? restore_args+0x0/0x30
[ 1080.229795]  [<ffffffff8105d796>] ? kthreadd+0x167/0x18c
[ 1080.229798]  [<ffffffff8105d7bb>] ? kthread+0x0/0x7d
[ 1080.229801]  [<ffffffff8100d7b0>] ? child_rip+0x0/0x20
[ 1080.229802] INFO: lockdep is turned off.

Thanks,
Mike

Rusty Russell wrote:
> On Tuesday 27 January 2009 17:55:19 Andrew Morton wrote:
> [ lots of good stuff ]
> 
>> Making work_on_cpu() truly generic is quite hard!
> 
> Yes.  What do you think of this workqueue-less approach?
> 
> Subject: work_on_cpu: non-workqueue solution.
> 
> work_on_cpu was designed to replace the meme of:
> 
> 	cpumask_t saved = current->cpus_allowed;
> 	set_cpus_allowed(cpumask_of(dst));
> 	    ... do something on cpu dst ...
> 	set_cpus_allowed(saved);
> 
> It should have been a trivial routine, and a trivial conversion.
> 
> The original version did a get_online_cpus(), then put the work on the
> kevent queue, called flush_work(), then put_online_cpus().  See
> 2d3854a37e8b767a51aba38ed6d22817b0631e33.
> 
> Several locking warnings resulted from different users being converted
> over the months this patch was used leading up to 2.6.29.  One was
> fixed to use smp_call_function_single
> (b2bb85549134c005e997e5a7ed303bda6a1ae738), but it's always risky to
> convert code which was running in user context into interrupt context.
> When another one was reported, we dropped the get_online_cpus() and
> relied on the callers to do that (as they should have been doing
> before) as seen in 31ad9081200c06ccc350625d41d1f8b2d1cef29f.
> 
> But there was still a locking issue with
> arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c, so that conversion was
> reverted in Ingo's tree.  Turns out that the conversion caused
> work_on_cpu to be called from inside keventd in some paths.
> 
> The obvious solution was to change to using our own workqueue, so we
> could use it in ACPI and so it would actually be a generically useful
> function as intended.
> 
> Andrew Morton complained about NR_CPUS new threads.  Which I happen to
> agree with.  (He also bitched about the comment being too short, so
> this one is a fucking novel).
> 
> While I was at linux.conf.au and avoiding reading my mail, a long
> discussion ensued (See lkml "[PATCH 2/3] work_on_cpu: Use our own
> workqueue."), including useful advice such as not ever grabbing a lock
> inside a work function.  Or something; it gets a bit confusing.
> 
> To make the pain stop, this attempts to create work_on_cpu without
> using workqueues at all.  It does create one new thread.  But as long
> as you don't call work_on_cpu inside work_on_cpu, or create normal
> locking inversions, it should work just peachy.
> 
> FIXME: It includes work_on_cpu.h from workqueue.h to avoid chasing all
> the users, but they should include it directly.
> 
> I tested it lightly (as seen in this patch), but my test machine
> doesn't hit any work_on_cpu paths.
> 
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> 
> diff --git a/include/linux/work_on_cpu.h b/include/linux/work_on_cpu.h
> new file mode 100644
> --- /dev/null
> +++ b/include/linux/work_on_cpu.h
> @@ -0,0 +1,13 @@
> +#ifndef _LINUX_WORK_ON_CPU_H
> +#define _LINUX_WORK_ON_CPU_H
> +
> +#ifndef CONFIG_SMP
> +static inline long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
> +{
> +	return fn(arg);
> +}
> +#else
> +long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg);
> +#endif /* CONFIG_SMP */
> +
> +#endif /* _LINUX_WORK_ON_CPU_H */
> diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
> --- a/include/linux/workqueue.h
> +++ b/include/linux/workqueue.h
> @@ -9,6 +9,7 @@
>  #include <linux/linkage.h>
>  #include <linux/bitops.h>
>  #include <linux/lockdep.h>
> +#include <linux/work_on_cpu.h>
>  #include <asm/atomic.h>
>  
>  struct workqueue_struct;
> @@ -251,13 +252,4 @@ void cancel_rearming_delayed_work(struct
>  {
>  	cancel_delayed_work_sync(work);
>  }
> -
> -#ifndef CONFIG_SMP
> -static inline long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
> -{
> -	return fn(arg);
> -}
> -#else
> -long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg);
> -#endif /* CONFIG_SMP */
>  #endif
> diff --git a/kernel/Makefile b/kernel/Makefile
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -92,6 +92,7 @@ obj-$(CONFIG_FUNCTION_TRACER) += trace/
>  obj-$(CONFIG_FUNCTION_TRACER) += trace/
>  obj-$(CONFIG_TRACING) += trace/
>  obj-$(CONFIG_SMP) += sched_cpupri.o
> +obj-$(CONFIG_SMP) += work_on_cpu.o
>  
>  ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y)
>  # According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is
> diff --git a/kernel/work_on_cpu.c b/kernel/work_on_cpu.c
> new file mode 100644
> --- /dev/null
> +++ b/kernel/work_on_cpu.c
> @@ -0,0 +1,108 @@
> +#include <linux/work_on_cpu.h>
> +#include <linux/kthread.h>
> +#include <linux/mutex.h>
> +#include <linux/completion.h>
> +#include <linux/cpumask.h>
> +#include <linux/module.h>
> +
> +/* The thread waits for new work on this waitqueue. */
> +static DECLARE_WAIT_QUEUE_HEAD(woc_wq);
> +/* The lock ensures only one job is done at a time. */
> +static DEFINE_MUTEX(woc_mutex);
> +
> +/* The details of the current job. */
> +struct work_for_cpu {
> +	unsigned int cpu;
> +	long (*fn)(void *);
> +	void *arg;
> +	long ret;
> +	struct completion done;
> +};
> +
> +/* The pointer to the current job.  NULL if nothing pending. */
> +static struct work_for_cpu *current_work;
> +
> +/* We leave our thread on whatever cpu it was on last.  We can get
> + * batted onto another CPU by move_task_off_dead_cpu if that cpu goes
> + * down, but the caller of work_on_cpu() was supposed to ensure that
> + * doesn't happen, so it should only happen when idle. */
> +static int do_work_on_cpu(void *unused)
> +{
> +	for (;;) {
> +		struct completion *done;
> +
> +		wait_event(woc_wq, current_work != NULL);
> +
> +		set_cpus_allowed_ptr(current, cpumask_of(current_work->cpu));
> +		WARN_ON(smp_processor_id() != current_work->cpu);
> +
> +		current_work->ret = current_work->fn(current_work->arg);
> +		/* Make sure ret is set before we complete().  Paranoia. */
> +		wmb();
> +
> +		/* Reset current_work so we don't spin. */
> +		done = &current_work->done;
> +		current_work = NULL;
> +
> +		/* Reset current_work for next work_on_cpu(). */
> +		complete(done);
> +	}
> +}
> +
> +/**
> + * work_on_cpu - run a function in user context on a particular cpu
> + * @cpu: the cpu to run on
> + * @fn: the function to run
> + * @arg: the function arg
> + *
> + * This will return the value @fn returns.
> + * It is up to the caller to ensure that the cpu doesn't go offline.
> + */
> +long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
> +{
> +	struct work_for_cpu work;
> +
> +	work.cpu = cpu;
> +	work.fn = fn;
> +	work.arg = arg;
> +	init_completion(&work.done);
> +
> +	mutex_lock(&woc_mutex);
> +	/* Make sure all is in place before it sees fn set. */
> +	wmb();
> +	current_work = &work;
> +	wake_up(&woc_wq);
> +
> +	wait_for_completion(&work.done);
> +	BUG_ON(current_work);
> +	mutex_unlock(&woc_mutex);
> +
> +	return work.ret;
> +}
> +EXPORT_SYMBOL_GPL(work_on_cpu);
> +
> +#if 1
> +static long test_fn(void *arg)
> +{
> +	printk("%u: %lu\n", smp_processor_id(), (long)arg);
> +	return (long)arg + 100;
> +}
> +#endif
> +
> +static int __init init(void)
> +{
> +	unsigned int i;
> +
> +	kthread_run(do_work_on_cpu, NULL, "kwork_on_cpu");
> +
> +#if 1
> +	for_each_online_cpu(i) {
> +		long ret = work_on_cpu(i, test_fn, (void *)i);
> +		printk("CPU %i returned %li\n", i, ret);
> +		BUG_ON(ret != i + 100);
> +	}
> +#endif
> +
> +	return 0;
> +}
> +module_init(init);
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -970,47 +970,6 @@ undo:
>  	return ret;
>  }
>  
> -#ifdef CONFIG_SMP
> -static struct workqueue_struct *work_on_cpu_wq __read_mostly;
> -
> -struct work_for_cpu {
> -	struct work_struct work;
> -	long (*fn)(void *);
> -	void *arg;
> -	long ret;
> -};
> -
> -static void do_work_for_cpu(struct work_struct *w)
> -{
> -	struct work_for_cpu *wfc = container_of(w, struct work_for_cpu, work);
> -
> -	wfc->ret = wfc->fn(wfc->arg);
> -}
> -
> -/**
> - * work_on_cpu - run a function in user context on a particular cpu
> - * @cpu: the cpu to run on
> - * @fn: the function to run
> - * @arg: the function arg
> - *
> - * This will return the value @fn returns.
> - * It is up to the caller to ensure that the cpu doesn't go offline.
> - */
> -long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
> -{
> -	struct work_for_cpu wfc;
> -
> -	INIT_WORK(&wfc.work, do_work_for_cpu);
> -	wfc.fn = fn;
> -	wfc.arg = arg;
> -	queue_work_on(cpu, work_on_cpu_wq, &wfc.work);
> -	flush_work(&wfc.work);
> -
> -	return wfc.ret;
> -}
> -EXPORT_SYMBOL_GPL(work_on_cpu);
> -#endif /* CONFIG_SMP */
> -
>  void __init init_workqueues(void)
>  {
>  	alloc_cpumask_var(&cpu_populated_map, GFP_KERNEL);
> @@ -1021,8 +980,4 @@ void __init init_workqueues(void)
>  	hotcpu_notifier(workqueue_cpu_callback, 0);
>  	keventd_wq = create_workqueue("events");
>  	BUG_ON(!keventd_wq);
> -#ifdef CONFIG_SMP
> -	work_on_cpu_wq = create_workqueue("work_on_cpu");
> -	BUG_ON(!work_on_cpu_wq);
> -#endif
>  }
>  


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-28 17:19               ` Mike Travis
@ 2009-01-28 17:32                 ` Mike Travis
  2009-01-29 10:39                   ` Rusty Russell
  0 siblings, 1 reply; 70+ messages in thread
From: Mike Travis @ 2009-01-28 17:32 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Andrew Morton, Ingo Molnar, Dave Jones, cpufreq, linux-kernel

Mike Travis wrote:
> Hi Rusty,
> 
> I'm testing this now on x86_64 and one question comes up.  The
> initialization of the woc_wq thread happens quite late.  Might it
> be better to initialize it earlier?

Umm, definitely needed earlier...  A bug catcher caught this.  Work_on_cpu
is being called before it's initialized.

[   16.541297] calling  microcode_init+0x0/0x13a @ 1
[   16.555989] ------------[ cut here ]------------
[   16.556955] kernel BUG at .../kernel/work_on_cpu.c:67!
[   16.556955] invalid opcode: 0000 [#1] SMP
[   16.556955] last sysfs file:
[   16.556955] CPU 5
[   16.556955] Pid: 1, comm: swapper Tainted: G        W  2.6.29-rc1-4k-defconfig.01280914-00263-g0ee2ad8-dirty #100
[   16.556955] RIP: 0010:[<ffffffff810a69df>]  [<ffffffff810a69df>] work_on_cpu+0x3d/0xeb
[   16.556955] RSP: 0018:ffff88012fac9b60  EFLAGS: 00010202
[   16.556955] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[   16.556955] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
[   16.556955] RBP: ffff88012fac9c00 R08: 0000000000000000 R09: ffff88012fac9b80
[   16.556955] R10: ffffffff8187b778 R11: 0000000081049799 R12: 0000000000000000
[   16.556955] R13: ffffffff81026313 R14: 0000000000000000 R15: 0000000000000000
[   16.556955] FS:  0000000000000000(0000) GS:ffff88022ec3a000(0000) knlGS:0000000000000000
[   16.556955] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[   16.556955] CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006a0
[   16.556955] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   16.556955] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   16.556955] Process swapper (pid: 1, threadinfo ffff88012fac8000, task ffff88022faa8000)
[   16.556955] Stack:
[   16.556955]  000000002fac9bb0 ffffffff810260a6 000000002e0d75f0 0000000000000002
[   16.556955]  2222222222222222 2222222222222222 2222222222222222 2222222222222222
[   16.556955]  2222222222222222 0000000000000000 ffff88012fac9bd0 0000000000000000
[   16.556955] Call Trace:
[   16.556955]  [<ffffffff810260a6>] ? microcode_init_cpu+0x1e/0x3e
[   16.556955]  [<ffffffff810260b6>] microcode_init_cpu+0x2e/0x3e
[   16.556955]  [<ffffffff8102620c>] mc_sysdev_add+0x67/0x71
[   16.556955]  [<ffffffff812ffde8>] sysdev_driver_register+0xcb/0x125
[   16.556955]  [<ffffffff81a5bc32>] microcode_init+0xb7/0x13a
[   16.556955]  [<ffffffff81a5bb7b>] ? microcode_init+0x0/0x13a
[   16.556955]  [<ffffffff81009075>] _stext+0x75/0x17f
[   16.556955]  [<ffffffff8106dc2c>] ? mark_lock+0x1c/0x369
[   16.556955]  [<ffffffff8106f0bc>] ? __lock_acquire+0xba0/0xbc1
[   16.556955]  [<ffffffff812444a1>] ? ida_get_new_above+0x171/0x18d
[   16.556955]  [<ffffffff8112d764>] ? proc_register+0x184/0x198
[   16.556955]  [<ffffffff8125b54e>] ? _raw_spin_unlock+0xc7/0xf6
[   16.556955]  [<ffffffff815ce60b>] ? _spin_unlock+0x2b/0x2f
[   16.556955]  [<ffffffff8112d764>] ? proc_register+0x184/0x198
[   16.556955]  [<ffffffff8112d895>] ? create_proc_entry+0x79/0x8f
[   16.556955]  [<ffffffff81094705>] ? register_irq_proc+0xb3/0xcf
[   16.556955]  [<ffffffff81120000>] ? load_elf_binary+0xc38/0x1aa8
[   16.556955]  [<ffffffff81a4a902>] kernel_init+0x14b/0x1a1
[   16.556955]  [<ffffffff8100d7ba>] child_rip+0xa/0x20
[   16.556955]  [<ffffffff8100d1bc>] ? restore_args+0x0/0x30
[   16.556955]  [<ffffffff81a4a7b7>] ? kernel_init+0x0/0x1a1
[   16.556955]  [<ffffffff8100d7b0>] ? child_rip+0x0/0x20
[   16.556955] Code: 00 00 31 db 83 3d 75 24 50 01 00 41 89 fc 49 89 f5 49 89 d6 48 c7 c7 e8 d8 96 81 0f 94 c3 31 d2 89 de e8 8e f1 ff ff 85 db 74 04 <0f> 0b eb fe 48 8d 9d 60 ff ff ff 44 89 a5 60 ff ff ff 4c 89 ad
[   16.556955] RIP  [<ffffffff810a69df>] work_on_cpu+0x3d/0xeb
[   16.556955]  RSP <ffff88012fac9b60>
[   17.493250] ---[ end trace 4eaa2a86a8e2da23 ]---
[   17.507376] swapper used greatest stack depth: 3776 bytes left
[   17.525131] Kernel panic - not syncing: Attempted to kill init!




^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-28 13:02             ` Rusty Russell
  2009-01-28 17:19               ` Mike Travis
@ 2009-01-28 19:44               ` Andrew Morton
  2009-01-29  1:43                 ` Rusty Russell
  1 sibling, 1 reply; 70+ messages in thread
From: Andrew Morton @ 2009-01-28 19:44 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Mike Travis, Ingo Molnar, Dave Jones, cpufreq, linux-kernel

On Wed, 28 Jan 2009 23:32:28 +1030 Rusty Russell <rusty@rustcorp.com.au> wrote:

> +static int do_work_on_cpu(void *unused)
> +{
> +	for (;;) {
> +		struct completion *done;
> +
> +		wait_event(woc_wq, current_work != NULL);
> +
> +		set_cpus_allowed_ptr(current, cpumask_of(current_work->cpu));
> +		WARN_ON(smp_processor_id() != current_work->cpu);
> +
> +		current_work->ret = current_work->fn(current_work->arg);
> +		/* Make sure ret is set before we complete().  Paranoia. */
> +		wmb();
> +
> +		/* Reset current_work so we don't spin. */
> +		done = &current_work->done;
> +		current_work = NULL;
> +
> +		/* Reset current_work for next work_on_cpu(). */
> +		complete(done);
> +	}
> +}
> +
> +/**
> + * work_on_cpu - run a function in user context on a particular cpu
> + * @cpu: the cpu to run on
> + * @fn: the function to run
> + * @arg: the function arg
> + *
> + * This will return the value @fn returns.
> + * It is up to the caller to ensure that the cpu doesn't go offline.
> + */
> +long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
> +{
> +	struct work_for_cpu work;
> +
> +	work.cpu = cpu;
> +	work.fn = fn;
> +	work.arg = arg;
> +	init_completion(&work.done);
> +
> +	mutex_lock(&woc_mutex);
> +	/* Make sure all is in place before it sees fn set. */
> +	wmb();
> +	current_work = &work;
> +	wake_up(&woc_wq);
> +
> +	wait_for_completion(&work.done);
> +	BUG_ON(current_work);
> +	mutex_unlock(&woc_mutex);
> +
> +	return work.ret;
> +}

We still have a queue - it's implicit now, rather than explicit.

It's vulnerable to the same deadlock, I think?  Suppose we have:

- A lock, L

- A callback function which takes that lock, called function_which_takes_L()

- A task A which does work_on_cpu(function_which_takes_L)

- A task B which does

	lock(L);
	work_on_cpu(something_else);


Now,

- A calls work_on_cpu() and takes woc_mutex.

- Before function_which_takes_L() has started to execute, task B takes L
  then calls work_on_cpu() and task B blocks on woc_mutex.

- Now function_which_takes_L() runs, and blocks on L

Nothing else happens...

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-28 19:44               ` Andrew Morton
@ 2009-01-29  1:43                 ` Rusty Russell
  2009-01-29  2:12                   ` Andrew Morton
  0 siblings, 1 reply; 70+ messages in thread
From: Rusty Russell @ 2009-01-29  1:43 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Mike Travis, Ingo Molnar, Dave Jones, cpufreq, linux-kernel

On Thursday 29 January 2009 06:14:40 Andrew Morton wrote:
> It's vulnerable to the same deadlock, I think?  Suppose we have:
...
> - A calls work_on_cpu() and takes woc_mutex.
> 
> - Before function_which_takes_L() has started to execute, task B takes L
>   then calls work_on_cpu() and task B blocks on woc_mutex.
> 
> - Now function_which_takes_L() runs, and blocks on L

Agreed, but now it's a fairly simple case.  Both sides have to take lock L, and both have to call work_on_cpu.

Workqueues are more generic and widespread, and an amazing amount of stuff gets called from them.  That's why I felt uncomfortable with removing the one known problematic caller.

Thanks,
Rusty.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-29  1:43                 ` Rusty Russell
@ 2009-01-29  2:12                   ` Andrew Morton
  2009-01-30  6:03                     ` Rusty Russell
  0 siblings, 1 reply; 70+ messages in thread
From: Andrew Morton @ 2009-01-29  2:12 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Mike Travis, Ingo Molnar, Dave Jones, cpufreq, linux-kernel

On Thu, 29 Jan 2009 12:13:32 +1030 Rusty Russell <rusty@rustcorp.com.au> wrote:

> On Thursday 29 January 2009 06:14:40 Andrew Morton wrote:
> > It's vulnerable to the same deadlock, I think?  Suppose we have:
> ...
> > - A calls work_on_cpu() and takes woc_mutex.
> > 
> > - Before function_which_takes_L() has started to execute, task B takes L
> >   then calls work_on_cpu() and task B blocks on woc_mutex.
> > 
> > - Now function_which_takes_L() runs, and blocks on L
> 
> Agreed, but now it's a fairly simple case.  Both sides have to take lock L, and both have to call work_on_cpu.
> 
> Workqueues are more generic and widespread, and an amazing amount of stuff gets called from them.  That's why I felt uncomfortable with removing the one known problematic caller.
> 

hm.  it's a bit of a timebomb.

y'know, the original way in which acpi-cpufreq did this is starting to
look attractive.  Migrate self to that CPU then just call the dang
function.  Slow, but no deadlocks (I think)?



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-28 17:32                 ` Mike Travis
@ 2009-01-29 10:39                   ` Rusty Russell
  0 siblings, 0 replies; 70+ messages in thread
From: Rusty Russell @ 2009-01-29 10:39 UTC (permalink / raw)
  To: Mike Travis; +Cc: Andrew Morton, Ingo Molnar, Dave Jones, cpufreq, linux-kernel

On Thursday 29 January 2009 04:02:42 Mike Travis wrote:
> Mike Travis wrote:
> > Hi Rusty,
> > 
> > I'm testing this now on x86_64 and one question comes up.  The
> > initialization of the woc_wq thread happens quite late.  Might it
> > be better to initialize it earlier?
> 
> Umm, definitely needed earlier...  A bug catcher caught this.  Work_on_cpu
> is being called before it's initialized.
>
> [   16.541297] calling  microcode_init+0x0/0x13a @ 1

OK, core_initcall will be sufficient to call before this one.

I also want to change the code so that the affinity is set from work_on_cpu rather than the thread itself; it's slightly more efficient.

Here's a patch-on-top.

work_on_cpu: bug fix and enhancements

Make it a core_initcall, since a module_initcall needs it.

Also, make the caller set the affinity of the worker thread: this is
more efficient than setting our own affinity (which requires the
migration thread's help).

This has two side effects:
1) We will oops if work_on_cpu is called too early,
2) We can WARN_ON and just run on the wrong cpu rather than locking up if
   they ask for an offline cpu (bug-compatible with the old method of
   calling set_cpus_allowed()).

Test code exercises WARN_ON; you probably want to remove it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

diff --git a/kernel/work_on_cpu.c b/kernel/work_on_cpu.c
--- a/kernel/work_on_cpu.c
+++ b/kernel/work_on_cpu.c
@@ -5,6 +5,10 @@
 #include <linux/cpumask.h>
 #include <linux/module.h>
 
+#define DEBUG
+
+/* The thread which actually does the work. */
+static struct task_struct *woc_thread;
 /* The thread waits for new work on this waitqueue. */
 static DECLARE_WAIT_QUEUE_HEAD(woc_wq);
 /* The lock ensures only one job is done at a time. */
@@ -12,7 +16,9 @@ static DEFINE_MUTEX(woc_mutex);
 
 /* The details of the current job. */
 struct work_for_cpu {
+#ifdef DEBUG
 	unsigned int cpu;
+#endif
 	long (*fn)(void *);
 	void *arg;
 	long ret;
@@ -33,8 +39,9 @@ static int do_work_on_cpu(void *unused)
 
 		wait_event(woc_wq, current_work != NULL);
 
-		set_cpus_allowed_ptr(current, cpumask_of(current_work->cpu));
+#ifdef DEBUG
 		WARN_ON(smp_processor_id() != current_work->cpu);
+#endif
 
 		current_work->ret = current_work->fn(current_work->arg);
 		/* Make sure ret is set before we complete().  Paranoia. */
@@ -62,12 +69,21 @@ long work_on_cpu(unsigned int cpu, long 
 {
 	struct work_for_cpu work;
 
+#ifdef DEBUG
 	work.cpu = cpu;
+#endif
 	work.fn = fn;
 	work.arg = arg;
 	init_completion(&work.done);
 
 	mutex_lock(&woc_mutex);
+	if (set_cpus_allowed_ptr(woc_thread, cpumask_of(cpu)) != 0) {
+		WARN(1, "work_on_cpu on offline cpu %i?\n", cpu);
+#ifdef DEBUG
+		/* Avoids the additional WARN_ON in the thread. */
+		work.cpu = task_cpu(woc_thread);
+#endif
+	}
 	/* Make sure all is in place before it sees fn set. */
 	wmb();
 	current_work = &work;
@@ -81,7 +97,7 @@ long work_on_cpu(unsigned int cpu, long 
 }
 EXPORT_SYMBOL_GPL(work_on_cpu);
 
-#if 1
+#ifdef DEBUG
 static long test_fn(void *arg)
 {
 	printk("%u: %lu\n", smp_processor_id(), (long)arg);
@@ -93,16 +109,16 @@ static int __init init(void)
 {
 	unsigned int i;
 
-	kthread_run(do_work_on_cpu, NULL, "kwork_on_cpu");
+	woc_thread = kthread_run(do_work_on_cpu, NULL, "kwork_on_cpu");
 
-#if 1
-	for_each_online_cpu(i) {
+#ifdef DEBUG
+	for_each_possible_cpu(i) {
 		long ret = work_on_cpu(i, test_fn, (void *)i);
 		printk("CPU %i returned %li\n", i, ret);
-		BUG_ON(ret != i + 100);
+		BUG_ON(cpu_online(i) && ret != i + 100);
 	}
 #endif
 
 	return 0;
 }
-module_init(init);
+core_initcall(init);

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-29  2:12                   ` Andrew Morton
@ 2009-01-30  6:03                     ` Rusty Russell
  2009-01-30  6:30                       ` Andrew Morton
  0 siblings, 1 reply; 70+ messages in thread
From: Rusty Russell @ 2009-01-30  6:03 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Mike Travis, Ingo Molnar, Dave Jones, cpufreq, linux-kernel

On Thursday 29 January 2009 12:42:05 Andrew Morton wrote:
> On Thu, 29 Jan 2009 12:13:32 +1030 Rusty Russell <rusty@rustcorp.com.au> wrote:
> 
> > On Thursday 29 January 2009 06:14:40 Andrew Morton wrote:
> > > It's vulnerable to the same deadlock, I think?  Suppose we have:
> > ...
> > > - A calls work_on_cpu() and takes woc_mutex.
> > > 
> > > - Before function_which_takes_L() has started to execute, task B takes L
> > >   then calls work_on_cpu() and task B blocks on woc_mutex.
> > > 
> > > - Now function_which_takes_L() runs, and blocks on L
> > 
> > Agreed, but now it's a fairly simple case.  Both sides have to take lock L, and both have to call work_on_cpu.
> > 
> > Workqueues are more generic and widespread, and an amazing amount of stuff gets called from them.  That's why I felt uncomfortable with removing the one known problematic caller.
> > 
> 
> hm.  it's a bit of a timebomb.
> 
> y'know, the original way in which acpi-cpufreq did this is starting to
> look attractive.  Migrate self to that CPU then just call the dang
> function.  Slow, but no deadlocks (I think)?

Just buggy.  What random thread was it mugging?  If there's any path where
it's not a kthread, what if userspace does the same thing at the same time?
We risk running on the wrong cpu, *then* overriding userspace when we restore
it.

In general these cpumask games are a bad idea.

Rusty.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-30  6:03                     ` Rusty Russell
@ 2009-01-30  6:30                       ` Andrew Morton
  2009-01-30 13:49                         ` Ingo Molnar
  2009-01-30 21:59                         ` Rusty Russell
  0 siblings, 2 replies; 70+ messages in thread
From: Andrew Morton @ 2009-01-30  6:30 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Mike Travis, Ingo Molnar, Dave Jones, cpufreq, linux-kernel

On Fri, 30 Jan 2009 16:33:53 +1030 Rusty Russell <rusty@rustcorp.com.au> wrote:

> On Thursday 29 January 2009 12:42:05 Andrew Morton wrote:
> > On Thu, 29 Jan 2009 12:13:32 +1030 Rusty Russell <rusty@rustcorp.com.au> wrote:
> > 
> > > On Thursday 29 January 2009 06:14:40 Andrew Morton wrote:
> > > > It's vulnerable to the same deadlock, I think?  Suppose we have:
> > > ...
> > > > - A calls work_on_cpu() and takes woc_mutex.
> > > > 
> > > > - Before function_which_takes_L() has started to execute, task B takes L
> > > >   then calls work_on_cpu() and task B blocks on woc_mutex.
> > > > 
> > > > - Now function_which_takes_L() runs, and blocks on L
> > > 
> > > Agreed, but now it's a fairly simple case.  Both sides have to take lock L, and both have to call work_on_cpu.
> > > 
> > > Workqueues are more generic and widespread, and an amazing amount of stuff gets called from them.  That's why I felt uncomfortable with removing the one known problematic caller.
> > > 
> > 
> > hm.  it's a bit of a timebomb.
> > 
> > y'know, the original way in which acpi-cpufreq did this is starting to
> > look attractive.  Migrate self to that CPU then just call the dang
> > function.  Slow, but no deadlocks (I think)?
> 
> Just buggy.  What random thread was it mugging?  If there's any path where
> it's not a kthread, what if userspace does the same thing at the same time?
> We risk running on the wrong cpu, *then* overriding userspace when we restore
> it.

hm, OK, not unfixable but not pleasant.

> In general these cpumask games are a bad idea.

So we still don't have any non-buggy proposal.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-30  6:30                       ` Andrew Morton
@ 2009-01-30 13:49                         ` Ingo Molnar
  2009-01-30 17:08                           ` Andrew Morton
  2009-01-30 21:59                         ` Rusty Russell
  1 sibling, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2009-01-30 13:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rusty Russell, Mike Travis, Ingo Molnar, Dave Jones, cpufreq,
	linux-kernel


* Andrew Morton <akpm@linux-foundation.org> wrote:

> On Fri, 30 Jan 2009 16:33:53 +1030 Rusty Russell <rusty@rustcorp.com.au> wrote:
> 
> > On Thursday 29 January 2009 12:42:05 Andrew Morton wrote:
> > > On Thu, 29 Jan 2009 12:13:32 +1030 Rusty Russell <rusty@rustcorp.com.au> wrote:
> > > 
> > > > On Thursday 29 January 2009 06:14:40 Andrew Morton wrote:
> > > > > It's vulnerable to the same deadlock, I think?  Suppose we have:
> > > > ...
> > > > > - A calls work_on_cpu() and takes woc_mutex.
> > > > > 
> > > > > - Before function_which_takes_L() has started to execute, task B takes L
> > > > >   then calls work_on_cpu() and task B blocks on woc_mutex.
> > > > > 
> > > > > - Now function_which_takes_L() runs, and blocks on L
> > > > 
> > > > Agreed, but now it's a fairly simple case.  Both sides have to take lock L, and both have to call work_on_cpu.
> > > > 
> > > > Workqueues are more generic and widespread, and an amazing amount of stuff gets called from them.  That's why I felt uncomfortable with removing the one known problematic caller.
> > > > 
> > > 
> > > hm.  it's a bit of a timebomb.
> > > 
> > > y'know, the original way in which acpi-cpufreq did this is starting to
> > > look attractive.  Migrate self to that CPU then just call the dang
> > > function.  Slow, but no deadlocks (I think)?
> > 
> > Just buggy.  What random thread was it mugging?  If there's any path 
> > where it's not a kthread, what if userspace does the same thing at the 
> > same time? We risk running on the wrong cpu, *then* overriding 
> > userspace when we restore it.
> 
> hm, OK, not unfixable but not pleasant.
> 
> > In general these cpumask games are a bad idea.
> 
> So we still don't have any non-buggy proposal.

Current upstream code is not pretty (due to the extra workqueue) but not 
buggy either. You'd be right to point out that it is easy to insert a bug 
into it and thus it's not pleasant (more of a workaround than a real fix) 
but if it's outright buggy then please talk up.

	Ingo

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-30 13:49                         ` Ingo Molnar
@ 2009-01-30 17:08                           ` Andrew Morton
  0 siblings, 0 replies; 70+ messages in thread
From: Andrew Morton @ 2009-01-30 17:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rusty Russell, Mike Travis, Ingo Molnar, Dave Jones, cpufreq,
	linux-kernel

On Fri, 30 Jan 2009 14:49:35 +0100 Ingo Molnar <mingo@elte.hu> wrote:

> > So we still don't have any non-buggy proposal.
> 
> Current upstream code is not pretty (due to the extra workqueue) but not 
> buggy either. You'd be right to point out that it is easy to insert a bug 
> into it and thus it's not pleasant (more of a workaround than a real fix) 
> but if it's outright buggy then please talk up.

OK, so we're not aware of anything in there which will trigger the bug
yet.  Although allocate_threshold_blocks() takes about half the locks in
the kernel - it can run an ext3 commit, it does netlink tx, synchronous
process exec, etc.

Why not give up and kill the whole work_on_cpu() thing?  afaict the
only caller which cannot be immediately switched to use an IPI is
mce_amd_64's allocate_threshold_blocks(), and it looks like that can be
trivially fixed by moving the entire function except for one rdmsr() up
into the caller.  


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-30  6:30                       ` Andrew Morton
  2009-01-30 13:49                         ` Ingo Molnar
@ 2009-01-30 21:59                         ` Rusty Russell
  2009-01-30 22:17                           ` Andrew Morton
  1 sibling, 1 reply; 70+ messages in thread
From: Rusty Russell @ 2009-01-30 21:59 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Mike Travis, Ingo Molnar, Dave Jones, cpufreq, linux-kernel

On Friday 30 January 2009 17:00:42 Andrew Morton wrote:
> On Fri, 30 Jan 2009 16:33:53 +1030 Rusty Russell <rusty@rustcorp.com.au> wrote:
> 
> > On Thursday 29 January 2009 12:42:05 Andrew Morton wrote:
> > > On Thu, 29 Jan 2009 12:13:32 +1030 Rusty Russell <rusty@rustcorp.com.au> wrote:
> > > 
> > > > On Thursday 29 January 2009 06:14:40 Andrew Morton wrote:
> > > > > It's vulnerable to the same deadlock, I think?  Suppose we have:
> > > > ...
> > > > > - A calls work_on_cpu() and takes woc_mutex.
> > > > > 
> > > > > - Before function_which_takes_L() has started to execute, task B takes L
> > > > >   then calls work_on_cpu() and task B blocks on woc_mutex.
> > > > > 
> > > > > - Now function_which_takes_L() runs, and blocks on L
> > > > 
> > > > Agreed, but now it's a fairly simple case.  Both sides have to take lock L, and both have to call work_on_cpu.
> > > > 
> > > > Workqueues are more generic and widespread, and an amazing amount of stuff gets called from them.  That's why I felt uncomfortable with removing the one known problematic caller.
> > > > 
> > > 
> > > hm.  it's a bit of a timebomb.
> > > 
> > > y'know, the original way in which acpi-cpufreq did this is starting to
> > > look attractive.  Migrate self to that CPU then just call the dang
> > > function.  Slow, but no deadlocks (I think)?
> > 
> > Just buggy.  What random thread was it mugging?  If there's any path where
> > it's not a kthread, what if userspace does the same thing at the same time?
> > We risk running on the wrong cpu, *then* overriding userspace when we restore
> > it.
> 
> hm, OK, not unfixable but not pleasant.
> 
> > In general these cpumask games are a bad idea.
> 
> So we still don't have any non-buggy proposal.

I disagree about the avoiding-workqueue one being buggy.  The same logic
applies to any simple callback function.

Rusty.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-30 21:59                         ` Rusty Russell
@ 2009-01-30 22:17                           ` Andrew Morton
  2009-02-02 12:35                             ` Rusty Russell
  0 siblings, 1 reply; 70+ messages in thread
From: Andrew Morton @ 2009-01-30 22:17 UTC (permalink / raw)
  To: Rusty Russell; +Cc: travis, mingo, davej, cpufreq, linux-kernel

On Sat, 31 Jan 2009 08:29:15 +1030
Rusty Russell <rusty@rustcorp.com.au> wrote:

> On Friday 30 January 2009 17:00:42 Andrew Morton wrote:
> > On Fri, 30 Jan 2009 16:33:53 +1030 Rusty Russell <rusty@rustcorp.com.au> wrote:
> > 
> > > On Thursday 29 January 2009 12:42:05 Andrew Morton wrote:
> > > > On Thu, 29 Jan 2009 12:13:32 +1030 Rusty Russell <rusty@rustcorp.com.au> wrote:
> > > > 
> > > > > On Thursday 29 January 2009 06:14:40 Andrew Morton wrote:
> > > > > > It's vulnerable to the same deadlock, I think?  Suppose we have:
> > > > > ...
> > > > > > - A calls work_on_cpu() and takes woc_mutex.
> > > > > > 
> > > > > > - Before function_which_takes_L() has started to execute, task B takes L
> > > > > >   then calls work_on_cpu() and task B blocks on woc_mutex.
> > > > > > 
> > > > > > - Now function_which_takes_L() runs, and blocks on L
> > > > > 
> > > > > Agreed, but now it's a fairly simple case.  Both sides have to take lock L, and both have to call work_on_cpu.
> > > > > 
> > > > > Workqueues are more generic and widespread, and an amazing amount of stuff gets called from them.  That's why I felt uncomfortable with removing the one known problematic caller.
> > > > > 
> > > > 
> > > > hm.  it's a bit of a timebomb.
> > > > 
> > > > y'know, the original way in which acpi-cpufreq did this is starting to
> > > > look attractive.  Migrate self to that CPU then just call the dang
> > > > function.  Slow, but no deadlocks (I think)?
> > > 
> > > Just buggy.  What random thread was it mugging?  If there's any path where
> > > it's not a kthread, what if userspace does the same thing at the same time?
> > > We risk running on the wrong cpu, *then* overriding userspace when we restore
> > > it.
> > 
> > hm, OK, not unfixable but not pleasant.
> > 
> > > In general these cpumask games are a bad idea.
> > 
> > So we still don't have any non-buggy proposal.
> 
> I disagree about the avoiding-workqueue one being buggy.
> 

I assume you're talking about the patch I looked at a couple of days ago.

It's vulnerable to the same deadlock as work_on_cpu() has always been.

Just as an example, take a look at allocate_threshold_blocks().  That
function way down in the innards of x86 has blotted out large amounts of
kernel code, so that code can now not use work_on_cpu().  Anything which
happens inside ext3 commit (the entire block layer and all drivers
underneath it).  Large lumps of networking code.  Parts of the page
allocator and the VFS which I haven't started to think about yet.

>  The same logic
> applies to any simple callback function.

Not!  The difference here is the queueing and serialisation which
introduces dependencies between unrelated subsystems which happen to
use this piece of core infrastructure.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-01-30 22:17                           ` Andrew Morton
@ 2009-02-02 12:35                             ` Rusty Russell
  2009-02-03  4:06                               ` Andrew Morton
  0 siblings, 1 reply; 70+ messages in thread
From: Rusty Russell @ 2009-02-02 12:35 UTC (permalink / raw)
  To: Andrew Morton; +Cc: travis, mingo, davej, cpufreq, linux-kernel

On Saturday 31 January 2009 08:47:44 Andrew Morton wrote:
> Just as an example, take a look at allocate_threshold_blocks().  That
> function way down in the innards of x86 has blotted out large amounts of
> kernel code, so that code can now not use work_on_cpu().  Anything which
> happens inside ext3 commit (the entire block layer and all drivers
> underneath it).  Large lumps of networking code.  Parts of the page
> allocator and the VFS which I haven't started to think about yet.

Yes, you're right.  Any infrastructure which does callouts while holding a
lock has the same problem.  We have several of those, as I pointed out, but
the problem comes when any two of them are combined, which is more likely
when they're general.

And I did so much under work_on_cpu here (Mike is credited, but it looks like
my work) precisely because I have no idea what this code is doing, so I chose
the simplest conversion.

AFAICT, it just wants to rdmsr on a particular CPU.  rdmsr_on_cpu() is pretty
easy to implement, which would fix *this* case (and IIRC, would be useful
elsewhere).

If we want a general work_on_cpu(), we need this:

Subject: work_on_cpu: __work_on_cpu and singlethread work_on_cpu

Andrew Morton points out two problems with the current work_on_cpu
implementation.  Firstly, it adds a thread per cpu, which is wasteful.
Secondly, by holding a lock across a generic callback, it creates more
potential for lock inversion: any lock grabbed by the callback is a
lock which another unrelated caller to work_on_cpu() can't hold.

(A similar issue has plagued kevent).

This patch does two things: firstly, it changes to a singlethread
workqueue which simply moves itself to the appropriate CPU.  Secondly,
it adds an __work_on_cpu() for callbacks which need to grab locks:
this allows them to use their own independent workqueue.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 include/linux/workqueue.h |    7 +++++
 kernel/workqueue.c        |   59 ++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 56 insertions(+), 10 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -257,7 +257,14 @@ static inline long work_on_cpu(unsigned 
 {
 	return fn(arg);
 }
+static inline long __work_on_cpu(struct workqueue_struct *swq,
+				 unsigned cpu, long (*fn)(void *), void *arg)
+{
+	return fn(arg);
+}
 #else
+long __work_on_cpu(struct workqueue_struct *swq,
+		   unsigned int cpu, long (*fn)(void *), void *arg);
 long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg);
 #endif /* CONFIG_SMP */
 #endif
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -977,6 +977,7 @@ struct work_for_cpu {
 	struct work_struct work;
 	long (*fn)(void *);
 	void *arg;
+	unsigned int cpu;
 	long ret;
 };
 
@@ -984,8 +985,48 @@ static void do_work_for_cpu(struct work_
 {
 	struct work_for_cpu *wfc = container_of(w, struct work_for_cpu, work);
 
+	if (set_cpus_allowed_ptr(current, cpumask_of(wfc->cpu)) != 0)
+		WARN(1, "work_on_cpu on offline cpu %u?\n", wfc->cpu);
 	wfc->ret = wfc->fn(wfc->arg);
 }
+
+
+/**
+ * __work_on_cpu - run a function in a workqueue on a particular cpu
+ * @swq: the (singlethreaded) workqueue
+ * @cpu: the cpu to run on
+ * @fn: the function to run
+ * @arg: the function arg
+ *
+ * This will return the value @fn returns.
+ * It is up to the caller to ensure that the cpu doesn't go offline.
+ *
+ * Example:
+ *	int ret;
+ *	struct workqueue_struct *wq = create_singlethread_workqueue("myq");
+ *	if (unlikely(!wq))
+ *		ret = -ENOMEM;
+ *	else {
+ *		ret = __work_on_cpu(wq, cpu, fn, arg);
+ *		destroy_workqueue(wq);
+ *	}
+ */
+long __work_on_cpu(struct workqueue_struct *swq,
+		   unsigned int cpu, long (*fn)(void *), void *arg)
+{
+	struct work_for_cpu wfc;
+
+	INIT_WORK(&wfc.work, do_work_for_cpu);
+	wfc.fn = fn;
+	wfc.arg = arg;
+	wfc.cpu = cpu;
+	BUG_ON(!swq->singlethread);
+	queue_work(swq, &wfc.work);
+	flush_work(&wfc.work);
+
+	return wfc.ret;
+}
+EXPORT_SYMBOL_GPL(__work_on_cpu);
 
 /**
  * work_on_cpu - run a function in user context on a particular cpu
@@ -995,18 +1036,16 @@ static void do_work_for_cpu(struct work_
  *
  * This will return the value @fn returns.
  * It is up to the caller to ensure that the cpu doesn't go offline.
+ *
+ * @fn is called with a lock held (the work_on_cpu workqueue's lock):
+ * if it grabs any externally-visible locks, you might get a locking
+ * inversion against others who grab those locks then call
+ * work_on_cpu().  You can use your own private workqueue to avoid
+ * this.
  */
 long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
 {
-	struct work_for_cpu wfc;
-
-	INIT_WORK(&wfc.work, do_work_for_cpu);
-	wfc.fn = fn;
-	wfc.arg = arg;
-	queue_work_on(cpu, work_on_cpu_wq, &wfc.work);
-	flush_work(&wfc.work);
-
-	return wfc.ret;
+	return __work_on_cpu(work_on_cpu_wq, cpu, fn, arg);
 }
 EXPORT_SYMBOL_GPL(work_on_cpu);
 #endif /* CONFIG_SMP */
@@ -1022,7 +1061,7 @@ void __init init_workqueues(void)
 	keventd_wq = create_workqueue("events");
 	BUG_ON(!keventd_wq);
 #ifdef CONFIG_SMP
-	work_on_cpu_wq = create_workqueue("work_on_cpu");
+	work_on_cpu_wq = create_singlethread_workqueue("work_on_cpu");
 	BUG_ON(!work_on_cpu_wq);
 #endif
 }


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-02-02 12:35                             ` Rusty Russell
@ 2009-02-03  4:06                               ` Andrew Morton
  2009-02-04  2:44                                 ` Rusty Russell
  0 siblings, 1 reply; 70+ messages in thread
From: Andrew Morton @ 2009-02-03  4:06 UTC (permalink / raw)
  To: Rusty Russell; +Cc: travis, mingo, davej, cpufreq, linux-kernel

On Mon, 2 Feb 2009 23:05:50 +1030 Rusty Russell <rusty@rustcorp.com.au> wrote:

> +/**
> + * __work_on_cpu - run a function in a workqueue on a particular cpu
> + * @swq: the (singlethreaded) workqueue
> + * @cpu: the cpu to run on
> + * @fn: the function to run
> + * @arg: the function arg
> + *
> + * This will return the value @fn returns.
> + * It is up to the caller to ensure that the cpu doesn't go offline.
> + *
> + * Example:
> + *	int ret;
> + *	struct workqueue_struct *wq = create_singlethread_workqueue("myq");
> + *	if (unlikely(!wq))
> + *		ret = -ENOMEM;
> + *	else {
> + *		ret = __work_on_cpu(wq, cpu, fn, arg);
> + *		destroy_workqueue(wq);
> + *	}
> + */
> +long __work_on_cpu(struct workqueue_struct *swq,
> +		   unsigned int cpu, long (*fn)(void *), void *arg)
> +{
> +	struct work_for_cpu wfc;
> +
> +	INIT_WORK(&wfc.work, do_work_for_cpu);
> +	wfc.fn = fn;
> +	wfc.arg = arg;
> +	wfc.cpu = cpu;
> +	BUG_ON(!swq->singlethread);
> +	queue_work(swq, &wfc.work);
> +	flush_work(&wfc.work);
> +
> +	return wfc.ret;
> +}
> +EXPORT_SYMBOL_GPL(__work_on_cpu);

If we're going to do this then we might as well create the thread right
here, use it once and then let it exit.

That's slower, but I suspect that we could get a lot of that inefficiency
back by coming up with a kernel_thread_on_cpu(), so we don't go and schedule
the function on a random CPU only to immediately switch it over to a different
one, dunno.

But as you say, rdmsr_on_cpu() is easy to do, using
smp_call_function_single().  Then we can easily convert all other
work_on_cpu() callers to smp_call_function_single() and zap
work_on_cpu().  The best outcome, methinks.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-02-03  4:06                               ` Andrew Morton
@ 2009-02-04  2:44                                 ` Rusty Russell
  2009-02-04  3:01                                   ` Andrew Morton
  0 siblings, 1 reply; 70+ messages in thread
From: Rusty Russell @ 2009-02-04  2:44 UTC (permalink / raw)
  To: Andrew Morton; +Cc: travis, mingo, davej, cpufreq, linux-kernel

On Tuesday 03 February 2009 14:36:49 Andrew Morton wrote:
> But as you say, rdmsr_on_cpu() is easy to do, using
> smp_call_function_single().  Then we can easily convert all other
> work_on_cpu() callers to smp_call_function_single() and zap
> work_on_cpu().  The best outcome, methinks.

grepping for set_cpus_allowed reveals the following users in the
first third (when I got tired of it):

File                         Function            Context      Calls
powerpc/kernel/smp.c         smp_cpus_done       kernel_init smp_ops->setup_cpu()
x86/.../speedstep-ich.c      _speedstep_get      cpufreq     speedstep_get_processor_frequency()
x86/.../powernow-k8.c        check_supported_cpu cpufreq     cpuid_*()
x86/.../powernow-k8.c        powernowk8_target   cpufreq     mutex_lock, etc
x86/.../powernow-k8.c        powernowk8_cpu_init boot/hotplg rdmsr,wrmsr
x86/.../powernow-k8.c        powernowk8_get      cpufreq     rdmsr
x86/.../speedstep-centrino.c get_cur_freq        cpufreq     rdmsr
x86/.../speedstep-centrino.c centrino_target     cpufreq     rdmsr,wrmsr
x86/kernel/reboot.c          nativ..._shutdown   sys_reboot  misc
x86/.../microcode_core.c     do_microcode_update sys_write   microcode_ops->request_microcode_user
x86/.../microcode_core.c     reload_store        sys_write   mutex_lock, etc

Some of these (reboot) we don't care what they do to cpumask.  Some are
poor style, but not critical (boot time).  Some can use smp_call_function.

But there are some which should *not* be frobbing their cpus_allowed, and
really do seem to want a user context.

I think you're right though: smp_call_function_single (or neat wrappers)
where possible, work_on_cpu which can fail for the others, and we'll just
have to plumb in the error returns.

I'll start the patchstream for 2.6.30.

Thanks,
Rusty.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-02-04  2:44                                 ` Rusty Russell
@ 2009-02-04  3:01                                   ` Andrew Morton
  2009-02-04 10:41                                     ` Rusty Russell
  0 siblings, 1 reply; 70+ messages in thread
From: Andrew Morton @ 2009-02-04  3:01 UTC (permalink / raw)
  To: Rusty Russell; +Cc: travis, mingo, davej, cpufreq, linux-kernel

On Wed, 4 Feb 2009 13:14:31 +1030 Rusty Russell <rusty@rustcorp.com.au> wrote:

> On Tuesday 03 February 2009 14:36:49 Andrew Morton wrote:
> > But as you say, rdmsr_on_cpu() is easy to do, using
> > smp_call_function_single().  Then we can easily convert all other
> > work_on_cpu() callers to smp_call_function_single() and zap
> > work_on_cpu().  The best outcome, methinks.
> 
> grepping for set_cpus_allowed reveals the following users in the
> first third (when I got tired of it):
> 
> File                         Function            Context      Calls
> powerpc/kernel/smp.c         smp_cpus_done       kernel_init smp_ops->setup_cpu()
> x86/.../speedstep-ich.c      _speedstep_get      cpufreq     speedstep_get_processor_frequency()
> x86/.../powernow-k8.c        check_supported_cpu cpufreq     cpuid_*()
> x86/.../powernow-k8.c        powernowk8_target   cpufreq     mutex_lock, etc
> x86/.../powernow-k8.c        powernowk8_cpu_init boot/hotplg rdmsr,wrmsr
> x86/.../powernow-k8.c        powernowk8_get      cpufreq     rdmsr
> x86/.../speedstep-centrino.c get_cur_freq        cpufreq     rdmsr
> x86/.../speedstep-centrino.c centrino_target     cpufreq     rdmsr,wrmsr
> x86/kernel/reboot.c          nativ..._shutdown   sys_reboot  misc
> x86/.../microcode_core.c     do_microcode_update sys_write   microcode_ops->request_microcode_user
> x86/.../microcode_core.c     reload_store        sys_write   mutex_lock, etc
> 
> Some of these (reboot) we don't care what they do to cpumask.  Some are
> poor style, but not critical (boot time).  Some can use smp_call_function.
> 
> But there are some which should *not* be frobbing their cpus_allowed, and
> really do seem to want a user context.
> 
> I think you're right though: smp_call_function_single (or neat wrappers)
> where possible, work_on_cpu which can fail for the others, and we'll just
> have to plumb in the error returns.

I bet a lot of those can use plain old schedule_work_on().

Could it be that after converting things to
smp_call_function_single()/schedule_work_on(), we don't need
work_on_cpu()?


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-02-04  3:01                                   ` Andrew Morton
@ 2009-02-04 10:41                                     ` Rusty Russell
  2009-02-04 15:36                                       ` Andrew Morton
  0 siblings, 1 reply; 70+ messages in thread
From: Rusty Russell @ 2009-02-04 10:41 UTC (permalink / raw)
  To: Andrew Morton; +Cc: travis, mingo, davej, cpufreq, linux-kernel

On Wednesday 04 February 2009 13:31:11 Andrew Morton wrote:
> On Wed, 4 Feb 2009 13:14:31 +1030 Rusty Russell <rusty@rustcorp.com.au> wrote:
> > I think you're right though: smp_call_function_single (or neat wrappers)
> > where possible, work_on_cpu which can fail for the others, and we'll just
> > have to plumb in the error returns.
> 
> I bet a lot of those can use plain old schedule_work_on().

Which is where work_on_cpu started: a little wrapper around schedule_work_on.

We're going in circles, no?
Rusty.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-02-04 10:41                                     ` Rusty Russell
@ 2009-02-04 15:36                                       ` Andrew Morton
  2009-02-04 21:35                                         ` Ingo Molnar
  2009-02-10  8:54                                         ` Rusty Russell
  0 siblings, 2 replies; 70+ messages in thread
From: Andrew Morton @ 2009-02-04 15:36 UTC (permalink / raw)
  To: Rusty Russell; +Cc: travis, mingo, davej, cpufreq, linux-kernel

On Wed, 4 Feb 2009 21:11:35 +1030 Rusty Russell <rusty@rustcorp.com.au> wrote:

> On Wednesday 04 February 2009 13:31:11 Andrew Morton wrote:
> > On Wed, 4 Feb 2009 13:14:31 +1030 Rusty Russell <rusty@rustcorp.com.au> wrote:
> > > I think you're right though: smp_call_function_single (or neat wrappers)
> > > where possible, work_on_cpu which can fail for the others, and we'll just
> > > have to plumb in the error returns.
> > 
> > I bet a lot of those can use plain old schedule_work_on().
> 
> Which is where work_on_cpu started: a little wrapper around schedule_work_on.
> 
> We're going in circles, no?

No, we've made some progress.  We have a better understanding of what
the restrictions, shortcomings and traps are in this stuff.  We've
learned (surprise!) that a one-size-fits-all big hammer wasn't such a
great idea.

Proposed schedule_work_on() rule: either the flush_work() caller or the
callback should not hold any explicit or implicit sleeping locks.

Quick scan:

arch/x86/kernel/reboot.c:native_machine_shutdown()

  I think this can/should continue to use set_cpus_allowed(). 
  Although it could be converted to schedule_work_on(), as long as the
  scheduler is still working properly at this time.

x86 microcode:

  This code is just nuts.  What's the point in pinning itself to a
  CPU for the act of loading the microcode into main memory?  It's only
  the loading of the microcode which should care about which CPU
  executes the code.  ie: apply_microcode().

  The code needs some laundering, switch to schedule_work_on(). 
  Ensure that the callback functions don't take microcode_mutex.

arch/x86/kernel/apm_32.c:

  It's a kernel thread.  Add kthread_bind() to caller, done.
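
  A sketch of that conversion (thread function name assumed):

```c
#include <linux/kthread.h>
#include <linux/err.h>

static int __init apm_start(void)
{
	struct task_struct *kapmd;

	kapmd = kthread_create(apm, NULL, "kapmd");
	if (IS_ERR(kapmd))
		return PTR_ERR(kapmd);
	kthread_bind(kapmd, 0);		/* APM BIOS calls must run on CPU 0 */
	wake_up_process(kapmd);
	return 0;
}
```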

arch/x86/kernel/cpu/cpufreq/speedstep-centrino.c:

  Switch to rdmsr_on_cpu() and friends
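
  Something like the following; extract_freq() stands in for whatever
  decode helper the driver already has:

```c
#include <asm/msr.h>

static unsigned int get_cur_freq(unsigned int cpu)
{
	u32 lo, hi;

	/* reads the MSR on 'cpu' via IPI; no set_cpus_allowed() dance */
	if (rdmsr_on_cpu(cpu, MSR_IA32_PERF_STATUS, &lo, &hi))
		return 0;
	return extract_freq(lo);	/* assumed decode helper */
}
```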

arch/x86/kernel/cpu/cpufreq/powernow-k8.c

  Could get similar treatment (needs new cpuid_on_cpu()?).  Looks
  like schedule_work_on() would be OK to use as well.

arch/x86/kernel/cpu/cpufreq/speedstep-ich.c

  Use rdmsr_on_cpu().

arch/blackfin/kernel/ipipe.c

  Use kthread_bind()

arch/arm/mach-integrator/cpu.c

  Not sure.

arch/sparc/kernel/us3_cpufreq.c

  Use smp_call_function_single().  Write sparc64_get_clock_tick_on_cpu().
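
  A sketch of the proposed helper; the remote callback runs with
  interrupts disabled, so it must not sleep (the call into
  sparc64_get_clock_tick() is assumed to be safe in that context):

```c
#include <linux/smp.h>

static void clock_tick_remote(void *ret)
{
	*(unsigned long *)ret = sparc64_get_clock_tick(smp_processor_id());
}

unsigned long sparc64_get_clock_tick_on_cpu(unsigned int cpu)
{
	unsigned long tick = 0;

	smp_call_function_single(cpu, clock_tick_remote, &tick, 1 /* wait */);
	return tick;
}
```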

arch/sparc/kernel/us2e_cpufreq.c

  Ditto

arch/sparc/kernel/sysfs.c

  If sun4v_mmustat_info() can be called from interrupt, use
  smp_call_function_single().  Otherwise schedule_work_on().

arch/powerpc/platforms/pseries/rtasd.c

  Ditto

arch/powerpc/kernel/smp.c

  Ditto

arch/powerpc/kernel/sysfs.c

  Blah.  Looks like schedule_work_on() will be OK.

arch/mips/kernel/traps.c

  NFI what this does

arch/sh/kernel/cpufreq.c

  schedule_work_on()

arch/ia64/sn/kernel/sn2/sn_hwperf.c

  schedule_work_on()

arch/ia64/kernel/salinfo.c:

  schedule_work_on()

arch/ia64/kernel/topology.c:

  schedule_work_on()

arch/ia64/kernel/cpufreq/acpi-cpufreq.c:

  schedule_work_on()

mm/pdflush.c:

  wtf what the heck is all that stuff and who added it?  weird.

  Leave it alone I guess.  Can admins manually move kernel threads to
  other CPUs?

mm/vmscan.c:

  switch to kthread_bind()

net/sunrpc/svc.c:

  switch to kthread_bind()

kernel/kmod.c:

  switch to kthreads; write a kthread_bind_mask(), then switch to that.

drivers/firmware/dcdbas.c:

  smp_call_function_single()

drivers/misc/sgi-xp/xpc_main.c:

  kthread_bind()

drivers/acpi/processor_throttling.c:

  already done



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-02-04 15:36                                       ` Andrew Morton
@ 2009-02-04 21:35                                         ` Ingo Molnar
  2009-02-04 21:48                                           ` Andrew Morton
  2009-02-10  8:54                                         ` Rusty Russell
  1 sibling, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2009-02-04 21:35 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Rusty Russell, travis, mingo, davej, cpufreq, linux-kernel


* Andrew Morton <akpm@linux-foundation.org> wrote:

> mm/pdflush.c:
> 
>   wtf what the heck is all that stuff and who added it?  weird.
> 
>   Leave it alone I guess.  Can admins manually move kernel threads to
>   other CPUs?

they can - and there's even tools that do that (there's some -rt tools where 
you can put kernel thread priorities into a config file).

	Ingo

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-02-04 21:35                                         ` Ingo Molnar
@ 2009-02-04 21:48                                           ` Andrew Morton
  2009-02-04 21:54                                             ` Ingo Molnar
                                                               ` (3 more replies)
  0 siblings, 4 replies; 70+ messages in thread
From: Andrew Morton @ 2009-02-04 21:48 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: rusty, travis, mingo, davej, cpufreq, linux-kernel

On Wed, 4 Feb 2009 22:35:19 +0100
Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > mm/pdflush.c:
> > 
> >   wtf what the heck is all that stuff and who added it?  weird.
> > 
> >   Leave it alone I guess.  Can admins manually move kernel threads to
> >   other CPUs?
> 
> they can - and there's even tools that do that (there's some -rt tools where 
> you can put kernel thread priorities into a config file).
> 

Oh well, DontDoThatThen.

I expect that the same argument applies to most of the set_cpus_allowed()
callsites - they're run by root-only code.  Sure, root can (with
careful timing) move root's own thread onto the wrong CPU in the middle
of microcode loading.  In which case root gets to own both pieces.

We only really need to worry about the places where non-root code can
run set_cpus_allowed().  And then we only need to worry a little bit.

Yes?

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-02-04 21:48                                           ` Andrew Morton
@ 2009-02-04 21:54                                             ` Ingo Molnar
  2009-02-04 23:45                                             ` Rusty Russell
                                                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 70+ messages in thread
From: Ingo Molnar @ 2009-02-04 21:54 UTC (permalink / raw)
  To: Andrew Morton; +Cc: rusty, travis, mingo, davej, cpufreq, linux-kernel


* Andrew Morton <akpm@linux-foundation.org> wrote:

> On Wed, 4 Feb 2009 22:35:19 +0100
> Ingo Molnar <mingo@elte.hu> wrote:
> 
> > 
> > * Andrew Morton <akpm@linux-foundation.org> wrote:
> > 
> > > mm/pdflush.c:
> > > 
> > >   wtf what the heck is all that stuff and who added it?  weird.
> > > 
> > >   Leave it alone I guess.  Can admins manually move kernel threads to
> > >   other CPUs?
> > 
> > they can - and there's even tools that do that (there's some -rt tools where 
> > you can put kernel thread priorities into a config file).
> > 
> 
> Oh well, DontDoThatThen.
> 
> I expect that the same argument applies to most of the set_cpus_allowed()
> callsites - they're run by root-only code.  Sure, root can (with
> careful timing) move root's own thread onto the wrong CPU in the middle
> of microcode loading.  In which case root gets to own both pieces.
> 
> We only really need to worry about the places where non-root code can
> run set_cpus_allowed().  And then we only need to worry a little bit.
> 
> Yes?

Given the alternatives i suspect the best answer is a yes :)

	Ingo

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-02-04 21:48                                           ` Andrew Morton
  2009-02-04 21:54                                             ` Ingo Molnar
@ 2009-02-04 23:45                                             ` Rusty Russell
  2009-02-05 12:19                                             ` Pavel Machek
  2009-02-05 17:44                                             ` Dmitry Adamushko
  3 siblings, 0 replies; 70+ messages in thread
From: Rusty Russell @ 2009-02-04 23:45 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ingo Molnar, travis, mingo, davej, cpufreq, linux-kernel

On Thursday 05 February 2009 08:18:23 Andrew Morton wrote:
> I expect that the same argument applies to most of the set_cpus_allowed()
> callsites - they're run by root-only code.  Sure, root can (with
> careful timing) move root's own thread onto the wrong CPU in the middle
> of microcode loading.  In which case root gets to own both pieces.

Sorry, I don't accept this at all.  It's completely reasonable for me to write a tool which clears all tasks off a cpu.  We wrote an API to do it, now we're going to say don't use it?

Yes, it's a pain to fix.  Yes, we should never have been doing it this way in the first place.  But let's not blame the users.

Rusty.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-02-04 21:48                                           ` Andrew Morton
  2009-02-04 21:54                                             ` Ingo Molnar
  2009-02-04 23:45                                             ` Rusty Russell
@ 2009-02-05 12:19                                             ` Pavel Machek
  2009-02-05 17:44                                             ` Dmitry Adamushko
  3 siblings, 0 replies; 70+ messages in thread
From: Pavel Machek @ 2009-02-05 12:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, rusty, travis, mingo, davej, cpufreq, linux-kernel

> On Wed, 4 Feb 2009 22:35:19 +0100
> Ingo Molnar <mingo@elte.hu> wrote:
> 
> > 
> > * Andrew Morton <akpm@linux-foundation.org> wrote:
> > 
> > > mm/pdflush.c:
> > > 
> > >   wtf what the heck is all that stuff and who added it?  weird.
> > > 
> > >   Leave it alone I guess.  Can admins manually move kernel threads to
> > >   other CPUs?
> > 
> > they can - and there's even tools that do that (there's some -rt tools where 
> > you can put kernel thread priorities into a config file).
> > 
> 
> Oh well, DontDoThatThen.
> 
> I expect that the same argument applies to most of the set_cpus_allowed()
> callsites - they're run by root-only code.  Sure, root can (with
> careful timing) move root's own thread onto the wrong CPU in the middle
> of microcode loading.  In which case root gets to own both pieces.

Actually... with selinux and similar extensions, being root should not
mean you can run arbitrary code at the kernel level....


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-02-04 21:48                                           ` Andrew Morton
                                                               ` (2 preceding siblings ...)
  2009-02-05 12:19                                             ` Pavel Machek
@ 2009-02-05 17:44                                             ` Dmitry Adamushko
  3 siblings, 0 replies; 70+ messages in thread
From: Dmitry Adamushko @ 2009-02-05 17:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, rusty, travis, mingo, davej, cpufreq, linux-kernel

2009/2/4 Andrew Morton <akpm@linux-foundation.org>:
> On Wed, 4 Feb 2009 22:35:19 +0100
> Ingo Molnar <mingo@elte.hu> wrote:
>
>>
>> * Andrew Morton <akpm@linux-foundation.org> wrote:
>>
>> > mm/pdflush.c:
>> >
>> >   wtf what the heck is all that stuff and who added it?  weird.
>> >
>> >   Leave it alone I guess.  Can admins manually move kernel threads to
>> >   other CPUs?
>>
>> they can - and there's even tools that do that (there's some -rt tools where
>> you can put kernel thread priorities into a config file).
>>
>
> Oh well, DontDoThatThen.
>
> I expect that the same argument applies to most of the set_cpus_allowed()
> callsites - they're run by root-only code.  Sure, root can (with
> careful timing) move root's own thread onto the wrong CPU in the middle
> of microcode loading.  In which case root gets to own both pieces.

Another issue is that those set_cpus_allowed() callsites may
effectively cancel the effect of sched_setaffinity() being run by an
administrator in parallel with a target process calling e.g.
cpufreq_get(0).

[ there are a couple of callsites in drivers/{video,pcmcia}, possibly
running in a process context for which sched_setaffinity() with mask
!= 'all_cpus_set' may be legitimate from admin's POV :-) ]

iow, not all use-cases are obvious enough (like microcode) to fall into
the category of "didn't you know that this action could do some
unsynchronized cpu-mask-fiddling work behind your back". Not to say
that the existence of such a category is wrong, imho.

(regarding microcode)

> This code is just nuts.  What's the point in pinning itself to a
> CPU for the act of loading the microcode into main memory?  It's only
> the loading of the microcode which should care about which CPU
> executes the code.  ie: apply_microcode().

Well, basically I tried to preserve the existing mechanisms/schemes as
much as possible when reworking this code and yes, I'm the person to
be blamed for this 'nuts' code.

> The code needs some laundering, switch to schedule_work_on().

I had a patch doing exactly this but then there were other concerns
with the 'schedule_work_on()' approach
(e.g. http://linux.derkeiler.com/Mailing-Lists/Kernel/2008-08/msg02827.html)

then I had another idea (run it from start_secondary() or something
like this), but I never managed to look at this issue again
(and no one else seemed to care about really running "it as early as
possible" ;-)

In any case, it should be fixable one way or another.

> Ensure that the callback functions don't take microcode_mutex.

yes, these actually look redundant in cpu-hotplug paths (other callers
call get/put_online_cpus() so this part should be ok).

Will fix. Thanks for your comments!


-- 
Best regards,
Dmitry Adamushko

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-02-04 15:36                                       ` Andrew Morton
  2009-02-04 21:35                                         ` Ingo Molnar
@ 2009-02-10  8:54                                         ` Rusty Russell
  2009-02-10  9:35                                           ` Andrew Morton
  1 sibling, 1 reply; 70+ messages in thread
From: Rusty Russell @ 2009-02-10  8:54 UTC (permalink / raw)
  To: Andrew Morton; +Cc: travis, mingo, davej, cpufreq, linux-kernel

On Thursday 05 February 2009 02:06:36 Andrew Morton wrote:
> On Wed, 4 Feb 2009 21:11:35 +1030 Rusty Russell <rusty@rustcorp.com.au> wrote:
> 
> > On Wednesday 04 February 2009 13:31:11 Andrew Morton wrote:
> > > On Wed, 4 Feb 2009 13:14:31 +1030 Rusty Russell <rusty@rustcorp.com.au> wrote:
> > > > I think you're right though: smp_call_function_single (or neat wrappers)
> > > > where possible, work_on_cpu which can fail for the others, and we'll just
> > > > have to plumb in the error returns.
> > > 
> > > I bet a lot of those can use plain old schedule_work_on().
> > 
> > Which is where work_on_cpu started: a little wrapper around schedule_work_on.
> > 
> > We're going in circles, no?
> 
> No, we've made some progress.  We have a better understanding of what
> the restrictions, shortcomings and traps are in this stuff.  We've
> learned (surprise!) that a one-size-fits-all big hammer wasn't such a
> great idea.
> 
> Proposed schedule_work_on() rule: either the flush_work() caller or the
> callback should not hold any explicit or implicit sleeping locks.

But as you found out looking through these, it's really hard to tell.  I can
guess, but that's a little fraught...

How about we make work_on_cpu spawn a temp thread; if you care, use
something cleverer?  Spawning a thread just isn't that slow.
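
Something like this sketch, say (error handling trimmed; the caller
still has to ensure the cpu stays online across the call):

```c
#include <linux/kthread.h>
#include <linux/completion.h>
#include <linux/err.h>

struct work_for_cpu {
	struct completion completion;
	long (*fn)(void *);
	void *arg;
	long ret;
};

static int do_work_for_cpu(void *_wfc)
{
	struct work_for_cpu *wfc = _wfc;

	wfc->ret = wfc->fn(wfc->arg);
	complete(&wfc->completion);
	return 0;
}

long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
{
	struct task_struct *sub_thread;
	struct work_for_cpu wfc = { .fn = fn, .arg = arg };

	init_completion(&wfc.completion);
	sub_thread = kthread_create(do_work_for_cpu, &wfc,
				    "work_for_cpu/%u", cpu);
	if (IS_ERR(sub_thread))
		return PTR_ERR(sub_thread);
	kthread_bind(sub_thread, cpu);
	wake_up_process(sub_thread);
	wait_for_completion(&wfc.completion);
	return wfc.ret;
}
```

No workqueue is involved, so there is nothing to deadlock against on
keventd; the cost is one short-lived thread per call.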

Meanwhile, I'll prepare patches to convert all the non-controversial cases
(ie. smp_call_function-style ones).

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-02-10  8:54                                         ` Rusty Russell
@ 2009-02-10  9:35                                           ` Andrew Morton
  2009-02-11  0:32                                             ` Rusty Russell
  0 siblings, 1 reply; 70+ messages in thread
From: Andrew Morton @ 2009-02-10  9:35 UTC (permalink / raw)
  To: Rusty Russell; +Cc: travis, mingo, davej, cpufreq, linux-kernel

On Tue, 10 Feb 2009 19:24:07 +1030 Rusty Russell <rusty@rustcorp.com.au> wrote:

> On Thursday 05 February 2009 02:06:36 Andrew Morton wrote:
> > On Wed, 4 Feb 2009 21:11:35 +1030 Rusty Russell <rusty@rustcorp.com.au> wrote:
> > 
> > > On Wednesday 04 February 2009 13:31:11 Andrew Morton wrote:
> > > > On Wed, 4 Feb 2009 13:14:31 +1030 Rusty Russell <rusty@rustcorp.com.au> wrote:
> > > > > I think you're right though: smp_call_function_single (or neat wrappers)
> > > > > where possible, work_on_cpu which can fail for the others, and we'll just
> > > > > have to plumb in the error returns.
> > > > 
> > > > I bet a lot of those can use plain old schedule_work_on().
> > > 
> > > Which is where work_on_cpu started: a little wrapper around schedule_work_on.
> > > 
> > > We're going in circles, no?
> > 
> > No, we've made some progress.  We have a better understanding of what
> > the restrictions, shortcomings and traps are in this stuff.  We've
> > learned (surprise!) that a one-size-fits-all big hammer wasn't such a
> > great idea.
> > 
> > Proposed schedule_work_on() rule: either the flush_work() caller or the
> > callback should not hold any explicit or implicit sleeping locks.
> 
> But as you found out looking through these, it's really hard to tell.  I can
> guess, but that's a little fraught...

yup.

> How about we make work_on_cpu spawn a temp thread; if you care, use
> something cleverer?  Spawning a thread just isn't that slow.

That's what
work_on_cpu-rewrite-it-to-create-a-kernel-thread-on-demand.patch does?

> Meanwhile, I'll prepare patches to convert all the non-controversial cases
> (ie. smp_call_function-style ones).

arch-x86-kernel-acpi-cstatec-avoid-using-work_on_cpu.patch
arch-x86-kernel-cpu-cpufreq-acpi-cpufreqc-avoid-using-work_on_cpu.patch
arch-x86-kernel-cpu-mcheck-mce_amd_64c-avoid-using-work_on_cpu.patch

convert three work_on_cpu() callers.  The drivers/pci/pci-driver.c one
is a bit problematic.

I guess as long as we don't find a high frequency set_cpus_allowed()
callsite which can't be converted to smp_call_function_single() we'll
be OK.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
  2009-02-10  9:35                                           ` Andrew Morton
@ 2009-02-11  0:32                                             ` Rusty Russell
  0 siblings, 0 replies; 70+ messages in thread
From: Rusty Russell @ 2009-02-11  0:32 UTC (permalink / raw)
  To: Andrew Morton; +Cc: travis, mingo, davej, cpufreq, linux-kernel

On Tuesday 10 February 2009 20:05:48 Andrew Morton wrote:
> On Tue, 10 Feb 2009 19:24:07 +1030 Rusty Russell <rusty@rustcorp.com.au> wrote:
> > How about we make work_on_cpu spawn a temp thread; if you care, use
> > something cleverer?  Spawning a thread just isn't that slow.
> 
> That's what
> work_on_cpu-rewrite-it-to-create-a-kernel-thread-on-demand.patch does?

Err, yeah.

> > Meanwhile, I'll prepare patches to convert all the non-controversial cases
> > (ie. smp_call_function-style ones).
> 
> arch-x86-kernel-acpi-cstatec-avoid-using-work_on_cpu.patch
> arch-x86-kernel-cpu-cpufreq-acpi-cpufreqc-avoid-using-work_on_cpu.patch
> arch-x86-kernel-cpu-mcheck-mce_amd_64c-avoid-using-work_on_cpu.patch
> 
> convert three work_on_cpu() callers.  The drivers/pci/pci-driver.c one
> is a bit problematic.

OK, I've pulled these in to play with them.  My main concern at the moment
is getting all cpumask_ts removed from core & x86 code for 2.6.30, which
usually means converting those save/restore to work_on_cpu or whatever.

> I guess as long as we don't find a high frequency set_cpus_allowed()
> callsite which can't be converted to smp_call_function_single() we'll
> be OK.

Yep.

Thanks,
Rusty.

^ permalink raw reply	[flat|nested] 70+ messages in thread

end of thread, other threads:[~2009-02-11  0:50 UTC | newest]

Thread overview: 70+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-01-16 19:11 [PATCH 0/3] cpu freq: fix problems with work_on_cpu usage in acpi-cpufreq Mike Travis
2009-01-16 19:11 ` [PATCH 1/3] work_on_cpu: dont try to get_online_cpus() in work_on_cpu Mike Travis
2009-01-16 19:11 ` [PATCH 2/3] work_on_cpu: Use our own workqueue Mike Travis
2009-01-24  8:15   ` Andrew Morton
     [not found]     ` <200901261711.43943.rusty@rustcorp.com.au>
2009-01-26  7:01       ` Andrew Morton
2009-01-26 17:16         ` Ingo Molnar
2009-01-26 18:35           ` Andrew Morton
2009-01-26 20:20             ` Ingo Molnar
2009-01-26 20:43               ` Mike Travis
2009-01-26 21:00               ` Andrew Morton
2009-01-26 21:27                 ` Ingo Molnar
2009-01-26 21:35                   ` Andrew Morton
2009-01-26 21:45                     ` Ingo Molnar
2009-01-26 22:01                       ` Andrew Morton
2009-01-26 22:05                         ` Ingo Molnar
2009-01-26 22:16                           ` Andrew Morton
2009-01-26 22:20                             ` Ingo Molnar
2009-01-26 22:50                               ` Andrew Morton
2009-01-26 22:59                                 ` Ingo Molnar
2009-01-26 23:42                                   ` Andrew Morton
2009-01-26 23:53                                     ` Ingo Molnar
2009-01-27  0:42                                       ` Andrew Morton
2009-01-26 22:31                             ` Oleg Nesterov
2009-01-26 22:15                         ` Oleg Nesterov
2009-01-26 22:24                           ` Ingo Molnar
2009-01-26 22:37                             ` Oleg Nesterov
2009-01-26 22:42                               ` Ingo Molnar
2009-01-26 21:50                     ` Oleg Nesterov
2009-01-26 22:17                       ` Ingo Molnar
2009-01-26 23:01                         ` Mike Travis
2009-01-27  0:09                           ` Oleg Nesterov
2009-01-27  7:15                         ` Rusty Russell
2009-01-27 17:55                           ` Oleg Nesterov
2009-01-27  7:05         ` Rusty Russell
2009-01-27  7:25           ` Andrew Morton
2009-01-27 15:28             ` Ingo Molnar
2009-01-27 16:51               ` Andrew Morton
2009-01-28 13:02             ` Rusty Russell
2009-01-28 17:19               ` Mike Travis
2009-01-28 17:32                 ` Mike Travis
2009-01-29 10:39                   ` Rusty Russell
2009-01-28 19:44               ` Andrew Morton
2009-01-29  1:43                 ` Rusty Russell
2009-01-29  2:12                   ` Andrew Morton
2009-01-30  6:03                     ` Rusty Russell
2009-01-30  6:30                       ` Andrew Morton
2009-01-30 13:49                         ` Ingo Molnar
2009-01-30 17:08                           ` Andrew Morton
2009-01-30 21:59                         ` Rusty Russell
2009-01-30 22:17                           ` Andrew Morton
2009-02-02 12:35                             ` Rusty Russell
2009-02-03  4:06                               ` Andrew Morton
2009-02-04  2:44                                 ` Rusty Russell
2009-02-04  3:01                                   ` Andrew Morton
2009-02-04 10:41                                     ` Rusty Russell
2009-02-04 15:36                                       ` Andrew Morton
2009-02-04 21:35                                         ` Ingo Molnar
2009-02-04 21:48                                           ` Andrew Morton
2009-02-04 21:54                                             ` Ingo Molnar
2009-02-04 23:45                                             ` Rusty Russell
2009-02-05 12:19                                             ` Pavel Machek
2009-02-05 17:44                                             ` Dmitry Adamushko
2009-02-10  8:54                                         ` Rusty Russell
2009-02-10  9:35                                           ` Andrew Morton
2009-02-11  0:32                                             ` Rusty Russell
2009-01-16 19:11 ` [PATCH 3/3] cpufreq: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write Mike Travis
2009-01-16 23:38 ` [PATCH 0/3] cpu freq: fix problems with work_on_cpu usage in acpi-cpufreq [PULL request] Mike Travis
2009-01-17 22:08   ` Ingo Molnar
2009-01-19 17:11     ` Mike Travis
2009-01-19 17:26       ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).