linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [BUG] 2.6.29.1 debugobjects warning
@ 2009-04-23 14:00 Mathieu Desnoyers
  2009-04-23 23:46 ` Andrew Morton
  0 siblings, 1 reply; 14+ messages in thread
From: Mathieu Desnoyers @ 2009-04-23 14:00 UTC (permalink / raw)
  To: Greg KH; +Cc: Andrew Morton, linux-kernel

Hi,

I got the following warning on my Thinkpad T43p laptop (single-core
32-bits x86). I run the 2.6.29.1 tree, plus LTTng patchset applied. It
seems to come from cpufreq. Any idea what is going on here ?


------------[ cut here ]------------
WARNING: at lib/debugobjects.c:217 debug_print_object+0x5a/0x70()
Hardware name: 2687D5U
ODEBUG: init active object type: timer_list
Modules linked in: irda parport nsc_ircc irtty_sir parport_pc psmouse snd_seq sn
d_seq_midi_event hid_logitech ac unix floppy output battery sir_dev nvram snd_ra
wmidi pcmcia x_tables ip_tables joydev snd_seq_midi evdev video snd_page_alloc s
oundcore led_class i2c_i801 cryptoloop snd snd_seq_dummy rfkill thinkpad_acpi sn
d_seq_oss snd_mixer_oss loop ipw2200 blowfish aes_i586 snd_pcm_oss ac97_bus snd_
seq_device agpgart snd_pcm ide_cd_mod snd_timer intel_agp snd_ac97_codec ide_gen
eric button snd_intel8x0 edd acpi_cpufreq ltt_statedump ipc_trace usbhid thermal
 mm_trace snd_intel8x0m fs_trace pcmcia_core rcu_trace lib80211 syscall_trace rs
rc_nonstatic libphy serio_raw kernel_trace tg3 trap_trace libipw crc_ccitt net_t
race dm_mod dm_log dm_region_hash dm_mirror yenta_socket dm_snapshot fat vfat nl
s_cp437 nls_iso8859_1 lp ppdev af_packet drm ntfs ipv6 auth_rpcgss binfmt_misc r
adeon lockd sunrpc nfs
Pid: 3628, comm: cpufreqd Not tainted 2.6.29.1-trace #28
Call Trace:
 [<c1044123>] warn_slowpath+0x73/0xd0
 [<c1069c28>] ? mark_held_locks+0x48/0x90
 [<c13313f5>] ? __mutex_unlock_slowpath+0xd5/0x150
 [<c1069f39>] ? trace_hardirqs_on_caller+0x199/0x1f0
 [<c1331478>] ? mutex_unlock+0x8/0x10
 [<c1331478>] ? mutex_unlock+0x8/0x10
 [<c1107b76>] ? sysfs_addrm_finish+0x16/0x230
 [<c1107671>] ? sysfs_find_dirent+0x21/0x30
 [<c116d71a>] debug_print_object+0x5a/0x70
 [<c116e054>] __debug_object_init+0x254/0x340
 [<c126533f>] ? cpufreq_governor_dbs+0x10f/0x210
 [<c116e187>] debug_object_init+0x17/0x20
 [<c104de70>] init_timer+0x10/0x30
 [<c104de9b>] init_timer_deferrable+0xb/0x20
 [<c12653fd>] cpufreq_governor_dbs+0x1cd/0x210
 [<c126205b>] __cpufreq_governor+0xab/0x120
 [<c12621cb>] __cpufreq_set_policy+0xfb/0x140
 [<c1262c24>] store_scaling_governor+0xa4/0x220
 [<c1263550>] ? handle_update+0x0/0x10
 [<c1262b80>] ? store_scaling_governor+0x0/0x220
 [<c126343e>] store+0x4e/0x70
 [<c1106b9c>] sysfs_write_file+0x9c/0x100
 [<c10b825c>] vfs_write+0x9c/0x140
 [<c1106b00>] ? sysfs_write_file+0x0/0x100
 [<c10b8447>] sys_write+0x47/0xe0
 [<c1021dde>] syscall_call+0x7/0xb
 [<c1020000>] ? sys_vfork+0x20/0x30


Mathieu

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [BUG] 2.6.29.1 debugobjects warning
  2009-04-23 14:00 [BUG] 2.6.29.1 debugobjects warning Mathieu Desnoyers
@ 2009-04-23 23:46 ` Andrew Morton
  2009-04-24  4:34   ` [PATCH -stable] cpufreq fix timer teardown in conservative governor (2.6.28.x, 2.6.29.1) Mathieu Desnoyers
                     ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Andrew Morton @ 2009-04-23 23:46 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: gregkh, linux-kernel, cpufreq

On Thu, 23 Apr 2009 10:00:02 -0400
Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:

> Hi,
> 
> I got the following warning on my Thinkpad T43p laptop (single-core
> 32-bits x86). I run the 2.6.29.1 tree, plus LTTng patchset applied. It
> seems to come from cpufreq. Any idea what is going on here ?
> 
> 
> ------------[ cut here ]------------
> WARNING: at lib/debugobjects.c:217 debug_print_object+0x5a/0x70()
> Hardware name: 2687D5U
> ODEBUG: init active object type: timer_list
> Modules linked in: irda parport nsc_ircc irtty_sir parport_pc psmouse snd_seq sn
> d_seq_midi_event hid_logitech ac unix floppy output battery sir_dev nvram snd_ra
> wmidi pcmcia x_tables ip_tables joydev snd_seq_midi evdev video snd_page_alloc s
> oundcore led_class i2c_i801 cryptoloop snd snd_seq_dummy rfkill thinkpad_acpi sn
> d_seq_oss snd_mixer_oss loop ipw2200 blowfish aes_i586 snd_pcm_oss ac97_bus snd_
> seq_device agpgart snd_pcm ide_cd_mod snd_timer intel_agp snd_ac97_codec ide_gen
> eric button snd_intel8x0 edd acpi_cpufreq ltt_statedump ipc_trace usbhid thermal
>  mm_trace snd_intel8x0m fs_trace pcmcia_core rcu_trace lib80211 syscall_trace rs
> rc_nonstatic libphy serio_raw kernel_trace tg3 trap_trace libipw crc_ccitt net_t
> race dm_mod dm_log dm_region_hash dm_mirror yenta_socket dm_snapshot fat vfat nl
> s_cp437 nls_iso8859_1 lp ppdev af_packet drm ntfs ipv6 auth_rpcgss binfmt_misc r
> adeon lockd sunrpc nfs
> Pid: 3628, comm: cpufreqd Not tainted 2.6.29.1-trace #28
> Call Trace:
>  [<c1044123>] warn_slowpath+0x73/0xd0
>  [<c1069c28>] ? mark_held_locks+0x48/0x90
>  [<c13313f5>] ? __mutex_unlock_slowpath+0xd5/0x150
>  [<c1069f39>] ? trace_hardirqs_on_caller+0x199/0x1f0
>  [<c1331478>] ? mutex_unlock+0x8/0x10
>  [<c1331478>] ? mutex_unlock+0x8/0x10
>  [<c1107b76>] ? sysfs_addrm_finish+0x16/0x230
>  [<c1107671>] ? sysfs_find_dirent+0x21/0x30
>  [<c116d71a>] debug_print_object+0x5a/0x70
>  [<c116e054>] __debug_object_init+0x254/0x340
>  [<c126533f>] ? cpufreq_governor_dbs+0x10f/0x210
>  [<c116e187>] debug_object_init+0x17/0x20
>  [<c104de70>] init_timer+0x10/0x30
>  [<c104de9b>] init_timer_deferrable+0xb/0x20
>  [<c12653fd>] cpufreq_governor_dbs+0x1cd/0x210
>  [<c126205b>] __cpufreq_governor+0xab/0x120
>  [<c12621cb>] __cpufreq_set_policy+0xfb/0x140
>  [<c1262c24>] store_scaling_governor+0xa4/0x220
>  [<c1263550>] ? handle_update+0x0/0x10
>  [<c1262b80>] ? store_scaling_governor+0x0/0x220
>  [<c126343e>] store+0x4e/0x70
>  [<c1106b9c>] sysfs_write_file+0x9c/0x100
>  [<c10b825c>] vfs_write+0x9c/0x140
>  [<c1106b00>] ? sysfs_write_file+0x0/0x100
>  [<c10b8447>] sys_write+0x47/0xe0
>  [<c1021dde>] syscall_call+0x7/0xb
>  [<c1020000>] ? sys_vfork+0x20/0x30
> 

It seems to be complaining that cpufreq_governor_dbs() is running
init_timer() against a timer which has already been initialised once.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH -stable] cpufreq fix timer teardown in conservative governor (2.6.28.x, 2.6.29.1)
  2009-04-23 23:46 ` Andrew Morton
@ 2009-04-24  4:34   ` Mathieu Desnoyers
  2009-04-28 12:57     ` Mathieu Desnoyers
  2009-04-24  4:35   ` [PATCH] cpufreq fix timer teardown in conservative governor (2.6.30-rc2) Mathieu Desnoyers
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Mathieu Desnoyers @ 2009-04-24  4:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: gregkh, stable, cpufreq, Ingo Molnar, rjw, Ben Slusky,
	Dave Jones, linux-kernel

The following patch fixes a problem more likely to happen following commit :

commit 8217e4f4c93e5fb59bb3cd1e6135213889349f86
Author: Ben Slusky <sluskyb@paranoiacs.org>
Date:   Mon Jul 7 13:16:20 2008 -0400

* Andrew Morton (akpm@linux-foundation.org) wrote:
> On Thu, 23 Apr 2009 10:00:02 -0400
> Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> 
> > Hi,
> > 
> > I got the following warning on my Thinkpad T43p laptop (single-core
> > 32-bits x86). I run the 2.6.29.1 tree, plus LTTng patchset applied. It
> > seems to come from cpufreq. Any idea what is going on here ?
> > 
> > 
> > ------------[ cut here ]------------
> > WARNING: at lib/debugobjects.c:217 debug_print_object+0x5a/0x70()
> > Hardware name: 2687D5U
> > ODEBUG: init active object type: timer_list
> > Modules linked in: irda parport nsc_ircc irtty_sir parport_pc psmouse snd_seq sn
> > d_seq_midi_event hid_logitech ac unix floppy output battery sir_dev nvram snd_ra
> > wmidi pcmcia x_tables ip_tables joydev snd_seq_midi evdev video snd_page_alloc s
> > oundcore led_class i2c_i801 cryptoloop snd snd_seq_dummy rfkill thinkpad_acpi sn
> > d_seq_oss snd_mixer_oss loop ipw2200 blowfish aes_i586 snd_pcm_oss ac97_bus snd_
> > seq_device agpgart snd_pcm ide_cd_mod snd_timer intel_agp snd_ac97_codec ide_gen
> > eric button snd_intel8x0 edd acpi_cpufreq ltt_statedump ipc_trace usbhid thermal
> >  mm_trace snd_intel8x0m fs_trace pcmcia_core rcu_trace lib80211 syscall_trace rs
> > rc_nonstatic libphy serio_raw kernel_trace tg3 trap_trace libipw crc_ccitt net_t
> > race dm_mod dm_log dm_region_hash dm_mirror yenta_socket dm_snapshot fat vfat nl
> > s_cp437 nls_iso8859_1 lp ppdev af_packet drm ntfs ipv6 auth_rpcgss binfmt_misc r
> > adeon lockd sunrpc nfs
> > Pid: 3628, comm: cpufreqd Not tainted 2.6.29.1-trace #28
> > Call Trace:
> >  [<c1044123>] warn_slowpath+0x73/0xd0
> >  [<c1069c28>] ? mark_held_locks+0x48/0x90
> >  [<c13313f5>] ? __mutex_unlock_slowpath+0xd5/0x150
> >  [<c1069f39>] ? trace_hardirqs_on_caller+0x199/0x1f0
> >  [<c1331478>] ? mutex_unlock+0x8/0x10
> >  [<c1331478>] ? mutex_unlock+0x8/0x10
> >  [<c1107b76>] ? sysfs_addrm_finish+0x16/0x230
> >  [<c1107671>] ? sysfs_find_dirent+0x21/0x30
> >  [<c116d71a>] debug_print_object+0x5a/0x70
> >  [<c116e054>] __debug_object_init+0x254/0x340
> >  [<c126533f>] ? cpufreq_governor_dbs+0x10f/0x210
> >  [<c116e187>] debug_object_init+0x17/0x20
> >  [<c104de70>] init_timer+0x10/0x30
> >  [<c104de9b>] init_timer_deferrable+0xb/0x20
> >  [<c12653fd>] cpufreq_governor_dbs+0x1cd/0x210
> >  [<c126205b>] __cpufreq_governor+0xab/0x120
> >  [<c12621cb>] __cpufreq_set_policy+0xfb/0x140
> >  [<c1262c24>] store_scaling_governor+0xa4/0x220
> >  [<c1263550>] ? handle_update+0x0/0x10
> >  [<c1262b80>] ? store_scaling_governor+0x0/0x220
> >  [<c126343e>] store+0x4e/0x70
> >  [<c1106b9c>] sysfs_write_file+0x9c/0x100
> >  [<c10b825c>] vfs_write+0x9c/0x140
> >  [<c1106b00>] ? sysfs_write_file+0x0/0x100
> >  [<c10b8447>] sys_write+0x47/0xe0
> >  [<c1021dde>] syscall_call+0x7/0xb
> >  [<c1020000>] ? sys_vfork+0x20/0x30
> > 
> 
> It seems to be complaining that cpufreq_governor_dbs() is running
> init_timer() against a timer which has already been initialised once.
> 

In cpufreq_conservative.c, we have :

static inline void dbs_timer_init(void)
{
        init_timer_deferrable(&dbs_work.timer);
        schedule_delayed_work(&dbs_work,
                        usecs_to_jiffies(dbs_tuners_ins.sampling_rate));
        return;
}

static inline void dbs_timer_exit(void)
{
        cancel_delayed_work(&dbs_work);
        return;
}

Called respectively upon CPUFREQ_GOV_START and CPUFREQ_GOV_STOP.

commit 8217e4f4c93e5fb59bb3cd1e6135213889349f86
adds init_timer_deferrable(&dbs_work.timer);

The problem is that dbs_timer_exit() uses cancel_delayed_work() when it should
use cancel_delayed_work_sync(). cancel_delayed_work() does not wait for the
workqueue handler to exit. Therefore, the work can reschedule itself if the
correct race happens (do_dbs_timer() executing before holding the mutex, work
cancelled, do_dbs_timer() scheduled again, work rescheduled).

This leads to a timer list corruption when the conservative governor is
restarted, because the timer is still active.

However, we _cannot_ use cancel_delayed_work_sync() here because we would
deadlock between wait for workqueue and the fact that this workqueue may hold
the dbs_mutex (which we also hold).

The ondemand governor does not seem to be affected because the
"if (!dbs_info->enable)" check at the beginning of the workqueue handler returns
immediately without rescheduling the work. The conservative governor in
2.6.30-rc has the same check as the ondemand governor, which makes this problem
go away. However, if the governor is quickly stopped and then started, this
could lead to the following race :

Given that dbs_timer_exit cannot wait for the workqueue handler to exit because
it would otherwise deadlock on the dbs_mutex, making sure we don't reschedule
the work by testing the dbs_enable count seems to work most of the time.
However, there is still a potential problem if the governor is stopped and then
started while a do_dbs_timer workqueue is still active : dbs_enable could be
reenabled and multiple do_dbs_timer workqueue would run.  This is why a
synchronized teardown is required, and also why we need to use a different mutex
for do_dbs_timer so we don't deadlock.

The following patch applies to kernels 2.6.29.1 and 2.6.28.x. The problem was
there before, but it seems to really hurt with deferrable timers, which have
been introduced in the conservative governor only between 2.6.27 and 2.6.28.

Ingo : I would not be surprised if this was the mysterious memory
corruption bug I encountered on my laptop starting with kernel 2.6.28.
Back then, I switched to 2.6.27, waiting for 2.6.29 to fix it. Whenever my
computer switched between the performance and the conservative cpufreq modes,
more than once, it ended up corrupting the timer list. Given I don't switch
between battery and AC very often, and given that this bug is caused by a
preemption race on a UP machine, it explains why it was so hard to reproduce.

This would therefore close (for real this time) bug ID :

http://bugzilla.kernel.org/show_bug.cgi?id=12660

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: gregkh@suse.de
CC: stable@kernel.org
CC: cpufreq@vger.kernel.org
CC: Ingo Molnar <mingo@elte.hu>
CC: rjw@sisk.pl
CC: Ben Slusky <sluskyb@paranoiacs.org>
CC: Dave Jones <davej@redhat.com>
---
 drivers/cpufreq/cpufreq_conservative.c |   28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

Index: linux-2.6-lttng.git/drivers/cpufreq/cpufreq_conservative.c
===================================================================
--- linux-2.6-lttng.git.orig/drivers/cpufreq/cpufreq_conservative.c	2009-04-23 23:18:14.000000000 -0400
+++ linux-2.6-lttng.git/drivers/cpufreq/cpufreq_conservative.c	2009-04-23 23:53:34.000000000 -0400
@@ -75,12 +75,20 @@
 static unsigned int dbs_enable;	/* number of CPUs using this policy */
 
 /*
+ * dbs_mod_timer_mutex is used to protect the timer manipulation
+ * (creation/removal) so we can use a cancel_delayed_work_sync() to stop the
+ * workqueue without deadlocking with workqueue handler execution.
+ */
+static DEFINE_MUTEX (dbs_mod_timer_mutex);
+
+/*
  * DEADLOCK ALERT! There is a ordering requirement between cpu_hotplug
  * lock and dbs_mutex. cpu_hotplug lock should always be held before
  * dbs_mutex. If any function that can potentially take cpu_hotplug lock
  * (like __cpufreq_driver_target()) is being called with dbs_mutex taken, then
  * cpu_hotplug lock should be taken before that. Note that cpu_hotplug lock
  * is recursive for the same process. -Venki
+ * dbs_mutex nests inside dbs_mod_timer_mutex.
  */
 static DEFINE_MUTEX (dbs_mutex);
 static DECLARE_DELAYED_WORK(dbs_work, do_dbs_timer);
@@ -468,7 +476,7 @@
 
 static inline void dbs_timer_exit(void)
 {
-	cancel_delayed_work(&dbs_work);
+	cancel_delayed_work_sync(&dbs_work);
 	return;
 }
 
@@ -490,11 +498,13 @@
 		if (this_dbs_info->enable) /* Already enabled */
 			break;
 
+		mutex_lock(&dbs_mod_timer_mutex);
 		mutex_lock(&dbs_mutex);
 
 		rc = sysfs_create_group(&policy->kobj, &dbs_attr_group);
 		if (rc) {
 			mutex_unlock(&dbs_mutex);
+			mutex_unlock(&dbs_mod_timer_mutex);
 			return rc;
 		}
 
@@ -531,16 +541,21 @@
 
 			dbs_tuners_ins.sampling_rate = def_sampling_rate;
 
-			dbs_timer_init();
 			cpufreq_register_notifier(
 					&dbs_cpufreq_notifier_block,
 					CPUFREQ_TRANSITION_NOTIFIER);
 		}
 
 		mutex_unlock(&dbs_mutex);
+
+		if (dbs_enable == 1)
+			dbs_timer_init();
+
+		mutex_unlock(&dbs_mod_timer_mutex);
 		break;
 
 	case CPUFREQ_GOV_STOP:
+		mutex_lock(&dbs_mod_timer_mutex);
 		mutex_lock(&dbs_mutex);
 		this_dbs_info->enable = 0;
 		sysfs_remove_group(&policy->kobj, &dbs_attr_group);
@@ -549,15 +564,18 @@
 		 * Stop the timerschedule work, when this governor
 		 * is used for first time
 		 */
-		if (dbs_enable == 0) {
-			dbs_timer_exit();
+		if (dbs_enable == 0)
 			cpufreq_unregister_notifier(
 					&dbs_cpufreq_notifier_block,
 					CPUFREQ_TRANSITION_NOTIFIER);
-		}
 
 		mutex_unlock(&dbs_mutex);
 
+		if (dbs_enable == 0)
+			dbs_timer_exit();
+
+		mutex_unlock(&dbs_mod_timer_mutex);
+
 		break;
 
 	case CPUFREQ_GOV_LIMITS:

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] cpufreq fix timer teardown in conservative governor (2.6.30-rc2)
  2009-04-23 23:46 ` Andrew Morton
  2009-04-24  4:34   ` [PATCH -stable] cpufreq fix timer teardown in conservative governor (2.6.28.x, 2.6.29.1) Mathieu Desnoyers
@ 2009-04-24  4:35   ` Mathieu Desnoyers
  2009-04-24  6:18     ` Len Brown
  2009-04-24  4:38   ` [PATCH] cpufreq fix timer teardown in ondemand governor (2.6.28.x, 2.6.29.1, 2.6.30-rc2) Mathieu Desnoyers
  2009-04-24  6:49   ` [BUG] 2.6.29.1 debugobjects warning Ingo Molnar
  3 siblings, 1 reply; 14+ messages in thread
From: Mathieu Desnoyers @ 2009-04-24  4:35 UTC (permalink / raw)
  To: Andrew Morton
  Cc: gregkh, stable, cpufreq, Ingo Molnar, rjw, Ben Slusky,
	Dave Jones, linux-kernel

The problem is that dbs_timer_exit() uses cancel_delayed_work() when it should
use cancel_delayed_work_sync(). cancel_delayed_work() does not wait for the
workqueue handler to exit.

The ondemand governor does not seem to be affected because the
"if (!dbs_info->enable)" check at the beginning of the workqueue handler returns
immediately without rescheduling the work. The conservative governor in
2.6.30-rc has the same check as the ondemand governor, which makes things
usually run smoothly. However, if the governor is quickly stopped and then
started, this could lead to the following race :

dbs_enable could be reenabled and multiple do_dbs_timer handlers would run.
This is why a synchronized teardown is required.

The following patch applies to 2.6.30-rc2.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: gregkh@suse.de
CC: stable@kernel.org
CC: cpufreq@vger.kernel.org
CC: Ingo Molnar <mingo@elte.hu>
CC: rjw@sisk.pl
CC: Ben Slusky <sluskyb@paranoiacs.org>
CC: Dave Jones <davej@redhat.com>
---
 drivers/cpufreq/cpufreq_conservative.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Index: linux-2.6-lttng/drivers/cpufreq/cpufreq_conservative.c
===================================================================
--- linux-2.6-lttng.orig/drivers/cpufreq/cpufreq_conservative.c	2009-04-23 23:22:15.000000000 -0400
+++ linux-2.6-lttng/drivers/cpufreq/cpufreq_conservative.c	2009-04-23 23:24:38.000000000 -0400
@@ -91,6 +91,9 @@ static unsigned int dbs_enable;	/* numbe
  * (like __cpufreq_driver_target()) is being called with dbs_mutex taken, then
  * cpu_hotplug lock should be taken before that. Note that cpu_hotplug lock
  * is recursive for the same process. -Venki
+ * DEADLOCK ALERT! (2) : do_dbs_timer() must not take the dbs_mutex, because it
+ * would deadlock with cancel_delayed_work_sync(), which is needed for proper
+ * raceless workqueue teardown.
  */
 static DEFINE_MUTEX(dbs_mutex);
 
@@ -542,7 +545,7 @@ static inline void dbs_timer_init(struct
 static inline void dbs_timer_exit(struct cpu_dbs_info_s *dbs_info)
 {
 	dbs_info->enable = 0;
-	cancel_delayed_work(&dbs_info->work);
+	cancel_delayed_work_sync(&dbs_info->work);
 }
 
 static int cpufreq_governor_dbs(struct cpufreq_policy *policy,

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] cpufreq fix timer teardown in ondemand governor (2.6.28.x, 2.6.29.1, 2.6.30-rc2)
  2009-04-23 23:46 ` Andrew Morton
  2009-04-24  4:34   ` [PATCH -stable] cpufreq fix timer teardown in conservative governor (2.6.28.x, 2.6.29.1) Mathieu Desnoyers
  2009-04-24  4:35   ` [PATCH] cpufreq fix timer teardown in conservative governor (2.6.30-rc2) Mathieu Desnoyers
@ 2009-04-24  4:38   ` Mathieu Desnoyers
  2009-04-28 13:00     ` Mathieu Desnoyers
  2009-04-24  6:49   ` [BUG] 2.6.29.1 debugobjects warning Ingo Molnar
  3 siblings, 1 reply; 14+ messages in thread
From: Mathieu Desnoyers @ 2009-04-24  4:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: gregkh, stable, cpufreq, Ingo Molnar, rjw, Ben Slusky,
	Dave Jones, linux-kernel

The problem is that dbs_timer_exit() uses cancel_delayed_work() when it should
use cancel_delayed_work_sync(). cancel_delayed_work() does not wait for the
workqueue handler to exit.

The ondemand governor does not seem to be affected because the
"if (!dbs_info->enable)" check at the beginning of the workqueue handler returns
immediately without rescheduling the work. The conservative governor in
2.6.30-rc has the same check as the ondemand governor, which makes things
usually run smoothly. However, if the governor is quickly stopped and then
started, this could lead to the following race :

dbs_enable could be reenabled and multiple do_dbs_timer handlers would run.
This is why a synchronized teardown is required.

The following patch applies to, at least, 2.6.28.x, 2.6.29.1, 2.6.30-rc2.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: gregkh@suse.de
CC: stable@kernel.org
CC: cpufreq@vger.kernel.org
CC: Ingo Molnar <mingo@elte.hu>
CC: rjw@sisk.pl
CC: Ben Slusky <sluskyb@paranoiacs.org>
CC: Dave Jones <davej@redhat.com>
---
 drivers/cpufreq/cpufreq_ondemand.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Index: linux-2.6-lttng/drivers/cpufreq/cpufreq_ondemand.c
===================================================================
--- linux-2.6-lttng.orig/drivers/cpufreq/cpufreq_ondemand.c	2009-04-23 23:25:00.000000000 -0400
+++ linux-2.6-lttng/drivers/cpufreq/cpufreq_ondemand.c	2009-04-23 23:25:39.000000000 -0400
@@ -98,6 +98,9 @@ static unsigned int dbs_enable;	/* numbe
  * (like __cpufreq_driver_target()) is being called with dbs_mutex taken, then
  * cpu_hotplug lock should be taken before that. Note that cpu_hotplug lock
  * is recursive for the same process. -Venki
+ * DEADLOCK ALERT! (2) : do_dbs_timer() must not take the dbs_mutex, because it
+ * would deadlock with cancel_delayed_work_sync(), which is needed for proper
+ * raceless workqueue teardown.
  */
 static DEFINE_MUTEX(dbs_mutex);
 
@@ -562,7 +565,7 @@ static inline void dbs_timer_init(struct
 static inline void dbs_timer_exit(struct cpu_dbs_info_s *dbs_info)
 {
 	dbs_info->enable = 0;
-	cancel_delayed_work(&dbs_info->work);
+	cancel_delayed_work_sync(&dbs_info->work);
 }
 
 static int cpufreq_governor_dbs(struct cpufreq_policy *policy,

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] cpufreq fix timer teardown in conservative governor (2.6.30-rc2)
  2009-04-24  4:35   ` [PATCH] cpufreq fix timer teardown in conservative governor (2.6.30-rc2) Mathieu Desnoyers
@ 2009-04-24  6:18     ` Len Brown
  2009-04-24 16:12       ` Mathieu Desnoyers
  0 siblings, 1 reply; 14+ messages in thread
From: Len Brown @ 2009-04-24  6:18 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Andrew Morton, gregkh, stable, cpufreq, Ingo Molnar, rjw,
	Ben Slusky, Dave Jones, linux-kernel

Somebody please remind me why we are spending effort to 
maintain the conservative governor instead of deleting it.

thanks,
Len Brown, Intel Open Source Technology Center


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [BUG] 2.6.29.1 debugobjects warning
  2009-04-23 23:46 ` Andrew Morton
                     ` (2 preceding siblings ...)
  2009-04-24  4:38   ` [PATCH] cpufreq fix timer teardown in ondemand governor (2.6.28.x, 2.6.29.1, 2.6.30-rc2) Mathieu Desnoyers
@ 2009-04-24  6:49   ` Ingo Molnar
  2009-04-24 15:07     ` Mathieu Desnoyers
  2009-04-27 11:30     ` Thomas Gleixner
  3 siblings, 2 replies; 14+ messages in thread
From: Ingo Molnar @ 2009-04-24  6:49 UTC (permalink / raw)
  To: Andrew Morton, Thomas Gleixner
  Cc: Mathieu Desnoyers, gregkh, linux-kernel, cpufreq


* Andrew Morton <akpm@linux-foundation.org> wrote:

> On Thu, 23 Apr 2009 10:00:02 -0400
> Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> 
> > ODEBUG: init active object type: timer_list

> It seems to be complaining that cpufreq_governor_dbs() is running 
> init_timer() against a timer which has already been initialised 
> once.

Not just already initialized - but also active. There's these states 
for an object:

        ODEBUG_STATE_NONE,
        ODEBUG_STATE_INIT,
        ODEBUG_STATE_INACTIVE,
        ODEBUG_STATE_ACTIVE,
        ODEBUG_STATE_DESTROYED,

So the 'init active object type' warning above suggests that an 
init_timer() has been done on an already running timer. If true then 
that is a bad bug - can corrupt timer state, etc.

Thomas, do you agree?

	Ingo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [BUG] 2.6.29.1 debugobjects warning
  2009-04-24  6:49   ` [BUG] 2.6.29.1 debugobjects warning Ingo Molnar
@ 2009-04-24 15:07     ` Mathieu Desnoyers
  2009-04-27 11:30     ` Thomas Gleixner
  1 sibling, 0 replies; 14+ messages in thread
From: Mathieu Desnoyers @ 2009-04-24 15:07 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, Thomas Gleixner, gregkh, linux-kernel, cpufreq

* Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > On Thu, 23 Apr 2009 10:00:02 -0400
> > Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> > 
> > > ODEBUG: init active object type: timer_list
> 
> > It seems to be complaining that cpufreq_governor_dbs() is running 
> > init_timer() against a timer which has already been initialised 
> > once.
> 
> Not just already initialized - but also active. There's these states 
> for an object:
> 
>         ODEBUG_STATE_NONE,
>         ODEBUG_STATE_INIT,
>         ODEBUG_STATE_INACTIVE,
>         ODEBUG_STATE_ACTIVE,
>         ODEBUG_STATE_DESTROYED,
> 
> So the 'init active object type' warning above suggests that an 
> init_timer() has been done on an already running timer. If true then 
> that is a bad bug - can corrupt timer state, etc.
> 
> Thomas, do you agree?
> 

This would fit with the behavior I've noticed and submitted a patch for
later in this thread. Basically, the incorrect use of
"cancel_delayed_work()" instead of "cancel_delayed_work_sync()" may
leave the timer active if the race explained in the mail below happen.

http://lkml.org/lkml/2009/4/24/19

Maybe we should audit other users of
schedule_delayed_work/cancel_delayed_work that are re-arming themselves
in the workqueue handler without any proper cancel_delayed_work_sync()
nor flush_workqueue() calls on teardown.

Mathieu

> 	Ingo

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] cpufreq fix timer teardown in conservative governor (2.6.30-rc2)
  2009-04-24  6:18     ` Len Brown
@ 2009-04-24 16:12       ` Mathieu Desnoyers
  2009-04-26 14:47         ` Henrique de Moraes Holschuh
  0 siblings, 1 reply; 14+ messages in thread
From: Mathieu Desnoyers @ 2009-04-24 16:12 UTC (permalink / raw)
  To: Len Brown
  Cc: Andrew Morton, gregkh, stable, cpufreq, Ingo Molnar, rjw,
	Ben Slusky, Dave Jones, linux-kernel

* Len Brown (lenb@kernel.org) wrote:
> Somebody please remind me why we are spending effort to 
> maintain the conservative governor instead of deleting it.
> 

Documentation/cpu-freq/governors.txt

"The CPUfreq governor "conservative", much like the "ondemand"
governor, sets the CPU depending on the current usage.  It differs in
behaviour in that it gracefully increases and decreases the CPU speed
rather than jumping to max speed the moment there is any load on the
CPU.  This behaviour more suitable in a battery powered environment."

So better battery usage seems to be the reason why conservative lives.

Mathieu


> thanks,
> Len Brown, Intel Open Source Technology Center
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] cpufreq fix timer teardown in conservative governor (2.6.30-rc2)
  2009-04-24 16:12       ` Mathieu Desnoyers
@ 2009-04-26 14:47         ` Henrique de Moraes Holschuh
  2009-04-26 16:31           ` Mathieu Desnoyers
  0 siblings, 1 reply; 14+ messages in thread
From: Henrique de Moraes Holschuh @ 2009-04-26 14:47 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Len Brown, Andrew Morton, gregkh, stable, cpufreq, Ingo Molnar,
	rjw, Ben Slusky, Dave Jones, linux-kernel

On Fri, 24 Apr 2009, Mathieu Desnoyers wrote:
> * Len Brown (lenb@kernel.org) wrote:
> > Somebody please remind me why we are spending effort to 
> > maintain the conservative governor instead of deleting it.
> 
> Documentation/cpu-freq/governors.txt
> 
> "The CPUfreq governor "conservative", much like the "ondemand"
> governor, sets the CPU depending on the current usage.  It differs in
> behaviour in that it gracefully increases and decreases the CPU speed
> rather than jumping to max speed the moment there is any load on the
> CPU.  This behaviour more suitable in a battery powered environment."
> 
> So better battery usage seems to be the reason why conservative lives.

Yeah, but the question is: is it really better in practice? race-to-idle
works better with ondemand.  Note: that needs to be answered not just for
the current crop of mobile processors, but also for at least stuff as old as
the Pentium M and Pentium 4 M.

What it _does_ help, is in broken !@#$ hardware that makes a lot of noise
due to "singing capacitors" if you use ondemand (because conservative will
make less noise as it causes more smooth transitions).  NOHZ helped a great
deal there, too.

I don't know if there are battery environments where a harsher work profile
by the CPU are a bad idea.  If there are any, conservative will also help
there.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] cpufreq fix timer teardown in conservative governor (2.6.30-rc2)
  2009-04-26 14:47         ` Henrique de Moraes Holschuh
@ 2009-04-26 16:31           ` Mathieu Desnoyers
  0 siblings, 0 replies; 14+ messages in thread
From: Mathieu Desnoyers @ 2009-04-26 16:31 UTC (permalink / raw)
  To: Henrique de Moraes Holschuh
  Cc: Len Brown, Andrew Morton, gregkh, stable, cpufreq, Ingo Molnar,
	rjw, Ben Slusky, Dave Jones, linux-kernel

* Henrique de Moraes Holschuh (hmh@hmh.eng.br) wrote:
> On Fri, 24 Apr 2009, Mathieu Desnoyers wrote:
> > * Len Brown (lenb@kernel.org) wrote:
> > > Somebody please remind me why we are spending effort to 
> > > maintain the conservative governor instead of deleting it.
> > 
> > Documentation/cpu-freq/governors.txt
> > 
> > "The CPUfreq governor "conservative", much like the "ondemand"
> > governor, sets the CPU depending on the current usage.  It differs in
> > behaviour in that it gracefully increases and decreases the CPU speed
> > rather than jumping to max speed the moment there is any load on the
> > CPU.  This behaviour more suitable in a battery powered environment."
> > 
> > So better battery usage seems to be the reason why conservative lives.
> 
> Yeah, but the question is: is it really better in practice? race-to-idle
> works better with ondemand.  Note: that needs to be answered not just for
> the current crop of mobile processors, but also for at least stuff as old as
> the Pentium M and Pentium 4 M.
> 
> What it _does_ help, is in broken !@#$ hardware that makes a lot of noise
> due to "singing capacitors" if you use ondemand (because conservative will
> make less noise as it causes more smooth transitions).  NOHZ helped a great
> deal there, too.
> 

I effectively have such singing capacitors on my laptop, and I still
have good ears.

> I don't know if there are battery environments where a harsher work profile
> by the CPU are a bad idea.  If there are any, conservative will also help
> there.
> 

Good question. We might also consider that anyway code duplication
between ondemand and conservative is a bad thing. Those look so similar
that part of them should probably be merged, with a sysfs flag and
kernel parameter to select the behavior.

Mathieu


> -- 
>   "One disk to rule them all, One disk to find them. One disk to bring
>   them all and in the darkness grind them. In the Land of Redmond
>   where the shadows lie." -- The Silicon Valley Tarot
>   Henrique Holschuh

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [BUG] 2.6.29.1 debugobjects warning
  2009-04-24  6:49   ` [BUG] 2.6.29.1 debugobjects warning Ingo Molnar
  2009-04-24 15:07     ` Mathieu Desnoyers
@ 2009-04-27 11:30     ` Thomas Gleixner
  1 sibling, 0 replies; 14+ messages in thread
From: Thomas Gleixner @ 2009-04-27 11:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, Mathieu Desnoyers, gregkh, linux-kernel, cpufreq

On Fri, 24 Apr 2009, Ingo Molnar wrote:
> 
> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > On Thu, 23 Apr 2009 10:00:02 -0400
> > Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> > 
> > > ODEBUG: init active object type: timer_list
> 
> > It seems to be complaining that cpufreq_governor_dbs() is running 
> > init_timer() against a timer which has already been initialised 
> > once.
> 
> Not just already initialized - but also active. There's these states 
> for an object:
> 
>         ODEBUG_STATE_NONE,
>         ODEBUG_STATE_INIT,
>         ODEBUG_STATE_INACTIVE,
>         ODEBUG_STATE_ACTIVE,
>         ODEBUG_STATE_DESTROYED,
> 
> So the 'init active object type' warning above suggests that an 
> init_timer() has been done on an already running timer. If true then 
> that is a bad bug - can corrupt timer state, etc.
> 
> Thomas, do you agree?

Yes, that kind of bug makes the timer wheel explode later on due to
list corruption.

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH -stable] cpufreq fix timer teardown in conservative governor (2.6.28.x, 2.6.29.1)
  2009-04-24  4:34   ` [PATCH -stable] cpufreq fix timer teardown in conservative governor (2.6.28.x, 2.6.29.1) Mathieu Desnoyers
@ 2009-04-28 12:57     ` Mathieu Desnoyers
  0 siblings, 0 replies; 14+ messages in thread
From: Mathieu Desnoyers @ 2009-04-28 12:57 UTC (permalink / raw)
  To: Andrew Morton, Chris Wright
  Cc: gregkh, stable, cpufreq, Ingo Molnar, rjw, Ben Slusky,
	Dave Jones, linux-kernel

I am re-posting the following patch for -stable, as it fell between the
cracks. Andrew picked up the patch for 2.6.30-rc in his tree, but this
one has been forgotten.

commit 8217e4f4c93e5fb59bb3cd1e6135213889349f86
Author: Ben Slusky <sluskyb@paranoiacs.org>
Date:   Mon Jul 7 13:16:20 2008 -0400

* Andrew Morton (akpm@linux-foundation.org) wrote:
> On Thu, 23 Apr 2009 10:00:02 -0400
> Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> 
> > Hi,
> > 
> > I got the following warning on my Thinkpad T43p laptop (single-core
> > 32-bits x86). I run the 2.6.29.1 tree, plus LTTng patchset applied. It
> > seems to come from cpufreq. Any idea what is going on here ?
> > 
> > 
> > ------------[ cut here ]------------
> > WARNING: at lib/debugobjects.c:217 debug_print_object+0x5a/0x70()
> > Hardware name: 2687D5U
> > ODEBUG: init active object type: timer_list
> > Modules linked in: irda parport nsc_ircc irtty_sir parport_pc psmouse snd_seq sn
> > d_seq_midi_event hid_logitech ac unix floppy output battery sir_dev nvram snd_ra
> > wmidi pcmcia x_tables ip_tables joydev snd_seq_midi evdev video snd_page_alloc s
> > oundcore led_class i2c_i801 cryptoloop snd snd_seq_dummy rfkill thinkpad_acpi sn
> > d_seq_oss snd_mixer_oss loop ipw2200 blowfish aes_i586 snd_pcm_oss ac97_bus snd_
> > seq_device agpgart snd_pcm ide_cd_mod snd_timer intel_agp snd_ac97_codec ide_gen
> > eric button snd_intel8x0 edd acpi_cpufreq ltt_statedump ipc_trace usbhid thermal
> >  mm_trace snd_intel8x0m fs_trace pcmcia_core rcu_trace lib80211 syscall_trace rs
> > rc_nonstatic libphy serio_raw kernel_trace tg3 trap_trace libipw crc_ccitt net_t
> > race dm_mod dm_log dm_region_hash dm_mirror yenta_socket dm_snapshot fat vfat nl
> > s_cp437 nls_iso8859_1 lp ppdev af_packet drm ntfs ipv6 auth_rpcgss binfmt_misc r
> > adeon lockd sunrpc nfs
> > Pid: 3628, comm: cpufreqd Not tainted 2.6.29.1-trace #28
> > Call Trace:
> >  [<c1044123>] warn_slowpath+0x73/0xd0
> >  [<c1069c28>] ? mark_held_locks+0x48/0x90
> >  [<c13313f5>] ? __mutex_unlock_slowpath+0xd5/0x150
> >  [<c1069f39>] ? trace_hardirqs_on_caller+0x199/0x1f0
> >  [<c1331478>] ? mutex_unlock+0x8/0x10
> >  [<c1331478>] ? mutex_unlock+0x8/0x10
> >  [<c1107b76>] ? sysfs_addrm_finish+0x16/0x230
> >  [<c1107671>] ? sysfs_find_dirent+0x21/0x30
> >  [<c116d71a>] debug_print_object+0x5a/0x70
> >  [<c116e054>] __debug_object_init+0x254/0x340
> >  [<c126533f>] ? cpufreq_governor_dbs+0x10f/0x210
> >  [<c116e187>] debug_object_init+0x17/0x20
> >  [<c104de70>] init_timer+0x10/0x30
> >  [<c104de9b>] init_timer_deferrable+0xb/0x20
> >  [<c12653fd>] cpufreq_governor_dbs+0x1cd/0x210
> >  [<c126205b>] __cpufreq_governor+0xab/0x120
> >  [<c12621cb>] __cpufreq_set_policy+0xfb/0x140
> >  [<c1262c24>] store_scaling_governor+0xa4/0x220
> >  [<c1263550>] ? handle_update+0x0/0x10
> >  [<c1262b80>] ? store_scaling_governor+0x0/0x220
> >  [<c126343e>] store+0x4e/0x70
> >  [<c1106b9c>] sysfs_write_file+0x9c/0x100
> >  [<c10b825c>] vfs_write+0x9c/0x140
> >  [<c1106b00>] ? sysfs_write_file+0x0/0x100
> >  [<c10b8447>] sys_write+0x47/0xe0
> >  [<c1021dde>] syscall_call+0x7/0xb
> >  [<c1020000>] ? sys_vfork+0x20/0x30
> > 
> 
> It seems to be complaining that cpufreq_governor_dbs() is running
> init_timer() against a timer which has already been initialised once.
> 

In cpufreq_conservative.c, we have :

static inline void dbs_timer_init(void)
{
        init_timer_deferrable(&dbs_work.timer);
        schedule_delayed_work(&dbs_work,
                        usecs_to_jiffies(dbs_tuners_ins.sampling_rate));
        return;
}

static inline void dbs_timer_exit(void)
{
        cancel_delayed_work(&dbs_work);
        return;
}

Called respectively upon CPUFREQ_GOV_START and CPUFREQ_GOV_STOP.

commit 8217e4f4c93e5fb59bb3cd1e6135213889349f86
adds init_timer_deferrable(&dbs_work.timer);

The problem is that dbs_timer_exit() uses cancel_delayed_work() when it should
use cancel_delayed_work_sync(). cancel_delayed_work() does not wait for the
workqueue handler to exit. Therefore, the work can reschedule itself if the
correct race happens (do_dbs_timer() executing before holding the mutex, work
cancelled, do_dbs_timer() scheduled again, work rescheduled).

This leads to a timer list corruption when the conservative governor is
restarted, because the timer is still active.

However, we _cannot_ use cancel_delayed_work_sync() here because we would
deadlock between wait for workqueue and the fact that this workqueue may hold
the dbs_mutex (which we also hold).

The ondemand governor does not seem to be affected because the
"if (!dbs_info->enable)" check at the beginning of the workqueue handler returns
immediately without rescheduling the work. The conservative governor in
2.6.30-rc has the same check as the ondemand governor, which makes this problem
go away. However, if the governor is quickly stopped and then started, this
could lead to the following race :

Given that dbs_timer_exit cannot wait for the workqueue handler to exit because
it would otherwise deadlock on the dbs_mutex, making sure we don't reschedule
the work by testing the dbs_enable count seems to work most of the time.
However, there is still a potential problem if the governor is stopped and then
started while a do_dbs_timer workqueue is still active : dbs_enable could be
reenabled and multiple do_dbs_timer workqueue would run.  This is why a
synchronized teardown is required, and also why we need to use a different mutex
for do_dbs_timer so we don't deadlock.

The following patch applies to kernels 2.6.29.1 and 2.6.28.x. The problem was
there before, but it seems to really hurt with deferrable timers, which have
been introduced in the conservative governor only between 2.6.27 and 2.6.28.

Ingo : I would not be surprised if this was the mysterious memory
corruption bug I encountered on my laptop starting with kernel 2.6.28.
Back then, I switched to 2.6.27, waiting for 2.6.29 to fix it. Whenever my
computer switched between the performance and the conservative cpufreq modes,
more than once, it ended up corrupting the timer list. Given I don't switch
between battery and AC very often, and given that this bug is caused by a
preemption race on a UP machine, it explains why it was so hard to reproduce.

This would therefore close (for real this time) bug ID :

http://bugzilla.kernel.org/show_bug.cgi?id=12660

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: gregkh@suse.de
CC: stable@kernel.org
CC: cpufreq@vger.kernel.org
CC: Ingo Molnar <mingo@elte.hu>
CC: rjw@sisk.pl
CC: Ben Slusky <sluskyb@paranoiacs.org>
CC: Dave Jones <davej@redhat.com>
CC: Chris Wright <chrisw@sous-sol.org>
---
 drivers/cpufreq/cpufreq_conservative.c |   28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

Index: linux-2.6-lttng.git/drivers/cpufreq/cpufreq_conservative.c
===================================================================
--- linux-2.6-lttng.git.orig/drivers/cpufreq/cpufreq_conservative.c	2009-04-23 23:18:14.000000000 -0400
+++ linux-2.6-lttng.git/drivers/cpufreq/cpufreq_conservative.c	2009-04-23 23:53:34.000000000 -0400
@@ -75,12 +75,20 @@
 static unsigned int dbs_enable;	/* number of CPUs using this policy */
 
 /*
+ * dbs_mod_timer_mutex is used to protect the timer manipulation
+ * (creation/removal) so we can use a cancel_delayed_work_sync() to stop the
+ * workqueue without deadlocking with workqueue handler execution.
+ */
+static DEFINE_MUTEX (dbs_mod_timer_mutex);
+
+/*
  * DEADLOCK ALERT! There is a ordering requirement between cpu_hotplug
  * lock and dbs_mutex. cpu_hotplug lock should always be held before
  * dbs_mutex. If any function that can potentially take cpu_hotplug lock
  * (like __cpufreq_driver_target()) is being called with dbs_mutex taken, then
  * cpu_hotplug lock should be taken before that. Note that cpu_hotplug lock
  * is recursive for the same process. -Venki
+ * dbs_mutex nests inside dbs_mod_timer_mutex.
  */
 static DEFINE_MUTEX (dbs_mutex);
 static DECLARE_DELAYED_WORK(dbs_work, do_dbs_timer);
@@ -468,7 +476,7 @@
 
 static inline void dbs_timer_exit(void)
 {
-	cancel_delayed_work(&dbs_work);
+	cancel_delayed_work_sync(&dbs_work);
 	return;
 }
 
@@ -490,11 +498,13 @@
 		if (this_dbs_info->enable) /* Already enabled */
 			break;
 
+		mutex_lock(&dbs_mod_timer_mutex);
 		mutex_lock(&dbs_mutex);
 
 		rc = sysfs_create_group(&policy->kobj, &dbs_attr_group);
 		if (rc) {
 			mutex_unlock(&dbs_mutex);
+			mutex_unlock(&dbs_mod_timer_mutex);
 			return rc;
 		}
 
@@ -531,16 +541,21 @@
 
 			dbs_tuners_ins.sampling_rate = def_sampling_rate;
 
-			dbs_timer_init();
 			cpufreq_register_notifier(
 					&dbs_cpufreq_notifier_block,
 					CPUFREQ_TRANSITION_NOTIFIER);
 		}
 
 		mutex_unlock(&dbs_mutex);
+
+		if (dbs_enable == 1)
+			dbs_timer_init();
+
+		mutex_unlock(&dbs_mod_timer_mutex);
 		break;
 
 	case CPUFREQ_GOV_STOP:
+		mutex_lock(&dbs_mod_timer_mutex);
 		mutex_lock(&dbs_mutex);
 		this_dbs_info->enable = 0;
 		sysfs_remove_group(&policy->kobj, &dbs_attr_group);
@@ -549,15 +564,18 @@
 		 * Stop the timerschedule work, when this governor
 		 * is used for first time
 		 */
-		if (dbs_enable == 0) {
-			dbs_timer_exit();
+		if (dbs_enable == 0)
 			cpufreq_unregister_notifier(
 					&dbs_cpufreq_notifier_block,
 					CPUFREQ_TRANSITION_NOTIFIER);
-		}
 
 		mutex_unlock(&dbs_mutex);
 
+		if (dbs_enable == 0)
+			dbs_timer_exit();
+
+		mutex_unlock(&dbs_mod_timer_mutex);
+
 		break;
 
 	case CPUFREQ_GOV_LIMITS:

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] cpufreq fix timer teardown in ondemand governor (2.6.28.x, 2.6.29.1, 2.6.30-rc2)
  2009-04-24  4:38   ` [PATCH] cpufreq fix timer teardown in ondemand governor (2.6.28.x, 2.6.29.1, 2.6.30-rc2) Mathieu Desnoyers
@ 2009-04-28 13:00     ` Mathieu Desnoyers
  0 siblings, 0 replies; 14+ messages in thread
From: Mathieu Desnoyers @ 2009-04-28 13:00 UTC (permalink / raw)
  To: Andrew Morton, Chris Wright
  Cc: gregkh, stable, cpufreq, Ingo Molnar, rjw, Ben Slusky,
	Dave Jones, linux-kernel

(this patch too has been forgotten. It fixes the same issue in ondemand
as we find in the very very similar (read : cut and pasted) conservative
cpufreq code). It applies to mainline head and -stable kernels.

The problem is that dbs_timer_exit() uses cancel_delayed_work() when it should
use cancel_delayed_work_sync(). cancel_delayed_work() does not wait for the
workqueue handler to exit.

The ondemand governor does not "seem" to be affected (read : race
condition occurs very rarely) because the "if (!dbs_info->enable)" check
at the beginning of the workqueue handler returns immediately without
rescheduling the work. The conservative governor in 2.6.30-rc has the
same check as the ondemand governor, which makes things usually run
smoothly. However, if the governor is quickly stopped and then started,
this could lead to the following race :

dbs_enable could be reenabled and multiple do_dbs_timer handlers would run.
This is why a synchronized teardown is required.

The following patch applies to, at least, 2.6.28.x, 2.6.29.1, 2.6.30-rc2.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: gregkh@suse.de
CC: stable@kernel.org
CC: cpufreq@vger.kernel.org
CC: Ingo Molnar <mingo@elte.hu>
CC: rjw@sisk.pl
CC: Ben Slusky <sluskyb@paranoiacs.org>
CC: Dave Jones <davej@redhat.com>
CC: Chris Wright <chrisw@sous-sol.org>
---
 drivers/cpufreq/cpufreq_ondemand.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Index: linux-2.6-lttng/drivers/cpufreq/cpufreq_ondemand.c
===================================================================
--- linux-2.6-lttng.orig/drivers/cpufreq/cpufreq_ondemand.c	2009-04-23 23:25:00.000000000 -0400
+++ linux-2.6-lttng/drivers/cpufreq/cpufreq_ondemand.c	2009-04-23 23:25:39.000000000 -0400
@@ -98,6 +98,9 @@ static unsigned int dbs_enable;	/* numbe
  * (like __cpufreq_driver_target()) is being called with dbs_mutex taken, then
  * cpu_hotplug lock should be taken before that. Note that cpu_hotplug lock
  * is recursive for the same process. -Venki
+ * DEADLOCK ALERT! (2) : do_dbs_timer() must not take the dbs_mutex, because it
+ * would deadlock with cancel_delayed_work_sync(), which is needed for proper
+ * raceless workqueue teardown.
  */
 static DEFINE_MUTEX(dbs_mutex);
 
@@ -562,7 +565,7 @@ static inline void dbs_timer_init(struct
 static inline void dbs_timer_exit(struct cpu_dbs_info_s *dbs_info)
 {
 	dbs_info->enable = 0;
-	cancel_delayed_work(&dbs_info->work);
+	cancel_delayed_work_sync(&dbs_info->work);
 }
 
 static int cpufreq_governor_dbs(struct cpufreq_policy *policy,

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2009-04-28 13:00 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-04-23 14:00 [BUG] 2.6.29.1 debugobjects warning Mathieu Desnoyers
2009-04-23 23:46 ` Andrew Morton
2009-04-24  4:34   ` [PATCH -stable] cpufreq fix timer teardown in conservative governor (2.6.28.x, 2.6.29.1) Mathieu Desnoyers
2009-04-28 12:57     ` Mathieu Desnoyers
2009-04-24  4:35   ` [PATCH] cpufreq fix timer teardown in conservative governor (2.6.30-rc2) Mathieu Desnoyers
2009-04-24  6:18     ` Len Brown
2009-04-24 16:12       ` Mathieu Desnoyers
2009-04-26 14:47         ` Henrique de Moraes Holschuh
2009-04-26 16:31           ` Mathieu Desnoyers
2009-04-24  4:38   ` [PATCH] cpufreq fix timer teardown in ondemand governor (2.6.28.x, 2.6.29.1, 2.6.30-rc2) Mathieu Desnoyers
2009-04-28 13:00     ` Mathieu Desnoyers
2009-04-24  6:49   ` [BUG] 2.6.29.1 debugobjects warning Ingo Molnar
2009-04-24 15:07     ` Mathieu Desnoyers
2009-04-27 11:30     ` Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).