linux-kernel.vger.kernel.org archive mirror
* [announce] "kill the Big Kernel Lock (BKL)" tree
@ 2008-05-14 17:49 Ingo Molnar
From: Ingo Molnar @ 2008-05-14 17:49 UTC
  To: linux-kernel
  Cc: Linus Torvalds, Andrew Morton, Peter Zijlstra, Thomas Gleixner,
	Alan Cox, Alexander Viro


As some of the latency junkies on lkml already know, commit 8e3e076
("BKL: revert back to the old spinlock implementation") in v2.6.26-rc2
removed the preemptible BKL feature and made the Big Kernel Lock a
spinlock, turning it into non-preemptible code again. In essence, this
commit returned the BKL code to its 2.6.7 state.

Linus also indicated that pretty much the only acceptable way to change 
this (to us -rt folks rather unfortunate) latency source and to get rid 
of this non-preemptible locking complication is to remove the BKL.

This task is not easy at all. 12 years after Linux was converted to
an SMP OS we still have 1300+ legacy BKL-using sites. There are 400+
lock_kernel() critical sections and 800+ ioctls. They are spread out 
across rather difficult areas of often legacy code that few people 
understand and few people dare to touch.

It takes top people like Alan Cox to map the semantics and to remove BKL 
code, and even for Alan (who is doing this for the TTY code) it is a 
long and difficult task.

According to my quick & dirty git-log analysis, at the current pace of 
BKL removal we'd have to wait more than 10 years to remove most BKL 
critical sections from the kernel and to get acceptable latencies again.
 
The biggest technical complication is that the BKL is unlike any other 
lock: it "self-releases" when schedule() is called. This makes the BKL 
spinlock very "sticky", "invisible" and viral: it's very easy to add it 
to a piece of code (even unknowingly) and you never really know whether 
it's held or not. PREEMPT_BKL made it even more invisible, because it 
made its effects even less visible to ordinary users.
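
The "self-release" behaviour can be modelled in a few lines of
userspace C. This is only a toy, single-threaded sketch: the toy_*
names are made up for illustration and are not kernel APIs, and a real
lock would spin or sleep where the sketch merely asserts. It shows why
a task that "holds" the BKL across a schedule() cannot assume nobody
else ran under the lock in the meantime:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model; every name here is hypothetical, not a kernel API. */
struct toy_task {
	int lock_depth;		/* -1: BKL not held, >= 0: held (nesting depth) */
};

static struct toy_task *toy_bkl_owner;	/* NULL while the lock is free */

static void toy_lock_kernel(struct toy_task *t)
{
	if (t->lock_depth < 0) {
		/* a real lock would spin or sleep here instead */
		assert(toy_bkl_owner == NULL);
		toy_bkl_owner = t;
	}
	t->lock_depth++;
}

static void toy_unlock_kernel(struct toy_task *t)
{
	assert(t->lock_depth >= 0 && toy_bkl_owner == t);
	if (--t->lock_depth < 0)
		toy_bkl_owner = NULL;
}

/*
 * Model of the old auto-release: switching away from prev silently drops
 * the lock, next runs a BKL section of its own, and the lock is silently
 * handed back before prev resumes. prev->lock_depth never changes.
 *
 * Returns 1 if next ran under the BKL while prev still "held" it.
 */
static int toy_schedule(struct toy_task *prev, struct toy_task *next)
{
	int held = prev->lock_depth >= 0;

	if (held)
		toy_bkl_owner = NULL;		/* released on schedule() */

	toy_lock_kernel(next);			/* next's critical section */
	toy_unlock_kernel(next);

	if (held)
		toy_bkl_owner = prev;		/* reacquired for prev */
	return held;
}
```

From prev's point of view nothing happened: its lock_depth is the same
before and after toy_schedule(), yet another task executed a complete
BKL critical section in between.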

Furthermore, the BKL is not covered by lockdep, so its dependencies are 
largely unknown and invisible, and it is all lost in the haze of the 
past ~15 years of code changes. All this has built up to a kind of Fear, 
Uncertainty and Doubt about the BKL: nobody really knows it, nobody 
really dares to touch it and code can break silently and subtly if BKL 
locking is wrong.
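
To make the "unknown dependencies" point concrete, here is a toy
lock-order tracker in userspace C. It is a sketch of the general idea
only (record which lock class was taken while which other class was
held, and complain when a new acquisition closes a cycle); it is not
lockdep's actual algorithm, and all model_* names are invented for this
example:

```c
#include <assert.h>
#include <string.h>

/* Toy lock-order tracker; an illustration only, not lockdep itself. */
enum { MODEL_MAX_CLASSES = 8 };

/* before[a][b] != 0 means: class b was once taken while class a was held */
static unsigned char before[MODEL_MAX_CLASSES][MODEL_MAX_CLASSES];

/* Depth-first search: is class 'to' reachable from class 'from'? */
static int model_reachable(int from, int to, unsigned char *seen)
{
	if (from == to)
		return 1;
	seen[from] = 1;
	for (int next = 0; next < MODEL_MAX_CLASSES; next++)
		if (before[from][next] && !seen[next] &&
		    model_reachable(next, to, seen))
			return 1;
	return 0;
}

/*
 * Record "new_class acquired while held_class is held".
 * Returns 1 if the acquisition would close a cycle (a possible deadlock),
 * in which case the edge is not recorded; returns 0 otherwise.
 */
static int model_acquire(int held_class, int new_class)
{
	unsigned char seen[MODEL_MAX_CLASSES];

	assert(held_class >= 0 && held_class < MODEL_MAX_CLASSES);
	assert(new_class >= 0 && new_class < MODEL_MAX_CLASSES);

	memset(seen, 0, sizeof(seen));
	if (model_reachable(new_class, held_class, seen))
		return 1;	/* new -> ... -> held is already known */
	before[held_class][new_class] = 1;
	return 0;
}
```

With class 0 standing for a directory i_mutex and class 1 for the BKL,
model_acquire(0, 1) followed by model_acquire(1, 0) flags the second
call. That is the shape of the procfs report quoted in the patches
below: one path takes the BKL under i_mutex, another takes i_mutex
under the BKL.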

So with these current rules of the game we cannot realistically fix this 
amount of BKL code in the kernel. People won't just be able to change 
1300 very difficult and fragile legacy codepaths in the kernel 
overnight, just to improve the latencies of the kernel.

So ... because i find a 10+ year wait rather unacceptable, here is a 
different attempt: let's try and change the rules of the game :-)

The technical goal is to make BKL removal much easier and much more
natural - to make the BKL more visible and to remove its FUD component.

To achieve those goals i've created and uploaded the "kill-the-BKL"
prototype branch to the -tip tree. The branch currently consists of 19
commits:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git kill-the-BKL

This branch (against latest -git) implements the biggest (and by far 
most critical) core kernel changes towards fast BKL elimination:

 - it fixes all "the BKL auto-releases on schedule()" assumptions i 
   could trigger on my testboxes.

 - it adds a handful of debug facilities to warn about common BKL 
   assumptions that are not valid anymore under the new locking model

 - it turns the BKL into an ordinary mutex and removes all 
   "auto-release" BKL legacy code from the scheduler.

 - it thus adds lockdep support to the BKL

 - it activates the BKL on UP && !PREEMPT too - this makes the code 
   simpler and more universal and hopefully motivates more people to get 
   rid of the BKL.

 - it makes BKL sections preemptible again

 - it simplifies the BKL code greatly, and moves it out of the core 
   kernel

In other words: the kill-the-BKL tree turns the BKL into an ordinary 
albeit somewhat big mutex, with a quirky lock/unlock interface called 
"lock_kernel()" and "unlock_kernel()".

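A userspace sketch of what "an ordinary mutex with a thin
self-recursion layer" could look like, using pthreads. The model_*
names are hypothetical and this is not the code in lib/kernel_lock.c.
It also shows the consequence of dropping the auto-release: a caller
that is about to block has to release the lock by hand. The patches
below do that with a single unlock_kernel()/lock_kernel() pair guarded
by kernel_locked(); saving and restoring the whole nesting depth, as
done here, is this sketch's own choice.

```c
#include <assert.h>
#include <pthread.h>

/* Userspace sketch with made-up names; not the kernel implementation. */
static pthread_mutex_t model_kernel_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Per-thread nesting depth: -1 means "not held", like lock_depth. */
static _Thread_local int model_lock_depth = -1;

static int model_kernel_locked(void)
{
	return model_lock_depth >= 0;
}

static void model_lock_kernel(void)
{
	/* Only the outermost call takes the mutex; nested calls just count. */
	if (model_lock_depth < 0)
		pthread_mutex_lock(&model_kernel_mutex);
	model_lock_depth++;
}

static void model_unlock_kernel(void)
{
	assert(model_kernel_locked());
	if (--model_lock_depth < 0)
		pthread_mutex_unlock(&model_kernel_mutex);
}

/*
 * No auto-release any more: drop the lock explicitly around a call that
 * may block, then restore the caller's nesting depth afterwards.
 */
static void model_blocking_call(void (*fn)(void *), void *arg)
{
	int saved_depth = model_lock_depth;

	if (saved_depth >= 0) {
		model_lock_depth = -1;
		pthread_mutex_unlock(&model_kernel_mutex);
	}

	fn(arg);	/* may sleep; the lock is not held here */

	if (saved_depth >= 0) {
		pthread_mutex_lock(&model_kernel_mutex);
		model_lock_depth = saved_depth;
	}
}

/* Example callback: records whether the lock was held when it ran. */
static void model_record_locked(void *arg)
{
	*(int *)arg = model_kernel_locked();
}
```

Calling model_blocking_call(model_record_locked, &flag) from inside a
nested model_lock_kernel() section stores 0 in flag, and the caller's
nesting depth is unchanged once the call returns.
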
Certainly the most interesting commit to check is 7a6e0ca35:

   "remove the BKL: remove it from the core kernel!".

Once this tree stabilizes, elimination of the BKL can be done the usual 
and well-known way of eliminating big locks: by pushing it down into 
subsystems and replacing it with subsystem locks, and splitting those 
locks and eliminating them. We've done this countless times in the past 
and there are lots of capable developers who can attack such problems.
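
As a rough illustration of the push-down step, again in userspace C
with pthreads: the foo_* device and both functions are invented for
this example and do not correspond to any driver in the tree. The first
function serializes on one global lock standing in for the BKL, the
second on a lock owned by the object it protects:

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical subsystem; names are illustrative only. */
static pthread_mutex_t model_big_lock = PTHREAD_MUTEX_INITIALIZER;

struct foo_dev {
	pthread_mutex_t lock;	/* per-device lock replacing the big lock */
	int open_count;
};

static void foo_dev_init(struct foo_dev *dev)
{
	pthread_mutex_init(&dev->lock, NULL);
	dev->open_count = 0;
}

/* Before: every device, and everything else, serializes on one lock. */
static int foo_open_biglock(struct foo_dev *dev)
{
	int count;

	pthread_mutex_lock(&model_big_lock);
	count = ++dev->open_count;
	pthread_mutex_unlock(&model_big_lock);
	return count;
}

/* After the push-down: only users of this one device contend. */
static int foo_open(struct foo_dev *dev)
{
	int count;

	pthread_mutex_lock(&dev->lock);
	count = ++dev->open_count;
	pthread_mutex_unlock(&dev->lock);
	return count;
}
```

The conversion is only valid once every path that touches open_count
takes dev->lock; mixing the two functions on a live device would leave
the counter protected by two unrelated locks.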

In the future we might also want to try to eliminate the self-recursion 
(nested locking) feature of the BKL - this would make BKL code even more 
apparent.

Shortlog, diffstat and patches can be found below. I've build- and
boot-tested it on 32-bit and 64-bit x86.

NOTE: the code is highly experimental - it is recommended to try this
with PROVE_LOCKING and SOFTLOCKUP_DEBUG enabled. If you trigger a
lockdep warning or a softlockup warning, please report it.

Linus, Alan: the increased visibility and debuggability of the BKL 
already uncovered a rather serious regression in upstream -git. You 
might want to cherry pick this single fix, it will apply just fine to 
current -git:

| commit d70785165e2ef13df53d7b365013aaf9c8b4444d
| Author: Ingo Molnar <mingo@elte.hu>
| Date:   Wed May 14 17:11:46 2008 +0200
|
|     tty: fix BKL related leak and crash

This bug might explain a so far undebugged atomic-scheduling crash i saw 
in overnight randconfig boot testing. I tried to keep the fix minimal 
and safe. (although it might make sense to refactor the opost() code to 
have a single exit site in the future)

Bugreports, comments and any other feedback are more than welcome,

	Ingo

------------>
Ingo Molnar (19):
      revert ("BKL: revert back to the old spinlock implementation")
      remove the BKL: change get_fs_type() BKL dependency
      remove the BKL: reduce BKL locking during bootup
      remove the BKL: restruct ->bd_mutex and BKL dependency
      remove the BKL: change ext3 BKL assumption
      remove the BKL: reduce misc_open() BKL dependency
      remove the BKL: remove "BKL auto-drop" assumption from vt_waitactive()
      remove the BKL: remove it from the core kernel!
      softlockup helper: print BKL owner
      remove the BKL: flush_workqueue() debug helper & fix
      remove the BKL: tty updates
      remove the BKL: lockdep self-test fix
      remove the BKL: request_module() debug helper
      remove the BKL: procfs debug helper and BKL elimination
      remove the BKL: do not take the BKL in init code
      remove the BKL: restructure NFS code
      tty: fix BKL related leak and crash
      remove the BKL: fix UP build
      remove the BKL: use the BKL mutex on !SMP too

 arch/mn10300/Kconfig     |   11 ++++
 drivers/char/misc.c      |    8 +++
 drivers/char/n_tty.c     |   13 +++-
 drivers/char/tty_io.c    |   14 ++++-
 drivers/char/vt_ioctl.c  |    8 +++
 fs/block_dev.c           |    4 +-
 fs/ext3/super.c          |    4 -
 fs/filesystems.c         |   12 ++++
 fs/proc/generic.c        |   12 ++--
 fs/proc/inode.c          |    3 -
 fs/proc/root.c           |    9 +--
 include/linux/hardirq.h  |   18 +++---
 include/linux/smp_lock.h |   36 ++---------
 init/Kconfig             |    5 --
 init/main.c              |    7 +-
 kernel/fork.c            |    4 +
 kernel/kmod.c            |   22 +++++++
 kernel/sched.c           |   16 +-----
 kernel/softlockup.c      |    3 +
 kernel/workqueue.c       |   13 ++++
 lib/Makefile             |    4 +-
 lib/kernel_lock.c        |  142 +++++++++++++---------------------------------
 net/sunrpc/sched.c       |    6 ++
 23 files changed, 180 insertions(+), 194 deletions(-)

commit aa3187000a86db1faaa7fb5069b1422046c6d265
Author: Ingo Molnar <mingo@elte.hu>
Date:   Wed May 14 18:14:51 2008 +0200

    remove the BKL: use the BKL mutex on !SMP too
    
    we need as much help with removing the BKL as we can: use the BKL
    mutex on UP && !PREEMPT too.
    
    This simplifies the code, gets us lockdep reports, animates UP
    developers to get rid of this overhead, etc., etc.
    
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/include/linux/smp_lock.h b/include/linux/smp_lock.h
index c5269fe..48b92dd 100644
--- a/include/linux/smp_lock.h
+++ b/include/linux/smp_lock.h
@@ -2,9 +2,7 @@
 #define __LINUX_SMPLOCK_H
 
 #include <linux/compiler.h>
-
-#ifdef CONFIG_LOCK_KERNEL
-# include <linux/sched.h>
+#include <linux/sched.h>
 
 extern void __lockfunc lock_kernel(void)	__acquires(kernel_lock);
 extern void __lockfunc unlock_kernel(void)	__releases(kernel_lock);
@@ -16,10 +14,4 @@ static inline int kernel_locked(void)
 
 extern void debug_print_bkl(void);
 
-#else
-static inline void lock_kernel(void)		__acquires(kernel_lock) { }
-static inline void unlock_kernel(void)		__releases(kernel_lock) { }
-static inline int  kernel_locked(void)		{ return 1; }
-static inline void debug_print_bkl(void)	{ }
-#endif
 #endif
diff --git a/init/Kconfig b/init/Kconfig
index 6135d07..7527c6e 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -56,11 +56,6 @@ config BROKEN_ON_SMP
 	depends on BROKEN || !SMP
 	default y
 
-config LOCK_KERNEL
-	bool
-	depends on SMP || PREEMPT
-	default y
-
 config INIT_ENV_ARG_LIMIT
 	int
 	default 32 if !UML
diff --git a/lib/Makefile b/lib/Makefile
index 74b0cfb..d1c81fa 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -14,7 +14,8 @@ lib-$(CONFIG_SMP) += cpumask.o
 lib-y	+= kobject.o kref.o klist.o
 
 obj-y += div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
-	 bust_spinlocks.o hexdump.o kasprintf.o bitmap.o scatterlist.o
+	 bust_spinlocks.o hexdump.o kasprintf.o bitmap.o scatterlist.o \
+	 kernel_lock.o
 
 ifeq ($(CONFIG_DEBUG_KOBJECT),y)
 CFLAGS_kobject.o += -DDEBUG
@@ -32,7 +33,6 @@ lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o
 lib-$(CONFIG_GENERIC_FIND_FIRST_BIT) += find_next_bit.o
 lib-$(CONFIG_GENERIC_FIND_NEXT_BIT) += find_next_bit.o
 obj-$(CONFIG_GENERIC_HWEIGHT) += hweight.o
-obj-$(CONFIG_LOCK_KERNEL) += kernel_lock.o
 obj-$(CONFIG_PLIST) += plist.o
 obj-$(CONFIG_DEBUG_PREEMPT) += smp_processor_id.o
 obj-$(CONFIG_DEBUG_LIST) += list_debug.o

commit d46328b4f115a24d0745d47e3c79657289f5b297
Author: Ingo Molnar <mingo@elte.hu>
Date:   Wed May 14 18:12:09 2008 +0200

    remove the BKL: fix UP build
    
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/include/linux/smp_lock.h b/include/linux/smp_lock.h
index c318a60..c5269fe 100644
--- a/include/linux/smp_lock.h
+++ b/include/linux/smp_lock.h
@@ -1,6 +1,8 @@
 #ifndef __LINUX_SMPLOCK_H
 #define __LINUX_SMPLOCK_H
 
+#include <linux/compiler.h>
+
 #ifdef CONFIG_LOCK_KERNEL
 # include <linux/sched.h>
 
@@ -15,9 +17,9 @@ static inline int kernel_locked(void)
 extern void debug_print_bkl(void);
 
 #else
-static inline lock_kernel(void)			__acquires(kernel_lock) { }
+static inline void lock_kernel(void)		__acquires(kernel_lock) { }
 static inline void unlock_kernel(void)		__releases(kernel_lock) { }
-static inline int kernel_locked(void)		{ return 1; }
+static inline int  kernel_locked(void)		{ return 1; }
 static inline void debug_print_bkl(void)	{ }
 #endif
 #endif

commit d70785165e2ef13df53d7b365013aaf9c8b4444d
Author: Ingo Molnar <mingo@elte.hu>
Date:   Wed May 14 17:11:46 2008 +0200

    tty: fix BKL related leak and crash
    
    enabling the BKL to be lockdep tracked uncovered the following
    upstream kernel bug in the tty code, which caused a BKL
    reference leak:
    
      ================================================
      [ BUG: lock held when returning to user space! ]
      ------------------------------------------------
      dmesg/3121 is leaving the kernel with locks still held!
      1 lock held by dmesg/3121:
       #0:  (kernel_mutex){--..}, at: [<c02f34d9>] opost+0x24/0x194
    
    this might explain some of the atomicity warnings and crashes
    that -tip tree testing has been experiencing since the BKL
    was converted back to a spinlock.
    
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/drivers/char/n_tty.c b/drivers/char/n_tty.c
index 19105ec..8096389 100644
--- a/drivers/char/n_tty.c
+++ b/drivers/char/n_tty.c
@@ -282,16 +282,20 @@ static int opost(unsigned char c, struct tty_struct *tty)
 			if (O_ONLRET(tty))
 				tty->column = 0;
 			if (O_ONLCR(tty)) {
-				if (space < 2)
+				if (space < 2) {
+					unlock_kernel();
 					return -1;
+				}
 				tty_put_char(tty, '\r');
 				tty->column = 0;
 			}
 			tty->canon_column = tty->column;
 			break;
 		case '\r':
-			if (O_ONOCR(tty) && tty->column == 0)
+			if (O_ONOCR(tty) && tty->column == 0) {
+				unlock_kernel();
 				return 0;
+			}
 			if (O_OCRNL(tty)) {
 				c = '\n';
 				if (O_ONLRET(tty))
@@ -303,10 +307,13 @@ static int opost(unsigned char c, struct tty_struct *tty)
 		case '\t':
 			spaces = 8 - (tty->column & 7);
 			if (O_TABDLY(tty) == XTABS) {
-				if (space < spaces)
+				if (space < spaces) {
+					unlock_kernel();
 					return -1;
+				}
 				tty->column += spaces;
 				tty->ops->write(tty, "        ", spaces);
+				unlock_kernel();
 				return 0;
 			}
 			tty->column += spaces;

commit 352e0d25def53e6b36234e4dc2083ca7f5d712a9
Author: Ingo Molnar <mingo@elte.hu>
Date:   Wed May 14 17:31:41 2008 +0200

    remove the BKL: restructure NFS code
    
    the naked schedule() in rpc_wait_bit_killable() caused the BKL to
    be auto-dropped in the past.
    
    avoid the immediate hang in such code. Note that this still leaves
    some other locking dependencies to be sorted out in the NFS code.
    
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 6eab9bf..e12e571 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -224,9 +224,15 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
 
 static int rpc_wait_bit_killable(void *word)
 {
+	int bkl = kernel_locked();
+
 	if (fatal_signal_pending(current))
 		return -ERESTARTSYS;
+	if (bkl)
+		unlock_kernel();
 	schedule();
+	if (bkl)
+		lock_kernel();
 	return 0;
 }
 

commit 89c25297465376321cf54438d86441a5947bbd11
Author: Ingo Molnar <mingo@elte.hu>
Date:   Wed May 14 15:10:37 2008 +0200

    remove the BKL: do not take the BKL in init code
    
    this doesn't want to run under the BKL:
    
    ------------[ cut here ]------------
    WARNING: at fs/proc/generic.c:669 create_proc_entry+0x33/0xb9()
    Modules linked in:
    Pid: 0, comm: swapper Not tainted 2.6.26-rc2-sched-devel.git #475
     [<c013d2ed>] warn_on_slowpath+0x41/0x6d
     [<c0158530>] ? mark_held_locks+0x4e/0x66
     [<c01586e7>] ? trace_hardirqs_on+0xb/0xd
     [<c01586a7>] ? trace_hardirqs_on_caller+0xe0/0x115
     [<c01586e7>] ? trace_hardirqs_on+0xb/0xd
     [<c0158530>] ? mark_held_locks+0x4e/0x66
     [<c01586e7>] ? trace_hardirqs_on+0xb/0xd
     [<c01586a7>] ? trace_hardirqs_on_caller+0xe0/0x115
     [<c01586e7>] ? trace_hardirqs_on+0xb/0xd
     [<c017870f>] ? free_hot_cold_page+0x178/0x1b1
     [<c0178787>] ? free_hot_page+0xa/0xc
     [<c01787ae>] ? __free_pages+0x25/0x30
     [<c01787e2>] ? free_pages+0x29/0x2b
     [<c01c87b2>] create_proc_entry+0x33/0xb9
     [<c01c9de2>] ? loadavg_read_proc+0x0/0xdc
     [<c06f22f8>] proc_misc_init+0x1c/0x25e
     [<c06f2226>] proc_root_init+0x4a/0x97
     [<c06db853>] start_kernel+0x2c4/0x2ec
     [<c06db008>] __init_begin+0x8/0xa
    
    early init code. perhaps safe. needs more tea ...
    
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/init/main.c b/init/main.c
index c97d36c..e293de0 100644
--- a/init/main.c
+++ b/init/main.c
@@ -668,6 +668,7 @@ asmlinkage void __init start_kernel(void)
 	signals_init();
 	/* rootfs populating might need page-writeback */
 	page_writeback_init();
+	unlock_kernel();
 #ifdef CONFIG_PROC_FS
 	proc_root_init();
 #endif
@@ -677,7 +678,6 @@ asmlinkage void __init start_kernel(void)
 	delayacct_init();
 
 	check_bugs();
-	unlock_kernel();
 
 	acpi_early_init(); /* before LAPIC and SMP init */
 

commit 5fff2843de609b77d4590e87de5c976b8ac1aacd
Author: Ingo Molnar <mingo@elte.hu>
Date:   Wed May 14 14:30:33 2008 +0200

    remove the BKL: procfs debug helper and BKL elimination
    
    Add checks for the BKL in create_proc_entry() and proc_create_data().
    
    These functions, if called with the BKL held, show that the calling
    site might have a dependency on the procfs code previously using the
    BKL in the dir-entry manipulation functions.
    
    With these warnings in place it is safe to remove the dir-entry BKL
    locking from fs/proc/.
    
    This untangles the following BKL dependency:
    
    ------------->
    =======================================================
    [ INFO: possible circular locking dependency detected ]
    2.6.26-rc2-sched-devel.git #468
    -------------------------------------------------------
    mount/679 is trying to acquire lock:
     (&type->i_mutex_dir_key#3){--..}, at: [<c019a111>] do_lookup+0x72/0x146
    
    but task is already holding lock:
     (kernel_mutex){--..}, at: [<c04ae4c3>] lock_kernel+0x1e/0x25
    
    which lock already depends on the new lock.
    
    the existing dependency chain (in reverse order) is:
    
    -> #1 (kernel_mutex){--..}:
           [<c01593e9>] __lock_acquire+0x97d/0xae6
           [<c01598be>] lock_acquire+0x4e/0x6c
           [<c04acd18>] mutex_lock_nested+0xc2/0x22a
           [<c04ae4c3>] lock_kernel+0x1e/0x25
           [<c01c84e1>] proc_lookup_de+0x15/0xbf
           [<c01c8818>] proc_lookup+0x12/0x16
           [<c01c4dc4>] proc_root_lookup+0x11/0x2b
           [<c019a148>] do_lookup+0xa9/0x146
           [<c019bd64>] __link_path_walk+0x77a/0xb7a
           [<c019c1b0>] path_walk+0x4c/0x9b
           [<c019c4b9>] do_path_lookup+0x134/0x19a
           [<c019ce95>] __path_lookup_intent_open+0x42/0x74
           [<c019cf20>] path_lookup_open+0x10/0x12
           [<c019d184>] do_filp_open+0x9d/0x695
           [<c0192061>] do_sys_open+0x40/0xb6
           [<c0192119>] sys_open+0x1e/0x26
           [<c0119a8a>] sysenter_past_esp+0x6a/0xa4
           [<ffffffff>] 0xffffffff
    
    -> #0 (&type->i_mutex_dir_key#3){--..}:
           [<c0159310>] __lock_acquire+0x8a4/0xae6
           [<c01598be>] lock_acquire+0x4e/0x6c
           [<c04acd18>] mutex_lock_nested+0xc2/0x22a
           [<c019a111>] do_lookup+0x72/0x146
           [<c019b8b9>] __link_path_walk+0x2cf/0xb7a
           [<c019c1b0>] path_walk+0x4c/0x9b
           [<c019c4b9>] do_path_lookup+0x134/0x19a
           [<c019cf34>] path_lookup+0x12/0x14
           [<c01a873e>] do_mount+0xe7/0x1b5
           [<c01a8870>] sys_mount+0x64/0x9b
           [<c0119a8a>] sysenter_past_esp+0x6a/0xa4
           [<ffffffff>] 0xffffffff
    
    other info that might help us debug this:
    
    1 lock held by mount/679:
     #0:  (kernel_mutex){--..}, at: [<c04ae4c3>] lock_kernel+0x1e/0x25
    
    stack backtrace:
    Pid: 679, comm: mount Not tainted 2.6.26-rc2-sched-devel.git #468
     [<c0157adb>] print_circular_bug_tail+0x5b/0x66
     [<c0157f5c>] ? print_circular_bug_header+0xa6/0xb1
     [<c0159310>] __lock_acquire+0x8a4/0xae6
     [<c01598be>] lock_acquire+0x4e/0x6c
     [<c019a111>] ? do_lookup+0x72/0x146
     [<c04acd18>] mutex_lock_nested+0xc2/0x22a
     [<c019a111>] ? do_lookup+0x72/0x146
     [<c019a111>] ? do_lookup+0x72/0x146
     [<c019a111>] do_lookup+0x72/0x146
     [<c019b8b9>] __link_path_walk+0x2cf/0xb7a
     [<c019c1b0>] path_walk+0x4c/0x9b
     [<c019c4b9>] do_path_lookup+0x134/0x19a
     [<c019cf34>] path_lookup+0x12/0x14
     [<c01a873e>] do_mount+0xe7/0x1b5
     [<c01586cb>] ? trace_hardirqs_on+0xb/0xd
     [<c015868b>] ? trace_hardirqs_on_caller+0xe0/0x115
     [<c04ace78>] ? mutex_lock_nested+0x222/0x22a
     [<c04ae4c3>] ? lock_kernel+0x1e/0x25
     [<c01a8870>] sys_mount+0x64/0x9b
     [<c0119a8a>] sysenter_past_esp+0x6a/0xa4
     =======================
    
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index 43e54e8..6f68278 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -381,7 +381,6 @@ struct dentry *proc_lookup_de(struct proc_dir_entry *de, struct inode *dir,
 	struct inode *inode = NULL;
 	int error = -ENOENT;
 
-	lock_kernel();
 	spin_lock(&proc_subdir_lock);
 	for (de = de->subdir; de ; de = de->next) {
 		if (de->namelen != dentry->d_name.len)
@@ -399,7 +398,6 @@ struct dentry *proc_lookup_de(struct proc_dir_entry *de, struct inode *dir,
 	}
 	spin_unlock(&proc_subdir_lock);
 out_unlock:
-	unlock_kernel();
 
 	if (inode) {
 		dentry->d_op = &proc_dentry_operations;
@@ -434,8 +432,6 @@ int proc_readdir_de(struct proc_dir_entry *de, struct file *filp, void *dirent,
 	struct inode *inode = filp->f_path.dentry->d_inode;
 	int ret = 0;
 
-	lock_kernel();
-
 	ino = inode->i_ino;
 	i = filp->f_pos;
 	switch (i) {
@@ -489,8 +485,8 @@ int proc_readdir_de(struct proc_dir_entry *de, struct file *filp, void *dirent,
 			spin_unlock(&proc_subdir_lock);
 	}
 	ret = 1;
-out:	unlock_kernel();
-	return ret;	
+out:
+	return ret;
 }
 
 int proc_readdir(struct file *filp, void *dirent, filldir_t filldir)
@@ -670,6 +666,8 @@ struct proc_dir_entry *create_proc_entry(const char *name, mode_t mode,
 	struct proc_dir_entry *ent;
 	nlink_t nlink;
 
+	WARN_ON_ONCE(kernel_locked());
+
 	if (S_ISDIR(mode)) {
 		if ((mode & S_IALLUGO) == 0)
 			mode |= S_IRUGO | S_IXUGO;
@@ -700,6 +698,8 @@ struct proc_dir_entry *proc_create_data(const char *name, mode_t mode,
 	struct proc_dir_entry *pde;
 	nlink_t nlink;
 
+	WARN_ON_ONCE(kernel_locked());
+
 	if (S_ISDIR(mode)) {
 		if ((mode & S_IALLUGO) == 0)
 			mode |= S_IRUGO | S_IXUGO;
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 6f4e8dc..2f1ed52 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -34,16 +34,13 @@ struct proc_dir_entry *de_get(struct proc_dir_entry *de)
  */
 void de_put(struct proc_dir_entry *de)
 {
-	lock_kernel();
 	if (!atomic_read(&de->count)) {
 		printk("de_put: entry %s already free!\n", de->name);
-		unlock_kernel();
 		return;
 	}
 
 	if (atomic_dec_and_test(&de->count))
 		free_proc_entry(de);
-	unlock_kernel();
 }
 
 /*
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 9511753..c48c76a 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -162,17 +162,14 @@ static int proc_root_readdir(struct file * filp,
 	unsigned int nr = filp->f_pos;
 	int ret;
 
-	lock_kernel();
-
 	if (nr < FIRST_PROCESS_ENTRY) {
 		int error = proc_readdir(filp, dirent, filldir);
-		if (error <= 0) {
-			unlock_kernel();
+
+		if (error <= 0)
 			return error;
-		}
+
 		filp->f_pos = FIRST_PROCESS_ENTRY;
 	}
-	unlock_kernel();
 
 	ret = proc_pid_readdir(filp, dirent, filldir);
 	return ret;

commit b07e615cf0f731d53a3ab431f44b1fe6ef4576e6
Author: Ingo Molnar <mingo@elte.hu>
Date:   Wed May 14 14:19:52 2008 +0200

    remove the BKL: request_module() debug helper
    
    usermodehelper blocks waiting for modprobe. We cannot do that with
    the BKL held. Also emit a (one time) warning about callsites that
    do this.
    
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/kernel/kmod.c b/kernel/kmod.c
index 8df97d3..6c42cdf 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -36,6 +36,8 @@
 #include <linux/resource.h>
 #include <linux/notifier.h>
 #include <linux/suspend.h>
+#include <linux/smp_lock.h>
+
 #include <asm/uaccess.h>
 
 extern int max_threads;
@@ -77,6 +79,7 @@ int request_module(const char *fmt, ...)
 	static atomic_t kmod_concurrent = ATOMIC_INIT(0);
 #define MAX_KMOD_CONCURRENT 50	/* Completely arbitrary value - KAO */
 	static int kmod_loop_msg;
+	int bkl = kernel_locked();
 
 	va_start(args, fmt);
 	ret = vsnprintf(module_name, MODULE_NAME_LEN, fmt, args);
@@ -108,8 +111,27 @@ int request_module(const char *fmt, ...)
 		return -ENOMEM;
 	}
 
+	/*
+	 * usermodehelper blocks waiting for modprobe. We cannot
+	 * do that with the BKL held. Also emit a (one time)
+	 * warning about callsites that do this:
+	 */
+	if (bkl) {
+		if (debug_locks) {
+			WARN_ON_ONCE(1);
+			debug_show_held_locks(current);
+			debug_locks_off();
+		}
+		unlock_kernel();
+	}
+
 	ret = call_usermodehelper(modprobe_path, argv, envp, 1);
+
 	atomic_dec(&kmod_concurrent);
+
+	if (bkl)
+		lock_kernel();
+
 	return ret;
 }
 EXPORT_SYMBOL(request_module);

commit b1f6383484b0ad7b57e451ea638ec774204a7ced
Author: Ingo Molnar <mingo@elte.hu>
Date:   Wed May 14 13:51:40 2008 +0200

    remove the BKL: lockdep self-test fix
    
    the lockdep self-tests reinitialize the held locks context, so
    make sure we call it with no lock held. Move the first lock_kernel()
    later into the bootup - we are still the only task around so there's
    no serialization issues.
    
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/init/main.c b/init/main.c
index 8d3b879..c97d36c 100644
--- a/init/main.c
+++ b/init/main.c
@@ -554,7 +554,6 @@ asmlinkage void __init start_kernel(void)
  * Interrupts are still disabled. Do necessary setups, then
  * enable them
  */
-	lock_kernel();
 	tick_init();
 	boot_cpu_init();
 	page_address_init();
@@ -626,6 +625,8 @@ asmlinkage void __init start_kernel(void)
 	 */
 	locking_selftest();
 
+	lock_kernel();
+
 #ifdef CONFIG_BLK_DEV_INITRD
 	if (initrd_start && !initrd_below_start_ok &&
 			initrd_start < min_low_pfn << PAGE_SHIFT) {

commit d31eec64e76a4b0795b5a6b57f2925d57aeefda5
Author: Ingo Molnar <mingo@elte.hu>
Date:   Wed May 14 13:47:58 2008 +0200

    remove the BKL: tty updates
    
    untangle the following workqueue <-> BKL dependency in the TTY code:
    
    =======================================================
    [ INFO: possible circular locking dependency detected ]
    2.6.26-rc2-sched-devel.git #461
    -------------------------------------------------------
    events/1/11 is trying to acquire lock:
     (kernel_mutex){--..}, at: [<c0485203>] lock_kernel+0x1e/0x25
    
    but task is already holding lock:
     (&(&tty->buf.work)->work){--..}, at: [<c014a83d>] run_workqueue+0x80/0x18b
    
    which lock already depends on the new lock.
    
    the existing dependency chain (in reverse order) is:
    
    -> #2 (&(&tty->buf.work)->work){--..}:
           [<c0159345>] __lock_acquire+0x97d/0xae6
           [<c015981a>] lock_acquire+0x4e/0x6c
           [<c014a873>] run_workqueue+0xb6/0x18b
           [<c014b1b7>] worker_thread+0xb6/0xc2
           [<c014d4f4>] kthread+0x3b/0x63
           [<c011a737>] kernel_thread_helper+0x7/0x10
           [<ffffffff>] 0xffffffff
    
    -> #1 (events){--..}:
           [<c0159345>] __lock_acquire+0x97d/0xae6
           [<c015981a>] lock_acquire+0x4e/0x6c
           [<c014aec2>] flush_workqueue+0x3f/0x7c
           [<c014af0c>] flush_scheduled_work+0xd/0xf
           [<c0285a59>] release_dev+0x42c/0x54a
           [<c0285b89>] tty_release+0x12/0x1c
           [<c01945b8>] __fput+0xae/0x155
           [<c01948e8>] fput+0x17/0x19
           [<c0191ea6>] filp_close+0x50/0x5a
           [<c01930c2>] sys_close+0x71/0xad
           [<c0119a8a>] sysenter_past_esp+0x6a/0xa4
           [<ffffffff>] 0xffffffff
    
    -> #0 (kernel_mutex){--..}:
           [<c015926c>] __lock_acquire+0x8a4/0xae6
           [<c015981a>] lock_acquire+0x4e/0x6c
           [<c0483a58>] mutex_lock_nested+0xc2/0x22a
           [<c0485203>] lock_kernel+0x1e/0x25
           [<c02872a9>] opost+0x24/0x194
           [<c02884a2>] n_tty_receive_buf+0xb1b/0xfaa
           [<c0283df2>] flush_to_ldisc+0xd9/0x148
           [<c014a878>] run_workqueue+0xbb/0x18b
           [<c014b1b7>] worker_thread+0xb6/0xc2
           [<c014d4f4>] kthread+0x3b/0x63
           [<c011a737>] kernel_thread_helper+0x7/0x10
           [<ffffffff>] 0xffffffff
    
    other info that might help us debug this:
    
    2 locks held by events/1/11:
     #0:  (events){--..}, at: [<c014a83d>] run_workqueue+0x80/0x18b
     #1:  (&(&tty->buf.work)->work){--..}, at: [<c014a83d>] run_workqueue+0x80/0x18b
    
    stack backtrace:
    Pid: 11, comm: events/1 Not tainted 2.6.26-rc2-sched-devel.git #461
     [<c0157a37>] print_circular_bug_tail+0x5b/0x66
     [<c015737f>] ? print_circular_bug_entry+0x39/0x43
     [<c015926c>] __lock_acquire+0x8a4/0xae6
     [<c015981a>] lock_acquire+0x4e/0x6c
     [<c0485203>] ? lock_kernel+0x1e/0x25
     [<c0483a58>] mutex_lock_nested+0xc2/0x22a
     [<c0485203>] ? lock_kernel+0x1e/0x25
     [<c0485203>] ? lock_kernel+0x1e/0x25
     [<c0485203>] lock_kernel+0x1e/0x25
     [<c02872a9>] opost+0x24/0x194
     [<c02884a2>] n_tty_receive_buf+0xb1b/0xfaa
     [<c0133bf5>] ? find_busiest_group+0x1db/0x5a0
     [<c0158470>] ? mark_held_locks+0x4e/0x66
     [<c0158470>] ? mark_held_locks+0x4e/0x66
     [<c0158627>] ? trace_hardirqs_on+0xb/0xd
     [<c01585e7>] ? trace_hardirqs_on_caller+0xe0/0x115
     [<c0158627>] ? trace_hardirqs_on+0xb/0xd
     [<c0283df2>] flush_to_ldisc+0xd9/0x148
     [<c014a878>] run_workqueue+0xbb/0x18b
     [<c014a83d>] ? run_workqueue+0x80/0x18b
     [<c0283d19>] ? flush_to_ldisc+0x0/0x148
     [<c014b1b7>] worker_thread+0xb6/0xc2
     [<c014d5b5>] ? autoremove_wake_function+0x0/0x30
     [<c014b101>] ? worker_thread+0x0/0xc2
     [<c014d4f4>] kthread+0x3b/0x63
     [<c014d4b9>] ? kthread+0x0/0x63
     [<c011a737>] kernel_thread_helper+0x7/0x10
     =======================
    kjournald starting.  Commit interval 5 seconds
    
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/drivers/char/tty_io.c b/drivers/char/tty_io.c
index 49c1a22..b044576 100644
--- a/drivers/char/tty_io.c
+++ b/drivers/char/tty_io.c
@@ -2590,9 +2590,19 @@ static void release_dev(struct file *filp)
 
 	/*
 	 * Wait for ->hangup_work and ->buf.work handlers to terminate
+	 *
+	 * It's safe to drop/reacquire the BKL here as
+	 * flush_scheduled_work() can sleep anyway:
 	 */
-
-	flush_scheduled_work();
+	{
+		int bkl = kernel_locked();
+
+		if (bkl)
+			unlock_kernel();
+		flush_scheduled_work();
+		if (bkl)
+			lock_kernel();
+	}
 
 	/*
 	 * Wait for any short term users (we know they are just driver

commit afb99e5a939d4eff43ede3155bc8a7563c10f748
Author: Ingo Molnar <mingo@elte.hu>
Date:   Wed May 14 13:35:33 2008 +0200

    remove the BKL: flush_workqueue() debug helper & fix
    
    workqueue execution can introduce nasty BKL inversion dependencies,
    root them out at their source by warning about them. Avoid hangs
    by unlocking the BKL and warning about the incident. (this is safe
    as this function will sleep anyway)
    
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 29fc39f..ce0cb10 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -392,13 +392,26 @@ static int flush_cpu_workqueue(struct cpu_workqueue_struct *cwq)
 void flush_workqueue(struct workqueue_struct *wq)
 {
 	const cpumask_t *cpu_map = wq_cpu_map(wq);
+	int bkl = kernel_locked();
 	int cpu;
 
 	might_sleep();
+	if (bkl) {
+		if (debug_locks) {
+			WARN_ON_ONCE(1);
+			debug_show_held_locks(current);
+			debug_locks_off();
+		}
+		unlock_kernel();
+	}
+
 	lock_acquire(&wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
 	lock_release(&wq->lockdep_map, 1, _THIS_IP_);
 	for_each_cpu_mask(cpu, *cpu_map)
 		flush_cpu_workqueue(per_cpu_ptr(wq->cpu_wq, cpu));
+
+	if (bkl)
+		lock_kernel();
 }
 EXPORT_SYMBOL_GPL(flush_workqueue);
 

commit d7f03183eb55be792b3bcf255d2a9aec1c17b5df
Author: Ingo Molnar <mingo@elte.hu>
Date:   Wed May 14 13:03:11 2008 +0200

    softlockup helper: print BKL owner
    
    on softlockup, print who owns the BKL lock.
    
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/include/linux/smp_lock.h b/include/linux/smp_lock.h
index 36e23b8..c318a60 100644
--- a/include/linux/smp_lock.h
+++ b/include/linux/smp_lock.h
@@ -11,9 +11,13 @@ static inline int kernel_locked(void)
 {
 	return current->lock_depth >= 0;
 }
+
+extern void debug_print_bkl(void);
+
 #else
 static inline void lock_kernel(void)		__acquires(kernel_lock) { }
 static inline void unlock_kernel(void)		__releases(kernel_lock) { }
 static inline int kernel_locked(void)		{ return 1; }
+static inline void debug_print_bkl(void)	{ }
 #endif
 #endif
diff --git a/kernel/softlockup.c b/kernel/softlockup.c
index 01b6522..46080ca 100644
--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -15,6 +15,7 @@
 #include <linux/kthread.h>
 #include <linux/notifier.h>
 #include <linux/module.h>
+#include <linux/smp_lock.h>
 
 #include <asm/irq_regs.h>
 
@@ -170,6 +171,8 @@ static void check_hung_task(struct task_struct *t, unsigned long now)
 	sched_show_task(t);
 	__debug_show_held_locks(t);
 
+	debug_print_bkl();
+
 	t->last_switch_timestamp = now;
 	touch_nmi_watchdog();
 }
diff --git a/lib/kernel_lock.c b/lib/kernel_lock.c
index 41718ce..ca03ae8 100644
--- a/lib/kernel_lock.c
+++ b/lib/kernel_lock.c
@@ -53,6 +53,17 @@ void __lockfunc unlock_kernel(void)
 		mutex_unlock(&kernel_mutex);
 }
 
+void debug_print_bkl(void)
+{
+#ifdef CONFIG_DEBUG_MUTEXES
+	if (mutex_is_locked(&kernel_mutex)) {
+		printk(KERN_EMERG "BUG: **** BKL held by: %d:%s\n",
+			kernel_mutex.owner->task->pid,
+			kernel_mutex.owner->task->comm);
+	}
+#endif
+}
+
 EXPORT_SYMBOL(lock_kernel);
 EXPORT_SYMBOL(unlock_kernel);
 

commit 7a6e0ca35dc9bd458f331d2950fb6c875e432f18
Author: Ingo Molnar <mingo@elte.hu>
Date:   Wed May 14 09:55:53 2008 +0200

    remove the BKL: remove it from the core kernel!
    
    remove the classic Big Kernel Lock from the core kernel.
    
    this means it does not get auto-dropped on schedule() anymore. Code
    which relies on the auto-drop has to be fixed.
    
    the resulting lock_kernel() code is a plain mutex with a thin
    self-recursion layer on top of it.
    
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/include/linux/smp_lock.h b/include/linux/smp_lock.h
index aab3a4c..36e23b8 100644
--- a/include/linux/smp_lock.h
+++ b/include/linux/smp_lock.h
@@ -2,38 +2,18 @@
 #define __LINUX_SMPLOCK_H
 
 #ifdef CONFIG_LOCK_KERNEL
-#include <linux/sched.h>
-
-#define kernel_locked()		(current->lock_depth >= 0)
-
-extern int __lockfunc __reacquire_kernel_lock(void);
-extern void __lockfunc __release_kernel_lock(void);
-
-/*
- * Release/re-acquire global kernel lock for the scheduler
- */
-#define release_kernel_lock(tsk) do { 		\
-	if (unlikely((tsk)->lock_depth >= 0))	\
-		__release_kernel_lock();	\
-} while (0)
-
-static inline int reacquire_kernel_lock(struct task_struct *task)
-{
-	if (unlikely(task->lock_depth >= 0))
-		return __reacquire_kernel_lock();
-	return 0;
-}
+# include <linux/sched.h>
 
 extern void __lockfunc lock_kernel(void)	__acquires(kernel_lock);
 extern void __lockfunc unlock_kernel(void)	__releases(kernel_lock);
 
+static inline int kernel_locked(void)
+{
+	return current->lock_depth >= 0;
+}
 #else
-
-#define lock_kernel()				do { } while(0)
-#define unlock_kernel()				do { } while(0)
-#define release_kernel_lock(task)		do { } while(0)
-#define reacquire_kernel_lock(task)		0
-#define kernel_locked()				1
-
-#endif /* CONFIG_LOCK_KERNEL */
-#endif /* __LINUX_SMPLOCK_H */
+static inline void lock_kernel(void)		__acquires(kernel_lock) { }
+static inline void unlock_kernel(void)		__releases(kernel_lock) { }
+static inline int kernel_locked(void)		{ return 1; }
+#endif
+#endif
diff --git a/kernel/fork.c b/kernel/fork.c
index 933e60e..34bcb04 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -54,6 +54,7 @@
 #include <linux/tty.h>
 #include <linux/proc_fs.h>
 #include <linux/blkdev.h>
+#include <linux/smp_lock.h>
 
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
@@ -1010,6 +1011,9 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	struct task_struct *p;
 	int cgroup_callbacks_done = 0;
 
+	if (system_state == SYSTEM_RUNNING && kernel_locked())
+		debug_check_no_locks_held(current);
+
 	if ((clone_flags & (CLONE_NEWNS|CLONE_FS)) == (CLONE_NEWNS|CLONE_FS))
 		return ERR_PTR(-EINVAL);
 
diff --git a/kernel/sched.c b/kernel/sched.c
index 59d20a5..c6d1f26 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4494,9 +4494,6 @@ need_resched:
 	prev = rq->curr;
 	switch_count = &prev->nivcsw;
 
-	release_kernel_lock(prev);
-need_resched_nonpreemptible:
-
 	schedule_debug(prev);
 
 	hrtick_clear(rq);
@@ -4549,9 +4546,6 @@ need_resched_nonpreemptible:
 
 	hrtick_set(rq);
 
-	if (unlikely(reacquire_kernel_lock(current) < 0))
-		goto need_resched_nonpreemptible;
-
 	preempt_enable_no_resched();
 	if (unlikely(test_thread_flag(TIF_NEED_RESCHED)))
 		goto need_resched;
@@ -4567,8 +4561,6 @@ EXPORT_SYMBOL(schedule);
 asmlinkage void __sched preempt_schedule(void)
 {
 	struct thread_info *ti = current_thread_info();
-	struct task_struct *task = current;
-	int saved_lock_depth;
 
 	/*
 	 * If there is a non-zero preempt_count or interrupts are disabled,
@@ -4579,16 +4571,7 @@ asmlinkage void __sched preempt_schedule(void)
 
 	do {
 		add_preempt_count(PREEMPT_ACTIVE);
-
-		/*
-		 * We keep the big kernel semaphore locked, but we
-		 * clear ->lock_depth so that schedule() doesnt
-		 * auto-release the semaphore:
-		 */
-		saved_lock_depth = task->lock_depth;
-		task->lock_depth = -1;
 		schedule();
-		task->lock_depth = saved_lock_depth;
 		sub_preempt_count(PREEMPT_ACTIVE);
 
 		/*
@@ -4609,26 +4592,15 @@ EXPORT_SYMBOL(preempt_schedule);
 asmlinkage void __sched preempt_schedule_irq(void)
 {
 	struct thread_info *ti = current_thread_info();
-	struct task_struct *task = current;
-	int saved_lock_depth;
 
 	/* Catch callers which need to be fixed */
 	BUG_ON(ti->preempt_count || !irqs_disabled());
 
 	do {
 		add_preempt_count(PREEMPT_ACTIVE);
-
-		/*
-		 * We keep the big kernel semaphore locked, but we
-		 * clear ->lock_depth so that schedule() doesnt
-		 * auto-release the semaphore:
-		 */
-		saved_lock_depth = task->lock_depth;
-		task->lock_depth = -1;
 		local_irq_enable();
 		schedule();
 		local_irq_disable();
-		task->lock_depth = saved_lock_depth;
 		sub_preempt_count(PREEMPT_ACTIVE);
 
 		/*
@@ -5535,11 +5507,6 @@ static void __cond_resched(void)
 #ifdef CONFIG_DEBUG_SPINLOCK_SLEEP
 	__might_sleep(__FILE__, __LINE__);
 #endif
-	/*
-	 * The BKS might be reacquired before we have dropped
-	 * PREEMPT_ACTIVE, which could trigger a second
-	 * cond_resched() call.
-	 */
 	do {
 		add_preempt_count(PREEMPT_ACTIVE);
 		schedule();
diff --git a/lib/kernel_lock.c b/lib/kernel_lock.c
index cd3e825..41718ce 100644
--- a/lib/kernel_lock.c
+++ b/lib/kernel_lock.c
@@ -1,66 +1,32 @@
 /*
- * lib/kernel_lock.c
+ * This is the Big Kernel Lock - the traditional lock that we
+ * inherited from the uniprocessor Linux kernel a decade ago.
  *
- * This is the traditional BKL - big kernel lock. Largely
- * relegated to obsolescence, but used by various less
+ * Largely relegated to obsolescence, but used by various less
  * important (or lazy) subsystems.
- */
-#include <linux/smp_lock.h>
-#include <linux/module.h>
-#include <linux/kallsyms.h>
-#include <linux/semaphore.h>
-
-/*
- * The 'big kernel semaphore'
- *
- * This mutex is taken and released recursively by lock_kernel()
- * and unlock_kernel().  It is transparently dropped and reacquired
- * over schedule().  It is used to protect legacy code that hasn't
- * been migrated to a proper locking design yet.
- *
- * Note: code locked by this semaphore will only be serialized against
- * other code using the same locking facility. The code guarantees that
- * the task remains on the same CPU.
  *
  * Don't use in new code.
- */
-static DECLARE_MUTEX(kernel_sem);
-
-/*
- * Re-acquire the kernel semaphore.
  *
- * This function is called with preemption off.
+ * It now has plain mutex semantics (i.e. no auto-drop on
+ * schedule() anymore), combined with a very simple self-recursion
+ * layer that allows the traditional nested use:
+ *
+ *   lock_kernel();
+ *     lock_kernel();
+ *     unlock_kernel();
+ *   unlock_kernel();
  *
- * We are executing in schedule() so the code must be extremely careful
- * about recursion, both due to the down() and due to the enabling of
- * preemption. schedule() will re-check the preemption flag after
- * reacquiring the semaphore.
+ * Please migrate all BKL using code to a plain mutex.
  */
-int __lockfunc __reacquire_kernel_lock(void)
-{
-	struct task_struct *task = current;
-	int saved_lock_depth = task->lock_depth;
-
-	BUG_ON(saved_lock_depth < 0);
-
-	task->lock_depth = -1;
-	preempt_enable_no_resched();
-
-	down(&kernel_sem);
-
-	preempt_disable();
-	task->lock_depth = saved_lock_depth;
-
-	return 0;
-}
+#include <linux/smp_lock.h>
+#include <linux/kallsyms.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
 
-void __lockfunc __release_kernel_lock(void)
-{
-	up(&kernel_sem);
-}
+static DEFINE_MUTEX(kernel_mutex);
 
 /*
- * Getting the big kernel semaphore.
+ * Get the big kernel lock:
  */
 void __lockfunc lock_kernel(void)
 {
@@ -71,7 +37,7 @@ void __lockfunc lock_kernel(void)
 		/*
 		 * No recursion worries - we set up lock_depth _after_
 		 */
-		down(&kernel_sem);
+		mutex_lock(&kernel_mutex);
 
 	task->lock_depth = depth;
 }
@@ -80,10 +46,11 @@ void __lockfunc unlock_kernel(void)
 {
 	struct task_struct *task = current;
 
-	BUG_ON(task->lock_depth < 0);
+	if (WARN_ON_ONCE(task->lock_depth < 0))
+		return;
 
 	if (likely(--task->lock_depth < 0))
-		up(&kernel_sem);
+		mutex_unlock(&kernel_mutex);
 }
 
 EXPORT_SYMBOL(lock_kernel);
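
[Illustrative sketch, not part of the patch.] The "plain mutex with a thin
self-recursion layer" that this commit describes can be modeled in user space
roughly as follows. This is only a minimal sketch, assuming a pthread mutex in
place of kernel_mutex and a thread-local counter in place of
current->lock_depth; big_mutex, big_lock(), big_unlock() and big_locked() are
hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for kernel_mutex and current->lock_depth. */
static pthread_mutex_t big_mutex = PTHREAD_MUTEX_INITIALIZER;
static _Thread_local int lock_depth = -1;	/* -1 means "not held" */

/* The outermost call takes the mutex; nested calls only bump the depth. */
static void big_lock(void)
{
	int depth = lock_depth + 1;

	if (depth == 0)
		pthread_mutex_lock(&big_mutex);
	/* The depth is set up only after the mutex has been taken. */
	lock_depth = depth;
}

/*
 * Returns 0 on success and -1 on an unbalanced unlock. The patch warns
 * once and returns in that case instead of hitting a BUG_ON().
 */
static int big_unlock(void)
{
	if (lock_depth < 0)
		return -1;
	/* Only the outermost unlock releases the mutex. */
	if (--lock_depth < 0)
		pthread_mutex_unlock(&big_mutex);
	return 0;
}

/* Mirrors kernel_locked(): true while this thread holds the lock. */
static int big_locked(void)
{
	return lock_depth >= 0;
}
```

Only the outermost big_lock()/big_unlock() pair touches the mutex. Nested
calls just adjust the depth, so a single unlock from inside a nested section
leaves the lock held.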

commit df34bbceea535a6ce4f384a096334feac05d4a33
Author: Ingo Molnar <mingo@elte.hu>
Date:   Wed May 14 18:40:41 2008 +0200

    remove the BKL: remove "BKL auto-drop" assumption from vt_waitactive()
    
    fix vt_waitactive()'s "schedule() drops the BKL automatically"
    assumption. Now that schedule() no longer does that, it can lock
    up, as reported by the softlockup detector:
    
    --------------------->
    console-kit-d D 00000000     0  1866      1
           f5aeeda0 00000046 00000001 00000000 c063d0a4 5f87b6a4 00000009 c06e6900
           c06e6000 f64da358 f64da5c0 c2a12000 00000001 00000040 f5aee000 f6797dc0
           f64da358 00000000 00000000 00000000 00000000 f64da358 c0158627 00000246
    Call Trace:
     [<c0158627>] ? trace_hardirqs_on+0xb/0xd
     [<c01585e7>] ? trace_hardirqs_on_caller+0xe0/0x115
     [<c0483a98>] mutex_lock_nested+0x142/0x22a
     [<c04851f9>] ? lock_kernel+0x1e/0x25
     [<c04851f9>] lock_kernel+0x1e/0x25
     [<c028a692>] vt_ioctl+0x25/0x15c7
     [<c013352f>] ? __resched_task+0x5f/0x63
     [<c0157291>] ? trace_hardirqs_off+0xb/0xd
     [<c0485013>] ? _spin_unlock_irqrestore+0x42/0x58
     [<c028a66d>] ? vt_ioctl+0x0/0x15c7
     [<c0286b8c>] tty_ioctl+0xdbb/0xe18
     [<c013014e>] ? kunmap_atomic+0x66/0x7c
     [<c0178c3a>] ? __alloc_pages_internal+0xee/0x3a8
     [<c017e186>] ? __inc_zone_state+0x12/0x5c
     [<c0484f44>] ? _spin_unlock+0x27/0x3c
     [<c0181a01>] ? handle_mm_fault+0x56c/0x587
     [<c0285dd1>] ? tty_ioctl+0x0/0xe18
     [<c019e172>] vfs_ioctl+0x22/0x67
     [<c019e413>] do_vfs_ioctl+0x25c/0x26a
     [<c019e461>] sys_ioctl+0x40/0x5b
     [<c0119a8a>] sysenter_past_esp+0x6a/0xa4
     [<c0110000>] ? kvm_pic_read_irq+0xa3/0xbf
     =======================
    
    console-kit-d S f6eb0380     0  1867      1
           f65a0dc4 00000046 00000000 f6eb0380 f6eb0358 00000000 f65a0d7c c06e6900
           c06e6000 f6eb0358 f6eb05c0 c2a0a000 00000000 00000040 f65a0000 f6797dc0
           f65a0d94 fffc0957 f65a0da4 c0485013 00000003 00000004 ffffffff c013d7d1
    Call Trace:
     [<c0485013>] ? _spin_unlock_irqrestore+0x42/0x58
     [<c013d7d1>] ? release_console_sem+0x192/0x1a5
     [<c028a644>] vt_waitactive+0x70/0x99
     [<c01360a4>] ? default_wake_function+0x0/0xd
     [<c028b5b4>] vt_ioctl+0xf47/0x15c7
     [<c028a66d>] ? vt_ioctl+0x0/0x15c7
     [<c0286b8c>] tty_ioctl+0xdbb/0xe18
     [<c013014e>] ? kunmap_atomic+0x66/0x7c
     [<c0178c3a>] ? __alloc_pages_internal+0xee/0x3a8
     [<c017e186>] ? __inc_zone_state+0x12/0x5c
     [<c0484f44>] ? _spin_unlock+0x27/0x3c
     [<c0181a01>] ? handle_mm_fault+0x56c/0x587
     [<c0285dd1>] ? tty_ioctl+0x0/0xe18
     [<c019e172>] vfs_ioctl+0x22/0x67
     [<c019e413>] do_vfs_ioctl+0x25c/0x26a
     [<c019e461>] sys_ioctl+0x40/0x5b
     [<c0119a8a>] sysenter_past_esp+0x6a/0xa4
     [<c0110000>] ? kvm_pic_read_irq+0xa3/0xbf
     =======================
    
    The fix is to drop the BKL explicitly instead of implicitly.
    
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/drivers/char/vt_ioctl.c b/drivers/char/vt_ioctl.c
index 3211afd..bab26e1 100644
--- a/drivers/char/vt_ioctl.c
+++ b/drivers/char/vt_ioctl.c
@@ -1174,8 +1174,12 @@ static DECLARE_WAIT_QUEUE_HEAD(vt_activate_queue);
 int vt_waitactive(int vt)
 {
 	int retval;
+	int bkl = kernel_locked();
 	DECLARE_WAITQUEUE(wait, current);
 
+	if (bkl)
+		unlock_kernel();
+
 	add_wait_queue(&vt_activate_queue, &wait);
 	for (;;) {
 		retval = 0;
@@ -1201,6 +1205,10 @@ int vt_waitactive(int vt)
 	}
 	remove_wait_queue(&vt_activate_queue, &wait);
 	__set_current_state(TASK_RUNNING);
+
+	if (bkl)
+		lock_kernel();
+
 	return retval;
 }
 

commit 3a0bf25bb160233b902962457ce917df27550850
Author: Ingo Molnar <mingo@elte.hu>
Date:   Wed May 14 11:34:13 2008 +0200

    remove the BKL: reduce misc_open() BKL dependency
    
    fix this BKL dependency problem due to request_module():
    
    ------------------------>
    Write protecting the kernel text: 3620k
    Write protecting the kernel read-only data: 1664k
    INFO: task hwclock:700 blocked for more than 30 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    hwclock       D c0629430     0   700    673
           f69b7d08 00000046 00000001 c0629430 00000001 00000046 00000000 c06e6900
           c06e6000 f6ead358 f6ead5c0 c1d1b000 00000001 00000040 f69b7000 f6848dc0
           00000000 fffb92ac f6ead358 f6ead830 00000001 00000000 ffffffff 00000001
    Call Trace:
     [<c048323c>] schedule_timeout+0x16/0x8b
     [<c0158470>] ? mark_held_locks+0x4e/0x66
     [<c0158627>] ? trace_hardirqs_on+0xb/0xd
     [<c01585e7>] ? trace_hardirqs_on_caller+0xe0/0x115
     [<c0158627>] ? trace_hardirqs_on+0xb/0xd
     [<c048279c>] wait_for_common+0xc3/0xfc
     [<c01360a4>] ? default_wake_function+0x0/0xd
     [<c0482857>] wait_for_completion+0x12/0x14
     [<c014a42b>] call_usermodehelper_exec+0x7f/0xbf
     [<c014a710>] request_module+0xce/0xe2
     [<c0158470>] ? mark_held_locks+0x4e/0x66
     [<c0158627>] ? trace_hardirqs_on+0xb/0xd
     [<c01585e7>] ? trace_hardirqs_on_caller+0xe0/0x115
     [<c028a252>] misc_open+0xc4/0x216
     [<c0195d77>] chrdev_open+0x156/0x172
     [<c01921d9>] __dentry_open+0x147/0x236
     [<c01922e7>] nameidata_to_filp+0x1f/0x33
     [<c0195c21>] ? chrdev_open+0x0/0x172
     [<c019d38a>] do_filp_open+0x347/0x695
     [<c0191f67>] ? get_unused_fd_flags+0xc3/0xcd
     [<c0191fb1>] do_sys_open+0x40/0xb6
     [<c023fe74>] ? trace_hardirqs_on_thunk+0xc/0x10
     [<c0192069>] sys_open+0x1e/0x26
     [<c0119a8a>] sysenter_past_esp+0x6a/0xa4
     =======================
    1 lock held by hwclock/700:
     #0:  (kernel_sem){--..}, at: [<c04851b1>] lock_kernel+0x1e/0x25
    Kernel panic - not syncing: softlockup: blocked tasks
    Pid: 5, comm: watchdog/0 Not tainted 2.6.26-rc2-sched-devel.git #454
     [<c013d1fb>] panic+0x49/0xfa
     [<c016b177>] watchdog+0x168/0x1d1
     [<c016b00f>] ? watchdog+0x0/0x1d1
     [<c014d4f4>] kthread+0x3b/0x63
     [<c014d4b9>] ? kthread+0x0/0x63
     [<c011a737>] kernel_thread_helper+0x7/0x10
     =======================
    
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/drivers/char/misc.c b/drivers/char/misc.c
index eaace0d..3f2b7be 100644
--- a/drivers/char/misc.c
+++ b/drivers/char/misc.c
@@ -36,6 +36,7 @@
 #include <linux/module.h>
 
 #include <linux/fs.h>
+#include <linux/smp_lock.h>
 #include <linux/errno.h>
 #include <linux/miscdevice.h>
 #include <linux/kernel.h>
@@ -128,8 +129,15 @@ static int misc_open(struct inode * inode, struct file * file)
 	}
 		
 	if (!new_fops) {
+		int bkl = kernel_locked();
+
 		mutex_unlock(&misc_mtx);
+		if (bkl)
+			unlock_kernel();
 		request_module("char-major-%d-%d", MISC_MAJOR, minor);
+		if (bkl)
+			lock_kernel();
+
 		mutex_lock(&misc_mtx);
 
 		list_for_each_entry(c, &misc_list, list) {
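
[Illustrative sketch, not part of the patch.] The vt_waitactive() and
misc_open() changes share one pattern: remember whether the BKL is held, drop
it around the blocking call, and retake it afterwards. A user-space model of
that pattern might look like the following. It assumes a pthread mutex and a
thread-local depth counter as stand-ins for kernel_mutex and
current->lock_depth; call_unlocked() and observe_depth() are hypothetical
helpers invented for this example.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for kernel_mutex and current->lock_depth. */
static pthread_mutex_t big_mutex = PTHREAD_MUTEX_INITIALIZER;
static _Thread_local int lock_depth = -1;	/* -1 means "not held" */

static void big_lock(void)
{
	int depth = lock_depth + 1;

	if (depth == 0)
		pthread_mutex_lock(&big_mutex);
	lock_depth = depth;
}

static void big_unlock(void)
{
	if (lock_depth < 0)
		return;			/* unbalanced unlock: ignore */
	if (--lock_depth < 0)
		pthread_mutex_unlock(&big_mutex);
}

static int big_locked(void)
{
	return lock_depth >= 0;
}

/*
 * Run a blocking operation with one level of the big lock dropped,
 * the way the patches do it by hand around request_module() and the
 * wait loop in vt_waitactive().
 */
static int call_unlocked(int (*blocking_op)(void))
{
	int held = big_locked();
	int ret;

	if (held)
		big_unlock();
	ret = blocking_op();
	if (held)
		big_lock();
	return ret;
}

/* Hypothetical "blocking" operation: reports the depth it observed. */
static int observe_depth(void)
{
	return lock_depth;
}
```

One caveat, if I read unlock_kernel() correctly: this drops a single recursion
level only. When the caller holds the lock nested, the depth goes from 1 to 0
and the underlying mutex stays held across the blocking call.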

commit 93ea4ccabef1016e6df217d5756ca5f70e37b39a
Author: Ingo Molnar <mingo@elte.hu>
Date:   Wed May 14 11:14:48 2008 +0200

    remove the BKL: change ext3 BKL assumption
    
    remove this 'we are holding the BKL' assumption from ext3:
    
    md: Autodetecting RAID arrays.
    md: Scanned 0 and added 0 devices.
    md: autorun ...
    md: ... autorun DONE.
    ------------[ cut here ]------------
    kernel BUG at lib/kernel_lock.c:83!
    invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    Modules linked in:
    
    Pid: 1, comm: swapper Not tainted (2.6.26-rc2-sched-devel.git #451)
    EIP: 0060:[<c0485106>] EFLAGS: 00010286 CPU: 1
    EIP is at unlock_kernel+0x11/0x28
    EAX: ffffffff EBX: fffffff4 ECX: 00000000 EDX: f7cb3358
    ESI: 00000001 EDI: 00000000 EBP: f7cb4d2c ESP: f7cb4d2c
     DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
    Process swapper (pid: 1, ti=f7cb4000 task=f7cb3358 task.ti=f7cb4000)
    Stack: f7cb4dc4 c01dbc59 c023f686 00000001 00000000 0000000a 00000001 f6901bf0
           00000000 00000020 f7cb4dd8 0000000a f7cb4df8 00000002 f7240000 ffffffff
           c05a9138 f6fc6bfc 00000001 f7cb4dd8 f7cb4d8c c023f737 f7cb4da0 f7cb4da0
    Call Trace:
     [<c01dbc59>] ? ext3_fill_super+0xc8/0x13d6
     [<c023f686>] ? vsnprintf+0x3c3/0x3fc
     [<c023f737>] ? snprintf+0x1b/0x1d
     [<c01cbc37>] ? disk_name+0x5a/0x67
     [<c01958f7>] ? get_sb_bdev+0xcd/0x10b
     [<c0190100>] ? __kmalloc+0x86/0x132
     [<c01a7532>] ? alloc_vfsmnt+0xe3/0x10b
     [<c01a7532>] ? alloc_vfsmnt+0xe3/0x10b
     [<c01da253>] ? ext3_get_sb+0x13/0x15
     [<c01dbb91>] ? ext3_fill_super+0x0/0x13d6
     [<c01954df>] ? vfs_kern_mount+0x81/0xf7
     [<c0195599>] ? do_kern_mount+0x32/0xba
     [<c01a8566>] ? do_new_mount+0x46/0x74
     [<c01a872b>] ? do_mount+0x197/0x1b5
     [<c018ee49>] ? cache_alloc_debugcheck_after+0x6a/0x19c
     [<c0178f48>] ? __get_free_pages+0x1b/0x21
     [<c01a67f4>] ? copy_mount_options+0x27/0x10e
     [<c01a87a8>] ? sys_mount+0x5f/0x91
     [<c06a0a90>] ? mount_block_root+0xa3/0x1e6
     [<c02350af>] ? blk_lookup_devt+0x5e/0x64
     [<c019cc61>] ? sys_mknod+0x13/0x15
     [<c06a0c1f>] ? mount_root+0x4c/0x54
     [<c06a0d72>] ? prepare_namespace+0x14b/0x172
     [<c06a0565>] ? kernel_init+0x217/0x226
     [<c06a034e>] ? kernel_init+0x0/0x226
     [<c06a034e>] ? kernel_init+0x0/0x226
     [<c011a737>] ? kernel_thread_helper+0x7/0x10
     =======================
    Code: 11 21 00 00 89 e0 25 00 f0 ff ff f6 40 08 08 74 05 e8 2b df ff ff 5b 5e 5d c3 55 64 8b 15 80 20 6e c0 8b 42 14 89 e5 85 c0 79 04 <0f> 0b eb fe 48 89 42 14 40 75 0a b8 70 d0 63 c0 e8 c9 e7 ff ff
    EIP: [<c0485106>] unlock_kernel+0x11/0x28 SS:ESP 0068:f7cb4d2c
    Kernel panic - not syncing: Fatal exception
    Pid: 1, comm: swapper Tainted: G      D   2.6.26-rc2-sched-devel.git #451
     [<c013d1fb>] panic+0x49/0xfa
     [<c011ae8f>] die+0x11c/0x143
     [<c0485518>] do_trap+0x8a/0xa3
     [<c011b061>] ? do_invalid_op+0x0/0x76
     [<c011b0cd>] do_invalid_op+0x6c/0x76
     [<c0485106>] ? unlock_kernel+0x11/0x28
     [<c0484edc>] ? _spin_unlock+0x27/0x3c
     [<c012f143>] ? kernel_map_pages+0x108/0x11f
     [<c0485212>] error_code+0x72/0x78
     [<c0485106>] ? unlock_kernel+0x11/0x28
     [<c01dbc59>] ext3_fill_super+0xc8/0x13d6
     [<c023f686>] ? vsnprintf+0x3c3/0x3fc
     [<c023f737>] ? snprintf+0x1b/0x1d
     [<c01cbc37>] ? disk_name+0x5a/0x67
     [<c01958f7>] get_sb_bdev+0xcd/0x10b
     [<c0190100>] ? __kmalloc+0x86/0x132
     [<c01a7532>] ? alloc_vfsmnt+0xe3/0x10b
     [<c01a7532>] ? alloc_vfsmnt+0xe3/0x10b
     [<c01da253>] ext3_get_sb+0x13/0x15
     [<c01dbb91>] ? ext3_fill_super+0x0/0x13d6
     [<c01954df>] vfs_kern_mount+0x81/0xf7
     [<c0195599>] do_kern_mount+0x32/0xba
     [<c01a8566>] do_new_mount+0x46/0x74
     [<c01a872b>] do_mount+0x197/0x1b5
     [<c018ee49>] ? cache_alloc_debugcheck_after+0x6a/0x19c
     [<c0178f48>] ? __get_free_pages+0x1b/0x21
     [<c01a67f4>] ? copy_mount_options+0x27/0x10e
     [<c01a87a8>] sys_mount+0x5f/0x91
     [<c06a0a90>] mount_block_root+0xa3/0x1e6
     [<c02350af>] ? blk_lookup_devt+0x5e/0x64
     [<c019cc61>] ? sys_mknod+0x13/0x15
     [<c06a0c1f>] mount_root+0x4c/0x54
     [<c06a0d72>] prepare_namespace+0x14b/0x172
     [<c06a0565>] kernel_init+0x217/0x226
     [<c06a034e>] ? kernel_init+0x0/0x226
     [<c06a034e>] ? kernel_init+0x0/0x226
     [<c011a737>] kernel_thread_helper+0x7/0x10
     =======================
    Rebooting in 10 seconds..
    
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/fs/ext3/super.c b/fs/ext3/super.c
index fe3119a..c05e7a7 100644
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
@@ -1522,8 +1522,6 @@ static int ext3_fill_super (struct super_block *sb, void *data, int silent)
 	sbi->s_resgid = EXT3_DEF_RESGID;
 	sbi->s_sb_block = sb_block;
 
-	unlock_kernel();
-
 	blocksize = sb_min_blocksize(sb, EXT3_MIN_BLOCK_SIZE);
 	if (!blocksize) {
 		printk(KERN_ERR "EXT3-fs: unable to set blocksize\n");
@@ -1918,7 +1916,6 @@ static int ext3_fill_super (struct super_block *sb, void *data, int silent)
 		test_opt(sb,DATA_FLAGS) == EXT3_MOUNT_ORDERED_DATA ? "ordered":
 		"writeback");
 
-	lock_kernel();
 	return 0;
 
 cantfind_ext3:
@@ -1947,7 +1944,6 @@ failed_mount:
 out_fail:
 	sb->s_fs_info = NULL;
 	kfree(sbi);
-	lock_kernel();
 	return ret;
 }
 

commit a79fcbacfdd3e7dfdf04a5275e6688d37478360b
Author: Ingo Molnar <mingo@elte.hu>
Date:   Wed May 14 10:55:14 2008 +0200

    remove the BKL: restructure ->bd_mutex and BKL dependency
    
    fix this bd_mutex <-> BKL lock dependency problem (which was hidden
    until now by the BKL's auto-drop property):
    
    ------------->
    ata2.01: configured for UDMA/33
    scsi 0:0:0:0: Direct-Access     ATA      HDS722525VLAT80  V36O PQ: 0 ANSI: 5
    sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
    sd 0:0:0:0: [sda] Write Protect is off
    sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
    sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
    sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
    sd 0:0:0:0: [sda] Write Protect is off
    sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
    sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
     sda: sda1 sda2 sda3 < sda5 sda6 sda7 sda8 sda9 sda10 >
    
    =======================================================
    [ INFO: possible circular locking dependency detected ]
    2.6.26-rc2-sched-devel.git #448
    -------------------------------------------------------
    swapper/1 is trying to acquire lock:
     (kernel_sem){--..}, at: [<c04851d1>] lock_kernel+0x1e/0x25
    
    but task is already holding lock:
     (&bdev->bd_mutex){--..}, at: [<c01b4e12>] __blkdev_put+0x24/0x10f
    
    which lock already depends on the new lock.
    
    the existing dependency chain (in reverse order) is:
    
    -> #1 (&bdev->bd_mutex){--..}:
           [<c0159365>] __lock_acquire+0x97d/0xae6
           [<c015983a>] lock_acquire+0x4e/0x6c
           [<c04839f0>] mutex_lock_nested+0xc2/0x22a
           [<c01b5075>] do_open+0x65/0x277
           [<c01b5301>] __blkdev_get+0x7a/0x85
           [<c01b5319>] blkdev_get+0xd/0xf
           [<c01cc04d>] register_disk+0xcf/0x11c
           [<c02351b2>] add_disk+0x2f/0x74
           [<c03264b2>] sd_probe+0x2d2/0x379
           [<c02b5a32>] driver_probe_device+0xa0/0x11b
           [<c02b5b14>] __device_attach+0x8/0xa
           [<c02b5041>] bus_for_each_drv+0x39/0x63
           [<c02b5b86>] device_attach+0x51/0x67
           [<c02b4ec7>] bus_attach_device+0x24/0x4e
           [<c02b41b4>] device_add+0x31e/0x42c
           [<c02f2690>] scsi_sysfs_add_sdev+0x9f/0x1d3
           [<c02f0d08>] scsi_probe_and_add_lun+0x96d/0xa84
           [<c02f1861>] __scsi_add_device+0x85/0xab
           [<c0334dda>] ata_scsi_scan_host+0x99/0x217
           [<c03323c6>] ata_host_register+0x1c8/0x1e5
           [<c0339b71>] ata_pci_sff_activate_host+0x179/0x19f
           [<c0339fab>] ata_pci_sff_init_one+0x97/0xe1
           [<c034c484>] amd_init_one+0x10a/0x113
           [<c024cb8d>] pci_device_probe+0x39/0x59
           [<c02b5a32>] driver_probe_device+0xa0/0x11b
           [<c02b5aea>] __driver_attach+0x3d/0x5f
           [<c02b528b>] bus_for_each_dev+0x3e/0x60
           [<c02b58c9>] driver_attach+0x14/0x16
           [<c02b5622>] bus_add_driver+0x9d/0x1af
           [<c02b5c6a>] driver_register+0x71/0xcd
           [<c024cd4c>] __pci_register_driver+0x40/0x6c
           [<c06bfd10>] amd_init+0x14/0x16
           [<c06a0464>] kernel_init+0x116/0x226
           [<c011a737>] kernel_thread_helper+0x7/0x10
           [<ffffffff>] 0xffffffff
    
    -> #0 (kernel_sem){--..}:
           [<c015928c>] __lock_acquire+0x8a4/0xae6
           [<c015983a>] lock_acquire+0x4e/0x6c
           [<c04839f0>] mutex_lock_nested+0xc2/0x22a
           [<c04851d1>] lock_kernel+0x1e/0x25
           [<c01b4e17>] __blkdev_put+0x29/0x10f
           [<c01b4f07>] blkdev_put+0xa/0xc
           [<c01cc058>] register_disk+0xda/0x11c
           [<c02351b2>] add_disk+0x2f/0x74
           [<c03264b2>] sd_probe+0x2d2/0x379
           [<c02b5a32>] driver_probe_device+0xa0/0x11b
           [<c02b5b14>] __device_attach+0x8/0xa
           [<c02b5041>] bus_for_each_drv+0x39/0x63
           [<c02b5b86>] device_attach+0x51/0x67
           [<c02b4ec7>] bus_attach_device+0x24/0x4e
           [<c02b41b4>] device_add+0x31e/0x42c
           [<c02f2690>] scsi_sysfs_add_sdev+0x9f/0x1d3
           [<c02f0d08>] scsi_probe_and_add_lun+0x96d/0xa84
           [<c02f1861>] __scsi_add_device+0x85/0xab
           [<c0334dda>] ata_scsi_scan_host+0x99/0x217
           [<c03323c6>] ata_host_register+0x1c8/0x1e5
           [<c0339b71>] ata_pci_sff_activate_host+0x179/0x19f
           [<c0339fab>] ata_pci_sff_init_one+0x97/0xe1
           [<c034c484>] amd_init_one+0x10a/0x113
           [<c024cb8d>] pci_device_probe+0x39/0x59
           [<c02b5a32>] driver_probe_device+0xa0/0x11b
           [<c02b5aea>] __driver_attach+0x3d/0x5f
           [<c02b528b>] bus_for_each_dev+0x3e/0x60
           [<c02b58c9>] driver_attach+0x14/0x16
           [<c02b5622>] bus_add_driver+0x9d/0x1af
           [<c02b5c6a>] driver_register+0x71/0xcd
           [<c024cd4c>] __pci_register_driver+0x40/0x6c
           [<c06bfd10>] amd_init+0x14/0x16
           [<c06a0464>] kernel_init+0x116/0x226
           [<c011a737>] kernel_thread_helper+0x7/0x10
           [<ffffffff>] 0xffffffff
    
    other info that might help us debug this:
    
    2 locks held by swapper/1:
     #0:  (&shost->scan_mutex){--..}, at: [<c02f1835>] __scsi_add_device+0x59/0xab
     #1:  (&bdev->bd_mutex){--..}, at: [<c01b4e12>] __blkdev_put+0x24/0x10f
    
    stack backtrace:
    Pid: 1, comm: swapper Not tainted 2.6.26-rc2-sched-devel.git #448
     [<c0157a57>] print_circular_bug_tail+0x5b/0x66
     [<c0157ed8>] ? print_circular_bug_header+0xa6/0xb1
     [<c015928c>] __lock_acquire+0x8a4/0xae6
     [<c015983a>] lock_acquire+0x4e/0x6c
     [<c04851d1>] ? lock_kernel+0x1e/0x25
     [<c04839f0>] mutex_lock_nested+0xc2/0x22a
     [<c04851d1>] ? lock_kernel+0x1e/0x25
     [<c04851d1>] ? lock_kernel+0x1e/0x25
     [<c04851d1>] lock_kernel+0x1e/0x25
     [<c01b4e17>] __blkdev_put+0x29/0x10f
     [<c01b4f07>] blkdev_put+0xa/0xc
     [<c01cc058>] register_disk+0xda/0x11c
     [<c02351b2>] add_disk+0x2f/0x74
     [<c0234c50>] ? exact_match+0x0/0xb
     [<c0234f2f>] ? exact_lock+0x0/0x11
     [<c03264b2>] sd_probe+0x2d2/0x379
     [<c02b5a32>] driver_probe_device+0xa0/0x11b
     [<c02b5b14>] __device_attach+0x8/0xa
     [<c02b5041>] bus_for_each_drv+0x39/0x63
     [<c02b5b86>] device_attach+0x51/0x67
     [<c02b5b0c>] ? __device_attach+0x0/0xa
     [<c02b4ec7>] bus_attach_device+0x24/0x4e
     [<c02b41b4>] device_add+0x31e/0x42c
     [<c02f2690>] scsi_sysfs_add_sdev+0x9f/0x1d3
     [<c02f0d08>] scsi_probe_and_add_lun+0x96d/0xa84
     [<c02f1835>] ? __scsi_add_device+0x59/0xab
     [<c02f1861>] __scsi_add_device+0x85/0xab
     [<c0334dda>] ata_scsi_scan_host+0x99/0x217
     [<c03323c6>] ata_host_register+0x1c8/0x1e5
     [<c0339b71>] ata_pci_sff_activate_host+0x179/0x19f
     [<c033bc3f>] ? ata_sff_interrupt+0x0/0x1d5
     [<c0339fab>] ata_pci_sff_init_one+0x97/0xe1
     [<c034c484>] amd_init_one+0x10a/0x113
     [<c024cb8d>] pci_device_probe+0x39/0x59
     [<c02b5a32>] driver_probe_device+0xa0/0x11b
     [<c02b5aea>] __driver_attach+0x3d/0x5f
     [<c02b528b>] bus_for_each_dev+0x3e/0x60
     [<c02b58c9>] driver_attach+0x14/0x16
     [<c02b5aad>] ? __driver_attach+0x0/0x5f
     [<c02b5622>] bus_add_driver+0x9d/0x1af
     [<c02b5c6a>] driver_register+0x71/0xcd
     [<c0243089>] ? __spin_lock_init+0x24/0x48
     [<c024cd4c>] __pci_register_driver+0x40/0x6c
     [<c06bfd10>] amd_init+0x14/0x16
     [<c06a0464>] kernel_init+0x116/0x226
     [<c06a034e>] ? kernel_init+0x0/0x226
     [<c06a034e>] ? kernel_init+0x0/0x226
     [<c011a737>] kernel_thread_helper+0x7/0x10
     =======================
    sd 0:0:0:0: [sda] Attached SCSI disk
    sd 0:0:0:0: Attached scsi generic sg0 type 0
    scsi 1:0:1:0: CD-ROM            DVDRW    IDE 16X          A079 PQ: 0 ANSI: 5
    sr0: scsi3-mmc drive: 1x/48x writer cd/rw xa/form2 cdda tray
    Uniform CD-ROM driver Revision: 3.20
    sr 1:0:1:0: Attached scsi CD-ROM sr0
    sr 1:0:1:0: Attached scsi generic sg1 type 5
    initcall amd_init+0x0/0x16() returned 0 after 1120 msecs
    calling  artop_init+0x0/0x16()
    initcall artop_init+0x0/0x16() returned 0 after 0 msecs
    
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 7d822fa..d680428 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1083,8 +1083,8 @@ static int __blkdev_put(struct block_device *bdev, int for_part)
 	struct gendisk *disk = bdev->bd_disk;
 	struct block_device *victim = NULL;
 
-	mutex_lock_nested(&bdev->bd_mutex, for_part);
 	lock_kernel();
+	mutex_lock_nested(&bdev->bd_mutex, for_part);
 	if (for_part)
 		bdev->bd_part_count--;
 
@@ -1112,8 +1112,8 @@ static int __blkdev_put(struct block_device *bdev, int for_part)
 			victim = bdev->bd_contains;
 		bdev->bd_contains = NULL;
 	}
-	unlock_kernel();
 	mutex_unlock(&bdev->bd_mutex);
+	unlock_kernel();
 	bdput(bdev);
 	if (victim)
 		__blkdev_put(victim, 1);
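
[Illustrative sketch, not part of the patch.] The lockdep report in this
commit boils down to two code paths taking the same pair of locks in opposite
order. The toy checker below shows the idea: it records "B was acquired while
A was held" edges and flags an acquisition that would close a cycle. It is a
deliberately simplified model and not how lockdep is implemented; all names
(toy_acquire(), toy_release(), TOY_BKL, TOY_BD_MUTEX) are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_CLASSES 8

/* Hypothetical lock class IDs for the two locks in the report. */
enum { TOY_BKL, TOY_BD_MUTEX };

/* dep[a][b] != 0 means "b was acquired while a was held" (a -> b). */
static unsigned char dep[MAX_CLASSES][MAX_CLASSES];
static int held[MAX_CLASSES];
static int nheld;

/* Depth-first search: is 'to' reachable from 'from' via recorded edges? */
static int reachable(int from, int to, unsigned char *seen)
{
	int i;

	if (from == to)
		return 1;
	seen[from] = 1;
	for (i = 0; i < MAX_CLASSES; i++)
		if (dep[from][i] && !seen[i] && reachable(i, to, seen))
			return 1;
	return 0;
}

/*
 * Record the acquisition of lock class 'id'. Returns 0 if the order is
 * consistent with everything seen so far, -1 if it would close a cycle.
 */
static int toy_acquire(int id)
{
	int i, ret = 0;

	assert(id >= 0 && id < MAX_CLASSES && nheld < MAX_CLASSES);
	for (i = 0; i < nheld; i++) {
		unsigned char seen[MAX_CLASSES];

		memset(seen, 0, sizeof(seen));
		/* Edge held[i] -> id is circular if id already leads to held[i]. */
		if (reachable(id, held[i], seen))
			ret = -1;
		else
			dep[held[i]][id] = 1;
	}
	held[nheld++] = id;
	return ret;
}

/* Release the most recently acquired lock (strictly nested use only). */
static void toy_release(void)
{
	assert(nheld > 0);
	nheld--;
}
```

Replaying the orders the report implies: taking TOY_BKL and then TOY_BD_MUTEX
records the first edge, and a later path that takes TOY_BD_MUTEX and then
TOY_BKL is flagged. With the swapped order from the patch, both paths agree
and nothing is reported.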

commit c50fbe69c92ff23b10d13085dbcdf3c6c29a3c62
Author: Ingo Molnar <mingo@elte.hu>
Date:   Wed May 14 10:46:40 2008 +0200

    remove the BKL: reduce BKL locking during bootup
    
    reduce BKL locking during bootup, as nothing that could race with
    this code (and that the BKL would protect against) is supposed to
    be active at this point:
    
    ---------------------->
    calling  firmware_class_init+0x0/0x5c()
    initcall firmware_class_init+0x0/0x5c() returned 0 after 0 msecs
    calling  loopback_init+0x0/0xf()
    
    =======================================================
    [ INFO: possible circular locking dependency detected ]
    2.6.26-rc2-sched-devel.git #441
    -------------------------------------------------------
    swapper/1 is trying to acquire lock:
     (kernel_sem){--..}, at: [<c04851c9>] lock_kernel+0x1e/0x25
    
    but task is already holding lock:
     (rtnl_mutex){--..}, at: [<c040748b>] rtnl_lock+0xf/0x11
    
    which lock already depends on the new lock.
    
    the existing dependency chain (in reverse order) is:
    
    -> #2 (rtnl_mutex){--..}:
           [<c0159361>] __lock_acquire+0x97d/0xae6
           [<c0159836>] lock_acquire+0x4e/0x6c
           [<c04839e8>] mutex_lock_nested+0xc2/0x22a
           [<c040748b>] rtnl_lock+0xf/0x11
           [<c06c4b6f>] net_ns_init+0x93/0xff
           [<c06a0469>] kernel_init+0x11b/0x22b
           [<c011a737>] kernel_thread_helper+0x7/0x10
           [<ffffffff>] 0xffffffff
    
    -> #1 (net_mutex){--..}:
           [<c0159361>] __lock_acquire+0x97d/0xae6
           [<c0159836>] lock_acquire+0x4e/0x6c
           [<c04839e8>] mutex_lock_nested+0xc2/0x22a
           [<c03fd5b1>] register_pernet_subsys+0x12/0x2f
           [<c06b7558>] proc_net_init+0x1e/0x20
           [<c06b722b>] proc_root_init+0x4f/0x97
           [<c06a0858>] start_kernel+0x2c4/0x2e7
           [<c06a0008>] __init_begin+0x8/0xa
           [<ffffffff>] 0xffffffff
    
    -> #0 (kernel_sem){--..}:
           [<c0159288>] __lock_acquire+0x8a4/0xae6
           [<c0159836>] lock_acquire+0x4e/0x6c
           [<c04839e8>] mutex_lock_nested+0xc2/0x22a
           [<c04851c9>] lock_kernel+0x1e/0x25
           [<c014a43d>] call_usermodehelper_exec+0x95/0xde
           [<c023c74e>] kobject_uevent_env+0x2cd/0x2ff
           [<c023c78a>] kobject_uevent+0xa/0xc
           [<c02b419d>] device_add+0x317/0x42c
           [<c0409b47>] netdev_register_kobject+0x6c/0x70
           [<c03ffa26>] register_netdevice+0x258/0x2c8
           [<c03ffac8>] register_netdev+0x32/0x3f
           [<c06beea1>] loopback_net_init+0x2e/0x5d
           [<c03fd4e3>] register_pernet_operations+0x13/0x15
           [<c03fd54c>] register_pernet_device+0x1f/0x4c
           [<c06bee71>] loopback_init+0xd/0xf
           [<c06a0469>] kernel_init+0x11b/0x22b
           [<c011a737>] kernel_thread_helper+0x7/0x10
           [<ffffffff>] 0xffffffff
    
    other info that might help us debug this:
    
    2 locks held by swapper/1:
     #0:  (net_mutex){--..}, at: [<c03fd540>] register_pernet_device+0x13/0x4c
     #1:  (rtnl_mutex){--..}, at: [<c040748b>] rtnl_lock+0xf/0x11
    
    stack backtrace:
    Pid: 1, comm: swapper Not tainted 2.6.26-rc2-sched-devel.git #441
     [<c0157a53>] print_circular_bug_tail+0x5b/0x66
     [<c015739b>] ? print_circular_bug_entry+0x39/0x43
     [<c0159288>] __lock_acquire+0x8a4/0xae6
     [<c0159836>] lock_acquire+0x4e/0x6c
     [<c04851c9>] ? lock_kernel+0x1e/0x25
     [<c04839e8>] mutex_lock_nested+0xc2/0x22a
     [<c04851c9>] ? lock_kernel+0x1e/0x25
     [<c0485026>] ? _spin_unlock_irq+0x2d/0x42
     [<c04851c9>] ? lock_kernel+0x1e/0x25
     [<c04851c9>] lock_kernel+0x1e/0x25
     [<c014a43d>] call_usermodehelper_exec+0x95/0xde
     [<c023c74e>] kobject_uevent_env+0x2cd/0x2ff
     [<c023c78a>] kobject_uevent+0xa/0xc
     [<c02b419d>] device_add+0x317/0x42c
     [<c0409b47>] netdev_register_kobject+0x6c/0x70
     [<c03ffa26>] register_netdevice+0x258/0x2c8
     [<c03ffac8>] register_netdev+0x32/0x3f
     [<c06beea1>] loopback_net_init+0x2e/0x5d
     [<c03fd4e3>] register_pernet_operations+0x13/0x15
     [<c03fd54c>] register_pernet_device+0x1f/0x4c
     [<c06bee71>] loopback_init+0xd/0xf
     [<c06a0469>] kernel_init+0x11b/0x22b
     [<c0110031>] ? kvm_timer_intr_post+0x11/0x1b
     [<c06a034e>] ? kernel_init+0x0/0x22b
     [<c06a034e>] ? kernel_init+0x0/0x22b
     [<c011a737>] kernel_thread_helper+0x7/0x10
     =======================
    initcall loopback_init+0x0/0xf() returned 0 after 1 msecs
    calling  init_pcmcia_bus+0x0/0x6c()
    initcall init_pcmcia_bus+0x0/0x6c() returned 0 after 0 msecs
    calling  cpufreq_gov_performance_init+0x0/0xf()
    
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/init/main.c b/init/main.c
index f406fef..8d3b879 100644
--- a/init/main.c
+++ b/init/main.c
@@ -461,7 +461,6 @@ static void noinline __init_refok rest_init(void)
 	numa_default_policy();
 	pid = kernel_thread(kthreadd, NULL, CLONE_FS | CLONE_FILES);
 	kthreadd_task = find_task_by_pid_ns(pid, &init_pid_ns);
-	unlock_kernel();
 
 	/*
 	 * The boot idle thread must execute schedule()
@@ -677,6 +676,7 @@ asmlinkage void __init start_kernel(void)
 	delayacct_init();
 
 	check_bugs();
+	unlock_kernel();
 
 	acpi_early_init(); /* before LAPIC and SMP init */
 
@@ -795,7 +795,6 @@ static void run_init_process(char *init_filename)
 static int noinline init_post(void)
 {
 	free_initmem();
-	unlock_kernel();
 	mark_rodata_ro();
 	system_state = SYSTEM_RUNNING;
 	numa_default_policy();
@@ -835,7 +834,6 @@ static int noinline init_post(void)
 
 static int __init kernel_init(void * unused)
 {
-	lock_kernel();
 	/*
 	 * init can run on any cpu.
 	 */

commit 79b2b296c31fa07e8868a6c622d766bb567f6655
Author: Ingo Molnar <mingo@elte.hu>
Date:   Wed May 14 11:30:35 2008 +0200

    remove the BKL: change get_fs_type() BKL dependency
    
    solve this BKL dependency problem:
    
    ---------->
    Write protecting the kernel read-only data: 1664k
    INFO: task init:1 blocked for more than 30 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    init          D c0629430     0     1      0
           f7cb4d64 00000046 00000001 c0629430 00000001 00000046 00000000 c06e6900
           c06e6000 f7cb3358 f7cb35c0 c1d1b000 00000001 00000040 f7cb4000 f6f35dc0
           00000000 fffb8b68 f7cb3358 f7cb3830 00000001 00000000 ffffffff 00000001
    Call Trace:
     [<c0483224>] schedule_timeout+0x16/0x8b
     [<c0158450>] ? mark_held_locks+0x4e/0x66
     [<c0158607>] ? trace_hardirqs_on+0xb/0xd
     [<c01585c7>] ? trace_hardirqs_on_caller+0xe0/0x115
     [<c0158607>] ? trace_hardirqs_on+0xb/0xd
     [<c0482784>] wait_for_common+0xc3/0xfc
     [<c013609f>] ? default_wake_function+0x0/0xd
     [<c048283f>] wait_for_completion+0x12/0x14
     [<c014a40b>] call_usermodehelper_exec+0x7f/0xbf
     [<c014a6f0>] request_module+0xce/0xe2
     [<c01a0073>] ? lock_get_status+0x164/0x1fe
     [<c019bf8d>] ? __link_path_walk+0xa67/0xb7a
     [<c01a60da>] get_fs_type+0xbf/0x161
     [<c0195542>] do_kern_mount+0x1b/0xba
     [<c01a853d>] do_new_mount+0x46/0x74
     [<c01a8702>] do_mount+0x197/0x1b5
     [<c01585c7>] ? trace_hardirqs_on_caller+0xe0/0x115
     [<c0483b18>] ? mutex_lock_nested+0x222/0x22a
     [<c0485199>] ? lock_kernel+0x1e/0x25
     [<c01a8784>] sys_mount+0x64/0x9b
     [<c0119a8a>] sysenter_past_esp+0x6a/0xa4
     =======================
    1 lock held by init/1:
     #0:  (kernel_sem){--..}, at: [<c0485199>] lock_kernel+0x1e/0x25
    Kernel panic - not syncing: softlockup: blocked tasks
    Pid: 5, comm: watchdog/0 Not tainted 2.6.26-rc2-sched-devel.git #437
     [<c013d1db>] panic+0x49/0xfa
     [<c016b157>] watchdog+0x168/0x1d1
     [<c016afef>] ? watchdog+0x0/0x1d1
     [<c014d4d4>] kthread+0x3b/0x63
     [<c014d499>] ? kthread+0x0/0x63
     [<c011a737>] kernel_thread_helper+0x7/0x10
     =======================
    <---------
    
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/fs/filesystems.c b/fs/filesystems.c
index f37f872..1888ec7 100644
--- a/fs/filesystems.c
+++ b/fs/filesystems.c
@@ -11,7 +11,9 @@
 #include <linux/slab.h>
 #include <linux/kmod.h>
 #include <linux/init.h>
+#include <linux/smp_lock.h>
 #include <linux/module.h>
+
 #include <asm/uaccess.h>
 
 /*
@@ -219,6 +221,14 @@ struct file_system_type *get_fs_type(const char *name)
 	struct file_system_type *fs;
 	const char *dot = strchr(name, '.');
 	unsigned len = dot ? dot - name : strlen(name);
+	int bkl = kernel_locked();
+
+	/*
+	 * We request a module that might trigger user-space
+	 * tasks. So explicitly drop the BKL here:
+	 */
+	if (bkl)
+		unlock_kernel();
 
 	read_lock(&file_systems_lock);
 	fs = *(find_filesystem(name, len));
@@ -237,6 +247,8 @@ struct file_system_type *get_fs_type(const char *name)
 		put_filesystem(fs);
 		fs = NULL;
 	}
+	if (bkl)
+		lock_kernel();
 	return fs;
 }
 

commit fc6f051a95c8774abb950f287b4b5e7f710f6977
Author: Ingo Molnar <mingo@elte.hu>
Date:   Wed May 14 09:51:42 2008 +0200

    revert ("BKL: revert back to the old spinlock implementation")
    
    revert ("BKL: revert back to the old spinlock implementation"),
    commit 8e3e076c5a78519a9f64cd384e8f18bc21882ce0.
    
    Just a technical revert, it's easier to get the new anti-BKL code
    going with the sleeping lock.
    
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/arch/mn10300/Kconfig b/arch/mn10300/Kconfig
index e856218..6a6409a 100644
--- a/arch/mn10300/Kconfig
+++ b/arch/mn10300/Kconfig
@@ -186,6 +186,17 @@ config PREEMPT
 	  Say Y here if you are building a kernel for a desktop, embedded
 	  or real-time system.  Say N if you are unsure.
 
+config PREEMPT_BKL
+	bool "Preempt The Big Kernel Lock"
+	depends on PREEMPT
+	default y
+	help
+	  This option reduces the latency of the kernel by making the
+	  big kernel lock preemptible.
+
+	  Say Y here if you are building a kernel for a desktop system.
+	  Say N if you are unsure.
+
 config MN10300_CURRENT_IN_E2
 	bool "Hold current task address in E2 register"
 	default y
diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index 181006c..897f723 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -72,14 +72,6 @@
 #define in_softirq()		(softirq_count())
 #define in_interrupt()		(irq_count())
 
-#if defined(CONFIG_PREEMPT)
-# define PREEMPT_INATOMIC_BASE kernel_locked()
-# define PREEMPT_CHECK_OFFSET 1
-#else
-# define PREEMPT_INATOMIC_BASE 0
-# define PREEMPT_CHECK_OFFSET 0
-#endif
-
 /*
  * Are we running in atomic context?  WARNING: this macro cannot
  * always detect atomic context; in particular, it cannot know about
@@ -87,11 +79,17 @@
  * used in the general case to determine whether sleeping is possible.
  * Do not use in_atomic() in driver code.
  */
-#define in_atomic()	((preempt_count() & ~PREEMPT_ACTIVE) != PREEMPT_INATOMIC_BASE)
+#define in_atomic()		((preempt_count() & ~PREEMPT_ACTIVE) != 0)
+
+#ifdef CONFIG_PREEMPT
+# define PREEMPT_CHECK_OFFSET 1
+#else
+# define PREEMPT_CHECK_OFFSET 0
+#endif
 
 /*
  * Check whether we were atomic before we did preempt_disable():
- * (used by the scheduler, *after* releasing the kernel lock)
+ * (used by the scheduler)
  */
 #define in_atomic_preempt_off() \
 		((preempt_count() & ~PREEMPT_ACTIVE) != PREEMPT_CHECK_OFFSET)
diff --git a/kernel/sched.c b/kernel/sched.c
index 8841a91..59d20a5 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4567,6 +4567,8 @@ EXPORT_SYMBOL(schedule);
 asmlinkage void __sched preempt_schedule(void)
 {
 	struct thread_info *ti = current_thread_info();
+	struct task_struct *task = current;
+	int saved_lock_depth;
 
 	/*
 	 * If there is a non-zero preempt_count or interrupts are disabled,
@@ -4577,7 +4579,16 @@ asmlinkage void __sched preempt_schedule(void)
 
 	do {
 		add_preempt_count(PREEMPT_ACTIVE);
+
+		/*
+		 * We keep the big kernel semaphore locked, but we
+		 * clear ->lock_depth so that schedule() doesnt
+		 * auto-release the semaphore:
+		 */
+		saved_lock_depth = task->lock_depth;
+		task->lock_depth = -1;
 		schedule();
+		task->lock_depth = saved_lock_depth;
 		sub_preempt_count(PREEMPT_ACTIVE);
 
 		/*
@@ -4598,15 +4609,26 @@ EXPORT_SYMBOL(preempt_schedule);
 asmlinkage void __sched preempt_schedule_irq(void)
 {
 	struct thread_info *ti = current_thread_info();
+	struct task_struct *task = current;
+	int saved_lock_depth;
 
 	/* Catch callers which need to be fixed */
 	BUG_ON(ti->preempt_count || !irqs_disabled());
 
 	do {
 		add_preempt_count(PREEMPT_ACTIVE);
+
+		/*
+		 * We keep the big kernel semaphore locked, but we
+		 * clear ->lock_depth so that schedule() doesnt
+		 * auto-release the semaphore:
+		 */
+		saved_lock_depth = task->lock_depth;
+		task->lock_depth = -1;
 		local_irq_enable();
 		schedule();
 		local_irq_disable();
+		task->lock_depth = saved_lock_depth;
 		sub_preempt_count(PREEMPT_ACTIVE);
 
 		/*
@@ -5829,11 +5851,8 @@ void __cpuinit init_idle(struct task_struct *idle, int cpu)
 	spin_unlock_irqrestore(&rq->lock, flags);
 
 	/* Set the preempt count _outside_ the spinlocks! */
-#if defined(CONFIG_PREEMPT)
-	task_thread_info(idle)->preempt_count = (idle->lock_depth >= 0);
-#else
 	task_thread_info(idle)->preempt_count = 0;
-#endif
+
 	/*
 	 * The idle tasks have their own, simple scheduling class:
 	 */
diff --git a/lib/kernel_lock.c b/lib/kernel_lock.c
index 01a3c22..cd3e825 100644
--- a/lib/kernel_lock.c
+++ b/lib/kernel_lock.c
@@ -11,121 +11,79 @@
 #include <linux/semaphore.h>
 
 /*
- * The 'big kernel lock'
+ * The 'big kernel semaphore'
  *
- * This spinlock is taken and released recursively by lock_kernel()
+ * This mutex is taken and released recursively by lock_kernel()
  * and unlock_kernel().  It is transparently dropped and reacquired
  * over schedule().  It is used to protect legacy code that hasn't
  * been migrated to a proper locking design yet.
  *
+ * Note: code locked by this semaphore will only be serialized against
+ * other code using the same locking facility. The code guarantees that
+ * the task remains on the same CPU.
+ *
  * Don't use in new code.
  */
-static  __cacheline_aligned_in_smp DEFINE_SPINLOCK(kernel_flag);
-
+static DECLARE_MUTEX(kernel_sem);
 
 /*
- * Acquire/release the underlying lock from the scheduler.
+ * Re-acquire the kernel semaphore.
  *
- * This is called with preemption disabled, and should
- * return an error value if it cannot get the lock and
- * TIF_NEED_RESCHED gets set.
+ * This function is called with preemption off.
  *
- * If it successfully gets the lock, it should increment
- * the preemption count like any spinlock does.
- *
- * (This works on UP too - _raw_spin_trylock will never
- * return false in that case)
+ * We are executing in schedule() so the code must be extremely careful
+ * about recursion, both due to the down() and due to the enabling of
+ * preemption. schedule() will re-check the preemption flag after
+ * reacquiring the semaphore.
  */
 int __lockfunc __reacquire_kernel_lock(void)
 {
-	while (!_raw_spin_trylock(&kernel_flag)) {
-		if (test_thread_flag(TIF_NEED_RESCHED))
-			return -EAGAIN;
-		cpu_relax();
-	}
+	struct task_struct *task = current;
+	int saved_lock_depth = task->lock_depth;
+
+	BUG_ON(saved_lock_depth < 0);
+
+	task->lock_depth = -1;
+	preempt_enable_no_resched();
+
+	down(&kernel_sem);
+
 	preempt_disable();
+	task->lock_depth = saved_lock_depth;
+
 	return 0;
 }
 
 void __lockfunc __release_kernel_lock(void)
 {
-	_raw_spin_unlock(&kernel_flag);
-	preempt_enable_no_resched();
+	up(&kernel_sem);
 }
 
 /*
- * These are the BKL spinlocks - we try to be polite about preemption.
- * If SMP is not on (ie UP preemption), this all goes away because the
- * _raw_spin_trylock() will always succeed.
+ * Getting the big kernel semaphore.
  */
-#ifdef CONFIG_PREEMPT
-static inline void __lock_kernel(void)
+void __lockfunc lock_kernel(void)
 {
-	preempt_disable();
-	if (unlikely(!_raw_spin_trylock(&kernel_flag))) {
-		/*
-		 * If preemption was disabled even before this
-		 * was called, there's nothing we can be polite
-		 * about - just spin.
-		 */
-		if (preempt_count() > 1) {
-			_raw_spin_lock(&kernel_flag);
-			return;
-		}
+	struct task_struct *task = current;
+	int depth = task->lock_depth + 1;
 
+	if (likely(!depth))
 		/*
-		 * Otherwise, let's wait for the kernel lock
-		 * with preemption enabled..
+		 * No recursion worries - we set up lock_depth _after_
 		 */
-		do {
-			preempt_enable();
-			while (spin_is_locked(&kernel_flag))
-				cpu_relax();
-			preempt_disable();
-		} while (!_raw_spin_trylock(&kernel_flag));
-	}
-}
+		down(&kernel_sem);
 
-#else
-
-/*
- * Non-preemption case - just get the spinlock
- */
-static inline void __lock_kernel(void)
-{
-	_raw_spin_lock(&kernel_flag);
+	task->lock_depth = depth;
 }
-#endif
 
-static inline void __unlock_kernel(void)
+void __lockfunc unlock_kernel(void)
 {
-	/*
-	 * the BKL is not covered by lockdep, so we open-code the
-	 * unlocking sequence (and thus avoid the dep-chain ops):
-	 */
-	_raw_spin_unlock(&kernel_flag);
-	preempt_enable();
-}
+	struct task_struct *task = current;
 
-/*
- * Getting the big kernel lock.
- *
- * This cannot happen asynchronously, so we only need to
- * worry about other CPU's.
- */
-void __lockfunc lock_kernel(void)
-{
-	int depth = current->lock_depth+1;
-	if (likely(!depth))
-		__lock_kernel();
-	current->lock_depth = depth;
-}
+	BUG_ON(task->lock_depth < 0);
 
-void __lockfunc unlock_kernel(void)
-{
-	BUG_ON(current->lock_depth < 0);
-	if (likely(--current->lock_depth < 0))
-		__unlock_kernel();
+	if (likely(--task->lock_depth < 0))
+		up(&kernel_sem);
 }
 
 EXPORT_SYMBOL(lock_kernel);

* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-14 17:49 [announce] "kill the Big Kernel Lock (BKL)" tree Ingo Molnar
@ 2008-05-14 18:30 ` Andi Kleen
  2008-05-14 21:00   ` Alan Cox
  2008-05-14 18:41 ` Linus Torvalds
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 78+ messages in thread
From: Andi Kleen @ 2008-05-14 18:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner, Alan Cox, Alexander Viro

Ingo Molnar <mingo@elte.hu> writes:

> As some of the latency junkies on lkml already know it, commit 8e3e076 
> ("BKL: revert back to the old spinlock implementation") in v2.6.26-rc2 
> removed the preemptible BKL feature and made the Big Kernel Lock a 
> spinlock and thus turned it into non-preemptible code again. This commit 
> returned the BKL code to the 2.6.7 state of affairs in essence.

It's a reasonable start, but have you considered doing this work
in tree instead? As in just add all the warnings, but don't actually
change the semantics yet.  I suspect you would get far more users
this way and the work would go faster. 

It would be reasonable to enable this in -mm if the warnings are
not too intrusive (self-disabling, etc.)

Also for fixing the ioctls I'm not sure that dynamic instrumentation
will really work because it would be tough to execute them all.

I suspect some variant of static code analysis would make sense
for the ioctls. 

I used to do some auditing with cflow. That won't
catch indirect function calls unfortunately, but if there's 
some way to find those and bail out one could do an automated
tool that flags all the ioctls that don't sleep for example
(don't have any sleeping functions in the call chain -- this
might need some manual annotation, but hopefully not much)

Then it would be possible to safely switch those over to a blocking
mutex variant of BKL. 
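
A minimal sketch of the propagation step such a tool would need, assuming
the call graph has already been extracted (for example from cflow output)
into a small table. The struct layout, the function names in demo_graph[]
and the "sleeps" annotations are all hypothetical. Functions containing an
indirect call are conservatively treated as possibly sleeping, which is the
"bail out" case:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CALLEES 4

/* One node of a hypothetical call graph (not a real cflow data format). */
struct func {
	const char *name;
	bool sleeps;			/* manual annotation: may sleep */
	bool has_indirect_call;		/* target unknown: bail out */
	int callees[MAX_CALLEES];	/* graph indices, -1 terminated */
	bool may_sleep;			/* computed result */
};

/* Illustrative data only; none of these functions exist in the kernel. */
static struct func demo_graph[] = {
	{ .name = "foo_ioctl",      .callees = { 1, -1 } },
	{ .name = "foo_read_reg",   .callees = { -1 } },
	{ .name = "bar_ioctl",      .callees = { 3, -1 } },
	{ .name = "bar_wait_event", .sleeps = true, .callees = { -1 } },
	{ .name = "baz_ioctl",      .has_indirect_call = true, .callees = { -1 } },
	{ .name = "qux_ioctl",      .callees = { 6, -1 } },
	{ .name = "qux_helper",     .callees = { 5, 3, -1 } },
};

#define DEMO_NFUNCS (sizeof(demo_graph) / sizeof(demo_graph[0]))

/*
 * Seed may_sleep from the annotations, then propagate it from callees to
 * callers until nothing changes. Iterating to a fixed point keeps the
 * result correct for recursive (cyclic) call graphs as well.
 */
static void propagate_may_sleep(struct func *graph, size_t n)
{
	bool changed;
	size_t i;
	int j;

	for (i = 0; i < n; i++)
		graph[i].may_sleep = graph[i].sleeps ||
				     graph[i].has_indirect_call;

	do {
		changed = false;
		for (i = 0; i < n; i++) {
			struct func *f = &graph[i];

			if (f->may_sleep)
				continue;
			for (j = 0; j < MAX_CALLEES && f->callees[j] >= 0; j++) {
				if (graph[f->callees[j]].may_sleep) {
					f->may_sleep = true;
					changed = true;
					break;
				}
			}
		}
	} while (changed);
}
```

After propagate_may_sleep(demo_graph, DEMO_NFUNCS), the entries whose
may_sleep flag is still false are the ones with no sleeping function
anywhere in their call chain.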

Now there could be some more automated analysis here: for example the
other main user of the BKL is character device open. I suspect that to
really make progress here you would also need an open_unlocked() and
to do the same for all the open functions etc.

> According to my quick & dirty git-log analysis, at the current pace of 
> BKL removal we'd have to wait more than 10 years to remove most BKL 
> critical sections from the kernel and to get acceptable latencies again.

Hmm, is BKL really that common still that it's a latency problem?
The few VFS cases like locks can be fixed without extreme measures.

Most of the legacy users are unlikely to be latency problems,
simply because very few people (or nobody) still have that hardware
and the code will never run.

Also I wouldn't lose sleep over e.g. letting ISDN continue using the BKL forever.

-Andi

* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-14 17:49 [announce] "kill the Big Kernel Lock (BKL)" tree Ingo Molnar
  2008-05-14 18:30 ` Andi Kleen
@ 2008-05-14 18:41 ` Linus Torvalds
  2008-05-14 19:41   ` Ingo Molnar
  2008-05-14 21:45 ` Jonathan Corbet
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 78+ messages in thread
From: Linus Torvalds @ 2008-05-14 18:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Andrew Morton, Peter Zijlstra, Thomas Gleixner,
	Alan Cox, Alexander Viro



On Wed, 14 May 2008, Ingo Molnar wrote:
> 
> Linus, Alan: the increased visibility and debuggability of the BKL 
> already uncovered a rather serious regression in upstream -git. You 
> might want to cherry pick this single fix, it will apply just fine to 
> current -git:

Ok, so I'm obviously happy. This is exactly the kind of thing I would want 
to see.

That said, the way it is now set up, it's unreasonable to merge anything 
directly, and while I can cherry-pick obvious fixes this way, I do think 
we could do things better.

It should be possible to set things up so that it's a config option, and 
we can mark it EXPERIMENTAL but still merge it into the standard kernel, 
so that we'd have the debug stuff there. That would get a lot more 
coverage, especially if it all still *works*, even if the debug stuff then 
complains (ie it would be nicer if the lock itself didn't start breaking).

So for example, have CONFIG_DEBUG_BKL turn it into a mutex (and select 
mutex debugging), and get all the debug coverage that way, but then when 
somebody enters the scheduler with the lock held, first complain, but then 
auto-release it anyway. That way, bugs get found and complained about, but 
hopefully the machine still ends up working.
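
A minimal user-space sketch of that "complain, then auto-release anyway"
behaviour, assuming a single task and a boolean standing in for the mutex.
The names model_schedule(), bkl_held and bkl_sched_warnings are made up for
illustration; this is not code from the tree and says nothing about how a
real CONFIG_DEBUG_BKL would hook into the scheduler:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the per-task BKL state; illustrative only. */
struct task {
	int lock_depth;		/* -1: BKL not held, >= 0: recursion depth */
};

static bool bkl_held;			/* models the underlying mutex */
static int bkl_sched_warnings;		/* number of complaints so far */
static bool bkl_free_during_sleep;	/* what another task would have seen */

/*
 * Scheduler entry in the debug variant: complain if the BKL is held,
 * then drop it for the duration of the sleep and take it back before
 * returning, so the caller's lock_depth is left untouched.
 */
static void model_schedule(struct task *t)
{
	bool dropped = false;

	if (t->lock_depth >= 0) {
		bkl_sched_warnings++;
		fprintf(stderr, "BKL held across schedule(), depth %d\n",
			t->lock_depth);
		bkl_held = false;	/* auto-release */
		dropped = true;
	}

	/* Another task would run here; record what it would observe. */
	bkl_free_during_sleep = !bkl_held;

	if (dropped)
		bkl_held = true;	/* reacquire on the way back */
}
```

The point of the model is only the ordering: the warning fires first, the
lock is still released across the sleep, and the task gets it back with the
same depth it had before.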

		Linus

* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-14 18:41 ` Linus Torvalds
@ 2008-05-14 19:41   ` Ingo Molnar
  2008-05-14 20:05     ` Frederik Deweerdt
  0 siblings, 1 reply; 78+ messages in thread
From: Ingo Molnar @ 2008-05-14 19:41 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Andrew Morton, Peter Zijlstra, Thomas Gleixner,
	Alan Cox, Alexander Viro


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> > Linus, Alan: the increased visibility and debuggability of the BKL 
> > already uncovered a rather serious regression in upstream -git. You 
> > might want to cherry pick this single fix, it will apply just fine 
> > to current -git:
> 
> Ok, so I'm obviously happy. This is exactly the kind of thing I would 
> want to see.
> 
> That said, the way it is now set up, it's unreasonable to merge 
> anything directly, and while I can cherry-pick obvious fixes this way, 
> I do think we could do things better.

yeah. This is just a first approximation. It might be v2.6.27 stuff, if 
it stabilizes fast enough.

> It should be possible to set things up so that it's a config option, 
> and we can mark it EXPERIMENTAL but still merge it into the standard 
> kernel, so that we'd have the debug stuff there. That would get a lot 
> more coverage, especially if it all still *works*, even if the debug 
> stuff then complains (ie it would be nicer if the lock itself didn't 
> start breaking).

yeah. Will try to reshape it like that.

> So for example, have CONFIG_DEBUG_BKL turn it into a mutex (and select 
> mutex debugging), and get all the debug coverage that way, but then 
> when somebody enters the scheduler with the lock held, first complain, 
> but then auto-release it anyway. That way, bugs get found and 
> complained about, but hopefully the machine still ends up working.

hm, we've got more ideas about other debug helpers, but i don't think 
warning in the scheduler is realistic or useful: lots and lots of code 
_does_ reschedule with the BKL held and always did - we never knew this 
before in a reliable way due to the auto-release. Sleeping locks that 
purely nest inside the BKL are the norm in the VFS, in the tty code and 
in most other places - they should be fine and are frequently taken in 
BKL sections (and frequently produce scheduling there).

As the BKL gets pushed inside subsystems, so do inner locks vanish from 
its scope - and at the final stage it can become a spinlock or mutex, 
depending on what the actual use is.

The main point with the mutex is to make the BKL _stricter_ and more 
defined - this hurts BKL using code more (see the many fixes that were 
needed), but it also makes things much more visible and much more 
fixable IMO. This tree turns the BKL into "just another mutex", with a 
tiny bit of self-recursion glue on top of it.
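
The self-recursion glue can be sketched in user space as follows, assuming
a single task and a counter standing in for the real mutex. The *_model
names are illustrative and the mutex is not a real sleeping lock; the only
thing shown is the ->lock_depth bookkeeping, where -1 means "not held" and
only the outermost lock/unlock pair touches the mutex:

```c
#include <assert.h>

/* Stand-in for the per-task BKL state; illustrative only. */
struct task {
	int lock_depth;		/* -1 when the task does not hold the BKL */
};

static int mutex_locked;	/* stand-in for the underlying mutex */
static int mutex_ops;		/* counts real lock/unlock operations */

static void mutex_lock_model(void)
{
	assert(!mutex_locked);	/* single-task model: never contended */
	mutex_locked = 1;
	mutex_ops++;
}

static void mutex_unlock_model(void)
{
	assert(mutex_locked);
	mutex_locked = 0;
	mutex_ops++;
}

/* Take the mutex only on the outermost call, then record the depth. */
static void lock_kernel_model(struct task *t)
{
	int depth = t->lock_depth + 1;

	if (depth == 0)
		mutex_lock_model();
	t->lock_depth = depth;
}

/* Release the mutex only when the outermost section is left. */
static void unlock_kernel_model(struct task *t)
{
	assert(t->lock_depth >= 0);
	if (--t->lock_depth < 0)
		mutex_unlock_model();
}
```

Two nested lock_kernel_model() calls followed by two unlock_kernel_model()
calls perform exactly one lock and one unlock on the underlying mutex.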

Btw., often there's potential scheduling at points where BKL using code 
does not expect it. So this series might also _fix_ some rare races.

The fact that this also makes BKL critical sections involuntarily 
preemptible is a side-effect (which is one of my main motivations to do 
this whole thing), and it's a pretty much unavoidable side-effect.

Also, by turning it into a more or less simple mutex with no scheduler 
smarts at all, it all fits into our "how do we remove a serializing 
lock" workflow rather well. Even if for some piece of code not much 
changes in reality, it becomes more familiar, less mystic and more 
trustable to fix and improve.

Btw., while i hacked on this today, i _think_ i've got most of the worst 
problems mapped out already. I needed two fixes to get it to boot to a 
ssh shell prompt without hanging. I needed 10 more fixes to solve all 
the dependencies that lockdep found. Another 5 fixes were exposed in 
more directed randconfig based testing in the second half of the day.

I've got a full desktop running on SMP on two boxes with lots of 
services enabled. There are three known problem areas:

 - reiser3. I've got three patches for it but they are not pretty - see 
   them below. One of them widens BKL locking to the VFS. I'm not sure 
   it's worth fixing - we could declare reiser3 legacy && make it depend 
   on !DEBUG_BKL?

 - NFS. Even with "remove the BKL: restructure NFS code" there's a 
   lockdep splat when mounting NFS. Haven't looked into it yet, Peter 
   says it's hairy code.

 - racy procfs dir entry creation methods. These will not result in
   outright hangs, but need to be reviewed then fixed or annotated away 
   because they are potentially racy - they'll show up as WARN_ON()s in 
   fs/proc/.

More will be found i'm sure, but also, about 80% of the fixes were not 
actual hangs but were proactive fixes based on lockdep warnings. Only 3 
out of the ~17 fixes were hang-induced. So i think even the current 
early form of it is quite hackable and debuggable.

	Ingo

---------------------
Subject: remove bkl: annotate reiserfs3
From: Ingo Molnar <mingo@elte.hu>
Date: Wed May 14 15:22:01 CEST 2008

reiserfs uses proc_create_data() with the BKL held;

WARNING: at fs/proc/generic.c:701 proc_create_data+0x33/0xc3()
Modules linked in:
Pid: 3193, comm: mount Not tainted 2.6.26-rc2-sched-devel.git #478
 [<c013d2ed>] warn_on_slowpath+0x41/0x6d
 [<c01571c0>] ? save_trace+0x37/0x8a
 [<c0157277>] ? add_lock_to_list+0x64/0x8a
 [<c01c7fea>] ? proc_register+0x2e/0x12e
 [<c04ae22c>] ? _spin_unlock+0x27/0x3c
 [<c01c811d>] proc_create_data+0x33/0xc3
 [<c01f6bd2>] add_file+0x23/0x2a
 [<c01f6c73>] ? show_version+0x0/0x3b
 [<c01f749a>] reiserfs_proc_info_init+0xab/0x136
 [<c01e4a3a>] reiserfs_fill_super+0xb97/0xc7d
 [<c02687f8>] ? vsnprintf+0x265/0x3fc
 [<c01cbd02>] ? disk_name+0x25/0x67
 [<c0195997>] get_sb_bdev+0xcd/0x10b
 [<c0190030>] ? cache_alloc_refill+0x53c/0x632
 [<c01a7609>] ? alloc_vfsmnt+0xe3/0x10b
 [<c01a7609>] ? alloc_vfsmnt+0xe3/0x10b
 [<c01e1f3d>] get_super_block+0x13/0x15
 [<c01e3ea3>] ? reiserfs_fill_super+0x0/0xc7d
 [<c019557f>] vfs_kern_mount+0x81/0xf7
 [<c0195639>] do_kern_mount+0x32/0xba
 [<c01a863d>] do_new_mount+0x46/0x74
 [<c01a8802>] do_mount+0x197/0x1b5
 [<c01586a7>] ? trace_hardirqs_on_caller+0xe0/0x115
 [<c04ace60>] ? mutex_lock_nested+0x222/0x22a
 [<c04ae4ab>] ? lock_kernel+0x1e/0x25
 [<c01a8884>] sys_mount+0x64/0x9b
 [<c0119a8a>] sysenter_past_esp+0x6a/0xa4
 =======================

but its use of proc_create_data() is safe here. Annotate that by dropping
the BKL around the procfs ops.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 fs/reiserfs/procfs.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

Index: linux/fs/reiserfs/procfs.c
===================================================================
--- linux.orig/fs/reiserfs/procfs.c
+++ linux/fs/reiserfs/procfs.c
@@ -494,6 +494,15 @@ int reiserfs_proc_info_init(struct super
 	spin_lock_init(&__PINFO(sb).lock);
 	REISERFS_SB(sb)->procdir = proc_mkdir(b, proc_info_root);
 	if (REISERFS_SB(sb)->procdir) {
+		int saved_lock_depth = current->lock_depth;
+
+		/*
+		 * This is in essence an annotation that tells procfs that
+		 * it is fine to call it with the BKL held (it causes
+		 * the kernel_locked() check to not trigger):
+		 */
+		current->lock_depth = -1;
+
 		REISERFS_SB(sb)->procdir->owner = THIS_MODULE;
 		REISERFS_SB(sb)->procdir->data = sb;
 		add_file(sb, "version", show_version);
@@ -503,6 +512,9 @@ int reiserfs_proc_info_init(struct super
 		add_file(sb, "on-disk-super", show_on_disk_super);
 		add_file(sb, "oidmap", show_oidmap);
 		add_file(sb, "journal", show_journal);
+
+		current->lock_depth = saved_lock_depth;
+
 		return 0;
 	}
 	reiserfs_warning(sb, "reiserfs: cannot create /proc/%s/%s",

------------>
Subject: remove bkl: sync supers dependency
From: Ingo Molnar <mingo@elte.hu>
Date: Wed May 14 16:10:36 CEST 2008

untangle this dependency:

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.26-rc2-sched-devel.git #480
-------------------------------------------------------
pdflush/303 is trying to acquire lock:
 (kernel_mutex){--..}, at: [<c04ae4cb>] lock_kernel+0x1e/0x25

but task is already holding lock:
 (&type->s_lock_key#8){--..}, at: [<c0194b95>] lock_super+0x1b/0x1d

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&type->s_lock_key#8){--..}:
       [<c0159405>] __lock_acquire+0x97d/0xae6
       [<c01598da>] lock_acquire+0x4e/0x6c
       [<c04acd20>] mutex_lock_nested+0xc2/0x22a
       [<c0194b95>] lock_super+0x1b/0x1d
       [<c01956ff>] sync_supers+0x3e/0x99
       [<c017a3b9>] wb_kupdate+0x2a/0xdd
       [<c017a89d>] pdflush+0xf8/0x18d
       [<c014d5b4>] kthread+0x3b/0x63
       [<c011a737>] kernel_thread_helper+0x7/0x10
       [<ffffffff>] 0xffffffff

-> #1 (&type->s_umount_key#15){----}:
       [<c0159405>] __lock_acquire+0x97d/0xae6
       [<c01598da>] lock_acquire+0x4e/0x6c
       [<c04ad28a>] down_write+0x28/0x44
       [<c0194f67>] sget+0x1fd/0x339
       [<c0195910>] get_sb_bdev+0x46/0x10b
       [<c01e1f3d>] get_super_block+0x13/0x15
       [<c019557f>] vfs_kern_mount+0x81/0xf7
       [<c0195639>] do_kern_mount+0x32/0xba
       [<c01a863d>] do_new_mount+0x46/0x74
       [<c01a8802>] do_mount+0x197/0x1b5
       [<c01a8884>] sys_mount+0x64/0x9b
       [<c06dba90>] mount_block_root+0xa3/0x1e6
       [<c06dbc1f>] mount_root+0x4c/0x54
       [<c06dbd72>] prepare_namespace+0x14b/0x172
       [<c06db565>] kernel_init+0x217/0x226
       [<c011a737>] kernel_thread_helper+0x7/0x10
       [<ffffffff>] 0xffffffff

-> #0 (kernel_mutex){--..}:
       [<c015932c>] __lock_acquire+0x8a4/0xae6
       [<c01598da>] lock_acquire+0x4e/0x6c
       [<c04acd20>] mutex_lock_nested+0xc2/0x22a
       [<c04ae4cb>] lock_kernel+0x1e/0x25
       [<c01e2839>] reiserfs_sync_fs+0x15/0x5b
       [<c01e288c>] reiserfs_write_super+0xd/0xf
       [<c0195719>] sync_supers+0x58/0x99
       [<c017a3b9>] wb_kupdate+0x2a/0xdd
       [<c017a89d>] pdflush+0xf8/0x18d
       [<c014d5b4>] kthread+0x3b/0x63
       [<c011a737>] kernel_thread_helper+0x7/0x10
       [<ffffffff>] 0xffffffff

other info that might help us debug this:

2 locks held by pdflush/303:
 #0:  (&type->s_umount_key#15){----}, at: [<c01956f8>] sync_supers+0x37/0x99
 #1:  (&type->s_lock_key#8){--..}, at: [<c0194b95>] lock_super+0x1b/0x1d

stack backtrace:
Pid: 303, comm: pdflush Not tainted 2.6.26-rc2-sched-devel.git #480
 [<c0157af7>] print_circular_bug_tail+0x5b/0x66
 [<c015743f>] ? print_circular_bug_entry+0x39/0x43
 [<c015932c>] __lock_acquire+0x8a4/0xae6
 [<c0158040>] ? find_usage_backwards+0x97/0xb6
 [<c01598da>] lock_acquire+0x4e/0x6c
 [<c04ae4cb>] ? lock_kernel+0x1e/0x25
 [<c04acd20>] mutex_lock_nested+0xc2/0x22a
 [<c04ae4cb>] ? lock_kernel+0x1e/0x25
 [<c04ae4cb>] ? lock_kernel+0x1e/0x25
 [<c04ae4cb>] lock_kernel+0x1e/0x25
 [<c01e2839>] reiserfs_sync_fs+0x15/0x5b
 [<c0194b95>] ? lock_super+0x1b/0x1d
 [<c01e288c>] reiserfs_write_super+0xd/0xf
 [<c0195719>] sync_supers+0x58/0x99
 [<c017a3b9>] wb_kupdate+0x2a/0xdd
 [<c01586e7>] ? trace_hardirqs_on+0xb/0xd
 [<c017a7a5>] ? pdflush+0x0/0x18d
 [<c017a89d>] pdflush+0xf8/0x18d
 [<c017a38f>] ? wb_kupdate+0x0/0xdd
 [<c014d5b4>] kthread+0x3b/0x63
 [<c014d579>] ? kthread+0x0/0x63
 [<c011a737>] kernel_thread_helper+0x7/0x10
 =======================

it's a hack, because it widens the BKL's scope. But it's needed
for every filesystem that takes the BKL, up until the point that
SB code can stop using the BKL.

NOT-Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 fs/quota.c |    2 ++
 fs/super.c |    4 ++++
 2 files changed, 6 insertions(+)

Index: linux/fs/quota.c
===================================================================
--- linux.orig/fs/quota.c
+++ linux/fs/quota.c
@@ -206,10 +206,12 @@ restart:
 			continue;
 		sb->s_count++;
 		spin_unlock(&sb_lock);
+		lock_kernel();
 		down_read(&sb->s_umount);
 		if (sb->s_root && sb->s_qcop->quota_sync)
 			quota_sync_sb(sb, type);
 		up_read(&sb->s_umount);
+		unlock_kernel();
 		spin_lock(&sb_lock);
 		if (__put_super_and_need_restart(sb))
 			goto restart;
Index: linux/fs/super.c
===================================================================
--- linux.orig/fs/super.c
+++ linux/fs/super.c
@@ -408,9 +408,11 @@ restart:
 		if (sb->s_dirt) {
 			sb->s_count++;
 			spin_unlock(&sb_lock);
+			lock_kernel();
 			down_read(&sb->s_umount);
 			write_super(sb);
 			up_read(&sb->s_umount);
+			unlock_kernel();
 			spin_lock(&sb_lock);
 			if (__put_super_and_need_restart(sb))
 				goto restart;
@@ -459,10 +461,12 @@ restart:
 			continue;	/* hm.  Was remounted r/o meanwhile */
 		sb->s_count++;
 		spin_unlock(&sb_lock);
+		lock_kernel();
 		down_read(&sb->s_umount);
 		if (sb->s_root && (wait || sb->s_dirt))
 			sb->s_op->sync_fs(sb, wait);
 		up_read(&sb->s_umount);
+		unlock_kernel();
 		/* restart only when sb is no longer on the list */
 		spin_lock(&sb_lock);
 		if (__put_super_and_need_restart(sb))

------------->

Subject: remove bkl: reiserfs fix
From: Ingo Molnar <mingo@elte.hu>
Date: Wed May 14 16:26:36 CEST 2008

avoid j_commit_lock deadlock. Since the down() can block it is
safe to drop the BKL here.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 fs/reiserfs/journal.c |    2 ++
 fs/super.c            |    2 ++
 2 files changed, 4 insertions(+)

Index: linux/fs/reiserfs/journal.c
===================================================================
--- linux.orig/fs/reiserfs/journal.c
+++ linux/fs/reiserfs/journal.c
@@ -1044,8 +1044,10 @@ static int flush_commit_list(struct supe
 		}
 	}
 
+//	unlock_kernel();
 	/* make sure nobody is trying to flush this one at the same time */
 	down(&jl->j_commit_lock);
+//	lock_kernel();
 	if (!journal_list_still_alive(s, trans_id)) {
 		up(&jl->j_commit_lock);
 		goto put_jl;
Index: linux/fs/super.c
===================================================================
--- linux.orig/fs/super.c
+++ linux/fs/super.c
@@ -180,10 +180,12 @@ void deactivate_super(struct super_block
 		s->s_count -= S_BIAS-1;
 		spin_unlock(&sb_lock);
 		DQUOT_OFF(s, 0);
+		lock_kernel();
 		down_write(&s->s_umount);
 		fs->kill_sb(s);
 		put_filesystem(fs);
 		put_super(s);
+		unlock_kernel();
 	}
 }
 



* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-14 19:41   ` Ingo Molnar
@ 2008-05-14 20:05     ` Frederik Deweerdt
  0 siblings, 0 replies; 78+ messages in thread
From: Frederik Deweerdt @ 2008-05-14 20:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner, Alan Cox, Alexander Viro

Hi Ingo,
On Wed, May 14, 2008 at 09:41:22PM +0200, Ingo Molnar wrote:
> Subject: remove bkl: reiserfs fix
> From: Ingo Molnar <mingo@elte.hu>
> Date: Wed May 14 16:26:36 CEST 2008
> 
> avoid j_commit_lock deadlock. Since the down() can block it is
> safe to drop the BKL here.
> 
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> ---
>  fs/reiserfs/journal.c |    2 ++
>  fs/super.c            |    2 ++
>  2 files changed, 4 insertions(+)
> 
> Index: linux/fs/reiserfs/journal.c
> ===================================================================
> --- linux.orig/fs/reiserfs/journal.c
> +++ linux/fs/reiserfs/journal.c
> @@ -1044,8 +1044,10 @@ static int flush_commit_list(struct supe
>  		}
>  	}
>  
> +//	unlock_kernel();
 ^^^^
>  	/* make sure nobody is trying to flush this one at the same time */
>  	down(&jl->j_commit_lock);
> +//	lock_kernel();
 ^^^^
>  	if (!journal_list_still_alive(s, trans_id)) {
>  		up(&jl->j_commit_lock);
>  		goto put_jl;
Must be a typo?

Regards,
Frederik


* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-14 18:30 ` Andi Kleen
@ 2008-05-14 21:00   ` Alan Cox
  2008-05-14 21:13     ` Andi Kleen
  0 siblings, 1 reply; 78+ messages in thread
From: Alan Cox @ 2008-05-14 21:00 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ingo Molnar, linux-kernel, Linus Torvalds, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Alexander Viro

> Most of the legacy users are unlikely to be latency problems,
> simply because only very few people (or nobody) still has that hardware
> and the code will never run.
> 
> Also I wouldn't lose sleep over e.g. let ISDN continue using BKL forever.

Most of the legacy users inflict that locking on other code - e.g. the ISDN
use of the BKL directly impacts on the tty layer work.


* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-14 21:00   ` Alan Cox
@ 2008-05-14 21:13     ` Andi Kleen
  2008-05-14 21:16       ` H. Peter Anvin
  2008-05-14 21:19       ` Alan Cox
  0 siblings, 2 replies; 78+ messages in thread
From: Andi Kleen @ 2008-05-14 21:13 UTC (permalink / raw)
  To: Alan Cox
  Cc: Ingo Molnar, linux-kernel, Linus Torvalds, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Alexander Viro

Alan Cox wrote:
>> Most of the legacy users are unlikely to be latency problems,
>> simply because only very few people (or nobody) still has that hardware
>> and the code will never run.
>>
>> Also I wouldn't lose sleep over e.g. let ISDN continue using BKL forever.
> 
> Most 

Most?

> of the legacy users inflict that locking on other code - e.g. the ISDN
> use of the BKL directly impacts on the tty layer work.

So you just stick unlock_kernel()/lock_kernel() around the call
to TTY (or similar to the entry points)

-Andi






* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-14 21:13     ` Andi Kleen
@ 2008-05-14 21:16       ` H. Peter Anvin
  2008-05-14 21:17         ` Alan Cox
  2008-05-14 21:19       ` Alan Cox
  1 sibling, 1 reply; 78+ messages in thread
From: H. Peter Anvin @ 2008-05-14 21:16 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Alan Cox, Ingo Molnar, linux-kernel, Linus Torvalds,
	Andrew Morton, Peter Zijlstra, Thomas Gleixner, Alexander Viro

Andi Kleen wrote:
> Alan Cox wrote:
>>> Most of the legacy users are unlikely to be latency problems,
>>> simply because only very few people (or nobody) still has that hardware
>>> and the code will never run.
>>>
>>> Also I wouldn't lose sleep over e.g. let ISDN continue using BKL forever.
>> Most 
> 
> Most?
> 
>> of the legacy users inflict that locking on other code - e.g. the ISDN
>> use of the BKL directly impacts on the tty layer work.
> 
> So you just stick unlock_kernel()/lock_kernel() around the call
> to TTY (or similar to the entry points)
> 

... assuming that the ISDN code doesn't assume lock continuity across 
the TTY call.

	-hpa


* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-14 21:16       ` H. Peter Anvin
@ 2008-05-14 21:17         ` Alan Cox
  0 siblings, 0 replies; 78+ messages in thread
From: Alan Cox @ 2008-05-14 21:17 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andi Kleen, Ingo Molnar, linux-kernel, Linus Torvalds,
	Andrew Morton, Peter Zijlstra, Thomas Gleixner, Alexander Viro

> > So you just stick unlock_kernel()/lock_kernel() around the call
> > to TTY (or similar to the entry points)
> > 
> 
> ... assuming that the ISDN code doesn't assume lock continuity across 
> the TTY call.

And procfs and between the tty and the net config code and ...

Keeping the BKL just in legacy places doesn't work. A counting mutex (i.e.
one you can take recursively) might be very useful for fixing some of
these, however: once we push the BKL down to the point of being a
driver-specific lock, we can just give the driver its own mutex.
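
A minimal userspace sketch of the bookkeeping behind such a counting lock,
assuming explicit task ids and try-lock semantics. The names
counting_lock_sketch, counting_trylock and counting_unlock are hypothetical,
and the model is single-threaded: a real lock would need atomic operations
and would block instead of returning -1.

```c
#include <assert.h>

#define NO_OWNER (-1)

/* Hypothetical counting lock: bookkeeping only, NOT thread-safe. */
struct counting_lock_sketch {
	int owner;	/* task id of the holder, or NO_OWNER */
	int depth;	/* how many times the holder has taken it */
};

void counting_lock_init(struct counting_lock_sketch *l)
{
	l->owner = NO_OWNER;
	l->depth = 0;
}

/* Returns 0 on success, -1 if a different task holds the lock. */
int counting_trylock(struct counting_lock_sketch *l, int task)
{
	if (l->owner != NO_OWNER && l->owner != task)
		return -1;
	l->owner = task;
	l->depth++;
	return 0;
}

/* Drops one nesting level; the lock is free once depth reaches zero. */
void counting_unlock(struct counting_lock_sketch *l, int task)
{
	assert(l->owner == task && l->depth > 0);
	if (--l->depth == 0)
		l->owner = NO_OWNER;
}
```

The same task can nest the lock, and only the outermost unlock releases it.
That is the property that would let a driver-private lock tolerate call
paths that re-enter the driver while it is already held.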


* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-14 21:13     ` Andi Kleen
  2008-05-14 21:16       ` H. Peter Anvin
@ 2008-05-14 21:19       ` Alan Cox
  2008-05-14 21:45         ` Linus Torvalds
  1 sibling, 1 reply; 78+ messages in thread
From: Alan Cox @ 2008-05-14 21:19 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ingo Molnar, linux-kernel, Linus Torvalds, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Alexander Viro

> Most?

Yes

> 
> > of the legacy users inflict that locking on other code - e.g. the ISDN
> > use of the BKL directly impacts on the tty layer work.
> 
> So you just stick unlock_kernel()/lock_kernel() around the call
> to TTY (or similar to the entry points)

It isn't that simple - I've spent a good deal of time working on it.
There are lots of paths that rely on interactions between modules. E.g. we
found stuff racing between the pid structs, tty internals and procfs that
happened to be saved by the BKL.

That in itself is a problem Ingo's stuff won't help with: We have lots of
"magic" accidental, undocumented and pot luck BKL locking semantics
between subsystems that are not even visible.


* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-14 21:45 ` Jonathan Corbet
@ 2008-05-14 21:39   ` Alan Cox
  2008-05-14 21:56   ` Linus Torvalds
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 78+ messages in thread
From: Alan Cox @ 2008-05-14 21:39 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner, Alexander Viro, linux-kernel

> I kind of like the combination of 2 and 3, done in such a way that
> there's no "every driver must change" flag day.  This could be an
> interesting project, even...  Thoughts?

It's beginning to sound like we should start 2.7 ;)

Alan



* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-14 17:49 [announce] "kill the Big Kernel Lock (BKL)" tree Ingo Molnar
  2008-05-14 18:30 ` Andi Kleen
  2008-05-14 18:41 ` Linus Torvalds
@ 2008-05-14 21:45 ` Jonathan Corbet
  2008-05-14 21:39   ` Alan Cox
                     ` (3 more replies)
  2008-05-14 21:46 ` Alan Cox
                   ` (2 subsequent siblings)
  5 siblings, 4 replies; 78+ messages in thread
From: Jonathan Corbet @ 2008-05-14 21:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Andrew Morton, Peter Zijlstra, Thomas Gleixner,
	Alan Cox, Alexander Viro, linux-kernel

Sez Ingo:

> This task is not easy at all. 12 years after Linux has been converted to 
> an SMP OS we still have 1300+ legacy BKL using sites. There are 400+ 
> lock_kernel() critical sections and 800+ ioctls.

There's also every char device open() method - a rather long list in its
own right.  I'd be surprised if one in ten of them really needs it, but
one has to look...

I've been looking at the chrdev code anyway, and pondering on how this
might be addressed.  Here are some thoughts on alternatives; I'd be
curious what people think:

1: We could add an unlocked_open() to the file_operations structure;
   drivers could be converted over as they are verified not to need the
   BKL on open.  Disadvantages are that it grows this structure for a
   relatively rare case - most open() calls already don't need the BKL.
   But it's a relatively easy path without flag days.

2: Create a char_dev_ops structure for char devs and use it instead of
   file_operations.  I vaguely remember seeing Al mutter about that a
   while back.  Quite a while back.  This mirrors what was done with
   block devices, and makes some sense - there's a lot of stuff in
   struct file_operations which is not really applicable to char devs.
   Then struct char_dev_ops could have open() and locked_open(), with
   the latter destined for removal sometime around 2015 or so.

   Advantages are that it's cleaner and separates out some things which
   perhaps shouldn't be mixed anyway.  Disadvantage is...well...a fair
   amount of code churn.  It would also require chrdev-specific wrappers
   to map straight file_operations calls in the VFS to the new
   callbacks.

3: Provide a new form of cdev_add() which lets the driver indicate
   that the BKL is not needed on open (or anything else?).  At a
   minimum, it could just be a new parameter on cdev_add which has a
   value of zero or FIXME_I_STILL_NEED_BKL.  Still some churn but easier
   to script and smaller because a lot of drivers are still using
   register_chrdev() - something else worth fixing.

   A more involved form might provide a new chardev_add() which takes
   the new char_dev_ops structure too.  Mapping between new and old
   operations vectors would be done internally to avoid breaking older
   drivers before they can be fixed.

4: Just find every char dev open() function and shove in lock_kernel()
   calls, then remove the call from chrdev_open().  The disadvantage
   here is that, beyond lots of work and churn, there's no way to know
   which ones you missed.
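
Option 3 can be modeled in userspace roughly as follows. This is only a
sketch under stated assumptions: cdev_add_sketch(), chrdev_open_sketch() and
struct cdev_sketch are hypothetical stand-ins, not the kernel's cdev API,
and the BKL is reduced to a counter so the effect of the flag is visible.
Only the flag name FIXME_I_STILL_NEED_BKL comes from the proposal itself.

```c
#include <assert.h>
#include <stddef.h>

#define FIXME_I_STILL_NEED_BKL 1	/* flag name taken from the proposal */

int bkl_depth_sketch;			/* stand-in for the BKL: counts holders */

void lock_kernel_sketch(void)   { bkl_depth_sketch++; }
void unlock_kernel_sketch(void) { bkl_depth_sketch--; }

/* Hypothetical, heavily reduced char device. */
struct cdev_sketch {
	int (*open)(struct cdev_sketch *cdev);
	int bkl_flags;			/* 0 or FIXME_I_STILL_NEED_BKL */
	int depth_seen_in_open;		/* what the driver observed, for the demo */
};

/* Hypothetical registration call that records the driver's choice. */
int cdev_add_sketch(struct cdev_sketch *cdev,
		    int (*open)(struct cdev_sketch *), int bkl_flags)
{
	if (!open)
		return -1;
	cdev->open = open;
	cdev->bkl_flags = bkl_flags;
	cdev->depth_seen_in_open = -1;
	return 0;
}

/* Core open path: takes the lock only for drivers that asked for it. */
int chrdev_open_sketch(struct cdev_sketch *cdev)
{
	int ret;

	assert(cdev->open != NULL);
	if (cdev->bkl_flags & FIXME_I_STILL_NEED_BKL) {
		lock_kernel_sketch();
		ret = cdev->open(cdev);
		unlock_kernel_sketch();
	} else {
		ret = cdev->open(cdev);
	}
	return ret;
}

/* Example driver open(): records whether it ran under the lock. */
int example_open(struct cdev_sketch *cdev)
{
	cdev->depth_seen_in_open = bkl_depth_sketch;
	return 0;
}
```

A driver registered with FIXME_I_STILL_NEED_BKL sees the lock held inside
its open(), while one registered with 0 does not. The flag is also easy to
grep for, which would make the remaining users simple to list.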

I kind of like the combination of 2 and 3, done in such a way that
there's no "every driver must change" flag day.  This could be an
interesting project, even...  Thoughts?

jon


* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-14 21:19       ` Alan Cox
@ 2008-05-14 21:45         ` Linus Torvalds
  2008-05-14 22:03           ` Andi Kleen
  2008-05-15  8:02           ` Ingo Molnar
  0 siblings, 2 replies; 78+ messages in thread
From: Linus Torvalds @ 2008-05-14 21:45 UTC (permalink / raw)
  To: Alan Cox
  Cc: Andi Kleen, Ingo Molnar, linux-kernel, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Alexander Viro



On Wed, 14 May 2008, Alan Cox wrote:
> 
> That in itself is a problem Ingo's stuff won't help with: We have lots of
> "magic" accidental, undocumented and pot luck BKL locking semantics
> between subsystems that are not even visible.

The good news is that I suspect they are going away. It probably is mainly 
tty and /proc by now, and /proc is pretty close to done.

It's hard to have too many inter-module dependencies when most of the core 
modules no longer even take the kernel lock any more.

In the VFS layer, we still have 

 - the ioctl thing, obviously. That's just mind-numbing "move things 
   down", not hard per se. But there's a *lot* of them (and I suspect the 
   huge majority of them don't actually need it, since they'd already be 
   racing against read/write anyway if they did).

 - default_llseek(). Probably the same, just a lot less of it.

 - superblock read/write.

and the latter one in particular is really dubious (we already have 
"[un]lock_super()" around it all, I think).

The core kernel, VM and networking already don't really do BKL. And it's 
seldom the case that subsystems interact with other unrelated subsystems 
outside of the core areas.

So it's a lot of work, no doubt, but I do think we should be able to do 
it. The most mind-numbing part is literally all the ioctl crud. There's 
more ioctl points than there are lock_kernel() calls left anywhere else.

		Linus


* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-14 17:49 [announce] "kill the Big Kernel Lock (BKL)" tree Ingo Molnar
                   ` (2 preceding siblings ...)
  2008-05-14 21:45 ` Jonathan Corbet
@ 2008-05-14 21:46 ` Alan Cox
  2008-05-14 22:11   ` Linus Torvalds
  2008-05-14 22:15   ` Andi Kleen
  2008-05-15 17:41 ` Linus Torvalds
  2008-05-17  0:14 ` Kevin Winchester
  5 siblings, 2 replies; 78+ messages in thread
From: Alan Cox @ 2008-05-14 21:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner, Alexander Viro

Out of amusement I took the watchdog drivers and started looking for
large cans of worms in the BKL drop arena.

Here is a fun one for general discussion - right now driver probe
functions request resources. We have no ordering on the requests so we
have deadlocks if two drivers do resource requests for conflicting
resources in reverse order.

"Discuss"

Alan


* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-14 21:45 ` Jonathan Corbet
  2008-05-14 21:39   ` Alan Cox
@ 2008-05-14 21:56   ` Linus Torvalds
  2008-05-14 22:07     ` Jonathan Corbet
  2008-05-16 15:44     ` [PATCH, RFC] char dev BKL pushdown Jonathan Corbet
  2008-05-14 22:11   ` [announce] "kill the Big Kernel Lock (BKL)" tree Andi Kleen
  2008-05-15  8:44   ` Jan Engelhardt
  3 siblings, 2 replies; 78+ messages in thread
From: Linus Torvalds @ 2008-05-14 21:56 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Ingo Molnar, Andrew Morton, Peter Zijlstra, Thomas Gleixner,
	Alan Cox, Alexander Viro, linux-kernel



On Wed, 14 May 2008, Jonathan Corbet wrote:
> 
> There's also every char device open() method - a rather long list in its
> own right.  I'd be surprised if one in ten of them really needs it, but
> one has to look...

I don't think there are *that* many. I found only 83 instances of 
"register_chrdev()" in the kernel, so the open methods should be pretty 
limited.

Of course, some open methods call other sub-registrations, but you'd start 
off by moving the lock_kernel() down just *one* stage. 

So it literally should be:
 - remove one lock_kernel/unlock_kernel pair in fs/char_dev.c
 - add max 83 pairs in the places that register those things
 - external modules will need to add it themselves some day.
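
The pushdown described in these steps can be sketched in userspace as
below. Everything here is a hypothetical model: the BKL is a plain counter,
and chrdev_open_before(), chrdev_open_after() and the legacy_* functions
are made-up names, not the code in fs/char_dev.c.

```c
#include <assert.h>

int bkl_held_model;	/* stand-in for the BKL: nonzero while held */

void lock_kernel_model(void)   { bkl_held_model++; }
void unlock_kernel_model(void) { assert(bkl_held_model > 0); bkl_held_model--; }

typedef int (*open_fn)(int *state);

/* Driver body that (we assume) still relies on running under the BKL. */
int legacy_do_open(int *state)
{
	assert(bkl_held_model > 0);
	*state = 1;
	return 0;
}

/* Before: the core wraps every driver's open() in the lock. */
int chrdev_open_before(open_fn open, int *state)
{
	int ret;

	lock_kernel_model();
	ret = open(state);
	unlock_kernel_model();
	return ret;
}

/* After: the core calls the driver directly ... */
int chrdev_open_after(open_fn open, int *state)
{
	return open(state);
}

/* ... and the lock/unlock pair has moved down into the driver's open(). */
int legacy_open_pushed_down(int *state)
{
	int ret;

	lock_kernel_model();
	ret = legacy_do_open(state);
	unlock_kernel_model();
	return ret;
}
```

Behaviour for the legacy driver is unchanged, but the lock is now visible
in the driver source, so it can be audited and dropped one driver at a time.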

> 1: We could add an unlocked_open() to the file_operations structure;
>    drivers could be converted over as they are verified not to need the
>    BKL on open.  Disadvantages are that it grows this structure for a
>    relatively rare case - most open() calls already don't need the BKL.
>    But it's a relatively easy path without flag days.

I really don't think it's worth the pain. See above. The numbers aren't 
that huge, and external modules simply aren't a pressing enough issue.

		Linus


* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-14 21:45         ` Linus Torvalds
@ 2008-05-14 22:03           ` Andi Kleen
  2008-05-15 13:34             ` Alan Cox
  2008-05-15  8:02           ` Ingo Molnar
  1 sibling, 1 reply; 78+ messages in thread
From: Andi Kleen @ 2008-05-14 22:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, Ingo Molnar, linux-kernel, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Alexander Viro

Linus Torvalds wrote:
> 
> On Wed, 14 May 2008, Alan Cox wrote:
>> That in itself is a problem Ingo's stuff won't help with: We have lots of
>> "magic" accidental, undocumented and pot luck BKL locking semantics
>> between subsystems that are not even visible.
> 
> The good news is that I suspect they are going away. It probably is mainly 
> tty and /proc by now, and /proc is pretty close to done.

Character devices in general.

And what's pretty nasty is that some interfaces force BKL still, so
not even new code can opt out.

> In the VFS layer, we still have 
> 
>  - the ioctl thing, obviously. That's just mind-numbing "move things 
>    down", not hard per se. But there's a *lot* of them (and I suspect the 
>    huge majority of them don't actually need it, since they'd already be 
>    racing against read/write anyway if they did).
> 
>  - default_llseek(). Probably the same, just a lot less of it.

I had some patches for those.

> 
>  - superblock read/write.
> 

- fasync

[had some patches for "fasync_locked", not sure if it's worth it]

- character device open

That's a nasty one. Either open_unlocked or a special cdev_init?

> and the latter one in particular is really dubious (we already have 
> "[un]lock_super()" around it all, I think).
> 
> The core kernel, VM and networking already don't really do BKL. And it's 
> seldom the case that subsystems interact with other unrelated subsystems 
> outside of the core areas.
> 
> So it's a lot of work, no doubt, but I do think we should be able to do 
> it. The most mind-numbing part is literally all the ioctl crud. There's 
> more ioctl points than there are lock_kernel() calls left anywhere else.

I tried to recruit kernel janitors some time ago to just do all the
ioctl -> unlocked_ioctl/explicit lock_kernel changes. A few patches
were generated, but the effort then died down.

BTW, for ioctl the proposed dynamic instrumentation method also won't
work, because it's basically impossible to exercise all these ioctls.

-Andi




* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-14 21:56   ` Linus Torvalds
@ 2008-05-14 22:07     ` Jonathan Corbet
  2008-05-14 22:14       ` Linus Torvalds
  2008-05-22 20:20       ` Alan Cox
  2008-05-16 15:44     ` [PATCH, RFC] char dev BKL pushdown Jonathan Corbet
  1 sibling, 2 replies; 78+ messages in thread
From: Jonathan Corbet @ 2008-05-14 22:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Andrew Morton, Peter Zijlstra, Thomas Gleixner,
	Alan Cox, Alexander Viro, linux-kernel

Linus Torvalds <torvalds@linux-foundation.org> wrote:

> I don't think there are *that* many. I found only 83 instances of 
> "register_chrdev()" in the kernel, so the open methods should be pretty 
> limited.

There's the drivers calling cdev_add() directly as well - another
40ish.  Still not a huge list, I guess.
 
> So it literally should be:
>  - remove one lock_kernel/unlock_kernel pair in fs/char_dev.c
>  - add max 83 pairs in the places that register those things
>  - external modules will need to add it themselves some day.

This is all certainly doable, but it leaves me with one concern: there
will be no signal to external module maintainers that the change needs
to be made.  So, beyond doubt, quite a few of them will just continue to
be shipped unfixed - and they will still run.  If any of them actually
*need* the BKL, something awful may happen to somebody someday.

jon


* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-14 21:46 ` Alan Cox
@ 2008-05-14 22:11   ` Linus Torvalds
  2008-05-14 22:15   ` Andi Kleen
  1 sibling, 0 replies; 78+ messages in thread
From: Linus Torvalds @ 2008-05-14 22:11 UTC (permalink / raw)
  To: Alan Cox
  Cc: Ingo Molnar, linux-kernel, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner, Alexander Viro



On Wed, 14 May 2008, Alan Cox wrote:
> 
> Here is a fun one for general discussion - right now driver probe
> functions request resources. We have no ordering on the requests so we
> have deadlocks if two drivers do resource requests for conflicting
> resources in reverse order.

resource requests aren't blocking, so it wouldn't actually be a deadlock. 
It would just be a "both failed, try again".
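
A small model of why non-blocking requests cannot deadlock, assuming a
probe that backs out on failure. The names region_sketch,
request_region_sketch(), release_region_sketch() and probe_sketch() are
hypothetical and are not the kernel's resource API. The model is also
single-threaded, so it ignores whatever serializes claims in real code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical resource: owned by at most one driver at a time. */
struct region_sketch {
	const char *owner;	/* NULL while the region is free */
};

/* Never waits: either claims the region or fails with -EBUSY. */
int request_region_sketch(struct region_sketch *r, const char *who)
{
	if (r->owner)
		return -EBUSY;
	r->owner = who;
	return 0;
}

void release_region_sketch(struct region_sketch *r)
{
	assert(r->owner != NULL);
	r->owner = NULL;
}

/* Probe that needs two regions and gives back the first if the second fails. */
int probe_sketch(struct region_sketch *first, struct region_sketch *second,
		 const char *who)
{
	int err = request_region_sketch(first, who);

	if (err)
		return err;
	err = request_region_sketch(second, who);
	if (err) {
		release_region_sketch(first);
		return err;
	}
	return 0;
}
```

If two drivers each grab one region and then ask for the other's, both
requests return -EBUSY at once. Neither caller is left waiting on the
other, so the worst case is two failed probes rather than a deadlock.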

That said, two drivers shouldn't be probing the same hardware at the same 
time regardless, so I can't imagine that it's much of a problem in real 
life. 

		Linus


* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-14 21:45 ` Jonathan Corbet
  2008-05-14 21:39   ` Alan Cox
  2008-05-14 21:56   ` Linus Torvalds
@ 2008-05-14 22:11   ` Andi Kleen
  2008-05-14 22:16     ` Linus Torvalds
  2008-05-15  8:44   ` Jan Engelhardt
  3 siblings, 1 reply; 78+ messages in thread
From: Andi Kleen @ 2008-05-14 22:11 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner, Alan Cox, Alexander Viro, linux-kernel

corbet@lwn.net (Jonathan Corbet) writes:
>
> 3: Provide a new form of cdev_add() which lets the driver indicate
>    that the BKL is not needed on open (or anything else?).  At a
>    minimum, it could just be a new parameter on cdev_add which has a
>    value of zero or FIXME_I_STILL_NEED_BKL.  Still some churn but easier
>    to script and smaller because a lot of drivers are still using
>    register_chrdev() - something else worth fixing.
>
>    A more involved form might provide a new chardev_add() which takes
>    the new char_dev_ops structure too.  Mapping between new and old
>    operations vectors would be done internally to avoid breaking older
>    drivers before they can be fixed.
>
> 4: Just find every char dev open() function and shove in lock_kernel()
>    calls, then remove the call from chrdev_open().  The disadvantage
>    here is that, beyond lots of work and churn, there's no way to know
>    which ones you missed.

In general when changing semantics drastically you should force
compile errors by renaming the respective entry point. That
has been the standard Linux method for this for years.

I've been also pondering a variant of 1, but 3 might be better.

> I kind of like the combination of 2 and 3, done in such a way that
> there's no "every driver must change" flag day.  This could be an
> interesting project, even...  Thoughts?

I doubt it will be very interesting, but it would be useful.
The goal is less to get rid of the BKL in old drivers than to stop
requiring the BKL in new drivers. Basically all BKL assumptions
in interfaces really should go.

-Andi


* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-14 22:07     ` Jonathan Corbet
@ 2008-05-14 22:14       ` Linus Torvalds
  2008-05-22 20:20       ` Alan Cox
  1 sibling, 0 replies; 78+ messages in thread
From: Linus Torvalds @ 2008-05-14 22:14 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Ingo Molnar, Andrew Morton, Peter Zijlstra, Thomas Gleixner,
	Alan Cox, Alexander Viro, linux-kernel



On Wed, 14 May 2008, Jonathan Corbet wrote:
> 
> This is all certainly doable, but it leaves me with one concern: there
> will be no signal to external module maintainers that the change needs
> to be made.  So, beyond doubt, quite a few of them will just continue to
> be shipped unfixed - and they will still run.  If any of them actually
> *need* the BKL, something awful may happen to somebody someday.

External modules have bugs because interfaces change. Film at 11.

It's true, but it definitely shouldn't keep us from just doing it. 
Especially since well-maintained external modules (ie the authors follow 
big discussions like this) can just take the kernel lock regardless of 
kernel version, since it won't even be broken with old kernels.

Of course, well-maintained kernel modules wouldn't depend on the BKL in 
the first place. Oh, well.

		Linus




* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-14 21:46 ` Alan Cox
  2008-05-14 22:11   ` Linus Torvalds
@ 2008-05-14 22:15   ` Andi Kleen
  1 sibling, 0 replies; 78+ messages in thread
From: Andi Kleen @ 2008-05-14 22:15 UTC (permalink / raw)
  To: Alan Cox
  Cc: Ingo Molnar, linux-kernel, Linus Torvalds, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Alexander Viro

Alan Cox <alan@lxorguk.ukuu.org.uk> writes:

> Out of amusement I took the watchdog drivers and started looking for
> large cans of worms in the BKL drop arena.
>
> Here is a fun one for general discussion - right now driver probe
> functions request resources. We have no ordering on the requests so we
> have deadlocks if two drivers do resource requests for conflicting
> resources in reverse order.

What deadlocks? Resource allocation normally doesn't block. So if there's
an ordering issue one of them will fail and should bail out.

That said if you have conflicting resources then failing is the correct
behavior anyways.

-Andi


* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-14 22:11   ` [announce] "kill the Big Kernel Lock (BKL)" tree Andi Kleen
@ 2008-05-14 22:16     ` Linus Torvalds
  2008-05-14 22:21       ` Andi Kleen
  0 siblings, 1 reply; 78+ messages in thread
From: Linus Torvalds @ 2008-05-14 22:16 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Jonathan Corbet, Ingo Molnar, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner, Alan Cox, Alexander Viro, linux-kernel



On Thu, 15 May 2008, Andi Kleen wrote:
>
> The goal is less to get rid of the BKL in old drivers than to stop
> requiring the BKL in new drivers. Basically all BKL assumptions
> in interfaces really should go.

No, we really do want to get rid of BKL in old drivers too. Or at least in 
the interfaces.

		Linus


* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-14 22:16     ` Linus Torvalds
@ 2008-05-14 22:21       ` Andi Kleen
  2008-05-15 13:30         ` Alan Cox
  2008-05-15 15:05         ` John Stoffel
  0 siblings, 2 replies; 78+ messages in thread
From: Andi Kleen @ 2008-05-14 22:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jonathan Corbet, Ingo Molnar, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner, Alan Cox, Alexander Viro, linux-kernel

Linus Torvalds wrote:
> 
> On Thu, 15 May 2008, Andi Kleen wrote:
>> The goal is less to get rid of the BKL in old drivers than to stop
>> requiring the BKL in new drivers. Basically all BKL assumptions
>> in interfaces really should go.
> 
> No, we really do want to get rid of BKL in old drivers too. Or at least in 
> the interfaces.

In the interfaces definitely yes and all subsystems should have their
own lock_kernel calls, but why in the old drivers? For those it's very
unlikely they are used on any SMP system anyways (e.g. anything
depending on CONFIG_ISA), or if they are, only on 2-CPU systems.

Of course if you can find someone to do the work it wouldn't be
bad, just wouldn't seem like a particularly useful investment of time to
me.

Also it would be bad if the people who did such conversions didn't
actually test it and that's a great danger with many old drivers because
nearly nobody has the hardware (and if they do it won't be in an SMP system).

-Andi



* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-14 21:45         ` Linus Torvalds
  2008-05-14 22:03           ` Andi Kleen
@ 2008-05-15  8:02           ` Ingo Molnar
  1 sibling, 0 replies; 78+ messages in thread
From: Ingo Molnar @ 2008-05-15  8:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, Andi Kleen, linux-kernel, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Alexander Viro


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Wed, 14 May 2008, Alan Cox wrote:
> > 
> > That in itself is a problem Ingo's stuff won't help with: We have 
> > lots of "magic" accidental, undocumented and pot luck BKL locking 
> > semantics between subsystems that are not even visible.
> 
> The good news is that I suspect they are going away. It probably is 
> mainly tty and /proc by now, and /proc is pretty close to done.
> 
> It's hard to have too many inter-module dependencies when most of the 
> core modules no longer even take the kernel lock any more.

yeah. And Alan has a good point: there _is_ lots of magic stuff 
happening below the BKL. One of them is the BKL <-> other lock 
dependencies. My stuff helps with mapping that part of the magic: it 
turns the BKL into an ordinary mutex and thus integrates it into 
lockdep's existing dependency validation machinery.
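
A minimal sketch of the kind of dependency validation meant here, written as
a userspace toy rather than as lockdep itself. It records "X was held while
Y was taken" pairs and flags a later acquisition in the inverse order. All
names (order_checker, checker_acquire, LOCK_A, ...) are invented for
illustration, and the model ignores recursion, lock classes and contexts.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model only: real lockdep tracks far more than a pair matrix. */
enum { LOCK_BKL, LOCK_A, LOCK_B, NR_LOCKS };

struct order_checker {
	bool before[NR_LOCKS][NR_LOCKS]; /* before[x][y]: x held while y taken */
	int held[NR_LOCKS];              /* stack of currently held locks */
	int nr_held;
};

static void checker_init(struct order_checker *c)
{
	memset(c, 0, sizeof(*c));
}

/* Returns false if taking 'lock' now inverts an order recorded earlier. */
static bool checker_acquire(struct order_checker *c, int lock)
{
	bool ok = true;
	int i;

	assert(c->nr_held < NR_LOCKS);   /* the toy does not model recursion */
	for (i = 0; i < c->nr_held; i++) {
		int h = c->held[i];

		if (c->before[lock][h])
			ok = false;          /* saw lock -> h before, now h -> lock */
		c->before[h][lock] = true;
	}
	c->held[c->nr_held++] = lock;
	return ok;
}

/* Pops the most recently acquired lock. */
static void checker_release(struct order_checker *c)
{
	assert(c->nr_held > 0);
	c->nr_held--;
}

/* Path 1 takes BKL then A; path 2 takes A then BKL and gets flagged. */
int order_demo(void)
{
	struct order_checker c;
	bool first, second;

	checker_init(&c);
	checker_acquire(&c, LOCK_BKL);
	first = checker_acquire(&c, LOCK_A);
	checker_release(&c);
	checker_release(&c);

	checker_acquire(&c, LOCK_A);
	second = checker_acquire(&c, LOCK_BKL);
	return first && !second;
}
```

A lock that silently drops itself on schedule() does not fit this kind of
bookkeeping, which is presumably why turning the BKL into an ordinary mutex
is what makes the validation possible.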

In other words: this stuff makes BKL validation _stronger_, not weaker, 
and hence it ultimately helps its mapping and elimination. It turns the 
"magic" into something more concrete.

It might not help with other magic directly - but it helps indirectly, 
because now the "magic" has shrunk, so there's more attention and more 
resources available to fix it in the places where the magic hurts. (And 
suggestions are welcome for more debug helpers to make more magic more 
visible.)

Whenever someone narrows the BKL's scope, that will always have to be 
done carefully - and that's true of any other lock. This patchset 
(except perhaps the boot bits) does not narrow the BKL's scope.

It will still be no doubt a tough job (reducing/changing locking of 
_any_ locked path is a tough job), but it will now fit into our existing 
practices much better and we'll get various reminders from lockdep and 
the other debug helpers when we forgot about some detail.

Before this there was almost zero feedback from the kernel when 
something around the BKL broke: pretty much the only reminder we had 
from incorrect BKL elimination was subtle breakage.

And my personal experience might matter as well: before this i never 
dared to touch BKL code. I once removed _all_ BKL locking from all the 
kernel _by accident_ [i typoed a single line in lib/smp_lock.c] and ran 
it on my main desktop for about a day and never noticed a thing - until 
a few weird TTY messages popped up in the syslog...

But with this scheme, i felt _much_ more secure about touching BKL code, 
and kicking the BKL from the scheduler was pure joy i have to say. (even 
though it will of course remain in the upstream scheduler until we are 
reasonably sure about the stability of this whole kill-the-BKL approach)

I'm sure other subsystem maintainers will have a similar experience.

	Ingo


* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-14 21:45 ` Jonathan Corbet
                     ` (2 preceding siblings ...)
  2008-05-14 22:11   ` [announce] "kill the Big Kernel Lock (BKL)" tree Andi Kleen
@ 2008-05-15  8:44   ` Jan Engelhardt
  2008-05-15 14:54     ` Diego Calleja
  3 siblings, 1 reply; 78+ messages in thread
From: Jan Engelhardt @ 2008-05-15  8:44 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner, Alan Cox, Alexander Viro, linux-kernel


On Wednesday 2008-05-14 23:45, Jonathan Corbet wrote:
>Sez Ingo:
>1: We could add an unlocked_open() to the file_operations structure;
>   drivers could be converted over as they are verified not to need the
>   BKL on open.  Disadvantages are that it grows this structure for a
>   relatively rare case - most open() calls already don't need the BKL.
>   But it's a relatively easy path without flag days.

1b: add a .locked_open and move all BKL-requiring code to use that.
When time comes and BKL is gone, .locked_open can be removed again,
and no rename was ever done for BKL-free code.
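
A rough sketch of how option 1b might dispatch, as a userspace model rather
than real VFS code. The BKL is reduced to a nesting counter, and fops_model,
inode_model, file_model and chrdev_open_model are all invented stand-ins;
only the .locked_open name comes from the proposal.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the BKL: just a nesting counter, not a real lock. */
static int bkl_depth;
static void lock_kernel(void)   { bkl_depth++; }
static void unlock_kernel(void) { bkl_depth--; }

struct inode_model { int minor; };
struct file_model  { int opened_under_bkl; };

/* Hypothetical ops table: .locked_open is the proposed, temporary hook. */
struct fops_model {
	int (*open)(struct inode_model *, struct file_model *);
	int (*locked_open)(struct inode_model *, struct file_model *);
};

/* Core dispatch: only drivers that still ask for the BKL pay for it. */
static int chrdev_open_model(const struct fops_model *fops,
			     struct inode_model *inode,
			     struct file_model *filp)
{
	int ret = 0;

	if (fops->locked_open) {
		lock_kernel();
		ret = fops->locked_open(inode, filp);
		unlock_kernel();
	} else if (fops->open) {
		ret = fops->open(inode, filp);
	}
	return ret;
}

/* Driver callback that records whether it ran under the (model) BKL. */
static int record_open(struct inode_model *inode, struct file_model *filp)
{
	(void)inode;
	filp->opened_under_bkl = bkl_depth > 0;
	return 0;
}

static const struct fops_model legacy_fops = { .locked_open = record_open };
static const struct fops_model clean_fops  = { .open = record_open };

int locked_open_demo(void)
{
	struct inode_model inode = { 0 };
	struct file_model a = { 0 }, b = { 0 };

	chrdev_open_model(&legacy_fops, &inode, &a);
	chrdev_open_model(&clean_fops, &inode, &b);
	return a.opened_under_bkl == 1 && b.opened_under_bkl == 0 &&
	       bkl_depth == 0;
}
```

In this shape, BKL-free drivers keep using .open untouched, and removing
.locked_open later only affects the drivers that still set it.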

>2: Create a char_dev_ops structure for char devs and use it instead of
>   file_operations.  I vaguely remember seeing Al mutter about that a
>   while back.  Quite a while back.  This mirrors what was done with
>   block devices, and makes some sense - there's a lot of stuff in
>   struct file_operations which is not really applicable to char devs.
>   Then struct char_dev_ops could have open() and locked_open(), with
>   the latter destined for removal sometime around 2015 or so.

If you create a new char_dev_ops, don't clutter it with the old stuff.
BKL-using code could continue using file_operations, couldn't it?

>3: Provide a new form of cdev_add() which lets the driver indicate
>   that the BKL is not needed on open (or anything else?).  At a

This is the BSD/Solaris tactic, heh :)



* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-14 22:21       ` Andi Kleen
@ 2008-05-15 13:30         ` Alan Cox
  2008-05-15 15:05         ` John Stoffel
  1 sibling, 0 replies; 78+ messages in thread
From: Alan Cox @ 2008-05-15 13:30 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Linus Torvalds, Jonathan Corbet, Ingo Molnar, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Alexander Viro, linux-kernel

> own lock_kernel calls, but why in the old drivers? For those it's very
> unlikely they are used on any SMP system anyways (e.g. anything
> depending on CONFIG_ISA) or if they do only on 2 CPU systems.

Because

- You need to verify the locking assumptions that remain are entirely
driver internal
- At the point you achieve that you've done *ALL* the work required to
add a driver specific lock



* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-14 22:03           ` Andi Kleen
@ 2008-05-15 13:34             ` Alan Cox
  2008-05-15 14:27               ` Andi Kleen
  0 siblings, 1 reply; 78+ messages in thread
From: Alan Cox @ 2008-05-15 13:34 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Alexander Viro

> >  - superblock read/write.
> > 
> - fasync
> 
> [had some patches for "fasync_locked", not sure if it's worth it]
> 
> - character device open

file structures - as well

> I tried to recruit kernel janitors some time ago to just do all the
> ioctl -> ioctl_unlocked/explicit lock_kernel changes. There were a few
> patches generated but the effort died down then.

Start at the other end - you can't fix the ioctls until you fix what the
ioctls interact with. That ends up at the basic data structure and once
you fix those the rest just starts to fall into place.


* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-15 13:34             ` Alan Cox
@ 2008-05-15 14:27               ` Andi Kleen
  2008-05-15 15:36                 ` Alan Cox
  0 siblings, 1 reply; 78+ messages in thread
From: Andi Kleen @ 2008-05-15 14:27 UTC (permalink / raw)
  To: Alan Cox
  Cc: Andi Kleen, Linus Torvalds, Ingo Molnar, linux-kernel,
	Andrew Morton, Peter Zijlstra, Thomas Gleixner, Alexander Viro

On Thu, May 15, 2008 at 02:34:30PM +0100, Alan Cox wrote:
> > >  - superblock read/write.
> > > 
> > - fasync
> > 
> > [had some patches for "fasync_locked", not sure if it's worth it]
> > 
> > - character device open
> 
> file structures - as well

How so?

> 
> > I tried to recruit kernel janitors some time ago to just do all the
> > ioctl -> ioctl_unlocked/explicit lock_kernel changes. There were a few
> > patches generated but the effort died down then.
> 
> Start at the other end - you can't fix the ioctls until you fix what the
> ioctls interact with. That ends up at the basic data structure and once
> you fix those the rest just starts to fall into place.

In my experience there's usually not too much interaction with other
kernel structures at the random driver ioctl level. I'm sure tty is
different, but it's probably not typical.

-Andi
> 


* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-15  8:44   ` Jan Engelhardt
@ 2008-05-15 14:54     ` Diego Calleja
  0 siblings, 0 replies; 78+ messages in thread
From: Diego Calleja @ 2008-05-15 14:54 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Jonathan Corbet, Ingo Molnar, Linus Torvalds, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Alan Cox, Alexander Viro,
	linux-kernel

El Thu, 15 May 2008 10:44:57 +0200 (CEST), Jan Engelhardt <jengelh@medozas.de> escribió:

> 1b: add a .locked_open and move all BKL-requiring code to use that.
> When time comes and BKL is gone, .locked_open can be removed again,
> and no rename was ever done for BKL-free code.

1c: make the BKL-unsafe drivers depend on !SMP && !PREEMPT.


* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-14 22:21       ` Andi Kleen
  2008-05-15 13:30         ` Alan Cox
@ 2008-05-15 15:05         ` John Stoffel
  2008-05-15 15:10           ` Andi Kleen
  1 sibling, 1 reply; 78+ messages in thread
From: John Stoffel @ 2008-05-15 15:05 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Linus Torvalds, Jonathan Corbet, Ingo Molnar, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Alan Cox, Alexander Viro,
	linux-kernel

>>>>> "Andi" == Andi Kleen <andi@firstfloor.org> writes:

Andi> Linus Torvalds wrote:
>> 
>> On Thu, 15 May 2008, Andi Kleen wrote:
>>> The goal less being to get rid of BKL in old drivers, but not 
>>> requiring BKL in new drivers. Basically all BKL assumptions
>>> in interfaces really should go.
>> 
>> No, we really do want to get rid of BKL in old drivers too. Or at least in 
>> the interfaces.

Andi> In the interfaces definitely yes and all subsystems should have
Andi> their own lock_kernel calls, but why in the old drivers? For
Andi> those it's very unlikely they are used on any SMP system anyways
Andi> (e.g. anything depending on CONFIG_ISA) or if they do only on 2
Andi> CPU systems.

I'm still running an SMP server with ISA slots.  I'd love to
contribute testing and possibly coding to this effort, but
realistically I'll only be able to compile and boot stuff.

John


* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-15 15:05         ` John Stoffel
@ 2008-05-15 15:10           ` Andi Kleen
  2008-05-15 15:18             ` John Stoffel
  0 siblings, 1 reply; 78+ messages in thread
From: Andi Kleen @ 2008-05-15 15:10 UTC (permalink / raw)
  To: John Stoffel
  Cc: Linus Torvalds, Jonathan Corbet, Ingo Molnar, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Alan Cox, Alexander Viro,
	linux-kernel

John Stoffel wrote:
>>>>>> "Andi" == Andi Kleen <andi@firstfloor.org> writes:
> 
> Andi> Linus Torvalds wrote:
>>> On Thu, 15 May 2008, Andi Kleen wrote:
>>>> The goal less being to get rid of BKL in old drivers, but not 
>>>> requiring BKL in new drivers. Basically all BKL assumptions
>>>> in interfaces really should go.
>>> No, we really do want to get rid of BKL in old drivers too. Or at least in 
>>> the interfaces.
> 
> Andi> In the interfaces definitely yes and all subsystems should have
> Andi> their own lock_kernel calls, but why in the old drivers? For
> Andi> those it's very unlikely they are used on any SMP system anyways
> Andi> (e.g. anything depending on CONFIG_ISA) or if they do only on 2
> Andi> CPU systems.
> 
> I'm still running an SMP server with ISA slots. 

I do too (although one CPU has died recently), but how many ISA devices
do you use in it? Mine used to have a ISA ISDN card, but that was it
and then no ISA anymore even though the slots are still in there.

Also on 2 CPU systems BKL is not that critical anyways. It only starts
to hurt on larger CPU counts.

-Andi



* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-15 15:10           ` Andi Kleen
@ 2008-05-15 15:18             ` John Stoffel
  2008-05-15 15:45               ` Andi Kleen
  0 siblings, 1 reply; 78+ messages in thread
From: John Stoffel @ 2008-05-15 15:18 UTC (permalink / raw)
  To: Andi Kleen
  Cc: John Stoffel, Linus Torvalds, Jonathan Corbet, Ingo Molnar,
	Andrew Morton, Peter Zijlstra, Thomas Gleixner, Alan Cox,
	Alexander Viro, linux-kernel

>>>>> "Andi" == Andi Kleen <andi@firstfloor.org> writes:

Andi> John Stoffel wrote:
>>>>>>> "Andi" == Andi Kleen <andi@firstfloor.org> writes:
>> 
Andi> Linus Torvalds wrote:
>>>> On Thu, 15 May 2008, Andi Kleen wrote:
>>>>> The goal less being to get rid of BKL in old drivers, but not 
>>>>> requiring BKL in new drivers. Basically all BKL assumptions
>>>>> in interfaces really should go.
>>>> No, we really do want to get rid of BKL in old drivers too. Or at least in 
>>>> the interfaces.
>> 
Andi> In the interfaces definitely yes and all subsystems should have
Andi> their own lock_kernel calls, but why in the old drivers? For
Andi> those it's very unlikely they are used on any SMP system anyways
Andi> (e.g. anything depending on CONFIG_ISA) or if they do only on 2
Andi> CPU systems.
>> 
>> I'm still running an SMP server with ISA slots. 

Andi> I do too (although one CPU has died recently), but how many ISA
Andi> devices do you use in it? Mine used to have a ISA ISDN card, but
Andi> that was it and then no ISA anymore even though the slots are
Andi> still in there.

I must admit I used to have an ISA Cyclades 8 port serial card running
in there, but now it's all PCI stuff.  It only had one ISA slot.  So
yes, ISA SMP boxes are slowly dying, but they'll be around for a long
time to come.

Andi> Also on 2 CPU systems BKL is not that critical anyways. It only
Andi> starts to hurt on larger CPU counts.

True, but with the growing number of multicore systems, esp on
desktops, it's going to be an issue.  How will the BKL work on quad
core boxes?

I'm going to shut up now, since I don't have the knowledge to really
contribute much to the discussion.  

John





* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-15 14:27               ` Andi Kleen
@ 2008-05-15 15:36                 ` Alan Cox
  2008-05-16 10:21                   ` Andi Kleen
  0 siblings, 1 reply; 78+ messages in thread
From: Alan Cox @ 2008-05-15 15:36 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andi Kleen, Linus Torvalds, Ingo Molnar, linux-kernel,
	Andrew Morton, Peter Zijlstra, Thomas Gleixner, Alexander Viro


> > file structures - as well
> 
> How so?

Take a look at how file->f_flags is locked.

Alan


* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-15 15:18             ` John Stoffel
@ 2008-05-15 15:45               ` Andi Kleen
  0 siblings, 0 replies; 78+ messages in thread
From: Andi Kleen @ 2008-05-15 15:45 UTC (permalink / raw)
  To: John Stoffel
  Cc: Andi Kleen, Linus Torvalds, Jonathan Corbet, Ingo Molnar,
	Andrew Morton, Peter Zijlstra, Thomas Gleixner, Alan Cox,
	Alexander Viro, linux-kernel

> I must admit I used to have an ISA Cyclades 8 port serial card running
> in there, but now it's all PCI stuff.  It only had one ISA slot.  So
> yes, ISA SMP boxes are slowly dying, but they'll be around for a long
> time to come.

My point was that for those few users it's actually better to keep
the BKL. If you try to remove it on some old driver you cannot test
due to lack of hardware the risk of breaking the driver is higher than the 
gain you would get from removing it.  The best you can do in this
legacy code is to keep it running with minimal changes in the old
tested state (although for some of it, it's doubtful that it actually still runs).

> 
> Andi> Also on 2 CPU systems BKL is not that critical anyways. It only
> Andi> starts to hurt on larger CPU counts.
> 
> True, but with the growing number of multicore systems, esp on
> desktops, it's going to be an issue.  

Multicore systems don't have ISA slots.

[yes I'm sure someone will tell me now about the ISA-over-USB device
that exists. Don't bother, it doesn't add anything to the point]

-Andi


* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-14 17:49 [announce] "kill the Big Kernel Lock (BKL)" tree Ingo Molnar
                   ` (3 preceding siblings ...)
  2008-05-14 21:46 ` Alan Cox
@ 2008-05-15 17:41 ` Linus Torvalds
  2008-05-15 20:27   ` Arjan van de Ven
  2008-05-17  0:14 ` Kevin Winchester
  5 siblings, 1 reply; 78+ messages in thread
From: Linus Torvalds @ 2008-05-15 17:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Andrew Morton, Peter Zijlstra, Thomas Gleixner,
	Alan Cox, Alexander Viro


So looking a bit more at your trivial fixups, I'd suggest strongly that 
they be re-organized a bit.

I cherry-picked your tty layer thing, because it was a real fix.

On Wed, 14 May 2008, Ingo Molnar wrote:
>
> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> index 6eab9bf..e12e571 100644
> --- a/net/sunrpc/sched.c
> +++ b/net/sunrpc/sched.c
> @@ -224,9 +224,15 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
>  
>  static int rpc_wait_bit_killable(void *word)
>  {
> +	int bkl = kernel_locked();
> +
>  	if (fatal_signal_pending(current))
>  		return -ERESTARTSYS;
> +	if (bkl)
> +		unlock_kernel();
>  	schedule();
> +	if (bkl)
> +		lock_kernel();
>  	return 0;
>  }

The above doesn't even work in general. It depends on having just a single 
level of locking, and is ugly to boot. So how about we just expose some 
version of

	depth = release_kernel_lock()
	..
	reacquire_kernel_lock(depth);

to existing BKL users as a way to safely release and re-acquire it 
regardless of depth. That makes the code more generic, but it *also* makes 
it more readable than that "if (bkl) [un]lock_kernel()" sequence.
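
To make the depth problem concrete, here is a small single-threaded model,
not kernel code. The BKL is reduced to a nesting counter, and the _model
suffixed helpers are hypothetical stand-ins for "some version of" the
release/reacquire pair suggested above. The first function mirrors the
quoted patch, the second the depth-preserving variant.

```c
#include <assert.h>

/* Toy, single-threaded model of a recursive BKL: only the nesting depth. */
static int lock_depth;   /* 0 means "not held" */

static void lock_kernel(void)   { lock_depth++; }
static void unlock_kernel(void) { assert(lock_depth > 0); lock_depth--; }
static int  kernel_locked(void) { return lock_depth > 0; }

/* Drop the lock completely and report how deep the nesting was. */
static int release_kernel_lock_model(void)
{
	int depth = lock_depth;

	lock_depth = 0;
	return depth;
}

/* Restore exactly the nesting depth that was saved earlier. */
static void reacquire_kernel_lock_model(int depth)
{
	assert(lock_depth == 0);
	lock_depth = depth;
}

/* Pattern from the quoted patch: undoes only one nesting level. */
static int sleep_with_if_bkl(void)
{
	int bkl = kernel_locked();
	int held_while_sleeping;

	if (bkl)
		unlock_kernel();
	held_while_sleeping = kernel_locked(); /* schedule() would run here */
	if (bkl)
		lock_kernel();
	return held_while_sleeping;
}

/* Suggested pattern: works for any depth, including zero. */
static int sleep_with_depth(void)
{
	int depth = release_kernel_lock_model();
	int held_while_sleeping = kernel_locked(); /* schedule() would run here */

	reacquire_kernel_lock_model(depth);
	return held_while_sleeping;
}

int depth_demo(void)
{
	int a, b;

	lock_kernel();
	lock_kernel();                 /* nested: depth is now 2 */
	a = sleep_with_if_bkl();       /* lock still held at depth 1 */
	b = sleep_with_depth();        /* lock fully dropped */
	return a == 1 && b == 0 && lock_depth == 2;
}
```

With two levels of nesting, the "if (bkl)" version still holds the lock
across the sleep, while the depth-based version drops it fully and restores
the original depth afterwards.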

> commit 89c25297465376321cf54438d86441a5947bbd11
> Author: Ingo Molnar <mingo@elte.hu>
> Date:   Wed May 14 15:10:37 2008 +0200
> 
>     remove the BKL: do not take the BKL in init code

I think this one should be changed - that comment says "do not take", but 
in fact you still do take it, you just release it earlier. So we should 
just not start out with the kernel locked in the first place. The BKL 
doesn't do anything for the init sequence anyway, since all of this is for 
code that runs before there even are any other threads (not counting the 
idle thread).

I don't see anything in there that could *possibly* depend on the kernel 
lock, and if there is anything, it would need to be fixed anyway.

> commit 5fff2843de609b77d4590e87de5c976b8ac1aacd
> Author: Ingo Molnar <mingo@elte.hu>
> Date:   Wed May 14 14:30:33 2008 +0200
> 
>     remove the BKL: procfs debug helper and BKL elimination

ACK. Code that relies on this is broken anyway. 

We used to have lots of "proc_create()" followed by setup code that could 
race with "proc_lookup()", but they were fundamentally racy anyway, so 
this should happen regardless of any other BKL removal.

> commit b07e615cf0f731d53a3ab431f44b1fe6ef4576e6
> Author: Ingo Molnar <mingo@elte.hu>
> Date:   Wed May 14 14:19:52 2008 +0200
> 
>     remove the BKL: request_module() debug helper

See the above comment about "release/reacquire_kernel_lock()".

> commit d31eec64e76a4b0795b5a6b57f2925d57aeefda5
> Author: Ingo Molnar <mingo@elte.hu>
> Date:   Wed May 14 13:47:58 2008 +0200
> 
>     remove the BKL: tty updates

Same thing.

> commit 3a0bf25bb160233b902962457ce917df27550850
> Author: Ingo Molnar <mingo@elte.hu>
> Date:   Wed May 14 11:34:13 2008 +0200
> 
>     remove the BKL: reduce misc_open() BKL dependency

Same thing.

> commit 79b2b296c31fa07e8868a6c622d766bb567f6655
> Author: Ingo Molnar <mingo@elte.hu>
> Date:   Wed May 14 11:30:35 2008 +0200
> 
>     remove the BKL: change get_fs_type() BKL dependency

Same thing.

> commit fc6f051a95c8774abb950f287b4b5e7f710f6977
> Author: Ingo Molnar <mingo@elte.hu>
> Date:   Wed May 14 09:51:42 2008 +0200
> 
>     revert ("BKL: revert back to the old spinlock implementation")

And obviously this one I'd never take. It would need to work with a 
working BKL implementation.

			Linus


* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-15 17:41 ` Linus Torvalds
@ 2008-05-15 20:27   ` Arjan van de Ven
  2008-05-15 20:45     ` Peter Zijlstra
  0 siblings, 1 reply; 78+ messages in thread
From: Arjan van de Ven @ 2008-05-15 20:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, linux-kernel, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner, Alan Cox, Alexander Viro

On Thu, 15 May 2008 10:41:54 -0700 (PDT)
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> 
> So looking a bit more at your trivial fixups, I'd suggest strongly
> that they be re-organized a bit.

> >
> > diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> > index 6eab9bf..e12e571 100644
> > --- a/net/sunrpc/sched.c
> > +++ b/net/sunrpc/sched.c
> > @@ -224,9 +224,15 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
> >  
> >  static int rpc_wait_bit_killable(void *word)
> >  {
> > +	int bkl = kernel_locked();
> > +
> >  	if (fatal_signal_pending(current))
> >  		return -ERESTARTSYS;
> > +	if (bkl)
> > +		unlock_kernel();
> >  	schedule();
> > +	if (bkl)
> > +		lock_kernel();
> >  	return 0;
> >  }
> 
> The above doesn't even work in general. It depends on having just a
> single level of locking, and is ugly to boot. So how about we just
> expose some version of
> 
> 	depth = release_kernel_lock()
> 	..
> 	reacquire_kernel_lock(depth);
> 
> to existing BKL users as a way to safely release and re-acquire it 
> regardless of depth. That makes the code more generic, but it *also*
> makes it more readable than that "if (bkl) [un]lock_kernel()"
> sequence.


can we make this even more specific/restricted? Like having something
like

call_bkl_unlocked(function_pointer, argument);

or something that will internally do the full unlock and then the
function call. The last thing we need is another nailgun that BKL-using
code can use to staple itself to something big and fast moving.
A more restricted interface makes that less likely.
Maybe we can even get away with only a

drop_bkl_and_schedule();

and nothing else.
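
A sketch of what such a restricted helper could look like, again as a
single-threaded userspace model with the BKL reduced to a nesting counter.
The int (*)(void *) callback signature is an assumption; the mail only says
"function_pointer, argument".

```c
#include <assert.h>

/* Toy recursive BKL: nesting depth only, no real locking. */
static int lock_depth;

static void lock_kernel(void) { lock_depth++; }

/*
 * Hypothetical helper: run fn(arg) with the BKL fully dropped, then
 * restore the caller's nesting depth. Callers never see the depth.
 */
static int call_bkl_unlocked(int (*fn)(void *), void *arg)
{
	int depth = lock_depth;
	int ret;

	lock_depth = 0;
	ret = fn(arg);
	lock_depth = depth;
	return ret;
}

/* Callback that records the lock depth it observed while running. */
static int observe_depth(void *arg)
{
	*(int *)arg = lock_depth;
	return 0;
}

int call_unlocked_demo(void)
{
	int seen = -1;

	lock_kernel();
	lock_kernel();                       /* nested: depth is now 2 */
	call_bkl_unlocked(observe_depth, &seen);
	return seen == 0 && lock_depth == 2;
}
```

The point of this shape is that the unlocked window is exactly one function
call long, so there is no way to forget the matching reacquire.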





* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-15 20:27   ` Arjan van de Ven
@ 2008-05-15 20:45     ` Peter Zijlstra
  2008-05-15 21:22       ` Arjan van de Ven
  0 siblings, 1 reply; 78+ messages in thread
From: Peter Zijlstra @ 2008-05-15 20:45 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Thomas Gleixner, Alan Cox, Alexander Viro

On Thu, 2008-05-15 at 13:27 -0700, Arjan van de Ven wrote:
> On Thu, 15 May 2008 10:41:54 -0700 (PDT)
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > 
> > So looking a bit more at your trivial fixups, I'd suggest strongly
> > that they be re-organized a bit.
> 
> > >
> > > diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> > > index 6eab9bf..e12e571 100644
> > > --- a/net/sunrpc/sched.c
> > > +++ b/net/sunrpc/sched.c
> > > @@ -224,9 +224,15 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
> > >  
> > >  static int rpc_wait_bit_killable(void *word)
> > >  {
> > > +	int bkl = kernel_locked();
> > > +
> > >  	if (fatal_signal_pending(current))
> > >  		return -ERESTARTSYS;
> > > +	if (bkl)
> > > +		unlock_kernel();
> > >  	schedule();
> > > +	if (bkl)
> > > +		lock_kernel();
> > >  	return 0;
> > >  }
> > 
> > The above doesn't even work in general. It depends on having just a
> > single level of locking, and is ugly to boot. So how about we just
> > expose some version of
> > 
> > 	depth = release_kernel_lock()
> > 	..
> > 	reacquire_kernel_lock(depth);
> > 
> > to existing BKL users as a way to safely release and re-acquire it 
> > regardless of depth. That makes the code more generic, but it *also*
> > makes it more readable than that "if (bkl) [un]lock_kernel()"
> > sequence.
> 
> 
> can we make this even more specific/restricted? Like having something
> like
> 
> call_bkl_unlocked(function_pointer, argument);
> 
> or something that will internally do the full unlock and then the
> function call. The last thing we need is another nailgun that BKL using
> code can use to staple themselves to something big and fast moving.
> By having a more restricted interface... less likely.
> Maybe we can even get away with only a
> 
> drop_bkl_and_schedule();
> 
> and nothing else.

No, that would defeat the whole purpose of the exercise. This drop on
schedule property makes it possible to have inverse lock order and not
deadlock.

That also makes the manual re-acquire on a different level pretty ugly
and deadlock prone.

The whole purpose of this patch series was to get rid of this exact
problem so that the BKL turns into something that resembles a normal
lock within the regular locking hierarchy.



* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-15 20:45     ` Peter Zijlstra
@ 2008-05-15 21:22       ` Arjan van de Ven
  0 siblings, 0 replies; 78+ messages in thread
From: Arjan van de Ven @ 2008-05-15 21:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Thomas Gleixner, Alan Cox, Alexander Viro

On Thu, 15 May 2008 22:45:55 +0200
Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> On Thu, 2008-05-15 at 13:27 -0700, Arjan van de Ven wrote:
> > On Thu, 15 May 2008 10:41:54 -0700 (PDT)
> > Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > 
> > > 
> > > So looking a bit more at your trivial fixups, I'd suggest strongly
> > > that they be re-organized a bit.
> > 
> > > >
> > > > diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> > > > index 6eab9bf..e12e571 100644
> > > > --- a/net/sunrpc/sched.c
> > > > +++ b/net/sunrpc/sched.c
> > > > @@ -224,9 +224,15 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
> > > >  
> > > >  static int rpc_wait_bit_killable(void *word)
> > > >  {
> > > > +	int bkl = kernel_locked();
> > > > +
> > > >  	if (fatal_signal_pending(current))
> > > >  		return -ERESTARTSYS;
> > > > +	if (bkl)
> > > > +		unlock_kernel();
> > > >  	schedule();
> > > > +	if (bkl)
> > > > +		lock_kernel();
> > > >  	return 0;
> > > >  }
> > > 
> > > The above doesn't even work in general. It depends on having just
> > > a single level of locking, and is ugly to boot. So how about we
> > > just expose some version of
> > > 
> > > 	depth = release_kernel_lock()
> > > 	..
> > > 	reacquire_kernel_lock(depth);
> > > 
> > > to existing BKL users as a way to safely release and re-acquire it 
> > > regardless of depth. That makes the code more generic, but it
> > > *also* makes it more readable than that "if (bkl)
> > > [un]lock_kernel()" sequence.
> > 
> > 
> > can we make this even more specific/restricted? Like having
> > something like
> > 
> > call_bkl_unlocked(function_pointer, argument);
> > 
> > or something that will internally do the full unlock and then the
> > function call. The last thing we need is another nailgun that BKL
> > using code can use to staple themselves to something big and fast
> > moving. By having a more restricted interface... less likely.
> > Maybe we can even get away with only a
> > 
> > drop_bkl_and_schedule();
> > 
> > and nothing else.
> 
> No, that would defeat the whole purpose of the exercise. This drop on
> schedule property makes it possible to have inverse lock order and not
> deadlock.

I would totally agree with you, except that all these patches
effectively do it manually again ANYWAY :(

so what I propose is to make it an explicit drop_bkl_and_schedule() call only,
and to use it only as a very, very last resort.

For 99% of the rest it does give exactly the regular benefits you
describe. And we can then prioritize these ugly cases to get de-bkl'd
first.


* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-15 15:36                 ` Alan Cox
@ 2008-05-16 10:21                   ` Andi Kleen
  0 siblings, 0 replies; 78+ messages in thread
From: Andi Kleen @ 2008-05-16 10:21 UTC (permalink / raw)
  To: Alan Cox
  Cc: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Alexander Viro

Alan Cox wrote:
>>> file structures - as well
>> How so?
> 
> Take a look at how file->f_flags is locked.

Ah I posted patches to fix that one a couple of weeks
ago, but unfortunately they fell out of -mm again.

It was part of the "unlocked fasync" patchkit.

-Andi



* [PATCH, RFC] char dev BKL pushdown
  2008-05-14 21:56   ` Linus Torvalds
  2008-05-14 22:07     ` Jonathan Corbet
@ 2008-05-16 15:44     ` Jonathan Corbet
  2008-05-16 15:49       ` Christoph Hellwig
                         ` (4 more replies)
  1 sibling, 5 replies; 78+ messages in thread
From: Jonathan Corbet @ 2008-05-16 15:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Andrew Morton, Peter Zijlstra, Thomas Gleixner,
	Alan Cox, Alexander Viro, linux-kernel

Linus said:

> So it literally should be:
>  - remove one lock_kernel/unlock_kernel pair in fs/char_dev.c
>  - add max 83 pairs in the places that register those things

...and so that's what I've done.  My approach was to find every
register_chrdev() and cdev_add() call, look at the associated
file_operations, then go back to the open() function, if any.  Unless it
was almost immediately obvious to me that the function was either (1) so
trivial as to not require locking (quite a few of them are "return 0;"), or
(2) clearly doing its own locking, I wrapped the code in the BKL.

Finally, I removed the BKL from chrdev_open().
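
The shape of the pushdown, sketched as a userspace model rather than taken
from the actual patches. The BKL is a nesting counter, and dev_model,
chrdev_open_before/after and the drv_* functions are invented names. The
point is only that the driver body runs under the lock in both cases.

```c
#include <assert.h>

/* Stand-in for the BKL: a nesting counter, not a real lock. */
static int bkl_depth;
static void lock_kernel(void)   { bkl_depth++; }
static void unlock_kernel(void) { bkl_depth--; }

struct dev_model { int users; int opened_under_bkl; };

/* Before: the core wraps every driver open() in the BKL. */
static int chrdev_open_before(int (*open)(struct dev_model *),
			      struct dev_model *d)
{
	int ret;

	lock_kernel();
	ret = open(d);
	unlock_kernel();
	return ret;
}

/* After: the core calls straight through. */
static int chrdev_open_after(int (*open)(struct dev_model *),
			     struct dev_model *d)
{
	return open(d);
}

/* Original driver body, which silently relied on the caller's BKL. */
static int drv_open_body(struct dev_model *d)
{
	d->opened_under_bkl = bkl_depth > 0;
	d->users++;
	return 0;
}

/* Pushed-down driver open(): same body, now wrapped explicitly. */
static int drv_open_pushed_down(struct dev_model *d)
{
	int ret;

	lock_kernel();
	ret = drv_open_body(d);
	unlock_kernel();
	return ret;
}

int pushdown_demo(void)
{
	struct dev_model before = { 0, 0 }, after = { 0, 0 };

	chrdev_open_before(drv_open_body, &before);
	chrdev_open_after(drv_open_pushed_down, &after);
	return before.opened_under_bkl && after.opened_under_bkl &&
	       bkl_depth == 0;
}
```

Once the lock is visible inside the driver, a maintainer can decide per
driver whether to drop it or replace it with a private lock.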

Allmodconfig and allyesconfig builds work here, and this kernel runs on my
dual-core desktop.  But, clearly, I don't have all of this hardware.
Actually, I wonder if some of it still exists outside of museums.  So
there's probably something stupid in there somewhere.

The result is available at:

	git://git.lwn.net/linux-2.6.git cdev

I'll put a shortlog and diffstat at the end of this message.  For
completeness, there's also a list of files I examined and did *not* change.

Assuming nobody tells me I'm completely off-base, I guess my next step is
to start running individual patches past maintainers.  Some of them,
probably (I hope), will tell me that I've been wasting my time and that
their code doesn't need the BKL.  In such cases, I'll gladly drop the
associated patch.  But there's a fair amount of stuff here which clearly
*does* need it still.  If all seems well, maybe this tree should get into
linux-next at some point too.

Comments?

The changes come out to this:

 arch/cris/arch-v10/drivers/gpio.c         |    3 ++
 arch/cris/arch-v10/drivers/sync_serial.c  |   34 ++++++++++++++++----------
 arch/cris/arch-v32/drivers/mach-a3/gpio.c |    4 +++
 arch/cris/arch-v32/drivers/mach-fs/gpio.c |    5 +++
 arch/cris/arch-v32/drivers/sync_serial.c  |   33 +++++++++++++++-----------
 arch/mips/kernel/rtlx.c                   |    7 ++++-
 arch/mips/kernel/vpe.c                    |   12 +++++++--
 arch/mips/sibyte/common/sb_tbprof.c       |   25 ++++++++++++++-----
 arch/sh/boards/landisk/gio.c              |   10 +++++--
 arch/x86/kernel/cpuid.c                   |   25 +++++++++++++------
 arch/x86/kernel/msr.c                     |   16 +++++++++---
 block/bsg.c                               |    7 ++++-
 drivers/block/aoe/aoechr.c                |    7 ++++-
 drivers/block/paride/pg.c                 |   22 ++++++++++++-----
 drivers/block/paride/pt.c                 |    8 +++++-
 drivers/char/drm/drm_fops.c               |    9 +++++--
 drivers/char/ipmi/ipmi_devintf.c          |    8 ++++--
 drivers/char/lp.c                         |   38 ++++++++++++++++++++----------
 drivers/char/mbcs.c                       |    5 +++
 drivers/char/mem.c                        |   10 ++++++-
 drivers/char/misc.c                       |    3 ++
 drivers/char/pcmcia/cm4000_cs.c           |   26 +++++++++++++++-----
 drivers/char/pcmcia/cm4040_cs.c           |   23 +++++++++++++-----
 drivers/char/snsc.c                       |    5 +++
 drivers/char/tty_io.c                     |   27 +++++++++++++++++++--
 drivers/char/viotape.c                    |    3 ++
 drivers/firewire/fw-cdev.c                |   16 +++++++++---
 drivers/hid/hidraw.c                      |    3 ++
 drivers/i2c/i2c-dev.c                     |   22 ++++++++++++-----
 drivers/ide/ide-tape.c                    |    7 ++++-
 drivers/ieee1394/dv1394.c                 |    6 +++-
 drivers/ieee1394/raw1394.c                |    3 ++
 drivers/ieee1394/video1394.c              |   18 ++++++++++----
 drivers/input/input.c                     |   16 +++++++++---
 drivers/isdn/i4l/isdn_common.c            |    3 +-
 drivers/media/dvb/dvb-core/dvbdev.c       |    4 +++
 drivers/mtd/mtdchar.c                     |   22 ++++++++++++-----
 drivers/mtd/ubi/cdev.c                    |    7 ++++-
 drivers/net/wan/cosa.c                    |   22 ++++++++++++-----
 drivers/pcmcia/pcmcia_ioctl.c             |   25 ++++++++++++++-----
 drivers/rtc/rtc-dev.c                     |   12 +++++++--
 drivers/s390/char/fs3270.c                |   23 ++++++++++++------
 drivers/s390/char/tape_char.c             |   12 +++++++--
 drivers/s390/char/vmlogrdr.c              |    8 +++++-
 drivers/s390/char/vmur.c                  |   12 +++++++--
 drivers/scsi/aacraid/linit.c              |    3 ++
 drivers/scsi/gdth.c                       |    3 ++
 drivers/scsi/osst.c                       |   15 +++++++++++
 drivers/scsi/sg.c                         |   16 ++++++++++--
 drivers/scsi/st.c                         |   11 +++++++-
 drivers/telephony/phonedev.c              |    3 ++
 drivers/uio/uio.c                         |   17 +++++++++----
 drivers/usb/core/file.c                   |    3 ++
 drivers/video/fbmem.c                     |   15 ++++++++---
 fs/char_dev.c                             |    8 ++----
 sound/core/sound.c                        |   15 +++++++++++
 sound/sound_core.c                        |    5 +++
 57 files changed, 554 insertions(+), 176 deletions(-)

The associated shortlog is:

Jonathan Corbet (43):
      bsg: cdev lock_kernel() pushdown
      cris: cdev lock_kernel() pushdown
      mips: cdev lock_kernel() pushdown
      sh: cdev lock_kernel() pushdown
      x86: cdev lock_kernel() pushdown
      i2c: cdev lock_kernel() pushdown
      cosa: cdev lock_kernel() pushdown
      pcmcia: cdev lock_kernel() pushdown
      ieee1394: cdev lock_kernel() pushdown
      rtc: cdev lock_kernel() pushdown
      drivers/s390: cdev lock_kernel() pushdown
      AoE: cdev lock_kernel() pushdown
      paride: cdev lock_kernel() pushdown
      mtdchar: cdev lock_kernel() pushdown
      UBI: cdev lock_kernel() pushdown
      firewire: cdev lock_kernel() pushdown
      HID: cdev lock_kernel() pushdown
      Input: cdev lock_kernel() pushdown
      UIO: cdev lock_kernel() pushdown
      cm40x0: cdev lock_kernel() pushdown
      ipmi: cdev lock_kernel() pushdown
      mem: cdev lock_kernel() pushdown
      misc: cdev lock_kernel() pushdown
      viotape: cdev lock_kernel pushdown
      mbcs: cdev lock_kernel() pushdown
      lp: cdev lock_kernel() pushdown
      drm: cdev lock_kernel() pushdown
      phonedev: cdev lock_kernel() pushdown
      ide-tape: cdev lock_kernel() pushdown
      sg: cdev lock_kernel() pushdown
      osst: cdev lock_kernel() pushdown.
      aacraid: cdev lock_kernel() pushdown
      st: cdev lock_kernel() pushdown
      gdth: cdev lock_kernel() pushdown
      isdn: cdev lock_kernel() pushdown
      usbcore: cdev lock_kernel() pushdown
      dvb: cdev lock_kernel() pushdown
      fbmem: cdev lock_kernel() pushdown
      sound: cdev lock_kernel() pushdown
      snsc: cdev lock_kernel() pushdown
      tty: cdev lock_kernel() pushdown
      Remove the lock_kernel() call from chrdev_open()
      Add a comment in chrdev_open()

Char device source files which I did *not* change:

	arch/cris/arch-v32/drivers/pcf8563.c (no open function)
	arch/cris/arch-v32/drivers/i2c.c (empty open function)
	arch/cris/arch-v32/drivers/cryptocop.c (almost-empty open function)
	arch/cris/arch-v10/drivers/pcf8563.c  (no open function)
	arch/cris/arch-v10/drivers/i2c.c (empty open function)
	arch/cris/arch-v10/drivers/ds1302.c (no open function)
	arch/cris/arch-v10/drivers/eeprom.c (trivial)
	drivers/net/ppp_generic.c (trivial)
	drivers/ieee1394/ieee1394_core.c (almost-trivial)
	drivers/spi/spidev.c  (locking looks right)
	drivers/char/cs5535_gpio.c (trivial)
	drivers/char/dtlk.c (trivial, broken open)
	drivers/char/ds1302.c (no open)
	drivers/char/pc8736x_gpio.c (trivial)
	drivers/char/stallion.c (no open)
	drivers/char/tb0219.c (trivial open)
	drivers/char/vc_screen.c (trivial open)
	drivers/char/ppdev.c (trivial open)
	drivers/char/ip2/ip2main.c (trivial open - does not do anything!)
	drivers/char/scx200_gpio.c (trivial open)
	drivers/char/xilinx_hwicap/xilinx_hwicap.c (locking looks right)
	drivers/char/istallion.c (no open)
	drivers/char/vr41xx_giu.c (trivial)
	drivers/char/tlclk.c (locking looks right)
	drivers/char/raw.c (obvious locking w/raw_mutex)
	drivers/char/dsp56k.c (single-use locking)
	drivers/infiniband/hw/ipath/ipath_file_ops.c (trivial open)
	drivers/infiniband/core/user_mad.c (appears to have good locking)
	drivers/infiniband/core/uverbs_main.c (ditto)
	drivers/infiniband/core/ucm.c (trivial open)
	drivers/misc/phantom.c (has locking)
	drivers/sbus/char/bpp.c (has locking)
	drivers/sbus/char/vfc_dev.c (has locking)
	drivers/scsi/dpt_i2o.c (has locking)
	drivers/scsi/3w-xxxx.c (trivial open)
	drivers/scsi/3w-9xxx.c (trivial open)
	drivers/scsi/megaraid/megaraid_sas.c (trivial open)
	drivers/scsi/ch.c (has locking)
	drivers/scsi/megaraid.c (trivial open)
	drivers/isdn/capi/capi.c (semi-trivial open)
	drivers/isdn/hardware/eicon/divasi.c (empty open)
	drivers/isdn/hardware/eicon/divamnt.c (single-use open)
	drivers/isdn/hardware/eicon/divasmain.c (empty open)
	drivers/macintosh/adb.c (trivial open)
	drivers/usb/gadget/printer.c (has locking)
	drivers/usb/mon/mon_bin.c (has locking)
	drivers/usb/core/endpoint.c (no opens)
	drivers/usb/core/devio.c (has locking)
	drivers/media/video/videodev.c (has locking)
	fs/coda/psdev.c (already has lock_kernel() calls)

jon

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH, RFC] char dev BKL pushdown
  2008-05-16 15:44     ` [PATCH, RFC] char dev BKL pushdown Jonathan Corbet
@ 2008-05-16 15:49       ` Christoph Hellwig
  2008-05-16 16:03         ` [PATCH] kill empty chardev open/release methods Christoph Hellwig
  2008-05-16 16:22       ` [PATCH, RFC] char dev BKL pushdown Alan Cox
                         ` (3 subsequent siblings)
  4 siblings, 1 reply; 78+ messages in thread
From: Christoph Hellwig @ 2008-05-16 15:49 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner, Alan Cox, Alexander Viro, linux-kernel

On Fri, May 16, 2008 at 09:44:06AM -0600, Jonathan Corbet wrote:
> trivial as to not require locking (quite few of them are "return 0;"), or

If they literally are 'return 0' you can just remove them, as a
non-existent open op will be just fine.

> (2) clearly doing its own locking, I wrapped the code in the BKL.

Even if it clearly does its own locking please add the BKL for now and let
the maintainers sort it out later; better safe than sorry.

Except for that thanks a lot, this is the kind of work that's more
productive than all these discussions here :)

For some reason about 80 instances seem awfully few, but we've moved a
lot of devices into subsystems from being plain chardevs so this
might actually be correct.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH] kill empty chardev open/release methods
  2008-05-16 15:49       ` Christoph Hellwig
@ 2008-05-16 16:03         ` Christoph Hellwig
  2008-05-16 16:24           ` Alan Cox
  2008-05-16 20:55           ` Alan Cox
  0 siblings, 2 replies; 78+ messages in thread
From: Christoph Hellwig @ 2008-05-16 16:03 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner, Alan Cox, Alexander Viro, linux-kernel

On Fri, May 16, 2008 at 11:49:22AM -0400, Christoph Hellwig wrote:
> On Fri, May 16, 2008 at 09:44:06AM -0600, Jonathan Corbet wrote:
> > trivial as to not require locking (quite few of them are "return 0;"), or
> 
> If they literally are 'return 0' you can just remove them, as a
> non-existent open op will be just fine.

And here's a patch to do just that:  remove all empty chardev
open/release methods.  Based on the list compiled by Jonathan.

(and yeah, ip2_ipl_open is not technically empty at the source level,
 but only at the binary level :))


Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: linux-2.6/arch/cris/arch-v10/drivers/i2c.c
===================================================================
--- linux-2.6.orig/arch/cris/arch-v10/drivers/i2c.c	2008-05-16 17:58:15.000000000 +0200
+++ linux-2.6/arch/cris/arch-v10/drivers/i2c.c	2008-05-16 17:58:19.000000000 +0200
@@ -563,18 +563,6 @@ i2c_readreg(unsigned char theSlave, unsi
 	return b;
 }
 
-static int
-i2c_open(struct inode *inode, struct file *filp)
-{
-	return 0;
-}
-
-static int
-i2c_release(struct inode *inode, struct file *filp)
-{
-	return 0;
-}
-
 /* Main device API. ioctl's to write or read to/from i2c registers.
  */
 
@@ -619,8 +607,6 @@ i2c_ioctl(struct inode *inode, struct fi
 static const struct file_operations i2c_fops = {
 	.owner    = THIS_MODULE,
 	.ioctl    = i2c_ioctl,
-	.open     = i2c_open,
-	.release  = i2c_release,
 };
 
 int __init
Index: linux-2.6/arch/cris/arch-v32/drivers/i2c.c
===================================================================
--- linux-2.6.orig/arch/cris/arch-v32/drivers/i2c.c	2008-05-16 17:57:56.000000000 +0200
+++ linux-2.6/arch/cris/arch-v32/drivers/i2c.c	2008-05-16 17:58:07.000000000 +0200
@@ -633,18 +633,6 @@ i2c_readreg(unsigned char theSlave, unsi
 	return b;
 }
 
-static int
-i2c_open(struct inode *inode, struct file *filp)
-{
-	return 0;
-}
-
-static int
-i2c_release(struct inode *inode, struct file *filp)
-{
-	return 0;
-}
-
 /* Main device API. ioctl's to write or read to/from i2c registers.
  */
 
@@ -689,8 +677,6 @@ i2c_ioctl(struct inode *inode, struct fi
 static const struct file_operations i2c_fops = {
 	.owner =    THIS_MODULE,
 	.ioctl =    i2c_ioctl,
-	.open =     i2c_open,
-	.release =  i2c_release,
 };
 
 static int __init i2c_init(void)
Index: linux-2.6/drivers/char/ip2/ip2main.c
===================================================================
--- linux-2.6.orig/drivers/char/ip2/ip2main.c	2008-05-16 17:59:17.000000000 +0200
+++ linux-2.6/drivers/char/ip2/ip2main.c	2008-05-16 17:59:37.000000000 +0200
@@ -203,7 +203,6 @@ static int set_serial_info(i2ChanStrPtr,
 static ssize_t ip2_ipl_read(struct file *, char __user *, size_t, loff_t *);
 static ssize_t ip2_ipl_write(struct file *, const char __user *, size_t, loff_t *);
 static int ip2_ipl_ioctl(struct inode *, struct file *, UINT, ULONG);
-static int ip2_ipl_open(struct inode *, struct file *);
 
 static int DumpTraceBuffer(char __user *, int);
 static int DumpFifoBuffer( char __user *, int);
@@ -236,7 +235,6 @@ static const struct file_operations ip2_
 	.read		= ip2_ipl_read,
 	.write		= ip2_ipl_write,
 	.ioctl		= ip2_ipl_ioctl,
-	.open		= ip2_ipl_open,
 }; 
 
 static unsigned long irq_counter = 0;
@@ -2918,58 +2916,6 @@ ip2_ipl_ioctl ( struct inode *pInode, st
 	return rc;
 }
 
-/******************************************************************************/
-/* Function:   ip2_ipl_open()                                                 */
-/* Parameters: Pointer to device inode                                        */
-/*             Pointer to file structure                                      */
-/* Returns:    Success or failure                                             */
-/*                                                                            */
-/* Description:                                                               */
-/*                                                                            */
-/*                                                                            */
-/******************************************************************************/
-static int
-ip2_ipl_open( struct inode *pInode, struct file *pFile )
-{
-	unsigned int iplminor = iminor(pInode);
-	i2eBordStrPtr pB;
-	i2ChanStrPtr  pCh;
-
-#ifdef IP2DEBUG_IPL
-	printk (KERN_DEBUG "IP2IPL: open\n" );
-#endif
-
-	switch(iplminor) {
-	// These are the IPL devices
-	case 0:
-	case 4:
-	case 8:
-	case 12:
-		break;
-
-	// These are the status devices
-	case 1:
-	case 5:
-	case 9:
-	case 13:
-		break;
-
-	// These are the debug devices
-	case 2:
-	case 6:
-	case 10:
-	case 14:
-		pB = i2BoardPtrTable[iplminor / 4];
-		pCh = (i2ChanStrPtr) pB->i2eChannelPtr;
-		break;
-
-	// This is the trace device
-	case 3:
-		break;
-	}
-	return 0;
-}
-
 static int
 proc_ip2mem_show(struct seq_file *m, void *v)
 {
Index: linux-2.6/drivers/isdn/hardware/eicon/divasi.c
===================================================================
--- linux-2.6.orig/drivers/isdn/hardware/eicon/divasi.c	2008-05-16 17:59:59.000000000 +0200
+++ linux-2.6/drivers/isdn/hardware/eicon/divasi.c	2008-05-16 18:00:13.000000000 +0200
@@ -74,7 +74,6 @@ static ssize_t um_idi_read(struct file *
 static ssize_t um_idi_write(struct file *file, const char __user *buf,
 			    size_t count, loff_t * offset);
 static unsigned int um_idi_poll(struct file *file, poll_table * wait);
-static int um_idi_open(struct inode *inode, struct file *file);
 static int um_idi_release(struct inode *inode, struct file *file);
 static int remove_entity(void *entity);
 static void diva_um_timer_function(unsigned long data);
@@ -136,7 +135,6 @@ static const struct file_operations diva
 	.read    = um_idi_read,
 	.write   = um_idi_write,
 	.poll    = um_idi_poll,
-	.open    = um_idi_open,
 	.release = um_idi_release
 };
 
@@ -398,12 +396,6 @@ static unsigned int um_idi_poll(struct f
 	return (POLLIN | POLLRDNORM);
 }
 
-static int um_idi_open(struct inode *inode, struct file *file)
-{
-	return (0);
-}
-
-
 static int um_idi_release(struct inode *inode, struct file *file)
 {
 	diva_um_idi_os_context_t *p_os;
Index: linux-2.6/drivers/isdn/hardware/eicon/divasmain.c
===================================================================
--- linux-2.6.orig/drivers/isdn/hardware/eicon/divasmain.c	2008-05-16 18:00:26.000000000 +0200
+++ linux-2.6/drivers/isdn/hardware/eicon/divasmain.c	2008-05-16 18:00:36.000000000 +0200
@@ -578,10 +578,6 @@ xdi_copy_from_user(void *os_handle, void
 /*
  * device node operations
  */
-static int divas_open(struct inode *inode, struct file *file)
-{
-	return (0);
-}
 
 static int divas_release(struct inode *inode, struct file *file)
 {
@@ -667,7 +663,6 @@ static const struct file_operations diva
 	.read    = divas_read,
 	.write   = divas_write,
 	.poll    = divas_poll,
-	.open    = divas_open,
 	.release = divas_release
 };
 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH, RFC] char dev BKL pushdown
  2008-05-16 15:44     ` [PATCH, RFC] char dev BKL pushdown Jonathan Corbet
  2008-05-16 15:49       ` Christoph Hellwig
@ 2008-05-16 16:22       ` Alan Cox
  2008-05-16 16:30       ` Linus Torvalds
                         ` (2 subsequent siblings)
  4 siblings, 0 replies; 78+ messages in thread
From: Alan Cox @ 2008-05-16 16:22 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner, Alexander Viro, linux-kernel

> probably (I hope), will tell me that I've been wasting my time and that
> their code doesn't need the BKL.  In such cases, I'll gladly drop the
> associated patch.  But there's a fair amount of stuff here which clearly

You have to be careful before assuming but yes - seems sensible.

I'm currently munching my way through the watchdog drivers fixing them up
for unlocked_ioctl/BKL drops and finding various things needing fixing
anyway.

> *does* need it still.  If all seems well, maybe this tree should get into
> linux-next at some point too.

Definitely

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH] kill empty chardev open/release methods
  2008-05-16 16:03         ` [PATCH] kill empty chardev open/release methods Christoph Hellwig
@ 2008-05-16 16:24           ` Alan Cox
  2008-05-16 20:55           ` Alan Cox
  1 sibling, 0 replies; 78+ messages in thread
From: Alan Cox @ 2008-05-16 16:24 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jonathan Corbet, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Alexander Viro, linux-kernel

> Index: linux-2.6/drivers/char/ip2/ip2main.c

Looks fine but please send it via my tty tree so it doesn't collide with
the tty work.


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH, RFC] char dev BKL pushdown
  2008-05-16 15:44     ` [PATCH, RFC] char dev BKL pushdown Jonathan Corbet
  2008-05-16 15:49       ` Christoph Hellwig
  2008-05-16 16:22       ` [PATCH, RFC] char dev BKL pushdown Alan Cox
@ 2008-05-16 16:30       ` Linus Torvalds
  2008-05-16 16:43         ` Jonathan Corbet
  2008-05-17 21:15       ` Arnd Bergmann
  2008-05-17 21:58       ` Linus Torvalds
  4 siblings, 1 reply; 78+ messages in thread
From: Linus Torvalds @ 2008-05-16 16:30 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Ingo Molnar, Andrew Morton, Peter Zijlstra, Thomas Gleixner,
	Alan Cox, Alexander Viro, linux-kernel



On Fri, 16 May 2008, Jonathan Corbet wrote:
> 
> I'll put a shortlog and diffstat at the end of this message.  For
> completeness, there's also a list of files I examined and did *not* change.

May I suggest just adding a comment in those files, just saying something 
like

	/* This does not need the BKL, because .. */

where even the "because" part could be dropped when it's really obvious.

That way that "list of files I examined and did *not* change" would be 
obvious in the patch itself, and we also have some documentation that 
somebody actually looked at the path.

> Assuming nobody tells me I'm completely off-base, I guess my next step is
> to start running individual patches past maintainers.  Some of them,
> probably (I hope), will tell me that I've been wasting my time and that
> their code doesn't need the BKL.  In such cases, I'll gladly drop the
> associated patch.

Same deal - just document the fact that the BKL isn't needed.

Yeah, in the long run that kind of documentation is worthless and we may 
want to get rid of it again in a year or two, but in the short run it's a 
good idea. If only to help people who want to review your patches.

Btw, do you have gitweb running anywhere? 

		Linus

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH, RFC] char dev BKL pushdown
  2008-05-16 16:30       ` Linus Torvalds
@ 2008-05-16 16:43         ` Jonathan Corbet
  0 siblings, 0 replies; 78+ messages in thread
From: Jonathan Corbet @ 2008-05-16 16:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Andrew Morton, Peter Zijlstra, Thomas Gleixner,
	Alan Cox, Alexander Viro, linux-kernel

Linus Torvalds <torvalds@linux-foundation.org> wrote:

> May I suggest just adding a comment in those files, just saying something 
> like
> 
> 	/* This does not need the BKL, because .. */

OK, I'll make another pass shortly and fill that in.

> Btw, do you have gitweb running anywhere? 

No, I guess I need to figure out how to set it up.  Either that or get
one of those kernel.org accounts and put things there.

Thanks,

jon

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH] kill empty chardev open/release methods
  2008-05-16 16:03         ` [PATCH] kill empty chardev open/release methods Christoph Hellwig
  2008-05-16 16:24           ` Alan Cox
@ 2008-05-16 20:55           ` Alan Cox
  2008-05-18 19:46             ` Jonathan Corbet
  1 sibling, 1 reply; 78+ messages in thread
From: Alan Cox @ 2008-05-16 20:55 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jonathan Corbet, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Alexander Viro, linux-kernel

> > If they literally are 'return 0' you can just remove them, as a
> > non-existent open op will be just fine.
> 
> And here's a patch to do just that:  remove all empty chardev
> open/release methods.  Based on the list compiled by Jonathan.

Actually it turns out you can introduce bugs doing this when the BKL is
pushed down.

The problem is that the methods are not NULL; with the lock pushed down
they are

{
	lock_kernel();
	unlock_kernel();
}

And we have drivers with setup code that does things in the wrong order
but under the BKL. eg one I just fixed did

	misc_register()
	init locks
	allocate memory
	do stuff
	return 0;

The lock/unlock in the open happens to save your butt against the wrong
order of initialisation because the open cannot occur before the lock is
taken, and thanks to the BKL it cannot make any progress until the setup
is completed. Fun too - udev loves opening things as they appear so in
some cases we might actually trigger them too.

So when you remove the _open() empty methods *please* make sure you have
verified the correctness and ordering of the entire registration path.
I've found three examples of this so far just cleaning up
drivers/watchdog.
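
To make the ordering problem concrete, here is a minimal userspace sketch,
not kernel code. Everything in it (toy_dev, toy_open, the two probe
functions) is a hypothetical stand-in, and the "opener" is simply called
right after registration to imitate udev opening a node as it appears. It
assumes the lock/unlock pair in open() is already gone, so only the order of
the setup steps protects the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only: none of these names are kernel APIs. */
struct toy_dev {
	int registered;		/* device is visible to openers */
	int *buffer;		/* state that open() relies on */
};

static int storage[4];

/* Stand-in for a driver open() that touches driver state. */
static int toy_open(struct toy_dev *dev)
{
	assert(dev != NULL);
	if (!dev->registered)
		return -1;	/* no such device yet */
	if (dev->buffer == NULL)
		return -2;	/* raced with setup: state not ready */
	dev->buffer[0]++;
	return 0;
}

/*
 * Wrong order: register first, set up afterwards.  Returns what an
 * opener arriving immediately after registration would see.
 */
static int probe_register_first(struct toy_dev *dev)
{
	int seen;

	dev->registered = 1;
	seen = toy_open(dev);	/* opener gets in before setup */
	dev->buffer = storage;
	return seen;
}

/* Right order: finish setup, then make the device visible. */
static int probe_register_last(struct toy_dev *dev)
{
	dev->buffer = storage;
	dev->registered = 1;
	return toy_open(dev);	/* opener sees complete state */
}
```

In this model the register-first variant hands the early opener a
half-initialised device, while the register-last variant does not. That is
the property to check in a real registration path before dropping an empty
open method.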

Alan

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-14 17:49 [announce] "kill the Big Kernel Lock (BKL)" tree Ingo Molnar
                   ` (4 preceding siblings ...)
  2008-05-15 17:41 ` Linus Torvalds
@ 2008-05-17  0:14 ` Kevin Winchester
  2008-05-17  0:37   ` Kevin Winchester
  5 siblings, 1 reply; 78+ messages in thread
From: Kevin Winchester @ 2008-05-17  0:14 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner, Alan Cox, Alexander Viro

Ingo Molnar wrote:

<snip description and patches>

I decided to give this tree a try, and I got:


[4294034.386085] ------------[ cut here ]------------
[4294034.387882] WARNING: at fs/proc/generic.c:669 
create_proc_entry+0x3d/0xc5()
[4294034.390059] Pid: 2565, comm: Xorg Not tainted 
2.6.26-rc2-00456-gd9df34e #35
[4294034.392682]
[4294034.392683] Call Trace:
[4294034.394071]  [<ffffffff8022a8ac>] warn_on_slowpath+0x53/0x81
[4294034.394077]  [<ffffffff802b57b4>] ? proc_register+0xf7/0x162
[4294034.394081]  [<ffffffff802b580b>] ? proc_register+0x14e/0x162
[4294034.394087]  [<ffffffff804afa05>] ? _spin_unlock+0x30/0x4b
[4294034.394091]  [<ffffffff802b580b>] ? proc_register+0x14e/0x162
[4294034.394095]  [<ffffffff8021a127>] ? startup_ioapic_irq+0x54/0x5f
[4294034.394099]  [<ffffffff802b5fcc>] create_proc_entry+0x3d/0xc5
[4294034.394103]  [<ffffffff80252008>] register_irq_proc+0x84/0xa0
[4294034.394108]  [<ffffffff80250b1a>] setup_irq+0x1b2/0x21b
[4294034.394113]  [<ffffffff80250cc8>] request_irq+0xf1/0x117
[4294034.394117]  [<ffffffff8038aaaa>] ? radeon_driver_irq_handler+0x0/0x7e
[4294034.394122]  [<ffffffff803789b6>] ? drm_control+0x0/0x186
[4294034.394126]  [<ffffffff80378acf>] drm_control+0x119/0x186
[4294034.394130]  [<ffffffff803770be>] drm_ioctl+0x1d3/0x265
[4294034.394589]  [<ffffffff80283e5e>] vfs_ioctl+0x5e/0x77
[4294034.394593]  [<ffffffff802840d2>] do_vfs_ioctl+0x25b/0x270
[4294034.394598]  [<ffffffff804af23e>] ? trace_hardirqs_on_thunk+0x35/0x3a
[4294034.394601]  [<ffffffff80284129>] sys_ioctl+0x42/0x65
[4294034.394606]  [<ffffffff8020b2eb>] system_call_after_swapgs+0x7b/0x80
[4294034.411597]
[4294034.411601] ---[ end trace 7f52164e4c2b9927 ]---

I have no idea if that is even related to the BKL or not, I haven't even 
opened the source file yet, but I figured I'd report it.

-- 
Kevin Winchester

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-17  0:14 ` Kevin Winchester
@ 2008-05-17  0:37   ` Kevin Winchester
  0 siblings, 0 replies; 78+ messages in thread
From: Kevin Winchester @ 2008-05-17  0:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner, Alan Cox, Alexander Viro, airlied

Kevin Winchester wrote:
> Ingo Molnar wrote:
> 
> <snip description and patches>
> 
> I decided to give this tree a try, and I got:
> 
> 
> [4294034.386085] ------------[ cut here ]------------
> [4294034.387882] WARNING: at fs/proc/generic.c:669 
> create_proc_entry+0x3d/0xc5()
> [4294034.390059] Pid: 2565, comm: Xorg Not tainted 
> 2.6.26-rc2-00456-gd9df34e #35
> [4294034.392682]
> [4294034.392683] Call Trace:
> [4294034.394071]  [<ffffffff8022a8ac>] warn_on_slowpath+0x53/0x81
> [4294034.394077]  [<ffffffff802b57b4>] ? proc_register+0xf7/0x162
> [4294034.394081]  [<ffffffff802b580b>] ? proc_register+0x14e/0x162
> [4294034.394087]  [<ffffffff804afa05>] ? _spin_unlock+0x30/0x4b
> [4294034.394091]  [<ffffffff802b580b>] ? proc_register+0x14e/0x162
> [4294034.394095]  [<ffffffff8021a127>] ? startup_ioapic_irq+0x54/0x5f
> [4294034.394099]  [<ffffffff802b5fcc>] create_proc_entry+0x3d/0xc5
> [4294034.394103]  [<ffffffff80252008>] register_irq_proc+0x84/0xa0
> [4294034.394108]  [<ffffffff80250b1a>] setup_irq+0x1b2/0x21b
> [4294034.394113]  [<ffffffff80250cc8>] request_irq+0xf1/0x117
> [4294034.394117]  [<ffffffff8038aaaa>] ? radeon_driver_irq_handler+0x0/0x7e
> [4294034.394122]  [<ffffffff803789b6>] ? drm_control+0x0/0x186
> [4294034.394126]  [<ffffffff80378acf>] drm_control+0x119/0x186
> [4294034.394130]  [<ffffffff803770be>] drm_ioctl+0x1d3/0x265
> [4294034.394589]  [<ffffffff80283e5e>] vfs_ioctl+0x5e/0x77
> [4294034.394593]  [<ffffffff802840d2>] do_vfs_ioctl+0x25b/0x270
> [4294034.394598]  [<ffffffff804af23e>] ? trace_hardirqs_on_thunk+0x35/0x3a
> [4294034.394601]  [<ffffffff80284129>] sys_ioctl+0x42/0x65
> [4294034.394606]  [<ffffffff8020b2eb>] system_call_after_swapgs+0x7b/0x80
> [4294034.411597]
> [4294034.411601] ---[ end trace 7f52164e4c2b9927 ]---
> 

And now applying the debugging tips that Linus, Al and others supplied 
to me awhile back, I see from GDB that:

vfs_ioctl locks the kernel before calling drm_ioctl, and that
create_proc_entry() has the following new line thanks to Ingo:

     WARN_ON_ONCE(kernel_locked());

According to Ingo's patch log:

     The functions, if called from the BKL, show that the calling site
     might have a dependency on the procfs code previously using the BKL
     in the dir-entry manipulation functions.

I do not really know what that means, so I cc'd Dave Airlie to see if he 
has a solution.
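
For readers following along, a rough userspace model of what that check
does may help. This is only a sketch: the toy_* names are made up, the
depth counter stands in for the per-task lock depth (assumed here to be -1
when the BKL is not held), and the warn-once logic is imitated with a
static flag rather than the real WARN_ON_ONCE() macro.

```c
#include <assert.h>

/* Userspace model; assumed to mirror a per-task BKL depth counter. */
static int toy_lock_depth = -1;	/* -1 means "BKL not held" */
static int toy_warnings;	/* how many times the check fired */

static void toy_lock_kernel(void)
{
	toy_lock_depth++;	/* the BKL nests, so just count */
}

static void toy_unlock_kernel(void)
{
	assert(toy_lock_depth >= 0);	/* unlock without lock is a bug */
	toy_lock_depth--;
}

static int toy_kernel_locked(void)
{
	return toy_lock_depth >= 0;
}

/* Model of a warn-once check placed in a procfs entry point. */
static void toy_create_proc_entry(void)
{
	static int warned;

	if (toy_kernel_locked() && !warned) {
		warned = 1;
		toy_warnings++;
	}
}

/* Model of an ioctl path that takes the BKL around the driver call. */
static void toy_locked_ioctl(void)
{
	toy_lock_kernel();
	toy_create_proc_entry();	/* reached with the BKL held */
	toy_unlock_kernel();
}
```

Under these assumptions the warning fires the first time the procfs
function is reached from a path that holds the BKL, and stays quiet on
later hits. It flags a caller that may have been relying on the BKL, not a
crash in its own right.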

-- 
Kevin Winchester



^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH, RFC] char dev BKL pushdown
  2008-05-16 15:44     ` [PATCH, RFC] char dev BKL pushdown Jonathan Corbet
                         ` (2 preceding siblings ...)
  2008-05-16 16:30       ` Linus Torvalds
@ 2008-05-17 21:15       ` Arnd Bergmann
  2008-05-18 20:26         ` Jonathan Corbet
  2008-05-17 21:58       ` Linus Torvalds
  4 siblings, 1 reply; 78+ messages in thread
From: Arnd Bergmann @ 2008-05-17 21:15 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner, Alan Cox, Alexander Viro, linux-kernel

On Friday 16 May 2008, Jonathan Corbet wrote:

> ...and so that's what I've done.  My approach was to find every
> register_chrdev() and cdev_add() call, look at the associated
> file_operations, then go back to the open() function, if any.

Note that the majority of drivers (grep suggests up to 165
of them) use misc_register instead of register_chrdev/cdev_add.
Your patches are still correct, because you pushed the BKL into the
misc_open function, but there is an obvious next step in pushing
it further into the misc drivers.
There are probably a few more subsystems with minor number specific
open() functions, misc is just the obvious one.
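
The two stages can be sketched in a small userspace model. Nothing below
is the real misc code: the toy_* names, the four-entry table and the depth
counter are illustrative assumptions, and the real dispatcher does more
(minor lookup under its own lock, fops replacement) than is shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the BKL: a nesting depth, -1 when not held. */
static int toy_bkl_depth = -1;

static void toy_lock_kernel(void)
{
	toy_bkl_depth++;
}

static void toy_unlock_kernel(void)
{
	assert(toy_bkl_depth >= 0);
	toy_bkl_depth--;
}

typedef int (*toy_open_fn)(void);

#define TOY_MINORS 4
static toy_open_fn toy_table[TOY_MINORS];	/* per-minor open methods */
static int toy_held_in_open;	/* was the BKL held inside the driver? */

/* Driver open that relies on whatever locking the caller provides. */
static int toy_drv_open(void)
{
	toy_held_in_open = (toy_bkl_depth >= 0);
	return 0;
}

/* Same driver after pushdown: it takes the BKL itself. */
static int toy_drv_open_pushed(void)
{
	int ret;

	toy_lock_kernel();
	ret = toy_drv_open();
	toy_unlock_kernel();
	return ret;
}

/* Stage 1: the subsystem dispatcher holds the BKL for every minor. */
static int toy_misc_open_locked(unsigned int minor)
{
	int ret;

	if (minor >= TOY_MINORS || toy_table[minor] == NULL)
		return -1;
	toy_lock_kernel();
	ret = toy_table[minor]();
	toy_unlock_kernel();
	return ret;
}

/* Stage 2: the dispatcher takes no lock; each driver decides. */
static int toy_misc_open_unlocked(unsigned int minor)
{
	if (minor >= TOY_MINORS || toy_table[minor] == NULL)
		return -1;
	return toy_table[minor]();
}
```

The point of the second stage is that a driver which has been audited can
switch back to the plain open without affecting its neighbours, which is
not possible while the dispatcher takes the lock for everyone.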

> ...and so that's what I've done.  My approach was to find every
> register_chrdev() and cdev_add() call, look at the associated
> file_operations, then go back to the open() function, if any.  Unless it
> was almost immediately obvious to me that the function was either (1) so
> trivial as to not require locking (quite few of them are "return 0;"), or
> (2) clearly doing its own locking, I wrapped the code in the BKL.
> 
> Finally, I removed the BKL from chrdev_open().

In your current git tree, this change is no longer the final one, so
bisecting the series may cause other bugs. You should probably reorder
the patches at some point to avoid this.

	Arnd <><

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH, RFC] char dev BKL pushdown
  2008-05-16 15:44     ` [PATCH, RFC] char dev BKL pushdown Jonathan Corbet
                         ` (3 preceding siblings ...)
  2008-05-17 21:15       ` Arnd Bergmann
@ 2008-05-17 21:58       ` Linus Torvalds
  2008-05-18 20:07         ` Jonathan Corbet
  4 siblings, 1 reply; 78+ messages in thread
From: Linus Torvalds @ 2008-05-17 21:58 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Ingo Molnar, Andrew Morton, Peter Zijlstra, Thomas Gleixner,
	Alan Cox, Alexander Viro, linux-kernel



Btw, Jonathan, would you be willing to maintain some kind of tree of these 
BKL removal patches? 

This is different from the work Ingo is doing in the sense that these 
things should be (a) safe and reasonably obvious and thus (b) presumably 
ready to be merged in the next merge window. Ingo's BKL debugging tree is 
likely a good thing to use to find places that need work, but actually 
removing the BKL from some subsystem is a different issue.

(And when I say "safe and reasonably obvious" I obviously don't mean that 
there can't be bugs. Mistakes happen, and some BKL use might be overly 
subtle like the issue that Alan pointed out with an empty ->open routine 
almost accidentally serializing with initialization, but that's why I'd 
not merge these things after -rc2 anyway, but in the next merge window).

Because if you're willing to maintain a BKL-cleanup tree that gets merged 
into linux-next etc, I'd submit my VFAT/MSDOS BKL removal patch to you. 
The reason I did that one was that Thomas actually reported that to be a 
major source of latency problems on one of his embedded systems (80ms 
latency!), so it would be nice to have that patch in some place where it 
might get tested.

		Linus

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH] kill empty chardev open/release methods
  2008-05-16 20:55           ` Alan Cox
@ 2008-05-18 19:46             ` Jonathan Corbet
  2008-05-18 19:58               ` Alan Cox
  0 siblings, 1 reply; 78+ messages in thread
From: Jonathan Corbet @ 2008-05-18 19:46 UTC (permalink / raw)
  To: Alan Cox
  Cc: Jonathan Corbet, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Alexander Viro, linux-kernel

Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:

> Actually it turns out you can introduce bugs doing this when the BKL is
> pushed down.
> 
> The problem is that the methods are not NULL; with the lock pushed down
> they are
> 
> {
> 	lock_kernel();
> 	unlock_kernel();
> }
> 
> And we have drivers with setup code that does things in the wrong order
> but under the BKL. eg one I just fixed did
> 
> 	misc_register()
> 	init locks
> 	allocate memory
> 	do stuff
> 	return 0;

Hmph.

As it turns out, a misc driver will still be OK because the BKL has not
(yet) been pushed past misc_open().  What this does mean, though, is
that all of those empty and trivial open functions need to be
revisited.  I thought this looked too easy the first time through...

jon

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH] kill empty chardev open/release methods
  2008-05-18 19:46             ` Jonathan Corbet
@ 2008-05-18 19:58               ` Alan Cox
  0 siblings, 0 replies; 78+ messages in thread
From: Alan Cox @ 2008-05-18 19:58 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Jonathan Corbet, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Alexander Viro, linux-kernel


> As it turns out, a misc driver will still be OK because the BKL has not
> (yet) been pushed past misc_open().  What this does mean, though, is
> that all of those empty and trivial open functions need to be
> revisited.  I thought this looked too easy the first time through...

I think it would be best to make them lock/unlock kernel in the first
pass and then work through them. The BKL can be subtle and evil, but as I
brought it into the world I guess I must banish it ;)


Alan


* Re: [PATCH, RFC] char dev BKL pushdown
  2008-05-17 21:58       ` Linus Torvalds
@ 2008-05-18 20:07         ` Jonathan Corbet
  0 siblings, 0 replies; 78+ messages in thread
From: Jonathan Corbet @ 2008-05-18 20:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Andrew Morton, Peter Zijlstra, Thomas Gleixner,
	Alan Cox, Alexander Viro, linux-kernel

Linus Torvalds <torvalds@linux-foundation.org> wrote:

> Btw, Jonathan, would you be willing to maintain some kind of tree of these 
> BKL removal patches? 

Sure, I can do that - as long as people don't mind that I'm committed to
being with the in-laws and off the net for the first couple of weeks in
June.  I'll try to get the first version up shortly, after yet another
pass over the chardev pushdown stuff.

jon


* Re: [PATCH, RFC] char dev BKL pushdown
  2008-05-17 21:15       ` Arnd Bergmann
@ 2008-05-18 20:26         ` Jonathan Corbet
  2008-05-19 23:07           ` Arnd Bergmann
  0 siblings, 1 reply; 78+ messages in thread
From: Jonathan Corbet @ 2008-05-18 20:26 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner, Alan Cox, Alexander Viro, linux-kernel

Arnd Bergmann <arnd@arndb.de> wrote:

> Note that the majority of drivers (grep suggests up to 165
> of them) use misc_register instead of register_chrdev/cdev_add.
> Your patches are still correct, because you pushed the BKL into the
> misc_open function, but there is an obvious next step in pushing
> it further into the misc drivers.

There's a few intermediate dispatcher levels like this, actually.
Lots of video drivers get called behind video_open(), usb drivers from
usb_open(), etc.  Not much to be done but to push things down one level
at a time.
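
A rough userspace sketch of what "one level at a time" means, assuming a
counter in place of the real lock. All of the toy_* names are invented for
illustration and do not match any kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* Nesting depth of the toy lock; the real BKL can be taken recursively too. */
int toy_bkl_depth;

void toy_lock_kernel(void)
{
	toy_bkl_depth++;
}

void toy_unlock_kernel(void)
{
	assert(toy_bkl_depth > 0);  /* unlock without a matching lock is a bug */
	toy_bkl_depth--;
}

typedef int (*toy_open_fn)(void);

/* Before the pushdown: the dispatcher holds the lock around the driver. */
int toy_dispatch_open_before(toy_open_fn drv_open)
{
	int ret;

	toy_lock_kernel();
	ret = drv_open ? drv_open() : 0;
	toy_unlock_kernel();
	return ret;
}

/* After the pushdown: the dispatcher no longer locks on the driver's behalf. */
int toy_dispatch_open_after(toy_open_fn drv_open)
{
	return drv_open ? drv_open() : 0;
}

/* Unconverted driver open: reports the lock depth it happens to run under. */
int toy_drv_open_unconverted(void)
{
	return toy_bkl_depth;
}

/* Converted driver open: takes the lock itself, as the pushdown patches do. */
int toy_drv_open_converted(void)
{
	int depth;

	toy_lock_kernel();
	depth = toy_bkl_depth;
	toy_unlock_kernel();
	return depth;
}
```

The point of the model: a converted driver sees the same lock depth after the
dispatcher stops locking as an unconverted one saw before, whereas an
unconverted driver behind toy_dispatch_open_after() runs with depth 0. That is
why every driver behind a dispatcher has to be converted before the
dispatcher's own lock can go.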

> In your current git tree, this change is no longer the final one, so
> bisecting the series may cause other bugs. You should probably reorder
> the patches at some point to avoid this.

Bisection is going to be a problem regardless - if a problem turns up,
it's going to be the chrdev_open() change which gets fingered.  I bet,
though, that it will be a rare BKL-related problem which is reproducible
enough to be easily bisectable.

But, yes, I do need to reorganize the patch series once I'm done adding
on changes.

jon


* Re: [PATCH, RFC] char dev BKL pushdown
  2008-05-18 20:26         ` Jonathan Corbet
@ 2008-05-19 23:07           ` Arnd Bergmann
       [not found]             ` <200805200111.47275.arnd@arndb.de>
                               ` (3 more replies)
  0 siblings, 4 replies; 78+ messages in thread
From: Arnd Bergmann @ 2008-05-19 23:07 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner, Alan Cox, Alexander Viro, linux-kernel,
	Wim Van Sebroeck

On Sunday 18 May 2008, Jonathan Corbet wrote:
> There's a few intermediate dispatcher levels like this, actually.
> Lots of video drivers get called behind video_open(), usb drivers from
> usb_open(), etc.  Not much to be done but to push things down one level
> at a time.

I've given it a try for all the misc drivers that have an open() function.
The vast majority of them are actually watchdog drivers, all of which
register as a misc device by themselves. You seem to already have a script
to turn per-file changes into a patch each, so I'm sending you two patches:
one for all the watchdog drivers (maybe Wim can take care of that as well)
and one for all the other misc drivers (this one needs to be split).

	Arnd <><



* [PATCH 2/3, RFC] watchdog dev BKL pushdown
       [not found]             ` <200805200111.47275.arnd@arndb.de>
@ 2008-05-19 23:14               ` Arnd Bergmann
  2008-05-20  6:20                 ` Christoph Hellwig
  2008-05-20  8:42                 ` Alan Cox
  0 siblings, 2 replies; 78+ messages in thread
From: Arnd Bergmann @ 2008-05-19 23:14 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner, Alan Cox, Alexander Viro, linux-kernel,
	Wim Van Sebroeck

The Big Kernel Lock has been pushed down from chrdev_open
to misc_open; this change moves it to the individual watchdog
driver open functions.

As before, the change was purely mechanical; most drivers
should not actually need the BKL.
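
For readers skimming the hunks, the mechanical pattern can be sketched in
userspace roughly as follows. This is an illustration only: the toy_* helpers
are made-up stand-ins, toy_test_and_set() is not atomic the way
test_and_set_bit() is, and no hardware is touched.

```c
#include <assert.h>
#include <errno.h>

int toy_bkl_depth;              /* stand-in for the BKL: a nesting counter */
unsigned long toy_wdt_is_open;  /* stand-in for a driver's "is open" flag */

void toy_lock_kernel(void)
{
	toy_bkl_depth++;
}

void toy_unlock_kernel(void)
{
	assert(toy_bkl_depth > 0);  /* every unlock needs a matching lock */
	toy_bkl_depth--;
}

/* Non-atomic stand-in for test_and_set_bit(0, flag): returns the old bit. */
int toy_test_and_set(unsigned long *flag)
{
	int old = (*flag & 1UL) != 0;

	*flag |= 1UL;
	return old;
}

/* Shape of a converted open(): lock on entry, unlock on every exit path. */
int toy_wdt_open(void)
{
	toy_lock_kernel();
	if (toy_test_and_set(&toy_wdt_is_open)) {
		toy_unlock_kernel();  /* the early return must drop the lock */
		return -EBUSY;
	}

	/* the real drivers start or ping the watchdog here */
	toy_unlock_kernel();
	return 0;
}
```

Calling toy_wdt_open() twice returns 0 and then -EBUSY, and toy_bkl_depth is
back at 0 after both calls. That balance on both exit paths is the property
each hunk has to preserve.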

Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Index: linux-2.6/drivers/watchdog/acquirewdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/acquirewdt.c
+++ linux-2.6/drivers/watchdog/acquirewdt.c
@@ -64,6 +64,7 @@
 #include <linux/ioport.h>		/* For io-port access */
 #include <linux/platform_device.h>	/* For platform_driver framework */
 #include <linux/init.h>			/* For __init/__exit/... */
+#include <linux/smp_lock.h>		/* For lock_kernel() */
 
 #include <asm/uaccess.h>		/* For copy_to_user/put_user/... */
 #include <asm/io.h>			/* For inb/outb/... */
@@ -195,14 +196,18 @@ static int acq_ioctl(struct inode *inode
 
 static int acq_open(struct inode *inode, struct file *file)
 {
-	if (test_and_set_bit(0, &acq_is_open))
+	lock_kernel();
+	if (test_and_set_bit(0, &acq_is_open)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	if (nowayout)
 		__module_get(THIS_MODULE);
 
 	/* Activate */
 	acq_keepalive();
+	unlock_kernel();
 	return nonseekable_open(inode, file);
 }
 
Index: linux-2.6/drivers/watchdog/advantechwdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/advantechwdt.c
+++ linux-2.6/drivers/watchdog/advantechwdt.c
@@ -30,6 +30,7 @@
 
 #include <linux/module.h>
 #include <linux/moduleparam.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/miscdevice.h>
 #include <linux/watchdog.h>
@@ -198,13 +199,17 @@ advwdt_ioctl(struct inode *inode, struct
 static int
 advwdt_open(struct inode *inode, struct file *file)
 {
-	if (test_and_set_bit(0, &advwdt_is_open))
+	lock_kernel();
+	if (test_and_set_bit(0, &advwdt_is_open)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 	/*
 	 *	Activate
 	 */
 
 	advwdt_ping();
+	unlock_kernel();
 	return nonseekable_open(inode, file);
 }
 
Index: linux-2.6/drivers/watchdog/alim1535_wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/alim1535_wdt.c
+++ linux-2.6/drivers/watchdog/alim1535_wdt.c
@@ -9,6 +9,7 @@
 
 #include <linux/module.h>
 #include <linux/moduleparam.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/miscdevice.h>
 #include <linux/watchdog.h>
@@ -252,11 +253,15 @@ static int ali_ioctl(struct inode *inode
 static int ali_open(struct inode *inode, struct file *file)
 {
 	/* /dev/watchdog can only be opened once */
-	if (test_and_set_bit(0, &ali_is_open))
+	lock_kernel();
+	if (test_and_set_bit(0, &ali_is_open)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	/* Activate */
 	ali_start();
+	unlock_kernel();
 	return nonseekable_open(inode, file);
 }
 
Index: linux-2.6/drivers/watchdog/alim7101_wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/alim7101_wdt.c
+++ linux-2.6/drivers/watchdog/alim7101_wdt.c
@@ -21,6 +21,7 @@
 
 #include <linux/module.h>
 #include <linux/moduleparam.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/timer.h>
 #include <linux/miscdevice.h>
@@ -198,10 +199,14 @@ static ssize_t fop_write(struct file * f
 static int fop_open(struct inode * inode, struct file * file)
 {
 	/* Just in case we're already talking to someone... */
-	if(test_and_set_bit(0, &wdt_is_open))
+	lock_kernel();
+	if(test_and_set_bit(0, &wdt_is_open)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 	/* Good, fire up the show */
 	wdt_startup();
+	unlock_kernel();
 	return nonseekable_open(inode, file);
 }
 
Index: linux-2.6/drivers/watchdog/ar7_wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/ar7_wdt.c
+++ linux-2.6/drivers/watchdog/ar7_wdt.c
@@ -28,6 +28,7 @@
 #include <linux/errno.h>
 #include <linux/init.h>
 #include <linux/miscdevice.h>
+#include <linux/smp_lock.h>
 #include <linux/watchdog.h>
 #include <linux/notifier.h>
 #include <linux/reboot.h>
@@ -179,11 +180,15 @@ static void ar7_wdt_disable_wdt(void)
 static int ar7_wdt_open(struct inode *inode, struct file *file)
 {
 	/* only allow one at a time */
-	if (down_trylock(&open_semaphore))
+	lock_kernel();
+	if (down_trylock(&open_semaphore)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 	ar7_wdt_enable_wdt();
 	expect_close = 0;
 
+	unlock_kernel();
 	return nonseekable_open(inode, file);
 }
 
Index: linux-2.6/drivers/watchdog/at32ap700x_wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/at32ap700x_wdt.c
+++ linux-2.6/drivers/watchdog/at32ap700x_wdt.c
@@ -32,6 +32,7 @@
 #include <linux/uaccess.h>
 #include <linux/io.h>
 #include <linux/spinlock.h>
+#include <linux/smp_lock.h>
 
 #define TIMEOUT_MIN		1
 #define TIMEOUT_MAX		2
@@ -131,10 +132,14 @@ static inline void at32_wdt_pat(void)
  */
 static int at32_wdt_open(struct inode *inode, struct file *file)
 {
-	if (test_and_set_bit(1, &wdt->users))
+	lock_kernel();
+	if (test_and_set_bit(1, &wdt->users)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	at32_wdt_start();
+	unlock_kernel();
 	return nonseekable_open(inode, file);
 }
 
Index: linux-2.6/drivers/watchdog/at91rm9200_wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/at91rm9200_wdt.c
+++ linux-2.6/drivers/watchdog/at91rm9200_wdt.c
@@ -18,6 +18,7 @@
 #include <linux/module.h>
 #include <linux/moduleparam.h>
 #include <linux/platform_device.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/watchdog.h>
 #include <asm/uaccess.h>
@@ -75,10 +76,14 @@ static void inline at91_wdt_reload(void)
  */
 static int at91_wdt_open(struct inode *inode, struct file *file)
 {
-	if (test_and_set_bit(0, &at91wdt_busy))
+	lock_kernel();
+	if (test_and_set_bit(0, &at91wdt_busy)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	at91_wdt_start();
+	unlock_kernel();
 	return nonseekable_open(inode, file);
 }
 
Index: linux-2.6/drivers/watchdog/bfin_wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/bfin_wdt.c
+++ linux-2.6/drivers/watchdog/bfin_wdt.c
@@ -15,6 +15,7 @@
 #include <linux/platform_device.h>
 #include <linux/module.h>
 #include <linux/moduleparam.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/timer.h>
 #include <linux/miscdevice.h>
@@ -165,10 +166,13 @@ static int bfin_wdt_set_timeout(unsigned
  */
 static int bfin_wdt_open(struct inode *inode, struct file *file)
 {
+	lock_kernel();
 	stampit();
 
-	if (test_and_set_bit(0, &open_check))
+	if (test_and_set_bit(0, &open_check)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	if (nowayout)
 		__module_get(THIS_MODULE);
@@ -176,6 +180,7 @@ static int bfin_wdt_open(struct inode *i
 	bfin_wdt_keepalive();
 	bfin_wdt_start();
 
+	unlock_kernel();
 	return nonseekable_open(inode, file);
 }
 
Index: linux-2.6/drivers/watchdog/booke_wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/booke_wdt.c
+++ linux-2.6/drivers/watchdog/booke_wdt.c
@@ -18,6 +18,7 @@
 #include <linux/fs.h>
 #include <linux/miscdevice.h>
 #include <linux/notifier.h>
+#include <linux/smp_lock.h>
 #include <linux/watchdog.h>
 
 #include <asm/reg_booke.h>
@@ -137,12 +138,14 @@ static int booke_wdt_ioctl (struct inode
  */
 static int booke_wdt_open (struct inode *inode, struct file *file)
 {
+	lock_kernel();
 	if (booke_wdt_enabled == 0) {
 		booke_wdt_enabled = 1;
 		booke_wdt_enable();
 		printk (KERN_INFO "PowerPC Book-E Watchdog Timer Enabled (wdt_period=%d)\n",
 				booke_wdt_period);
 	}
+	unlock_kernel();
 
 	return nonseekable_open(inode, file);
 }
Index: linux-2.6/drivers/watchdog/cpu5wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/cpu5wdt.c
+++ linux-2.6/drivers/watchdog/cpu5wdt.c
@@ -21,6 +21,7 @@
 
 #include <linux/module.h>
 #include <linux/moduleparam.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/errno.h>
 #include <linux/miscdevice.h>
@@ -130,9 +131,13 @@ static int cpu5wdt_stop(void)
 
 static int cpu5wdt_open(struct inode *inode, struct file *file)
 {
-	if ( test_and_set_bit(0, &cpu5wdt_device.inuse) )
+	lock_kernel();
+	if ( test_and_set_bit(0, &cpu5wdt_device.inuse) ) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
+	unlock_kernel();
 	return nonseekable_open(inode, file);
 }
 
Index: linux-2.6/drivers/watchdog/davinci_wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/davinci_wdt.c
+++ linux-2.6/drivers/watchdog/davinci_wdt.c
@@ -13,6 +13,7 @@
 
 #include <linux/module.h>
 #include <linux/moduleparam.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/kernel.h>
 #include <linux/fs.h>
@@ -120,10 +121,14 @@ static void wdt_enable(void)
 
 static int davinci_wdt_open(struct inode *inode, struct file *file)
 {
-	if (test_and_set_bit(WDT_IN_USE, &wdt_status))
+	lock_kernel();
+	if (test_and_set_bit(WDT_IN_USE, &wdt_status)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	wdt_enable();
+	unlock_kernel();
 
 	return nonseekable_open(inode, file);
 }
Index: linux-2.6/drivers/watchdog/ep93xx_wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/ep93xx_wdt.c
+++ linux-2.6/drivers/watchdog/ep93xx_wdt.c
@@ -26,6 +26,7 @@
 #include <linux/module.h>
 #include <linux/fs.h>
 #include <linux/miscdevice.h>
+#include <linux/smp_lock.h>
 #include <linux/watchdog.h>
 #include <linux/timer.h>
 
@@ -93,13 +94,17 @@ static void wdt_keepalive(void)
 
 static int ep93xx_wdt_open(struct inode *inode, struct file *file)
 {
-	if (test_and_set_bit(WDT_IN_USE, &wdt_status))
+	lock_kernel();
+	if (test_and_set_bit(WDT_IN_USE, &wdt_status)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	clear_bit(WDT_OK_TO_CLOSE, &wdt_status);
 
 	wdt_startup();
 
+	unlock_kernel();
 	return nonseekable_open(inode, file);
 }
 
Index: linux-2.6/drivers/watchdog/eurotechwdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/eurotechwdt.c
+++ linux-2.6/drivers/watchdog/eurotechwdt.c
@@ -48,6 +48,7 @@
 #include <linux/interrupt.h>
 #include <linux/module.h>
 #include <linux/moduleparam.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/miscdevice.h>
 #include <linux/watchdog.h>
@@ -300,11 +301,15 @@ static int eurwdt_ioctl(struct inode *in
 
 static int eurwdt_open(struct inode *inode, struct file *file)
 {
-	if (test_and_set_bit(0, &eurwdt_is_open))
+	lock_kernel();
+	if (test_and_set_bit(0, &eurwdt_is_open)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 	eurwdt_timeout = WDT_TIMEOUT;	/* initial timeout */
 	/* Activate the WDT */
 	eurwdt_activate_timer();
+	unlock_kernel();
 	return nonseekable_open(inode, file);
 }
 
Index: linux-2.6/drivers/watchdog/hpwdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/hpwdt.c
+++ linux-2.6/drivers/watchdog/hpwdt.c
@@ -30,6 +30,7 @@
 #include <linux/pci_ids.h>
 #include <linux/reboot.h>
 #include <linux/sched.h>
+#include <linux/smp_lock.h>
 #include <linux/timer.h>
 #include <linux/types.h>
 #include <linux/uaccess.h>
@@ -486,12 +487,16 @@ static int hpwdt_change_timer(int new_ma
 static int hpwdt_open(struct inode *inode, struct file *file)
 {
 	/* /dev/watchdog can only be opened once */
-	if (test_and_set_bit(0, &hpwdt_is_open))
+	lock_kernel();
+	if (test_and_set_bit(0, &hpwdt_is_open)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	/* Start the watchdog */
 	hpwdt_start();
 	hpwdt_ping();
+	unlock_kernel();
 
 	return nonseekable_open(inode, file);
 }
Index: linux-2.6/drivers/watchdog/i6300esb.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/i6300esb.c
+++ linux-2.6/drivers/watchdog/i6300esb.c
@@ -28,6 +28,7 @@
  */
 
 #include <linux/module.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/kernel.h>
 #include <linux/fs.h>
@@ -198,12 +199,16 @@ static int esb_timer_read (void)
 static int esb_open (struct inode *inode, struct file *file)
 {
         /* /dev/watchdog can only be opened once */
-        if (test_and_set_bit(0, &timer_alive))
+	lock_kernel();
+        if (test_and_set_bit(0, &timer_alive)) {
+		unlock_kernel();
                 return -EBUSY;
+	}
 
         /* Reload and activate timer */
         esb_timer_keepalive ();
         esb_timer_start ();
+	unlock_kernel();
 
 	return nonseekable_open(inode, file);
 }
Index: linux-2.6/drivers/watchdog/iTCO_wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/iTCO_wdt.c
+++ linux-2.6/drivers/watchdog/iTCO_wdt.c
@@ -62,6 +62,7 @@
 /* Includes */
 #include <linux/module.h>		/* For module specific items */
 #include <linux/moduleparam.h>		/* For new moduleparam's */
+#include <linux/smp_lock.h>		/* For lock_kernel */
 #include <linux/types.h>		/* For standard types (like size_t) */
 #include <linux/errno.h>		/* For the -ENODEV/... values */
 #include <linux/kernel.h>		/* For printk/panic/... */
@@ -453,14 +454,18 @@ static int iTCO_wdt_get_timeleft (int *t
 static int iTCO_wdt_open (struct inode *inode, struct file *file)
 {
 	/* /dev/watchdog can only be opened once */
-	if (test_and_set_bit(0, &is_active))
+	lock_kernel();
+	if (test_and_set_bit(0, &is_active)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	/*
 	 *      Reload and activate timer
 	 */
 	iTCO_wdt_keepalive();
 	iTCO_wdt_start();
+	unlock_kernel();
 	return nonseekable_open(inode, file);
 }
 
Index: linux-2.6/drivers/watchdog/ib700wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/ib700wdt.c
+++ linux-2.6/drivers/watchdog/ib700wdt.c
@@ -32,6 +32,7 @@
  */
 
 #include <linux/module.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/miscdevice.h>
 #include <linux/watchdog.h>
@@ -256,7 +257,9 @@ ibwdt_ioctl(struct inode *inode, struct 
 static int
 ibwdt_open(struct inode *inode, struct file *file)
 {
+	lock_kernel();
 	if (test_and_set_bit(0, &ibwdt_is_open)) {
+		unlock_kernel();
 		return -EBUSY;
 	}
 	if (nowayout)
@@ -264,6 +267,7 @@ ibwdt_open(struct inode *inode, struct f
 
 	/* Activate */
 	ibwdt_ping();
+	unlock_kernel();
 	return nonseekable_open(inode, file);
 }
 
Index: linux-2.6/drivers/watchdog/ibmasr.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/ibmasr.c
+++ linux-2.6/drivers/watchdog/ibmasr.c
@@ -13,6 +13,7 @@
 #include <linux/fs.h>
 #include <linux/kernel.h>
 #include <linux/slab.h>
+#include <linux/smp_lock.h>
 #include <linux/module.h>
 #include <linux/pci.h>
 #include <linux/timer.h>
@@ -300,11 +301,15 @@ static int asr_ioctl(struct inode *inode
 
 static int asr_open(struct inode *inode, struct file *file)
 {
-	if(test_and_set_bit(0, &asr_is_open))
+	lock_kernel();
+	if(test_and_set_bit(0, &asr_is_open)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	asr_toggle();
 	asr_enable();
+	unlock_kernel();
 
 	return nonseekable_open(inode, file);
 }
Index: linux-2.6/drivers/watchdog/indydog.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/indydog.c
+++ linux-2.6/drivers/watchdog/indydog.c
@@ -13,6 +13,7 @@
 
 #include <linux/module.h>
 #include <linux/moduleparam.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/kernel.h>
 #include <linux/fs.h>
@@ -62,8 +63,11 @@ static void indydog_ping(void)
  */
 static int indydog_open(struct inode *inode, struct file *file)
 {
-	if (indydog_alive)
+	lock_kernel();
+	if (indydog_alive) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	if (nowayout)
 		__module_get(THIS_MODULE);
@@ -74,6 +78,7 @@ static int indydog_open(struct inode *in
 
 	indydog_alive = 1;
 	printk(KERN_INFO "Started watchdog timer.\n");
+	unlock_kernel();
 
 	return nonseekable_open(inode, file);
 }
Index: linux-2.6/drivers/watchdog/iop_wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/iop_wdt.c
+++ linux-2.6/drivers/watchdog/iop_wdt.c
@@ -30,6 +30,7 @@
 #include <linux/init.h>
 #include <linux/device.h>
 #include <linux/miscdevice.h>
+#include <linux/smp_lock.h>
 #include <linux/watchdog.h>
 #include <linux/uaccess.h>
 #include <asm/hardware.h>
@@ -88,14 +89,18 @@ static int wdt_disable(void)
 
 static int iop_wdt_open(struct inode *inode, struct file *file)
 {
-	if (test_and_set_bit(WDT_IN_USE, &wdt_status))
+	lock_kernel();
+	if (test_and_set_bit(WDT_IN_USE, &wdt_status)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	clear_bit(WDT_OK_TO_CLOSE, &wdt_status);
 
 	wdt_enable();
 
 	set_bit(WDT_ENABLED, &wdt_status);
+	unlock_kernel();
 
 	return nonseekable_open(inode, file);
 }
Index: linux-2.6/drivers/watchdog/it8712f_wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/it8712f_wdt.c
+++ linux-2.6/drivers/watchdog/it8712f_wdt.c
@@ -30,6 +30,7 @@
 #include <linux/fs.h>
 #include <linux/pci.h>
 #include <linux/spinlock.h>
+#include <linux/smp_lock.h>
 
 #include <asm/uaccess.h>
 #include <asm/io.h>
@@ -305,10 +306,14 @@ it8712f_wdt_ioctl(struct inode *inode, s
 static int
 it8712f_wdt_open(struct inode *inode, struct file *file)
 {
+	lock_kernel();
 	/* only allow one at a time */
-	if (down_trylock(&it8712f_wdt_sem))
+	if (down_trylock(&it8712f_wdt_sem)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 	it8712f_wdt_enable();
+	unlock_kernel();
 
 	return nonseekable_open(inode, file);
 }
Index: linux-2.6/drivers/watchdog/ixp2000_wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/ixp2000_wdt.c
+++ linux-2.6/drivers/watchdog/ixp2000_wdt.c
@@ -18,6 +18,7 @@
 
 #include <linux/module.h>
 #include <linux/moduleparam.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/kernel.h>
 #include <linux/fs.h>
@@ -62,12 +63,16 @@ wdt_keepalive(void)
 static int
 ixp2000_wdt_open(struct inode *inode, struct file *file)
 {
-	if (test_and_set_bit(WDT_IN_USE, &wdt_status))
+	lock_kernel();
+	if (test_and_set_bit(WDT_IN_USE, &wdt_status)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	clear_bit(WDT_OK_TO_CLOSE, &wdt_status);
 
 	wdt_enable();
+	unlock_kernel();
 
 	return nonseekable_open(inode, file);
 }
Index: linux-2.6/drivers/watchdog/ixp4xx_wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/ixp4xx_wdt.c
+++ linux-2.6/drivers/watchdog/ixp4xx_wdt.c
@@ -15,6 +15,7 @@
 
 #include <linux/module.h>
 #include <linux/moduleparam.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/kernel.h>
 #include <linux/fs.h>
@@ -57,12 +58,16 @@ wdt_disable(void)
 static int
 ixp4xx_wdt_open(struct inode *inode, struct file *file)
 {
-	if (test_and_set_bit(WDT_IN_USE, &wdt_status))
+	lock_kernel();
+	if (test_and_set_bit(WDT_IN_USE, &wdt_status)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	clear_bit(WDT_OK_TO_CLOSE, &wdt_status);
 
 	wdt_enable();
+	unlock_kernel();
 
 	return nonseekable_open(inode, file);
 }
Index: linux-2.6/drivers/watchdog/ks8695_wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/ks8695_wdt.c
+++ linux-2.6/drivers/watchdog/ks8695_wdt.c
@@ -17,6 +17,7 @@
 #include <linux/module.h>
 #include <linux/moduleparam.h>
 #include <linux/platform_device.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/watchdog.h>
 #include <asm/io.h>
@@ -114,10 +115,14 @@ static int ks8695_wdt_settimeout(int new
  */
 static int ks8695_wdt_open(struct inode *inode, struct file *file)
 {
-	if (test_and_set_bit(0, &ks8695wdt_busy))
+	lock_kernel();
+	if (test_and_set_bit(0, &ks8695wdt_busy)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	ks8695_wdt_start();
+	unlock_kernel();
 	return nonseekable_open(inode, file);
 }
 
Index: linux-2.6/drivers/watchdog/machzwd.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/machzwd.c
+++ linux-2.6/drivers/watchdog/machzwd.c
@@ -30,6 +30,7 @@
 
 #include <linux/module.h>
 #include <linux/moduleparam.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/timer.h>
 #include <linux/jiffies.h>
@@ -337,9 +338,11 @@ static int zf_ioctl(struct inode *inode,
 
 static int zf_open(struct inode *inode, struct file *file)
 {
+	lock_kernel();
 	spin_lock(&zf_lock);
 	if(test_and_set_bit(0, &zf_is_open)) {
 		spin_unlock(&zf_lock);
+		unlock_kernel();
 		return -EBUSY;
 	}
 
@@ -349,6 +352,7 @@ static int zf_open(struct inode *inode, 
 	spin_unlock(&zf_lock);
 
 	zf_timer_on();
+	unlock_kernel();
 
 	return nonseekable_open(inode, file);
 }
Index: linux-2.6/drivers/watchdog/mixcomwd.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/mixcomwd.c
+++ linux-2.6/drivers/watchdog/mixcomwd.c
@@ -44,6 +44,7 @@
 
 #include <linux/module.h>
 #include <linux/moduleparam.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/miscdevice.h>
 #include <linux/ioport.h>
@@ -129,7 +130,9 @@ static void mixcomwd_timerfun(unsigned l
 
 static int mixcomwd_open(struct inode *inode, struct file *file)
 {
+	lock_kernel();
 	if(test_and_set_bit(0,&mixcomwd_opened)) {
+		unlock_kernel();
 		return -EBUSY;
 	}
 	mixcomwd_ping();
@@ -147,6 +150,7 @@ static int mixcomwd_open(struct inode *i
 			mixcomwd_timer_alive=0;
 		}
 	}
+	unlock_kernel();
 	return nonseekable_open(inode, file);
 }
 
Index: linux-2.6/drivers/watchdog/mpc5200_wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/mpc5200_wdt.c
+++ linux-2.6/drivers/watchdog/mpc5200_wdt.c
@@ -1,6 +1,7 @@
 #include <linux/init.h>
 #include <linux/module.h>
 #include <linux/miscdevice.h>
+#include <linux/smp_lock.h>
 #include <linux/watchdog.h>
 #include <linux/io.h>
 #include <linux/spinlock.h>
@@ -137,14 +138,18 @@ static int mpc5200_wdt_ioctl(struct inod
 }
 static int mpc5200_wdt_open(struct inode *inode, struct file *file)
 {
+	lock_kernel();
 	/* /dev/watchdog can only be opened once */
-	if (test_and_set_bit(0, &is_active))
+	if (test_and_set_bit(0, &is_active)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	/* Set and activate the watchdog */
 	mpc5200_wdt_set_timeout(wdt_global, 30);
 	mpc5200_wdt_start(wdt_global);
 	file->private_data = wdt_global;
+	unlock_kernel();
 	return nonseekable_open(inode, file);
 }
 static int mpc5200_wdt_release(struct inode *inode, struct file *file)
Index: linux-2.6/drivers/watchdog/mpc83xx_wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/mpc83xx_wdt.c
+++ linux-2.6/drivers/watchdog/mpc83xx_wdt.c
@@ -21,6 +21,7 @@
 #include <linux/miscdevice.h>
 #include <linux/platform_device.h>
 #include <linux/module.h>
+#include <linux/smp_lock.h>
 #include <linux/watchdog.h>
 #include <asm/io.h>
 #include <asm/uaccess.h>
@@ -78,8 +79,11 @@ static ssize_t mpc83xx_wdt_write(struct 
 static int mpc83xx_wdt_open(struct inode *inode, struct file *file)
 {
 	u32 tmp = SWCRR_SWEN;
-	if (test_and_set_bit(0, &wdt_is_open))
+	lock_kernel();
+	if (test_and_set_bit(0, &wdt_is_open)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	/* Once we start the watchdog we can't stop it */
 	__module_get(THIS_MODULE);
@@ -93,6 +97,7 @@ static int mpc83xx_wdt_open(struct inode
 	tmp |= timeout << 16;
 
 	out_be32(&wd_base->swcrr, tmp);
+	unlock_kernel();
 
 	return nonseekable_open(inode, file);
 }
Index: linux-2.6/drivers/watchdog/mpc8xx_wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/mpc8xx_wdt.c
+++ linux-2.6/drivers/watchdog/mpc8xx_wdt.c
@@ -14,6 +14,7 @@
 #include <linux/kernel.h>
 #include <linux/miscdevice.h>
 #include <linux/module.h>
+#include <linux/smp_lock.h>
 #include <linux/watchdog.h>
 #include <asm/8xx_immap.h>
 #include <asm/uaccess.h>
@@ -51,11 +52,15 @@ static void mpc8xx_wdt_handler_enable(vo
 
 static int mpc8xx_wdt_open(struct inode *inode, struct file *file)
 {
-	if (test_and_set_bit(0, &wdt_opened))
+	lock_kernel();
+	if (test_and_set_bit(0, &wdt_opened)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	m8xx_wdt_reset();
 	mpc8xx_wdt_handler_disable();
+	unlock_kernel();
 
 	return nonseekable_open(inode, file);
 }
Index: linux-2.6/drivers/watchdog/mpcore_wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/mpcore_wdt.c
+++ linux-2.6/drivers/watchdog/mpcore_wdt.c
@@ -21,6 +21,7 @@
  */
 #include <linux/module.h>
 #include <linux/moduleparam.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/miscdevice.h>
 #include <linux/watchdog.h>
@@ -140,8 +141,11 @@ static int mpcore_wdt_open(struct inode 
 {
 	struct mpcore_wdt *wdt = platform_get_drvdata(mpcore_wdt_dev);
 
-	if (test_and_set_bit(0, &wdt->timer_alive))
+	lock_kernel();
+	if (test_and_set_bit(0, &wdt->timer_alive)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	if (nowayout)
 		__module_get(THIS_MODULE);
@@ -152,6 +156,7 @@ static int mpcore_wdt_open(struct inode 
 	 *	Activate timer
 	 */
 	mpcore_wdt_start(wdt);
+	unlock_kernel();
 
 	return nonseekable_open(inode, file);
 }
Index: linux-2.6/drivers/watchdog/mtx-1_wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/mtx-1_wdt.c
+++ linux-2.6/drivers/watchdog/mtx-1_wdt.c
@@ -35,6 +35,7 @@
 
 #include <linux/module.h>
 #include <linux/moduleparam.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/errno.h>
 #include <linux/miscdevice.h>
@@ -120,8 +121,12 @@ static int mtx1_wdt_stop(void)
 
 static int mtx1_wdt_open(struct inode *inode, struct file *file)
 {
-	if (test_and_set_bit(0, &mtx1_wdt_device.inuse))
+	lock_kernel();
+	if (test_and_set_bit(0, &mtx1_wdt_device.inuse)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
+	unlock_kernel();
 
 	return nonseekable_open(inode, file);
 }
Index: linux-2.6/drivers/watchdog/mv64x60_wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/mv64x60_wdt.c
+++ linux-2.6/drivers/watchdog/mv64x60_wdt.c
@@ -20,6 +20,7 @@
 #include <linux/kernel.h>
 #include <linux/miscdevice.h>
 #include <linux/module.h>
+#include <linux/smp_lock.h>
 #include <linux/watchdog.h>
 #include <linux/platform_device.h>
 
@@ -122,13 +123,17 @@ static void mv64x60_wdt_set_timeout(unsi
 
 static int mv64x60_wdt_open(struct inode *inode, struct file *file)
 {
-	if (test_and_set_bit(MV64x60_WDOG_FLAG_OPENED, &wdt_flags))
+	lock_kernel();
+	if (test_and_set_bit(MV64x60_WDOG_FLAG_OPENED, &wdt_flags)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	if (nowayout)
 		__module_get(THIS_MODULE);
 
 	mv64x60_wdt_handler_enable();
+	unlock_kernel();
 
 	return nonseekable_open(inode, file);
 }
Index: linux-2.6/drivers/watchdog/omap_wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/omap_wdt.c
+++ linux-2.6/drivers/watchdog/omap_wdt.c
@@ -27,6 +27,7 @@
  */
 
 #include <linux/module.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/kernel.h>
 #include <linux/fs.h>
@@ -122,8 +123,11 @@ static void omap_wdt_set_timeout(void)
 
 static int omap_wdt_open(struct inode *inode, struct file *file)
 {
-	if (test_and_set_bit(1, (unsigned long *)&omap_wdt_users))
+	lock_kernel();
+	if (test_and_set_bit(1, (unsigned long *)&omap_wdt_users)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	if (cpu_is_omap16xx())
 		clk_enable(armwdt_ck);	/* Enable the clock */
@@ -142,6 +146,7 @@ static int omap_wdt_open(struct inode *i
 
 	omap_wdt_set_timeout();
 	omap_wdt_enable();
+	unlock_kernel();
 	return nonseekable_open(inode, file);
 }
 
Index: linux-2.6/drivers/watchdog/pc87413_wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/pc87413_wdt.c
+++ linux-2.6/drivers/watchdog/pc87413_wdt.c
@@ -19,6 +19,7 @@
  */
 
 #include <linux/module.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/miscdevice.h>
 #include <linux/watchdog.h>
@@ -305,8 +306,11 @@ static int pc87413_open(struct inode *in
 {
 	/* /dev/watchdog can only be opened once */
 
-	if (test_and_set_bit(0, &timer_enabled))
+	lock_kernel();
+	if (test_and_set_bit(0, &timer_enabled)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	if (nowayout)
 		__module_get(THIS_MODULE);
@@ -316,6 +320,7 @@ static int pc87413_open(struct inode *in
 
 	printk(KERN_INFO MODNAME "Watchdog enabled. Timeout set to"
 	                         " %d minute(s).\n", timeout);
+	unlock_kernel();
 
 	return nonseekable_open(inode, file);
 }
Index: linux-2.6/drivers/watchdog/pcwd.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/pcwd.c
+++ linux-2.6/drivers/watchdog/pcwd.c
@@ -51,6 +51,7 @@
 
 #include <linux/module.h>	/* For module specific items */
 #include <linux/moduleparam.h>	/* For new moduleparam's */
+#include <linux/smp_lock.h>	/* For lock_kernel */
 #include <linux/types.h>	/* For standard types (like size_t) */
 #include <linux/errno.h>	/* For the -ENODEV/... values */
 #include <linux/kernel.h>	/* For printk/panic/... */
@@ -682,10 +683,12 @@ static ssize_t pcwd_write(struct file *f
 
 static int pcwd_open(struct inode *inode, struct file *file)
 {
+	lock_kernel();
 	if (!atomic_dec_and_test(&open_allowed) ) {
 		if (debug >= VERBOSE)
 			printk(KERN_ERR PFX "Attempt to open already opened device.\n");
 		atomic_inc( &open_allowed );
+		unlock_kernel();
 		return -EBUSY;
 	}
 
@@ -695,6 +698,7 @@ static int pcwd_open(struct inode *inode
 	/* Activate */
 	pcwd_start();
 	pcwd_keepalive();
+	unlock_kernel();
 	return nonseekable_open(inode, file);
 }
 
Index: linux-2.6/drivers/watchdog/pcwd_pci.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/pcwd_pci.c
+++ linux-2.6/drivers/watchdog/pcwd_pci.c
@@ -33,6 +33,7 @@
 
 #include <linux/module.h>	/* For module specific items */
 #include <linux/moduleparam.h>	/* For new moduleparam's */
+#include <linux/smp_lock.h>	/* For lock_kernel */
 #include <linux/types.h>	/* For standard types (like size_t) */
 #include <linux/errno.h>	/* For the -ENODEV/... values */
 #include <linux/kernel.h>	/* For printk/panic/... */
@@ -563,15 +564,18 @@ static int pcipcwd_ioctl(struct inode *i
 static int pcipcwd_open(struct inode *inode, struct file *file)
 {
 	/* /dev/watchdog can only be opened once */
+	lock_kernel();
 	if (test_and_set_bit(0, &is_active)) {
 		if (debug >= VERBOSE)
 			printk(KERN_ERR PFX "Attempt to open already opened device.\n");
+		unlock_kernel();
 		return -EBUSY;
 	}
 
 	/* Activate */
 	pcipcwd_start();
 	pcipcwd_keepalive();
+	unlock_kernel();
 	return nonseekable_open(inode, file);
 }
 
Index: linux-2.6/drivers/watchdog/pcwd_usb.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/pcwd_usb.c
+++ linux-2.6/drivers/watchdog/pcwd_usb.c
@@ -26,6 +26,7 @@
 
 #include <linux/module.h>	/* For module specific items */
 #include <linux/moduleparam.h>	/* For new moduleparam's */
+#include <linux/smp_lock.h>	/* For lock_kernel */
 #include <linux/types.h>	/* For standard types (like size_t) */
 #include <linux/errno.h>	/* For the -ENODEV/... values */
 #include <linux/kernel.h>	/* For printk/panic/... */
@@ -460,12 +461,16 @@ static int usb_pcwd_ioctl(struct inode *
 static int usb_pcwd_open(struct inode *inode, struct file *file)
 {
 	/* /dev/watchdog can only be opened once */
-	if (test_and_set_bit(0, &is_active))
+	lock_kernel();
+	if (test_and_set_bit(0, &is_active)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	/* Activate */
 	usb_pcwd_start(usb_pcwd_device);
 	usb_pcwd_keepalive(usb_pcwd_device);
+	unlock_kernel();
 	return nonseekable_open(inode, file);
 }
 
Index: linux-2.6/drivers/watchdog/pnx4008_wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/pnx4008_wdt.c
+++ linux-2.6/drivers/watchdog/pnx4008_wdt.c
@@ -16,6 +16,7 @@
 
 #include <linux/module.h>
 #include <linux/moduleparam.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/kernel.h>
 #include <linux/fs.h>
@@ -134,12 +135,16 @@ static void wdt_disable(void)
 
 static int pnx4008_wdt_open(struct inode *inode, struct file *file)
 {
-	if (test_and_set_bit(WDT_IN_USE, &wdt_status))
+	lock_kernel();
+	if (test_and_set_bit(WDT_IN_USE, &wdt_status)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	clear_bit(WDT_OK_TO_CLOSE, &wdt_status);
 
 	wdt_enable();
+	unlock_kernel();
 
 	return nonseekable_open(inode, file);
 }
Index: linux-2.6/drivers/watchdog/rm9k_wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/rm9k_wdt.c
+++ linux-2.6/drivers/watchdog/rm9k_wdt.c
@@ -28,6 +28,7 @@
 #include <linux/reboot.h>
 #include <linux/notifier.h>
 #include <linux/miscdevice.h>
+#include <linux/smp_lock.h>
 #include <linux/watchdog.h>
 #include <asm/io.h>
 #include <asm/atomic.h>
@@ -182,8 +183,11 @@ static int wdt_gpi_open(struct inode *in
 {
 	int res;
 
-	if (unlikely(atomic_dec_if_positive(&opencnt) < 0))
+	lock_kernel();
+	if (unlikely(atomic_dec_if_positive(&opencnt) < 0)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	expect_close = 0;
 	if (locked) {
@@ -194,14 +198,17 @@ static int wdt_gpi_open(struct inode *in
 
 	res = request_irq(wd_irq, wdt_gpi_irqhdl, IRQF_SHARED | IRQF_DISABLED,
 			  wdt_gpi_name, &miscdev);
-	if (unlikely(res))
+	if (unlikely(res)) {
+		unlock_kernel();
 		return res;
+	}
 
 	wdt_gpi_set_timeout(timeout);
 	wdt_gpi_start();
 
 	printk(KERN_INFO "%s: watchdog started, timeout = %u seconds\n",
 		wdt_gpi_name, timeout);
+	unlock_kernel();
 	return nonseekable_open(inode, file);
 }
 
Index: linux-2.6/drivers/watchdog/s3c2410_wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/s3c2410_wdt.c
+++ linux-2.6/drivers/watchdog/s3c2410_wdt.c
@@ -37,6 +37,7 @@
 
 #include <linux/module.h>
 #include <linux/moduleparam.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/timer.h>
 #include <linux/miscdevice.h>
@@ -211,8 +212,11 @@ static int s3c2410wdt_set_heartbeat(int 
 
 static int s3c2410wdt_open(struct inode *inode, struct file *file)
 {
-	if(down_trylock(&open_lock))
+	lock_kernel();
+	if(down_trylock(&open_lock)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	if (nowayout)
 		__module_get(THIS_MODULE);
@@ -221,6 +225,7 @@ static int s3c2410wdt_open(struct inode 
 
 	/* start the timer */
 	s3c2410wdt_start();
+	unlock_kernel();
 	return nonseekable_open(inode, file);
 }
 
Index: linux-2.6/drivers/watchdog/sa1100_wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/sa1100_wdt.c
+++ linux-2.6/drivers/watchdog/sa1100_wdt.c
@@ -19,6 +19,7 @@
  */
 #include <linux/module.h>
 #include <linux/moduleparam.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/kernel.h>
 #include <linux/fs.h>
@@ -45,14 +46,18 @@ static int boot_status;
  */
 static int sa1100dog_open(struct inode *inode, struct file *file)
 {
-	if (test_and_set_bit(1,&sa1100wdt_users))
+	lock_kernel();
+	if (test_and_set_bit(1,&sa1100wdt_users)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	/* Activate SA1100 Watchdog timer */
 	OSMR3 = OSCR + pre_margin;
 	OSSR = OSSR_M3;
 	OWER = OWER_WME;
 	OIER |= OIER_E3;
+	unlock_kernel();
 	return nonseekable_open(inode, file);
 }
 
Index: linux-2.6/drivers/watchdog/sb_wdog.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/sb_wdog.c
+++ linux-2.6/drivers/watchdog/sb_wdog.c
@@ -45,6 +45,7 @@
  */
 #include <linux/module.h>
 #include <linux/io.h>
+#include <linux/smp_lock.h>
 #include <linux/uaccess.h>
 #include <linux/fs.h>
 #include <linux/reboot.h>
@@ -96,8 +97,10 @@ static struct watchdog_info ident = {
  */
 static int sbwdog_open(struct inode *inode, struct file *file)
 {
+	lock_kernel();
 	nonseekable_open(inode, file);
 	if (test_and_set_bit(0, &sbwdog_gate)) {
+		unlock_kernel();
 		return -EBUSY;
 	}
 	__module_get(THIS_MODULE);
@@ -107,6 +110,7 @@ static int sbwdog_open(struct inode *ino
 	 */
 	sbwdog_set(user_dog, timeout);
 	__raw_writeb(1, user_dog);
+	unlock_kernel();
 
 	return 0;
 }
Index: linux-2.6/drivers/watchdog/sbc60xxwdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/sbc60xxwdt.c
+++ linux-2.6/drivers/watchdog/sbc60xxwdt.c
@@ -46,6 +46,7 @@
 
 #include <linux/module.h>
 #include <linux/moduleparam.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/timer.h>
 #include <linux/jiffies.h>
@@ -191,15 +192,19 @@ static ssize_t fop_write(struct file * f
 
 static int fop_open(struct inode * inode, struct file * file)
 {
+	lock_kernel();
 	/* Just in case we're already talking to someone... */
-	if(test_and_set_bit(0, &wdt_is_open))
+	if(test_and_set_bit(0, &wdt_is_open)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	if (nowayout)
 		__module_get(THIS_MODULE);
 
 	/* Good, fire up the show */
 	wdt_startup();
+	unlock_kernel();
 	return nonseekable_open(inode, file);
 }
 
Index: linux-2.6/drivers/watchdog/sbc7240_wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/sbc7240_wdt.c
+++ linux-2.6/drivers/watchdog/sbc7240_wdt.c
@@ -25,6 +25,7 @@
 #include <linux/miscdevice.h>
 #include <linux/notifier.h>
 #include <linux/reboot.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/watchdog.h>
 #include <asm/atomic.h>
@@ -136,10 +137,14 @@ static ssize_t fop_write(struct file *fi
 
 static int fop_open(struct inode *inode, struct file *file)
 {
-	if (test_and_set_bit(SBC7240_OPEN_STATUS_BIT, &wdt_status))
+	lock_kernel();
+	if (test_and_set_bit(SBC7240_OPEN_STATUS_BIT, &wdt_status)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	wdt_enable();
+	unlock_kernel();
 
 	return nonseekable_open(inode, file);
 }
Index: linux-2.6/drivers/watchdog/sbc8360.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/sbc8360.c
+++ linux-2.6/drivers/watchdog/sbc8360.c
@@ -37,6 +37,7 @@
  */
 
 #include <linux/module.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/miscdevice.h>
 #include <linux/watchdog.h>
@@ -257,9 +258,11 @@ static ssize_t sbc8360_write(struct file
 
 static int sbc8360_open(struct inode *inode, struct file *file)
 {
+	lock_kernel();
 	spin_lock(&sbc8360_lock);
 	if (test_and_set_bit(0, &sbc8360_is_open)) {
 		spin_unlock(&sbc8360_lock);
+		unlock_kernel();
 		return -EBUSY;
 	}
 	if (nowayout)
@@ -269,6 +272,7 @@ static int sbc8360_open(struct inode *in
 	spin_unlock(&sbc8360_lock);
 	sbc8360_activate();
 	sbc8360_ping();
+	unlock_kernel();
 	return nonseekable_open(inode, file);
 }
 
Index: linux-2.6/drivers/watchdog/sbc_epx_c3.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/sbc_epx_c3.c
+++ linux-2.6/drivers/watchdog/sbc_epx_c3.c
@@ -15,6 +15,7 @@
 
 #include <linux/module.h>
 #include <linux/moduleparam.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/kernel.h>
 #include <linux/fs.h>
@@ -63,8 +64,11 @@ static void epx_c3_pet(void)
  */
 static int epx_c3_open(struct inode *inode, struct file *file)
 {
-	if (epx_c3_alive)
+	lock_kernel();
+	if (epx_c3_alive) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	if (nowayout)
 		__module_get(THIS_MODULE);
@@ -75,6 +79,7 @@ static int epx_c3_open(struct inode *ino
 
 	epx_c3_alive = 1;
 	printk(KERN_INFO "Started watchdog timer.\n");
+	unlock_kernel();
 
 	return nonseekable_open(inode, file);
 }
Index: linux-2.6/drivers/watchdog/sc1200wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/sc1200wdt.c
+++ linux-2.6/drivers/watchdog/sc1200wdt.c
@@ -33,6 +33,7 @@
 #include <linux/watchdog.h>
 #include <linux/ioport.h>
 #include <linux/spinlock.h>
+#include <linux/smp_lock.h>
 #include <linux/notifier.h>
 #include <linux/reboot.h>
 #include <linux/init.h>
@@ -151,14 +152,18 @@ static inline int sc1200wdt_status(void)
 static int sc1200wdt_open(struct inode *inode, struct file *file)
 {
 	/* allow one at a time */
-	if (down_trylock(&open_sem))
+	lock_kernel();
+	if (down_trylock(&open_sem)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	if (timeout > MAX_TIMEOUT)
 		timeout = MAX_TIMEOUT;
 
 	sc1200wdt_start();
 	printk(KERN_INFO PFX "Watchdog enabled, timeout = %d min(s)", timeout);
+	unlock_kernel();
 
 	return nonseekable_open(inode, file);
 }
Index: linux-2.6/drivers/watchdog/sc520_wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/sc520_wdt.c
+++ linux-2.6/drivers/watchdog/sc520_wdt.c
@@ -55,6 +55,7 @@
 #include <linux/module.h>
 #include <linux/moduleparam.h>
 #include <linux/types.h>
+#include <linux/smp_lock.h>
 #include <linux/timer.h>
 #include <linux/miscdevice.h>
 #include <linux/watchdog.h>
@@ -249,13 +250,17 @@ static ssize_t fop_write(struct file * f
 static int fop_open(struct inode * inode, struct file * file)
 {
 	/* Just in case we're already talking to someone... */
-	if(test_and_set_bit(0, &wdt_is_open))
+	lock_kernel();
+	if(test_and_set_bit(0, &wdt_is_open)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 	if (nowayout)
 		__module_get(THIS_MODULE);
 
 	/* Good, fire up the show */
 	wdt_startup();
+	unlock_kernel();
 	return nonseekable_open(inode, file);
 }
 
Index: linux-2.6/drivers/watchdog/scx200_wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/scx200_wdt.c
+++ linux-2.6/drivers/watchdog/scx200_wdt.c
@@ -21,6 +21,7 @@
 #include <linux/moduleparam.h>
 #include <linux/init.h>
 #include <linux/miscdevice.h>
+#include <linux/smp_lock.h>
 #include <linux/watchdog.h>
 #include <linux/notifier.h>
 #include <linux/reboot.h>
@@ -92,9 +93,13 @@ static void scx200_wdt_disable(void)
 static int scx200_wdt_open(struct inode *inode, struct file *file)
 {
 	/* only allow one at a time */
-	if (down_trylock(&open_semaphore))
+	lock_kernel();
+	if (down_trylock(&open_semaphore)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 	scx200_wdt_enable();
+	unlock_kernel();
 
 	return nonseekable_open(inode, file);
 }
Index: linux-2.6/drivers/watchdog/shwdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/shwdt.c
+++ linux-2.6/drivers/watchdog/shwdt.c
@@ -22,6 +22,7 @@
 #include <linux/init.h>
 #include <linux/types.h>
 #include <linux/miscdevice.h>
+#include <linux/smp_lock.h>
 #include <linux/watchdog.h>
 #include <linux/reboot.h>
 #include <linux/notifier.h>
@@ -194,12 +195,16 @@ static void sh_wdt_ping(unsigned long da
  */
 static int sh_wdt_open(struct inode *inode, struct file *file)
 {
-	if (test_and_set_bit(0, &shwdt_is_open))
+	lock_kernel();
+	if (test_and_set_bit(0, &shwdt_is_open)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 	if (nowayout)
 		__module_get(THIS_MODULE);
 
 	sh_wdt_start();
+	unlock_kernel();
 
 	return nonseekable_open(inode, file);
 }
Index: linux-2.6/drivers/watchdog/smsc37b787_wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/smsc37b787_wdt.c
+++ linux-2.6/drivers/watchdog/smsc37b787_wdt.c
@@ -45,6 +45,7 @@
 
 #include <linux/module.h>
 #include <linux/moduleparam.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/miscdevice.h>
 #include <linux/watchdog.h>
@@ -346,8 +347,11 @@ static int wb_smsc_wdt_open(struct inode
 {
 	/* /dev/watchdog can only be opened once */
 
-	if (test_and_set_bit(0, &timer_enabled))
+	lock_kernel();
+	if (test_and_set_bit(0, &timer_enabled)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	if (nowayout)
 		__module_get(THIS_MODULE);
@@ -356,6 +360,7 @@ static int wb_smsc_wdt_open(struct inode
 	wb_smsc_wdt_enable();
 
 	printk(KERN_INFO MODNAME "Watchdog enabled. Timeout set to %d %s.\n", timeout, (unit == UNIT_SECOND) ? "second(s)" : "minute(s)");
+	unlock_kernel();
 
 	return nonseekable_open(inode, file);
 }
Index: linux-2.6/drivers/watchdog/softdog.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/softdog.c
+++ linux-2.6/drivers/watchdog/softdog.c
@@ -38,6 +38,7 @@
 
 #include <linux/module.h>
 #include <linux/moduleparam.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/timer.h>
 #include <linux/miscdevice.h>
@@ -132,14 +133,18 @@ static int softdog_set_heartbeat(int t)
 
 static int softdog_open(struct inode *inode, struct file *file)
 {
-	if (test_and_set_bit(0, &driver_open))
+	lock_kernel();
+	if (test_and_set_bit(0, &driver_open)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 	if (!test_and_clear_bit(0, &orphan_timer))
 		__module_get(THIS_MODULE);
 	/*
 	 *	Activate timer
 	 */
 	softdog_keepalive();
+	unlock_kernel();
 	return nonseekable_open(inode, file);
 }
 
Index: linux-2.6/drivers/watchdog/txx9wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/txx9wdt.c
+++ linux-2.6/drivers/watchdog/txx9wdt.c
@@ -9,6 +9,7 @@
  */
 #include <linux/module.h>
 #include <linux/moduleparam.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/miscdevice.h>
 #include <linux/watchdog.h>
@@ -70,8 +71,11 @@ static void txx9wdt_stop(void)
 
 static int txx9wdt_open(struct inode *inode, struct file *file)
 {
-	if (test_and_set_bit(0, &txx9wdt_alive))
+	lock_kernel();
+	if (test_and_set_bit(0, &txx9wdt_alive)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	if (__raw_readl(&txx9wdt_reg->tcr) & TXx9_TMTCR_TCE) {
 		clear_bit(0, &txx9wdt_alive);
@@ -82,6 +86,7 @@ static int txx9wdt_open(struct inode *in
 		__module_get(THIS_MODULE);
 
 	txx9wdt_start();
+	unlock_kernel();
 	return nonseekable_open(inode, file);
 }
 
Index: linux-2.6/drivers/watchdog/w83627hf_wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/w83627hf_wdt.c
+++ linux-2.6/drivers/watchdog/w83627hf_wdt.c
@@ -28,6 +28,7 @@
 
 #include <linux/module.h>
 #include <linux/moduleparam.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/miscdevice.h>
 #include <linux/watchdog.h>
@@ -256,13 +257,17 @@ wdt_ioctl(struct inode *inode, struct fi
 static int
 wdt_open(struct inode *inode, struct file *file)
 {
-	if (test_and_set_bit(0, &wdt_is_open))
+	lock_kernel();
+	if (test_and_set_bit(0, &wdt_is_open)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 	/*
 	 *	Activate
 	 */
 
 	wdt_ping();
+	unlock_kernel();
 	return nonseekable_open(inode, file);
 }
 
Index: linux-2.6/drivers/watchdog/w83697hf_wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/w83697hf_wdt.c
+++ linux-2.6/drivers/watchdog/w83697hf_wdt.c
@@ -27,6 +27,7 @@
 
 #include <linux/module.h>
 #include <linux/moduleparam.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/miscdevice.h>
 #include <linux/watchdog.h>
@@ -280,13 +281,17 @@ wdt_ioctl(struct inode *inode, struct fi
 static int
 wdt_open(struct inode *inode, struct file *file)
 {
-	if (test_and_set_bit(0, &wdt_is_open))
+	lock_kernel();
+	if (test_and_set_bit(0, &wdt_is_open)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 	/*
 	 *	Activate
 	 */
 
 	wdt_enable();
+	unlock_kernel();
 	return nonseekable_open(inode, file);
 }
 
Index: linux-2.6/drivers/watchdog/w83877f_wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/w83877f_wdt.c
+++ linux-2.6/drivers/watchdog/w83877f_wdt.c
@@ -41,6 +41,7 @@
 
 #include <linux/module.h>
 #include <linux/moduleparam.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/timer.h>
 #include <linux/jiffies.h>
@@ -214,11 +215,15 @@ static ssize_t fop_write(struct file * f
 static int fop_open(struct inode * inode, struct file * file)
 {
 	/* Just in case we're already talking to someone... */
-	if(test_and_set_bit(0, &wdt_is_open))
+	lock_kernel();
+	if(test_and_set_bit(0, &wdt_is_open)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	/* Good, fire up the show */
 	wdt_startup();
+	unlock_kernel();
 	return nonseekable_open(inode, file);
 }
 
Index: linux-2.6/drivers/watchdog/w83977f_wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/w83977f_wdt.c
+++ linux-2.6/drivers/watchdog/w83977f_wdt.c
@@ -17,6 +17,7 @@
 
 #include <linux/module.h>
 #include <linux/moduleparam.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/kernel.h>
 #include <linux/fs.h>
@@ -290,13 +291,17 @@ static int wdt_get_status(int *status)
 static int wdt_open(struct inode *inode, struct file *file)
 {
 	/* If the watchdog is alive we don't need to start it again */
-	if( test_and_set_bit(0, &timer_alive) )
+	lock_kernel();
+	if( test_and_set_bit(0, &timer_alive) ) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	if (nowayout)
 		__module_get(THIS_MODULE);
 
 	wdt_start();
+	unlock_kernel();
 	return nonseekable_open(inode, file);
 }
 
Index: linux-2.6/drivers/watchdog/wafer5823wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/wafer5823wdt.c
+++ linux-2.6/drivers/watchdog/wafer5823wdt.c
@@ -29,6 +29,7 @@
 #include <linux/module.h>
 #include <linux/moduleparam.h>
 #include <linux/miscdevice.h>
+#include <linux/smp_lock.h>
 #include <linux/watchdog.h>
 #include <linux/fs.h>
 #include <linux/ioport.h>
@@ -181,13 +182,17 @@ static int wafwdt_ioctl(struct inode *in
 
 static int wafwdt_open(struct inode *inode, struct file *file)
 {
-	if (test_and_set_bit(0, &wafwdt_is_open))
+	lock_kernel();
+	if (test_and_set_bit(0, &wafwdt_is_open)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	/*
 	 *      Activate
 	 */
 	wafwdt_start();
+	unlock_kernel();
 	return nonseekable_open(inode, file);
 }
 
Index: linux-2.6/drivers/watchdog/wdrtas.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/wdrtas.c
+++ linux-2.6/drivers/watchdog/wdrtas.c
@@ -33,6 +33,7 @@
 #include <linux/module.h>
 #include <linux/notifier.h>
 #include <linux/reboot.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/watchdog.h>
 
@@ -403,13 +404,16 @@ static int
 wdrtas_open(struct inode *inode, struct file *file)
 {
 	/* only open once */
+	lock_kernel();
 	if (atomic_inc_return(&wdrtas_miscdev_open) > 1) {
 		atomic_dec(&wdrtas_miscdev_open);
+		unlock_kernel();
 		return -EBUSY;
 	}
 
 	wdrtas_timer_start();
 	wdrtas_timer_keepalive();
+	unlock_kernel();
 
 	return nonseekable_open(inode, file);
 }
Index: linux-2.6/drivers/watchdog/wdt.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/wdt.c
+++ linux-2.6/drivers/watchdog/wdt.c
@@ -34,6 +34,7 @@
 #include <linux/interrupt.h>
 #include <linux/module.h>
 #include <linux/moduleparam.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/miscdevice.h>
 #include <linux/watchdog.h>
@@ -405,12 +406,16 @@ static int wdt_ioctl(struct inode *inode
 
 static int wdt_open(struct inode *inode, struct file *file)
 {
-	if(test_and_set_bit(0, &wdt_is_open))
+	lock_kernel();
+	if(test_and_set_bit(0, &wdt_is_open)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 	/*
 	 *	Activate
 	 */
 	wdt_start();
+	unlock_kernel();
 	return nonseekable_open(inode, file);
 }
 
Index: linux-2.6/drivers/watchdog/wdt285.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/wdt285.c
+++ linux-2.6/drivers/watchdog/wdt285.c
@@ -18,6 +18,7 @@
 #include <linux/module.h>
 #include <linux/moduleparam.h>
 #include <linux/types.h>
+#include <linux/smp_lock.h>
 #include <linux/kernel.h>
 #include <linux/fs.h>
 #include <linux/mm.h>
@@ -69,11 +70,16 @@ static int watchdog_open(struct inode *i
 {
 	int ret;
 
-	if (*CSR_SA110_CNTL & (1 << 13))
+	lock_kernel();
+	if (*CSR_SA110_CNTL & (1 << 13)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
-	if (test_and_set_bit(1, &timer_alive))
+	if (test_and_set_bit(1, &timer_alive)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	reload = soft_margin * (mem_fclk_21285 / 256);
 
@@ -98,6 +104,7 @@ static int watchdog_open(struct inode *i
 	ret = 0;
 #endif
 	nonseekable_open(inode, file);
+	unlock_kernel();
 	return ret;
 }
 
Index: linux-2.6/drivers/watchdog/wdt977.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/wdt977.c
+++ linux-2.6/drivers/watchdog/wdt977.c
@@ -24,6 +24,7 @@
 
 #include <linux/module.h>
 #include <linux/moduleparam.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/kernel.h>
 #include <linux/fs.h>
@@ -258,13 +259,17 @@ static int wdt977_get_status(int *status
 static int wdt977_open(struct inode *inode, struct file *file)
 {
 	/* If the watchdog is alive we don't need to start it again */
-	if( test_and_set_bit(0,&timer_alive) )
+	lock_kernel();
+	if (test_and_set_bit(0,&timer_alive)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	if (nowayout)
 		__module_get(THIS_MODULE);
 
 	wdt977_start();
+	unlock_kernel();
 	return nonseekable_open(inode, file);
 }
 
Index: linux-2.6/drivers/watchdog/wdt_pci.c
===================================================================
--- linux-2.6.orig/drivers/watchdog/wdt_pci.c
+++ linux-2.6/drivers/watchdog/wdt_pci.c
@@ -38,6 +38,7 @@
 #include <linux/interrupt.h>
 #include <linux/module.h>
 #include <linux/moduleparam.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/miscdevice.h>
 #include <linux/watchdog.h>
@@ -426,8 +427,11 @@ static int wdtpci_ioctl(struct inode *in
 
 static int wdtpci_open(struct inode *inode, struct file *file)
 {
-	if (down_trylock(&open_sem))
+	lock_kernel();
+	if (down_trylock(&open_sem)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	if (nowayout) {
 		__module_get(THIS_MODULE);
@@ -436,6 +440,7 @@ static int wdtpci_open(struct inode *ino
 	 *	Activate
 	 */
 	wdtpci_start();
+	unlock_kernel();
 	return nonseekable_open(inode, file);
 }
 



* [PATCH 1/3, RFC] misc char dev BKL pushdown
  2008-05-19 23:07           ` Arnd Bergmann
       [not found]             ` <200805200111.47275.arnd@arndb.de>
@ 2008-05-19 23:26             ` Arnd Bergmann
  2008-05-20  0:07               ` Mike Frysinger
                                 ` (2 more replies)
  2008-05-19 23:34             ` [PATCH 3/3, RFC] remove BKL from misc_open() Arnd Bergmann
  2008-05-20 15:13             ` [PATCH, RFC] char dev BKL pushdown Jonathan Corbet
  3 siblings, 3 replies; 78+ messages in thread
From: Arnd Bergmann @ 2008-05-19 23:26 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner, Alan Cox, Alexander Viro, linux-kernel

The Big Kernel Lock has already been pushed down from chardev_open
to misc_open; this change moves it into the individual misc
driver open functions.

As before, the change is purely mechanical; most drivers
should not actually need the BKL. In particular, we still
hold misc_mtx while calling the driver's open() function.
The patch should probably be split into one changeset
per driver.
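
For readers unfamiliar with the pattern, the mechanical transformation described above can be sketched in user space. This is only an illustration under stated assumptions: a plain pthread mutex stands in for the BKL (it does not model the BKL's nesting or its release across schedule()), and every demo_* name is hypothetical and does not appear in any driver touched by this patch.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the BKL: a single global mutex. The kernel's
 * real interface is lock_kernel()/unlock_kernel() from <linux/smp_lock.h>. */
static pthread_mutex_t demo_bkl = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical single-open state, analogous to a driver's "in use" bit. */
static atomic_flag demo_wdt_in_use = ATOMIC_FLAG_INIT;
static int demo_wdt_running;

static void demo_lock_kernel(void)
{
	int rc = pthread_mutex_lock(&demo_bkl);
	assert(rc == 0);
	(void)rc;
}

static void demo_unlock_kernel(void)
{
	int rc = pthread_mutex_unlock(&demo_bkl);
	assert(rc == 0);
	(void)rc;
}

/* Pushed-down form: the caller no longer takes the lock, so open() takes it
 * itself and must drop it on every return path, including the early exit. */
static int demo_wdt_open(void)
{
	demo_lock_kernel();
	if (atomic_flag_test_and_set(&demo_wdt_in_use)) {
		demo_unlock_kernel();	/* early return must unlock as well */
		return -EBUSY;
	}

	demo_wdt_running = 1;		/* stands in for starting the hardware */
	demo_unlock_kernel();
	return 0;
}

/* Release clears the state so that a later open() can succeed again. */
static int demo_wdt_release(void)
{
	demo_wdt_running = 0;
	atomic_flag_clear(&demo_wdt_in_use);
	return 0;
}
```

The point of the sketch is the shape of the change, not its necessity: because the open check is already an atomic test-and-set, the outer lock adds nothing here, which is the sense in which most drivers should not actually need the BKL.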

Signed-off-by: Arnd Bergmann <arnd@arndb.de>

---
Index: linux-2.6/arch/arm/common/rtctime.c
===================================================================
--- linux-2.6.orig/arch/arm/common/rtctime.c
+++ linux-2.6/arch/arm/common/rtctime.c
@@ -16,6 +16,7 @@
 #include <linux/poll.h>
 #include <linux/proc_fs.h>
 #include <linux/miscdevice.h>
+#include <linux/smp_lock.h>
 #include <linux/spinlock.h>
 #include <linux/capability.h>
 #include <linux/device.h>
@@ -282,6 +283,7 @@ static int rtc_open(struct inode *inode,
 {
 	int ret;
 
+	lock_kernel();
 	mutex_lock(&rtc_mutex);
 
 	if (rtc_inuse) {
@@ -301,6 +303,7 @@ static int rtc_open(struct inode *inode,
 		}
 	}
 	mutex_unlock(&rtc_mutex);
+	unlock_kernel();
 
 	return ret;
 }
Index: linux-2.6/arch/blackfin/mach-bf561/coreb.c
===================================================================
--- linux-2.6.orig/arch/blackfin/mach-bf561/coreb.c
+++ linux-2.6/arch/blackfin/mach-bf561/coreb.c
@@ -32,6 +32,7 @@
 #include <linux/device.h>
 #include <linux/ioport.h>
 #include <linux/module.h>
+#include <linux/smp_lock.h>
 #include <linux/uaccess.h>
 #include <linux/fs.h>
 #include <asm/dma.h>
@@ -196,6 +197,7 @@ static loff_t coreb_lseek(struct file *f
 
 static int coreb_open(struct inode *inode, struct file *file)
 {
+	lock_kernel();
 	spin_lock_irq(&coreb_lock);
 
 	if (coreb_status & COREB_IS_OPEN)
@@ -204,10 +206,12 @@ static int coreb_open(struct inode *inod
 	coreb_status |= COREB_IS_OPEN;
 
 	spin_unlock_irq(&coreb_lock);
+	unlock_kernel();
 	return 0;
 
  out_busy:
 	spin_unlock_irq(&coreb_lock);
+	unlock_kernel();
 	return -EBUSY;
 }
 
Index: linux-2.6/arch/m68k/bvme6000/rtc.c
===================================================================
--- linux-2.6.orig/arch/m68k/bvme6000/rtc.c
+++ linux-2.6/arch/m68k/bvme6000/rtc.c
@@ -10,6 +10,7 @@
 #include <linux/errno.h>
 #include <linux/miscdevice.h>
 #include <linux/slab.h>
+#include <linux/smp_lock.h>
 #include <linux/ioport.h>
 #include <linux/capability.h>
 #include <linux/fcntl.h>
@@ -140,10 +141,14 @@ static int rtc_ioctl(struct inode *inode
 
 static int rtc_open(struct inode *inode, struct file *file)
 {
-	if(rtc_status)
+	lock_kernel();
+	if(rtc_status) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	rtc_status = 1;
+	unlock_kernel();
 	return 0;
 }
 
Index: linux-2.6/arch/m68k/mvme16x/rtc.c
===================================================================
--- linux-2.6.orig/arch/m68k/mvme16x/rtc.c
+++ linux-2.6/arch/m68k/mvme16x/rtc.c
@@ -10,6 +10,7 @@
 #include <linux/errno.h>
 #include <linux/miscdevice.h>
 #include <linux/slab.h>
+#include <linux/smp_lock.h>
 #include <linux/ioport.h>
 #include <linux/capability.h>
 #include <linux/fcntl.h>
@@ -127,11 +128,14 @@ static int rtc_ioctl(struct inode *inode
 
 static int rtc_open(struct inode *inode, struct file *file)
 {
+	lock_kernel();
 	if( !atomic_dec_and_test(&rtc_ready) )
 	{
 		atomic_inc( &rtc_ready );
+		unlock_kernel();
 		return -EBUSY;
 	}
+	unlock_kernel();
 
 	return 0;
 }
Index: linux-2.6/arch/mips/basler/excite/excite_iodev.c
===================================================================
--- linux-2.6.orig/arch/mips/basler/excite/excite_iodev.c
+++ linux-2.6/arch/mips/basler/excite/excite_iodev.c
@@ -26,6 +26,7 @@
 #include <linux/interrupt.h>
 #include <linux/platform_device.h>
 #include <linux/miscdevice.h>
+#include <linux/smp_lock.h>
 
 #include "excite_iodev.h"
 
@@ -110,8 +111,14 @@ static int __exit iodev_remove(struct de
 
 static int iodev_open(struct inode *i, struct file *f)
 {
-	return request_irq(iodev_irq, iodev_irqhdl, IRQF_DISABLED,
+	int ret;
+
+	lock_kernel();
+	ret = request_irq(iodev_irq, iodev_irqhdl, IRQF_DISABLED,
 			   iodev_name, &miscdev);
+	unlock_kernel();
+
+	return ret;
 }
 
 static int iodev_release(struct inode *i, struct file *f)
Index: linux-2.6/arch/parisc/kernel/perf.c
===================================================================
--- linux-2.6.orig/arch/parisc/kernel/perf.c
+++ linux-2.6/arch/parisc/kernel/perf.c
@@ -46,6 +46,7 @@
 #include <linux/init.h>
 #include <linux/proc_fs.h>
 #include <linux/miscdevice.h>
+#include <linux/smp_lock.h>
 #include <linux/spinlock.h>
 
 #include <asm/uaccess.h>
@@ -260,13 +261,16 @@ printk("Preparing to start counters\n");
  */
 static int perf_open(struct inode *inode, struct file *file)
 {
+	lock_kernel();
 	spin_lock(&perf_lock);
 	if (perf_enabled) {
 		spin_unlock(&perf_lock);
+		unlock_kernel();
 		return -EBUSY;
 	}
 	perf_enabled = 1;
  	spin_unlock(&perf_lock);
+	unlock_kernel();
 
 	return 0;
 }
Index: linux-2.6/arch/s390/crypto/prng.c
===================================================================
--- linux-2.6.orig/arch/s390/crypto/prng.c
+++ linux-2.6/arch/s390/crypto/prng.c
@@ -6,6 +6,7 @@
 #include <linux/fs.h>
 #include <linux/init.h>
 #include <linux/kernel.h>
+#include <linux/smp_lock.h>
 #include <linux/miscdevice.h>
 #include <linux/module.h>
 #include <linux/moduleparam.h>
@@ -48,6 +49,7 @@ static unsigned char parm_block[32] = {
 
 static int prng_open(struct inode *inode, struct file *file)
 {
+	cycle_kernel_lock();
 	return nonseekable_open(inode, file);
 }
 
Index: linux-2.6/arch/sparc/kernel/apc.c
===================================================================
--- linux-2.6.orig/arch/sparc/kernel/apc.c
+++ linux-2.6/arch/sparc/kernel/apc.c
@@ -10,6 +10,7 @@
 #include <linux/errno.h>
 #include <linux/init.h>
 #include <linux/miscdevice.h>
+#include <linux/smp_lock.h>
 #include <linux/pm.h>
 
 #include <asm/io.h>
@@ -75,6 +76,7 @@ static inline void apc_free(void)
 
 static int apc_open(struct inode *inode, struct file *f)
 {
+	cycle_kernel_lock();
 	return 0;
 }
 
Index: linux-2.6/arch/sparc64/kernel/time.c
===================================================================
--- linux-2.6.orig/arch/sparc64/kernel/time.c
+++ linux-2.6/arch/sparc64/kernel/time.c
@@ -11,6 +11,7 @@
 #include <linux/errno.h>
 #include <linux/module.h>
 #include <linux/sched.h>
+#include <linux/smp_lock.h>
 #include <linux/kernel.h>
 #include <linux/param.h>
 #include <linux/string.h>
@@ -1659,10 +1660,14 @@ static int mini_rtc_ioctl(struct inode *
 
 static int mini_rtc_open(struct inode *inode, struct file *file)
 {
-	if (mini_rtc_status & RTC_IS_OPEN)
+	lock_kernel();
+	if (mini_rtc_status & RTC_IS_OPEN) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	mini_rtc_status |= RTC_IS_OPEN;
+	unlock_kernel();
 
 	return 0;
 }
Index: linux-2.6/arch/um/drivers/harddog_kern.c
===================================================================
--- linux-2.6.orig/arch/um/drivers/harddog_kern.c
+++ linux-2.6/arch/um/drivers/harddog_kern.c
@@ -66,6 +66,7 @@ static int harddog_open(struct inode *in
 	int err = -EBUSY;
 	char *sock = NULL;
 
+	lock_kernel();
 	spin_lock(&lock);
 	if(timer_alive)
 		goto err;
@@ -82,9 +83,11 @@ static int harddog_open(struct inode *in
 
 	timer_alive = 1;
 	spin_unlock(&lock);
+	unlock_kernel();
 	return nonseekable_open(inode, file);
 err:
 	spin_unlock(&lock);
+	unlock_kernel();
 	return err;
 }
 
Index: linux-2.6/arch/um/drivers/mmapper_kern.c
===================================================================
--- linux-2.6.orig/arch/um/drivers/mmapper_kern.c
+++ linux-2.6/arch/um/drivers/mmapper_kern.c
@@ -16,6 +16,7 @@
 #include <linux/miscdevice.h>
 #include <linux/module.h>
 #include <linux/mm.h>
+#include <linux/smp_lock.h>
 #include <asm/uaccess.h>
 #include "mem_user.h"
 
@@ -77,6 +78,7 @@ out:
 
 static int mmapper_open(struct inode *inode, struct file *file)
 {
+	cycle_kernel_lock();
 	return 0;
 }
 
Index: linux-2.6/arch/um/drivers/random.c
===================================================================
--- linux-2.6.orig/arch/um/drivers/random.c
+++ linux-2.6/arch/um/drivers/random.c
@@ -7,6 +7,7 @@
  * of the GNU General Public License, incorporated herein by reference.
  */
 #include <linux/sched.h>
+#include <linux/smp_lock.h>
 #include <linux/module.h>
 #include <linux/fs.h>
 #include <linux/interrupt.h>
@@ -33,6 +34,8 @@ static DECLARE_WAIT_QUEUE_HEAD(host_read
 
 static int rng_dev_open (struct inode *inode, struct file *filp)
 {
+	cycle_kernel_lock();
+
 	/* enforce read-only access to this chrdev */
 	if ((filp->f_mode & FMODE_READ) == 0)
 		return -EINVAL;
Index: linux-2.6/arch/x86/kernel/apm_32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apm_32.c
+++ linux-2.6/arch/x86/kernel/apm_32.c
@@ -204,6 +204,7 @@
 #include <linux/module.h>
 
 #include <linux/poll.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/stddef.h>
 #include <linux/timer.h>
@@ -1544,10 +1545,12 @@ static int do_open(struct inode *inode, 
 {
 	struct apm_user *as;
 
+	lock_kernel();
 	as = kmalloc(sizeof(*as), GFP_KERNEL);
 	if (as == NULL) {
 		printk(KERN_ERR "apm: cannot allocate struct of size %d bytes\n",
 		       sizeof(*as));
+		unlock_kernel();
 		return -ENOMEM;
 	}
 	as->magic = APM_BIOS_MAGIC;
@@ -1569,6 +1572,7 @@ static int do_open(struct inode *inode, 
 	user_list = as;
 	spin_unlock(&user_list_lock);
 	filp->private_data = as;
+	unlock_kernel();
 	return 0;
 }
 
Index: linux-2.6/arch/x86/kernel/cpu/mcheck/mce_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/mcheck/mce_64.c
+++ linux-2.6/arch/x86/kernel/cpu/mcheck/mce_64.c
@@ -9,6 +9,7 @@
 #include <linux/types.h>
 #include <linux/kernel.h>
 #include <linux/sched.h>
+#include <linux/smp_lock.h>
 #include <linux/string.h>
 #include <linux/rcupdate.h>
 #include <linux/kallsyms.h>
@@ -527,10 +528,12 @@ static int open_exclu;	/* already open e
 
 static int mce_open(struct inode *inode, struct file *file)
 {
+	lock_kernel();
 	spin_lock(&mce_state_lock);
 
 	if (open_exclu || (open_count && (file->f_flags & O_EXCL))) {
 		spin_unlock(&mce_state_lock);
+		unlock_kernel();
 		return -EBUSY;
 	}
 
@@ -539,6 +542,7 @@ static int mce_open(struct inode *inode,
 	open_count++;
 
 	spin_unlock(&mce_state_lock);
+	unlock_kernel();
 
 	return nonseekable_open(inode, file);
 }
Index: linux-2.6/arch/x86/kernel/microcode.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/microcode.c
+++ linux-2.6/arch/x86/kernel/microcode.c
@@ -75,6 +75,7 @@
 #include <linux/kernel.h>
 #include <linux/init.h>
 #include <linux/sched.h>
+#include <linux/smp_lock.h>
 #include <linux/cpumask.h>
 #include <linux/module.h>
 #include <linux/slab.h>
@@ -422,6 +423,7 @@ out:
 
 static int microcode_open (struct inode *unused1, struct file *unused2)
 {
+	cycle_kernel_lock();
 	return capable(CAP_SYS_RAWIO) ? 0 : -EPERM;
 }
 
Index: linux-2.6/drivers/bluetooth/hci_vhci.c
===================================================================
--- linux-2.6.orig/drivers/bluetooth/hci_vhci.c
+++ linux-2.6/drivers/bluetooth/hci_vhci.c
@@ -28,6 +28,7 @@
 #include <linux/kernel.h>
 #include <linux/init.h>
 #include <linux/slab.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/errno.h>
 #include <linux/sched.h>
@@ -263,9 +264,11 @@ static int vhci_open(struct inode *inode
 	skb_queue_head_init(&data->readq);
 	init_waitqueue_head(&data->read_wait);
 
+	lock_kernel();
 	hdev = hci_alloc_dev();
 	if (!hdev) {
 		kfree(data);
+		unlock_kernel();
 		return -ENOMEM;
 	}
 
@@ -286,10 +289,12 @@ static int vhci_open(struct inode *inode
 		BT_ERR("Can't register HCI device");
 		kfree(data);
 		hci_free_dev(hdev);
+		unlock_kernel();
 		return -EBUSY;
 	}
 
 	file->private_data = data;
+	unlock_kernel();
 
 	return nonseekable_open(inode, file);
 }
Index: linux-2.6/drivers/char/agp/frontend.c
===================================================================
--- linux-2.6.orig/drivers/char/agp/frontend.c
+++ linux-2.6/drivers/char/agp/frontend.c
@@ -39,6 +39,7 @@
 #include <linux/mm.h>
 #include <linux/fs.h>
 #include <linux/sched.h>
+#include <linux/smp_lock.h>
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
 #include "agp.h"
@@ -677,6 +678,7 @@ static int agp_open(struct inode *inode,
 	struct agp_client *client;
 	int rc = -ENXIO;
 
+	lock_kernel();
 	mutex_lock(&(agp_fe.agp_mutex));
 
 	if (minor != AGPGART_MINOR)
@@ -703,12 +705,14 @@ static int agp_open(struct inode *inode,
 	agp_insert_file_private(priv);
 	DBG("private=%p, client=%p", priv, client);
 	mutex_unlock(&(agp_fe.agp_mutex));
+	unlock_kernel();
 	return 0;
 
 err_out_nomem:
 	rc = -ENOMEM;
 err_out:
 	mutex_unlock(&(agp_fe.agp_mutex));
+	unlock_kernel();
 	return rc;
 }
 
Index: linux-2.6/drivers/char/apm-emulation.c
===================================================================
--- linux-2.6.orig/drivers/char/apm-emulation.c
+++ linux-2.6/drivers/char/apm-emulation.c
@@ -13,6 +13,7 @@
 #include <linux/module.h>
 #include <linux/poll.h>
 #include <linux/slab.h>
+#include <linux/smp_lock.h>
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
 #include <linux/miscdevice.h>
@@ -416,6 +417,7 @@ static int apm_open(struct inode * inode
 {
 	struct apm_user *as;
 
+	lock_kernel();
 	as = kzalloc(sizeof(*as), GFP_KERNEL);
 	if (as) {
 		/*
@@ -435,6 +437,7 @@ static int apm_open(struct inode * inode
 
 		filp->private_data = as;
 	}
+	unlock_kernel();
 
 	return as ? 0 : -ENOMEM;
 }
Index: linux-2.6/drivers/char/briq_panel.c
===================================================================
--- linux-2.6.orig/drivers/char/briq_panel.c
+++ linux-2.6/drivers/char/briq_panel.c
@@ -6,6 +6,7 @@
 
 #include <linux/module.h>
 
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/errno.h>
 #include <linux/tty.h>
@@ -67,11 +68,15 @@ static void set_led(char state)
 
 static int briq_panel_open(struct inode *ino, struct file *filep)
 {
-	/* enforce single access */
-	if (vfd_is_open)
+	lock_kernel();
+	/* enforce single access, vfd_is_open is protected by BKL */
+	if (vfd_is_open) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 	vfd_is_open = 1;
 
+	unlock_kernel();
 	return 0;
 }
 
Index: linux-2.6/drivers/char/ds1286.c
===================================================================
--- linux-2.6.orig/drivers/char/ds1286.c
+++ linux-2.6/drivers/char/ds1286.c
@@ -27,6 +27,7 @@
  * option) any later version.
  */
 #include <linux/ds1286.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/errno.h>
 #include <linux/miscdevice.h>
@@ -252,6 +253,7 @@ static int ds1286_ioctl(struct inode *in
 
 static int ds1286_open(struct inode *inode, struct file *file)
 {
+	lock_kernel();
 	spin_lock_irq(&ds1286_lock);
 
 	if (ds1286_status & RTC_IS_OPEN)
@@ -260,10 +262,12 @@ static int ds1286_open(struct inode *ino
 	ds1286_status |= RTC_IS_OPEN;
 
 	spin_unlock_irq(&ds1286_lock);
+	unlock_kernel();
 	return 0;
 
 out_busy:
 	spin_lock_irq(&ds1286_lock);
+	unlock_kernel();
 	return -EBUSY;
 }
 
Index: linux-2.6/drivers/char/ds1620.c
===================================================================
--- linux-2.6.orig/drivers/char/ds1620.c
+++ linux-2.6/drivers/char/ds1620.c
@@ -8,6 +8,7 @@
 #include <linux/proc_fs.h>
 #include <linux/capability.h>
 #include <linux/init.h>
+#include <linux/smp_lock.h>
 
 #include <asm/hardware.h>
 #include <asm/mach-types.h>
@@ -208,6 +209,12 @@ static void ds1620_read_state(struct the
 	therm->hi = cvt_9_to_int(ds1620_in(THERM_READ_TH, 9));
 }
 
+static int ds1620_open(struct inode *inode, struct file *file)
+{
+	cycle_kernel_lock();
+	return nonseekable_open(inode, file);
+}
+
 static ssize_t
 ds1620_read(struct file *file, char __user *buf, size_t count, loff_t *ptr)
 {
@@ -336,7 +343,7 @@ static struct proc_dir_entry *proc_therm
 
 static const struct file_operations ds1620_fops = {
 	.owner		= THIS_MODULE,
-	.open		= nonseekable_open,
+	.open		= ds1620_open,
 	.read		= ds1620_read,
 	.ioctl		= ds1620_ioctl,
 };
Index: linux-2.6/drivers/char/efirtc.c
===================================================================
--- linux-2.6.orig/drivers/char/efirtc.c
+++ linux-2.6/drivers/char/efirtc.c
@@ -28,6 +28,7 @@
  */
 
 
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/errno.h>
 #include <linux/miscdevice.h>
@@ -272,6 +273,7 @@ efi_rtc_open(struct inode *inode, struct
 	 * We do accept multiple open files at the same time as we
 	 * synchronize on the per call operation.
 	 */
+	cycle_kernel_lock();
 	return 0;
 }
 
Index: linux-2.6/drivers/char/genrtc.c
===================================================================
--- linux-2.6.orig/drivers/char/genrtc.c
+++ linux-2.6/drivers/char/genrtc.c
@@ -51,6 +51,7 @@
 #include <linux/init.h>
 #include <linux/poll.h>
 #include <linux/proc_fs.h>
+#include <linux/smp_lock.h>
 #include <linux/workqueue.h>
 
 #include <asm/uaccess.h>
@@ -338,12 +339,16 @@ static int gen_rtc_ioctl(struct inode *i
 
 static int gen_rtc_open(struct inode *inode, struct file *file)
 {
-	if (gen_rtc_status & RTC_IS_OPEN)
+	lock_kernel();
+	if (gen_rtc_status & RTC_IS_OPEN) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	gen_rtc_status |= RTC_IS_OPEN;
 	gen_rtc_irq_data = 0;
 	irq_active = 0;
+	unlock_kernel();
 
 	return 0;
 }
Index: linux-2.6/drivers/char/hpet.c
===================================================================
--- linux-2.6.orig/drivers/char/hpet.c
+++ linux-2.6/drivers/char/hpet.c
@@ -14,6 +14,7 @@
 #include <linux/interrupt.h>
 #include <linux/module.h>
 #include <linux/kernel.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/miscdevice.h>
 #include <linux/major.h>
@@ -193,6 +194,7 @@ static int hpet_open(struct inode *inode
 	if (file->f_mode & FMODE_WRITE)
 		return -EINVAL;
 
+	lock_kernel();
 	spin_lock_irq(&hpet_lock);
 
 	for (devp = NULL, hpetp = hpets; hpetp && !devp; hpetp = hpetp->hp_next)
@@ -207,6 +209,7 @@ static int hpet_open(struct inode *inode
 
 	if (!devp) {
 		spin_unlock_irq(&hpet_lock);
+		unlock_kernel();
 		return -EBUSY;
 	}
 
@@ -214,6 +217,7 @@ static int hpet_open(struct inode *inode
 	devp->hd_irqdata = 0;
 	devp->hd_flags |= HPET_OPEN;
 	spin_unlock_irq(&hpet_lock);
+	unlock_kernel();
 
 	return 0;
 }
Index: linux-2.6/drivers/char/hw_random/core.c
===================================================================
--- linux-2.6.orig/drivers/char/hw_random/core.c
+++ linux-2.6/drivers/char/hw_random/core.c
@@ -37,6 +37,7 @@
 #include <linux/kernel.h>
 #include <linux/fs.h>
 #include <linux/sched.h>
+#include <linux/smp_lock.h>
 #include <linux/init.h>
 #include <linux/miscdevice.h>
 #include <linux/delay.h>
@@ -86,6 +87,7 @@ static int rng_dev_open(struct inode *in
 		return -EINVAL;
 	if (filp->f_mode & FMODE_WRITE)
 		return -EINVAL;
+	cycle_kernel_lock();
 	return 0;
 }
 
Index: linux-2.6/drivers/char/ip27-rtc.c
===================================================================
--- linux-2.6.orig/drivers/char/ip27-rtc.c
+++ linux-2.6/drivers/char/ip27-rtc.c
@@ -27,6 +27,7 @@
 #include <linux/bcd.h>
 #include <linux/module.h>
 #include <linux/kernel.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/miscdevice.h>
 #include <linux/ioport.h>
@@ -163,15 +164,18 @@ static long rtc_ioctl(struct file *filp,
 
 static int rtc_open(struct inode *inode, struct file *file)
 {
+	lock_kernel();
 	spin_lock_irq(&rtc_lock);
 
 	if (rtc_status & RTC_IS_OPEN) {
 		spin_unlock_irq(&rtc_lock);
+		unlock_kernel();
 		return -EBUSY;
 	}
 
 	rtc_status |= RTC_IS_OPEN;
 	spin_unlock_irq(&rtc_lock);
+	unlock_kernel();
 
 	return 0;
 }
Index: linux-2.6/drivers/char/ipmi/ipmi_watchdog.c
===================================================================
--- linux-2.6.orig/drivers/char/ipmi/ipmi_watchdog.c
+++ linux-2.6/drivers/char/ipmi/ipmi_watchdog.c
@@ -35,6 +35,7 @@
 #include <linux/moduleparam.h>
 #include <linux/ipmi.h>
 #include <linux/ipmi_smi.h>
+#include <linux/smp_lock.h>
 #include <linux/watchdog.h>
 #include <linux/miscdevice.h>
 #include <linux/init.h>
@@ -819,6 +820,8 @@ static int ipmi_open(struct inode *ino, 
 		if (test_and_set_bit(0, &ipmi_wdog_open))
 			return -EBUSY;
 
+		cycle_kernel_lock();
+
 		/*
 		 * Don't start the timer now, let it start on the
 		 * first heartbeat.
Index: linux-2.6/drivers/char/lcd.c
===================================================================
--- linux-2.6.orig/drivers/char/lcd.c
+++ linux-2.6/drivers/char/lcd.c
@@ -20,6 +20,7 @@
 #include <linux/mc146818rtc.h>
 #include <linux/netdevice.h>
 #include <linux/sched.h>
+#include <linux/smp_lock.h>
 #include <linux/delay.h>
 
 #include <asm/io.h>
@@ -414,6 +415,8 @@ static int lcd_ioctl(struct inode *inode
 
 static int lcd_open(struct inode *inode, struct file *file)
 {
+	cycle_kernel_lock();
+
 	if (!lcd_present)
 		return -ENXIO;
 	else
Index: linux-2.6/drivers/char/mwave/mwavedd.c
===================================================================
--- linux-2.6.orig/drivers/char/mwave/mwavedd.c
+++ linux-2.6/drivers/char/mwave/mwavedd.c
@@ -56,6 +56,7 @@
 #include <linux/serial.h>
 #include <linux/sched.h>
 #include <linux/spinlock.h>
+#include <linux/smp_lock.h>
 #include <linux/delay.h>
 #include <linux/serial_8250.h>
 #include "smapi.h"
@@ -100,6 +101,7 @@ static int mwave_open(struct inode *inod
 	PRINTK_2(TRACE_MWAVE,
 		"mwavedd::mwave_open, exit return retval %x\n", retval);
 
+	cycle_kernel_lock();
 	return retval;
 }
 
Index: linux-2.6/drivers/char/nvram.c
===================================================================
--- linux-2.6.orig/drivers/char/nvram.c
+++ linux-2.6/drivers/char/nvram.c
@@ -107,6 +107,7 @@
 #include <linux/init.h>
 #include <linux/proc_fs.h>
 #include <linux/spinlock.h>
+#include <linux/smp_lock.h>
 
 #include <asm/io.h>
 #include <asm/uaccess.h>
@@ -333,12 +334,14 @@ nvram_ioctl(struct inode *inode, struct 
 static int
 nvram_open(struct inode *inode, struct file *file)
 {
+	lock_kernel();
 	spin_lock(&nvram_state_lock);
 
 	if ((nvram_open_cnt && (file->f_flags & O_EXCL)) ||
 	    (nvram_open_mode & NVRAM_EXCL) ||
 	    ((file->f_mode & 2) && (nvram_open_mode & NVRAM_WRITE))) {
 		spin_unlock(&nvram_state_lock);
+		unlock_kernel();
 		return -EBUSY;
 	}
 
@@ -349,6 +352,7 @@ nvram_open(struct inode *inode, struct f
 	nvram_open_cnt++;
 
 	spin_unlock(&nvram_state_lock);
+	unlock_kernel();
 
 	return 0;
 }
Index: linux-2.6/drivers/char/rtc.c
===================================================================
--- linux-2.6.orig/drivers/char/rtc.c
+++ linux-2.6/drivers/char/rtc.c
@@ -73,6 +73,7 @@
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
 #include <linux/spinlock.h>
+#include <linux/smp_lock.h>
 #include <linux/sysctl.h>
 #include <linux/wait.h>
 #include <linux/bcd.h>
@@ -733,6 +734,7 @@ static int rtc_ioctl(struct inode *inode
  * needed here. Or anywhere else in this driver. */
 static int rtc_open(struct inode *inode, struct file *file)
 {
+	lock_kernel();
 	spin_lock_irq(&rtc_lock);
 
 	if (rtc_status & RTC_IS_OPEN)
@@ -742,10 +744,12 @@ static int rtc_open(struct inode *inode,
 
 	rtc_irq_data = 0;
 	spin_unlock_irq(&rtc_lock);
+	unlock_kernel();
 	return 0;
 
 out_busy:
 	spin_unlock_irq(&rtc_lock);
+	unlock_kernel();
 	return -EBUSY;
 }
 
Index: linux-2.6/drivers/char/sonypi.c
===================================================================
--- linux-2.6.orig/drivers/char/sonypi.c
+++ linux-2.6/drivers/char/sonypi.c
@@ -49,6 +49,7 @@
 #include <linux/err.h>
 #include <linux/kfifo.h>
 #include <linux/platform_device.h>
+#include <linux/smp_lock.h>
 
 #include <asm/uaccess.h>
 #include <asm/io.h>
@@ -906,12 +907,14 @@ static int sonypi_misc_release(struct in
 
 static int sonypi_misc_open(struct inode *inode, struct file *file)
 {
+	lock_kernel();
 	mutex_lock(&sonypi_device.lock);
 	/* Flush input queue on first open */
 	if (!sonypi_device.open_count)
 		kfifo_reset(sonypi_device.fifo);
 	sonypi_device.open_count++;
 	mutex_unlock(&sonypi_device.lock);
+	unlock_kernel();
 	return 0;
 }
 
Index: linux-2.6/drivers/char/tpm/tpm.c
===================================================================
--- linux-2.6.orig/drivers/char/tpm/tpm.c
+++ linux-2.6/drivers/char/tpm/tpm.c
@@ -26,6 +26,7 @@
 #include <linux/poll.h>
 #include <linux/mutex.h>
 #include <linux/spinlock.h>
+#include <linux/smp_lock.h>
 
 #include "tpm.h"
 
@@ -897,6 +898,7 @@ int tpm_open(struct inode *inode, struct
 	int rc = 0, minor = iminor(inode);
 	struct tpm_chip *chip = NULL, *pos;
 
+	lock_kernel();
 	spin_lock(&driver_lock);
 
 	list_for_each_entry(pos, &tpm_chip_list, list) {
@@ -926,16 +928,19 @@ int tpm_open(struct inode *inode, struct
 	if (chip->data_buffer == NULL) {
 		chip->num_opens--;
 		put_device(chip->dev);
+		unlock_kernel();
 		return -ENOMEM;
 	}
 
 	atomic_set(&chip->data_pending, 0);
 
 	file->private_data = chip;
+	unlock_kernel();
 	return 0;
 
 err_out:
 	spin_unlock(&driver_lock);
+	unlock_kernel();
 	return rc;
 }
 EXPORT_SYMBOL_GPL(tpm_open);
Index: linux-2.6/drivers/infiniband/core/ucma.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/core/ucma.c
+++ linux-2.6/drivers/infiniband/core/ucma.c
@@ -38,6 +38,7 @@
 #include <linux/in.h>
 #include <linux/in6.h>
 #include <linux/miscdevice.h>
+#include <linux/smp_lock.h>
 
 #include <rdma/rdma_user_cm.h>
 #include <rdma/ib_marshall.h>
@@ -1156,6 +1157,7 @@ static int ucma_open(struct inode *inode
 	if (!file)
 		return -ENOMEM;
 
+	lock_kernel();
 	INIT_LIST_HEAD(&file->event_list);
 	INIT_LIST_HEAD(&file->ctx_list);
 	init_waitqueue_head(&file->poll_wait);
@@ -1163,6 +1165,7 @@ static int ucma_open(struct inode *inode
 
 	filp->private_data = file;
 	file->filp = filp;
+	unlock_kernel();
 	return 0;
 }
 
Index: linux-2.6/drivers/input/misc/hp_sdc_rtc.c
===================================================================
--- linux-2.6.orig/drivers/input/misc/hp_sdc_rtc.c
+++ linux-2.6/drivers/input/misc/hp_sdc_rtc.c
@@ -35,6 +35,7 @@
 
 #include <linux/hp_sdc.h>
 #include <linux/errno.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/init.h>
 #include <linux/module.h>
@@ -408,6 +409,7 @@ static unsigned int hp_sdc_rtc_poll(stru
 
 static int hp_sdc_rtc_open(struct inode *inode, struct file *file)
 {
+	cycle_kernel_lock();
         return 0;
 }
 
Index: linux-2.6/drivers/input/misc/uinput.c
===================================================================
--- linux-2.6.orig/drivers/input/misc/uinput.c
+++ linux-2.6/drivers/input/misc/uinput.c
@@ -37,6 +37,7 @@
 #include <linux/fs.h>
 #include <linux/miscdevice.h>
 #include <linux/uinput.h>
+#include <linux/smp_lock.h>
 
 static int uinput_dev_event(struct input_dev *dev, unsigned int type, unsigned int code, int value)
 {
@@ -222,6 +223,7 @@ static int uinput_open(struct inode *ino
 	if (!newdev)
 		return -ENOMEM;
 
+	lock_kernel();
 	mutex_init(&newdev->mutex);
 	spin_lock_init(&newdev->requests_lock);
 	init_waitqueue_head(&newdev->requests_waitq);
@@ -229,6 +231,7 @@ static int uinput_open(struct inode *ino
 	newdev->state = UIST_NEW_DEVICE;
 
 	file->private_data = newdev;
+	unlock_kernel();
 
 	return 0;
 }
Index: linux-2.6/drivers/input/mousedev.c
===================================================================
--- linux-2.6.orig/drivers/input/mousedev.c
+++ linux-2.6/drivers/input/mousedev.c
@@ -14,6 +14,7 @@
 #define MOUSEDEV_MIX		31
 
 #include <linux/slab.h>
+#include <linux/smp_lock.h>
 #include <linux/poll.h>
 #include <linux/module.h>
 #include <linux/init.h>
@@ -545,16 +546,21 @@ static int mousedev_open(struct inode *i
 	if (i >= MOUSEDEV_MINORS)
 		return -ENODEV;
 
+	lock_kernel();
 	error = mutex_lock_interruptible(&mousedev_table_mutex);
-	if (error)
+	if (error) {
+		unlock_kernel();
 		return error;
+	}
 	mousedev = mousedev_table[i];
 	if (mousedev)
 		get_device(&mousedev->dev);
 	mutex_unlock(&mousedev_table_mutex);
 
-	if (!mousedev)
+	if (!mousedev) {
+		unlock_kernel();
 		return -ENODEV;
+	}
 
 	client = kzalloc(sizeof(struct mousedev_client), GFP_KERNEL);
 	if (!client) {
@@ -573,6 +579,7 @@ static int mousedev_open(struct inode *i
 		goto err_free_client;
 
 	file->private_data = client;
+	unlock_kernel();
 	return 0;
 
  err_free_client:
@@ -580,6 +587,7 @@ static int mousedev_open(struct inode *i
 	kfree(client);
  err_put_mousedev:
 	put_device(&mousedev->dev);
+	unlock_kernel();
 	return error;
 }
 
Index: linux-2.6/drivers/input/serio/serio_raw.c
===================================================================
--- linux-2.6.orig/drivers/input/serio/serio_raw.c
+++ linux-2.6/drivers/input/serio/serio_raw.c
@@ -10,6 +10,7 @@
  */
 
 #include <linux/slab.h>
+#include <linux/smp_lock.h>
 #include <linux/poll.h>
 #include <linux/module.h>
 #include <linux/serio.h>
@@ -81,9 +82,10 @@ static int serio_raw_open(struct inode *
 	struct serio_raw_list *list;
 	int retval = 0;
 
+	lock_kernel();
 	retval = mutex_lock_interruptible(&serio_raw_mutex);
 	if (retval)
-		return retval;
+		goto out_bkl;
 
 	if (!(serio_raw = serio_raw_locate(iminor(inode)))) {
 		retval = -ENODEV;
@@ -108,6 +110,8 @@ static int serio_raw_open(struct inode *
 
 out:
 	mutex_unlock(&serio_raw_mutex);
+out_bkl:
+	unlock_kernel();
 	return retval;
 }
 
Index: linux-2.6/drivers/macintosh/ans-lcd.c
===================================================================
--- linux-2.6.orig/drivers/macintosh/ans-lcd.c
+++ linux-2.6/drivers/macintosh/ans-lcd.c
@@ -3,6 +3,7 @@
  */
 
 #include <linux/types.h>
+#include <linux/smp_lock.h>
 #include <linux/errno.h>
 #include <linux/kernel.h>
 #include <linux/miscdevice.h>
@@ -119,6 +120,7 @@ anslcd_ioctl( struct inode * inode, stru
 static int
 anslcd_open( struct inode * inode, struct file * file )
 {
+	cycle_kernel_lock();
 	return 0;
 }
 
Index: linux-2.6/drivers/macintosh/smu.c
===================================================================
--- linux-2.6.orig/drivers/macintosh/smu.c
+++ linux-2.6/drivers/macintosh/smu.c
@@ -19,6 +19,7 @@
  *    the userland interface
  */
 
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/kernel.h>
 #include <linux/device.h>
@@ -1083,10 +1084,12 @@ static int smu_open(struct inode *inode,
 	pp->mode = smu_file_commands;
 	init_waitqueue_head(&pp->wait);
 
+	lock_kernel();
 	spin_lock_irqsave(&smu_clist_lock, flags);
 	list_add(&pp->list, &smu_clist);
 	spin_unlock_irqrestore(&smu_clist_lock, flags);
 	file->private_data = pp;
+	unlock_kernel();
 
 	return 0;
 }
Index: linux-2.6/drivers/macintosh/via-pmu.c
===================================================================
--- linux-2.6.orig/drivers/macintosh/via-pmu.c
+++ linux-2.6/drivers/macintosh/via-pmu.c
@@ -18,6 +18,7 @@
  *
  */
 #include <stdarg.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/errno.h>
 #include <linux/kernel.h>
@@ -2047,6 +2048,7 @@ pmu_open(struct inode *inode, struct fil
 	pp->rb_get = pp->rb_put = 0;
 	spin_lock_init(&pp->lock);
 	init_waitqueue_head(&pp->wait);
+	lock_kernel();
 	spin_lock_irqsave(&all_pvt_lock, flags);
 #if defined(CONFIG_INPUT_ADBHID) && defined(CONFIG_PMAC_BACKLIGHT)
 	pp->backlight_locker = 0;
@@ -2054,6 +2056,7 @@ pmu_open(struct inode *inode, struct fil
 	list_add(&pp->list, &all_pmu_pvt);
 	spin_unlock_irqrestore(&all_pvt_lock, flags);
 	file->private_data = pp;
+	unlock_kernel();
 	return 0;
 }
 
Index: linux-2.6/drivers/media/radio/miropcm20-rds.c
===================================================================
--- linux-2.6.orig/drivers/media/radio/miropcm20-rds.c
+++ linux-2.6/drivers/media/radio/miropcm20-rds.c
@@ -12,6 +12,7 @@
 #include <linux/module.h>
 #include <linux/init.h>
 #include <linux/slab.h>
+#include <linux/smp_lock.h>
 #include <linux/fs.h>
 #include <linux/miscdevice.h>
 #include <linux/delay.h>
@@ -27,13 +28,16 @@ static int rds_f_open(struct inode *in, 
 	if (rds_users)
 		return -EBUSY;
 
+	lock_kernel();
 	rds_users++;
 	if ((text_buffer=kmalloc(66, GFP_KERNEL)) == 0) {
 		rds_users--;
 		printk(KERN_NOTICE "aci-rds: Out of memory by open()...\n");
+		unlock_kernel();
 		return -ENOMEM;
 	}
 
+	unlock_kernel();
 	return 0;
 }
 
Index: linux-2.6/drivers/message/i2o/i2o_config.c
===================================================================
--- linux-2.6.orig/drivers/message/i2o/i2o_config.c
+++ linux-2.6/drivers/message/i2o/i2o_config.c
@@ -1061,6 +1061,7 @@ static int cfg_open(struct inode *inode,
 	if (!tmp)
 		return -ENOMEM;
 
+	lock_kernel();
 	file->private_data = (void *)(i2o_cfg_info_id++);
 	tmp->fp = file;
 	tmp->fasync = NULL;
@@ -1074,6 +1075,7 @@ static int cfg_open(struct inode *inode,
 	spin_lock_irqsave(&i2o_config_lock, flags);
 	open_files = tmp;
 	spin_unlock_irqrestore(&i2o_config_lock, flags);
+	unlock_kernel();
 
 	return 0;
 }
Index: linux-2.6/drivers/misc/hdpuftrs/hdpu_cpustate.c
===================================================================
--- linux-2.6.orig/drivers/misc/hdpuftrs/hdpu_cpustate.c
+++ linux-2.6/drivers/misc/hdpuftrs/hdpu_cpustate.c
@@ -17,6 +17,7 @@
 #include <linux/module.h>
 #include <linux/kernel.h>
 #include <linux/spinlock.h>
+#include <linux/smp_lock.h>
 #include <linux/miscdevice.h>
 #include <linux/proc_fs.h>
 #include <linux/hdpu_features.h>
@@ -151,7 +152,13 @@ static ssize_t cpustate_write(struct fil
 
 static int cpustate_open(struct inode *inode, struct file *file)
 {
-	return cpustate_get_ref((file->f_flags & O_EXCL));
+	int ret;
+
+	lock_kernel();
+	ret = cpustate_get_ref((file->f_flags & O_EXCL));
+	unlock_kernel();
+
+	return ret;
 }
 
 static int cpustate_release(struct inode *inode, struct file *file)
Index: linux-2.6/drivers/misc/sony-laptop.c
===================================================================
--- linux-2.6.orig/drivers/misc/sony-laptop.c
+++ linux-2.6/drivers/misc/sony-laptop.c
@@ -46,6 +46,7 @@
 #include <linux/module.h>
 #include <linux/moduleparam.h>
 #include <linux/init.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/backlight.h>
 #include <linux/platform_device.h>
@@ -1927,8 +1928,10 @@ static int sonypi_misc_release(struct in
 static int sonypi_misc_open(struct inode *inode, struct file *file)
 {
 	/* Flush input queue on first open */
+	lock_kernel();
 	if (atomic_inc_return(&sonypi_compat.open_count) == 1)
 		kfifo_reset(sonypi_compat.fifo);
+	unlock_kernel();
 	return 0;
 }
 
Index: linux-2.6/drivers/net/tun.c
===================================================================
--- linux-2.6.orig/drivers/net/tun.c
+++ linux-2.6/drivers/net/tun.c
@@ -48,6 +48,7 @@
 #include <linux/kernel.h>
 #include <linux/major.h>
 #include <linux/slab.h>
+#include <linux/smp_lock.h>
 #include <linux/poll.h>
 #include <linux/fcntl.h>
 #include <linux/init.h>
@@ -797,6 +798,7 @@ static int tun_chr_fasync(int fd, struct
 
 static int tun_chr_open(struct inode *inode, struct file * file)
 {
+	cycle_kernel_lock();
 	DBG1(KERN_INFO "tunX: tun_chr_open\n");
 	file->private_data = NULL;
 	return 0;
Index: linux-2.6/drivers/parisc/eisa_eeprom.c
===================================================================
--- linux-2.6.orig/drivers/parisc/eisa_eeprom.c
+++ linux-2.6/drivers/parisc/eisa_eeprom.c
@@ -24,6 +24,7 @@
 #include <linux/kernel.h>
 #include <linux/miscdevice.h>
 #include <linux/slab.h>
+#include <linux/smp_lock.h>
 #include <linux/fs.h>
 #include <asm/io.h>
 #include <asm/uaccess.h>
@@ -83,6 +84,8 @@ static int eisa_eeprom_ioctl(struct inod
 
 static int eisa_eeprom_open(struct inode *inode, struct file *file)
 {
+	cycle_kernel_lock();
+
 	if (file->f_mode & 2)
 		return -EINVAL;
    
Index: linux-2.6/drivers/rtc/rtc-m41t80.c
===================================================================
--- linux-2.6.orig/drivers/rtc/rtc-m41t80.c
+++ linux-2.6/drivers/rtc/rtc-m41t80.c
@@ -17,6 +17,7 @@
 #include <linux/init.h>
 #include <linux/kernel.h>
 #include <linux/slab.h>
+#include <linux/smp_lock.h>
 #include <linux/string.h>
 #include <linux/i2c.h>
 #include <linux/rtc.h>
@@ -655,12 +656,16 @@ static int wdt_ioctl(struct inode *inode
 static int wdt_open(struct inode *inode, struct file *file)
 {
 	if (MINOR(inode->i_rdev) == WATCHDOG_MINOR) {
-		if (test_and_set_bit(0, &wdt_is_open))
+		lock_kernel();
+		if (test_and_set_bit(0, &wdt_is_open)) {
+			unlock_kernel();
 			return -EBUSY;
+		}
 		/*
 		 *	Activate
 		 */
 		wdt_is_open = 1;
+		unlock_kernel();
 		return 0;
 	}
 	return -ENODEV;
Index: linux-2.6/drivers/s390/block/dasd_eer.c
===================================================================
--- linux-2.6.orig/drivers/s390/block/dasd_eer.c
+++ linux-2.6/drivers/s390/block/dasd_eer.c
@@ -15,6 +15,7 @@
 #include <linux/device.h>
 #include <linux/poll.h>
 #include <linux/mutex.h>
+#include <linux/smp_lock.h>
 
 #include <asm/uaccess.h>
 #include <asm/atomic.h>
@@ -525,6 +526,7 @@ static int dasd_eer_open(struct inode *i
 	eerb = kzalloc(sizeof(struct eerbuffer), GFP_KERNEL);
 	if (!eerb)
 		return -ENOMEM;
+	lock_kernel();
 	eerb->buffer_page_count = eer_pages;
 	if (eerb->buffer_page_count < 1 ||
 	    eerb->buffer_page_count > INT_MAX / PAGE_SIZE) {
@@ -532,6 +534,7 @@ static int dasd_eer_open(struct inode *i
 		MESSAGE(KERN_WARNING, "can't open device since module "
 			"parameter eer_pages is smaller then 1 or"
 			" bigger then %d", (int)(INT_MAX / PAGE_SIZE));
+		unlock_kernel();
 		return -EINVAL;
 	}
 	eerb->buffersize = eerb->buffer_page_count * PAGE_SIZE;
@@ -539,12 +542,14 @@ static int dasd_eer_open(struct inode *i
 			       GFP_KERNEL);
         if (!eerb->buffer) {
 		kfree(eerb);
+		unlock_kernel();
                 return -ENOMEM;
 	}
 	if (dasd_eer_allocate_buffer_pages(eerb->buffer,
 					   eerb->buffer_page_count)) {
 		kfree(eerb->buffer);
 		kfree(eerb);
+		unlock_kernel();
 		return -ENOMEM;
 	}
 	filp->private_data = eerb;
@@ -552,6 +557,7 @@ static int dasd_eer_open(struct inode *i
 	list_add(&eerb->list, &bufferlist);
 	spin_unlock_irqrestore(&bufferlock, flags);
 
+	unlock_kernel();
 	return nonseekable_open(inp,filp);
 }
 
Index: linux-2.6/drivers/s390/char/monreader.c
===================================================================
--- linux-2.6.orig/drivers/s390/char/monreader.c
+++ linux-2.6/drivers/s390/char/monreader.c
@@ -340,6 +340,7 @@ static int mon_open(struct inode *inode,
 	/*
 	 * only one user allowed
 	 */
+	lock_kernel();
 	rc = -EBUSY;
 	if (test_and_set_bit(MON_IN_USE, &mon_in_use))
 		goto out;
@@ -377,6 +378,7 @@ static int mon_open(struct inode *inode,
 	}
 	P_INFO("open, established connection to *MONITOR service\n\n");
 	filp->private_data = monpriv;
+	unlock_kernel();
 	return nonseekable_open(inode, filp);
 
 out_path:
@@ -386,6 +388,7 @@ out_priv:
 out_use:
 	clear_bit(MON_IN_USE, &mon_in_use);
 out:
+	unlock_kernel();
 	return rc;
 }
 
Index: linux-2.6/drivers/s390/char/monwriter.c
===================================================================
--- linux-2.6.orig/drivers/s390/char/monwriter.c
+++ linux-2.6/drivers/s390/char/monwriter.c
@@ -12,6 +12,7 @@
 #include <linux/moduleparam.h>
 #include <linux/init.h>
 #include <linux/errno.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/kernel.h>
 #include <linux/miscdevice.h>
@@ -179,10 +180,12 @@ static int monwrite_open(struct inode *i
 	monpriv = kzalloc(sizeof(struct mon_private), GFP_KERNEL);
 	if (!monpriv)
 		return -ENOMEM;
+	lock_kernel();
 	INIT_LIST_HEAD(&monpriv->list);
 	monpriv->hdr_to_read = sizeof(monpriv->hdr);
 	mutex_init(&monpriv->thread_mutex);
 	filp->private_data = monpriv;
+	unlock_kernel();
 	return nonseekable_open(inode, filp);
 }
 
Index: linux-2.6/drivers/s390/char/vmcp.c
===================================================================
--- linux-2.6.orig/drivers/s390/char/vmcp.c
+++ linux-2.6/drivers/s390/char/vmcp.c
@@ -16,6 +16,7 @@
 #include <linux/kernel.h>
 #include <linux/miscdevice.h>
 #include <linux/module.h>
+#include <linux/smp_lock.h>
 #include <asm/cpcmd.h>
 #include <asm/debug.h>
 #include <asm/uaccess.h>
@@ -39,11 +40,14 @@ static int vmcp_open(struct inode *inode
 	session = kmalloc(sizeof(*session), GFP_KERNEL);
 	if (!session)
 		return -ENOMEM;
+
+	lock_kernel();
 	session->bufsize = PAGE_SIZE;
 	session->response = NULL;
 	session->resp_size = 0;
 	mutex_init(&session->mutex);
 	file->private_data = session;
+	unlock_kernel();
 	return nonseekable_open(inode, file);
 }
 
Index: linux-2.6/drivers/s390/char/vmwatchdog.c
===================================================================
--- linux-2.6.orig/drivers/s390/char/vmwatchdog.c
+++ linux-2.6/drivers/s390/char/vmwatchdog.c
@@ -13,6 +13,7 @@
 #include <linux/module.h>
 #include <linux/moduleparam.h>
 #include <linux/watchdog.h>
+#include <linux/smp_lock.h>
 
 #include <asm/ebcdic.h>
 #include <asm/io.h>
@@ -131,11 +132,15 @@ static int __init vmwdt_probe(void)
 static int vmwdt_open(struct inode *i, struct file *f)
 {
 	int ret;
-	if (test_and_set_bit(0, &vmwdt_is_open))
+	lock_kernel();
+	if (test_and_set_bit(0, &vmwdt_is_open)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 	ret = vmwdt_keepalive();
 	if (ret)
 		clear_bit(0, &vmwdt_is_open);
+	unlock_kernel();
 	return ret ? ret : nonseekable_open(i, f);
 }
 
Index: linux-2.6/drivers/s390/crypto/zcrypt_api.c
===================================================================
--- linux-2.6.orig/drivers/s390/crypto/zcrypt_api.c
+++ linux-2.6/drivers/s390/crypto/zcrypt_api.c
@@ -34,6 +34,7 @@
 #include <linux/fs.h>
 #include <linux/proc_fs.h>
 #include <linux/compat.h>
+#include <linux/smp_lock.h>
 #include <asm/atomic.h>
 #include <asm/uaccess.h>
 #include <linux/hw_random.h>
@@ -300,7 +301,9 @@ static ssize_t zcrypt_write(struct file 
  */
 static int zcrypt_open(struct inode *inode, struct file *filp)
 {
+	lock_kernel();
 	atomic_inc(&zcrypt_open_count);
+	unlock_kernel();
 	return 0;
 }
 
Index: linux-2.6/drivers/sbus/char/cpwatchdog.c
===================================================================
--- linux-2.6.orig/drivers/sbus/char/cpwatchdog.c
+++ linux-2.6/drivers/sbus/char/cpwatchdog.c
@@ -279,6 +279,7 @@ static inline int wd_opt_timeout(void)
 
 static int wd_open(struct inode *inode, struct file *f)
 {
+	lock_kernel();
 	switch(iminor(inode))
 	{
 		case WD0_MINOR:
@@ -291,6 +292,7 @@ static int wd_open(struct inode *inode, 
 			f->private_data = &wd_dev.watchdog[WD2_ID];
 			break;
 		default:
+			unlock_kernel();
 			return(-ENODEV);
 	}
 
@@ -304,11 +306,13 @@ static int wd_open(struct inode *inode, 
 						(void *)wd_dev.regs)) {
 			printk("%s: Cannot register IRQ %d\n", 
 				WD_OBPNAME, wd_dev.irq);
+			unlock_kernel();
 			return(-EBUSY);
 		}
 		wd_dev.initialized = 1;
 	}
 
+	unlock_kernel();
 	return(nonseekable_open(inode, f));
 }
 
Index: linux-2.6/drivers/sbus/char/display7seg.c
===================================================================
--- linux-2.6.orig/drivers/sbus/char/display7seg.c
+++ linux-2.6/drivers/sbus/char/display7seg.c
@@ -94,6 +94,7 @@ static int d7s_open(struct inode *inode,
 {
 	if (D7S_MINOR != iminor(inode))
 		return -ENODEV;
+	cycle_kernel_lock();
 	atomic_inc(&d7s_users);
 	return 0;
 }
Index: linux-2.6/drivers/sbus/char/envctrl.c
===================================================================
--- linux-2.6.orig/drivers/sbus/char/envctrl.c
+++ linux-2.6/drivers/sbus/char/envctrl.c
@@ -27,6 +27,7 @@
 #include <linux/miscdevice.h>
 #include <linux/kmod.h>
 #include <linux/reboot.h>
+#include <linux/smp_lock.h>
 
 #include <asm/ebus.h>
 #include <asm/uaccess.h>
@@ -694,6 +695,7 @@ envctrl_ioctl(struct file *file, unsigne
 static int
 envctrl_open(struct inode *inode, struct file *file)
 {
+	cycle_kernel_lock();
 	file->private_data = NULL;
 	return 0;
 }
Index: linux-2.6/drivers/sbus/char/flash.c
===================================================================
--- linux-2.6.orig/drivers/sbus/char/flash.c
+++ linux-2.6/drivers/sbus/char/flash.c
@@ -127,9 +127,13 @@ flash_read(struct file * file, char __us
 static int
 flash_open(struct inode *inode, struct file *file)
 {
-	if (test_and_set_bit(0, (void *)&flash.busy) != 0)
+	lock_kernel();
+	if (test_and_set_bit(0, (void *)&flash.busy) != 0) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
+	unlock_kernel();
 	return 0;
 }
 
Index: linux-2.6/drivers/sbus/char/jsflash.c
===================================================================
--- linux-2.6.orig/drivers/sbus/char/jsflash.c
+++ linux-2.6/drivers/sbus/char/jsflash.c
@@ -27,6 +27,7 @@
  */
 
 #include <linux/module.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/errno.h>
 #include <linux/miscdevice.h>
@@ -417,11 +418,17 @@ static int jsf_mmap(struct file * file, 
 
 static int jsf_open(struct inode * inode, struct file * filp)
 {
-
-	if (jsf0.base == 0) return -ENXIO;
-	if (test_and_set_bit(0, (void *)&jsf0.busy) != 0)
+	lock_kernel();
+	if (jsf0.base == 0) {
+		unlock_kernel();
+		return -ENXIO;
+	}
+	if (test_and_set_bit(0, (void *)&jsf0.busy) != 0) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
+	unlock_kernel();
 	return 0;	/* XXX What security? */
 }
 
Index: linux-2.6/drivers/sbus/char/openprom.c
===================================================================
--- linux-2.6.orig/drivers/sbus/char/openprom.c
+++ linux-2.6/drivers/sbus/char/openprom.c
@@ -33,6 +33,7 @@
 #include <linux/kernel.h>
 #include <linux/errno.h>
 #include <linux/slab.h>
+#include <linux/smp_lock.h>
 #include <linux/string.h>
 #include <linux/miscdevice.h>
 #include <linux/init.h>
@@ -689,9 +690,11 @@ static int openprom_open(struct inode * 
 	if (!data)
 		return -ENOMEM;
 
+	lock_kernel();
 	data->current_node = of_find_node_by_path("/");
 	data->lastnode = data->current_node;
 	file->private_data = (void *) data;
+	unlock_kernel();
 
 	return 0;
 }
Index: linux-2.6/drivers/sbus/char/riowatchdog.c
===================================================================
--- linux-2.6.orig/drivers/sbus/char/riowatchdog.c
+++ linux-2.6/drivers/sbus/char/riowatchdog.c
@@ -11,6 +11,7 @@
 #include <linux/errno.h>
 #include <linux/init.h>
 #include <linux/miscdevice.h>
+#include <linux/smp_lock.h>
 
 #include <asm/io.h>
 #include <asm/ebus.h>
@@ -116,6 +117,7 @@ static void riowd_starttimer(void)
 
 static int riowd_open(struct inode *inode, struct file *filp)
 {
+	cycle_kernel_lock();
 	nonseekable_open(inode, filp);
 	return 0;
 }
Index: linux-2.6/drivers/sbus/char/rtc.c
===================================================================
--- linux-2.6.orig/drivers/sbus/char/rtc.c
+++ linux-2.6/drivers/sbus/char/rtc.c
@@ -12,6 +12,7 @@
  */
 
 #include <linux/module.h>
+#include <linux/smp_lock.h>
 #include <linux/types.h>
 #include <linux/errno.h>
 #include <linux/miscdevice.h>
@@ -213,6 +214,7 @@ static int rtc_open(struct inode *inode,
 {
 	int ret;
 
+	lock_kernel();
 	spin_lock_irq(&mostek_lock);
 	if (rtc_busy) {
 		ret = -EBUSY;
@@ -221,6 +223,7 @@ static int rtc_open(struct inode *inode,
 		ret = 0;
 	}
 	spin_unlock_irq(&mostek_lock);
+	unlock_kernel();
 
 	return ret;
 }
Index: linux-2.6/drivers/sbus/char/uctrl.c
===================================================================
--- linux-2.6.orig/drivers/sbus/char/uctrl.c
+++ linux-2.6/drivers/sbus/char/uctrl.c
@@ -9,6 +9,7 @@
 #include <linux/delay.h>
 #include <linux/interrupt.h>
 #include <linux/slab.h>
+#include <linux/smp_lock.h>
 #include <linux/ioport.h>
 #include <linux/init.h>
 #include <linux/miscdevice.h>
@@ -211,8 +212,10 @@ uctrl_ioctl(struct inode *inode, struct 
 static int
 uctrl_open(struct inode *inode, struct file *file)
 {
+	lock_kernel();
 	uctrl_get_event_status();
 	uctrl_get_external_status();
+	unlock_kernel();
 	return 0;
 }
 
Index: linux-2.6/drivers/scsi/megaraid/megaraid_mm.c
===================================================================
--- linux-2.6.orig/drivers/scsi/megaraid/megaraid_mm.c
+++ linux-2.6/drivers/scsi/megaraid/megaraid_mm.c
@@ -15,6 +15,7 @@
  * Common management module
  */
 #include <linux/sched.h>
+#include <linux/smp_lock.h>
 #include "megaraid_mm.h"
 
 
@@ -96,6 +97,7 @@ mraid_mm_open(struct inode *inode, struc
 	 */
 	if (!capable(CAP_SYS_ADMIN)) return (-EACCES);
 
+	cycle_kernel_lock();
 	return 0;
 }
 
Index: linux-2.6/drivers/scsi/scsi_tgt_if.c
===================================================================
--- linux-2.6.orig/drivers/scsi/scsi_tgt_if.c
+++ linux-2.6/drivers/scsi/scsi_tgt_if.c
@@ -21,6 +21,7 @@
  */
 #include <linux/miscdevice.h>
 #include <linux/file.h>
+#include <linux/smp_lock.h>
 #include <net/tcp.h>
 #include <scsi/scsi.h>
 #include <scsi/scsi_cmnd.h>
@@ -321,6 +322,7 @@ static int tgt_open(struct inode *inode,
 {
 	tx_ring.tr_idx = rx_ring.tr_idx = 0;
 
+	cycle_kernel_lock();
 	return 0;
 }
 
Index: linux-2.6/fs/dlm/user.c
===================================================================
--- linux-2.6.orig/fs/dlm/user.c
+++ linux-2.6/fs/dlm/user.c
@@ -15,6 +15,7 @@
 #include <linux/poll.h>
 #include <linux/signal.h>
 #include <linux/spinlock.h>
+#include <linux/smp_lock.h>
 #include <linux/dlm.h>
 #include <linux/dlm_device.h>
 
@@ -618,13 +619,17 @@ static int device_open(struct inode *ino
 	struct dlm_user_proc *proc;
 	struct dlm_ls *ls;
 
+	lock_kernel();
 	ls = dlm_find_lockspace_device(iminor(inode));
-	if (!ls)
+	if (!ls) {
+		unlock_kernel();
 		return -ENOENT;
+	}
 
 	proc = kzalloc(sizeof(struct dlm_user_proc), GFP_KERNEL);
 	if (!proc) {
 		dlm_put_lockspace(ls);
+		unlock_kernel();
 		return -ENOMEM;
 	}
 
@@ -636,6 +641,7 @@ static int device_open(struct inode *ino
 	spin_lock_init(&proc->locks_spin);
 	init_waitqueue_head(&proc->wait);
 	file->private_data = proc;
+	unlock_kernel();
 
 	return 0;
 }
@@ -870,6 +876,7 @@ static unsigned int device_poll(struct f
 
 static int ctl_device_open(struct inode *inode, struct file *file)
 {
+	cycle_kernel_lock();
 	file->private_data = NULL;
 	return 0;
 }
Index: linux-2.6/fs/ocfs2/stack_user.c
===================================================================
--- linux-2.6.orig/fs/ocfs2/stack_user.c
+++ linux-2.6/fs/ocfs2/stack_user.c
@@ -21,6 +21,7 @@
 #include <linux/fs.h>
 #include <linux/miscdevice.h>
 #include <linux/mutex.h>
+#include <linux/smp_lock.h>
 #include <linux/reboot.h>
 #include <asm/uaccess.h>
 
@@ -619,10 +620,12 @@ static int ocfs2_control_open(struct ino
 		return -ENOMEM;
 	p->op_this_node = -1;
 
+	lock_kernel();
 	mutex_lock(&ocfs2_control_lock);
 	file->private_data = p;
 	list_add(&p->op_list, &ocfs2_control_private_list);
 	mutex_unlock(&ocfs2_control_lock);
+	unlock_kernel();
 
 	return 0;
 }
Index: linux-2.6/kernel/pm_qos_params.c
===================================================================
--- linux-2.6.orig/kernel/pm_qos_params.c
+++ linux-2.6/kernel/pm_qos_params.c
@@ -358,15 +358,19 @@ static int pm_qos_power_open(struct inod
 	int ret;
 	long pm_qos_class;
 
+	lock_kernel();
 	pm_qos_class = find_pm_qos_object_by_minor(iminor(inode));
 	if (pm_qos_class >= 0) {
 		filp->private_data = (void *)pm_qos_class;
 		sprintf(name, "process_%d", current->pid);
 		ret = pm_qos_add_requirement(pm_qos_class, name,
 					PM_QOS_DEFAULT_VALUE);
-		if (ret >= 0)
+		if (ret >= 0) {
+			unlock_kernel();
 			return 0;
+		}
 	}
+	unlock_kernel();
 
 	return -EPERM;
 }
Index: linux-2.6/kernel/power/user.c
===================================================================
--- linux-2.6.orig/kernel/power/user.c
+++ linux-2.6/kernel/power/user.c
@@ -9,6 +9,7 @@
  *
  */
 
+#include <linux/smp_lock.h>
 #include <linux/suspend.h>
 #include <linux/syscalls.h>
 #include <linux/reboot.h>
@@ -69,15 +70,20 @@ static int snapshot_open(struct inode *i
 	struct snapshot_data *data;
 	int error;
 
-	if (!atomic_add_unless(&snapshot_device_available, -1, 0))
+	lock_kernel();
+	if (!atomic_add_unless(&snapshot_device_available, -1, 0)) {
+		unlock_kernel();
 		return -EBUSY;
+	}
 
 	if ((filp->f_flags & O_ACCMODE) == O_RDWR) {
 		atomic_inc(&snapshot_device_available);
+		unlock_kernel();
 		return -ENOSYS;
 	}
 	if(create_basic_memory_bitmaps()) {
 		atomic_inc(&snapshot_device_available);
+		unlock_kernel();
 		return -ENOMEM;
 	}
 	nonseekable_open(inode, filp);
@@ -100,11 +106,13 @@ static int snapshot_open(struct inode *i
 	}
 	if (error) {
 		atomic_inc(&snapshot_device_available);
+		unlock_kernel();
 		return error;
 	}
 	data->frozen = 0;
 	data->ready = 0;
 	data->platform_support = 0;
+	unlock_kernel();
 
 	return 0;
 }
Index: linux-2.6/net/irda/irnet/irnet.h
===================================================================
--- linux-2.6.orig/net/irda/irnet/irnet.h
+++ linux-2.6/net/irda/irnet/irnet.h
@@ -241,6 +241,7 @@
 #include <linux/module.h>
 
 #include <linux/kernel.h>
+#include <linux/smp_lock.h>
 #include <linux/skbuff.h>
 #include <linux/tty.h>
 #include <linux/proc_fs.h>
Index: linux-2.6/net/irda/irnet/irnet_ppp.c
===================================================================
--- linux-2.6.orig/net/irda/irnet/irnet_ppp.c
+++ linux-2.6/net/irda/irnet/irnet_ppp.c
@@ -479,6 +479,7 @@ dev_irnet_open(struct inode *	inode,
   ap = kzalloc(sizeof(*ap), GFP_KERNEL);
   DABORT(ap == NULL, -ENOMEM, FS_ERROR, "Can't allocate struct irnet...\n");
 
+  lock_kernel();
   /* initialize the irnet structure */
   ap->file = file;
 
@@ -500,6 +501,7 @@ dev_irnet_open(struct inode *	inode,
     {
       DERROR(FS_ERROR, "Can't setup IrDA link...\n");
       kfree(ap);
+      unlock_kernel();
       return err;
     }
 
@@ -510,6 +512,7 @@ dev_irnet_open(struct inode *	inode,
   file->private_data = ap;
 
   DEXIT(FS_TRACE, " - ap=0x%p\n", ap);
+  unlock_kernel();
   return 0;
 }
 



* [PATCH 3/3, RFC] remove BKL from misc_open()
  2008-05-19 23:07           ` Arnd Bergmann
       [not found]             ` <200805200111.47275.arnd@arndb.de>
  2008-05-19 23:26             ` [PATCH 1/3, RFC] misc char " Arnd Bergmann
@ 2008-05-19 23:34             ` Arnd Bergmann
  2008-05-20 15:13             ` [PATCH, RFC] char dev BKL pushdown Jonathan Corbet
  3 siblings, 0 replies; 78+ messages in thread
From: Arnd Bergmann @ 2008-05-19 23:34 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner, Alan Cox, Alexander Viro, linux-kernel,
	Wim Van Sebroeck

Since every misc driver that has an open() function now takes the
BKL there, it no longer needs to be taken in the common
misc_open() function.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>

---
Index: linux-2.6/drivers/char/misc.c
===================================================================
--- linux-2.6.orig/drivers/char/misc.c
+++ linux-2.6/drivers/char/misc.c
@@ -50,7 +50,6 @@
 #include <linux/device.h>
 #include <linux/tty.h>
 #include <linux/kmod.h>
-#include <linux/smp_lock.h>
 
 /*
  * Head entry for the doubly linked miscdevice list
@@ -121,7 +120,6 @@ static int misc_open(struct inode * inod
 	int err = -ENODEV;
 	const struct file_operations *old_fops, *new_fops = NULL;
 	
-	lock_kernel();
 	mutex_lock(&misc_mtx);
 	
 	list_for_each_entry(c, &misc_list, list) {
@@ -159,7 +157,6 @@ static int misc_open(struct inode * inod
 	fops_put(old_fops);
 fail:
 	mutex_unlock(&misc_mtx);
-	unlock_kernel();
 	return err;
 }
 



* Re: [PATCH 1/3, RFC] misc char dev BKL pushdown
  2008-05-19 23:26             ` [PATCH 1/3, RFC] misc char " Arnd Bergmann
@ 2008-05-20  0:07               ` Mike Frysinger
  2008-05-20  0:21                 ` Jonathan Corbet
  2008-05-20  8:46               ` Alan Cox
  2008-05-20 23:01               ` Mike Frysinger
  2 siblings, 1 reply; 78+ messages in thread
From: Mike Frysinger @ 2008-05-20  0:07 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Jonathan Corbet, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Alan Cox, Alexander Viro,
	linux-kernel

On Mon, May 19, 2008 at 7:26 PM, Arnd Bergmann wrote:
> The Big Kernel Lock has been pushed down from chardev_open
> to misc_open, this change moves it to the individual misc
> driver open functions.
>
> As before, the change was purely mechanical, most drivers
> should actually not need the BKL. In particular, we still
> hold the misc_mtx() while calling the open() function
> The patch should probably be split into one changeset
> per driver.
>
> --- linux-2.6.orig/arch/blackfin/mach-bf561/coreb.c
> +++ linux-2.6/arch/blackfin/mach-bf561/coreb.c
> @@ -32,6 +32,7 @@
>  #include <linux/device.h>
>  #include <linux/ioport.h>
>  #include <linux/module.h>
> +#include <linux/smp_lock.h>
>  #include <linux/uaccess.h>
>  #include <linux/fs.h>
>  #include <asm/dma.h>
> @@ -196,6 +197,7 @@ static loff_t coreb_lseek(struct file *f
>
>  static int coreb_open(struct inode *inode, struct file *file)
>  {
> +       lock_kernel();
>        spin_lock_irq(&coreb_lock);
>
>        if (coreb_status & COREB_IS_OPEN)
> @@ -204,10 +206,12 @@ static int coreb_open(struct inode *inod
>        coreb_status |= COREB_IS_OPEN;
>
>        spin_unlock_irq(&coreb_lock);
> +       unlock_kernel();
>        return 0;
>
>  out_busy:
>        spin_unlock_irq(&coreb_lock);
> +       unlock_kernel();
>        return -EBUSY;
>  }

this open func already has a spinlock protecting it.  doesn't that mean
we don't need the bkl in it ?
-mike


* Re: [PATCH 1/3, RFC] misc char dev BKL pushdown
  2008-05-20  0:07               ` Mike Frysinger
@ 2008-05-20  0:21                 ` Jonathan Corbet
  2008-05-20  0:46                   ` Mike Frysinger
  0 siblings, 1 reply; 78+ messages in thread
From: Jonathan Corbet @ 2008-05-20  0:21 UTC (permalink / raw)
  To: Mike Frysinger
  Cc: Jonathan Corbet, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Alan Cox, Alexander Viro,
	Arnd Bergmann, linux-kernel

Mike Frysinger <vapier.adi@gmail.com> wrote:

> this open func already has a spinlock protecting it.  doesn't that mean
> we don't need the bkl in it ?

The existence of a spinlock is a good sign.  But, until somebody has
looked at the code and verified that said lock is really protecting
everything, it's best to leave the BKL protection (which has always been
there, just at a higher level) in place.

jon
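
To make the "same protection, just at a lower level" point concrete, here
is a minimal single-threaded model of the pushdown in plain userspace C.
Everything in it (model_lock(), dispatch_open_before(), drv_open_after(),
struct model_file) is a hypothetical stand-in rather than kernel code. The
"lock" is only a nesting counter, loosely analogous to the BKL's per-task
depth, so the sketch shows where the lock is held, not real mutual
exclusion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the BKL: just a nesting counter, no real locking. */
static int lock_depth;

static void model_lock(void)
{
	lock_depth++;
}

static void model_unlock(void)
{
	assert(lock_depth > 0);	/* catch unbalanced unlocks in the model */
	lock_depth--;
}

/* Hypothetical stand-in for struct file. */
struct model_file {
	void *private_data;
	int saw_lock;		/* lock depth observed inside the driver open() */
};

typedef int (*open_fn)(struct model_file *);

/* Before the pushdown: the dispatcher holds the lock around every open(). */
static int dispatch_open_before(open_fn drv_open, struct model_file *f)
{
	int err;

	model_lock();
	err = drv_open(f);
	model_unlock();
	return err;
}

/* Driver open() before the pushdown: relies on the caller's lock. */
static int drv_open_before(struct model_file *f)
{
	f->saw_lock = lock_depth;
	f->private_data = NULL;
	return 0;
}

/* After the pushdown: the dispatcher takes no lock at all. */
static int dispatch_open_after(open_fn drv_open, struct model_file *f)
{
	return drv_open(f);
}

/* Driver open() after the pushdown: takes and drops the lock itself. */
static int drv_open_after(struct model_file *f)
{
	model_lock();
	f->saw_lock = lock_depth;
	f->private_data = NULL;
	model_unlock();
	return 0;
}
```

In both arrangements the driver body runs with the lock held exactly once.
What changes is that the dispatcher no longer holds it on behalf of every
driver, so each driver can drop its own lock/unlock pair once somebody has
audited it.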


* Re: [PATCH 1/3, RFC] misc char dev BKL pushdown
  2008-05-20  0:21                 ` Jonathan Corbet
@ 2008-05-20  0:46                   ` Mike Frysinger
  0 siblings, 0 replies; 78+ messages in thread
From: Mike Frysinger @ 2008-05-20  0:46 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner, Alan Cox, Alexander Viro, Arnd Bergmann,
	linux-kernel

On Mon, May 19, 2008 at 8:21 PM, Jonathan Corbet wrote:
> Mike Frysinger wrote:
>> this open func already has a spinlock protecting it.  doesn't that mean
>> we don't need the bkl in it ?
>
> The existence of a spinlock is a good sign.  But, until somebody has
> looked at the code and verified that said lock is really protecting
> everything, it's best to leave the BKL protection (which has always been
> there, just at a higher level) in place.

if the spinlock doesn't do what it's advertising (preventing mutual
access), then the BKL is needed.  if there's some UP behavior i'm not
aware of, then the BKL is needed.  otherwise, the BKL is not needed in
this driver.

i should prob rewrite this driver anyways ... the open code could
easily be replaced with some atomic funcs.
-mike
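
As a rough illustration of what "some atomic funcs" could look like, the
busy check can be a single atomic test-and-set, similar in spirit to the
test_and_set_bit() pattern other drivers in this series use. The sketch
below is userspace C11 with <stdatomic.h>, not the coreb driver; struct
single_open and both helpers are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a driver's "device is open" state. */
struct single_open {
	atomic_flag busy;
};

#define SINGLE_OPEN_INIT { ATOMIC_FLAG_INIT }

/*
 * Claim the device. The test-and-set is one atomic step, so two racing
 * callers cannot both see the flag clear; exactly one of them gets 0.
 */
static int single_open_acquire(struct single_open *so)
{
	if (atomic_flag_test_and_set(&so->busy))
		return -EBUSY;
	return 0;
}

/* Give the device back; only the current owner should call this. */
static void single_open_release(struct single_open *so)
{
	/* Debug check: releasing a device that was never opened is a bug. */
	bool was_open = atomic_flag_test_and_set(&so->busy);

	assert(was_open);
	(void)was_open;
	atomic_flag_clear(&so->busy);
}
```

With this shape the open path needs neither a spinlock nor a big lock for
the busy check itself. Whether the rest of a given driver's open() is safe
without them is a separate question that still needs an audit.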


* Re: [PATCH 2/3, RFC] watchdog dev BKL pushdown
  2008-05-19 23:14               ` [PATCH 2/3, RFC] watchdog " Arnd Bergmann
@ 2008-05-20  6:20                 ` Christoph Hellwig
  2008-05-20  8:30                   ` Arnd Bergmann
  2008-05-20  9:08                   ` Alan Cox
  2008-05-20  8:42                 ` Alan Cox
  1 sibling, 2 replies; 78+ messages in thread
From: Christoph Hellwig @ 2008-05-20  6:20 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Jonathan Corbet, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Alan Cox, Alexander Viro,
	linux-kernel, Wim Van Sebroeck

On Tue, May 20, 2008 at 01:14:23AM +0200, Arnd Bergmann wrote:
> The Big Kernel Lock has been pushed down from chardev_open
> to misc_open, this change moves it to the individual watchdog
> driver open functions.
> 
> As before, the change was purely mechanical, most drivers
> should actually not need the BKL.

Actually I'd prefer to fix this for real.  This single open stuff as well
as the same set of ioctls are duplicated all over the watchdog drivers.  We'd
be much better off introducing a simple watchdog layer that handles this
plus proper locking and convert drivers over to it gradually.
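
A minimal sketch of what such a layer could look like, written as
userspace C so it can be compiled on its own. It assumes a core object
that owns the single-open flag and a lock, with each driver supplying only
hardware callbacks. struct wdt_ops, struct wdt_core and the wdt_core_*()
helpers are hypothetical names for illustration, not an existing kernel
interface and not the code from Wim's or Alan's trees.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical per-driver callbacks: the only part a driver would write. */
struct wdt_ops {
	int (*start)(void *hw);
	int (*ping)(void *hw);
};

/* Hypothetical core state that every driver currently duplicates. */
struct wdt_core {
	atomic_flag is_open;		/* single-open guard */
	pthread_mutex_t lock;		/* serializes calls into the driver */
	const struct wdt_ops *ops;
	void *hw;
};

#define WDT_CORE_INIT(o, h) \
	{ ATOMIC_FLAG_INIT, PTHREAD_MUTEX_INITIALIZER, (o), (h) }

/* Common open(): enforce single open, then start the hardware. */
static int wdt_core_open(struct wdt_core *w)
{
	int ret;

	assert(w->ops && w->ops->start);
	if (atomic_flag_test_and_set(&w->is_open))
		return -EBUSY;

	pthread_mutex_lock(&w->lock);
	ret = w->ops->start(w->hw);
	pthread_mutex_unlock(&w->lock);

	if (ret)			/* start failed: allow a later retry */
		atomic_flag_clear(&w->is_open);
	return ret;
}

/* Common keepalive path, called with the device already open. */
static int wdt_core_ping(struct wdt_core *w)
{
	int ret;

	pthread_mutex_lock(&w->lock);
	ret = w->ops->ping(w->hw);
	pthread_mutex_unlock(&w->lock);
	return ret;
}

/* Common release(): make the device available again. */
static void wdt_core_release(struct wdt_core *w)
{
	atomic_flag_clear(&w->is_open);
}

/* Toy driver whose "hardware" is just an int counter. */
static int toy_start(void *hw) { *(int *)hw = 1; return 0; }
static int toy_ping(void *hw)  { (*(int *)hw)++; return 0; }
static const struct wdt_ops toy_ops = { toy_start, toy_ping };
```

The point of the split is that the busy check and the locking live in one
place, so a driver converted to the layer would no longer carry its own
copy of either.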



* Re: [PATCH 2/3, RFC] watchdog dev BKL pushdown
  2008-05-20  6:20                 ` Christoph Hellwig
@ 2008-05-20  8:30                   ` Arnd Bergmann
  2008-05-20 15:47                     ` Wim Van Sebroeck
  2008-05-20  9:08                   ` Alan Cox
  1 sibling, 1 reply; 78+ messages in thread
From: Arnd Bergmann @ 2008-05-20  8:30 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jonathan Corbet, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Alan Cox, Alexander Viro,
	linux-kernel, Wim Van Sebroeck

On Tuesday 20 May 2008, Christoph Hellwig wrote:
> On Tue, May 20, 2008 at 01:14:23AM +0200, Arnd Bergmann wrote:
> > The Big Kernel Lock has been pushed down from chardev_open
> > to misc_open, this change moves it to the individual watchdog
> > driver open functions.
> > 
> > As before, the change was purely mechanical, most drivers
> > should actually not need the BKL.
> 
> Actually I'd prefer to fix this for real.  This single open stuff as well
> as the same set of ioctls are duplicated all over the watchdog drivers.  We'd
> be much better off introducing a simple watchdog layer that handles this
> plus proper locking and convert drivers over to it gradually.

I fully agree, I thought the same thing when I did the patches. I remember
that Wim had a git tree doing this, which is still active at
http://git.kernel.org/?p=linux/kernel/git/wim/linux-2.6-watchdog-experimental.git;a=commitdiff;h=732c54027e6c866f98857c4a6d1c6c466459dcd5

Unfortunately, it hasn't seen much activity over the last two years, and
the number of watchdog drivers seems to have exploded: I count 67 of them,
including some outside of drivers/watchdog.

Wim, was there anything preventing you from integrating the generic
watchdog layer back then?

	Arnd <><


* Re: [PATCH 2/3, RFC] watchdog dev BKL pushdown
  2008-05-19 23:14               ` [PATCH 2/3, RFC] watchdog " Arnd Bergmann
  2008-05-20  6:20                 ` Christoph Hellwig
@ 2008-05-20  8:42                 ` Alan Cox
  1 sibling, 0 replies; 78+ messages in thread
From: Alan Cox @ 2008-05-20  8:42 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Jonathan Corbet, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Alexander Viro, linux-kernel,
	Wim Van Sebroeck

On Tue, 20 May 2008 01:14:23 +0200
Arnd Bergmann <arnd@arndb.de> wrote:

> The Big Kernel Lock has been pushed down from chardev_open
> to misc_open, this change moves it to the individual watchdog
> driver open functions.

NAK. I've posted a set of bigger patches which actually fix up all the
watchdog locking code - and simply adding lock/unlock kernel isn't enough
because of all the bugs in the code (it won't make it worse but it won't
fix it either).

Alan


* Re: [PATCH 1/3, RFC] misc char dev BKL pushdown
  2008-05-19 23:26             ` [PATCH 1/3, RFC] misc char " Arnd Bergmann
  2008-05-20  0:07               ` Mike Frysinger
@ 2008-05-20  8:46               ` Alan Cox
  2008-05-20 23:01               ` Mike Frysinger
  2 siblings, 0 replies; 78+ messages in thread
From: Alan Cox @ 2008-05-20  8:46 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Jonathan Corbet, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Alexander Viro, linux-kernel

On Tue, 20 May 2008 01:26:28 +0200
Arnd Bergmann <arnd@arndb.de> wrote:

> The Big Kernel Lock has been pushed down from chardev_open
> to misc_open, this change moves it to the individual misc
> driver open functions.
> 
> As before, the change was purely mechanical, most drivers
> should actually not need the BKL. In particular, we still
> hold the misc_mtx() while calling the open() function
> The patch should probably be split into one changeset
> per driver.
> 
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Acked-by: Alan Cox <alan@redhat.com>


* Re: [PATCH 2/3, RFC] watchdog dev BKL pushdown
  2008-05-20  6:20                 ` Christoph Hellwig
  2008-05-20  8:30                   ` Arnd Bergmann
@ 2008-05-20  9:08                   ` Alan Cox
  1 sibling, 0 replies; 78+ messages in thread
From: Alan Cox @ 2008-05-20  9:08 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Arnd Bergmann, Jonathan Corbet, Linus Torvalds, Ingo Molnar,
	Andrew Morton, Peter Zijlstra, Thomas Gleixner, Alexander Viro,
	linux-kernel, Wim Van Sebroeck

On Tue, 20 May 2008 02:20:56 -0400
Christoph Hellwig <hch@infradead.org> wrote:

> On Tue, May 20, 2008 at 01:14:23AM +0200, Arnd Bergmann wrote:
> > The Big Kernel Lock has been pushed down from chardev_open
> > to misc_open, this change moves it to the individual watchdog
> > driver open functions.
> > 
> > As before, the change was purely mechanical, most drivers
> > should actually not need the BKL.
> 
> Actually I'd prefer to fix this for real.  This single open stuff as well
> as the same set of ioctls are duplicated all over the watchdog drivers.  We'd
> be much better off introducing a simple watchdog layer that handles this
> plus proper locking and convert drivers over to it gradually.

In progress in two ways

- Wim was (is ?) working on a proper device layer

- I've cleaned up all the drivers and am now testing a watchdog
driver supporting library which is taking about 50% of the code out of
each driver I convert.

Alan


* Re: [PATCH, RFC] char dev BKL pushdown
  2008-05-19 23:07           ` Arnd Bergmann
                               ` (2 preceding siblings ...)
  2008-05-19 23:34             ` [PATCH 3/3, RFC] remove BKL from misc_open() Arnd Bergmann
@ 2008-05-20 15:13             ` Jonathan Corbet
  2008-05-20 17:21               ` Arnd Bergmann
  3 siblings, 1 reply; 78+ messages in thread
From: Jonathan Corbet @ 2008-05-20 15:13 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner, Alan Cox, Alexander Viro, linux-kernel,
	Wim Van Sebroeck

Arnd Bergmann <arnd@arndb.de> wrote:

> I've given it a try for all the misc drivers that have an open() function.
> The vast majority of them are actually watchdog drivers, all of which
> register as a misc device by themselves. 

OK, it looks like the "misc" misc drivers patch can go into the
bkl-removal tree, while the watchdog patches should not.  What that
means, I guess, is that the final misc_open() patch cannot go in at this
point; Alan's watchdog stuff needs to find its way in first.  Make
sense? 

> You seem to already have a script to turn per-file changes into a
> patch each, so I'm sending you two patches: one for all the watchdog
> drivers (maybe Wim can take care of that as well) and one for all the
> other misc drivers (this one needs to be split).

Alas, I have no such script.  I just committed each change as I made it
- each one required individual attention anyway.  The misc changes look
pretty straightforward, so I could probably hack up such a thing pretty
quickly if you don't have a tree with broken out patches.

Thanks,

jon

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 2/3, RFC] watchdog dev BKL pushdown
  2008-05-20  8:30                   ` Arnd Bergmann
@ 2008-05-20 15:47                     ` Wim Van Sebroeck
  2008-05-20 18:31                       ` Alan Cox
  0 siblings, 1 reply; 78+ messages in thread
From: Wim Van Sebroeck @ 2008-05-20 15:47 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Christoph Hellwig, Jonathan Corbet, Linus Torvalds, Ingo Molnar,
	Andrew Morton, Peter Zijlstra, Thomas Gleixner, Alan Cox,
	Alexander Viro, linux-kernel

Hi Arnd,

> > Actually I'd prefer to fix this for real.  This single open stuff aswell
> > as same set of ioctls are duplicated all over the watchdog drivers.  We'd
> > be much better off introducing a simple watchdog layer that handles this
> > plus proper locking and convert drivers over to it gradually.
> 
> I fully agree, I thought the same thing when I did the patches. I remember
> that Wim had a git tree doing this, which is still active at
> http://git.kernel.org/?p=linux/kernel/git/wim/linux-2.6-watchdog-experimental.git;a=commitdiff;h=732c54027e6c866f98857c4a6d1c6c466459dcd5
> 
> Unfortunately, it hasn't seen much activitity over the last two years, and
> the number of watchdog drivers seems to have exploded: I count 67 of them,
> including some outside of drivers/watchdog.
> 
> Wim, was there anything preventing you from integrating the generic
> watchdog layer back then?

Check the linux-2.6-watchdog-mm tree. You'll see the patches sitting there
as the uniform watchdog driver, so they are also in the -mm tree.
I still need to convert it to an unlocked_ioctl (and add documentation!),
but I'll attach the core code below. (This does not seem to be the latest code,
because I know there was a request to change the alloc_watchdogdev code so that
it could also allocate a private data area.)
It should give you an idea of where I was going, though. The second step would
then be to add a sysfs interface so that we can start handling multiple devices.

Greetings,
Wim.

From a223c170e5e7e63e5dd55f56837318db6ad0807f Mon Sep 17 00:00:00 2001
From: Wim Van Sebroeck <wim@iguana.be>
Date: Sun, 19 Aug 2007 19:44:24 +0000
Subject: [PATCH] [WATCHDOG] Uniform Watchdog Device Driver

The Uniform Watchdog Device Driver is a frame-work
that contains the common code for all watchdog-driver's.
It also introduces a watchdog device structure and the
operations that go with it.

Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
---
 drivers/watchdog/Kconfig              |    2 +
 drivers/watchdog/Makefile             |    2 +
 drivers/watchdog/core/Kconfig         |   34 +++
 drivers/watchdog/core/Makefile        |   11 +
 drivers/watchdog/core/watchdog_core.c |  187 +++++++++++++
 drivers/watchdog/core/watchdog_dev.c  |  463 +++++++++++++++++++++++++++++++++
 include/linux/watchdog.h              |   49 ++++
 7 files changed, 748 insertions(+), 0 deletions(-)
 create mode 100644 drivers/watchdog/core/Kconfig
 create mode 100644 drivers/watchdog/core/Makefile
 create mode 100644 drivers/watchdog/core/watchdog_core.c
 create mode 100644 drivers/watchdog/core/watchdog_dev.c

diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
index 37bddc1..a458e87 100644
--- a/drivers/watchdog/Kconfig
+++ b/drivers/watchdog/Kconfig
@@ -37,6 +37,8 @@ config WATCHDOG_NOWAYOUT
 	  get killed. If you say Y here, the watchdog cannot be stopped once
 	  it has been started.
 
+source "drivers/watchdog/core/Kconfig"
+
 #
 # General Watchdog drivers
 #
diff --git a/drivers/watchdog/Makefile b/drivers/watchdog/Makefile
index 389f8b1..b61c103 100644
--- a/drivers/watchdog/Makefile
+++ b/drivers/watchdog/Makefile
@@ -2,6 +2,8 @@
 # Makefile for the WatchDog device drivers.
 #
 
+obj-$(CONFIG_WATCHDOG)	+= core/
+
 # Only one watchdog can succeed. We probe the ISA/PCI/USB based
 # watchdog-cards first, then the architecture specific watchdog
 # drivers and then the architecture independant "softdog" driver.
diff --git a/drivers/watchdog/core/Kconfig b/drivers/watchdog/core/Kconfig
new file mode 100644
index 0000000..a9a23a9
--- /dev/null
+++ b/drivers/watchdog/core/Kconfig
@@ -0,0 +1,34 @@
+#
+# Watchdog device driver core
+#
+
+if WATCHDOG
+
+config WATCHDOG_CORE
+	tristate "Uniform Watchdog Device Driver"
+	depends on EXPERIMENTAL
+	default m
+	---help---
+	  Say Y here if you want to use the new uniform watchdog device
+	  driver. This driver provides a framework for all watchdog
+	  device drivers and gives them the /dev/watchdog interface (and
+	  later also the sysfs interface).
+
+	  At this moment we have no watchdog device drivers using this new
+	  framework.
+
+	  To compile this driver as a module, choose M here: the module will
+	  be called watchdog_core.
+
+config WATCHDOG_DEBUG_CORE
+	bool "Uniform Watchdog Device Driver debugging output"
+	depends on WATCHDOG_CORE
+	default n
+	---help---
+	  Say Y here if you want the Uniform Watchdog Device Driver to
+	  produce debugging information. Select this if you are having a
+	  problem with the uniform watchdog device driver and want to see
+	  more of what is really happening.
+
+endif # WATCHDOG
+
diff --git a/drivers/watchdog/core/Makefile b/drivers/watchdog/core/Makefile
new file mode 100644
index 0000000..7554d35
--- /dev/null
+++ b/drivers/watchdog/core/Makefile
@@ -0,0 +1,11 @@
+#
+# Makefile for the Watchdog Device Drivers generic core.
+#
+
+# The Generic Watchdog Driver
+obj-$(CONFIG_WATCHDOG_CORE)		+= watchdog_core.o watchdog_dev.o
+
diff --git a/drivers/watchdog/core/watchdog_core.c b/drivers/watchdog/core/watchdog_core.c
new file mode 100644
index 0000000..133dc11
--- /dev/null
+++ b/drivers/watchdog/core/watchdog_core.c
@@ -0,0 +1,187 @@
+/*
+ *	watchdog.c
+ *
+ *	(c) Copyright 2007 Wim Van Sebroeck <wim@iguana.be>.
+ *
+ *	This code is generic code that can be shared by all the
+ *	watchdog drivers.
+ *
+ *	Based on source code of the following authors:
+ *	  Alan Cox <alan@redhat.com>,
+ *	  Matt Domsch <Matt_Domsch@dell.com>,
+ *	  Rob Radez <rob@osinvestor.com>,
+ *	  Rusty Lynch <rusty@linux.co.intel.com>
+ *	  Satyam Sharma <satyam@infradead.org>
+ *	  Randy Dunlap <randy.dunlap@oracle.com>
+ *
+ *	This program is free software; you can redistribute it and/or
+ *	modify it under the terms of the GNU General Public License
+ *	as published by the Free Software Foundation; either version
+ *	2 of the License, or (at your option) any later version.
+ *
+ *	Neither Wim Van Sebroeck nor Iguana vzw. admit liability nor
+ *	provide warranty for any of this software. This material is
+ *	provided "AS-IS" and at no charge.
+ */
+
+#include <linux/module.h>	/* For module related things/EXPORT_SYMBOL/... */
+#include <linux/types.h>	/* For standard types */
+#include <linux/errno.h>	/* For the -ENODEV/... values */
+#include <linux/kernel.h>	/* For printk/panic/... */
+#include <linux/mm.h>		/* For memory allocations ... */
+#include <linux/watchdog.h>	/* For watchdog specific items */
+#include <linux/init.h>		/* For __init/__exit/... */
+
+/*
+ *	Version information
+ */
+#define DRV_VERSION	"0.01"
+#define DRV_NAME	"watchdog_core"
+#define PFX DRV_NAME	": "
+
+/*
+ *	External functions/procedures
+ */
+extern int watchdog_dev_register(struct watchdog_device *, struct device *);
+extern int watchdog_dev_unregister(struct watchdog_device *);
+
+/**
+ *	alloc_watchdogdev - allocates a watchdog device
+ *
+ *	Creates a new watchdog device structure.
+ *	Returns the new structure, or NULL if an error occured.
+ */
+struct watchdog_device *alloc_watchdogdev(void)
+{
+	int alloc_size = sizeof(struct watchdog_device);
+	void *p;
+	struct watchdog_device *dev;
+
+	/* allocate memory for our device and initialize it */
+	p = kzalloc(alloc_size, GFP_KERNEL);
+	if (!p) {
+		printk(KERN_ERR PFX "Unable to allocate watchdog device\n");
+		return NULL;
+	}
+	dev = (struct watchdog_device *) p;
+
+	return dev;
+}
+EXPORT_SYMBOL(alloc_watchdogdev);
+
+/**
+ *	free_watchdogdev - free watchdog device
+ *	@dev: watchdog device
+ *
+ *	This function does the last stage of destroying an allocated
+ *	watchdog device.
+ */
+int free_watchdogdev(struct watchdog_device *dev)
+{
+	if (!((dev->watchdog_state == WATCHDOG_UNINITIALIZED) ||
+	      (dev->watchdog_state == WATCHDOG_UNREGISTERED))) {
+		printk(KERN_ERR PFX "Unable to destroy a watchdog device that is still in use\n");
+		return -1;
+	}
+
+	kfree(dev);
+	return 0;
+}
+EXPORT_SYMBOL(free_watchdogdev);
+
+/**
+ *	register_watchdogdevice - register a watchdog device
+ *	@dev: watchdog device
+ *	@parent: parent device for the watchdog class device
+ *
+ *	This function registers a watchdog device in the kernel so
+ *	that it can be accessed from userspace.
+ */
+int register_watchdogdevice(struct watchdog_device *dev, struct device *parent)
+{
+	int ret;
+
+	if (dev == NULL ||
+	    dev->watchdog_ops == NULL)
+		return -ENODATA;
+
+	if (dev->watchdog_ops->start == NULL ||
+	    dev->watchdog_ops->stop == NULL ||
+	    dev->watchdog_ops->keepalive == NULL)
+		return -ENODATA;
+
+	if (!((dev->watchdog_state == WATCHDOG_UNINITIALIZED) ||
+	      (dev->watchdog_state == WATCHDOG_UNREGISTERED))) {
+		printk(KERN_ERR PFX "Unable to register a watchdog device that is allready in use\n");
+		return -1;
+	}
+
+	dev->options |= WDIOF_MAGICCLOSE;
+	if (dev->watchdog_ops->set_heartbeat) {
+		dev->options |= WDIOF_SETTIMEOUT;
+	} else {
+		dev->options &= ~WDIOF_SETTIMEOUT;
+	}
+
+	ret = watchdog_dev_register(dev, parent);
+	if (ret) {
+		printk(KERN_ERR PFX "error registering /dev/watchdog (err=%d)\n",
+			ret);
+		return ret;
+	}
+
+	dev->watchdog_state = WATCHDOG_REGISTERED;
+	return 0;
+}
+EXPORT_SYMBOL(register_watchdogdevice);
+
+/**
+ *	unregister_watchdogdevice - unregister a watchdog device
+ *	@dev: watchdog device
+ *
+ *	This function unregisters a watchdog device from the kernel.
+ */
+int unregister_watchdogdevice(struct watchdog_device *dev)
+{
+	int ret;
+
+	if (dev == NULL)
+		return -ENODATA;
+
+	if ((dev->watchdog_state == WATCHDOG_UNINITIALIZED) ||
+	    (dev->watchdog_state == WATCHDOG_UNREGISTERED)) {
+		printk(KERN_ERR PFX "Unable to unregister a watchdog device that has not been registered\n");
+		return -ENODEV;
+	}
+
+	ret = watchdog_dev_unregister(dev);
+	if (ret) {
+		printk(KERN_ERR PFX "error unregistering /dev/watchdog (err=%d)\n",
+			ret);
+		return ret;
+	}
+
+	dev->watchdog_state = WATCHDOG_UNREGISTERED;
+	return 0;
+}
+EXPORT_SYMBOL(unregister_watchdogdevice);
+
+static int __init watchdog_init(void)
+{
+	printk(KERN_INFO "Uniform watchdog device driver v%s loaded\n",
+		DRV_VERSION);
+	return 0;
+}
+
+static void __exit watchdog_exit(void)
+{
+	printk(KERN_INFO "Uniform watchdog device driver unloaded\n");
+}
+
+module_init(watchdog_init);
+module_exit(watchdog_exit);
+
+MODULE_AUTHOR("Wim Van Sebroeck <wim@iguana.be>");
+MODULE_DESCRIPTION("Uniform Watchdog Device Driver");
+MODULE_VERSION(DRV_VERSION);
+MODULE_LICENSE("GPL");
+MODULE_SUPPORTED_DEVICE("watchdog");
+
diff --git a/drivers/watchdog/core/watchdog_dev.c b/drivers/watchdog/core/watchdog_dev.c
new file mode 100644
index 0000000..37520cd
--- /dev/null
+++ b/drivers/watchdog/core/watchdog_dev.c
@@ -0,0 +1,463 @@
+/*
+ *	watchdog_dev.c
+ *
+ *	(c) Copyright 2007 Wim Van Sebroeck <wim@iguana.be>.
+ *
+ *	This source code is part of the generic code that can be used
+ *	by all the watchdog drivers.
+ *
+ *	This part of the generic code takes care of the following
+ *	misc device: /dev/watchdog.
+ *
+ *	Based on source code of the following authors:
+ *	  Alan Cox <alan@redhat.com>,
+ *	  Matt Domsch <Matt_Domsch@dell.com>,
+ *	  Rob Radez <rob@osinvestor.com>,
+ *	  Rusty Lynch <rusty@linux.co.intel.com>
+ *	  Satyam Sharma <satyam@infradead.org>
+ *	  Randy Dunlap <randy.dunlap@oracle.com>
+ *
+ *	This program is free software; you can redistribute it and/or
+ *	modify it under the terms of the GNU General Public License
+ *	as published by the Free Software Foundation; either version
+ *	2 of the License, or (at your option) any later version.
+ *
+ *	Neither Wim Van Sebroeck nor Iguana vzw. admit liability nor
+ *	provide warranty for any of this software. This material is
+ *	provided "AS-IS" and at no charge.
+ */
+
+#include <linux/module.h>	/* For module related things/EXPORT_SYMBOL/... */
+#include <linux/types.h>	/* For standard types (like size_t) */
+#include <linux/errno.h>	/* For the -ENODEV/... values */
+#include <linux/kernel.h>	/* For printk/panic/... */
+#include <linux/fs.h>		/* For file operations */
+#include <linux/watchdog.h>	/* For watchdog specific items */
+#include <linux/miscdevice.h>	/* For handling misc devices */
+#include <linux/mutex.h>	/* For mutex locking */
+#include <linux/init.h>		/* For __init/__exit/... */
+#include <linux/uaccess.h>	/* For copy_to_user/put_user/... */
+
+#ifdef CONFIG_WATCHDOG_DEBUG_CORE
+#define trace(format, args...) \
+	printk(KERN_INFO "%s(" format ")\n", __FUNCTION__ , ## args)
+#define dbg(format, arg...) \
+	printk(KERN_DEBUG "%s: " format "\n", __FUNCTION__, ## arg)
+#else
+#define trace(format, arg...) do { } while (0)
+#define dbg(format, arg...) do { } while (0)
+#endif
+
+/*
+ *	Version information
+ */
+#define DRV_VERSION	"0.01"
+#define DRV_NAME	"watchdog_dev"
+#define PFX DRV_NAME	": "
+
+/*
+ *	Locally used variables
+ */
+
+static struct watchdog_device *watchdogdev;	/* the watchdog device behind /dev/watchdog */
+static unsigned long watchdog_dev_open;		/* whether or not /dev/watchdog has been opened */
+static char received_magic_char;		/* whether or not we received the magic char */
+static DEFINE_MUTEX(watchdog_register_mtx);	/* prevent races between register & unregister */
+
+/*
+ *	/dev/watchdog operations
+ */
+
+/*
+ *	watchdog_write: writes to the watchdog.
+ *	@file: file from VFS
+ *	@data: user address of data
+ *	@len: length of data
+ *	@ppos: pointer to the file offset
+ *
+ *	A write to a watchdog device is defined as a keepalive signal.
+ *	Writing the magic 'V' sequence allows the next close to turn
+ *	off the watchdog (if 'nowayout' is not set).
+ */
+
+static ssize_t watchdog_write(struct file *file, const char __user *data,
+				size_t len, loff_t *ppos)
+{
+	trace("%p, %p, %zu, %p", file, data, len, ppos);
+
+	if (!watchdogdev ||
+	    !watchdogdev->watchdog_ops ||
+	    !watchdogdev->watchdog_ops->keepalive)
+		return -ENODEV;
+
+	/* See if we got the magic character 'V' and reload the timer */
+	if (len) {
+		if (!watchdogdev->nowayout) {
+			size_t i;
+
+			/* note: just in case someone wrote the magic character
+			 * five months ago... */
+			received_magic_char = 0;
+
+			/* scan to see wether or not we got the magic character */
+			for (i = 0; i != len; i++) {
+				char c;
+				if (get_user(c, data + i))
+					return -EFAULT;
+				if (c == 'V') {
+					received_magic_char = 42;
+					dbg("received the magic character\n");
+				}
+			}
+		}
+
+		/* someone wrote to us, so we sent the watchdog a keepalive signal if
+		 * the watchdog is active */
+		if (watchdogdev->watchdog_state == WATCHDOG_STARTED)
+			watchdogdev->watchdog_ops->keepalive(watchdogdev);
+	}
+	return len;
+}
+
+/*
+ *	watchdog_ioctl: handle the different ioctl's for the watchdog device.
+ *	@inode: inode of the device
+ *	@file: file handle to the device
+ *	@cmd: watchdog command
+ *	@arg: argument pointer
+ *
+ *	The watchdog API defines a common set of functions for all watchdogs
+ *	according to their available features.
+ */
+
+static int watchdog_ioctl(struct inode *inode, struct file *file,
+				unsigned int cmd, unsigned long arg)
+{
+	int status;
+	int err;
+	int new_options;
+	int new_heartbeat;
+	int time_left;
+	void __user *argp = (void __user *)arg;
+	int __user *p = argp;
+	static struct watchdog_info ident = {
+		.options =		0,
+		.firmware_version =	0,
+		.identity =		"Watchdog Device",
+	};
+
+	trace("%p, %p, %u, %li", inode, file, cmd, arg);
+
+	if (!watchdogdev || !watchdogdev->watchdog_ops)
+		return -ENODEV;
+
+	switch (cmd) {
+	case WDIOC_GETSUPPORT:
+	{
+		ident.options = watchdogdev->options;
+		ident.firmware_version = watchdogdev->firmware;
+
+		strncpy(ident.identity, watchdogdev->name, 31);
+		ident.identity[31] = 0;
+
+		return copy_to_user(argp, &ident,
+			sizeof(ident)) ? -EFAULT : 0;
+	}
+
+	case WDIOC_GETSTATUS:
+	{
+		status = 0;
+
+		if (watchdogdev->watchdog_ops->get_status &&
+		    watchdogdev->watchdog_ops->get_status(watchdogdev, &status))
+			return -EFAULT;
+
+		return put_user(status, p);
+	}
+
+	case WDIOC_GETBOOTSTATUS:
+		return put_user(watchdogdev->bootstatus, p);
+
+	case WDIOC_KEEPALIVE:
+	{
+		if (!watchdogdev->watchdog_ops->keepalive)
+			return -EFAULT;
+
+		/* We only sent a keepalive when the watchdog is active */
+		if (watchdogdev->watchdog_state == WATCHDOG_STARTED)
+			watchdogdev->watchdog_ops->keepalive(watchdogdev);
+
+		return 0;
+	}
+
+	case WDIOC_SETOPTIONS:
+	{
+		if (get_user(new_options, p))
+			return -EFAULT;
+
+		if (!watchdogdev->watchdog_ops->start ||
+		    !watchdogdev->watchdog_ops->stop)
+			return -EFAULT;
+
+		if (new_options & WDIOS_DISABLECARD) {
+			/* only try to stop the watchdog if it's allready running */
+			if (watchdogdev->watchdog_state == WATCHDOG_STARTED) {
+				err =  watchdogdev->watchdog_ops->stop(watchdogdev);
+				if (err == 0) {
+					watchdogdev->watchdog_state = WATCHDOG_STOPPED;
+				} else {
+					printk(KERN_CRIT PFX "WDIOS_DISABLECARD not successfull! (err=%d)",
+						err);
+					return -EFAULT;
+				}
+			}
+		}
+
+		if (new_options & WDIOS_ENABLECARD) {
+			/* if the watchdog is not allready running, try to start it */
+			if (watchdogdev->watchdog_state != WATCHDOG_STARTED) {
+				err = watchdogdev->watchdog_ops->start(watchdogdev);
+				if (err == 0) {
+					watchdogdev->watchdog_state = WATCHDOG_STARTED;
+				} else {
+					printk(KERN_CRIT PFX "WDIOS_ENABLECARD not successfull! (err=%d)",
+						err);
+					return -EFAULT;
+				}
+			}
+		}
+
+		return 0;
+	}
+
+	case WDIOC_SETTIMEOUT:
+	{
+		if (!watchdogdev->watchdog_ops->set_heartbeat)
+			return -ENOTTY;
+
+		if (get_user(new_heartbeat, p))
+			return -EFAULT;
+
+		if (watchdogdev->watchdog_ops->set_heartbeat(watchdogdev, new_heartbeat))
+			return -EFAULT;
+
+		/* If the watchdog is active then we sent a keepalive to make sure
+		 * that the watchdog keep's running (and if possible takes the new
+		 * heartbeat) */
+		if (watchdogdev->watchdog_ops->keepalive &&
+		    (watchdogdev->watchdog_state == WATCHDOG_STARTED))
+			watchdogdev->watchdog_ops->keepalive(watchdogdev);
+		/* Fall */
+	}
+
+	case WDIOC_GETTIMEOUT:
+		return put_user(watchdogdev->heartbeat, p);
+
+	case WDIOC_GETTIMELEFT:
+	{
+		if (!watchdogdev->watchdog_ops->get_timeleft)
+			return -ENOTTY;
+
+		if (watchdogdev->watchdog_ops->get_timeleft(watchdogdev, &time_left))
+			return -EFAULT;
+
+		return put_user(time_left, p);
+	}
+
+	default:
+		return -ENOTTY;
+	}
+}
+
+/*
+ *	watchdog_open: open the /dev/watchdog device.
+ *	@inode: inode of device
+ *	@file: file handle to device
+ *
+ *	When the /dev/watchdog device get's opened, we start the watchdog
+ *	and feed it with his first keepalive signal. Watch out: the
+ *	/dev/watchdog device is single open, so make sure it can only be
+ *	opened once.
+ */
+
+static int watchdog_open(struct inode *inode, struct file *file)
+{
+	trace("%p, %p", inode, file);
+
+	/* only open if we have a valid watchdog device */
+	if (!watchdogdev ||
+	    !watchdogdev->watchdog_ops ||
+	    !watchdogdev->watchdog_ops->start ||
+	    !watchdogdev->watchdog_ops->stop ||
+	    !watchdogdev->watchdog_ops->keepalive)
+		return -EBUSY;
+
+	/* the watchdog is single open! */
+	if (test_and_set_bit(0, &watchdog_dev_open))
+		return -EBUSY;
+
+	/* if the watchdog is not allready running, try to start it */
+	if (watchdogdev->watchdog_state != WATCHDOG_STARTED) {
+		if (watchdogdev->watchdog_ops->start(watchdogdev) == 0)
+			watchdogdev->watchdog_state = WATCHDOG_STARTED;
+	}
+
+	/* if the watchdog started, then feed the watchdog it's first keepalive signal */
+	if (watchdogdev->watchdog_state == WATCHDOG_STARTED)
+		watchdogdev->watchdog_ops->keepalive(watchdogdev);
+
+	return nonseekable_open(inode, file);
+}
+
+/*
+ *      watchdog_release: release the /dev/watchdog device.
+ *      @inode: inode of device
+ *      @file: file handle to device
+ *
+ *	This is the code for when /dev/watchdog get's closed. We will only
+ *	stop the watchdog when we have received the magic char, else the
+ *	watchdog will keep running.
+ */
+
+static int watchdog_release(struct inode *inode, struct file *file)
+{
+	int err;
+
+	trace("%p, %p", inode, file);
+	dbg("received_magic_char=%d", received_magic_char);
+
+	if (watchdogdev && (watchdogdev->watchdog_state == WATCHDOG_STARTED)) {
+		/* Only stop a watchdog if it actually started */
+		if (received_magic_char == 42) {
+			/* we received the magic char -> we can stop the watchdog */
+			if (watchdogdev->watchdog_ops && watchdogdev->watchdog_ops->stop) {
+				err =  watchdogdev->watchdog_ops->stop(watchdogdev);
+				if (err == 0) {
+					watchdogdev->watchdog_state = WATCHDOG_STOPPED;
+				} else {
+					printk(KERN_CRIT PFX "Watchdog didn't stop successfull! (err=%d)",
+						err);
+				}
+			} else {
+				printk(KERN_CRIT PFX "Unable to stop watchdog!");
+			}
+		} else {
+			/* If we didn't receive the magic char, then we will close
+			 * /dev/watchdog but the watchdog keeps running... */
+			printk(KERN_CRIT PFX "Unexpected close, not stopping watchdog!");
+			if (watchdogdev->watchdog_ops && watchdogdev->watchdog_ops->keepalive) {
+				watchdogdev->watchdog_ops->keepalive(watchdogdev);
+			}
+		}
+	}
+
+	received_magic_char = 0;
+
+	/* make sure that /dev/watchdog can be re-opened */
+	clear_bit(0, &watchdog_dev_open);
+
+	return 0;
+}
+
+/*
+ *	/dev/watchdog kernel interfaces
+ */
+
+static struct file_operations watchdog_fops = {
+	.owner =	THIS_MODULE,
+	.llseek =	no_llseek,
+	.write =	watchdog_write,
+	.ioctl =	watchdog_ioctl,
+	.open =		watchdog_open,
+	.release =	watchdog_release,
+};
+
+static struct miscdevice watchdog_miscdev = {
+	.minor =	WATCHDOG_MINOR,
+	.name =		"watchdog",
+	.fops =		&watchdog_fops,
+};
+
+/*
+ *	/dev/watchdog register and unregister functions
+ */
+
+/*
+ *	watchdog_dev_register:
+ *
+ *	Register a watchdog device as /dev/watchdog. /dev/watchdog
+ *	is actually a miscdevice and thus we set it up like that.
+ */
+
+int watchdog_dev_register(struct watchdog_device *wdd, struct device *parent)
+{
+	int err = -EBUSY;
+
+	trace("%p %p", wdd, parent);
+
+	mutex_lock(&watchdog_register_mtx);
+
+	if (watchdogdev) {
+		printk(KERN_ERR PFX "another watchdog device is allready registered as /dev/watchdog\n");
+		goto out;
+	}
+
+	watchdog_miscdev.parent = parent;
+
+	dbg("Register a new /dev/watchdog device\n");
+	err = misc_register(&watchdog_miscdev);
+	if (err != 0) {
+		printk(KERN_ERR PFX "cannot register miscdev on minor=%d (err=%d)\n",
+			watchdog_miscdev.minor, err);
+		goto out;
+	}
+
+	watchdogdev = wdd;
+
+out:
+	mutex_unlock(&watchdog_register_mtx);
+	return err;
+}
+EXPORT_SYMBOL(watchdog_dev_register);
+
+/*
+ *	watchdog_dev_unregister:
+ *
+ *	Deregister the /dev/watchdog device.
+ */
+
+int watchdog_dev_unregister(struct watchdog_device *wdd)
+{
+	trace("%p", wdd);
+
+	mutex_lock(&watchdog_register_mtx);
+
+	if (!watchdogdev) {
+		printk(KERN_ERR PFX "there is no watchdog registered\n");
+		mutex_unlock(&watchdog_register_mtx);
+		return -1;
+	}
+
+	if (!wdd) {
+		printk(KERN_ERR PFX "cannot unregister non-existing watchdog-driver\n");
+		mutex_unlock(&watchdog_register_mtx);
+		return -2;
+	}
+
+	if (watchdogdev != wdd) {
+		printk(KERN_ERR PFX "another watchdog device is running\n");
+		mutex_unlock(&watchdog_register_mtx);
+		return -3;
+	}
+
+	dbg("Unregister /dev/watchdog device\n");
+	misc_deregister(&watchdog_miscdev);
+	watchdogdev = NULL;
+	mutex_unlock(&watchdog_register_mtx);
+	return 0;
+}
+EXPORT_SYMBOL(watchdog_dev_unregister);
+
+MODULE_AUTHOR("Wim Van Sebroeck <wim@iguana.be>");
+MODULE_DESCRIPTION("Generic /dev/watchdog Code");
+MODULE_VERSION(DRV_VERSION);
+MODULE_LICENSE("GPL");
+
diff --git a/include/linux/watchdog.h b/include/linux/watchdog.h
index 011bcfe..4f57bf4 100644
--- a/include/linux/watchdog.h
+++ b/include/linux/watchdog.h
@@ -53,12 +53,61 @@ struct watchdog_info {
 
 #ifdef __KERNEL__
 
+#include <linux/device.h>
+
 #ifdef CONFIG_WATCHDOG_NOWAYOUT
 #define WATCHDOG_NOWAYOUT	1
 #else
 #define WATCHDOG_NOWAYOUT	0
 #endif
 
+struct watchdog_ops;
+struct watchdog_device;
+
+struct watchdog_ops {
+	/* mandatory routines */
+		/* operation = start watchdog */
+		int	(*start)(struct watchdog_device *);
+		/* operation = stop watchdog */
+		int	(*stop)(struct watchdog_device *);
+		/* operation = send keepalive ping */
+		int	(*keepalive)(struct watchdog_device *);
+	/* optional routines */
+		/* operation = set watchdog's heartbeat */
+		int	(*set_heartbeat)(struct watchdog_device *, int);
+		/* operation = get the watchdog's status */
+		int	(*get_status)(struct watchdog_device *, int *);
+		/* operation = get the time left before reboot */
+		int	(*get_timeleft)(struct watchdog_device *, int *);
+};
+
+struct watchdog_device {
+	unsigned char name[32];			/* The watchdog's 'identity' */
+	unsigned long options;			/* The supported capabilities/options */
+	unsigned long firmware;			/* The Watchdog's Firmware version */
+	int nowayout;				/* The nowayout setting for this watchdog */
+	int heartbeat;				/* The watchdog's heartbeat */
+	int bootstatus;				/* The watchdog's bootstatus */
+	struct watchdog_ops *watchdog_ops;	/* link to watchdog_ops */
+
+	/* watchdog status (register/unregister) state machine */
+	enum { WATCHDOG_UNINITIALIZED = 0,
+	       WATCHDOG_REGISTERED,		/* completed register_watchdogdevice */
+	       WATCHDOG_STARTED,		/* watchdog device started */
+	       WATCHDOG_STOPPED,		/* watchdog device stopped */
+	       WATCHDOG_UNREGISTERED,		/* completed unregister_watchdogdevice */
+	} watchdog_state;
+
+	/* From here on everything is device dependent */
+	void	*private;
+};
+
+/* drivers/watchdog/watchdog_core.c */
+extern struct watchdog_device *alloc_watchdogdev(void);
+extern int register_watchdogdevice(struct watchdog_device *, struct device *);
+extern int unregister_watchdogdevice(struct watchdog_device *);
+extern int free_watchdogdev(struct watchdog_device *);
+
 #endif	/* __KERNEL__ */
 
 #endif  /* ifndef _LINUX_WATCHDOG_H */
-- 
1.5.3.4


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* Re: [PATCH, RFC] char dev BKL pushdown
  2008-05-20 15:13             ` [PATCH, RFC] char dev BKL pushdown Jonathan Corbet
@ 2008-05-20 17:21               ` Arnd Bergmann
  2008-05-20 18:51                 ` Alan Cox
  0 siblings, 1 reply; 78+ messages in thread
From: Arnd Bergmann @ 2008-05-20 17:21 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner, Alan Cox, Alexander Viro, linux-kernel,
	Wim Van Sebroeck

On Tuesday 20 May 2008, Jonathan Corbet wrote:
> Arnd Bergmann <arnd@arndb.de> wrote:
> 
> > I've given it a try for all the misc drivers that have an open() function.
> > The vast majority of them are actually watchdog drivers, all of which
> > register as a misc device by themselves. 
> 
> OK, it looks like the "misc" misc drivers patch can go into the
> bkl-removal tree, while the watchdog patches should not.  What that
> means, I guess, is that the final misc_open() patch cannot go in at this
> point; Alan's watchdog stuff needs to find its way in first.  Make
> sense? 

Right, unless Alan or Wim are confident enough that removing the
BKL won't break the drivers (more than they are today).
Almost all of the open functions go along the lines of

int open(struct inode *i, struct file *f)
{
	if (wd_is_open)
		return -EBUSY;
	wd_is_open = 1;

	start_wd();

	return nonseekable_open(i, f);
}

nonseekable_open doesn't need the BKL by itself, and the wd_is_open
variable is protected by the misc_mtx mutex.
I can't see any scenario in which start_wd() would need the BKL, or
where a watchdog driver needs cycle_kernel_lock(), but I wasn't confident
enough about that assessment, because I'm not really familiar with
the drivers.
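
For illustration only, here is a minimal userspace sketch of a single-open
guard that does not depend on any outer lock, in the spirit of the
test_and_set_bit() check in Wim's watchdog_open(). It is not taken from any
driver: struct wd_model, wd_model_open() and wd_model_release() are
hypothetical names, C11 atomic_flag stands in for the kernel bit operations,
and setting "started" stands in for start_wd().

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace model of a single-open watchdog device node. */
struct wd_model {
	atomic_flag is_open;	/* set while the device is held open */
	int started;		/* 1 once the (modelled) timer is running */
};

#define WD_MODEL_INIT { ATOMIC_FLAG_INIT, 0 }

/* Claim the device; only the first caller wins, later ones get -EBUSY. */
static int wd_model_open(struct wd_model *wd)
{
	assert(wd != NULL);

	/* Test and set happen as one atomic step, so two racing openers
	 * cannot both observe the flag as clear. */
	if (atomic_flag_test_and_set(&wd->is_open))
		return -EBUSY;

	wd->started = 1;	/* stands in for start_wd() */
	return 0;
}

/* Let the device be opened again; the timer is deliberately left alone. */
static void wd_model_release(struct wd_model *wd)
{
	assert(wd != NULL);
	atomic_flag_clear(&wd->is_open);
}
```

The plain "if (wd_is_open) ... wd_is_open = 1" form is only safe while some
outer lock serializes open(); the atomic form keeps working if that lock goes
away.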

> > You seem to already have a script to turn per-file changes into a
> > patch each, so I'm sending you two patches: one for all the watchdog
> > drivers (maybe Wim can take care of that as well) and one for all the
> > other misc drivers (this one needs to be split).
> 
> Alas, I have no such script.  I just committed each change as I made it
> - each one required individual attention anyway.  The misc changes look
> pretty straightforward, so I could probably hack up such a thing pretty
> quickly if you don't have a tree with broken out patches.

I've done a semi-automated split and applied the patches on top of your
tree. You can pull these from

git://git.kernel.org/pub/scm/linux/kernel/git/arnd/cell-2.6 bkl-removal

(I guess I should do a separate tree for it, will do that if more stuff
comes up.)

	Arnd <><

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 2/3, RFC] watchdog dev BKL pushdown
  2008-05-20 15:47                     ` Wim Van Sebroeck
@ 2008-05-20 18:31                       ` Alan Cox
  2008-05-20 21:00                         ` Arnd Bergmann
  0 siblings, 1 reply; 78+ messages in thread
From: Alan Cox @ 2008-05-20 18:31 UTC (permalink / raw)
  To: Wim Van Sebroeck
  Cc: Arnd Bergmann, Christoph Hellwig, Jonathan Corbet,
	Linus Torvalds, Ingo Molnar, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner, Alexander Viro, linux-kernel

> +int register_watchdogdevice(struct watchdog_device *dev, struct device *parent)
> +{
> +	int ret;
> +
> +	if (dev == NULL ||
> +	    dev->watchdog_ops == NULL)
> +		return -ENODATA;
> +
> +	if (dev->watchdog_ops->start == NULL ||
> +	    dev->watchdog_ops->stop == NULL ||
> +	    dev->watchdog_ops->keepalive == NULL)

Some watchdogs have no stop method so you need to allow for that (it
means your device is always going to close 'nowayout' style).

> +	if (dev->watchdog_ops->set_heartbeat) {
> +		dev->options |= WDIOF_SETTIMEOUT;
> +	} else {
> +		dev->options &= ~WDIOF_SETTIMEOUT;
> +	}

Should have been done by the driver anyway - so this is a WARN check
really for debug.

> +static int watchdog_ioctl(struct inode *inode, struct file *file,
> +				unsigned int cmd, unsigned long arg)
> +{

This ioctl code is racy


> +		if (new_options & WDIOS_DISABLECARD) {
> +			/* only try to stop the watchdog if it's allready running */
> +			if (watchdogdev->watchdog_state == WATCHDOG_STARTED) {
> +				err =  watchdogdev->watchdog_ops->stop(watchdogdev);
> +				if (err == 0) {
> +					watchdogdev->watchdog_state = WATCHDOG_STOPPED;
> +				} else {
> +					printk(KERN_CRIT PFX "WDIOS_DISABLECARD not successfull! (err=%d)",
> +						err);
> +					return -EFAULT;
> +				}
> +			}
> +		}
> +

Consider two of these happening at once

> +	/* only open if we have a valid watchdog device */
> +	if (!watchdogdev ||
> +	    !watchdogdev->watchdog_ops ||
> +	    !watchdogdev->watchdog_ops->start ||
> +	    !watchdogdev->watchdog_ops->stop ||
> +	    !watchdogdev->watchdog_ops->keepalive)
> +		return -EBUSY;

Races register - plus you don't need to check these cases

Also there is a problem with module locking as you don't lock down the
driver module as your fops are owned by the watchdog_dev.

I'd actually been thinking along the same lines but had not paid
attention to your tree (I was busy enough without poking at watchdog).

The patch below is actually quite similar in parts to yours, except:

- It handles the temperature device
- You pass it the ident and other structs
- It handles stop not being supported (and stop/start fails)
- It does all the reboot, nowayout and module locking and housekeeping
- I've used static objects for most things so they can be initialised
  by the compiler making more of the code "table filling".

The API is actually pretty similar so I think we agree on most parts but
it might be worth working out which are the best bits of each
implementation and putting together something nicer still:

Current patch (& example conversions) below. I'm still debating whether
the watchdog core should implement its own mutex locking for the drivers
so that only special case users (eg those with IRQ handlers) need to do
their own private locking. That would further reduce mistakes by authors.
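
A rough userspace sketch of what core-owned locking could look like, purely to illustrate the idea being debated. The names are invented, and a C11 atomic_flag spin stands in for the mutex so the example builds without a threading library; a real core would use a sleeping lock.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical per-device state for this sketch. */
struct wd_core_sketch {
	atomic_flag lock;	/* stand-in for a per-device mutex */
	int pings;		/* driver state the lock protects */
};

static void core_lock(struct wd_core_sketch *w)
{
	while (atomic_flag_test_and_set(&w->lock))
		;	/* spin; a real mutex would sleep instead */
}

static void core_unlock(struct wd_core_sketch *w)
{
	atomic_flag_clear(&w->lock);
}

/* Driver method written with no locking of its own. */
static void drv_ping(struct wd_core_sketch *w)
{
	w->pings++;
}

/* The core serialises every call into the driver. */
static void wd_core_ping(struct wd_core_sketch *w)
{
	assert(w);
	core_lock(w);
	drv_ping(w);
	core_unlock(w);
}
```

Drivers with IRQ handlers would still need their own lock, since a lock taken only in the core's file operations does not cover the interrupt path.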


 drivers/watchdog/Kconfig        |   15 -
 drivers/watchdog/Makefile       |    8 -
 drivers/watchdog/alim1535_wdt.c |  289 ++++---------------------
 drivers/watchdog/alim7101_wdt.c |  218 +++----------------
 drivers/watchdog/softdog.c      |  211 ++++--------------
 drivers/watchdog/watchdog.c     |  296 ++++++++++++++++++++++++++
 drivers/watchdog/watchdog.h     |   35 +++
 drivers/watchdog/wdt.c          |  447 ++++++++-------------------------------
 8 files changed, 556 insertions(+), 963 deletions(-)
 create mode 100644 drivers/watchdog/watchdog.c
 create mode 100644 drivers/watchdog/watchdog.h


diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
index 254d115..b592ede 100644
--- a/drivers/watchdog/Kconfig
+++ b/drivers/watchdog/Kconfig
@@ -828,21 +828,6 @@ config WDT
 	  To compile this driver as a module, choose M here: the
 	  module will be called wdt.
 
-config WDT_501
-	bool "WDT501 features"
-	depends on WDT
-	help
-	  Saying Y here and creating a character special file /dev/temperature
-	  with major number 10 and minor number 131 ("man mknod") will give
-	  you a thermometer inside your computer: reading from
-	  /dev/temperature yields one byte, the temperature in degrees
-	  Fahrenheit. This works only if you have a WDT501P watchdog board
-	  installed.
-
-	  If you want to enable the Fan Tachometer on the WDT501P, then you
-	  can do this via the tachometer parameter. Only do this if you have a
-	  fan tachometer actually set up.
-
 #
 # PCI-based Watchdog Cards
 #
diff --git a/drivers/watchdog/Makefile b/drivers/watchdog/Makefile
index f3fb170..1a9f2b7 100644
--- a/drivers/watchdog/Makefile
+++ b/drivers/watchdog/Makefile
@@ -13,7 +13,7 @@
 # ISA-based Watchdog Cards
 obj-$(CONFIG_PCWATCHDOG) += pcwd.o
 obj-$(CONFIG_MIXCOMWD) += mixcomwd.o
-obj-$(CONFIG_WDT) += wdt.o
+obj-$(CONFIG_WDT) += wdt.o watchdog.o
 
 # PCI-based Watchdog Cards
 obj-$(CONFIG_PCIPCWATCHDOG) += pcwd_pci.o
@@ -57,8 +57,8 @@ obj-$(CONFIG_BFIN_WDT) += bfin_wdt.o
 # X86 (i386 + ia64 + x86_64) Architecture
 obj-$(CONFIG_ACQUIRE_WDT) += acquirewdt.o
 obj-$(CONFIG_ADVANTECH_WDT) += advantechwdt.o
-obj-$(CONFIG_ALIM1535_WDT) += alim1535_wdt.o
-obj-$(CONFIG_ALIM7101_WDT) += alim7101_wdt.o
+obj-$(CONFIG_ALIM1535_WDT) += alim1535_wdt.o softdog.o
+obj-$(CONFIG_ALIM7101_WDT) += alim7101_wdt.o softdog.o
 obj-$(CONFIG_SC520_WDT) += sc520_wdt.o
 obj-$(CONFIG_EUROTECH_WDT) += eurotechwdt.o
 obj-$(CONFIG_IB700_WDT) += ib700wdt.o
@@ -123,4 +123,4 @@ obj-$(CONFIG_SH_WDT) += shwdt.o
 # XTENSA Architecture
 
 # Architecture Independant
-obj-$(CONFIG_SOFT_WATCHDOG) += softdog.o
+obj-$(CONFIG_SOFT_WATCHDOG) += softdog.o watchdog.o
diff --git a/drivers/watchdog/alim1535_wdt.c b/drivers/watchdog/alim1535_wdt.c
index 88760cb..b5bab8b 100644
--- a/drivers/watchdog/alim1535_wdt.c
+++ b/drivers/watchdog/alim1535_wdt.c
@@ -22,13 +22,13 @@
 #include <linux/uaccess.h>
 #include <linux/io.h>
 
+#include "watchdog.h"
+
 #define WATCHDOG_NAME "ALi_M1535"
 #define PFX WATCHDOG_NAME ": "
 #define WATCHDOG_TIMEOUT 60	/* 60 sec default timeout */
 
 /* internal variables */
-static unsigned long ali_is_open;
-static char ali_expect_release;
 static struct pci_dev *ali_pci;
 static u32 ali_timeout_bits;		/* stores the computed timeout */
 static DEFINE_SPINLOCK(ali_lock);	/* Guards the hardware */
@@ -53,7 +53,7 @@ MODULE_PARM_DESC(nowayout,
  *	configuration set.
  */
 
-static void ali_start(void)
+static int ali_start(struct watchdog *w)
 {
 	u32 val;
 
@@ -61,10 +61,16 @@ static void ali_start(void)
 
 	pci_read_config_dword(ali_pci, 0xCC, &val);
 	val &= ~0x3F;	/* Mask count */
-	val |= (1<<25) | ali_timeout_bits;
+	val |= (1 << 25) | ali_timeout_bits;
 	pci_write_config_dword(ali_pci, 0xCC, val);
 
 	spin_unlock(&ali_lock);
+	return 0;
+}
+
+static void ali_ping(struct watchdog *w)
+{
+	ali_start(w);
 }
 
 /*
@@ -73,7 +79,7 @@ static void ali_start(void)
  *	Stop the ALi watchdog countdown
  */
 
-static void ali_stop(void)
+static int ali_stop(struct watchdog *w)
 {
 	u32 val;
 
@@ -81,21 +87,11 @@ static void ali_stop(void)
 
 	pci_read_config_dword(ali_pci, 0xCC, &val);
 	val &= ~0x3F;	/* Mask count to zero (disabled) */
-	val &= ~(1<<25);/* and for safety mask the reset enable */
+	val &= ~(1 << 25);/* and for safety mask the reset enable */
 	pci_write_config_dword(ali_pci, 0xCC, val);
 
 	spin_unlock(&ali_lock);
-}
-
-/*
- *	ali_keepalive	-	send a keepalive to the watchdog
- *
- *      Send a keepalive to the timer (actually we restart the timer).
- */
-
-static void ali_keepalive(void)
-{
-	ali_start();
+	return 0;
 }
 
 /*
@@ -105,193 +101,45 @@ static void ali_keepalive(void)
  *	Computes the timeout values needed
  */
 
-static int ali_settimer(int t)
+static int ali_settimer(struct watchdog *w, int t)
 {
 	if (t < 0)
 		return -EINVAL;
 	else if (t < 60)
-		ali_timeout_bits = t|(1<<6);
+		ali_timeout_bits = t|(1 << 6);
 	else if (t < 3600)
-		ali_timeout_bits = (t/60)|(1<<7);
+		ali_timeout_bits = (t/60)|(1 << 7);
 	else if (t < 18000)
-		ali_timeout_bits = (t/300)|(1<<6)|(1<<7);
+		ali_timeout_bits = (t/300)|(1 << 6)|(1 << 7);
 	else
 		return -EINVAL;
 
-	timeout = t;
-	return 0;
-}
-
-/*
- *	/dev/watchdog handling
- */
-
-/*
- *	ali_write	-	writes to ALi watchdog
- *	@file: file from VFS
- *	@data: user address of data
- *	@len: length of data
- *	@ppos: pointer to the file offset
- *
- *	Handle a write to the ALi watchdog. Writing to the file pings
- *	the watchdog and resets it. Writing the magic 'V' sequence allows
- *	the next close to turn off the watchdog.
- */
-
-static ssize_t ali_write(struct file *file, const char __user *data,
-			      size_t len, loff_t *ppos)
-{
-	/* See if we got the magic character 'V' and reload the timer */
-	if (len) {
-		if (!nowayout) {
-			size_t i;
-
-			/* note: just in case someone wrote the
-			   magic character five months ago... */
-			ali_expect_release = 0;
-
-			/* scan to see whether or not we got
-			   the magic character */
-			for (i = 0; i != len; i++) {
-				char c;
-				if (get_user(c, data+i))
-					return -EFAULT;
-				if (c == 'V')
-					ali_expect_release = 42;
-			}
-		}
-
-		/* someone wrote to us, we should reload the timer */
-		ali_start();
-	}
-	return len;
-}
-
-/*
- *	ali_ioctl	-	handle watchdog ioctls
- *	@file: VFS file pointer
- *	@cmd: ioctl number
- *	@arg: arguments to the ioctl
- *
- *	Handle the watchdog ioctls supported by the ALi driver. Really
- *	we want an extension to enable irq ack monitoring and the like
- */
-
-static long ali_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
-{
-	void __user *argp = (void __user *)arg;
-	int __user *p = argp;
-	static struct watchdog_info ident = {
-		.options =		WDIOF_KEEPALIVEPING |
-					WDIOF_SETTIMEOUT |
-					WDIOF_MAGICCLOSE,
-		.firmware_version =	0,
-		.identity =		"ALi M1535 WatchDog Timer",
-	};
-
-	switch (cmd) {
-	case WDIOC_GETSUPPORT:
-		return copy_to_user(argp, &ident, sizeof(ident)) ? -EFAULT : 0;
-
-	case WDIOC_GETSTATUS:
-	case WDIOC_GETBOOTSTATUS:
-		return put_user(0, p);
-	case WDIOC_KEEPALIVE:
-		ali_keepalive();
-		return 0;
-	case WDIOC_SETOPTIONS:
-	{
-		int new_options, retval = -EINVAL;
-
-		if (get_user(new_options, p))
-			return -EFAULT;
-		if (new_options & WDIOS_DISABLECARD) {
-			ali_stop();
-			retval = 0;
-		}
-		if (new_options & WDIOS_ENABLECARD) {
-			ali_start();
-			retval = 0;
-		}
-		return retval;
-	}
-	case WDIOC_SETTIMEOUT:
-	{
-		int new_timeout;
-		if (get_user(new_timeout, p))
-			return -EFAULT;
-		if (ali_settimer(new_timeout))
-			return -EINVAL;
-		ali_keepalive();
-		/* Fall */
-	}
-	case WDIOC_GETTIMEOUT:
-		return put_user(timeout, p);
-	default:
-		return -ENOTTY;
-	}
-}
-
-/*
- *	ali_open	-	handle open of ali watchdog
- *	@inode: inode from VFS
- *	@file: file from VFS
- *
- *	Open the ALi watchdog device. Ensure only one person opens it
- *	at a time. Also start the watchdog running.
- */
-
-static int ali_open(struct inode *inode, struct file *file)
-{
-	/* /dev/watchdog can only be opened once */
-	if (test_and_set_bit(0, &ali_is_open))
-		return -EBUSY;
-
-	/* Activate */
-	ali_start();
-	return nonseekable_open(inode, file);
-}
-
-/*
- *	ali_release	-	close an ALi watchdog
- *	@inode: inode from VFS
- *	@file: file from VFS
- *
- *	Close the ALi watchdog device. Actual shutdown of the timer
- *	only occurs if the magic sequence has been set.
- */
-
-static int ali_release(struct inode *inode, struct file *file)
-{
-	/*
-	 *      Shut off the timer.
-	 */
-	if (ali_expect_release == 42)
-		ali_stop();
-	else {
-		printk(KERN_CRIT PFX
-				"Unexpected close, not stopping watchdog!\n");
-		ali_keepalive();
-	}
-	clear_bit(0, &ali_is_open);
-	ali_expect_release = 0;
+	w->timeout = t;
 	return 0;
 }
 
-/*
- *	ali_notify_sys	-	System down notifier
- *
- *	Notifier for system down
- */
+static const struct watchdog_ops wdt_ops = {
+	.start	=	ali_start,
+	.stop	=	ali_stop,
+	.reboot	=	ali_stop,
+	.ping	=	ali_ping,
+	.set_timeout =  ali_settimer,
+};
 
+static const struct watchdog_info ident = {
+	.options =		WDIOF_KEEPALIVEPING |
+				WDIOF_SETTIMEOUT |
+				WDIOF_MAGICCLOSE,
+	.firmware_version =	0,
+	.identity =		"ALi M1535 WatchDog Timer",
+};
 
-static int ali_notify_sys(struct notifier_block *this,
-					unsigned long code, void *unused)
-{
-	if (code == SYS_DOWN || code == SYS_HALT)
-		ali_stop();		/* Turn the WDT off */
-	return NOTIFY_DONE;
-}
+static struct watchdog aliwd = {
+	.name = "ALiM1535",
+	.info = &ident,
+	.ops = &wdt_ops,
+	.owner = THIS_MODULE,
+};
 
 /*
  *	Data for PCI driver interface
@@ -349,9 +197,9 @@ static int __init ali_find_watchdog(void)
 	/* Timer bits */
 	wdog &= ~0x3F;
 	/* Issued events */
-	wdog &= ~((1<<27)|(1<<26)|(1<<25)|(1<<24));
+	wdog &= ~((1 << 27)|(1 << 26)|(1 << 25)|(1 << 24));
 	/* No monitor bits */
-	wdog &= ~((1<<16)|(1<<13)|(1<<12)|(1<<11)|(1<<10)|(1<<9));
+	wdog &= ~((1 << 16)|(1 << 13)|(1 << 12)|(1 << 11)|(1 << 10)|(1 << 9));
 
 	pci_write_config_dword(pdev, 0xCC, wdog);
 
@@ -359,29 +207,6 @@ static int __init ali_find_watchdog(void)
 }
 
 /*
- *	Kernel Interfaces
- */
-
-static const struct file_operations ali_fops = {
-	.owner 		=	THIS_MODULE,
-	.llseek 	=	no_llseek,
-	.write		=	ali_write,
-	.unlocked_ioctl =	ali_ioctl,
-	.open 		=	ali_open,
-	.release 	=	ali_release,
-};
-
-static struct miscdevice ali_miscdev = {
-	.minor =	WATCHDOG_MINOR,
-	.name =		"watchdog",
-	.fops =		&ali_fops,
-};
-
-static struct notifier_block ali_notifier = {
-	.notifier_call =	ali_notify_sys,
-};
-
-/*
  *	watchdog_init	-	module initialiser
  *
  *	Scan for a suitable watchdog and if so initialize it. Return an error
@@ -406,31 +231,16 @@ static int __init watchdog_init(void)
 	}
 
 	/* Calculate the watchdog's timeout */
-	ali_settimer(timeout);
-
-	ret = register_reboot_notifier(&ali_notifier);
-	if (ret != 0) {
-		printk(KERN_ERR PFX
-			"cannot register reboot notifier (err=%d)\n", ret);
-		goto out;
-	}
+	ali_settimer(&aliwd, timeout);
 
-	ret = misc_register(&ali_miscdev);
-	if (ret != 0) {
-		printk(KERN_ERR PFX
-			"cannot register miscdev on minor=%d (err=%d)\n",
-						WATCHDOG_MINOR, ret);
-		goto unreg_reboot;
+	ret = watchdog_register(&aliwd, nowayout);
+	if (ret < 0) {
+		pci_dev_put(ali_pci);
+		return ret;
 	}
-
 	printk(KERN_INFO PFX "initialized. timeout=%d sec (nowayout=%d)\n",
 		timeout, nowayout);
-
-out:
-	return ret;
-unreg_reboot:
-	unregister_reboot_notifier(&ali_notifier);
-	goto out;
+	return 0;
 }
 
 /*
@@ -441,12 +251,7 @@ unreg_reboot:
 
 static void __exit watchdog_exit(void)
 {
-	/* Stop the timer before we leave */
-	ali_stop();
-
-	/* Deregister */
-	misc_deregister(&ali_miscdev);
-	unregister_reboot_notifier(&ali_notifier);
+	watchdog_unregister(&aliwd);
 	pci_dev_put(ali_pci);
 }
 
diff --git a/drivers/watchdog/alim7101_wdt.c b/drivers/watchdog/alim7101_wdt.c
index c495f36..a8259ac 100644
--- a/drivers/watchdog/alim7101_wdt.c
+++ b/drivers/watchdog/alim7101_wdt.c
@@ -36,6 +36,8 @@
 #include <linux/uaccess.h>
 #include <asm/system.h>
 
+#include "watchdog.h"
+
 #define OUR_NAME "alim7101_wdt"
 #define PFX OUR_NAME ": "
 
@@ -51,7 +53,7 @@
  * We're going to use a 1 second timeout.
  * If we reset the watchdog every ~250ms we should be safe.  */
 
-#define WDT_INTERVAL (HZ/4+1)
+#define WDT_INTERVAL (HZ / 4 + 1)
 
 /*
  * We must not require too good response from the userspace daemon.
@@ -75,8 +77,6 @@ MODULE_PARM_DESC(use_gpio,
 static void wdt_timer_ping(unsigned long);
 static DEFINE_TIMER(timer, wdt_timer_ping, 0, 1);
 static unsigned long next_heartbeat;
-static unsigned long wdt_is_open;
-static char wdt_expect_close;
 static struct pci_dev *alim7101_pmu;
 
 static int nowayout = WATCHDOG_NOWAYOUT;
@@ -150,9 +150,9 @@ static void wdt_change(int writeval)
 	}
 }
 
-static void wdt_startup(void)
+static int wdt_startup(struct watchdog *w)
 {
-	next_heartbeat = jiffies + (timeout * HZ);
+	next_heartbeat = jiffies + (w->timeout * HZ);
 
 	/* We must enable before we kick off the timer in case the timer
 	   occurs as we ping it */
@@ -163,185 +163,60 @@ static void wdt_startup(void)
 	mod_timer(&timer, jiffies + WDT_INTERVAL);
 
 	printk(KERN_INFO PFX "Watchdog timer is now enabled.\n");
+	return 0;
 }
 
-static void wdt_turnoff(void)
+static int wdt_turnoff(struct watchdog *w)
 {
 	/* Stop the timer */
 	del_timer_sync(&timer);
 	wdt_change(WDT_DISABLE);
 	printk(KERN_INFO PFX "Watchdog timer is now disabled...\n");
+	return 0;
 }
 
-static void wdt_keepalive(void)
+static void wdt_keepalive(struct watchdog *w)
 {
 	/* user land ping */
 	next_heartbeat = jiffies + (timeout * HZ);
 }
 
-/*
- * /dev/watchdog handling
- */
-
-static ssize_t fop_write(struct file *file, const char __user *buf,
-						size_t count, loff_t *ppos)
-{
-	/* See if we got the magic character 'V' and reload the timer */
-	if (count) {
-		if (!nowayout) {
-			size_t ofs;
-
-			/* note: just in case someone wrote the magic character
-			 * five months ago... */
-			wdt_expect_close = 0;
-
-			/* now scan */
-			for (ofs = 0; ofs != count; ofs++) {
-				char c;
-				if (get_user(c, buf+ofs))
-					return -EFAULT;
-				if (c == 'V')
-					wdt_expect_close = 42;
-			}
-		}
-		/* someone wrote to us, we should restart timer */
-		wdt_keepalive();
-	}
-	return count;
-}
-
-static int fop_open(struct inode *inode, struct file *file)
+static int wdt_set_timeout(struct watchdog *w, int t)
 {
-	/* Just in case we're already talking to someone... */
-	if (test_and_set_bit(0, &wdt_is_open))
-		return -EBUSY;
-	/* Good, fire up the show */
-	wdt_startup();
-	return nonseekable_open(inode, file);
-}
-
-static int fop_close(struct inode *inode, struct file *file)
-{
-	if (wdt_expect_close == 42)
-		wdt_turnoff();
-	else {
-		/* wim: shouldn't there be a: del_timer(&timer); */
-		printk(KERN_CRIT PFX
-		  "device file closed unexpectedly. Will not stop the WDT!\n");
-	}
-	clear_bit(0, &wdt_is_open);
-	wdt_expect_close = 0;
+	if (t < 0 || t > 65535)
+		return -EINVAL;
+	w->timeout = t;
 	return 0;
 }
 
-static long fop_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
-{
-	void __user *argp = (void __user *)arg;
-	int __user *p = argp;
-	static struct watchdog_info ident = {
-		.options = WDIOF_KEEPALIVEPING | WDIOF_SETTIMEOUT
-							| WDIOF_MAGICCLOSE,
-		.firmware_version = 1,
-		.identity = "ALiM7101",
-	};
-
-	switch (cmd) {
-	case WDIOC_GETSUPPORT:
-		return copy_to_user(argp, &ident, sizeof(ident)) ? -EFAULT : 0;
-	case WDIOC_GETSTATUS:
-	case WDIOC_GETBOOTSTATUS:
-		return put_user(0, p);
-	case WDIOC_KEEPALIVE:
-		wdt_keepalive();
-		return 0;
-	case WDIOC_SETOPTIONS:
-	{
-		int new_options, retval = -EINVAL;
-
-		if (get_user(new_options, p))
-			return -EFAULT;
-		if (new_options & WDIOS_DISABLECARD) {
-			wdt_turnoff();
-			retval = 0;
-		}
-		if (new_options & WDIOS_ENABLECARD) {
-			wdt_startup();
-			retval = 0;
-		}
-		return retval;
-	}
-	case WDIOC_SETTIMEOUT:
-	{
-		int new_timeout;
-
-		if (get_user(new_timeout, p))
-			return -EFAULT;
-		/* arbitrary upper limit */
-		if (new_timeout < 1 || new_timeout > 3600)
-			return -EINVAL;
-		timeout = new_timeout;
-		wdt_keepalive();
-		/* Fall through */
-	}
-	case WDIOC_GETTIMEOUT:
-		return put_user(timeout, p);
-	default:
-		return -ENOTTY;
-	}
-}
-
-static const struct file_operations wdt_fops = {
-	.owner		=	THIS_MODULE,
-	.llseek		=	no_llseek,
-	.write		=	fop_write,
-	.open		=	fop_open,
-	.release	=	fop_close,
-	.unlocked_ioctl	=	fop_ioctl,
+static struct watchdog_ops wdt_ops = {
+	.start	=	wdt_startup,
+	.stop	=	wdt_turnoff,
+	.ping	=	wdt_keepalive,
+	.reboot =	wdt_turnoff,
+	.set_timeout =  wdt_set_timeout,
 };
 
-static struct miscdevice wdt_miscdev = {
-	.minor	=	WATCHDOG_MINOR,
-	.name	=	"watchdog",
-	.fops	=	&wdt_fops,
+static struct watchdog_info ident = {
+	.options = WDIOF_KEEPALIVEPING | WDIOF_SETTIMEOUT
+						| WDIOF_MAGICCLOSE,
+	.firmware_version = 1,
+	.identity = "ALiM7101",
 };
 
+static struct watchdog aliwd = {
+	.name = "ALiM7101",
+	.info = &ident,
+	.ops = &wdt_ops,
+	.owner = THIS_MODULE,
+};
 /*
  *	Notifier for system down
  */
 
-static int wdt_notify_sys(struct notifier_block *this,
-					unsigned long code, void *unused)
-{
-	if (code == SYS_DOWN || code == SYS_HALT)
-		wdt_turnoff();
-
-	if (code == SYS_RESTART) {
-		/*
-		 * Cobalt devices have no way of rebooting themselves other
-		 * than getting the watchdog to pull reset, so we restart the
-		 * watchdog on reboot with no heartbeat
-		 */
-		wdt_change(WDT_ENABLE);
-		printk(KERN_INFO PFX "Watchdog timer is now enabled with no heartbeat - should reboot in ~1 second.\n");
-	}
-	return NOTIFY_DONE;
-}
-
-/*
- *	The WDT needs to learn about soft shutdowns in order to
- *	turn the timebomb registers off.
- */
-
-static struct notifier_block wdt_notifier = {
-	.notifier_call = wdt_notify_sys,
-};
-
 static void __exit alim7101_wdt_unload(void)
 {
-	wdt_turnoff();
-	/* Deregister */
-	misc_deregister(&wdt_miscdev);
-	unregister_reboot_notifier(&wdt_notifier);
+	watchdog_unregister(&aliwd);
 	pci_dev_put(alim7101_pmu);
 }
 
@@ -389,30 +264,15 @@ static int __init alim7101_wdt_init(void)
 			"timeout value must be 1 <= x <= 3600, using %d\n",
 								timeout);
 	}
-
-	rc = register_reboot_notifier(&wdt_notifier);
-	if (rc) {
-		printk(KERN_ERR PFX
-			"cannot register reboot notifier (err=%d)\n", rc);
-		goto err_out;
-	}
-
-	rc = misc_register(&wdt_miscdev);
-	if (rc) {
-		printk(KERN_ERR PFX "cannot register miscdev on minor=%d (err=%d)\n",
-			wdt_miscdev.minor, rc);
-		goto err_out_reboot;
-	}
-
-	if (nowayout)
-		__module_get(THIS_MODULE);
-
-	printk(KERN_INFO PFX "WDT driver for ALi M7101 initialised. timeout=%d sec (nowayout=%d)\n",
-		timeout, nowayout);
+	aliwd.timeout = timeout;
+
+	rc = watchdog_register(&aliwd, nowayout);
+	if (rc < 0)
+		return rc;
+	printk(KERN_INFO PFX
+	"WDT driver for ALi M7101 initialised. timeout=%d sec (nowayout=%d)\n",
+					timeout, nowayout);
 	return 0;
-
-err_out_reboot:
-	unregister_reboot_notifier(&wdt_notifier);
 err_out:
 	pci_dev_put(alim7101_pmu);
 	return rc;
diff --git a/drivers/watchdog/softdog.c b/drivers/watchdog/softdog.c
index bb3c75e..d1113a3 100644
--- a/drivers/watchdog/softdog.c
+++ b/drivers/watchdog/softdog.c
@@ -49,6 +49,8 @@
 #include <linux/jiffies.h>
 #include <linux/uaccess.h>
 
+#include "watchdog.h"
+
 #define PFX "SoftDog: "
 
 #define TIMER_MARGIN	60		/* Default is 60 seconds */
@@ -64,14 +66,11 @@ MODULE_PARM_DESC(nowayout,
 		"Watchdog cannot be stopped once started (default="
 				__MODULE_STRING(WATCHDOG_NOWAYOUT) ")");
 
-#ifdef ONLY_TESTING
-static int soft_noboot = 1;
-#else
-static int soft_noboot = 0;
-#endif  /* ONLY_TESTING */
-
+static int soft_noboot;
 module_param(soft_noboot, int, 0);
-MODULE_PARM_DESC(soft_noboot, "Softdog action, set to 1 to ignore reboots, 0 to reboot (default depends on ONLY_TESTING)");
+MODULE_PARM_DESC(soft_noboot, "Softdog action, set to 1 to ignore reboots, 0 to reboot (default 0)");
+
+static struct watchdog softdog;
 
 /*
  *	Our timer
@@ -81,9 +80,6 @@ static void watchdog_fire(unsigned long);
 
 static struct timer_list watchdog_ticktock =
 		TIMER_INITIALIZER(watchdog_fire, 0, 0);
-static unsigned long driver_open, orphan_timer;
-static char expect_close;
-
 
 /*
  *	If the timer expires..
@@ -91,7 +87,7 @@ static char expect_close;
 
 static void watchdog_fire(unsigned long data)
 {
-	if (test_and_clear_bit(0, &orphan_timer))
+	if (test_and_clear_bit(WDOG_ORPHAN, &softdog.status))
 		module_put(THIS_MODULE);
 
 	if (soft_noboot)
@@ -107,161 +103,52 @@ static void watchdog_fire(unsigned long data)
  *	Softdog operations
  */
 
-static int softdog_keepalive(void)
+static void softdog_keepalive(struct watchdog *w)
 {
-	mod_timer(&watchdog_ticktock, jiffies+(soft_margin*HZ));
-	return 0;
+	mod_timer(&watchdog_ticktock, jiffies + (w->timeout*HZ));
 }
 
-static int softdog_stop(void)
+static int softdog_start(struct watchdog *w)
 {
-	del_timer(&watchdog_ticktock);
+	mod_timer(&watchdog_ticktock, jiffies + (w->timeout*HZ));
 	return 0;
 }
 
-static int softdog_set_heartbeat(int t)
+static int softdog_stop(struct watchdog *w)
 {
-	if ((t < 0x0001) || (t > 0xFFFF))
-		return -EINVAL;
-
-	soft_margin = t;
+	del_timer(&watchdog_ticktock);
 	return 0;
 }
 
-/*
- *	/dev/watchdog handling
- */
-
-static int softdog_open(struct inode *inode, struct file *file)
+static int softdog_set_heartbeat(struct watchdog *w, int t)
 {
-	if (test_and_set_bit(0, &driver_open))
-		return -EBUSY;
-	if (!test_and_clear_bit(0, &orphan_timer))
-		__module_get(THIS_MODULE);
-	/*
-	 *	Activate timer
-	 */
-	softdog_keepalive();
-	return nonseekable_open(inode, file);
-}
+	if (t < 0x0001 || t > 0xFFFF)
+		return -EINVAL;
 
-static int softdog_release(struct inode *inode, struct file *file)
-{
-	/*
-	 *	Shut off the timer.
-	 * 	Lock it in if it's a module and we set nowayout
-	 */
-	if (expect_close == 42) {
-		softdog_stop();
-		module_put(THIS_MODULE);
-	} else {
-		printk(KERN_CRIT PFX
-			"Unexpected close, not stopping watchdog!\n");
-		set_bit(0, &orphan_timer);
-		softdog_keepalive();
-	}
-	clear_bit(0, &driver_open);
-	expect_close = 0;
+	w->timeout = t;
 	return 0;
 }
 
-static ssize_t softdog_write(struct file *file, const char __user *data,
-						size_t len, loff_t *ppos)
-{
-	/*
-	 *	Refresh the timer.
-	 */
-	if (len) {
-		if (!nowayout) {
-			size_t i;
-
-			/* In case it was set long ago */
-			expect_close = 0;
-
-			for (i = 0; i != len; i++) {
-				char c;
-
-				if (get_user(c, data + i))
-					return -EFAULT;
-				if (c == 'V')
-					expect_close = 42;
-			}
-		}
-		softdog_keepalive();
-	}
-	return len;
-}
-
-static long softdog_ioctl(struct file *file, unsigned int cmd,
-							unsigned long arg)
-{
-	void __user *argp = (void __user *)arg;
-	int __user *p = argp;
-	int new_margin;
-	static const struct watchdog_info ident = {
-		.options =		WDIOF_SETTIMEOUT |
-					WDIOF_KEEPALIVEPING |
-					WDIOF_MAGICCLOSE,
-		.firmware_version =	0,
-		.identity =		"Software Watchdog",
-	};
-	switch (cmd) {
-	default:
-		return -ENOTTY;
-	case WDIOC_GETSUPPORT:
-		return copy_to_user(argp, &ident, sizeof(ident)) ? -EFAULT : 0;
-	case WDIOC_GETSTATUS:
-	case WDIOC_GETBOOTSTATUS:
-		return put_user(0, p);
-	case WDIOC_KEEPALIVE:
-		softdog_keepalive();
-		return 0;
-	case WDIOC_SETTIMEOUT:
-		if (get_user(new_margin, p))
-			return -EFAULT;
-		if (softdog_set_heartbeat(new_margin))
-			return -EINVAL;
-		softdog_keepalive();
-		/* Fall */
-	case WDIOC_GETTIMEOUT:
-		return put_user(soft_margin, p);
-	}
-}
-
-/*
- *	Notifier for system down
- */
-
-static int softdog_notify_sys(struct notifier_block *this, unsigned long code,
-	void *unused)
-{
-	if (code == SYS_DOWN || code == SYS_HALT)
-		/* Turn the WDT off */
-		softdog_stop();
-	return NOTIFY_DONE;
-}
-
-/*
- *	Kernel Interfaces
- */
-
-static const struct file_operations softdog_fops = {
-	.owner		= THIS_MODULE,
-	.llseek		= no_llseek,
-	.write		= softdog_write,
-	.unlocked_ioctl	= softdog_ioctl,
-	.open		= softdog_open,
-	.release	= softdog_release,
+static const struct watchdog_info ident = {
+	.options =		WDIOF_SETTIMEOUT |
+				WDIOF_KEEPALIVEPING |
+				WDIOF_MAGICCLOSE,
+	.firmware_version =	0,
+	.identity =		"Software Watchdog",
 };
 
-static struct miscdevice softdog_miscdev = {
-	.minor		= WATCHDOG_MINOR,
-	.name		= "watchdog",
-	.fops		= &softdog_fops,
+static struct watchdog_ops wdt_ops = {
+	.start	=	softdog_start,
+	.stop	=	softdog_stop,
+	.ping	=	softdog_keepalive,
+	.set_timeout =  softdog_set_heartbeat
 };
 
-static struct notifier_block softdog_notifier = {
-	.notifier_call	= softdog_notify_sys,
+static struct watchdog softdog = {
+	.name = "softdog",
+	.info = &ident,
+	.ops = &wdt_ops,
+	.owner = THIS_MODULE,
 };
 
 static char banner[] __initdata = KERN_INFO "Software Watchdog Timer: 0.07 initialized. soft_noboot=%d soft_margin=%d sec (nowayout= %d)\n";
@@ -272,38 +159,22 @@ static int __init watchdog_init(void)
 
 	/* Check that the soft_margin value is within it's range;
 	   if not reset to the default */
-	if (softdog_set_heartbeat(soft_margin)) {
-		softdog_set_heartbeat(TIMER_MARGIN);
+	if (softdog_set_heartbeat(&softdog, soft_margin)) {
+		softdog_set_heartbeat(&softdog, TIMER_MARGIN);
 		printk(KERN_INFO PFX
 		    "soft_margin must be 0 < soft_margin < 65536, using %d\n",
 			TIMER_MARGIN);
 	}
-
-	ret = register_reboot_notifier(&softdog_notifier);
-	if (ret) {
-		printk(KERN_ERR PFX
-			"cannot register reboot notifier (err=%d)\n", ret);
-		return ret;
-	}
-
-	ret = misc_register(&softdog_miscdev);
-	if (ret) {
-		printk(KERN_ERR PFX
-			"cannot register miscdev on minor=%d (err=%d)\n",
-						WATCHDOG_MINOR, ret);
-		unregister_reboot_notifier(&softdog_notifier);
-		return ret;
-	}
-
-	printk(banner, soft_noboot, soft_margin, nowayout);
-
-	return 0;
+	
+	ret = watchdog_register(&softdog, nowayout);
+	if (ret == 0)
+		printk(banner, soft_noboot, soft_margin, nowayout);
+	return ret;
 }
 
 static void __exit watchdog_exit(void)
 {
-	misc_deregister(&softdog_miscdev);
-	unregister_reboot_notifier(&softdog_notifier);
+	watchdog_unregister(&softdog);
 }
 
 module_init(watchdog_init);
diff --git a/drivers/watchdog/watchdog.c b/drivers/watchdog/watchdog.c
new file mode 100644
index 0000000..4f89396
--- /dev/null
+++ b/drivers/watchdog/watchdog.c
@@ -0,0 +1,296 @@
+/*
+ *	This program is free software; you can redistribute it and/or
+ *	modify it under the terms of the GNU General Public License
+ *	as published by the Free Software Foundation; either version
+ *	2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/types.h>
+#include <linux/miscdevice.h>
+#include <linux/watchdog.h>
+#include <linux/fs.h>
+#include <linux/init.h>
+#include <linux/uaccess.h>
+#include <linux/notifier.h>
+#include <linux/reboot.h>
+#include "watchdog.h"
+
+/* For now we track a single watchdog */
+static struct watchdog *watchdog;
+static unsigned long watchdog_busy;
+
+static long watchdog_ioctl(struct file *file, unsigned int cmd,
+							unsigned long arg)
+{
+	void __user *argp = (void __user *)arg;
+	int __user *p = argp;
+	struct watchdog *w = file->private_data;
+	int val = 0;
+	int r;
+
+	if (w->ops->ioctl) {
+		r = w->ops->ioctl(w, cmd, arg);
+		if (r != -ENOIOCTLCMD)
+			return r;
+	}
+	switch (cmd) {
+	case WDIOC_GETSUPPORT:
+		return copy_to_user(argp, w->info,
+					sizeof(struct watchdog_info));
+	case WDIOC_GETSTATUS:
+		if (w->ops->status)
+			val = w->ops->status(w);		
+		return put_user(val, p);
+	case WDIOC_GETBOOTSTATUS:
+		return put_user(w->boot_status, p);
+	case WDIOC_KEEPALIVE:
+		w->ops->ping(w);
+		return 0;
+	case WDIOC_SETTIMEOUT:
+		if (w->ops->set_timeout == NULL)
+			return -EOPNOTSUPP;
+		if (get_user(val, p))
+			return -EFAULT;
+		r = w->ops->set_timeout(w, val);
+		if (r < 0)
+			return r;
+		w->timeout = val;
+		w->ops->ping(w);
+		return 0;
+	case WDIOC_GETTIMEOUT:
+		if (w->timeout)
+			return put_user(w->timeout, p);
+		return -EOPNOTSUPP;
+	case WDIOC_SETOPTIONS:
+		if (get_user(val, p))
+			return -EFAULT;
+		if (val & WDIOS_DISABLECARD) {
+			if (w->ops->stop == NULL)
+				return -EOPNOTSUPP;
+			r = w->ops->stop(w);
+			if (r < 0)
+				return r;
+		}
+		if (val & WDIOS_ENABLECARD) {
+			r = w->ops->start(w);
+			if (r < 0)
+				return r;
+		}
+		break;
+	default:
+		return -ENOTTY;
+	}
+	return -ENOTTY;
+}
+
+static int watchdog_open(struct inode *inode, struct file *file)
+{
+	int r = -EBUSY;
+	/* We will need to rework this when we support multiple dogs */
+	struct watchdog *w = watchdog;
+
+	if (!try_module_get(w->owner))
+		return r;
+	if (test_and_set_bit(WDOG_OPEN, &w->status))
+		goto out;
+	file->private_data = w;
+	
+	r = w->ops->start(w);
+	if (r < 0)
+		goto out_bit;
+
+	clear_bit(WDOG_EXPECT_RELEASE, &w->status);
+	r = nonseekable_open(inode, file);
+	if (r == 0) {
+		/* We leaked a reference to lock the module in on close
+		   now we can reclaim it as we re-opened before triggering */
+		if (test_and_clear_bit(WDOG_ORPHAN, &w->status))
+			module_put(w->owner);
+		return 0;
+	}
+	if (w->ops->stop)
+		w->ops->stop(w);
+out_bit:
+	clear_bit(WDOG_OPEN, &w->status);
+out:
+	module_put(watchdog->owner);
+	return r;
+}
+
+static int watchdog_release(struct inode *inode, struct file *file)
+{
+	struct watchdog *w = file->private_data;
+	if (test_bit(WDOG_EXPECT_RELEASE, &w->status) &&
+	    !test_bit(WDOG_NO_WAY_OUT, &w->status) &&
+						w->ops->stop != NULL) {
+		if (w->ops->stop(w) == 0) {
+			module_put(watchdog->owner);
+			return 0;
+		}
+	}
+	printk(KERN_CRIT "%s: not stopping watchdog.\n", w->name);
+	set_bit(WDOG_ORPHAN, &w->status);
+	/* Deliberately leak a module reference in this case */
+	return 0;
+}
+
+static int watchdog_write(struct file *file, const char __user *data,
+						size_t len, loff_t *ppos)
+{
+	struct watchdog *w = file->private_data;
+	size_t i;
+	if (len == 0)	/* Can we see this even ? */
+		return 0;
+
+	clear_bit(WDOG_EXPECT_RELEASE, &w->status);
+	/* scan to see whether or not we got the magic character */
+	for (i = 0; i != len; i++) {
+		char c;
+		if (get_user(c, data+i))
+			return -EFAULT;
+		if (c == 'V')
+			set_bit(WDOG_EXPECT_RELEASE, &w->status);
+	}
+	/* And fire the ping timer */
+	w->ops->ping(w);
+	return len;
+}
+
+static ssize_t watchdog_temp_read(struct file *file, char __user *buf,
+						size_t count, loff_t *ptr)
+{
+	struct watchdog *w = file->private_data;
+	u8 temperature = w->ops->temperature(w);
+	if (copy_to_user(buf, &temperature, 1))
+		return -EFAULT;
+	return 1;
+}
+
+static int watchdog_temp_open(struct inode *inode, struct file *file)
+{
+	int r;
+	file->private_data = watchdog;
+	if (!try_module_get(watchdog->owner))
+		return -EBUSY;
+	r = nonseekable_open(inode, file);
+	if (r < 0)
+		module_put(watchdog->owner);
+	return r;
+}
+
+static int watchdog_temp_release(struct inode *inode, struct file *file)
+{
+	struct watchdog *w = file->private_data;
+	module_put(w->owner);
+	return 0;
+}
+
+static const struct file_operations watchdog_fops = {
+	.owner		= THIS_MODULE,
+	.llseek		= no_llseek,
+	.write		= watchdog_write,
+	.unlocked_ioctl	= watchdog_ioctl,
+	.open		= watchdog_open,
+	.release	= watchdog_release,
+};
+
+static struct miscdevice watchdog_misc = {
+	.minor = WATCHDOG_MINOR,
+	.name = "watchdog",
+	.fops = &watchdog_fops,
+};
+
+static const struct file_operations temperature_fops = {
+	.owner		= THIS_MODULE,
+	.llseek		= no_llseek,
+	.read		= watchdog_temp_read,
+	.open		= watchdog_temp_open,
+	.release	= watchdog_temp_release,
+};
+
+static struct miscdevice temperature_misc = {
+	.minor = TEMP_MINOR,
+	.name = "temperature",
+	.fops = &temperature_fops,
+};
+
+int watchdog_register(struct watchdog *w, int nwo)
+{
+	int r;
+
+	if (test_and_set_bit(0, &watchdog_busy)) {
+		printk(KERN_ERR "watchdog: only one watchdog at a time currently supported.\n");
+		return -EBUSY;
+	}
+	
+	watchdog = w;
+	
+	w->status = 0;
+	if (nwo)
+		set_bit(WDOG_NO_WAY_OUT, &w->status);
+
+	if (w->ops->temperature) {
+		r = misc_register(&temperature_misc);
+		if (r < 0) {
+			printk(KERN_ERR
+			 "%s: cannot register miscdev on minor=%d (err=%d)\n",
+						w->name, TEMP_MINOR, r);
+			goto out_clear;
+		}
+	}
+	r = misc_register(&watchdog_misc);
+	if (r == 0)
+		return 0;
+	printk(KERN_ERR	"%s: cannot register miscdev on minor=%d (err=%d)\n",
+						w->name, WATCHDOG_MINOR, r);
+	if (w->ops->temperature)
+		misc_deregister(&temperature_misc);
+out_clear:
+	watchdog = NULL;
+	clear_bit(0, &watchdog_busy);
+	return r;
+}
+EXPORT_SYMBOL_GPL(watchdog_register);
+
+void watchdog_unregister(struct watchdog *w)
+{
+	watchdog = NULL;
+	misc_deregister(&watchdog_misc);
+	if (w->ops->temperature)
+		misc_deregister(&temperature_misc);
+	clear_bit(0, &watchdog_busy);
+}
+EXPORT_SYMBOL_GPL(watchdog_unregister);
+
+/* The notifier will need to change for multiple dogs, but at that point
+   hopefully we have a class and class based power methods anyway */
+
+static int watchdog_notify(struct notifier_block *this, unsigned long code,
+	void *dog)
+{
+	if (watchdog && (code == SYS_DOWN || code == SYS_HALT)) {
+		if (watchdog->ops->reboot)
+			watchdog->ops->reboot(watchdog);
+	}
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block watchdog_notifier = {
+	.notifier_call = watchdog_notify,
+};
+
+static int __init watchdog_init(void)
+{
+	int r = register_reboot_notifier(&watchdog_notifier);
+	if (r < 0)
+		printk(KERN_ERR "watchdog: unable to register notifier.\n");
+	return r;
+}
+
+static void __exit watchdog_exit(void)
+{
+	unregister_reboot_notifier(&watchdog_notifier);
+}
+
+module_init(watchdog_init);
+module_exit(watchdog_exit);
diff --git a/drivers/watchdog/watchdog.h b/drivers/watchdog/watchdog.h
new file mode 100644
index 0000000..0d22eaa
--- /dev/null
+++ b/drivers/watchdog/watchdog.h
@@ -0,0 +1,35 @@
+struct watchdog;
+
+struct watchdog_ops
+{
+	int (*start)(struct watchdog *w);
+	int (*stop)(struct watchdog *w);
+	void (*ping)(struct watchdog *w);
+	int (*status)(struct watchdog *w);
+	int (*temperature)(struct watchdog *w);
+	int (*set_timeout)(struct watchdog *w, int t);
+	int (*reboot)(struct watchdog *w);
+	long (*ioctl)(struct watchdog *w, unsigned int cmd, unsigned long arg);
+};
+
+struct watchdog
+{
+	char *name;
+	const struct watchdog_info *info;
+	const struct watchdog_ops *ops;
+	int timeout;
+	int boot_status;
+	long status;
+#define WDOG_OPEN		0
+#define WDOG_EXPECT_RELEASE	1
+#define WDOG_ORPHAN		2
+#define WDOG_NO_WAY_OUT		3
+	struct module *owner;
+};
+
+
+extern void watchdog_unregister(struct watchdog *w);
+extern int watchdog_register(struct watchdog *w, int nwo);
+
+
+
diff --git a/drivers/watchdog/wdt.c b/drivers/watchdog/wdt.c
index 53a6b18..af947b5 100644
--- a/drivers/watchdog/wdt.c
+++ b/drivers/watchdog/wdt.c
@@ -1,5 +1,5 @@
 /*
- *	Industrial Computer Source WDT500/501 driver
+ *	Industrial Computer Source WDT501 driver
  *
  *	(c) Copyright 1996-1997 Alan Cox <alan@redhat.com>, All Rights Reserved.
  *				http://www.redhat.com
@@ -49,8 +49,8 @@
 #include <asm/system.h>
 #include "wd501p.h"
 
-static unsigned long wdt_is_open;
-static char expect_close;
+#include "watchdog.h"
+
 
 /*
  *	Module parameters
@@ -82,14 +82,15 @@ MODULE_PARM_DESC(io, "WDT io port (default=0x240)");
 module_param(irq, int, 0);
 MODULE_PARM_DESC(irq, "WDT irq (default=11)");
 
-#ifdef CONFIG_WDT_501
 /* Support for the Fan Tachometer on the WDT501-P */
 static int tachometer;
-
+static int type = 500;
 module_param(tachometer, int, 0);
 MODULE_PARM_DESC(tachometer,
 		"WDT501-P Fan Tachometer support (0=disable, default=0)");
-#endif /* CONFIG_WDT_501 */
+module_param(type, int, 0);
+MODULE_PARM_DESC(type,
+		"WDT501-P Card type (500 or 501, default=500)");
 
 /*
  *	Programming support
@@ -115,7 +116,7 @@ static void wdt_ctr_load(int ctr, int val)
  *	Start the watchdog driver.
  */
 
-static int wdt_start(void)
+static int wdt_start(struct watchdog *w)
 {
 	unsigned long flags;
 	spin_lock_irqsave(&wdt_lock, flags);
@@ -140,7 +141,7 @@ static int wdt_start(void)
  *	Stop the watchdog driver.
  */
 
-static int wdt_stop(void)
+static int wdt_stop(struct watchdog *w)
 {
 	unsigned long flags;
 	spin_lock_irqsave(&wdt_lock, flags);
@@ -158,7 +159,7 @@ static int wdt_stop(void)
  *	reloading the cascade counter.
  */
 
-static int wdt_ping(void)
+static void wdt_ping(struct watchdog *w)
 {
 	unsigned long flags;
 	spin_lock_irqsave(&wdt_lock, flags);
@@ -169,7 +170,6 @@ static int wdt_ping(void)
 	wdt_ctr_load(1, wd_heartbeat);	/* Heartbeat */
 	outb_p(0, WDT_DC);		/* Enable watchdog */
 	spin_unlock_irqrestore(&wdt_lock, flags);
-	return 0;
 }
 
 /**
@@ -181,12 +181,12 @@ static int wdt_ping(void)
  *	successful we return 0.
  */
 
-static int wdt_set_heartbeat(int t)
+static int wdt_set_heartbeat(struct watchdog *w, int t)
 {
 	if (t < 1 || t > 65535)
 		return -EINVAL;
 
-	heartbeat = t;
+	w->timeout = t;
 	wd_heartbeat = t * 100;
 	return 0;
 }
@@ -202,36 +202,36 @@ static int wdt_set_heartbeat(int t)
  *	we then map the bits onto the status ioctl flags.
  */
 
-static int wdt_get_status(int *status)
+static int wdt_get_status(struct watchdog *w)
 {
 	unsigned char new_status;
+	int status = 0;
 	unsigned long flags;
 
 	spin_lock_irqsave(&wdt_lock, flags);
 	new_status = inb_p(WDT_SR);
 	spin_unlock_irqrestore(&wdt_lock, flags);
 
-	*status = 0;
+	status = 0;
 	if (new_status & WDC_SR_ISOI0)
-		*status |= WDIOF_EXTERN1;
+		status |= WDIOF_EXTERN1;
 	if (new_status & WDC_SR_ISII1)
-		*status |= WDIOF_EXTERN2;
-#ifdef CONFIG_WDT_501
-	if (!(new_status & WDC_SR_TGOOD))
-		*status |= WDIOF_OVERHEAT;
-	if (!(new_status & WDC_SR_PSUOVER))
-		*status |= WDIOF_POWEROVER;
-	if (!(new_status & WDC_SR_PSUUNDR))
-		*status |= WDIOF_POWERUNDER;
-	if (tachometer) {
-		if (!(new_status & WDC_SR_FANGOOD))
-			*status |= WDIOF_FANFAULT;
+		status |= WDIOF_EXTERN2;
+	if (type == 501) {
+		if (!(new_status & WDC_SR_TGOOD))
+			status |= WDIOF_OVERHEAT;
+		if (!(new_status & WDC_SR_PSUOVER))
+			status |= WDIOF_POWEROVER;
+		if (!(new_status & WDC_SR_PSUUNDR))
+			status |= WDIOF_POWERUNDER;
+		if (tachometer) {
+			if (!(new_status & WDC_SR_FANGOOD))
+				status |= WDIOF_FANFAULT;
+		}
 	}
-#endif /* CONFIG_WDT_501 */
-	return 0;
+	return status;
 }
 
-#ifdef CONFIG_WDT_501
 /**
  *	wdt_get_temperature:
  *
@@ -239,7 +239,7 @@ static int wdt_get_status(int *status)
  *	farenheit. It was designed by an imperial measurement luddite.
  */
 
-static int wdt_get_temperature(int *temperature)
+static int wdt_get_temperature(struct watchdog *w)
 {
 	unsigned short c;
 	unsigned long flags;
@@ -247,10 +247,18 @@ static int wdt_get_temperature(int *temperature)
 	spin_lock_irqsave(&wdt_lock, flags);
 	c = inb_p(WDT_RT);
 	spin_unlock_irqrestore(&wdt_lock, flags);
-	*temperature = (c * 11 / 15) + 7;
-	return 0;
+	return (c * 11 / 15) + 7;
+}
+
+static void wdt_decode_501(int status)
+{
+	if (!(status & WDC_SR_TGOOD))
+		printk(KERN_CRIT "Overheat alarm.(%d)\n", inb_p(WDT_RT));
+	if (!(status & WDC_SR_PSUOVER))
+		printk(KERN_CRIT "PSU over voltage.\n");
+	if (!(status & WDC_SR_PSUUNDR))
+		printk(KERN_CRIT "PSU under voltage.\n");
 }
-#endif /* CONFIG_WDT_501 */
 
 /**
  *	wdt_interrupt:
@@ -275,18 +283,13 @@ static irqreturn_t wdt_interrupt(int irq, void *dev_id)
 
 	printk(KERN_CRIT "WDT status %d\n", status);
 
-#ifdef CONFIG_WDT_501
-	if (!(status & WDC_SR_TGOOD))
-		printk(KERN_CRIT "Overheat alarm.(%d)\n", inb_p(WDT_RT));
-	if (!(status & WDC_SR_PSUOVER))
-		printk(KERN_CRIT "PSU over voltage.\n");
-	if (!(status & WDC_SR_PSUUNDR))
-		printk(KERN_CRIT "PSU under voltage.\n");
-	if (tachometer) {
-		if (!(status & WDC_SR_FANGOOD))
-			printk(KERN_CRIT "Possible fan fault.\n");
+	if (type == 501) {
+		wdt_decode_501(status);
+		if (tachometer) {
+			if (!(status & WDC_SR_FANGOOD))
+				printk(KERN_CRIT "Possible fan fault.\n");
+		}
 	}
-#endif /* CONFIG_WDT_501 */
 	if (!(status & WDC_SR_WCCR)) {
 #ifdef SOFTWARE_REBOOT
 #ifdef ONLY_TESTING
@@ -303,267 +306,31 @@ static irqreturn_t wdt_interrupt(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }
 
-
-/**
- *	wdt_write:
- *	@file: file handle to the watchdog
- *	@buf: buffer to write (unused as data does not matter here
- *	@count: count of bytes
- *	@ppos: pointer to the position to write. No seeks allowed
- *
- *	A write to a watchdog device is defined as a keepalive signal. Any
- *	write of data will do, as we we don't define content meaning.
- */
-
-static ssize_t wdt_write(struct file *file, const char __user *buf,
-						size_t count, loff_t *ppos)
-{
-	if (count) {
-		if (!nowayout) {
-			size_t i;
-
-			/* In case it was set long ago */
-			expect_close = 0;
-
-			for (i = 0; i != count; i++) {
-				char c;
-				if (get_user(c, buf + i))
-					return -EFAULT;
-				if (c == 'V')
-					expect_close = 42;
-			}
-		}
-		wdt_ping();
-	}
-	return count;
-}
-
-/**
- *	wdt_ioctl:
- *	@file: file handle to the device
- *	@cmd: watchdog command
- *	@arg: argument pointer
- *
- *	The watchdog API defines a common set of functions for all watchdogs
- *	according to their available features. We only actually usefully support
- *	querying capabilities and current status.
- */
-
-static long wdt_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
-{
-	void __user *argp = (void __user *)arg;
-	int __user *p = argp;
-	int new_heartbeat;
-	int status;
-
-	static struct watchdog_info ident = {
-		.options =		WDIOF_SETTIMEOUT|
-					WDIOF_MAGICCLOSE|
-					WDIOF_KEEPALIVEPING,
-		.firmware_version =	1,
-		.identity =		"WDT500/501",
-	};
-
-	/* Add options according to the card we have */
-	ident.options |= (WDIOF_EXTERN1|WDIOF_EXTERN2);
-#ifdef CONFIG_WDT_501
-	ident.options |= (WDIOF_OVERHEAT|WDIOF_POWERUNDER|WDIOF_POWEROVER);
-	if (tachometer)
-		ident.options |= WDIOF_FANFAULT;
-#endif /* CONFIG_WDT_501 */
-
-	switch (cmd) {
-	default:
-		return -ENOTTY;
-	case WDIOC_GETSUPPORT:
-		return copy_to_user(argp, &ident, sizeof(ident)) ? -EFAULT : 0;
-	case WDIOC_GETSTATUS:
-		wdt_get_status(&status);
-		return put_user(status, p);
-	case WDIOC_GETBOOTSTATUS:
-		return put_user(0, p);
-	case WDIOC_KEEPALIVE:
-		wdt_ping();
-		return 0;
-	case WDIOC_SETTIMEOUT:
-		if (get_user(new_heartbeat, p))
-			return -EFAULT;
-		if (wdt_set_heartbeat(new_heartbeat))
-			return -EINVAL;
-		wdt_ping();
-		/* Fall */
-	case WDIOC_GETTIMEOUT:
-		return put_user(heartbeat, p);
-	}
-}
-
-/**
- *	wdt_open:
- *	@inode: inode of device
- *	@file: file handle to device
- *
- *	The watchdog device has been opened. The watchdog device is single
- *	open and on opening we load the counters. Counter zero is a 100Hz
- *	cascade, into counter 1 which downcounts to reboot. When the counter
- *	triggers counter 2 downcounts the length of the reset pulse which
- *	set set to be as long as possible.
- */
-
-static int wdt_open(struct inode *inode, struct file *file)
-{
-	if (test_and_set_bit(0, &wdt_is_open))
-		return -EBUSY;
-	/*
-	 *	Activate
-	 */
-	wdt_start();
-	return nonseekable_open(inode, file);
-}
-
-/**
- *	wdt_release:
- *	@inode: inode to board
- *	@file: file handle to board
- *
- *	The watchdog has a configurable API. There is a religious dispute
- *	between people who want their watchdog to be able to shut down and
- *	those who want to be sure if the watchdog manager dies the machine
- *	reboots. In the former case we disable the counters, in the latter
- *	case you have to open it again very soon.
- */
-
-static int wdt_release(struct inode *inode, struct file *file)
-{
-	if (expect_close == 42) {
-		wdt_stop();
-		clear_bit(0, &wdt_is_open);
-	} else {
-		printk(KERN_CRIT
-		 "wdt: WDT device closed unexpectedly.  WDT will not stop!\n");
-		wdt_ping();
-	}
-	expect_close = 0;
-	return 0;
-}
-
-#ifdef CONFIG_WDT_501
-/**
- *	wdt_temp_read:
- *	@file: file handle to the watchdog board
- *	@buf: buffer to write 1 byte into
- *	@count: length of buffer
- *	@ptr: offset (no seek allowed)
- *
- *	Temp_read reports the temperature in degrees Fahrenheit. The API is in
- *	farenheit. It was designed by an imperial measurement luddite.
- */
-
-static ssize_t wdt_temp_read(struct file *file, char __user *buf,
-						size_t count, loff_t *ptr)
-{
-	int temperature;
-
-	if (wdt_get_temperature(&temperature))
-		return -EFAULT;
-
-	if (copy_to_user(buf, &temperature, 1))
-		return -EFAULT;
-
-	return 1;
-}
-
-/**
- *	wdt_temp_open:
- *	@inode: inode of device
- *	@file: file handle to device
- *
- *	The temperature device has been opened.
- */
-
-static int wdt_temp_open(struct inode *inode, struct file *file)
-{
-	return nonseekable_open(inode, file);
-}
-
-/**
- *	wdt_temp_release:
- *	@inode: inode to board
- *	@file: file handle to board
- *
- *	The temperature device has been closed.
- */
-
-static int wdt_temp_release(struct inode *inode, struct file *file)
-{
-	return 0;
-}
-#endif /* CONFIG_WDT_501 */
-
-/**
- *	notify_sys:
- *	@this: our notifier block
- *	@code: the event being reported
- *	@unused: unused
- *
- *	Our notifier is called on system shutdowns. We want to turn the card
- *	off at reboot otherwise the machine will reboot again during memory
- *	test or worse yet during the following fsck. This would suck, in fact
- *	trust me - if it happens it does suck.
- */
-
-static int wdt_notify_sys(struct notifier_block *this, unsigned long code,
-	void *unused)
-{
-	if (code == SYS_DOWN || code == SYS_HALT)
-		wdt_stop();
-	return NOTIFY_DONE;
-}
-
-/*
- *	Kernel Interfaces
- */
-
-
-static const struct file_operations wdt_fops = {
-	.owner		= THIS_MODULE,
-	.llseek		= no_llseek,
-	.write		= wdt_write,
-	.unlocked_ioctl	= wdt_ioctl,
-	.open		= wdt_open,
-	.release	= wdt_release,
-};
-
-static struct miscdevice wdt_miscdev = {
-	.minor	= WATCHDOG_MINOR,
-	.name	= "watchdog",
-	.fops	= &wdt_fops,
-};
-
-#ifdef CONFIG_WDT_501
-static const struct file_operations wdt_temp_fops = {
-	.owner		= THIS_MODULE,
-	.llseek		= no_llseek,
-	.read		= wdt_temp_read,
-	.open		= wdt_temp_open,
-	.release	= wdt_temp_release,
+static struct watchdog_ops wdt_ops = {
+	.start	=	wdt_start,
+	.stop	=	wdt_stop,
+	.reboot =	wdt_stop,
+	.ping	=	wdt_ping,
+	.status = 	wdt_get_status,
+	.temperature =	wdt_get_temperature,
+	.set_timeout =  wdt_set_heartbeat
 };
 
-static struct miscdevice temp_miscdev = {
-	.minor	= TEMP_MINOR,
-	.name	= "temperature",
-	.fops	= &wdt_temp_fops,
+static struct watchdog_info wdt_ident = {
+	.options =		WDIOF_SETTIMEOUT | WDIOF_MAGICCLOSE |
+				WDIOF_KEEPALIVEPING | WDIOF_EXTERN1 |
+					WDIOF_EXTERN2,
+	.firmware_version =	1,
+	.identity =		"WDT500/501",
 };
-#endif /* CONFIG_WDT_501 */
-
-/*
- *	The WDT card needs to learn about soft shutdowns in order to
- *	turn the timebomb registers off.
- */
 
-static struct notifier_block wdt_notifier = {
-	.notifier_call = wdt_notify_sys,
+static struct watchdog wdt_dog = {
+	.name =		"wdt",
+	.ops =		&wdt_ops,
+	.info =		&wdt_ident,
+	.owner =	THIS_MODULE,
 };
-
+	
 /**
  *	cleanup_module:
  *
@@ -576,12 +343,8 @@ static struct notifier_block wdt_notifier = {
 
 static void __exit wdt_exit(void)
 {
-	misc_deregister(&wdt_miscdev);
-#ifdef CONFIG_WDT_501
-	misc_deregister(&temp_miscdev);
-#endif /* CONFIG_WDT_501 */
-	unregister_reboot_notifier(&wdt_notifier);
-	free_irq(irq, NULL);
+	watchdog_unregister(&wdt_dog);
+	free_irq(irq, &wdt_dog);
 	release_region(io, 8);
 }
 
@@ -597,14 +360,24 @@ static int __init wdt_init(void)
 {
 	int ret;
 
+	if (type != 500 && type != 501) {
+		printk(KERN_ERR "wdt: unknown card type '%d'.\n", type);
+		return -ENODEV;
+	}
+	if (type == 501)
+		wdt_ident.options |= (WDIOF_OVERHEAT | WDIOF_POWERUNDER
+						|WDIOF_POWEROVER);
+	else
+		wdt_ops.temperature = NULL;
+	if (tachometer)
+		wdt_ident.options |= WDIOF_FANFAULT;
 	/* Check that the heartbeat value is within it's range;
 	   if not reset to the default */
-	if (wdt_set_heartbeat(heartbeat)) {
-		wdt_set_heartbeat(WD_TIMO);
+	if (wdt_set_heartbeat(&wdt_dog, heartbeat)) {
+		wdt_set_heartbeat(&wdt_dog, WD_TIMO);
 		printk(KERN_INFO "wdt: heartbeat value must be 0 < heartbeat < 65536, using %d\n",
 			WD_TIMO);
 	}
-
 	if (!request_region(io, 8, "wdt501p")) {
 		printk(KERN_ERR
 			"wdt: I/O address 0x%04x already in use\n", io);
@@ -612,59 +385,27 @@ static int __init wdt_init(void)
 		goto out;
 	}
 
-	ret = request_irq(irq, wdt_interrupt, IRQF_DISABLED, "wdt501p", NULL);
+	ret = request_irq(irq, wdt_interrupt, IRQF_DISABLED,
+							"wdt501p", &wdt_dog);
 	if (ret) {
 		printk(KERN_ERR "wdt: IRQ %d is not free.\n", irq);
 		goto outreg;
 	}
 
-	ret = register_reboot_notifier(&wdt_notifier);
-	if (ret) {
-		printk(KERN_ERR
-		      "wdt: cannot register reboot notifier (err=%d)\n", ret);
-		goto outirq;
-	}
-
-#ifdef CONFIG_WDT_501
-	ret = misc_register(&temp_miscdev);
-	if (ret) {
-		printk(KERN_ERR
-			"wdt: cannot register miscdev on minor=%d (err=%d)\n",
-							TEMP_MINOR, ret);
-		goto outrbt;
-	}
-#endif /* CONFIG_WDT_501 */
-
-	ret = misc_register(&wdt_miscdev);
-	if (ret) {
-		printk(KERN_ERR
-			"wdt: cannot register miscdev on minor=%d (err=%d)\n",
-							WATCHDOG_MINOR, ret);
-		goto outmisc;
+	ret = watchdog_register(&wdt_dog, nowayout);
+	
+	if (ret == 0) {
+		printk(KERN_INFO "WDT500/501-P driver 0.10 at 0x%04x (Interrupt %d). heartbeat=%d sec (nowayout=%d)\n",
+			io, irq, heartbeat, nowayout);
+		printk(KERN_INFO "wdt: Fan Tachometer is %s\n",
+					(tachometer ? "Enabled" : "Disabled"));
+		return 0;
 	}
-
-	ret = 0;
-	printk(KERN_INFO "WDT500/501-P driver 0.10 at 0x%04x (Interrupt %d). heartbeat=%d sec (nowayout=%d)\n",
-		io, irq, heartbeat, nowayout);
-#ifdef CONFIG_WDT_501
-	printk(KERN_INFO "wdt: Fan Tachometer is %s\n",
-				(tachometer ? "Enabled" : "Disabled"));
-#endif /* CONFIG_WDT_501 */
-
-out:
-	return ret;
-
-outmisc:
-#ifdef CONFIG_WDT_501
-	misc_deregister(&temp_miscdev);
-outrbt:
-#endif /* CONFIG_WDT_501 */
-	unregister_reboot_notifier(&wdt_notifier);
-outirq:
-	free_irq(irq, NULL);
+	free_irq(irq, &wdt_dog);
 outreg:
 	release_region(io, 8);
-	goto out;
+out:
+	return ret;
 }
 
 module_init(wdt_init);


* Re: [PATCH, RFC] char dev BKL pushdown
  2008-05-20 17:21               ` Arnd Bergmann
@ 2008-05-20 18:51                 ` Alan Cox
  0 siblings, 0 replies; 78+ messages in thread
From: Alan Cox @ 2008-05-20 18:51 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Jonathan Corbet, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Alexander Viro, linux-kernel,
	Wim Van Sebroeck

> Right, unless Alan or Wim are confident enough that removing the
> BKL won't break the drivers (more than they are today).
> Almost all of the open functions go along the lines of
> 
> int open(struct file *f, struct inode *i)
> {
> 	if (wd_is_open)
> 		return -EBUSY;
> 	wd_is_open = 1;
> 	
> 	start_wd();
> 
> 	return nonseekable_open(f, i);
> }
> 
> nonseekable_open doesn't need the BKL by itself, and the wd_is_open
> variable is protected by the misc_mtx mutex.
> I can't see any scenario in which start_wd() would need the BKL, or

You need to review the use of misc_register(). Which is what I did
already and sorted out for each watchdog - the job is done and completed
and the various problem cases fixed. Watchdog has already been made BKL
removal safe in the patch series I sent.
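
For reference, the check-then-set open() quoted above is only safe while
something else serialises the callers. A minimal userspace sketch of the
atomic alternative follows, using C11 atomic_exchange() as a stand-in for
the kernel's test_and_set_bit(). The names wd_open() and wd_release() are
made up for illustration and do not come from any driver:

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for the driver's "is open" bit; zero-initialised means closed. */
static atomic_bool wd_is_open;

/* Hypothetical single-open gate: the test and the set happen in one atomic
 * step, so two racing callers cannot both observe "not open". */
int wd_open(void)
{
	/* atomic_exchange() stores true and returns the previous value. */
	if (atomic_exchange(&wd_is_open, true))
		return -EBUSY;	/* somebody else already holds the device */
	return 0;
}

/* Hypothetical release: clear the flag so the next open can succeed. */
void wd_release(void)
{
	bool was_open = atomic_exchange(&wd_is_open, false);

	/* Releasing a device that was never opened is a caller bug. */
	assert(was_open);
	(void)was_open;		/* keeps NDEBUG builds warning-free */
}
```

A second wd_open() before wd_release() returns -EBUSY without relying on
any outer lock, which is the property the drivers need once the BKL is gone.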

Alan

* Re: [PATCH 2/3, RFC] watchdog dev BKL pushdown
  2008-05-20 18:31                       ` Alan Cox
@ 2008-05-20 21:00                         ` Arnd Bergmann
  2008-05-22  9:34                           ` Alan Cox
  0 siblings, 1 reply; 78+ messages in thread
From: Arnd Bergmann @ 2008-05-20 21:00 UTC (permalink / raw)
  To: Alan Cox
  Cc: Wim Van Sebroeck, Christoph Hellwig, Jonathan Corbet,
	Linus Torvalds, Ingo Molnar, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner, Alexander Viro, linux-kernel

On Tuesday 20 May 2008, Alan Cox wrote:
> Current patch (& example conversions) below.

Very nice code!

> +       if (w->ops->ioctl) {
> +               r = w->ops->ioctl(w, cmd, arg);
> +               if (r != -ENOIOCTLCMD)
> +                       return r;
> +       }

Are you planning this as a transitional method for
converting drivers, or are you aware of any driver that
actually needs its own ioctl method?

> +static const struct file_operations watchdog_fops = {
> +       .owner          = THIS_MODULE,
> +       .llseek         = no_llseek,
> +       .write          = watchdog_write,
> +       .unlocked_ioctl = watchdog_ioctl,
> +       .open           = watchdog_open,
> +       .release        = watchdog_release,
> +};

All the ioctl numbers are compatible, so it would be good
to register the watchdog ioctl function as compat_ioctl
as well. Once all drivers are using the common abstraction,
we can also kill their COMPATIBLE_IOCTL() entries in
fs/compat_ioctl.c.

> --- /dev/null
> +++ b/drivers/watchdog/watchdog.h

There are a few watchdog drivers living outside of drivers/watchdog/,
I could find:

* arch/um/drivers/harddog_kern.c
* drivers/char/ipmi/ipmi_watchdog.c
* drivers/rtc/rtc-m41t80.c
* drivers/s390/char/vmwatchdog.c
* drivers/sbus/char/cpwatchdog.c
* drivers/sbus/char/riowatchdog.c

In order to convert those to the new model, you either have to
move them to the right place, or move the new declarations to
include/linux/watchdog.h.

	Arnd <><

* Re: [PATCH 1/3, RFC] misc char dev BKL pushdown
  2008-05-19 23:26             ` [PATCH 1/3, RFC] misc char " Arnd Bergmann
  2008-05-20  0:07               ` Mike Frysinger
  2008-05-20  8:46               ` Alan Cox
@ 2008-05-20 23:01               ` Mike Frysinger
  2008-05-20 23:25                 ` Jonathan Corbet
  2 siblings, 1 reply; 78+ messages in thread
From: Mike Frysinger @ 2008-05-20 23:01 UTC (permalink / raw)
  To: Arnd Bergmann, Wu, Bryan
  Cc: Jonathan Corbet, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Peter Zijlstra, Thomas Gleixner, Alan Cox, Alexander Viro,
	linux-kernel

On Mon, May 19, 2008 at 7:26 PM, Arnd Bergmann wrote:
> The Big Kernel Lock has been pushed down from chardev_open
> to misc_open, this change moves it to the individual misc
> driver open functions.
>
> As before, the change was purely mechanical, most drivers
> should actually not need the BKL. In particular, we still
> hold the misc_mtx() while calling the open() function
> The patch should probably be split into one changeset
> per driver.
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>
> ---
> Index: linux-2.6/arch/blackfin/mach-bf561/coreb.c
> ===================================================================
> --- linux-2.6.orig/arch/blackfin/mach-bf561/coreb.c
> +++ linux-2.6/arch/blackfin/mach-bf561/coreb.c
> @@ -32,6 +32,7 @@
>  #include <linux/device.h>
>  #include <linux/ioport.h>
>  #include <linux/module.h>
> +#include <linux/smp_lock.h>
>  #include <linux/uaccess.h>
>  #include <linux/fs.h>
>  #include <asm/dma.h>
> @@ -196,6 +197,7 @@ static loff_t coreb_lseek(struct file *f
>
>  static int coreb_open(struct inode *inode, struct file *file)
>  {
> +       lock_kernel();
>        spin_lock_irq(&coreb_lock);
>
>        if (coreb_status & COREB_IS_OPEN)
> @@ -204,10 +206,12 @@ static int coreb_open(struct inode *inod
>        coreb_status |= COREB_IS_OPEN;
>
>        spin_unlock_irq(&coreb_lock);
> +       unlock_kernel();
>        return 0;
>
>  out_busy:
>        spin_unlock_irq(&coreb_lock);
> +       unlock_kernel();
>        return -EBUSY;
>  }

please drop the coreb.c changes from your patch
-mike

* Re: [PATCH 1/3, RFC] misc char dev BKL pushdown
  2008-05-20 23:01               ` Mike Frysinger
@ 2008-05-20 23:25                 ` Jonathan Corbet
  2008-05-21 16:22                   ` Mike Frysinger
  0 siblings, 1 reply; 78+ messages in thread
From: Jonathan Corbet @ 2008-05-20 23:25 UTC (permalink / raw)
  To: Mike Frysinger
  Cc: Arnd Bergmann, Wu, Bryan, Linus Torvalds, Ingo Molnar,
	Andrew Morton, Peter Zijlstra, Thomas Gleixner, Alan Cox,
	Alexander Viro, linux-kernel

Mike Frysinger <vapier.adi@gmail.com> wrote:

> please drop the coreb.c changes from your patch

At a minimum, I would hope such a request would say something like "I've
looked at the driver's locking and am convinced that the BKL is not
needed."  Have you done that?  There is a certain leap of faith involved
in removing that protection from a driver.

I decided to take a quick look...

- You use spin_lock_irq(&coreb_lock) in a number of places, but you do
  not take the lock in the interrupt handler.  You also do not take the
  lock in coreb_write() or coreb_read(), so those can race with the
  interrupt handler, with ioctl(), and with each other.

- coreb_write() and coreb_read() do interruptible waits, but do not
  check to see whether they were interrupted.  They will, in fact,
  continue in their I/O loops after a signal.

- In both functions you have:

	unsigned long p = *ppos;

	if (p + count > coreb_size)
		return -EFAULT;

  that calculation can overflow.

- You also do this:

  static ssize_t coreb_write(struct file *file, const char *buf, size_t count,
	 		     loff_t * ppos)
  /* ... */
  		set_dma_start_addr(CH_MEM_STREAM2_SRC, (unsigned long)buf);

  In other words, the DMA is done directly to/from a user-space
  address.  Maybe that's safe on Blackfin, I don't know...

- I have no idea why some of your functions are using d_inode->i_mutex.

- In coreb_ioctl():

		spin_lock_irq(&coreb_lock);
		if (coreb_status & COREB_IS_RUNNING) {
			retval = -EBUSY;
			break;
		}

  this will exit the function with the spinlock still held and
  interrupts disabled.

	case CMD_COREB_RESET:
		printk(KERN_INFO "Resetting Core B\n");
		bfin_write_SICB_SYSCR(bfin_read_SICB_SYSCR() | 0x0080);
		break;

  You do not acquire the lock here, so this can race against other
  ioctl() calls.  And ioctl() can race against read() and write().
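
On the overflow point in the list above: the usual fix is to compare
against the remaining space instead of adding first. A minimal sketch,
assuming unsigned positions and sizes; range_fits() is a hypothetical
helper, not something in coreb.c:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true when [pos, pos + count) lies inside a region of
 * `size` bytes.  The subtraction cannot wrap because pos <= size has been
 * checked first, unlike the original `pos + count > size` test. */
bool range_fits(unsigned long pos, size_t count, unsigned long size)
{
	if (pos > size)
		return false;
	return count <= size - pos;
}
```

With pos = 8, size = 16 and a count of (size_t)-1, the naive sum wraps
around to 7 and passes the original test, while range_fits() rejects it.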

Registration and such seem reasonable, so I can't come up with a
scenario where loss of BKL protection will create trouble.  Given the
other problems there, though, I'll confess to being a bit nervous about
it.
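
On the coreb_ioctl() early exit noted above, one common shape is a single
unlock label that every path goes through. The sketch below is a userspace
model only: demo_lock() and demo_unlock() are counting stubs standing in
for spin_lock_irq()/spin_unlock_irq(), and all demo_* names are invented
for illustration:

```c
#include <errno.h>

/* Counting stubs in place of a real spinlock: they only track nesting so a
 * caller can verify that every path released what it took. */
static int demo_lock_depth;
static void demo_lock(void)   { demo_lock_depth++; }
static void demo_unlock(void) { demo_lock_depth--; }

/* Returns the current nesting depth; 0 means the lock is not held. */
int demo_lock_held(void) { return demo_lock_depth; }

#define DEMO_IS_RUNNING 0x1
static unsigned int demo_status;	/* protected by the demo lock */

/* Hypothetical "start" command: fails with -EBUSY when already running and
 * drops the lock on both the success path and the error path. */
int demo_start(void)
{
	int retval = 0;

	demo_lock();
	if (demo_status & DEMO_IS_RUNNING) {
		retval = -EBUSY;
		goto out_unlock;
	}
	demo_status |= DEMO_IS_RUNNING;
out_unlock:
	demo_unlock();
	return retval;
}
```

The second call returns -EBUSY and the lock depth is back at zero after
both calls, which is what the bare `break` in the quoted code fails to do.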

jon

* Re: [PATCH 1/3, RFC] misc char dev BKL pushdown
  2008-05-20 23:25                 ` Jonathan Corbet
@ 2008-05-21 16:22                   ` Mike Frysinger
  0 siblings, 0 replies; 78+ messages in thread
From: Mike Frysinger @ 2008-05-21 16:22 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Arnd Bergmann, Wu, Bryan, Linus Torvalds, Ingo Molnar,
	Andrew Morton, Peter Zijlstra, Thomas Gleixner, Alan Cox,
	Alexander Viro, linux-kernel

On Tue, May 20, 2008 at 7:25 PM, Jonathan Corbet wrote:
> Mike Frysinger wrote:
>> please drop the coreb.c changes from your patch
>
> At a minimum, I would hope such a request would say something like "I've
> looked at the driver's locking and am convinced that the BKL is not
> needed."  Have you done that?  There is a certain leap of faith involved
> in removing that protection from a driver.
>
> I decided to take a quick look...
>
> - You use spin_lock_irq(&coreb_lock) in a number of places, but you do
>  not take the lock in the interrupt handler.  You also do not take the
>  lock in coreb_write() or coreb_read(), so those can race with the
>  interrupt handler, with ioctl(), and with each other.

the lock is to protect one thing: coreb_status.  we lock around any
access to it, so it not being grabbed in the irq handler or any other
function where coreb_status is not utilized is irrelevant.  that means
the BKL is not needed in the driver.

the rest of your comments are more or less on target, but again
irrelevant to the topic of the BKL.  i'll keep them in mind when i
rewrite the driver, thanks.
-mike

* Re: [PATCH 2/3, RFC] watchdog dev BKL pushdown
  2008-05-20 21:00                         ` Arnd Bergmann
@ 2008-05-22  9:34                           ` Alan Cox
  0 siblings, 0 replies; 78+ messages in thread
From: Alan Cox @ 2008-05-22  9:34 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Wim Van Sebroeck, Christoph Hellwig, Jonathan Corbet,
	Linus Torvalds, Ingo Molnar, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner, Alexander Viro, linux-kernel

> Are you planning this as a transitional method for
> converting drivers, or are you aware of any driver that
> actually needs its own ioctl method?

I added it "in case" and to allow for special cases later. We may not
need it for any existing devices.
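
As a rough userspace model of how that optional hook behaves: the core
offers each command to the driver first and falls back to the common
handling when the hook answers -ENOIOCTLCMD. The struct layout, the demo_*
names and the command numbers below are simplified stand-ins, not the
definitions from the patch:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* The kernel defines ENOIOCTLCMD (515) internally; userspace errno.h does
 * not, so this sketch supplies it. */
#ifndef ENOIOCTLCMD
#define ENOIOCTLCMD 515
#endif

/* Hypothetical command numbers, not the real WDIOC_* values. */
enum { DEMO_KEEPALIVE = 1, DEMO_VENDOR = 100 };

struct dog;

struct dog_ops {
	void (*ping)(struct dog *w);
	/* Optional driver hook; NULL when the driver has no private ioctls. */
	long (*ioctl)(struct dog *w, unsigned int cmd, unsigned long arg);
};

struct dog {
	const struct dog_ops *ops;
	int pings;
};

/* Core handler: the driver gets first refusal, then the common commands
 * are tried when the hook reports "not mine". */
long dog_ioctl(struct dog *w, unsigned int cmd, unsigned long arg)
{
	assert(w != NULL && w->ops != NULL);

	if (w->ops->ioctl) {
		long r = w->ops->ioctl(w, cmd, arg);
		if (r != -ENOIOCTLCMD)
			return r;
	}

	switch (cmd) {
	case DEMO_KEEPALIVE:
		w->ops->ping(w);
		return 0;
	default:
		return -ENOTTY;
	}
}

/* Example driver pieces used to exercise the dispatcher. */
static void demo_ping(struct dog *w) { w->pings++; }

static long demo_vendor_ioctl(struct dog *w, unsigned int cmd,
			      unsigned long arg)
{
	(void)w;
	if (cmd == DEMO_VENDOR)
		return (long)arg;	/* echo the argument back */
	return -ENOIOCTLCMD;		/* let the core handle it */
}

const struct dog_ops demo_ops_plain  = { .ping = demo_ping };
const struct dog_ops demo_ops_vendor = { .ping = demo_ping,
					 .ioctl = demo_vendor_ioctl };
```

A driver without the hook sees -ENOTTY for DEMO_VENDOR, while one with the
hook handles it and still gets the common keepalive path for free.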

> All the ioctl numbers are compatible, so it would be good
> to register the watchdog ioctl function as compat_ioctl
> as well. Once all drivers are using the common abstraction,
> we can also kill their COMPATIBLE_IOCTL() entries in
> fs/compat_ioctl.c.

Good point. Wim will no doubt comment on all this once he has finished
his more pressing jobs.

Alan

* Re: [announce] "kill the Big Kernel Lock (BKL)" tree
  2008-05-14 22:07     ` Jonathan Corbet
  2008-05-14 22:14       ` Linus Torvalds
@ 2008-05-22 20:20       ` Alan Cox
  1 sibling, 0 replies; 78+ messages in thread
From: Alan Cox @ 2008-05-22 20:20 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner, Alexander Viro, linux-kernel

> This is all certainly doable, but it leaves me with one concern: there
> will be no signal to external module maintainers that the change needs
> to be made.  So, beyond doubt, quite a few of them will just continue to
> be shipped unfixed - and they will still run.  If any of them actually
> *need* the BKL, something awful may happen to somebody someday.

I now have a large patch and my full x86-32 build tree building without
->ioctl() in file_operations. It's a 350K patch and took all day, so I'll
begin splitting it out and sending chunks to tree maintainers.

Alan

Thread overview: 78+ messages
2008-05-14 17:49 [announce] "kill the Big Kernel Lock (BKL)" tree Ingo Molnar
2008-05-14 18:30 ` Andi Kleen
2008-05-14 21:00   ` Alan Cox
2008-05-14 21:13     ` Andi Kleen
2008-05-14 21:16       ` H. Peter Anvin
2008-05-14 21:17         ` Alan Cox
2008-05-14 21:19       ` Alan Cox
2008-05-14 21:45         ` Linus Torvalds
2008-05-14 22:03           ` Andi Kleen
2008-05-15 13:34             ` Alan Cox
2008-05-15 14:27               ` Andi Kleen
2008-05-15 15:36                 ` Alan Cox
2008-05-16 10:21                   ` Andi Kleen
2008-05-15  8:02           ` Ingo Molnar
2008-05-14 18:41 ` Linus Torvalds
2008-05-14 19:41   ` Ingo Molnar
2008-05-14 20:05     ` Frederik Deweerdt
2008-05-14 21:45 ` Jonathan Corbet
2008-05-14 21:39   ` Alan Cox
2008-05-14 21:56   ` Linus Torvalds
2008-05-14 22:07     ` Jonathan Corbet
2008-05-14 22:14       ` Linus Torvalds
2008-05-22 20:20       ` Alan Cox
2008-05-16 15:44     ` [PATCH, RFC] char dev BKL pushdown Jonathan Corbet
2008-05-16 15:49       ` Christoph Hellwig
2008-05-16 16:03         ` [PATCH] kill empty chardev open/release methods Christoph Hellwig
2008-05-16 16:24           ` Alan Cox
2008-05-16 20:55           ` Alan Cox
2008-05-18 19:46             ` Jonathan Corbet
2008-05-18 19:58               ` Alan Cox
2008-05-16 16:22       ` [PATCH, RFC] char dev BKL pushdown Alan Cox
2008-05-16 16:30       ` Linus Torvalds
2008-05-16 16:43         ` Jonathan Corbet
2008-05-17 21:15       ` Arnd Bergmann
2008-05-18 20:26         ` Jonathan Corbet
2008-05-19 23:07           ` Arnd Bergmann
     [not found]             ` <200805200111.47275.arnd@arndb.de>
2008-05-19 23:14               ` [PATCH 2/3, RFC] watchdog " Arnd Bergmann
2008-05-20  6:20                 ` Christoph Hellwig
2008-05-20  8:30                   ` Arnd Bergmann
2008-05-20 15:47                     ` Wim Van Sebroeck
2008-05-20 18:31                       ` Alan Cox
2008-05-20 21:00                         ` Arnd Bergmann
2008-05-22  9:34                           ` Alan Cox
2008-05-20  9:08                   ` Alan Cox
2008-05-20  8:42                 ` Alan Cox
2008-05-19 23:26             ` [PATCH 1/3, RFC] misc char " Arnd Bergmann
2008-05-20  0:07               ` Mike Frysinger
2008-05-20  0:21                 ` Jonathan Corbet
2008-05-20  0:46                   ` Mike Frysinger
2008-05-20  8:46               ` Alan Cox
2008-05-20 23:01               ` Mike Frysinger
2008-05-20 23:25                 ` Jonathan Corbet
2008-05-21 16:22                   ` Mike Frysinger
2008-05-19 23:34             ` [PATCH 3/3, RFC] remove BKL from misc_open() Arnd Bergmann
2008-05-20 15:13             ` [PATCH, RFC] char dev BKL pushdown Jonathan Corbet
2008-05-20 17:21               ` Arnd Bergmann
2008-05-20 18:51                 ` Alan Cox
2008-05-17 21:58       ` Linus Torvalds
2008-05-18 20:07         ` Jonathan Corbet
2008-05-14 22:11   ` [announce] "kill the Big Kernel Lock (BKL)" tree Andi Kleen
2008-05-14 22:16     ` Linus Torvalds
2008-05-14 22:21       ` Andi Kleen
2008-05-15 13:30         ` Alan Cox
2008-05-15 15:05         ` John Stoffel
2008-05-15 15:10           ` Andi Kleen
2008-05-15 15:18             ` John Stoffel
2008-05-15 15:45               ` Andi Kleen
2008-05-15  8:44   ` Jan Engelhardt
2008-05-15 14:54     ` Diego Calleja
2008-05-14 21:46 ` Alan Cox
2008-05-14 22:11   ` Linus Torvalds
2008-05-14 22:15   ` Andi Kleen
2008-05-15 17:41 ` Linus Torvalds
2008-05-15 20:27   ` Arjan van de Ven
2008-05-15 20:45     ` Peter Zijlstra
2008-05-15 21:22       ` Arjan van de Ven
2008-05-17  0:14 ` Kevin Winchester
2008-05-17  0:37   ` Kevin Winchester
