linux-kernel.vger.kernel.org archive mirror
* [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
@ 2005-02-04 10:03 Ingo Molnar
  2005-02-04 15:19 ` Kevin Hilman
                   ` (6 more replies)
  0 siblings, 7 replies; 125+ messages in thread
From: Ingo Molnar @ 2005-02-04 10:03 UTC (permalink / raw)
  To: linux-kernel


i have released the -V0.7.38-01 Real-Time Preemption patch, which can be
downloaded from the usual place:

  http://redhat.com/~mingo/realtime-preempt/

Changes since -37-03:

 - merged to 2.6.11-rc3

 - deadlock-tracer fix from Eugeny S. Mints

 - converted an oprofile spinlock to raw, which should fix the bug 
   reported by Peter Zijlstra.

to create a -V0.7.38-01 tree from scratch, the patching order is:

  http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.10.tar.bz2
  http://kernel.org/pub/linux/kernel/v2.6/testing/patch-2.6.11-rc3.bz2
  http://redhat.com/~mingo/realtime-preempt/realtime-preempt-2.6.11-rc3-V0.7.38-01

	Ingo

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-04 10:03 [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 Ingo Molnar
@ 2005-02-04 15:19 ` Kevin Hilman
  2005-02-04 17:30   ` Ingo Molnar
  2005-02-04 18:19 ` Tom Rini
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 125+ messages in thread
From: Kevin Hilman @ 2005-02-04 15:19 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

What is the proper way to set up a real counting semaphore under the
-RT kernel?

I've noticed that just using a struct semaphore, normal counting
semaphore usage[*] can trigger the "lock recursion deadlock" in
kernel/rt.c since 'struct semaphore' now uses an rt_mutex.  

What I've done for now is to use sema_init_nocheck() to disable the
checking in the case of a counting semaphore, but I remember seeing
discussion in an earlier thread about creating a separate counting
semaphore type.  Is this still planned?

Kevin
http://hilman.org/kevin/

[*] For example, an open semaphore being down'ed and thus acquired and
the same thread doing a down() again before another thread has a
chance to up() the semaphore.  
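
To make the footnote concrete, here is a toy userspace model of a
semaphore with an optional mutex-style owner check. It is not the
kernel/rt.c code, and every toy_ name is hypothetical. With checking on, a
second down() from the same thread is flagged as recursion even though it
is legal counting-semaphore usage; with checking off (roughly what
sema_init_nocheck() is described as doing in this thread) it succeeds
while slots remain. The model is single-threaded and never blocks; it only
returns status codes.

```c
#include <pthread.h>
#include <stdbool.h>

/*
 * Toy model only: a counter plus an optional "same thread already
 * holds it" check. Not thread-safe, and not the kernel implementation.
 */
struct toy_sem {
	int count;		/* free slots */
	bool check_owner;	/* true: mutex-like check; false: "nocheck" */
	bool has_owner;
	pthread_t owner;	/* last thread that acquired the semaphore */
};

static void toy_sema_init(struct toy_sem *s, int count, bool check_owner)
{
	s->count = count;
	s->check_owner = check_owner;
	s->has_owner = false;
}

/*
 * Returns 0 on success, -1 if the owner check flags recursion,
 * -2 if a real semaphore would have to block (no free slots).
 */
static int toy_down(struct toy_sem *s)
{
	if (s->check_owner && s->has_owner &&
	    pthread_equal(s->owner, pthread_self()))
		return -1;
	if (s->count == 0)
		return -2;
	s->count--;
	s->has_owner = true;
	s->owner = pthread_self();
	return 0;
}

static void toy_up(struct toy_sem *s)
{
	s->count++;
	s->has_owner = false;
}
```

In this model a semaphore initialised with count 2 and checking enabled
rejects the second toy_down() from the same thread, while the unchecked
variant accepts both and only refuses the third.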


Ingo Molnar <mingo@elte.hu> writes:

> i have released the -V0.7.38-01 Real-Time Preemption patch, which can be
> downloaded from the usual place:
> 
>   http://redhat.com/~mingo/realtime-preempt/
> 
> Changes since -37-03:
> 
>  - merged to 2.6.11-rc3
> 
>  - deadlock-tracer fix from Eugeny S. Mints
> 
>  - converted an oprofile spinlock to raw, which should fix the bug 
>    reported by Peter Zijlstra.
> 
> to create a -V0.7.38-01 tree from scratch, the patching order is:
> 
>   http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.10.tar.bz2
>   http://kernel.org/pub/linux/kernel/v2.6/testing/patch-2.6.11-rc3.bz2
>   http://redhat.com/~mingo/realtime-preempt/realtime-preempt-2.6.11-rc3-V0.7.38-01


* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-04 15:19 ` Kevin Hilman
@ 2005-02-04 17:30   ` Ingo Molnar
  0 siblings, 0 replies; 125+ messages in thread
From: Ingo Molnar @ 2005-02-04 17:30 UTC (permalink / raw)
  To: Kevin Hilman; +Cc: linux-kernel, Thomas Gleixner


* Kevin Hilman <kevin@hilman.org> wrote:

> What I've done for now is to use sema_init_nocheck() to disable the
> checking in the case of a counting semaphore, but I remember seeing
> discussion in an earlier thread about creating a separate counting
> semaphore type.  Is this still planned?

the nocheck variant is the counting semaphore in essence. I removed the
counting semaphore implementation because it caused more problems than
it solved - but it can be reintroduced later.

> [*] For example, an open semaphore being down'ed and thus acquired and
> the same thread doing a down() again before another thread has a
> chance to up() the semaphore. 

yeah, these are cases where the code is better off using completions
anyway. Thomas Gleixner had a good bunch of patches to convert such
semaphore use to completions - the most necessary ones are in -RT, and i
hope he'll submit the whole bunch upstream after 2.6.11 is out :-)

	Ingo


* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-04 10:03 [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 Ingo Molnar
  2005-02-04 15:19 ` Kevin Hilman
@ 2005-02-04 18:19 ` Tom Rini
  2005-02-07  9:03   ` Ingo Molnar
  2005-02-06  4:19 ` Valdis.Kletnieks
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 125+ messages in thread
From: Tom Rini @ 2005-02-04 18:19 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

On Fri, Feb 04, 2005 at 11:03:47AM +0100, Ingo Molnar wrote:
> 
> i have released the -V0.7.38-01 Real-Time Preemption patch, which can be
> downloaded from the usual place:
> 
>   http://redhat.com/~mingo/realtime-preempt/

I thought I saw you say x64 should be OK now a few releases ago, so:
linux-2.6.11-rc3/arch/x86_64/kernel/x8664_ksyms.c:197: error: `_atomic_dec_and_lock' undeclared here (not in a function)
linux-2.6.11-rc3/arch/x86_64/kernel/x8664_ksyms.c:197: error: initializer element is not constant
linux-2.6.11-rc3/arch/x86_64/kernel/x8664_ksyms.c:197: error: (near initialization for `__ksymtab__atomic_dec_and_lock.value')
linux-2.6.11-rc3/arch/x86_64/kernel/x8664_ksyms.c:197: error: __ksymtab__atomic_dec_and_lock causes a section type conflict
make[2]: *** [arch/x86_64/kernel/x8664_ksyms.o] Error 1
make[1]: *** [arch/x86_64/kernel] Error 2
make: *** [_all] Error 2

-- 
Tom Rini
http://gate.crashing.org/~trini/


* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-04 10:03 [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 Ingo Molnar
  2005-02-04 15:19 ` Kevin Hilman
  2005-02-04 18:19 ` Tom Rini
@ 2005-02-06  4:19 ` Valdis.Kletnieks
  2005-02-07  9:21   ` Ingo Molnar
  2005-02-08  7:55 ` [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 Valdis.Kletnieks
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 125+ messages in thread
From: Valdis.Kletnieks @ 2005-02-06  4:19 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1197 bytes --]

On Fri, 04 Feb 2005 11:03:47 +0100, Ingo Molnar said:
> 
> i have released the -V0.7.38-01 Real-Time Preemption patch, which can be

Building with:

# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT_DESKTOP=y
# CONFIG_PREEMPT_RT is not set

  CC      kernel/sched.o
kernel/sched.c:314:1: warning: "_finish_arch_switch" redefined
kernel/sched.c:306:1: warning: this is the location of the previous definition

caused by this part of the patch:

@@ -288,12 +295,20 @@ static DEFINE_PER_CPU(struct runqueue, r
 #define task_rq(p)             cpu_rq(task_cpu(p))
 #define cpu_curr(cpu)          (cpu_rq(cpu)->curr)
 
+#ifdef CONFIG_PREEMPT_RT
+# ifdef prepare_arch_switch
+#   error FIXME
+# endif        
+#else  
+# define _finish_arch_switch finish_arch_switch
+#endif 
+       
 /*     
  * Default context-switch locking:
  */    
 #ifndef prepare_arch_switch
 # define prepare_arch_switch(rq, next) do { } while (0)
-# define finish_arch_switch(rq, next)  spin_unlock_irq(&(rq)->lock)
+# define _finish_arch_switch(rq, next) spin_unlock(&(rq)->lock)
 # define task_running(rq, p)           ((rq)->curr == (p))
 #endif
  

What was intended for non-RT builds?

[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]


* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-04 18:19 ` Tom Rini
@ 2005-02-07  9:03   ` Ingo Molnar
  2005-02-07 14:35     ` Tom Rini
  0 siblings, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-02-07  9:03 UTC (permalink / raw)
  To: Tom Rini; +Cc: linux-kernel


* Tom Rini <trini@kernel.crashing.org> wrote:

> On Fri, Feb 04, 2005 at 11:03:47AM +0100, Ingo Molnar wrote:
> > 
> > i have released the -V0.7.38-01 Real-Time Preemption patch, which can be
> > downloaded from the usual place:
> > 
> >   http://redhat.com/~mingo/realtime-preempt/
> 
> I thought I saw you say x64 should be OK now a few releases ago, so:
> linux-2.6.11-rc3/arch/x86_64/kernel/x8664_ksyms.c:197: error: `_atomic_dec_and_lock' undeclared here (not in a function)
> linux-2.6.11-rc3/arch/x86_64/kernel/x8664_ksyms.c:197: error: initializer element is not constant
> linux-2.6.11-rc3/arch/x86_64/kernel/x8664_ksyms.c:197: error: (near initialization for `__ksymtab__atomic_dec_and_lock.value')
> linux-2.6.11-rc3/arch/x86_64/kernel/x8664_ksyms.c:197: error: __ksymtab__atomic_dec_and_lock causes a section type conflict
> make[2]: *** [arch/x86_64/kernel/x8664_ksyms.o] Error 1
> make[1]: *** [arch/x86_64/kernel] Error 2
> make: *** [_all] Error 2

please send me your .config - mine builds/boots/works fine.

	Ingo


* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-06  4:19 ` Valdis.Kletnieks
@ 2005-02-07  9:21   ` Ingo Molnar
  2005-02-07 15:08     ` Real-Time Preemption and UML? Esben Nielsen
  0 siblings, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-02-07  9:21 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: linux-kernel


* Valdis.Kletnieks@vt.edu <Valdis.Kletnieks@vt.edu> wrote:

> Building with:
> 
> # CONFIG_PREEMPT_NONE is not set
> # CONFIG_PREEMPT_VOLUNTARY is not set
> CONFIG_PREEMPT_DESKTOP=y
> # CONFIG_PREEMPT_RT is not set
> 
>   CC      kernel/sched.o
> kernel/sched.c:314:1: warning: "_finish_arch_switch" redefined
> kernel/sched.c:306:1: warning: this is the location of the previous definition

ok, i fixed this in the -03 patch.

	Ingo


* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-07  9:03   ` Ingo Molnar
@ 2005-02-07 14:35     ` Tom Rini
  2005-02-08  8:27       ` Ingo Molnar
  0 siblings, 1 reply; 125+ messages in thread
From: Tom Rini @ 2005-02-07 14:35 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

On Mon, Feb 07, 2005 at 10:03:56AM +0100, Ingo Molnar wrote:
> 
> * Tom Rini <trini@kernel.crashing.org> wrote:
> 
> > On Fri, Feb 04, 2005 at 11:03:47AM +0100, Ingo Molnar wrote:
> > > 
> > > i have released the -V0.7.38-01 Real-Time Preemption patch, which can be
> > > downloaded from the usual place:
> > > 
> > >   http://redhat.com/~mingo/realtime-preempt/
> > 
> > I thought I saw you say x64 should be OK now a few releases ago, so:
> > linux-2.6.11-rc3/arch/x86_64/kernel/x8664_ksyms.c:197: error: `_atomic_dec_and_lock' undeclared here (not in a function)
> > linux-2.6.11-rc3/arch/x86_64/kernel/x8664_ksyms.c:197: error: initializer element is not constant
> > linux-2.6.11-rc3/arch/x86_64/kernel/x8664_ksyms.c:197: error: (near initialization for `__ksymtab__atomic_dec_and_lock.value')
> > linux-2.6.11-rc3/arch/x86_64/kernel/x8664_ksyms.c:197: error: __ksymtab__atomic_dec_and_lock causes a section type conflict
> > make[2]: *** [arch/x86_64/kernel/x8664_ksyms.o] Error 1
> > make[1]: *** [arch/x86_64/kernel] Error 2
> > make: *** [_all] Error 2
> 
> please send me your .config - mine builds/boots/works fine.

I don't have it handy anymore, but I just cp'd arch/x86_64/defconfig to
.config, ran oldconfig and turned RT off (PREEMPT_NONE=y) (oops, I did
forget to mention that, didn't I?  Sorry).

-- 
Tom Rini
http://gate.crashing.org/~trini/


* Real-Time Preemption and UML?
  2005-02-07  9:21   ` Ingo Molnar
@ 2005-02-07 15:08     ` Esben Nielsen
  2005-02-07 18:35       ` Jeff Dike
  0 siblings, 1 reply; 125+ messages in thread
From: Esben Nielsen @ 2005-02-07 15:08 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

Hi, I am trying to compile and run UM-Linux with PREEMPT_REALTIME. I
managed to get it to compile but it won't start - it simply stops somewhere
in start_kernel() :-(

Has anyone else looked at it?

It doesn't sound like it makes much sense to have PREEMPT_REALTIME for UML
but I thought it was a good developing platform for playing around
before going to the real hardware, where the latency measurements
of course have to take place. The turn around time should be much shorter
than rebooting a full PC every time and the possibility of getting debug
output in the beginning should also be much better.

Esben



* Re: Real-Time Preemption and UML?
  2005-02-07 15:08     ` Real-Time Preemption and UML? Esben Nielsen
@ 2005-02-07 18:35       ` Jeff Dike
  2005-02-07 23:14         ` Esben Nielsen
  0 siblings, 1 reply; 125+ messages in thread
From: Jeff Dike @ 2005-02-07 18:35 UTC (permalink / raw)
  To: Esben Nielsen; +Cc: Ingo Molnar, linux-kernel

simlo@phys.au.dk said:
> Hi, I am trying to compile and run UM-Linux with PREEMPT_REALTIME. I
> managed to get it to compile but it won't start - it simply stops
> somewhere in start_kernel() :-( 

I've never played with preemption on UML.  No doubt it needs some work...

				Jeff



* Re: Real-Time Preemption and UML?
  2005-02-07 18:35       ` Jeff Dike
@ 2005-02-07 23:14         ` Esben Nielsen
  2005-02-08  8:39           ` Ingo Molnar
  0 siblings, 1 reply; 125+ messages in thread
From: Esben Nielsen @ 2005-02-07 23:14 UTC (permalink / raw)
  To: Jeff Dike; +Cc: Ingo Molnar, linux-kernel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1276 bytes --]

Well, I keep trying a little bit more. In the meantime you can get some
of the stuff I needed to change to at least get it to compile:

One of the problems was use of direct architecture specific semaphores
(which doesn't work under PREEMPT_REALTIME) and in places where a quick
(maybe too quick) look at the code told me that completions ought to be
used. Therefore I changed two semaphores to completions which compiled
fine. I have tried the change on 2.6.11-rc2, and it seemed to work, but I
have not tested it heavily.

The patch is in an attachment - I hope the mailing list will allow that. It is
simply too troublesome otherwise when I am using Pine as mail client.

Esben


On Mon, 7 Feb 2005, Jeff Dike wrote:

> simlo@phys.au.dk said:
> > Hi, I am trying to compile and run UM-Linux with PREEMPT_REALTIME. I
> > managed to get it to compile but it won't start - it simply stops
> > somewhere in start_kernel() :-( 
> 
> I've never played with preemption on UML.  No doubt it needs some work...
> 
> 				Jeff
> 

[-- Attachment #2: Type: TEXT/PLAIN, Size: 2383 bytes --]

--- linux-2.6.11-rc2-um/arch/um/drivers/port_kern.c.orig	2005-01-23 15:53:29.000000000 +0100
+++ linux-2.6.11-rc2-um/arch/um/drivers/port_kern.c	2005-02-06 19:54:52.000000000 +0100
@@ -23,7 +23,7 @@
 struct port_list {
 	struct list_head list;
 	int has_connection;
-	struct semaphore sem;
+	struct completion done;
 	int port;
 	int fd;
 	spinlock_t lock;
@@ -66,7 +66,7 @@
 	conn->fd = fd;
 	list_add(&conn->list, &conn->port->connections);
 
-	up(&conn->port->sem);
+	complete(&conn->port->done);
 	return(IRQ_HANDLED);
 }
 
@@ -183,13 +183,14 @@
 	*port = ((struct port_list) 
 		{ .list 	 	= LIST_HEAD_INIT(port->list),
 		  .has_connection 	= 0,
-		  .sem 			= __SEMAPHORE_INITIALIZER(port->sem, 
-								  0),
 		  .lock 		= SPIN_LOCK_UNLOCKED,
 		  .port 	 	= port_num,
 		  .fd  			= fd,
 		  .pending 		= LIST_HEAD_INIT(port->pending),
 		  .connections 		= LIST_HEAD_INIT(port->connections) });
+
+	init_completion(&port->done);
+
 	list_add(&port->list, &ports);
 
  found:
@@ -221,7 +222,7 @@
 	int fd;
 
 	while(1){
-		if(down_interruptible(&port->sem))
+		if(wait_for_completion_interruptible(&port->done))
 			return(-ERESTARTSYS);
 
 		spin_lock(&port->lock);
--- linux-2.6.11-rc2-um/arch/um/drivers/xterm_kern.c.orig	2005-01-23 15:53:29.000000000 +0100
+++ linux-2.6.11-rc2-um/arch/um/drivers/xterm_kern.c	2005-02-06 19:54:58.000000000 +0100
@@ -16,7 +16,7 @@
 #include "xterm.h"
 
 struct xterm_wait {
-	struct semaphore sem;
+	struct completion ready;
 	int fd;
 	int pid;
 	int new_fd;
@@ -32,7 +32,7 @@
 		return(IRQ_NONE);
 
 	xterm->new_fd = fd;
-	up(&xterm->sem);
+	complete(&xterm->ready);
 	return(IRQ_HANDLED);
 }
 
@@ -49,10 +49,10 @@
 
 	/* This is a locked semaphore... */
 	*data = ((struct xterm_wait) 
-		{ .sem  	= __SEMAPHORE_INITIALIZER(data->sem, 0),
-		  .fd 		= socket,
+		{ .fd 		= socket,
 		  .pid 		= -1,
 		  .new_fd 	= -1 });
+	init_completion(&data->ready);
 
 	err = um_request_irq(XTERM_IRQ, socket, IRQ_READ, xterm_interrupt, 
 			     SA_INTERRUPT | SA_SHIRQ | SA_SAMPLE_RANDOM, 
@@ -68,7 +68,7 @@
 	 *
 	 * XXX Note, if the xterm doesn't work for some reason (eg. DISPLAY
 	 * isn't set) this will hang... */
-	down(&data->sem);
+	wait_for_completion(&data->ready);
 
 	free_irq_by_irq_and_dev(XTERM_IRQ, data);
 	free_irq(XTERM_IRQ, data);
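
The pattern the patch switches to (one side signals an event, the other
side waits for it) can be tried outside the kernel. Below is a minimal
userspace sketch, assuming a pthread mutex and condition variable in place
of the kernel's wait queue. All toy_ names are hypothetical, the "interrupt
handler" is an ordinary thread, and the fd value 42 is made up.

```c
#include <pthread.h>
#include <stddef.h>

/* Userspace analogue of a completion: a counter guarded by a lock. */
struct toy_completion {
	pthread_mutex_t lock;
	pthread_cond_t wait;
	unsigned int done;	/* complete() calls not yet consumed */
};

static void toy_init_completion(struct toy_completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->wait, NULL);
	c->done = 0;
}

/* Signal one waiter; safe to call before the waiter has arrived. */
static void toy_complete(struct toy_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done++;
	pthread_cond_signal(&c->wait);
	pthread_mutex_unlock(&c->lock);
}

/* Block until a matching toy_complete() has happened, then consume it. */
static void toy_wait_for_completion(struct toy_completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (c->done == 0)
		pthread_cond_wait(&c->wait, &c->lock);
	c->done--;
	pthread_mutex_unlock(&c->lock);
}

/* Loosely mirrors struct xterm_wait from the patch above. */
struct toy_xterm_wait {
	struct toy_completion ready;
	int new_fd;
};

/* Stand-in for the interrupt handler: publish the fd, then signal. */
static void *toy_irq_thread(void *arg)
{
	struct toy_xterm_wait *w = arg;

	w->new_fd = 42;		/* made-up descriptor */
	toy_complete(&w->ready);
	return NULL;
}

/* Returns the fd published by the handler thread, or -1 on error. */
static int toy_wait_for_fd(void)
{
	struct toy_xterm_wait w = { .new_fd = -1 };
	pthread_t t;

	toy_init_completion(&w.ready);
	if (pthread_create(&t, NULL, toy_irq_thread, &w) != 0)
		return -1;
	toy_wait_for_completion(&w.ready);
	pthread_join(t, NULL);
	return w.new_fd;
}
```

Because the signal is recorded in a counter, it does not matter whether
the handler runs before or after the waiter blocks, and no ownership is
implied, which is why this shape fits event signalling better than a
mutex-backed semaphore.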


* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-04 10:03 [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 Ingo Molnar
                   ` (2 preceding siblings ...)
  2005-02-06  4:19 ` Valdis.Kletnieks
@ 2005-02-08  7:55 ` Valdis.Kletnieks
  2005-02-08  8:45   ` Ingo Molnar
  2005-02-08 21:58 ` William Weston
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 125+ messages in thread
From: Valdis.Kletnieks @ 2005-02-08  7:55 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1052 bytes --]

On Fri, 04 Feb 2005 11:03:47 +0100, Ingo Molnar said:
> 
> i have released the -V0.7.38-01 Real-Time Preemption patch, which can be
> downloaded from the usual place:

Hey Ingo.. Sorry to keep breaking stuff on you, but.. ;)

Summary: Looks like CONFIG_NET_PKTGEN=y gives -V0.7.38-03 indigestion.

I retrofitted 0.7.38-03 onto -rc3-mm1, and at boot it wedged up hard scrolling
an error message.  Looked like a 'scheduling while atomic' error coming from
net/pktgen.o.   Sorry for the incomplete traceback, but it locked before
userspace came up, and I don't have hardware handy for a serial console..

I found a CONFIG_NET_PKTGEN=Y in the config, rebuilt with =n, and the resulting
kernel boots fine (am using it as I type). Vanilla -rc3-mm1 also boots fine
with the PKTGEN=y setting (as did 2.6.10-mm1-V0.7.34-01, the last -mm I built
with a -RT patch).  I haven't tried a vanilla -rc3-V0.7.38-03, but I don't see
anyplace -mm1 hits pktgen.c

If the above isn't enough to track down the issue, feel free to let me know
what you'd like me to try next.

[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]


* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-07 14:35     ` Tom Rini
@ 2005-02-08  8:27       ` Ingo Molnar
  0 siblings, 0 replies; 125+ messages in thread
From: Ingo Molnar @ 2005-02-08  8:27 UTC (permalink / raw)
  To: Tom Rini; +Cc: linux-kernel


* Tom Rini <trini@kernel.crashing.org> wrote:

> > please send me your .config - mine builds/boots/works fine.
> 
> I don't have it handy anymore, but I just cp'd arch/x86_64/defconfig
> to .config, ran oldconfig and turned RT off (PREEMPT_NONE=y) [...]

thanks - managed to reproduce it this way. The patch below fixes the x64
build error on !PREEMPT_RT and the resulting kernel boots fine as well,
plus it fixes an x64 SMP build error as well. I have uploaded the -38-04
release with this fix.

	Ingo

--- linux/arch/x86_64/kernel/x8664_ksyms.c	
+++ linux/arch/x86_64/kernel/x8664_ksyms.c	
@@ -194,7 +194,7 @@ EXPORT_SYMBOL(rwsem_down_write_failed_th
 EXPORT_SYMBOL(empty_zero_page);
 
 #ifdef CONFIG_HAVE_DEC_LOCK
-EXPORT_SYMBOL(_atomic_dec_and_lock);
+EXPORT_SYMBOL(_atomic_dec_and_raw_spin_lock);
 #endif
 
 EXPORT_SYMBOL(die_chain);
--- linux/arch/x86_64/lib/dec_and_lock.c	
+++ linux/arch/x86_64/lib/dec_and_lock.c	
@@ -10,7 +10,7 @@
 #include <linux/spinlock.h>
 #include <asm/atomic.h>
 
-int _atomic_dec_and_lock(atomic_t *atomic, raw_spinlock_t *lock)
+int _atomic_dec_and_raw_spin_lock(atomic_t *atomic, raw_spinlock_t *lock)
 {
 	int counter;
 	int newcount;

--- linux/arch/x86_64/kernel/smp.c	
+++ linux/arch/x86_64/kernel/smp.c	
@@ -266,6 +266,16 @@ void smp_send_reschedule(int cpu)
 }
 
 /*
+ * this function sends a 'reschedule' IPI to all other CPUs.
+ * This is used when RT tasks are starving and other CPUs
+ * might be able to run them:
+ */
+void smp_send_reschedule_allbutself(void)
+{
+	send_IPI_allbutself(RESCHEDULE_VECTOR);
+}
+
+/*
  * Structure and data for smp_call_function(). This is designed to minimise
  * static memory requirements. It also looks cleaner.
  */
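
For readers unfamiliar with the helper renamed in the patch above: a
dec-and-lock operation decrements a counter and, only if the counter drops
to zero, returns with the lock held. A minimal userspace sketch of that
contract follows, assuming C11 atomics and a pthread mutex in place of
atomic_t and raw_spinlock_t. The demo_ name is hypothetical and this is
not the x86_64 implementation.

```c
#include <pthread.h>
#include <stdatomic.h>

/*
 * Returns 1 with 'lock' held if the decrement reached zero,
 * otherwise returns 0 with 'lock' not held.
 */
static int demo_dec_and_lock(atomic_int *count, pthread_mutex_t *lock)
{
	int old = atomic_load(count);

	/* Fast path: the result cannot be zero, so skip the lock. */
	while (old > 1) {
		if (atomic_compare_exchange_weak(count, &old, old - 1))
			return 0;
		/* On failure 'old' now holds the current value; retry. */
	}

	/* Slow path: the count may reach zero, so take the lock first. */
	pthread_mutex_lock(lock);
	if (atomic_fetch_sub(count, 1) == 1)
		return 1;	/* reached zero: caller owns the lock */
	pthread_mutex_unlock(lock);
	return 0;
}
```

A caller typically uses the return value to decide whether to tear down
the object while still holding the lock, and must unlock afterwards.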


* Re: Real-Time Preemption and UML?
  2005-02-07 23:14         ` Esben Nielsen
@ 2005-02-08  8:39           ` Ingo Molnar
  2005-02-08 18:55             ` Jeff Dike
  0 siblings, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-02-08  8:39 UTC (permalink / raw)
  To: Esben Nielsen; +Cc: Jeff Dike, linux-kernel


* Esben Nielsen <simlo@phys.au.dk> wrote:

> Well, I keep trying a little bit more. In the meantime you can get
> some of the stuff I needed to change to at least get it to compile:
> 
> One of the problems was use of direct architecture specific semaphores
> (which doesn't work under PREEMPT_REALTIME) and in places where a
> quick (maybe too quick) look at the code told me that completions
> ought to be used. Therefore I changed two semaphores to completions
> which compiled fine. I have tried the change on 2.6.11-rc2, and it
> seemed to work, but I have not tested it heavily.

Jeff, any objections against adding this change to UML at some point? 
It's at most a cleanup for now (PREEMPT_RT not being an upstream
feature), but it makes life easier if 'more exotic' semaphore details
are not being relied on (even if that reliance is 100% correct
currently).

	Ingo


* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-08  7:55 ` [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 Valdis.Kletnieks
@ 2005-02-08  8:45   ` Ingo Molnar
  2005-02-08 10:26     ` Valdis.Kletnieks
  0 siblings, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-02-08  8:45 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: linux-kernel


* Valdis.Kletnieks@vt.edu <Valdis.Kletnieks@vt.edu> wrote:

> I found a CONFIG_NET_PKTGEN=Y in the config, rebuilt with =n, and the
> resulting kernel boots fine (am using it as I type). Vanilla -rc3-mm1
> also boots fine with the PKTGEN=y setting (as did
> 2.6.10-mm1-V0.7.34-01, the last -mm I built with a -RT patch).  I
> haven't tried a vanilla -rc3-V0.7.38-03, but I don't see anyplace -mm1
> hits pktgen.c
> 
> If the above isn't enough to track down the issue, feel free to let me
> know what you'd like me to try next.

i tried to enable NET_PKTGEN in my vanilla-based -RT tree and it
boots/works fine. Could you try a vanilla-based -RT tree too, with
NET_PKTGEN enabled, and if it breaks send me your .config - if it doesn't
break then could you send me your -mm1 .config?

	Ingo


* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-08  8:45   ` Ingo Molnar
@ 2005-02-08 10:26     ` Valdis.Kletnieks
  0 siblings, 0 replies; 125+ messages in thread
From: Valdis.Kletnieks @ 2005-02-08 10:26 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel


[-- Attachment #1.1: Type: text/plain, Size: 707 bytes --]

On Tue, 08 Feb 2005 09:45:29 +0100, Ingo Molnar said:

> i tried to enable NET_PKTGEN in my vanilla-based -RT tree and it
> boots/works fine. Could you try a vanilla-based -RT tree too, with
> NET_PKTGEN enabled

Plain -rc3-V0.7.38-03 loops at boot as well, so that rules out any -mm1
issues or a botched merge on my part. .config attached.

Gut instinct is "yet another thing I broke by compiling with PREEMPT_DESKTOP
rather than PREEMPT_RT"... 

(userspace is Fedora Core -devel tree as of today, gcc-3.4.3-17, just
in case this is some squirrelly toolchain issue...)

(Feel free to back-burner this issue if somebody has a more severe problem - I'm
not in any actual need of NET_PKTGEN at the moment).



[-- Attachment #1.2: .config --]
[-- Type: text/plain , Size: 34709 bytes --]

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.11-rc3-RT-V0.7.38-03
# Tue Feb  8 04:43:25 2005
#
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_UID16=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_CLEAN_COMPILE=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_LOCK_KERNEL=y

#
# General setup
#
CONFIG_LOCALVERSION=""
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_SYSCTL=y
CONFIG_AUDIT=y
CONFIG_AUDITSYSCALL=y
CONFIG_LOG_BUF_SHIFT=18
CONFIG_HOTPLUG=y
CONFIG_KOBJECT_UEVENT=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_EMBEDDED=y
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SHMEM=y
CONFIG_CC_ALIGN_FUNCTIONS=0
CONFIG_CC_ALIGN_LABELS=0
CONFIG_CC_ALIGN_LOOPS=0
CONFIG_CC_ALIGN_JUMPS=0
# CONFIG_TINY_SHMEM is not set

#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_OBSOLETE_MODPARM=y
# CONFIG_MODVERSIONS is not set
CONFIG_MODULE_SRCVERSION_ALL=y
CONFIG_KMOD=y

#
# Processor type and features
#
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
CONFIG_MPENTIUM4=y
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_X86_GENERIC is not set
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_L1_CACHE_SHIFT=7
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_HPET_TIMER=y
# CONFIG_SMP is not set
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT_DESKTOP=y
# CONFIG_PREEMPT_RT is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_SOFTIRQS=y
CONFIG_PREEMPT_HARDIRQS=y
# CONFIG_SPINLOCK_BKL is not set
CONFIG_PREEMPT_BKL=y
CONFIG_ASM_SEMAPHORES=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_X86_UP_APIC=y
CONFIG_X86_UP_IOAPIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_TSC=y
CONFIG_X86_MCE=y
CONFIG_X86_MCE_NONFATAL=y
CONFIG_X86_MCE_P4THERMAL=y
# CONFIG_TOSHIBA is not set
CONFIG_I8K=m
CONFIG_MICROCODE=m
CONFIG_X86_MSR=m
CONFIG_X86_CPUID=m

#
# Firmware Drivers
#
# CONFIG_EDD is not set
CONFIG_NOHIGHMEM=y
# CONFIG_HIGHMEM4G is not set
# CONFIG_HIGHMEM64G is not set
# CONFIG_MATH_EMULATION is not set
CONFIG_MTRR=y
# CONFIG_EFI is not set
CONFIG_HAVE_DEC_LOCK=y
CONFIG_REGPARM=y

#
# Power management options (ACPI, APM)
#
CONFIG_PM=y
# CONFIG_PM_DEBUG is not set
# CONFIG_SOFTWARE_SUSPEND is not set

#
# ACPI (Advanced Configuration and Power Interface) Support
#
CONFIG_ACPI=y
CONFIG_ACPI_BOOT=y
CONFIG_ACPI_INTERPRETER=y
# CONFIG_ACPI_SLEEP is not set
CONFIG_ACPI_AC=m
CONFIG_ACPI_BATTERY=m
CONFIG_ACPI_BUTTON=m
CONFIG_ACPI_VIDEO=y
CONFIG_ACPI_FAN=m
CONFIG_ACPI_PROCESSOR=m
CONFIG_ACPI_THERMAL=m
# CONFIG_ACPI_ASUS is not set
# CONFIG_ACPI_IBM is not set
# CONFIG_ACPI_TOSHIBA is not set
CONFIG_ACPI_BLACKLIST_YEAR=0
CONFIG_ACPI_DEBUG=y
CONFIG_ACPI_BUS=y
CONFIG_ACPI_EC=y
CONFIG_ACPI_POWER=y
CONFIG_ACPI_PCI=y
CONFIG_ACPI_SYSTEM=y
CONFIG_X86_PM_TIMER=y
# CONFIG_ACPI_CONTAINER is not set

#
# APM (Advanced Power Management) BIOS Support
#
# CONFIG_APM is not set

#
# CPU Frequency scaling
#
CONFIG_CPU_FREQ=y
# CONFIG_CPU_FREQ_DEBUG is not set
CONFIG_CPU_FREQ_STAT=y
CONFIG_CPU_FREQ_STAT_DETAILS=y
CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_CPU_FREQ_GOV_POWERSAVE=y
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_TABLE=y

#
# CPUFreq processor drivers
#
# CONFIG_X86_ACPI_CPUFREQ is not set
# CONFIG_X86_POWERNOW_K6 is not set
# CONFIG_X86_POWERNOW_K7 is not set
# CONFIG_X86_POWERNOW_K8 is not set
# CONFIG_X86_GX_SUSPMOD is not set
# CONFIG_X86_SPEEDSTEP_CENTRINO is not set
CONFIG_X86_SPEEDSTEP_ICH=y
# CONFIG_X86_SPEEDSTEP_SMI is not set
CONFIG_X86_P4_CLOCKMOD=y
# CONFIG_X86_CPUFREQ_NFORCE2 is not set
# CONFIG_X86_LONGRUN is not set
# CONFIG_X86_LONGHAUL is not set

#
# shared options
#
CONFIG_X86_SPEEDSTEP_LIB=y
# CONFIG_X86_SPEEDSTEP_RELAXED_CAP_CHECK is not set

#
# Bus options (PCI, PCMCIA, EISA, MCA, ISA)
#
CONFIG_PCI=y
# CONFIG_PCI_GOBIOS is not set
# CONFIG_PCI_GOMMCONFIG is not set
# CONFIG_PCI_GODIRECT is not set
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
# CONFIG_PCIEPORTBUS is not set
# CONFIG_PCI_MSI is not set
# CONFIG_PCI_LEGACY_PROC is not set
CONFIG_PCI_NAMES=y
CONFIG_ISA=y
# CONFIG_EISA is not set
# CONFIG_MCA is not set
# CONFIG_SCx200 is not set

#
# PCCARD (PCMCIA/CardBus) support
#
CONFIG_PCCARD=m
# CONFIG_PCMCIA_DEBUG is not set
CONFIG_PCMCIA=m
CONFIG_CARDBUS=y

#
# PC-card bridges
#
CONFIG_YENTA=m
# CONFIG_PD6729 is not set
# CONFIG_I82092 is not set
# CONFIG_I82365 is not set
# CONFIG_TCIC is not set
CONFIG_PCMCIA_PROBE=y
CONFIG_PCCARD_NONSTATIC=m

#
# PCI Hotplug Support
#
# CONFIG_HOTPLUG_PCI is not set

#
# Executable file formats
#
CONFIG_BINFMT_ELF=y
# CONFIG_BINFMT_AOUT is not set
CONFIG_BINFMT_MISC=y

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=m
# CONFIG_DEBUG_DRIVER is not set

#
# Memory Technology Devices (MTD)
#
# CONFIG_MTD is not set

#
# Parallel port support
#
# CONFIG_PARPORT is not set

#
# Plug and Play support
#
CONFIG_PNP=y
CONFIG_PNP_DEBUG=y

#
# Protocols
#
# CONFIG_ISAPNP is not set
# CONFIG_PNPBIOS is not set
CONFIG_PNPACPI=y

#
# Block devices
#
CONFIG_BLK_DEV_FD=m
# CONFIG_BLK_DEV_XD is not set
# CONFIG_BLK_CPQ_DA is not set
# CONFIG_BLK_CPQ_CISS_DA is not set
# CONFIG_BLK_DEV_DAC960 is not set
# CONFIG_BLK_DEV_UMEM is not set
# CONFIG_BLK_DEV_COW_COMMON is not set
CONFIG_BLK_DEV_LOOP=y
CONFIG_BLK_DEV_CRYPTOLOOP=y
# CONFIG_BLK_DEV_NBD is not set
# CONFIG_BLK_DEV_SX8 is not set
# CONFIG_BLK_DEV_UB is not set
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=10240
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
# CONFIG_LBD is not set
CONFIG_CDROM_PKTCDVD=m
CONFIG_CDROM_PKTCDVD_BUFFERS=8
# CONFIG_CDROM_PKTCDVD_WCACHE is not set

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
# CONFIG_ATA_OVER_ETH is not set

#
# ATA/ATAPI/MFM/RLL support
#
CONFIG_IDE=y
CONFIG_BLK_DEV_IDE=y

#
# Please see Documentation/ide.txt for help/info on IDE drives
#
# CONFIG_BLK_DEV_IDE_SATA is not set
# CONFIG_BLK_DEV_HD_IDE is not set
CONFIG_BLK_DEV_IDEDISK=y
CONFIG_IDEDISK_MULTI_MODE=y
# CONFIG_BLK_DEV_IDECS is not set
CONFIG_BLK_DEV_IDECD=y
# CONFIG_BLK_DEV_IDETAPE is not set
# CONFIG_BLK_DEV_IDEFLOPPY is not set
CONFIG_IDE_TASK_IOCTL=y

#
# IDE chipset support/bugfixes
#
# CONFIG_IDE_GENERIC is not set
# CONFIG_BLK_DEV_CMD640 is not set
# CONFIG_BLK_DEV_IDEPNP is not set
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_IDEPCI_SHARE_IRQ=y
# CONFIG_BLK_DEV_OFFBOARD is not set
# CONFIG_BLK_DEV_GENERIC is not set
# CONFIG_BLK_DEV_OPTI621 is not set
# CONFIG_BLK_DEV_RZ1000 is not set
CONFIG_BLK_DEV_IDEDMA_PCI=y
# CONFIG_BLK_DEV_IDEDMA_FORCED is not set
CONFIG_IDEDMA_PCI_AUTO=y
# CONFIG_IDEDMA_ONLYDISK is not set
# CONFIG_BLK_DEV_AEC62XX is not set
# CONFIG_BLK_DEV_ALI15X3 is not set
# CONFIG_BLK_DEV_AMD74XX is not set
# CONFIG_BLK_DEV_ATIIXP is not set
# CONFIG_BLK_DEV_CMD64X is not set
# CONFIG_BLK_DEV_TRIFLEX is not set
# CONFIG_BLK_DEV_CY82C693 is not set
# CONFIG_BLK_DEV_CS5520 is not set
# CONFIG_BLK_DEV_CS5530 is not set
# CONFIG_BLK_DEV_HPT34X is not set
# CONFIG_BLK_DEV_HPT366 is not set
# CONFIG_BLK_DEV_SC1200 is not set
CONFIG_BLK_DEV_PIIX=y
# CONFIG_BLK_DEV_NS87415 is not set
# CONFIG_BLK_DEV_PDC202XX_OLD is not set
# CONFIG_BLK_DEV_PDC202XX_NEW is not set
# CONFIG_BLK_DEV_SVWKS is not set
# CONFIG_BLK_DEV_SIIMAGE is not set
# CONFIG_BLK_DEV_SIS5513 is not set
# CONFIG_BLK_DEV_SLC90E66 is not set
# CONFIG_BLK_DEV_TRM290 is not set
# CONFIG_BLK_DEV_VIA82CXXX is not set
# CONFIG_IDE_ARM is not set
# CONFIG_IDE_CHIPSETS is not set
CONFIG_BLK_DEV_IDEDMA=y
# CONFIG_IDEDMA_IVB is not set
CONFIG_IDEDMA_AUTO=y
# CONFIG_BLK_DEV_HD is not set

#
# SCSI device support
#
# CONFIG_SCSI is not set

#
# Old CD-ROM drivers (not SCSI, not IDE)
#
# CONFIG_CD_NO_IDESCSI is not set

#
# Multi-device support (RAID and LVM)
#
CONFIG_MD=y
# CONFIG_BLK_DEV_MD is not set
CONFIG_BLK_DEV_DM=y
CONFIG_DM_CRYPT=y
# CONFIG_DM_SNAPSHOT is not set
# CONFIG_DM_MIRROR is not set
# CONFIG_DM_ZERO is not set

#
# Fusion MPT device support
#

#
# IEEE 1394 (FireWire) support
#
CONFIG_IEEE1394=m

#
# Subsystem Options
#
# CONFIG_IEEE1394_VERBOSEDEBUG is not set
# CONFIG_IEEE1394_OUI_DB is not set
# CONFIG_IEEE1394_EXTRA_CONFIG_ROMS is not set

#
# Device Drivers
#
# CONFIG_IEEE1394_PCILYNX is not set
CONFIG_IEEE1394_OHCI1394=m

#
# Protocol Drivers
#
# CONFIG_IEEE1394_VIDEO1394 is not set
# CONFIG_IEEE1394_ETH1394 is not set
# CONFIG_IEEE1394_DV1394 is not set
# CONFIG_IEEE1394_RAWIO is not set
# CONFIG_IEEE1394_CMP is not set

#
# I2O device support
#
# CONFIG_I2O is not set

#
# Networking support
#
CONFIG_NET=y

#
# Networking options
#
CONFIG_PACKET=y
# CONFIG_PACKET_MMAP is not set
CONFIG_NETLINK_DEV=y
CONFIG_UNIX=y
CONFIG_NET_KEY=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
# CONFIG_IP_ADVANCED_ROUTER is not set
# CONFIG_IP_PNP is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE is not set
# CONFIG_IP_MROUTE is not set
# CONFIG_ARPD is not set
CONFIG_SYN_COOKIES=y
CONFIG_INET_AH=y
CONFIG_INET_ESP=y
CONFIG_INET_IPCOMP=y
CONFIG_INET_TUNNEL=y
CONFIG_IP_TCPDIAG=y
CONFIG_IP_TCPDIAG_IPV6=y

#
# IP: Virtual Server Configuration
#
# CONFIG_IP_VS is not set
CONFIG_IPV6=y
CONFIG_IPV6_PRIVACY=y
CONFIG_INET6_AH=y
CONFIG_INET6_ESP=y
CONFIG_INET6_IPCOMP=y
CONFIG_INET6_TUNNEL=y
# CONFIG_IPV6_TUNNEL is not set
CONFIG_NETFILTER=y
# CONFIG_NETFILTER_DEBUG is not set

#
# IP: Netfilter Configuration
#
CONFIG_IP_NF_CONNTRACK=m
CONFIG_IP_NF_CT_ACCT=y
CONFIG_IP_NF_CONNTRACK_MARK=y
CONFIG_IP_NF_CT_PROTO_SCTP=m
CONFIG_IP_NF_FTP=m
CONFIG_IP_NF_IRC=m
CONFIG_IP_NF_TFTP=m
# CONFIG_IP_NF_AMANDA is not set
# CONFIG_IP_NF_QUEUE is not set
CONFIG_IP_NF_IPTABLES=m
CONFIG_IP_NF_MATCH_LIMIT=m
CONFIG_IP_NF_MATCH_IPRANGE=m
CONFIG_IP_NF_MATCH_MAC=m
CONFIG_IP_NF_MATCH_PKTTYPE=m
CONFIG_IP_NF_MATCH_MARK=m
CONFIG_IP_NF_MATCH_MULTIPORT=m
CONFIG_IP_NF_MATCH_TOS=m
CONFIG_IP_NF_MATCH_RECENT=m
CONFIG_IP_NF_MATCH_ECN=m
CONFIG_IP_NF_MATCH_DSCP=m
CONFIG_IP_NF_MATCH_AH_ESP=m
CONFIG_IP_NF_MATCH_LENGTH=m
CONFIG_IP_NF_MATCH_TTL=m
CONFIG_IP_NF_MATCH_TCPMSS=m
CONFIG_IP_NF_MATCH_HELPER=m
CONFIG_IP_NF_MATCH_STATE=m
CONFIG_IP_NF_MATCH_CONNTRACK=m
CONFIG_IP_NF_MATCH_OWNER=m
CONFIG_IP_NF_MATCH_ADDRTYPE=m
CONFIG_IP_NF_MATCH_REALM=m
CONFIG_IP_NF_MATCH_SCTP=m
CONFIG_IP_NF_MATCH_COMMENT=m
CONFIG_IP_NF_MATCH_CONNMARK=m
CONFIG_IP_NF_MATCH_HASHLIMIT=m
CONFIG_IP_NF_FILTER=m
CONFIG_IP_NF_TARGET_REJECT=m
CONFIG_IP_NF_TARGET_LOG=m
CONFIG_IP_NF_TARGET_ULOG=m
CONFIG_IP_NF_TARGET_TCPMSS=m
CONFIG_IP_NF_NAT=m
CONFIG_IP_NF_NAT_NEEDED=y
CONFIG_IP_NF_TARGET_MASQUERADE=m
CONFIG_IP_NF_TARGET_REDIRECT=m
CONFIG_IP_NF_TARGET_NETMAP=m
CONFIG_IP_NF_TARGET_SAME=m
# CONFIG_IP_NF_NAT_SNMP_BASIC is not set
CONFIG_IP_NF_NAT_IRC=m
CONFIG_IP_NF_NAT_FTP=m
CONFIG_IP_NF_NAT_TFTP=m
CONFIG_IP_NF_MANGLE=m
CONFIG_IP_NF_TARGET_TOS=m
CONFIG_IP_NF_TARGET_ECN=m
CONFIG_IP_NF_TARGET_DSCP=m
CONFIG_IP_NF_TARGET_MARK=m
CONFIG_IP_NF_TARGET_CLASSIFY=m
CONFIG_IP_NF_TARGET_CONNMARK=m
CONFIG_IP_NF_TARGET_CLUSTERIP=m
CONFIG_IP_NF_RAW=m
CONFIG_IP_NF_TARGET_NOTRACK=m
CONFIG_IP_NF_ARPTABLES=m
CONFIG_IP_NF_ARPFILTER=m
CONFIG_IP_NF_ARP_MANGLE=m

#
# IPv6: Netfilter Configuration
#
# CONFIG_IP6_NF_QUEUE is not set
CONFIG_IP6_NF_IPTABLES=m
CONFIG_IP6_NF_MATCH_LIMIT=m
CONFIG_IP6_NF_MATCH_MAC=m
CONFIG_IP6_NF_MATCH_RT=m
CONFIG_IP6_NF_MATCH_OPTS=m
CONFIG_IP6_NF_MATCH_FRAG=m
CONFIG_IP6_NF_MATCH_HL=m
CONFIG_IP6_NF_MATCH_MULTIPORT=m
CONFIG_IP6_NF_MATCH_OWNER=m
CONFIG_IP6_NF_MATCH_MARK=m
CONFIG_IP6_NF_MATCH_IPV6HEADER=m
CONFIG_IP6_NF_MATCH_AHESP=m
CONFIG_IP6_NF_MATCH_LENGTH=m
CONFIG_IP6_NF_MATCH_EUI64=m
CONFIG_IP6_NF_FILTER=m
CONFIG_IP6_NF_TARGET_LOG=m
CONFIG_IP6_NF_MANGLE=m
CONFIG_IP6_NF_TARGET_MARK=m
CONFIG_IP6_NF_RAW=m
CONFIG_XFRM=y
CONFIG_XFRM_USER=y

#
# SCTP Configuration (EXPERIMENTAL)
#
# CONFIG_IP_SCTP is not set
# CONFIG_ATM is not set
# CONFIG_BRIDGE is not set
# CONFIG_VLAN_8021Q is not set
# CONFIG_DECNET is not set
# CONFIG_LLC2 is not set
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_NET_DIVERT is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set

#
# QoS and/or fair queueing
#
# CONFIG_NET_SCHED is not set
CONFIG_NET_CLS_ROUTE=y

#
# Network testing
#
CONFIG_NET_PKTGEN=y
# CONFIG_NETPOLL is not set
# CONFIG_NET_POLL_CONTROLLER is not set
# CONFIG_HAMRADIO is not set
# CONFIG_IRDA is not set
# CONFIG_BT is not set
CONFIG_NETDEVICES=y
CONFIG_DUMMY=y
# CONFIG_BONDING is not set
# CONFIG_EQUALIZER is not set
# CONFIG_TUN is not set
# CONFIG_ETHERTAP is not set
# CONFIG_NET_SB1000 is not set

#
# ARCnet devices
#
# CONFIG_ARCNET is not set

#
# Ethernet (10 or 100Mbit)
#
CONFIG_NET_ETHERNET=y
CONFIG_MII=y
# CONFIG_HAPPYMEAL is not set
# CONFIG_SUNGEM is not set
CONFIG_NET_VENDOR_3COM=y
# CONFIG_EL1 is not set
# CONFIG_EL2 is not set
# CONFIG_ELPLUS is not set
# CONFIG_EL16 is not set
# CONFIG_EL3 is not set
# CONFIG_3C515 is not set
CONFIG_VORTEX=y
# CONFIG_TYPHOON is not set
# CONFIG_LANCE is not set
# CONFIG_NET_VENDOR_SMC is not set
# CONFIG_NET_VENDOR_RACAL is not set

#
# Tulip family network device support
#
CONFIG_NET_TULIP=y
# CONFIG_DE2104X is not set
# CONFIG_TULIP is not set
# CONFIG_DE4X5 is not set
# CONFIG_WINBOND_840 is not set
# CONFIG_DM9102 is not set
CONFIG_PCMCIA_XIRCOM=y
# CONFIG_PCMCIA_XIRTULIP is not set
# CONFIG_AT1700 is not set
# CONFIG_DEPCA is not set
# CONFIG_HP100 is not set
# CONFIG_NET_ISA is not set
# CONFIG_NET_PCI is not set
# CONFIG_NET_POCKET is not set

#
# Ethernet (1000 Mbit)
#
# CONFIG_ACENIC is not set
# CONFIG_DL2K is not set
# CONFIG_E1000 is not set
# CONFIG_NS83820 is not set
# CONFIG_HAMACHI is not set
# CONFIG_YELLOWFIN is not set
# CONFIG_R8169 is not set
# CONFIG_SK98LIN is not set
# CONFIG_TIGON3 is not set

#
# Ethernet (10000 Mbit)
#
# CONFIG_IXGB is not set
# CONFIG_S2IO is not set

#
# Token Ring devices
#
# CONFIG_TR is not set

#
# Wireless LAN (non-hamradio)
#
CONFIG_NET_RADIO=y

#
# Obsolete Wireless cards support (pre-802.11)
#
# CONFIG_STRIP is not set
# CONFIG_ARLAN is not set
# CONFIG_WAVELAN is not set
# CONFIG_PCMCIA_WAVELAN is not set
# CONFIG_PCMCIA_NETWAVE is not set

#
# Wireless 802.11 Frequency Hopping cards support
#
# CONFIG_PCMCIA_RAYCS is not set

#
# Wireless 802.11b ISA/PCI cards support
#
# CONFIG_AIRO is not set
CONFIG_HERMES=y
# CONFIG_PLX_HERMES is not set
# CONFIG_TMD_HERMES is not set
CONFIG_PCI_HERMES=y
# CONFIG_ATMEL is not set

#
# Wireless 802.11b Pcmcia/Cardbus cards support
#
CONFIG_PCMCIA_HERMES=m
# CONFIG_AIRO_CS is not set
# CONFIG_PCMCIA_WL3501 is not set

#
# Prism GT/Duette 802.11(a/b/g) PCI/Cardbus support
#
# CONFIG_PRISM54 is not set
CONFIG_NET_WIRELESS=y

#
# PCMCIA network device support
#
CONFIG_NET_PCMCIA=y
# CONFIG_PCMCIA_3C589 is not set
# CONFIG_PCMCIA_3C574 is not set
# CONFIG_PCMCIA_FMVJ18X is not set
# CONFIG_PCMCIA_PCNET is not set
# CONFIG_PCMCIA_NMCLAN is not set
# CONFIG_PCMCIA_SMC91C92 is not set
CONFIG_PCMCIA_XIRC2PS=m
# CONFIG_PCMCIA_AXNET is not set

#
# Wan interfaces
#
# CONFIG_WAN is not set
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
CONFIG_PPP=m
# CONFIG_PPP_MULTILINK is not set
CONFIG_PPP_FILTER=y
CONFIG_PPP_ASYNC=m
# CONFIG_PPP_SYNC_TTY is not set
CONFIG_PPP_DEFLATE=m
CONFIG_PPP_BSDCOMP=m
# CONFIG_PPPOE is not set
# CONFIG_SLIP is not set
# CONFIG_SHAPER is not set
# CONFIG_NETCONSOLE is not set

#
# ISDN subsystem
#
# CONFIG_ISDN is not set

#
# Telephony Support
#
# CONFIG_PHONE is not set

#
# Input device support
#
CONFIG_INPUT=y

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
# CONFIG_INPUT_MOUSEDEV_PSAUX is not set
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
# CONFIG_INPUT_JOYDEV is not set
# CONFIG_INPUT_TSDEV is not set
# CONFIG_INPUT_EVDEV is not set
# CONFIG_INPUT_EVBUG is not set

#
# Input I/O drivers
#
# CONFIG_GAMEPORT is not set
CONFIG_SOUND_GAMEPORT=y
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
# CONFIG_SERIO_SERPORT is not set
# CONFIG_SERIO_CT82C710 is not set
# CONFIG_SERIO_PCIPS2 is not set
CONFIG_SERIO_LIBPS2=y
# CONFIG_SERIO_RAW is not set

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
CONFIG_KEYBOARD_ATKBD=y
# CONFIG_KEYBOARD_SUNKBD is not set
# CONFIG_KEYBOARD_LKKBD is not set
# CONFIG_KEYBOARD_XTKBD is not set
# CONFIG_KEYBOARD_NEWTON is not set
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
# CONFIG_MOUSE_SERIAL is not set
# CONFIG_MOUSE_INPORT is not set
# CONFIG_MOUSE_LOGIBM is not set
# CONFIG_MOUSE_PC110PAD is not set
# CONFIG_MOUSE_VSXXXAA is not set
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
CONFIG_INPUT_MISC=y
CONFIG_INPUT_PCSPKR=y
# CONFIG_INPUT_UINPUT is not set

#
# Character devices
#
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
# CONFIG_SERIAL_NONSTANDARD is not set

#
# Serial drivers
#
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_8250_CS=m
# CONFIG_SERIAL_8250_ACPI is not set
CONFIG_SERIAL_8250_NR_UARTS=4
CONFIG_SERIAL_8250_EXTENDED=y
CONFIG_SERIAL_8250_MANY_PORTS=y
CONFIG_SERIAL_8250_SHARE_IRQ=y
# CONFIG_SERIAL_8250_DETECT_IRQ is not set
# CONFIG_SERIAL_8250_MULTIPORT is not set
# CONFIG_SERIAL_8250_RSA is not set

#
# Non-8250 serial port support
#
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
CONFIG_UNIX98_PTYS=y
# CONFIG_LEGACY_PTYS is not set

#
# IPMI
#
# CONFIG_IPMI_HANDLER is not set

#
# Watchdog Cards
#
CONFIG_WATCHDOG=y
# CONFIG_WATCHDOG_NOWAYOUT is not set

#
# Watchdog Device Drivers
#
# CONFIG_SOFT_WATCHDOG is not set
# CONFIG_ACQUIRE_WDT is not set
# CONFIG_ADVANTECH_WDT is not set
# CONFIG_ALIM1535_WDT is not set
# CONFIG_ALIM7101_WDT is not set
# CONFIG_SC520_WDT is not set
# CONFIG_EUROTECH_WDT is not set
# CONFIG_IB700_WDT is not set
# CONFIG_WAFER_WDT is not set
CONFIG_I8XX_TCO=m
# CONFIG_SC1200_WDT is not set
# CONFIG_SCx200_WDT is not set
# CONFIG_60XX_WDT is not set
# CONFIG_CPU5_WDT is not set
# CONFIG_W83627HF_WDT is not set
# CONFIG_W83877F_WDT is not set
# CONFIG_MACHZ_WDT is not set

#
# ISA-based Watchdog Cards
#
# CONFIG_PCWATCHDOG is not set
# CONFIG_MIXCOMWD is not set
# CONFIG_WDT is not set

#
# PCI-based Watchdog Cards
#
# CONFIG_PCIPCWATCHDOG is not set
# CONFIG_WDTPCI is not set

#
# USB-based Watchdog Cards
#
# CONFIG_USBPCWATCHDOG is not set
CONFIG_HW_RANDOM=y
CONFIG_NVRAM=m
CONFIG_RTC=m
CONFIG_RTC_HISTOGRAM=m
CONFIG_BLOCKER=m
# CONFIG_GEN_RTC is not set
# CONFIG_DTLK is not set
# CONFIG_R3964 is not set
# CONFIG_APPLICOM is not set
# CONFIG_SONYPI is not set

#
# Ftape, the floppy tape device driver
#
# CONFIG_FTAPE is not set
CONFIG_AGP=m
# CONFIG_AGP_ALI is not set
# CONFIG_AGP_ATI is not set
# CONFIG_AGP_AMD is not set
# CONFIG_AGP_AMD64 is not set
CONFIG_AGP_INTEL=m
# CONFIG_AGP_INTEL_MCH is not set
# CONFIG_AGP_NVIDIA is not set
# CONFIG_AGP_SIS is not set
# CONFIG_AGP_SWORKS is not set
# CONFIG_AGP_VIA is not set
# CONFIG_AGP_EFFICEON is not set
# CONFIG_DRM is not set

#
# PCMCIA character devices
#
# CONFIG_SYNCLINK_CS is not set
# CONFIG_MWAVE is not set
# CONFIG_RAW_DRIVER is not set
CONFIG_HPET=y
# CONFIG_HPET_RTC_IRQ is not set
# CONFIG_HPET_MMAP is not set
CONFIG_HANGCHECK_TIMER=y

#
# I2C support
#
CONFIG_I2C=y
CONFIG_I2C_CHARDEV=y

#
# I2C Algorithms
#
# CONFIG_I2C_ALGOBIT is not set
# CONFIG_I2C_ALGOPCF is not set
# CONFIG_I2C_ALGOPCA is not set

#
# I2C Hardware Bus support
#
# CONFIG_I2C_ALI1535 is not set
# CONFIG_I2C_ALI1563 is not set
# CONFIG_I2C_ALI15X3 is not set
# CONFIG_I2C_AMD756 is not set
# CONFIG_I2C_AMD8111 is not set
# CONFIG_I2C_ELEKTOR is not set
# CONFIG_I2C_I801 is not set
# CONFIG_I2C_I810 is not set
# CONFIG_I2C_ISA is not set
# CONFIG_I2C_NFORCE2 is not set
# CONFIG_I2C_PARPORT_LIGHT is not set
CONFIG_I2C_PIIX4=y
# CONFIG_I2C_PROSAVAGE is not set
# CONFIG_I2C_SAVAGE4 is not set
# CONFIG_SCx200_ACB is not set
# CONFIG_I2C_SIS5595 is not set
# CONFIG_I2C_SIS630 is not set
# CONFIG_I2C_SIS96X is not set
# CONFIG_I2C_STUB is not set
# CONFIG_I2C_VIA is not set
# CONFIG_I2C_VIAPRO is not set
# CONFIG_I2C_VOODOO3 is not set
# CONFIG_I2C_PCA_ISA is not set

#
# Hardware Sensors Chip support
#
# CONFIG_I2C_SENSOR is not set
# CONFIG_SENSORS_ADM1021 is not set
# CONFIG_SENSORS_ADM1025 is not set
# CONFIG_SENSORS_ADM1026 is not set
# CONFIG_SENSORS_ADM1031 is not set
# CONFIG_SENSORS_ASB100 is not set
# CONFIG_SENSORS_DS1621 is not set
# CONFIG_SENSORS_FSCHER is not set
# CONFIG_SENSORS_GL518SM is not set
# CONFIG_SENSORS_IT87 is not set
# CONFIG_SENSORS_LM63 is not set
# CONFIG_SENSORS_LM75 is not set
# CONFIG_SENSORS_LM77 is not set
# CONFIG_SENSORS_LM78 is not set
# CONFIG_SENSORS_LM80 is not set
# CONFIG_SENSORS_LM83 is not set
# CONFIG_SENSORS_LM85 is not set
# CONFIG_SENSORS_LM87 is not set
# CONFIG_SENSORS_LM90 is not set
# CONFIG_SENSORS_MAX1619 is not set
# CONFIG_SENSORS_PC87360 is not set
# CONFIG_SENSORS_SMSC47B397 is not set
# CONFIG_SENSORS_SMSC47M1 is not set
# CONFIG_SENSORS_VIA686A is not set
# CONFIG_SENSORS_W83781D is not set
# CONFIG_SENSORS_W83L785TS is not set
# CONFIG_SENSORS_W83627HF is not set

#
# Other I2C Chip support
#
# CONFIG_SENSORS_EEPROM is not set
# CONFIG_SENSORS_PCF8574 is not set
# CONFIG_SENSORS_PCF8591 is not set
# CONFIG_SENSORS_RTC8564 is not set
# CONFIG_I2C_DEBUG_CORE is not set
# CONFIG_I2C_DEBUG_ALGO is not set
# CONFIG_I2C_DEBUG_BUS is not set
# CONFIG_I2C_DEBUG_CHIP is not set

#
# Dallas's 1-wire bus
#
# CONFIG_W1 is not set

#
# Misc devices
#
# CONFIG_IBM_ASM is not set

#
# Multimedia devices
#
# CONFIG_VIDEO_DEV is not set

#
# Digital Video Broadcasting Devices
#
# CONFIG_DVB is not set

#
# Graphics support
#
CONFIG_FB=y
CONFIG_FB_MODE_HELPERS=y
# CONFIG_FB_TILEBLITTING is not set
# CONFIG_FB_CIRRUS is not set
# CONFIG_FB_PM2 is not set
# CONFIG_FB_CYBER2000 is not set
# CONFIG_FB_ASILIANT is not set
# CONFIG_FB_IMSTT is not set
# CONFIG_FB_VGA16 is not set
CONFIG_FB_VESA=y
CONFIG_VIDEO_SELECT=y
# CONFIG_FB_HGA is not set
# CONFIG_FB_RIVA is not set
# CONFIG_FB_I810 is not set
# CONFIG_FB_INTEL is not set
# CONFIG_FB_MATROX is not set
# CONFIG_FB_RADEON_OLD is not set
# CONFIG_FB_RADEON is not set
# CONFIG_FB_ATY128 is not set
# CONFIG_FB_ATY is not set
# CONFIG_FB_SAVAGE is not set
# CONFIG_FB_SIS is not set
# CONFIG_FB_NEOMAGIC is not set
# CONFIG_FB_KYRO is not set
# CONFIG_FB_3DFX is not set
# CONFIG_FB_VOODOO1 is not set
# CONFIG_FB_TRIDENT is not set
# CONFIG_FB_VIRTUAL is not set

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
# CONFIG_MDA_CONSOLE is not set
CONFIG_DUMMY_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE=y
# CONFIG_FONTS is not set
CONFIG_FONT_8x8=y
CONFIG_FONT_8x16=y

#
# Logo configuration
#
CONFIG_LOGO=y
# CONFIG_LOGO_LINUX_MONO is not set
# CONFIG_LOGO_LINUX_VGA16 is not set
CONFIG_LOGO_LINUX_CLUT224=y
# CONFIG_BACKLIGHT_LCD_SUPPORT is not set

#
# Sound
#
CONFIG_SOUND=y

#
# Advanced Linux Sound Architecture
#
CONFIG_SND=y
CONFIG_SND_TIMER=y
CONFIG_SND_PCM=y
CONFIG_SND_SEQUENCER=y
# CONFIG_SND_SEQ_DUMMY is not set
CONFIG_SND_OSSEMUL=y
CONFIG_SND_MIXER_OSS=y
CONFIG_SND_PCM_OSS=y
CONFIG_SND_SEQUENCER_OSS=y
# CONFIG_SND_RTCTIMER is not set
CONFIG_SND_VERBOSE_PRINTK=y
CONFIG_SND_DEBUG=y
CONFIG_SND_DEBUG_MEMORY=y
CONFIG_SND_DEBUG_DETECT=y

#
# Generic devices
#
# CONFIG_SND_DUMMY is not set
# CONFIG_SND_VIRMIDI is not set
# CONFIG_SND_MTPAV is not set
# CONFIG_SND_SERIAL_U16550 is not set
# CONFIG_SND_MPU401 is not set

#
# ISA devices
#
# CONFIG_SND_AD1848 is not set
# CONFIG_SND_CS4231 is not set
# CONFIG_SND_CS4232 is not set
# CONFIG_SND_CS4236 is not set
# CONFIG_SND_ES1688 is not set
# CONFIG_SND_ES18XX is not set
# CONFIG_SND_GUSCLASSIC is not set
# CONFIG_SND_GUSEXTREME is not set
# CONFIG_SND_GUSMAX is not set
# CONFIG_SND_INTERWAVE is not set
# CONFIG_SND_INTERWAVE_STB is not set
# CONFIG_SND_OPTI92X_AD1848 is not set
# CONFIG_SND_OPTI92X_CS4231 is not set
# CONFIG_SND_OPTI93X is not set
# CONFIG_SND_SB8 is not set
# CONFIG_SND_SB16 is not set
# CONFIG_SND_SBAWE is not set
# CONFIG_SND_WAVEFRONT is not set
# CONFIG_SND_CMI8330 is not set
# CONFIG_SND_OPL3SA2 is not set
# CONFIG_SND_SGALAXY is not set
# CONFIG_SND_SSCAPE is not set

#
# PCI devices
#
CONFIG_SND_AC97_CODEC=y
# CONFIG_SND_ALI5451 is not set
# CONFIG_SND_ATIIXP is not set
# CONFIG_SND_ATIIXP_MODEM is not set
# CONFIG_SND_AU8810 is not set
# CONFIG_SND_AU8820 is not set
# CONFIG_SND_AU8830 is not set
# CONFIG_SND_AZT3328 is not set
# CONFIG_SND_BT87X is not set
# CONFIG_SND_CS46XX is not set
# CONFIG_SND_CS4281 is not set
# CONFIG_SND_EMU10K1 is not set
# CONFIG_SND_EMU10K1X is not set
# CONFIG_SND_CA0106 is not set
# CONFIG_SND_KORG1212 is not set
# CONFIG_SND_MIXART is not set
# CONFIG_SND_NM256 is not set
# CONFIG_SND_RME32 is not set
# CONFIG_SND_RME96 is not set
# CONFIG_SND_RME9652 is not set
# CONFIG_SND_HDSP is not set
# CONFIG_SND_TRIDENT is not set
# CONFIG_SND_YMFPCI is not set
# CONFIG_SND_ALS4000 is not set
# CONFIG_SND_CMIPCI is not set
# CONFIG_SND_ENS1370 is not set
# CONFIG_SND_ENS1371 is not set
# CONFIG_SND_ES1938 is not set
# CONFIG_SND_ES1968 is not set
# CONFIG_SND_MAESTRO3 is not set
# CONFIG_SND_FM801 is not set
# CONFIG_SND_ICE1712 is not set
# CONFIG_SND_ICE1724 is not set
CONFIG_SND_INTEL8X0=y
# CONFIG_SND_INTEL8X0M is not set
# CONFIG_SND_SONICVIBES is not set
# CONFIG_SND_VIA82XX is not set
# CONFIG_SND_VIA82XX_MODEM is not set
# CONFIG_SND_VX222 is not set

#
# USB devices
#
# CONFIG_SND_USB_AUDIO is not set
# CONFIG_SND_USB_USX2Y is not set

#
# PCMCIA devices
#
# CONFIG_SND_VXPOCKET is not set
# CONFIG_SND_VXP440 is not set
# CONFIG_SND_PDAUDIOCF is not set

#
# Open Sound System
#
# CONFIG_SOUND_PRIME is not set

#
# USB support
#
CONFIG_USB=y
# CONFIG_USB_DEBUG is not set

#
# Miscellaneous USB options
#
CONFIG_USB_DEVICEFS=y
CONFIG_USB_BANDWIDTH=y
# CONFIG_USB_DYNAMIC_MINORS is not set
# CONFIG_USB_SUSPEND is not set
# CONFIG_USB_OTG is not set
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB_ARCH_HAS_OHCI=y

#
# USB Host Controller Drivers
#
CONFIG_USB_EHCI_HCD=y
# CONFIG_USB_EHCI_SPLIT_ISO is not set
# CONFIG_USB_EHCI_ROOT_HUB_TT is not set
CONFIG_USB_OHCI_HCD=y
CONFIG_USB_UHCI_HCD=y
# CONFIG_USB_SL811_HCD is not set

#
# USB Device Class drivers
#
# CONFIG_USB_AUDIO is not set
# CONFIG_USB_BLUETOOTH_TTY is not set
# CONFIG_USB_MIDI is not set
# CONFIG_USB_ACM is not set
# CONFIG_USB_PRINTER is not set

#
# NOTE: USB_STORAGE enables SCSI, and 'SCSI disk support' may also be needed; see USB_STORAGE Help for more information
#
# CONFIG_USB_STORAGE is not set

#
# USB Input Devices
#
CONFIG_USB_HID=y
CONFIG_USB_HIDINPUT=y
# CONFIG_HID_FF is not set
# CONFIG_USB_HIDDEV is not set
# CONFIG_USB_AIPTEK is not set
# CONFIG_USB_WACOM is not set
# CONFIG_USB_KBTAB is not set
# CONFIG_USB_POWERMATE is not set
# CONFIG_USB_MTOUCH is not set
# CONFIG_USB_EGALAX is not set
# CONFIG_USB_XPAD is not set
# CONFIG_USB_ATI_REMOTE is not set

#
# USB Imaging devices
#
# CONFIG_USB_MDC800 is not set

#
# USB Multimedia devices
#
# CONFIG_USB_DABUSB is not set

#
# Video4Linux support is needed for USB Multimedia device support
#

#
# USB Network Adapters
#
# CONFIG_USB_CATC is not set
# CONFIG_USB_KAWETH is not set
# CONFIG_USB_PEGASUS is not set
# CONFIG_USB_RTL8150 is not set
# CONFIG_USB_USBNET is not set

#
# USB port drivers
#

#
# USB Serial Converter support
#
# CONFIG_USB_SERIAL is not set

#
# USB Miscellaneous drivers
#
# CONFIG_USB_EMI62 is not set
# CONFIG_USB_EMI26 is not set
# CONFIG_USB_AUERSWALD is not set
# CONFIG_USB_RIO500 is not set
# CONFIG_USB_LEGOTOWER is not set
# CONFIG_USB_LCD is not set
# CONFIG_USB_LED is not set
# CONFIG_USB_CYTHERM is not set
# CONFIG_USB_PHIDGETKIT is not set
# CONFIG_USB_PHIDGETSERVO is not set
# CONFIG_USB_IDMOUSE is not set
# CONFIG_USB_TEST is not set

#
# USB ATM/DSL drivers
#

#
# USB Gadget Support
#
# CONFIG_USB_GADGET is not set

#
# MMC/SD Card support
#
# CONFIG_MMC is not set

#
# InfiniBand support
#
# CONFIG_INFINIBAND is not set

#
# File systems
#
CONFIG_EXT2_FS=y
CONFIG_EXT2_FS_XATTR=y
CONFIG_EXT2_FS_POSIX_ACL=y
CONFIG_EXT2_FS_SECURITY=y
CONFIG_EXT3_FS=y
CONFIG_EXT3_FS_XATTR=y
CONFIG_EXT3_FS_POSIX_ACL=y
CONFIG_EXT3_FS_SECURITY=y
CONFIG_JBD=y
# CONFIG_JBD_DEBUG is not set
CONFIG_FS_MBCACHE=y
# CONFIG_REISERFS_FS is not set
# CONFIG_JFS_FS is not set
CONFIG_FS_POSIX_ACL=y

#
# XFS support
#
# CONFIG_XFS_FS is not set
# CONFIG_MINIX_FS is not set
# CONFIG_ROMFS_FS is not set
CONFIG_QUOTA=y
# CONFIG_QFMT_V1 is not set
CONFIG_QFMT_V2=y
CONFIG_QUOTACTL=y
CONFIG_DNOTIFY=y
# CONFIG_AUTOFS_FS is not set
CONFIG_AUTOFS4_FS=y

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
CONFIG_ZISOFS_FS=y
CONFIG_UDF_FS=y
CONFIG_UDF_NLS=y

#
# DOS/FAT/NT Filesystems
#
CONFIG_FAT_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1"
# CONFIG_NTFS_FS is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_SYSFS=y
# CONFIG_DEVFS_FS is not set
CONFIG_DEVPTS_FS_XATTR=y
CONFIG_DEVPTS_FS_SECURITY=y
CONFIG_TMPFS=y
CONFIG_TMPFS_XATTR=y
CONFIG_TMPFS_SECURITY=y
# CONFIG_HUGETLBFS is not set
# CONFIG_HUGETLB_PAGE is not set
CONFIG_RAMFS=y

#
# Miscellaneous filesystems
#
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_HFSPLUS_FS is not set
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
# CONFIG_CRAMFS is not set
# CONFIG_VXFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set

#
# Network File Systems
#
CONFIG_NFS_FS=m
CONFIG_NFS_V3=y
CONFIG_NFS_V4=y
CONFIG_NFS_DIRECTIO=y
CONFIG_NFSD=m
CONFIG_NFSD_V3=y
CONFIG_NFSD_V4=y
CONFIG_NFSD_TCP=y
CONFIG_LOCKD=m
CONFIG_LOCKD_V4=y
CONFIG_EXPORTFS=m
CONFIG_SUNRPC=m
CONFIG_SUNRPC_GSS=m
CONFIG_RPCSEC_GSS_KRB5=m
CONFIG_RPCSEC_GSS_SPKM3=m
# CONFIG_SMB_FS is not set
CONFIG_CIFS=m
CONFIG_CIFS_STATS=y
CONFIG_CIFS_XATTR=y
CONFIG_CIFS_POSIX=y
# CONFIG_CIFS_EXPERIMENTAL is not set
# CONFIG_NCP_FS is not set
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set

#
# Partition Types
#
# CONFIG_PARTITION_ADVANCED is not set
CONFIG_MSDOS_PARTITION=y

#
# Native Language Support
#
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_NLS_CODEPAGE_437=y
# CONFIG_NLS_CODEPAGE_737 is not set
# CONFIG_NLS_CODEPAGE_775 is not set
# CONFIG_NLS_CODEPAGE_850 is not set
# CONFIG_NLS_CODEPAGE_852 is not set
# CONFIG_NLS_CODEPAGE_855 is not set
# CONFIG_NLS_CODEPAGE_857 is not set
# CONFIG_NLS_CODEPAGE_860 is not set
# CONFIG_NLS_CODEPAGE_861 is not set
# CONFIG_NLS_CODEPAGE_862 is not set
# CONFIG_NLS_CODEPAGE_863 is not set
# CONFIG_NLS_CODEPAGE_864 is not set
# CONFIG_NLS_CODEPAGE_865 is not set
# CONFIG_NLS_CODEPAGE_866 is not set
# CONFIG_NLS_CODEPAGE_869 is not set
# CONFIG_NLS_CODEPAGE_936 is not set
# CONFIG_NLS_CODEPAGE_950 is not set
# CONFIG_NLS_CODEPAGE_932 is not set
# CONFIG_NLS_CODEPAGE_949 is not set
# CONFIG_NLS_CODEPAGE_874 is not set
# CONFIG_NLS_ISO8859_8 is not set
# CONFIG_NLS_CODEPAGE_1250 is not set
# CONFIG_NLS_CODEPAGE_1251 is not set
CONFIG_NLS_ASCII=y
CONFIG_NLS_ISO8859_1=y
# CONFIG_NLS_ISO8859_2 is not set
# CONFIG_NLS_ISO8859_3 is not set
# CONFIG_NLS_ISO8859_4 is not set
# CONFIG_NLS_ISO8859_5 is not set
# CONFIG_NLS_ISO8859_6 is not set
# CONFIG_NLS_ISO8859_7 is not set
# CONFIG_NLS_ISO8859_9 is not set
# CONFIG_NLS_ISO8859_13 is not set
# CONFIG_NLS_ISO8859_14 is not set
# CONFIG_NLS_ISO8859_15 is not set
# CONFIG_NLS_KOI8_R is not set
# CONFIG_NLS_KOI8_U is not set
# CONFIG_NLS_UTF8 is not set

#
# Profiling support
#
CONFIG_PROFILING=y
CONFIG_OPROFILE=y

#
# Kernel hacking
#
CONFIG_DEBUG_KERNEL=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_SCHEDSTATS=y
CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_PREEMPT=y
CONFIG_WAKEUP_TIMING=y
CONFIG_PREEMPT_TRACE=y
# CONFIG_CRITICAL_PREEMPT_TIMING is not set
# CONFIG_CRITICAL_IRQSOFF_TIMING is not set
CONFIG_LATENCY_TIMING=y
# CONFIG_LATENCY_TRACE is not set
# CONFIG_DEBUG_KOBJECT is not set
CONFIG_DEBUG_BUGVERBOSE=y
# CONFIG_DEBUG_INFO is not set
# CONFIG_DEBUG_FS is not set
CONFIG_USE_FRAME_POINTER=y
CONFIG_FRAME_POINTER=y
CONFIG_EARLY_PRINTK=y
CONFIG_DEBUG_STACKOVERFLOW=y
CONFIG_KPROBES=y
CONFIG_DEBUG_STACK_USAGE=y
# CONFIG_DEBUG_PAGEALLOC is not set
# CONFIG_4KSTACKS is not set
CONFIG_X86_FIND_SMP_CONFIG=y
CONFIG_X86_MPPARSE=y

#
# Security options
#
CONFIG_KEYS=y
CONFIG_KEYS_DEBUG_PROC_KEYS=y
CONFIG_SECURITY=y
CONFIG_SECURITY_NETWORK=y
# CONFIG_SECURITY_CAPABILITIES is not set
# CONFIG_SECURITY_ROOTPLUG is not set
CONFIG_SECURITY_SECLVL=m
CONFIG_SECURITY_SELINUX=y
CONFIG_SECURITY_SELINUX_BOOTPARAM=y
CONFIG_SECURITY_SELINUX_BOOTPARAM_VALUE=1
# CONFIG_SECURITY_SELINUX_DISABLE is not set
CONFIG_SECURITY_SELINUX_DEVELOP=y
CONFIG_SECURITY_SELINUX_AVC_STATS=y
# CONFIG_SECURITY_SELINUX_MLS is not set

#
# Cryptographic options
#
CONFIG_CRYPTO=y
CONFIG_CRYPTO_HMAC=y
CONFIG_CRYPTO_NULL=m
CONFIG_CRYPTO_MD4=m
CONFIG_CRYPTO_MD5=y
CONFIG_CRYPTO_SHA1=y
CONFIG_CRYPTO_SHA256=m
CONFIG_CRYPTO_SHA512=m
CONFIG_CRYPTO_WP512=m
CONFIG_CRYPTO_DES=y
CONFIG_CRYPTO_BLOWFISH=m
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_SERPENT=m
CONFIG_CRYPTO_AES_586=m
CONFIG_CRYPTO_CAST5=m
CONFIG_CRYPTO_CAST6=m
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_ARC4=m
CONFIG_CRYPTO_KHAZAD=m
CONFIG_CRYPTO_ANUBIS=m
CONFIG_CRYPTO_DEFLATE=y
# CONFIG_CRYPTO_MICHAEL_MIC is not set
CONFIG_CRYPTO_CRC32C=m
# CONFIG_CRYPTO_TEST is not set

#
# Hardware crypto devices
#
# CONFIG_CRYPTO_DEV_PADLOCK is not set

#
# Library routines
#
CONFIG_CRC_CCITT=m
CONFIG_CRC32=y
CONFIG_LIBCRC32C=m
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_X86_BIOS_REBOOT=y

[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: Real-Time Preemption and UML?
  2005-02-08  8:39           ` Ingo Molnar
@ 2005-02-08 18:55             ` Jeff Dike
  2005-02-08 21:20               ` Esben Nielsen
  0 siblings, 1 reply; 125+ messages in thread
From: Jeff Dike @ 2005-02-08 18:55 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Esben Nielsen, linux-kernel

mingo@elte.hu said:
> Jeff, any objections against adding this change to UML at some point?

No, not at all.  I just need to understand what CONFIG_PREEMPT requires of
UML.

From a quick read of Documentation/preempt-locking.txt, this looks like it's
implementing Rule #3 (unlock by the same task that locked), which looks fine.

				Jeff


^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: Real-Time Preemption and UML?
  2005-02-08 18:55             ` Jeff Dike
@ 2005-02-08 21:20               ` Esben Nielsen
  2005-02-08 21:44                 ` Ingo Molnar
  0 siblings, 1 reply; 125+ messages in thread
From: Esben Nielsen @ 2005-02-08 21:20 UTC (permalink / raw)
  To: Jeff Dike; +Cc: Ingo Molnar, linux-kernel

On Tue, 8 Feb 2005, Jeff Dike wrote:

> mingo@elte.hu said:
> > Jeff, any objections against adding this change to UML at some point?
> 
> No, not at all.  I just need to understand what CONFIG_PREEMPT requires of
> UML.

Ingo can probably tell you in much more detail. My problem when I tried to
compile with CONFIG_PREEMPT_RT (not CONFIG_PREEMPT!) was that
__SEMAPHORE_INITIALIZER didn't exist, since the architecture-specific
semaphore.h is not included in that configuration. The reason, in turn, is
that locking (not completions) is changed a lot under CONFIG_PREEMPT_RT:
mutexes replace raw spinlocks, and priority inheritance is added to
make these locks behave deterministically.

> 
> From a quick read of Documentation/preempt-locking.txt, this looks like it's
> implementing Rule #3 (unlock by the same task that locked), which looks fine.
>

Now I don't really know who I am responding to. But both up()s now changed
to complete()s are in something looking very much like an interrupt
handler. But again, as I said, I didn't analyze the code in detail, I just
made it compile and checked that it worked in bare 2.6.11-rc2 UML  - which
I am not too sure how to set up and use to begin with!
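
To illustrate why a completion fits a "signalled from somewhere else" pattern
better than an owner-tracking lock, here is a minimal userspace sketch using
POSIX threads. It is only an analogy, not the kernel's completion code and not
the actual UML change: the names completion_sketch, sketch_init,
sketch_complete, sketch_wait and sketch_signaller are all hypothetical. The
point it shows is that the signalling side never has to own anything, which is
what an up() issued from interrupt-like context needs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a completion: a flag plus a
 * condition variable. There is no notion of an owner. */
struct completion_sketch {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            done;
};

static void sketch_init(struct completion_sketch *c)
{
    int rc = pthread_mutex_init(&c->lock, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&c->cond, NULL);
    assert(rc == 0);
    (void)rc;
    c->done = false;
}

/* Signalling side: may run in any thread, including one that never
 * waited on or "locked" the object. */
static void sketch_complete(struct completion_sketch *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = true;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

/* Waiting side: blocks until some other context has signalled. */
static void sketch_wait(struct completion_sketch *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

/* Thread entry point that plays the role of the interrupt-like signaller. */
static void *sketch_signaller(void *arg)
{
    sketch_complete(arg);
    return NULL;
}
```

A caller would initialise the object, start sketch_signaller in a second
thread with pthread_create(), and block in sketch_wait() until it fires.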
 
> 				Jeff
> 

Esben



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: Real-Time Preemption and UML?
  2005-02-08 21:20               ` Esben Nielsen
@ 2005-02-08 21:44                 ` Ingo Molnar
  2005-02-08 23:02                   ` Esben Nielsen
  0 siblings, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-02-08 21:44 UTC (permalink / raw)
  To: Esben Nielsen; +Cc: Jeff Dike, linux-kernel


* Esben Nielsen <simlo@phys.au.dk> wrote:

> Now I don't really know who I am responding to. But both up()s now
> changed to complete()s are in something looking very much like an
> interrupt handler. But again, as I said, I didn't analyze the code in
> detail, I just made it compile and checked that it worked in bare
> 2.6.11-rc2 UML - which I am not too sure how to set up and use to
> begin with!

btw., UML is really easy to begin with: after you've compiled you get a
'linux' binary in the toplevel directory - just execute it via './linux'
and you'll see a Linux kernel booting - that's all you need!

Add a filesystem image via a root= parameter to that command and the UML
kernel will start booting that filesystem image. (if you are adventurous
you can even boot a real partition, but for a first-time user this is
strongly discouraged.) There are a number of UML-ready filesystem images
downloadable from the net.

	Ingo

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-04 10:03 [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 Ingo Molnar
                   ` (3 preceding siblings ...)
  2005-02-08  7:55 ` [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 Valdis.Kletnieks
@ 2005-02-08 21:58 ` William Weston
  2005-02-09 11:51   ` Ingo Molnar
  2005-02-09 12:48   ` [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 Stephen Smalley
  2005-02-19  5:08 ` Lee Revell
  2005-03-11  9:28 ` [patch] Real-Time Preemption, -RT-2.6.11-final-V0.7.40-00 Ingo Molnar
  6 siblings, 2 replies; 125+ messages in thread
From: William Weston @ 2005-02-08 21:58 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 6629 bytes --]

Hi Ingo,

Great work on the -RT kernel!  Here's a status report from my Athlon box
w/ kernel -RT-2.6.11-rc3-V0.7.38-03, realtime-lsm-0.8.5, jack-0.99.48, 
alsa-1.0.8, and latencytest-0.5.5:

Latencytest (measured with RTC instead of latencytest LKM, which appears
to be somewhat broken under later kernels) is reporting consistent
latencies down below 0.24ms and no xruns.

Jack_test4.1 is giving me good results with the default settings.  I tried
increasing the number of clients, but ran into the same issues others have.

Jackd (-R -P64 -dalsa -dhw:0 -r44100 -p64 -n3 -i2 -o2) w/ one soft-synth
client (using 15% to 30% of the CPU) will run for over 12 hours without
any xruns, even during kernel compiles and nightly updatedb runs.
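
For reference, the timing budget implied by those jackd options can be worked out by hand. The sketch below is only illustrative arithmetic: the function names are made up for this example, and it assumes -r is the sample rate in Hz, -p the frames per period, and -n the number of periods in the ALSA buffer.

```python
def period_ms(frames_per_period: int, sample_rate_hz: int) -> float:
    """Duration of one period in milliseconds."""
    return 1000.0 * frames_per_period / sample_rate_hz


def buffer_ms(frames_per_period: int, sample_rate_hz: int, n_periods: int) -> float:
    """Duration of the whole buffer (n_periods periods) in milliseconds."""
    return n_periods * period_ms(frames_per_period, sample_rate_hz)


if __name__ == "__main__":
    # Values taken from the jackd command line above: -r44100 -p64 -n3
    p = period_ms(64, 44100)
    b = buffer_ms(64, 44100, 3)
    print(f"period: {p:.3f} ms, buffer: {b:.3f} ms")
```

With these settings each period is roughly 1.45 ms, so every client cycle has to finish inside that window to avoid an xrun.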

Running wmcube (an impractical, greedy, little CPU meter), even when
niced, causes lots of xruns.  It may be good for worst-case-scenario
desktop load testing.

A couple of BUGs are being logged (see below), but without any ill effect
other than taking up space on my /var.


jack_test4.1 results (with default settings):

************* SUMMARY RESULT ****************
Total seconds ran . . . . . . :   300
Number of clients . . . . . . :    14
Ports per client  . . . . . . :     4
Frames per buffer . . . . . . :    64
Number of runs  . . . . . . . :(    1)
*********************************************
Timeout Count . . . . . . . . :(    0)
XRUN Count  . . . . . . . . . :     0
Delay Count (>spare time) . . :     0
Delay Count (>1000 usecs) . . :     0
Delay Maximum . . . . . . . . :    92   usecs
Cycle Maximum . . . . . . . . :  1100   usecs
Average DSP Load. . . . . . . :    60.2 %
Average CPU System Load . . . :    24.3 %
Average CPU User Load . . . . :    40.2 %
Average CPU Nice Load . . . . :     0.3 %
Average CPU I/O Wait Load . . :     0.6 %
Average CPU IRQ Load  . . . . :     0.0 %
Average CPU Soft-IRQ Load . . :     0.0 %
Average Interrupt Rate  . . . :  1751.7 /sec
Average Context-Switch Rate . : 18563.4 /sec
*********************************************
Delta Maximum . . . . . . . . : 0.00000
*********************************************


Network interface (via rhine) startup triggers these two BUGs:

BUG: sleeping function called from invalid context ksoftirqd/0(2) at kernel/rt.c:1448
in_atomic():1 [00000001], irqs_disabled():0
 [<c0103e77>] dump_stack+0x17/0x20 (12)
 [<c0119f89>] __might_sleep+0xd9/0xf0 (40)
 [<c0134816>] __spin_lock+0x36/0x50 (24)
 [<c0147914>] kmem_cache_alloc+0x34/0x120 (44)
 [<c01d3143>] sel_netif_lookup+0x63/0x150 (28)
 [<c01d32cd>] sel_netif_sids+0x2d/0xb0 (28)
 [<c01d01bc>] selinux_socket_sock_rcv_skb+0xac/0x230 (144)
 [<c02fd248>] udp_queue_rcv_skb+0xb8/0x280 (28)
 [<c02fd8e2>] udp_rcv+0x192/0x3e0 (100)
 [<c02dc224>] ip_local_deliver+0x64/0x1c0 (32)
 [<c02dc595>] ip_rcv+0x215/0x3f0 (56)
 [<c02c201c>] netif_receive_skb+0x12c/0x160 (40)
 [<c02c20ce>] process_backlog+0x7e/0x110 (32)
 [<c02c21d2>] net_rx_action+0x72/0x130 (24)
 [<c0122428>] ___do_softirq+0x48/0xd0 (40)
 [<c012254b>] _do_softirq+0x1b/0x30 (8)
 [<c0122920>] ksoftirqd+0xa0/0xf0 (28)
 [<c01312fb>] kthread+0x8b/0xc0 (36)
 [<c01012f5>] kernel_thread_helper+0x5/0x10 (537116692)
---------------------------
| preempt count: 00000002 ]
| 2-level deep critical section nesting:
----------------------------------------
.. [<c013dd3f>] .... __do_IRQ+0xef/0x180
.....[<c0105306>] ..   ( <= do_IRQ+0x56/0xa0)
.. [<c0135240>] .... print_traces+0x10/0x40
.....[<c0103e77>] ..   ( <= dump_stack+0x17/0x20)

BUG: sleeping function called from invalid context ksoftirqd/0(2) at kernel/rt.c:1448
in_atomic():1 [00000001], irqs_disabled():0
 [<c0103e77>] dump_stack+0x17/0x20 (12)
 [<c0119f89>] __might_sleep+0xd9/0xf0 (40)
 [<c0134816>] __spin_lock+0x36/0x50 (24)
 [<c0147914>] kmem_cache_alloc+0x34/0x120 (44)
 [<c01d3143>] sel_netif_lookup+0x63/0x150 (28)
 [<c01d32cd>] sel_netif_sids+0x2d/0xb0 (28)
 [<c01d01bc>] selinux_socket_sock_rcv_skb+0xac/0x230 (144)
 [<c02f6be6>] tcp_v4_rcv+0x4c6/0x8b0 (84)
 [<c02dc224>] ip_local_deliver+0x64/0x1c0 (32)
 [<c02dc595>] ip_rcv+0x215/0x3f0 (56)
 [<c02c201c>] netif_receive_skb+0x12c/0x160 (40)
 [<c02c20ce>] process_backlog+0x7e/0x110 (32)
 [<c02c21d2>] net_rx_action+0x72/0x130 (24)
 [<c0122428>] ___do_softirq+0x48/0xd0 (40)
 [<c012254b>] _do_softirq+0x1b/0x30 (8)
 [<c0122920>] ksoftirqd+0xa0/0xf0 (28)
 [<c01312fb>] kthread+0x8b/0xc0 (36)
 [<c01012f5>] kernel_thread_helper+0x5/0x10 (537116692)
---------------------------
| preempt count: 00000002 ]
| 2-level deep critical section nesting:
----------------------------------------
.. [<c013dcc4>] .... __do_IRQ+0x74/0x180
.....[<c0105306>] ..   ( <= do_IRQ+0x56/0xa0)
.. [<c0118922>] .... scheduler_tick+0x62/0x300
.....[<c0107b2d>] ..   ( <= timer_interrupt+0x4d/0x160)


MIDI playback through any MPU-401 interface triggers the following BUG,
reported once for each outgoing MIDI event (non-MPU-401 hardware interfaces
and software interfaces are not affected):

BUG: sleeping function called from invalid context ksoftirqd/0(2) at kernel/rt.c:1448
in_atomic():0 [00000000], irqs_disabled():1
 [<c0103e77>] dump_stack+0x17/0x20 (12)
 [<c0119f89>] __might_sleep+0xd9/0xf0 (40)
 [<c0134816>] __spin_lock+0x36/0x50 (24)
 [<c013486b>] _spin_lock_irqsave+0xb/0x10 (8)
 [<e089674a>] snd_rawmidi_transmit_peek+0x3a/0xe0 [snd_rawmidi] (40)
 [<e088c700>] snd_mpu401_uart_output_write+0x20/0x90 [snd_mpu401_uart] (24)
 [<e088c7fc>] snd_mpu401_uart_output_trigger+0x8c/0xa0 [snd_mpu401_uart] (20)
 [<e0896a5c>] snd_rawmidi_kernel_write1+0x17c/0x190 [snd_rawmidi] (48)
 [<e0896a82>] snd_rawmidi_kernel_write+0x12/0x20 [snd_rawmidi] (12)
 [<e0c06117>] dump_midi+0x27/0x50 [snd_seq_midi] (16)
 [<e0c06192>] event_process_midi+0x52/0xb0 [snd_seq_midi] (40)
 [<e0a11acc>] snd_seq_deliver_single_event+0x12c/0x140 [snd_seq] (40)
 [<e0a11ca6>] snd_seq_deliver_event+0x36/0x50 [snd_seq] (24)
 [<e0a11cfb>] snd_seq_dispatch_event+0x3b/0x130 [snd_seq] (68)
 [<e0a14d4c>] snd_seq_check_queue+0xec/0x110 [snd_seq] (28)
 [<e0841067>] snd_timer_interrupt+0x2a7/0x2f0 [snd_timer] (56)
 [<c0126428>] run_timer_softirq+0x1c8/0x3c0 (52)
 [<c0122428>] ___do_softirq+0x48/0xd0 (40)
 [<c012254b>] _do_softirq+0x1b/0x30 (8)
 [<c0122920>] ksoftirqd+0xa0/0xf0 (28)
 [<c01312fb>] kthread+0x8b/0xc0 (36)
 [<c01012f5>] kernel_thread_helper+0x5/0x10 (537116692)
---------------------------
| preempt count: 00000001 ]
| 1-level deep critical section nesting:
----------------------------------------
.. [<c0135240>] .... print_traces+0x10/0x40
.....[<c0103e77>] ..   ( <= dump_stack+0x17/0x20)
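
In case it helps anyone comparing these traces, here is a small, hypothetical helper that pulls the symbol and module names out of the stack-frame lines. It is only a sketch: the regular expression is fitted to the frame format seen in the traces above, and the meaning of the trailing number in parentheses is not assumed, just captured.

```python
import re

# Matches frame lines such as:
#  [<c0134816>] __spin_lock+0x36/0x50 (24)
#  [<e089674a>] snd_rawmidi_transmit_peek+0x3a/0xe0 [snd_rawmidi] (40)
# The "critical section nesting" lines use a different layout and are skipped.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<symbol>\w+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
    r"\s+\((?P<trailer>\d+)\)"
)


def parse_frames(text):
    """Return (symbol, module) pairs; module is None for built-in code."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append((m.group("symbol"), m.group("module")))
    return frames


SAMPLE = """\
 [<c0134816>] __spin_lock+0x36/0x50 (24)
 [<c013486b>] _spin_lock_irqsave+0xb/0x10 (8)
 [<e089674a>] snd_rawmidi_transmit_peek+0x3a/0xe0 [snd_rawmidi] (40)
.. [<c0135240>] .... print_traces+0x10/0x40
"""

if __name__ == "__main__":
    for symbol, module in parse_frames(SAMPLE):
        print(symbol, module or "(built-in)")
```

Run over the MPU-401 trace, this makes it easy to see that the __spin_lock call comes in via _spin_lock_irqsave from snd_rawmidi_transmit_peek in the snd_rawmidi module.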


Please let me know if there's anything else I can do to help debug this.


Best Regards,
--William Weston <weston at sysex.net>

[-- Attachment #2: Type: TEXT/PLAIN, Size: 46320 bytes --]

ver_linux output:

Linux astarte.lysdexia.org 2.6.11-rc3-RT-V0.7.38-03 #1 Mon Feb 7 21:05:43 PST 2005 i686 athlon i386 GNU/Linux
 
Gnu C                  3.4.2
Gnu make               3.80
binutils               2.15.92.0.2
util-linux             2.12a
mount                  2.12a
module-init-tools      3.1-pre5
e2fsprogs              1.35
jfsutils               1.1.7
reiserfsprogs          3.6.18
reiser4progs           line
quota-tools            3.12.
PPP                    2.4.2
isdn4k-utils           3.3
nfs-utils              1.0.6
Linux C Library        2.3.4
Dynamic linker (ldd)   2.3.4
Procps                 3.2.3
Net-tools              1.60
Kbd                    1.12
Sh-utils               5.2.1
udev                   039
Modules Loaded         nls_utf8 snd_seq_oss snd_seq_midi it87 eeprom i2c_sensor i2c_isa i2c_viapro i2c_dev i2c_core realtime binfmt_misc fan button ohci1394 ieee1394 snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_event snd_seq_midi_emul snd_seq snd_emu10k1 snd_util_mem snd_hwdep snd_mpu401 snd_via82xx snd_mpu401_uart snd_cs46xx snd_rawmidi snd_seq_device snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc gameport

/proc/interrupts:

           CPU0       
  0:   51224998    IO-APIC-edge  timer  0/24998
  1:      82207    IO-APIC-edge  i8042  2/82207
  3:     102061    IO-APIC-edge  MPU401 UART  0/2061
  8:          1    IO-APIC-edge  rtc  0/1
  9:          0   IO-APIC-level  acpi  0/0
 12:     112406    IO-APIC-edge  i8042  0/12405
 14:     215412    IO-APIC-edge  ide0  0/15262
 15:        102    IO-APIC-edge  ide1  1/100
 16:    3683478   IO-APIC-level  ohci1394, radeon@pci:0000:01:00.0  0/83478
 17:   53626984   IO-APIC-level  CS46XX  0/26984
 19:          0   IO-APIC-level  EMU10K1  0/0
 22:          0   IO-APIC-level  VIA8233  0/0
 23:      74975   IO-APIC-level  eth0  0/74975
NMI:          0 
LOC:   51226373 
ERR:          0
MIS:          0

/proc/ioports:

0000-001f : dma1
0020-0021 : pic1
0040-0043 : timer0
0050-0053 : timer1
0060-006f : keyboard
0070-0077 : rtc
0080-008f : dma page reg
00a0-00a1 : pic2
00c0-00df : dma2
00f0-00ff : fpu
0170-0177 : ide1
01f0-01f7 : ide0
0290-0297 : pnp 00:0c
  0290-0297 : it87
0300-0301 : MPU401 UART
0370-0375 : pnp 00:0c
0376-0376 : ide1
03c0-03df : vga+
03f6-03f6 : ide0
0cf8-0cff : PCI conf1
9800-98ff : 0000:00:12.0
  9800-98ff : via-rhine
a000-a00f : 0000:00:11.1
  a000-a007 : ide0
  a008-a00f : ide1
b400-b407 : 0000:00:0b.1
b800-b83f : 0000:00:0b.0
  b800-b83f : EMU10K1
d000-dfff : PCI Bus #01
  d800-d8ff : 0000:01:00.0
e000-e0ff : 0000:00:11.5
  e000-e0ff : VIA8233
e400-e47f : motherboard
  e400-e403 : PM1a_EVT_BLK
  e404-e405 : PM1a_CNT_BLK
  e408-e40b : PM_TMR
  e420-e423 : GPE0_BLK
e800-e81f : motherboard
  e800-e81f : pnp 00:01
    e800-e807 : viapro-smbus

lspci -vvv output:

00:00.0 Host bridge: VIA Technologies, Inc. VT8377 [KT400/KT600 AGP] Host Bridge
	Subsystem: ASUSTeK Computer Inc. A7V8X motherboard
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
	Latency: 0
	Region 0: Memory at f8000000 (32-bit, prefetchable) [size=64M]
	Capabilities: [a0] AGP version 3.5
		Status: RQ=32 Iso- ArqSz=0 Cal=0 SBA+ ITACoh- GART64- HTrans- 64bit- FW+ AGP3- Rate=x1,x2,x4
		Command: RQ=1 ArqSz=0 Cal=0 SBA+ AGP+ GART64- 64bit- FW- Rate=x1
	Capabilities: [c0] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:01.0 PCI bridge: VIA Technologies, Inc. VT8235 PCI Bridge (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
	Latency: 0
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	I/O behind bridge: 0000d000-0000dfff
	Memory behind bridge: ef000000-efefffff
	Prefetchable memory behind bridge: eff00000-f7ffffff
	Secondary status: 66Mhz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA- VGA+ MAbort- >Reset- FastB2B-
	Capabilities: [80] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:0b.0 Multimedia audio controller: Creative Labs SB Audigy (rev 04)
	Subsystem: Creative Labs SB Audigy 2 ZS (SB0350)
	Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32 (500ns min, 5000ns max)
	Interrupt: pin A routed to IRQ 19
	Region 0: I/O ports at b800 [size=64]
	Capabilities: [dc] Power Management version 2
		Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:0b.1 Input device controller: Creative Labs SB Audigy MIDI/Game port (rev 04)
	Subsystem: Creative Labs SB Audigy MIDI/Game Port
	Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32
	Region 0: I/O ports at b400 [disabled] [size=8]
	Capabilities: [dc] Power Management version 2
		Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:0b.2 FireWire (IEEE 1394): Creative Labs SB Audigy FireWire Port (rev 04) (prog-if 10 [OHCI])
	Subsystem: Creative Labs SB Audigy FireWire Port
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32 (500ns min, 1000ns max), Cache Line Size 08
	Interrupt: pin B routed to IRQ 16
	Region 0: Memory at ee800000 (32-bit, non-prefetchable) [size=2K]
	Region 1: Memory at ee000000 (32-bit, non-prefetchable) [size=16K]
	Capabilities: [44] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME+

00:0e.0 Multimedia audio controller: Cirrus Logic CS 4614/22/24 [CrystalClear SoundFusion Audio Accelerator] (rev 01)
	Subsystem: Hercules: Unknown device a010
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=slow >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32 (1000ns min, 6000ns max)
	Interrupt: pin A routed to IRQ 17
	Region 0: Memory at ed800000 (32-bit, non-prefetchable) [size=4K]
	Region 1: Memory at ed000000 (32-bit, non-prefetchable) [size=1M]
	Capabilities: [40] Power Management version 2
		Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:11.0 ISA bridge: VIA Technologies, Inc. VT8235 ISA Bridge
	Subsystem: ASUSTeK Computer Inc. A7V8X-X motherboard
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 0
	Capabilities: [c0] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06) (prog-if 8a [Master SecP PriP])
	Subsystem: ASUSTeK Computer Inc. A7V8X-X motherboard rev. 1.01
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32
	Interrupt: pin A routed to IRQ 255
	Region 4: I/O ports at a000 [size=16]
	Capabilities: [c0] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 50)
	Subsystem: ASUSTeK Computer Inc. A7V8X-X Motherboard
	Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Interrupt: pin C routed to IRQ 22
	Region 0: I/O ports at e000 [size=256]
	Capabilities: [c0] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 74)
	Subsystem: ASUSTeK Computer Inc. A7V8X-X Motherboard
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping+ SERR- FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32 (750ns min, 2000ns max), Cache Line Size 08
	Interrupt: pin A routed to IRQ 23
	Region 0: I/O ports at 9800 [size=256]
	Region 1: Memory at ec000000 (32-bit, non-prefetchable) [size=256]
	Capabilities: [40] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

01:00.0 VGA compatible controller: ATI Technologies Inc Radeon RV100 QY [Radeon 7000/VE] (prog-if 00 [VGA])
	Subsystem: PC Partner Limited: Unknown device 7c28
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B-
	Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 64 (2000ns min), Cache Line Size 08
	Interrupt: pin A routed to IRQ 16
	Region 0: Memory at f0000000 (32-bit, prefetchable) [size=128M]
	Region 1: I/O ports at d800 [size=256]
	Region 2: Memory at ef000000 (32-bit, non-prefetchable) [size=64K]
	Expansion ROM at effe0000 [disabled] [size=128K]
	Capabilities: [58] AGP version 2.0
		Status: RQ=48 Iso- ArqSz=0 Cal=0 SBA+ ITACoh- GART64- HTrans- 64bit- FW- AGP3- Rate=x1,x2,x4
		Command: RQ=32 ArqSz=0 Cal=0 SBA+ AGP+ GART64- 64bit- FW- Rate=x1
	Capabilities: [50] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

config:

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.11-rc3-RT-V0.7.38-03
# Mon Feb  7 20:49:10 2005
#
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_UID16=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_CLEAN_COMPILE=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_LOCK_KERNEL=y

#
# General setup
#
CONFIG_LOCALVERSION=""
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
CONFIG_SYSCTL=y
CONFIG_AUDIT=y
CONFIG_AUDITSYSCALL=y
CONFIG_LOG_BUF_SHIFT=18
CONFIG_HOTPLUG=y
CONFIG_KOBJECT_UEVENT=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
# CONFIG_EMBEDDED is not set
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_FUTEX=y
CONFIG_EPOLL=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SHMEM=y
CONFIG_CC_ALIGN_FUNCTIONS=0
CONFIG_CC_ALIGN_LABELS=0
CONFIG_CC_ALIGN_LOOPS=0
CONFIG_CC_ALIGN_JUMPS=0
# CONFIG_TINY_SHMEM is not set

#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_OBSOLETE_MODPARM=y
CONFIG_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
CONFIG_KMOD=y

#
# Processor type and features
#
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
CONFIG_MK7=y
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_X86_GENERIC is not set
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
# CONFIG_SMP is not set
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT_DESKTOP is not set
CONFIG_PREEMPT_RT=y
CONFIG_PREEMPT=y
CONFIG_PREEMPT_SOFTIRQS=y
CONFIG_PREEMPT_HARDIRQS=y
CONFIG_PREEMPT_BKL=y
CONFIG_X86_UP_APIC=y
CONFIG_X86_UP_IOAPIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_TSC=y
CONFIG_X86_MCE=y
CONFIG_X86_MCE_NONFATAL=y
# CONFIG_X86_MCE_P4THERMAL is not set
# CONFIG_TOSHIBA is not set
# CONFIG_I8K is not set
CONFIG_MICROCODE=m
CONFIG_X86_MSR=m
CONFIG_X86_CPUID=m

#
# Firmware Drivers
#
# CONFIG_EDD is not set
CONFIG_NOHIGHMEM=y
# CONFIG_HIGHMEM4G is not set
# CONFIG_HIGHMEM64G is not set
# CONFIG_MATH_EMULATION is not set
CONFIG_MTRR=y
# CONFIG_EFI is not set
CONFIG_HAVE_DEC_LOCK=y
CONFIG_REGPARM=y

#
# Power management options (ACPI, APM)
#
CONFIG_PM=y
# CONFIG_PM_DEBUG is not set
# CONFIG_SOFTWARE_SUSPEND is not set

#
# ACPI (Advanced Configuration and Power Interface) Support
#
CONFIG_ACPI=y
CONFIG_ACPI_BOOT=y
CONFIG_ACPI_INTERPRETER=y
CONFIG_ACPI_SLEEP=y
CONFIG_ACPI_SLEEP_PROC_FS=y
# CONFIG_ACPI_AC is not set
# CONFIG_ACPI_BATTERY is not set
CONFIG_ACPI_BUTTON=m
# CONFIG_ACPI_VIDEO is not set
CONFIG_ACPI_FAN=m
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_THERMAL=y
# CONFIG_ACPI_ASUS is not set
# CONFIG_ACPI_IBM is not set
# CONFIG_ACPI_TOSHIBA is not set
CONFIG_ACPI_BLACKLIST_YEAR=0
CONFIG_ACPI_DEBUG=y
CONFIG_ACPI_BUS=y
CONFIG_ACPI_EC=y
CONFIG_ACPI_POWER=y
CONFIG_ACPI_PCI=y
CONFIG_ACPI_SYSTEM=y
CONFIG_X86_PM_TIMER=y
# CONFIG_ACPI_CONTAINER is not set

#
# APM (Advanced Power Management) BIOS Support
#
CONFIG_APM=y
# CONFIG_APM_IGNORE_USER_SUSPEND is not set
# CONFIG_APM_DO_ENABLE is not set
# CONFIG_APM_CPU_IDLE is not set
# CONFIG_APM_DISPLAY_BLANK is not set
CONFIG_APM_RTC_IS_GMT=y
# CONFIG_APM_ALLOW_INTS is not set
CONFIG_APM_REAL_MODE_POWER_OFF=y

#
# CPU Frequency scaling
#
# CONFIG_CPU_FREQ is not set

#
# Bus options (PCI, PCMCIA, EISA, MCA, ISA)
#
CONFIG_PCI=y
# CONFIG_PCI_GOBIOS is not set
# CONFIG_PCI_GOMMCONFIG is not set
# CONFIG_PCI_GODIRECT is not set
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
# CONFIG_PCIEPORTBUS is not set
# CONFIG_PCI_MSI is not set
CONFIG_PCI_LEGACY_PROC=y
CONFIG_PCI_NAMES=y
CONFIG_ISA=y
# CONFIG_EISA is not set
# CONFIG_MCA is not set
# CONFIG_SCx200 is not set

#
# PCCARD (PCMCIA/CardBus) support
#
# CONFIG_PCCARD is not set

#
# PC-card bridges
#
CONFIG_PCMCIA_PROBE=y

#
# PCI Hotplug Support
#
# CONFIG_HOTPLUG_PCI is not set

#
# Executable file formats
#
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_AOUT=m
CONFIG_BINFMT_MISC=m

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=m
# CONFIG_DEBUG_DRIVER is not set

#
# Memory Technology Devices (MTD)
#
# CONFIG_MTD is not set

#
# Parallel port support
#
# CONFIG_PARPORT is not set

#
# Plug and Play support
#
CONFIG_PNP=y
# CONFIG_PNP_DEBUG is not set

#
# Protocols
#
CONFIG_ISAPNP=y
CONFIG_PNPBIOS=y
# CONFIG_PNPBIOS_PROC_FS is not set
CONFIG_PNPACPI=y

#
# Block devices
#
CONFIG_BLK_DEV_FD=y
# CONFIG_BLK_DEV_XD is not set
# CONFIG_BLK_CPQ_DA is not set
# CONFIG_BLK_CPQ_CISS_DA is not set
# CONFIG_BLK_DEV_DAC960 is not set
# CONFIG_BLK_DEV_UMEM is not set
# CONFIG_BLK_DEV_COW_COMMON is not set
CONFIG_BLK_DEV_LOOP=y
CONFIG_BLK_DEV_CRYPTOLOOP=m
CONFIG_BLK_DEV_NBD=m
# CONFIG_BLK_DEV_SX8 is not set
# CONFIG_BLK_DEV_UB is not set
CONFIG_BLK_DEV_RAM=m
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=16384
CONFIG_INITRAMFS_SOURCE=""
# CONFIG_LBD is not set
CONFIG_CDROM_PKTCDVD=y
CONFIG_CDROM_PKTCDVD_BUFFERS=128
# CONFIG_CDROM_PKTCDVD_WCACHE is not set

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
# CONFIG_ATA_OVER_ETH is not set

#
# ATA/ATAPI/MFM/RLL support
#
CONFIG_IDE=y
CONFIG_BLK_DEV_IDE=y

#
# Please see Documentation/ide.txt for help/info on IDE drives
#
# CONFIG_BLK_DEV_IDE_SATA is not set
# CONFIG_BLK_DEV_HD_IDE is not set
CONFIG_BLK_DEV_IDEDISK=y
CONFIG_IDEDISK_MULTI_MODE=y
CONFIG_BLK_DEV_IDECD=y
# CONFIG_BLK_DEV_IDETAPE is not set
# CONFIG_BLK_DEV_IDEFLOPPY is not set
CONFIG_BLK_DEV_IDESCSI=m
CONFIG_IDE_TASK_IOCTL=y

#
# IDE chipset support/bugfixes
#
CONFIG_IDE_GENERIC=y
# CONFIG_BLK_DEV_CMD640 is not set
CONFIG_BLK_DEV_IDEPNP=y
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_IDEPCI_SHARE_IRQ=y
# CONFIG_BLK_DEV_OFFBOARD is not set
CONFIG_BLK_DEV_GENERIC=y
# CONFIG_BLK_DEV_OPTI621 is not set
# CONFIG_BLK_DEV_RZ1000 is not set
CONFIG_BLK_DEV_IDEDMA_PCI=y
# CONFIG_BLK_DEV_IDEDMA_FORCED is not set
CONFIG_IDEDMA_PCI_AUTO=y
# CONFIG_IDEDMA_ONLYDISK is not set
# CONFIG_BLK_DEV_AEC62XX is not set
# CONFIG_BLK_DEV_ALI15X3 is not set
# CONFIG_BLK_DEV_AMD74XX is not set
# CONFIG_BLK_DEV_ATIIXP is not set
# CONFIG_BLK_DEV_CMD64X is not set
# CONFIG_BLK_DEV_TRIFLEX is not set
# CONFIG_BLK_DEV_CY82C693 is not set
# CONFIG_BLK_DEV_CS5520 is not set
# CONFIG_BLK_DEV_CS5530 is not set
# CONFIG_BLK_DEV_HPT34X is not set
# CONFIG_BLK_DEV_HPT366 is not set
# CONFIG_BLK_DEV_SC1200 is not set
# CONFIG_BLK_DEV_PIIX is not set
# CONFIG_BLK_DEV_NS87415 is not set
# CONFIG_BLK_DEV_PDC202XX_OLD is not set
# CONFIG_BLK_DEV_PDC202XX_NEW is not set
# CONFIG_BLK_DEV_SVWKS is not set
# CONFIG_BLK_DEV_SIIMAGE is not set
# CONFIG_BLK_DEV_SIS5513 is not set
# CONFIG_BLK_DEV_SLC90E66 is not set
# CONFIG_BLK_DEV_TRM290 is not set
CONFIG_BLK_DEV_VIA82CXXX=y
# CONFIG_IDE_ARM is not set
# CONFIG_IDE_CHIPSETS is not set
CONFIG_BLK_DEV_IDEDMA=y
# CONFIG_IDEDMA_IVB is not set
CONFIG_IDEDMA_AUTO=y
# CONFIG_BLK_DEV_HD is not set

#
# SCSI device support
#
CONFIG_SCSI=m
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=m
# CONFIG_CHR_DEV_ST is not set
# CONFIG_CHR_DEV_OSST is not set
CONFIG_BLK_DEV_SR=m
CONFIG_BLK_DEV_SR_VENDOR=y
CONFIG_CHR_DEV_SG=m

#
# Some SCSI devices (e.g. CD jukebox) support multiple LUNs
#
# CONFIG_SCSI_MULTI_LUN is not set
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_LOGGING=y

#
# SCSI Transport Attributes
#
# CONFIG_SCSI_SPI_ATTRS is not set
# CONFIG_SCSI_FC_ATTRS is not set
# CONFIG_SCSI_ISCSI_ATTRS is not set

#
# SCSI low-level drivers
#
# CONFIG_BLK_DEV_3W_XXXX_RAID is not set
# CONFIG_SCSI_3W_9XXX is not set
# CONFIG_SCSI_7000FASST is not set
# CONFIG_SCSI_ACARD is not set
# CONFIG_SCSI_AHA152X is not set
# CONFIG_SCSI_AHA1542 is not set
# CONFIG_SCSI_AACRAID is not set
# CONFIG_SCSI_AIC7XXX is not set
# CONFIG_SCSI_AIC7XXX_OLD is not set
# CONFIG_SCSI_AIC79XX is not set
# CONFIG_SCSI_DPT_I2O is not set
# CONFIG_SCSI_IN2000 is not set
# CONFIG_MEGARAID_NEWGEN is not set
# CONFIG_MEGARAID_LEGACY is not set
# CONFIG_SCSI_SATA is not set
# CONFIG_SCSI_BUSLOGIC is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_DTC3280 is not set
# CONFIG_SCSI_EATA is not set
# CONFIG_SCSI_EATA_PIO is not set
# CONFIG_SCSI_FUTURE_DOMAIN is not set
# CONFIG_SCSI_GDTH is not set
# CONFIG_SCSI_GENERIC_NCR5380 is not set
# CONFIG_SCSI_GENERIC_NCR5380_MMIO is not set
# CONFIG_SCSI_IPS is not set
# CONFIG_SCSI_INITIO is not set
# CONFIG_SCSI_INIA100 is not set
# CONFIG_SCSI_NCR53C406A is not set
# CONFIG_SCSI_SYM53C8XX_2 is not set
# CONFIG_SCSI_IPR is not set
# CONFIG_SCSI_PAS16 is not set
# CONFIG_SCSI_PSI240I is not set
# CONFIG_SCSI_QLOGIC_FAS is not set
# CONFIG_SCSI_QLOGIC_ISP is not set
# CONFIG_SCSI_QLOGIC_FC is not set
# CONFIG_SCSI_QLOGIC_1280 is not set
CONFIG_SCSI_QLA2XXX=m
# CONFIG_SCSI_QLA21XX is not set
# CONFIG_SCSI_QLA22XX is not set
# CONFIG_SCSI_QLA2300 is not set
# CONFIG_SCSI_QLA2322 is not set
# CONFIG_SCSI_QLA6312 is not set
# CONFIG_SCSI_SYM53C416 is not set
# CONFIG_SCSI_DC395x is not set
# CONFIG_SCSI_DC390T is not set
# CONFIG_SCSI_T128 is not set
# CONFIG_SCSI_U14_34F is not set
# CONFIG_SCSI_ULTRASTOR is not set
# CONFIG_SCSI_NSP32 is not set
# CONFIG_SCSI_DEBUG is not set

#
# Old CD-ROM drivers (not SCSI, not IDE)
#
# CONFIG_CD_NO_IDESCSI is not set

#
# Multi-device support (RAID and LVM)
#
# CONFIG_MD is not set

#
# Fusion MPT device support
#
# CONFIG_FUSION is not set

#
# IEEE 1394 (FireWire) support
#
CONFIG_IEEE1394=m

#
# Subsystem Options
#
# CONFIG_IEEE1394_VERBOSEDEBUG is not set
CONFIG_IEEE1394_OUI_DB=y
CONFIG_IEEE1394_EXTRA_CONFIG_ROMS=y
CONFIG_IEEE1394_CONFIG_ROM_IP1394=y

#
# Device Drivers
#
# CONFIG_IEEE1394_PCILYNX is not set
CONFIG_IEEE1394_OHCI1394=m

#
# Protocol Drivers
#
CONFIG_IEEE1394_VIDEO1394=m
CONFIG_IEEE1394_SBP2=m
# CONFIG_IEEE1394_SBP2_PHYS_DMA is not set
CONFIG_IEEE1394_ETH1394=m
CONFIG_IEEE1394_DV1394=m
CONFIG_IEEE1394_RAWIO=m
CONFIG_IEEE1394_CMP=m
CONFIG_IEEE1394_AMDTP=m

#
# I2O device support
#
# CONFIG_I2O is not set

#
# Networking support
#
CONFIG_NET=y

#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_NETLINK_DEV=y
CONFIG_UNIX=y
CONFIG_NET_KEY=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
# CONFIG_IP_ADVANCED_ROUTER is not set
# CONFIG_IP_PNP is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE is not set
# CONFIG_IP_MROUTE is not set
# CONFIG_ARPD is not set
# CONFIG_SYN_COOKIES is not set
# CONFIG_INET_AH is not set
# CONFIG_INET_ESP is not set
# CONFIG_INET_IPCOMP is not set
# CONFIG_INET_TUNNEL is not set
CONFIG_IP_TCPDIAG=y
# CONFIG_IP_TCPDIAG_IPV6 is not set
# CONFIG_IPV6 is not set
# CONFIG_NETFILTER is not set
CONFIG_XFRM=y
# CONFIG_XFRM_USER is not set

#
# SCTP Configuration (EXPERIMENTAL)
#
# CONFIG_IP_SCTP is not set
# CONFIG_ATM is not set
# CONFIG_BRIDGE is not set
# CONFIG_VLAN_8021Q is not set
# CONFIG_DECNET is not set
CONFIG_LLC=y
CONFIG_LLC2=y
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_NET_DIVERT is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set

#
# QoS and/or fair queueing
#
# CONFIG_NET_SCHED is not set
# CONFIG_NET_CLS_ROUTE is not set

#
# Network testing
#
# CONFIG_NET_PKTGEN is not set
# CONFIG_NETPOLL is not set
# CONFIG_NET_POLL_CONTROLLER is not set
# CONFIG_HAMRADIO is not set
# CONFIG_IRDA is not set
# CONFIG_BT is not set
CONFIG_NETDEVICES=y
# CONFIG_DUMMY is not set
# CONFIG_BONDING is not set
# CONFIG_EQUALIZER is not set
CONFIG_TUN=y
CONFIG_ETHERTAP=y
# CONFIG_NET_SB1000 is not set

#
# ARCnet devices
#
# CONFIG_ARCNET is not set

#
# Ethernet (10 or 100Mbit)
#
CONFIG_NET_ETHERNET=y
CONFIG_MII=y
# CONFIG_HAPPYMEAL is not set
# CONFIG_SUNGEM is not set
# CONFIG_NET_VENDOR_3COM is not set
# CONFIG_LANCE is not set
# CONFIG_NET_VENDOR_SMC is not set
# CONFIG_NET_VENDOR_RACAL is not set

#
# Tulip family network device support
#
# CONFIG_NET_TULIP is not set
# CONFIG_AT1700 is not set
# CONFIG_DEPCA is not set
# CONFIG_HP100 is not set
# CONFIG_NET_ISA is not set
CONFIG_NET_PCI=y
# CONFIG_PCNET32 is not set
# CONFIG_AMD8111_ETH is not set
# CONFIG_ADAPTEC_STARFIRE is not set
# CONFIG_AC3200 is not set
# CONFIG_APRICOT is not set
# CONFIG_B44 is not set
# CONFIG_FORCEDETH is not set
# CONFIG_CS89x0 is not set
# CONFIG_DGRS is not set
# CONFIG_EEPRO100 is not set
# CONFIG_E100 is not set
# CONFIG_FEALNX is not set
# CONFIG_NATSEMI is not set
# CONFIG_NE2K_PCI is not set
# CONFIG_8139CP is not set
# CONFIG_8139TOO is not set
# CONFIG_SIS900 is not set
# CONFIG_EPIC100 is not set
# CONFIG_SUNDANCE is not set
# CONFIG_TLAN is not set
CONFIG_VIA_RHINE=y
CONFIG_VIA_RHINE_MMIO=y
# CONFIG_NET_POCKET is not set

#
# Ethernet (1000 Mbit)
#
# CONFIG_ACENIC is not set
# CONFIG_DL2K is not set
# CONFIG_E1000 is not set
# CONFIG_NS83820 is not set
# CONFIG_HAMACHI is not set
# CONFIG_YELLOWFIN is not set
# CONFIG_R8169 is not set
# CONFIG_SK98LIN is not set
# CONFIG_VIA_VELOCITY is not set
# CONFIG_TIGON3 is not set

#
# Ethernet (10000 Mbit)
#
# CONFIG_IXGB is not set
# CONFIG_S2IO is not set

#
# Token Ring devices
#
# CONFIG_TR is not set

#
# Wireless LAN (non-hamradio)
#
# CONFIG_NET_RADIO is not set

#
# Wan interfaces
#
# CONFIG_WAN is not set
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
# CONFIG_PPP is not set
# CONFIG_SLIP is not set
# CONFIG_NET_FC is not set
# CONFIG_SHAPER is not set
# CONFIG_NETCONSOLE is not set

#
# ISDN subsystem
#
# CONFIG_ISDN is not set

#
# Telephony Support
#
# CONFIG_PHONE is not set

#
# Input device support
#
CONFIG_INPUT=y

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
# CONFIG_INPUT_MOUSEDEV_PSAUX is not set
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1152
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=864
CONFIG_INPUT_JOYDEV=m
# CONFIG_INPUT_TSDEV is not set
CONFIG_INPUT_EVDEV=y
# CONFIG_INPUT_EVBUG is not set

#
# Input I/O drivers
#
CONFIG_GAMEPORT=m
CONFIG_SOUND_GAMEPORT=m
CONFIG_GAMEPORT_NS558=m
# CONFIG_GAMEPORT_L4 is not set
# CONFIG_GAMEPORT_EMU10K1 is not set
# CONFIG_GAMEPORT_VORTEX is not set
# CONFIG_GAMEPORT_FM801 is not set
CONFIG_GAMEPORT_CS461X=m
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
CONFIG_SERIO_SERPORT=m
# CONFIG_SERIO_CT82C710 is not set
# CONFIG_SERIO_PCIPS2 is not set
CONFIG_SERIO_LIBPS2=y
# CONFIG_SERIO_RAW is not set

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
CONFIG_KEYBOARD_ATKBD=y
# CONFIG_KEYBOARD_SUNKBD is not set
# CONFIG_KEYBOARD_LKKBD is not set
# CONFIG_KEYBOARD_XTKBD is not set
# CONFIG_KEYBOARD_NEWTON is not set
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
CONFIG_MOUSE_SERIAL=m
# CONFIG_MOUSE_INPORT is not set
# CONFIG_MOUSE_LOGIBM is not set
# CONFIG_MOUSE_PC110PAD is not set
# CONFIG_MOUSE_VSXXXAA is not set
CONFIG_INPUT_JOYSTICK=y
CONFIG_JOYSTICK_ANALOG=m
# CONFIG_JOYSTICK_A3D is not set
# CONFIG_JOYSTICK_ADI is not set
# CONFIG_JOYSTICK_COBRA is not set
# CONFIG_JOYSTICK_GF2K is not set
# CONFIG_JOYSTICK_GRIP is not set
# CONFIG_JOYSTICK_GRIP_MP is not set
# CONFIG_JOYSTICK_GUILLEMOT is not set
# CONFIG_JOYSTICK_INTERACT is not set
# CONFIG_JOYSTICK_SIDEWINDER is not set
# CONFIG_JOYSTICK_TMDC is not set
# CONFIG_JOYSTICK_IFORCE is not set
# CONFIG_JOYSTICK_WARRIOR is not set
# CONFIG_JOYSTICK_MAGELLAN is not set
# CONFIG_JOYSTICK_SPACEORB is not set
# CONFIG_JOYSTICK_SPACEBALL is not set
# CONFIG_JOYSTICK_STINGER is not set
# CONFIG_JOYSTICK_TWIDDLER is not set
# CONFIG_JOYSTICK_JOYDUMP is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
CONFIG_INPUT_MISC=y
CONFIG_INPUT_PCSPKR=m
CONFIG_INPUT_UINPUT=m

#
# Character devices
#
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
# CONFIG_SERIAL_NONSTANDARD is not set

#
# Serial drivers
#
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
# CONFIG_SERIAL_8250_ACPI is not set
CONFIG_SERIAL_8250_NR_UARTS=2
CONFIG_SERIAL_8250_EXTENDED=y
# CONFIG_SERIAL_8250_MANY_PORTS is not set
CONFIG_SERIAL_8250_SHARE_IRQ=y
CONFIG_SERIAL_8250_DETECT_IRQ=y
# CONFIG_SERIAL_8250_MULTIPORT is not set
# CONFIG_SERIAL_8250_RSA is not set

#
# Non-8250 serial port support
#
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
CONFIG_UNIX98_PTYS=y
# CONFIG_LEGACY_PTYS is not set

#
# IPMI
#
# CONFIG_IPMI_HANDLER is not set

#
# Watchdog Cards
#
# CONFIG_WATCHDOG is not set
CONFIG_HW_RANDOM=y
CONFIG_NVRAM=m
CONFIG_RTC=y
CONFIG_RTC_HISTOGRAM=y
CONFIG_BLOCKER=y
# CONFIG_DTLK is not set
# CONFIG_R3964 is not set
# CONFIG_APPLICOM is not set
# CONFIG_SONYPI is not set

#
# Ftape, the floppy tape device driver
#
# CONFIG_FTAPE is not set
CONFIG_AGP=y
# CONFIG_AGP_ALI is not set
# CONFIG_AGP_ATI is not set
# CONFIG_AGP_AMD is not set
# CONFIG_AGP_AMD64 is not set
# CONFIG_AGP_INTEL is not set
# CONFIG_AGP_INTEL_MCH is not set
# CONFIG_AGP_NVIDIA is not set
# CONFIG_AGP_SIS is not set
# CONFIG_AGP_SWORKS is not set
CONFIG_AGP_VIA=y
# CONFIG_AGP_EFFICEON is not set
CONFIG_DRM=y
# CONFIG_DRM_TDFX is not set
# CONFIG_DRM_R128 is not set
CONFIG_DRM_RADEON=y
# CONFIG_DRM_MGA is not set
# CONFIG_DRM_SIS is not set
# CONFIG_MWAVE is not set
# CONFIG_RAW_DRIVER is not set
CONFIG_HPET=y
# CONFIG_HPET_RTC_IRQ is not set
CONFIG_HPET_MMAP=y
# CONFIG_HANGCHECK_TIMER is not set

#
# I2C support
#
CONFIG_I2C=m
CONFIG_I2C_CHARDEV=m

#
# I2C Algorithms
#
# CONFIG_I2C_ALGOBIT is not set
# CONFIG_I2C_ALGOPCF is not set
# CONFIG_I2C_ALGOPCA is not set

#
# I2C Hardware Bus support
#
# CONFIG_I2C_ALI1535 is not set
# CONFIG_I2C_ALI1563 is not set
# CONFIG_I2C_ALI15X3 is not set
# CONFIG_I2C_AMD756 is not set
# CONFIG_I2C_AMD8111 is not set
# CONFIG_I2C_ELEKTOR is not set
# CONFIG_I2C_I801 is not set
# CONFIG_I2C_I810 is not set
CONFIG_I2C_ISA=m
# CONFIG_I2C_NFORCE2 is not set
# CONFIG_I2C_PARPORT_LIGHT is not set
# CONFIG_I2C_PIIX4 is not set
# CONFIG_I2C_PROSAVAGE is not set
# CONFIG_I2C_SAVAGE4 is not set
# CONFIG_SCx200_ACB is not set
# CONFIG_I2C_SIS5595 is not set
# CONFIG_I2C_SIS630 is not set
# CONFIG_I2C_SIS96X is not set
# CONFIG_I2C_STUB is not set
# CONFIG_I2C_VIA is not set
CONFIG_I2C_VIAPRO=m
# CONFIG_I2C_VOODOO3 is not set
# CONFIG_I2C_PCA_ISA is not set

#
# Hardware Sensors Chip support
#
CONFIG_I2C_SENSOR=m
# CONFIG_SENSORS_ADM1021 is not set
# CONFIG_SENSORS_ADM1025 is not set
# CONFIG_SENSORS_ADM1026 is not set
# CONFIG_SENSORS_ADM1031 is not set
# CONFIG_SENSORS_ASB100 is not set
# CONFIG_SENSORS_DS1621 is not set
# CONFIG_SENSORS_FSCHER is not set
# CONFIG_SENSORS_GL518SM is not set
CONFIG_SENSORS_IT87=m
# CONFIG_SENSORS_LM63 is not set
# CONFIG_SENSORS_LM75 is not set
# CONFIG_SENSORS_LM77 is not set
# CONFIG_SENSORS_LM78 is not set
# CONFIG_SENSORS_LM80 is not set
# CONFIG_SENSORS_LM83 is not set
# CONFIG_SENSORS_LM85 is not set
# CONFIG_SENSORS_LM87 is not set
# CONFIG_SENSORS_LM90 is not set
# CONFIG_SENSORS_MAX1619 is not set
# CONFIG_SENSORS_PC87360 is not set
# CONFIG_SENSORS_SMSC47B397 is not set
# CONFIG_SENSORS_SMSC47M1 is not set
# CONFIG_SENSORS_VIA686A is not set
# CONFIG_SENSORS_W83781D is not set
# CONFIG_SENSORS_W83L785TS is not set
# CONFIG_SENSORS_W83627HF is not set

#
# Other I2C Chip support
#
CONFIG_SENSORS_EEPROM=m
# CONFIG_SENSORS_PCF8574 is not set
# CONFIG_SENSORS_PCF8591 is not set
# CONFIG_SENSORS_RTC8564 is not set
# CONFIG_I2C_DEBUG_CORE is not set
# CONFIG_I2C_DEBUG_ALGO is not set
# CONFIG_I2C_DEBUG_BUS is not set
# CONFIG_I2C_DEBUG_CHIP is not set

#
# Dallas's 1-wire bus
#
# CONFIG_W1 is not set

#
# Misc devices
#
# CONFIG_IBM_ASM is not set

#
# Multimedia devices
#
# CONFIG_VIDEO_DEV is not set

#
# Digital Video Broadcasting Devices
#
# CONFIG_DVB is not set

#
# Graphics support
#
# CONFIG_FB is not set
CONFIG_VIDEO_SELECT=y

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
# CONFIG_MDA_CONSOLE is not set
CONFIG_DUMMY_CONSOLE=y

#
# Sound
#
CONFIG_SOUND=m

#
# Advanced Linux Sound Architecture
#
CONFIG_SND=m
CONFIG_SND_TIMER=m
CONFIG_SND_PCM=m
CONFIG_SND_HWDEP=m
CONFIG_SND_RAWMIDI=m
CONFIG_SND_SEQUENCER=m
CONFIG_SND_SEQ_DUMMY=m
CONFIG_SND_OSSEMUL=y
CONFIG_SND_MIXER_OSS=m
CONFIG_SND_PCM_OSS=m
CONFIG_SND_SEQUENCER_OSS=y
CONFIG_SND_RTCTIMER=m
# CONFIG_SND_VERBOSE_PRINTK is not set
# CONFIG_SND_DEBUG is not set

#
# Generic devices
#
CONFIG_SND_MPU401_UART=m
CONFIG_SND_DUMMY=m
CONFIG_SND_VIRMIDI=m
# CONFIG_SND_MTPAV is not set
CONFIG_SND_SERIAL_U16550=m
CONFIG_SND_MPU401=m

#
# ISA devices
#
# CONFIG_SND_AD1816A is not set
# CONFIG_SND_AD1848 is not set
# CONFIG_SND_CS4231 is not set
# CONFIG_SND_CS4232 is not set
# CONFIG_SND_CS4236 is not set
# CONFIG_SND_ES968 is not set
# CONFIG_SND_ES1688 is not set
# CONFIG_SND_ES18XX is not set
# CONFIG_SND_GUSCLASSIC is not set
# CONFIG_SND_GUSEXTREME is not set
# CONFIG_SND_GUSMAX is not set
# CONFIG_SND_INTERWAVE is not set
# CONFIG_SND_INTERWAVE_STB is not set
# CONFIG_SND_OPTI92X_AD1848 is not set
# CONFIG_SND_OPTI92X_CS4231 is not set
# CONFIG_SND_OPTI93X is not set
# CONFIG_SND_SB8 is not set
# CONFIG_SND_SB16 is not set
# CONFIG_SND_SBAWE is not set
# CONFIG_SND_WAVEFRONT is not set
# CONFIG_SND_ALS100 is not set
# CONFIG_SND_AZT2320 is not set
# CONFIG_SND_CMI8330 is not set
# CONFIG_SND_DT019X is not set
# CONFIG_SND_OPL3SA2 is not set
# CONFIG_SND_SGALAXY is not set
# CONFIG_SND_SSCAPE is not set

#
# PCI devices
#
CONFIG_SND_AC97_CODEC=m
# CONFIG_SND_ALI5451 is not set
# CONFIG_SND_ATIIXP is not set
# CONFIG_SND_ATIIXP_MODEM is not set
# CONFIG_SND_AU8810 is not set
# CONFIG_SND_AU8820 is not set
# CONFIG_SND_AU8830 is not set
# CONFIG_SND_AZT3328 is not set
# CONFIG_SND_BT87X is not set
CONFIG_SND_CS46XX=m
CONFIG_SND_CS46XX_NEW_DSP=y
# CONFIG_SND_CS4281 is not set
CONFIG_SND_EMU10K1=m
# CONFIG_SND_EMU10K1X is not set
CONFIG_SND_CA0106=m
# CONFIG_SND_KORG1212 is not set
# CONFIG_SND_MIXART is not set
# CONFIG_SND_NM256 is not set
# CONFIG_SND_RME32 is not set
# CONFIG_SND_RME96 is not set
# CONFIG_SND_RME9652 is not set
# CONFIG_SND_HDSP is not set
# CONFIG_SND_TRIDENT is not set
# CONFIG_SND_YMFPCI is not set
# CONFIG_SND_ALS4000 is not set
# CONFIG_SND_CMIPCI is not set
# CONFIG_SND_ENS1370 is not set
# CONFIG_SND_ENS1371 is not set
# CONFIG_SND_ES1938 is not set
# CONFIG_SND_ES1968 is not set
# CONFIG_SND_MAESTRO3 is not set
# CONFIG_SND_FM801 is not set
# CONFIG_SND_ICE1712 is not set
# CONFIG_SND_ICE1724 is not set
# CONFIG_SND_INTEL8X0 is not set
# CONFIG_SND_INTEL8X0M is not set
# CONFIG_SND_SONICVIBES is not set
CONFIG_SND_VIA82XX=m
# CONFIG_SND_VIA82XX_MODEM is not set
# CONFIG_SND_VX222 is not set

#
# USB devices
#
CONFIG_SND_USB_AUDIO=m
# CONFIG_SND_USB_USX2Y is not set

#
# Open Sound System
#
# CONFIG_SOUND_PRIME is not set

#
# USB support
#
CONFIG_USB=y
# CONFIG_USB_DEBUG is not set

#
# Miscellaneous USB options
#
CONFIG_USB_DEVICEFS=y
# CONFIG_USB_BANDWIDTH is not set
# CONFIG_USB_DYNAMIC_MINORS is not set
# CONFIG_USB_SUSPEND is not set
# CONFIG_USB_OTG is not set
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB_ARCH_HAS_OHCI=y

#
# USB Host Controller Drivers
#
CONFIG_USB_EHCI_HCD=m
CONFIG_USB_EHCI_SPLIT_ISO=y
CONFIG_USB_EHCI_ROOT_HUB_TT=y
CONFIG_USB_OHCI_HCD=m
CONFIG_USB_UHCI_HCD=m
# CONFIG_USB_SL811_HCD is not set

#
# USB Device Class drivers
#
CONFIG_USB_AUDIO=m
# CONFIG_USB_BLUETOOTH_TTY is not set
CONFIG_USB_MIDI=m
# CONFIG_USB_ACM is not set
# CONFIG_USB_PRINTER is not set

#
# NOTE: USB_STORAGE enables SCSI, and 'SCSI disk support' may also be needed; see USB_STORAGE Help for more information
#
CONFIG_USB_STORAGE=m
# CONFIG_USB_STORAGE_DEBUG is not set
CONFIG_USB_STORAGE_RW_DETECT=y
CONFIG_USB_STORAGE_DATAFAB=y
CONFIG_USB_STORAGE_FREECOM=y
CONFIG_USB_STORAGE_ISD200=y
CONFIG_USB_STORAGE_DPCM=y
CONFIG_USB_STORAGE_HP8200e=y
CONFIG_USB_STORAGE_SDDR09=y
CONFIG_USB_STORAGE_SDDR55=y
CONFIG_USB_STORAGE_JUMPSHOT=y

#
# USB Input Devices
#
CONFIG_USB_HID=m
CONFIG_USB_HIDINPUT=y
# CONFIG_HID_FF is not set
CONFIG_USB_HIDDEV=y

#
# USB HID Boot Protocol drivers
#
# CONFIG_USB_KBD is not set
# CONFIG_USB_MOUSE is not set
# CONFIG_USB_AIPTEK is not set
# CONFIG_USB_WACOM is not set
# CONFIG_USB_KBTAB is not set
# CONFIG_USB_POWERMATE is not set
# CONFIG_USB_MTOUCH is not set
# CONFIG_USB_EGALAX is not set
CONFIG_USB_XPAD=m
# CONFIG_USB_ATI_REMOTE is not set

#
# USB Imaging devices
#
# CONFIG_USB_MDC800 is not set
# CONFIG_USB_MICROTEK is not set

#
# USB Multimedia devices
#
# CONFIG_USB_DABUSB is not set

#
# Video4Linux support is needed for USB Multimedia device support
#

#
# USB Network Adapters
#
# CONFIG_USB_CATC is not set
# CONFIG_USB_KAWETH is not set
# CONFIG_USB_PEGASUS is not set
# CONFIG_USB_RTL8150 is not set
# CONFIG_USB_USBNET is not set

#
# USB port drivers
#

#
# USB Serial Converter support
#
CONFIG_USB_SERIAL=m
CONFIG_USB_SERIAL_GENERIC=y
CONFIG_USB_SERIAL_BELKIN=m
CONFIG_USB_SERIAL_WHITEHEAT=m
CONFIG_USB_SERIAL_DIGI_ACCELEPORT=m
# CONFIG_USB_SERIAL_CYPRESS_M8 is not set
# CONFIG_USB_SERIAL_EMPEG is not set
# CONFIG_USB_SERIAL_FTDI_SIO is not set
# CONFIG_USB_SERIAL_VISOR is not set
CONFIG_USB_SERIAL_IPAQ=m
# CONFIG_USB_SERIAL_IR is not set
# CONFIG_USB_SERIAL_EDGEPORT is not set
# CONFIG_USB_SERIAL_EDGEPORT_TI is not set
# CONFIG_USB_SERIAL_GARMIN is not set
# CONFIG_USB_SERIAL_IPW is not set
CONFIG_USB_SERIAL_KEYSPAN_PDA=m
CONFIG_USB_SERIAL_KEYSPAN=m
CONFIG_USB_SERIAL_KEYSPAN_MPR=y
CONFIG_USB_SERIAL_KEYSPAN_USA28=y
CONFIG_USB_SERIAL_KEYSPAN_USA28X=y
CONFIG_USB_SERIAL_KEYSPAN_USA28XA=y
CONFIG_USB_SERIAL_KEYSPAN_USA28XB=y
CONFIG_USB_SERIAL_KEYSPAN_USA19=y
CONFIG_USB_SERIAL_KEYSPAN_USA18X=y
CONFIG_USB_SERIAL_KEYSPAN_USA19W=y
CONFIG_USB_SERIAL_KEYSPAN_USA19QW=y
CONFIG_USB_SERIAL_KEYSPAN_USA19QI=y
CONFIG_USB_SERIAL_KEYSPAN_USA49W=y
CONFIG_USB_SERIAL_KEYSPAN_USA49WLC=y
CONFIG_USB_SERIAL_KLSI=m
# CONFIG_USB_SERIAL_KOBIL_SCT is not set
CONFIG_USB_SERIAL_MCT_U232=m
CONFIG_USB_SERIAL_PL2303=m
CONFIG_USB_SERIAL_SAFE=m
CONFIG_USB_SERIAL_SAFE_PADDED=y
# CONFIG_USB_SERIAL_TI is not set
# CONFIG_USB_SERIAL_CYBERJACK is not set
# CONFIG_USB_SERIAL_XIRCOM is not set
# CONFIG_USB_SERIAL_OMNINET is not set
CONFIG_USB_EZUSB=y

#
# USB Miscellaneous drivers
#
CONFIG_USB_EMI62=m
CONFIG_USB_EMI26=m
# CONFIG_USB_AUERSWALD is not set
CONFIG_USB_RIO500=m
# CONFIG_USB_LEGOTOWER is not set
CONFIG_USB_LCD=m
CONFIG_USB_LED=m
# CONFIG_USB_CYTHERM is not set
# CONFIG_USB_PHIDGETKIT is not set
# CONFIG_USB_PHIDGETSERVO is not set
# CONFIG_USB_IDMOUSE is not set
# CONFIG_USB_TEST is not set

#
# USB ATM/DSL drivers
#

#
# USB Gadget Support
#
CONFIG_USB_GADGET=m
# CONFIG_USB_GADGET_DEBUG_FILES is not set
CONFIG_USB_GADGET_NET2280=y
CONFIG_USB_NET2280=m
# CONFIG_USB_GADGET_PXA2XX is not set
# CONFIG_USB_GADGET_GOKU is not set
# CONFIG_USB_GADGET_SA1100 is not set
# CONFIG_USB_GADGET_LH7A40X is not set
# CONFIG_USB_GADGET_DUMMY_HCD is not set
# CONFIG_USB_GADGET_OMAP is not set
CONFIG_USB_GADGET_DUALSPEED=y
# CONFIG_USB_ZERO is not set
# CONFIG_USB_ETH is not set
# CONFIG_USB_GADGETFS is not set
# CONFIG_USB_FILE_STORAGE is not set
CONFIG_USB_G_SERIAL=m

#
# MMC/SD Card support
#
# CONFIG_MMC is not set

#
# InfiniBand support
#
# CONFIG_INFINIBAND is not set

#
# File systems
#
CONFIG_EXT2_FS=y
CONFIG_EXT2_FS_XATTR=y
CONFIG_EXT2_FS_POSIX_ACL=y
CONFIG_EXT2_FS_SECURITY=y
CONFIG_EXT3_FS=y
CONFIG_EXT3_FS_XATTR=y
CONFIG_EXT3_FS_POSIX_ACL=y
CONFIG_EXT3_FS_SECURITY=y
CONFIG_JBD=y
CONFIG_JBD_DEBUG=y
CONFIG_FS_MBCACHE=y
# CONFIG_REISERFS_FS is not set
# CONFIG_JFS_FS is not set
CONFIG_FS_POSIX_ACL=y

#
# XFS support
#
# CONFIG_XFS_FS is not set
# CONFIG_MINIX_FS is not set
CONFIG_ROMFS_FS=m
# CONFIG_QUOTA is not set
CONFIG_DNOTIFY=y
CONFIG_AUTOFS_FS=m
CONFIG_AUTOFS4_FS=m

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
CONFIG_ZISOFS_FS=y
CONFIG_UDF_FS=m
CONFIG_UDF_NLS=y

#
# DOS/FAT/NT Filesystems
#
CONFIG_FAT_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="ascii"
CONFIG_NTFS_FS=m
# CONFIG_NTFS_DEBUG is not set
# CONFIG_NTFS_RW is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_SYSFS=y
# CONFIG_DEVFS_FS is not set
CONFIG_DEVPTS_FS_XATTR=y
CONFIG_DEVPTS_FS_SECURITY=y
CONFIG_TMPFS=y
CONFIG_TMPFS_XATTR=y
CONFIG_TMPFS_SECURITY=y
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
CONFIG_RAMFS=y

#
# Miscellaneous filesystems
#
# CONFIG_ADFS_FS is not set
CONFIG_AFFS_FS=m
CONFIG_HFS_FS=m
CONFIG_HFSPLUS_FS=m
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
CONFIG_CRAMFS=m
# CONFIG_VXFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set

#
# Network File Systems
#
CONFIG_NFS_FS=m
CONFIG_NFS_V3=y
# CONFIG_NFS_V4 is not set
# CONFIG_NFS_DIRECTIO is not set
CONFIG_NFSD=m
CONFIG_NFSD_V3=y
# CONFIG_NFSD_V4 is not set
# CONFIG_NFSD_TCP is not set
CONFIG_LOCKD=m
CONFIG_LOCKD_V4=y
CONFIG_EXPORTFS=m
CONFIG_SUNRPC=m
# CONFIG_RPCSEC_GSS_KRB5 is not set
# CONFIG_RPCSEC_GSS_SPKM3 is not set
CONFIG_SMB_FS=m
# CONFIG_SMB_NLS_DEFAULT is not set
CONFIG_CIFS=m
# CONFIG_CIFS_STATS is not set
CONFIG_CIFS_XATTR=y
CONFIG_CIFS_POSIX=y
# CONFIG_CIFS_EXPERIMENTAL is not set
# CONFIG_NCP_FS is not set
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set

#
# Partition Types
#
CONFIG_PARTITION_ADVANCED=y
# CONFIG_ACORN_PARTITION is not set
# CONFIG_OSF_PARTITION is not set
# CONFIG_AMIGA_PARTITION is not set
CONFIG_ATARI_PARTITION=y
CONFIG_MAC_PARTITION=y
CONFIG_MSDOS_PARTITION=y
# CONFIG_BSD_DISKLABEL is not set
# CONFIG_MINIX_SUBPARTITION is not set
# CONFIG_SOLARIS_X86_PARTITION is not set
# CONFIG_UNIXWARE_DISKLABEL is not set
CONFIG_LDM_PARTITION=y
# CONFIG_LDM_DEBUG is not set
# CONFIG_SGI_PARTITION is not set
# CONFIG_ULTRIX_PARTITION is not set
# CONFIG_SUN_PARTITION is not set
# CONFIG_EFI_PARTITION is not set

#
# Native Language Support
#
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="utf8"
CONFIG_NLS_CODEPAGE_437=m
# CONFIG_NLS_CODEPAGE_737 is not set
# CONFIG_NLS_CODEPAGE_775 is not set
# CONFIG_NLS_CODEPAGE_850 is not set
# CONFIG_NLS_CODEPAGE_852 is not set
# CONFIG_NLS_CODEPAGE_855 is not set
# CONFIG_NLS_CODEPAGE_857 is not set
# CONFIG_NLS_CODEPAGE_860 is not set
# CONFIG_NLS_CODEPAGE_861 is not set
# CONFIG_NLS_CODEPAGE_862 is not set
# CONFIG_NLS_CODEPAGE_863 is not set
# CONFIG_NLS_CODEPAGE_864 is not set
# CONFIG_NLS_CODEPAGE_865 is not set
# CONFIG_NLS_CODEPAGE_866 is not set
# CONFIG_NLS_CODEPAGE_869 is not set
# CONFIG_NLS_CODEPAGE_936 is not set
# CONFIG_NLS_CODEPAGE_950 is not set
# CONFIG_NLS_CODEPAGE_932 is not set
# CONFIG_NLS_CODEPAGE_949 is not set
# CONFIG_NLS_CODEPAGE_874 is not set
# CONFIG_NLS_ISO8859_8 is not set
# CONFIG_NLS_CODEPAGE_1250 is not set
# CONFIG_NLS_CODEPAGE_1251 is not set
CONFIG_NLS_ASCII=m
CONFIG_NLS_ISO8859_1=m
# CONFIG_NLS_ISO8859_2 is not set
# CONFIG_NLS_ISO8859_3 is not set
# CONFIG_NLS_ISO8859_4 is not set
# CONFIG_NLS_ISO8859_5 is not set
# CONFIG_NLS_ISO8859_6 is not set
# CONFIG_NLS_ISO8859_7 is not set
# CONFIG_NLS_ISO8859_9 is not set
# CONFIG_NLS_ISO8859_13 is not set
# CONFIG_NLS_ISO8859_14 is not set
CONFIG_NLS_ISO8859_15=m
# CONFIG_NLS_KOI8_R is not set
# CONFIG_NLS_KOI8_U is not set
CONFIG_NLS_UTF8=m

#
# Profiling support
#
CONFIG_PROFILING=y
# CONFIG_OPROFILE is not set

#
# Kernel hacking
#
CONFIG_DEBUG_KERNEL=y
CONFIG_MAGIC_SYSRQ=y
# CONFIG_SCHEDSTATS is not set
# CONFIG_DEBUG_SLAB is not set
CONFIG_DEBUG_PREEMPT=y
CONFIG_WAKEUP_TIMING=y
CONFIG_PREEMPT_TRACE=y
# CONFIG_CRITICAL_PREEMPT_TIMING is not set
# CONFIG_CRITICAL_IRQSOFF_TIMING is not set
CONFIG_LATENCY_TIMING=y
# CONFIG_LATENCY_TRACE is not set
CONFIG_RT_DEADLOCK_DETECT=y
# CONFIG_DEBUG_KOBJECT is not set
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_INFO=y
# CONFIG_DEBUG_FS is not set
CONFIG_USE_FRAME_POINTER=y
CONFIG_FRAME_POINTER=y
CONFIG_EARLY_PRINTK=y
CONFIG_DEBUG_STACKOVERFLOW=y
# CONFIG_KPROBES is not set
CONFIG_DEBUG_STACK_USAGE=y
# CONFIG_DEBUG_PAGEALLOC is not set
CONFIG_4KSTACKS=y
CONFIG_X86_FIND_SMP_CONFIG=y
CONFIG_X86_MPPARSE=y

#
# Security options
#
# CONFIG_KEYS is not set
CONFIG_SECURITY=y
CONFIG_SECURITY_NETWORK=y
CONFIG_SECURITY_CAPABILITIES=m
# CONFIG_SECURITY_ROOTPLUG is not set
CONFIG_SECURITY_SECLVL=m
CONFIG_SECURITY_SELINUX=y
CONFIG_SECURITY_SELINUX_BOOTPARAM=y
CONFIG_SECURITY_SELINUX_BOOTPARAM_VALUE=1
# CONFIG_SECURITY_SELINUX_DISABLE is not set
CONFIG_SECURITY_SELINUX_DEVELOP=y
CONFIG_SECURITY_SELINUX_AVC_STATS=y
# CONFIG_SECURITY_SELINUX_MLS is not set

#
# Cryptographic options
#
CONFIG_CRYPTO=y
CONFIG_CRYPTO_HMAC=y
CONFIG_CRYPTO_NULL=m
CONFIG_CRYPTO_MD4=m
CONFIG_CRYPTO_MD5=m
CONFIG_CRYPTO_SHA1=m
CONFIG_CRYPTO_SHA256=m
CONFIG_CRYPTO_SHA512=m
CONFIG_CRYPTO_WP512=m
CONFIG_CRYPTO_DES=m
CONFIG_CRYPTO_BLOWFISH=m
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_SERPENT=m
CONFIG_CRYPTO_AES_586=m
CONFIG_CRYPTO_CAST5=m
CONFIG_CRYPTO_CAST6=m
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_ARC4=m
CONFIG_CRYPTO_KHAZAD=m
CONFIG_CRYPTO_ANUBIS=m
CONFIG_CRYPTO_DEFLATE=m
CONFIG_CRYPTO_MICHAEL_MIC=m
CONFIG_CRYPTO_CRC32C=m
CONFIG_CRYPTO_TEST=m

#
# Hardware crypto devices
#
# CONFIG_CRYPTO_DEV_PADLOCK is not set

#
# Library routines
#
CONFIG_CRC_CCITT=m
CONFIG_CRC32=y
CONFIG_LIBCRC32C=m
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=m
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_X86_BIOS_REBOOT=y
CONFIG_PC=y


* Re: Real-Time Preemption and UML?
  2005-02-08 21:44                 ` Ingo Molnar
@ 2005-02-08 23:02                   ` Esben Nielsen
  0 siblings, 0 replies; 125+ messages in thread
From: Esben Nielsen @ 2005-02-08 23:02 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Jeff Dike, linux-kernel


On Tue, 8 Feb 2005, Ingo Molnar wrote:

> 
> * Esben Nielsen <simlo@phys.au.dk> wrote:
> 
> > Now I don't really know who I am responding to. But both up()s now
> > changed to complete()s are in something looking very much like an
> > interrupt handler. But again, as I said, I didn't analyze the code in
> > detail, I just made it compile and checked that it worked in bare
> > 2.6.11-rc2 UML - which I am not too sure how to set up and use to
> > begin with!
> 
> btw., UML is really easy to begin with: after you've compiled you get a
> 'linux' binary in the toplevel directory - just execute it via './linux'
> and you'll see a Linux kernel booting - that's all you need!
> 
> Add a filesystem image via a root= parameter to that command and the UML
> kernel will start booting that filesystem image. (if you are adventurous
> you can even boot a real partition, but for the first user this is
> strongly discouraged.) There are a number of UML-ready filesystem images
> downloadable from the net.
> 
Thanks, I managed to get that far after googling a bit. I have had some 
problems with the filesystem though. Fixed now (I forgot to compile ext3
in *blush*). But you might still be interested in this trace (2.6.11-rc2
with or without my changes):

line_ioctl: tty0: ioctl KDSIGACCEPT called
Debug: sleeping function called from invalid context at
include/asm/arch/semaphore.h:107
in_atomic():0, irqs_disabled():1
Call Trace: 
a08639e0:  [<a003071b>] __might_sleep+0x9b/0xb8
a0863a10:  [<a001d364>] uml_console_write+0x20/0x54
a0863a30:  [<a00348cc>] __call_console_drivers+0x50/0x58
a0863a60:  [<a00349c1>] call_console_drivers+0x7d/0x124
a0863a90:  [<a0034f97>] release_console_sem+0xa3/0x25c
a0863aa0:  [<a0034fb0>] release_console_sem+0xbc/0x25c
a0863ac0:  [<a0034d3b>] vprintk+0x193/0x2d0
a0863ae0:  [<a0034ba6>] printk+0x12/0x14
a0863b00:  [<a001e996>] line_ioctl+0x8e/0x94
a0863b24:  [<a001e908>] line_ioctl+0x0/0x94
a0863b30:  [<a012e031>] tty_ioctl+0xfd/0x680
a0863b80:  [<a00a253b>] do_ioctl+0x3f/0x64
a0863bb0:  [<a00a2b7d>] sys_ioctl+0x13d/0x350
a0863bd0:  [<a008971b>] sys_open+0x5b/0x74
a0863be0:  [<a008970c>] sys_open+0x4c/0x74
a0863c00:  [<a0018e8d>] execute_syscall_tt+0xa1/0xe0
a0863c1c:  [<a01a9357>] sigemptyset+0x17/0x30
a0863c70:  [<a0014eb2>] record_syscall_start+0x4e/0x58
a0863c90:  [<a0018f0b>] syscall_handler_tt+0x3f/0x74
a0863cc0:  [<a001a170>] sig_handler_common_tt+0x90/0x108
a0863cd0:  [<a001a1d1>] sig_handler_common_tt+0xf1/0x108
a0863d00:  [<a0028c13>] sig_handler+0x1f/0x38
a0863d20:  [<a01a9058>] __restore+0x0/0x8

It could look like a semaphore which should be replaced by a spinlock
(which will become a mutex in preempt-realtime :-)


Esben

> 	Ingo



* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-08 21:58 ` William Weston
@ 2005-02-09 11:51   ` Ingo Molnar
  2005-02-10  2:13     ` William Weston
  2005-02-09 12:48   ` [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 Stephen Smalley
  1 sibling, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-02-09 11:51 UTC (permalink / raw)
  To: William Weston; +Cc: linux-kernel


* William Weston <weston@sysex.net> wrote:

> Jackd (-R -P64 -dalsa -dhw:0 -r44100 -p64 -n3 -i2 -o2) w/ one
> soft-synth client (using 15% to 30% of the CPU) will run for over 12
> hours without any xruns, even during kernel compiles and nightly
> updatedb runs.
> 
> Running wmcube (an impractical, greedy, little CPU meter), even when
> niced, causes lots of xruns.  It may be good for worst-case-scenario
> desktop load testing.

this phenomenon is very weird.

Firstly, make sure that all relevant threads (including the soundcard
IRQ thread, jackd threads, jack client thread, etc.) have higher RT
priority than any other, latency-irrelevant threads in the system.

If everything looks OK on the priority-administration side, could you
enable wakeup-latency tracing via:

 CONFIG_WAKEUP_TIMING=y
 CONFIG_PREEMPT_TRACE=y
 # CONFIG_CRITICAL_PREEMPT_TIMING is not set
 # CONFIG_CRITICAL_IRQSOFF_TIMING is not set
 CONFIG_LATENCY_TIMING=y
 CONFIG_LATENCY_TRACE=y

It should look like this in the Kernel Hacking menu of menuconfig:

       [*] Wakeup latency timing
       [ ] Non-preemptible critical section latency timing
       [ ] Interrupts-off critical section latency timing
       [*]   Latency tracing

what is the longest wakeup latency the tracer shows? You can start the
measurement anew via:

	echo 0 > /proc/sys/kernel/preempt_max_latency

every new maximum-latency event will be logged by the kernel, and the
trace of the latest worst-case latency path can be found under
/proc/latency_trace.

(If the trace is very long then most of the time it's OK to just send
the first 25 and last 10 lines. Putting the trace up to a website is a
good solution too.)
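
The reset-and-read cycle described above can also be driven from a small
helper. This is only a sketch under stated assumptions: the /proc path is the
one quoted in this mail, the function names are made up for illustration, and
the value is treated as an opaque number because the unit is not stated here.

```c
#include <assert.h>
#include <stdio.h>

/* Path quoted in the mail; it only exists on kernels built with the
 * latency-timing options, so callers should expect failures elsewhere. */
#define MAX_LATENCY_PATH "/proc/sys/kernel/preempt_max_latency"

/* Hypothetical helper: write "0" to `path` to start the measurement anew.
 * Returns 0 on success, -1 if the file cannot be opened or written. */
int reset_max_latency(const char *path)
{
    FILE *f;
    int ok;

    assert(path != NULL);
    f = fopen(path, "w");
    if (!f)
        return -1;
    ok = (fputs("0\n", f) >= 0);
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical helper: read the current maximum from `path` into *value.
 * Returns 0 on success, -1 on open or parse failure. */
int read_max_latency(const char *path, unsigned long *value)
{
    FILE *f;
    int n;

    assert(path != NULL && value != NULL);
    f = fopen(path, "r");
    if (!f)
        return -1;
    n = fscanf(f, "%lu", value);
    fclose(f);
    return n == 1 ? 0 : -1;
}
```

A test run would call reset_max_latency(MAX_LATENCY_PATH) as root, apply the
load, and then call read_max_latency(MAX_LATENCY_PATH, &v) before looking at
/proc/latency_trace.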

it should not matter how 'greedy' wmcube is. Does it do a lot of graphics
activity (perhaps 3D too?) - that could in theory cause hardware
latencies - the latency traces will tell.

> MIDI playback through any MPU-401 interface triggers the following
> BUG, reported once for each outgoing MIDI event (non MPU-401 hw
> interfaces and sw interfaces not affected):

the patch below should fix this. (also included in -38-06 and later
kernels.)

	Ingo

--- linux/sound/drivers/mpu401/mpu401_uart.c.orig
+++ linux/sound/drivers/mpu401/mpu401_uart.c
@@ -316,12 +316,12 @@ static void snd_mpu401_uart_input_trigge
 		/* read data in advance */
 		/* prevent double enter via rawmidi->event callback */
 		if (atomic_dec_and_test(&mpu->rx_loop)) {
-			local_irq_save(flags);
+			local_irq_save_nort(flags);
 			if (spin_trylock(&mpu->input_lock)) {
 				snd_mpu401_uart_input_read(mpu);
 				spin_unlock(&mpu->input_lock);
 			}
-			local_irq_restore(flags);
+			local_irq_restore_nort(flags);
 		}
 		atomic_inc(&mpu->rx_loop);
 	} else {
@@ -407,12 +407,12 @@ static void snd_mpu401_uart_output_trigg
 		/* output pending data */
 		/* prevent double enter via rawmidi->event callback */
 		if (atomic_dec_and_test(&mpu->tx_loop)) {
-			local_irq_save(flags);
+			local_irq_save_nort(flags);
 			if (spin_trylock(&mpu->output_lock)) {
 				snd_mpu401_uart_output_write(mpu);
 				spin_unlock(&mpu->output_lock);
 			}
-			local_irq_restore(flags);
+			local_irq_restore_nort(flags);
 		}
 		atomic_inc(&mpu->tx_loop);
 	} else {
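
For readers unfamiliar with the _nort helpers used in the patch above, here is
a user-space toy model of the idea. It is not the kernel's implementation: the
names, the fake interrupt flag and the run-time switch are all hypothetical,
and it assumes that "_nort" means "behave like the stock primitive on non-RT
builds, but leave interrupts enabled on PREEMPT_RT", where spinlocks turn into
sleeping mutexes.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the CPU interrupt-enable flag (user space only). */
static bool irqs_enabled = true;

/* Model of local_irq_save()/local_irq_restore(): save state, then disable. */
static void model_irq_save(unsigned long *flags)
{
    *flags = irqs_enabled;
    irqs_enabled = false;
}

static void model_irq_restore(unsigned long flags)
{
    irqs_enabled = (flags != 0);
}

/* Assumed _nort behaviour: on RT only record the state and keep interrupts
 * enabled; on non-RT fall back to the stock save/restore pair. */
static void model_irq_save_nort(bool preempt_rt, unsigned long *flags)
{
    if (preempt_rt)
        *flags = irqs_enabled;
    else
        model_irq_save(flags);
}

static void model_irq_restore_nort(bool preempt_rt, unsigned long flags)
{
    if (!preempt_rt)
        model_irq_restore(flags);
}

/* Reports whether interrupts were still enabled inside the section,
 * i.e. whether a sleeping lock could legally be taken there. */
bool irqs_on_inside_nort_section(bool preempt_rt)
{
    unsigned long flags;
    bool inside;

    model_irq_save_nort(preempt_rt, &flags);
    inside = irqs_enabled;
    model_irq_restore_nort(preempt_rt, flags);
    assert(irqs_enabled); /* state is back to what it was on entry */
    return inside;
}
```

In this model the section runs with interrupts off on a non-RT build and with
interrupts on under RT, which is presumably why the spin_trylock() inside it no
longer triggers the report.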



* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-08 21:58 ` William Weston
  2005-02-09 11:51   ` Ingo Molnar
@ 2005-02-09 12:48   ` Stephen Smalley
  2005-02-10  2:20     ` William Weston
  1 sibling, 1 reply; 125+ messages in thread
From: Stephen Smalley @ 2005-02-09 12:48 UTC (permalink / raw)
  To: William Weston; +Cc: Ingo Molnar, lkml, James Morris

On Tue, 2005-02-08 at 16:58, William Weston wrote:
> Hi Ingo,
> 
> Great work on the -RT kernel!  Here's a status report from my Athlon box
> w/ kernel -RT-2.6.11-rc3-V0.7.38-03, realtime-lsm-0.8.5, jack-0.99.48, 
> alsa-1.0.8, and latencytest-0.5.5:
<snip>
> A couple BUGs are being logged (see below), but without any ill effect
> other than taking up space on my /var.
<snip>
> Network interface (via rhine) startup triggers these two BUGs:
> 
> BUG: sleeping function called from invalid context ksoftirqd/0(2) at 
> kernel/rt.c:1448
> in_atomic():1 [00000001], irqs_disabled():0
>  [<c0103e77>] dump_stack+0x17/0x20 (12)
>  [<c0119f89>] __might_sleep+0xd9/0xf0 (40)
>  [<c0134816>] __spin_lock+0x36/0x50 (24)
>  [<c0147914>] kmem_cache_alloc+0x34/0x120 (44)
>  [<c01d3143>] sel_netif_lookup+0x63/0x150 (28)
>  [<c01d32cd>] sel_netif_sids+0x2d/0xb0 (28)
>  [<c01d01bc>] selinux_socket_sock_rcv_skb+0xac/0x230 (144)

I'm not sure I understand, as sel_netif_lookup passes GFP_ATOMIC to
kmalloc.

>  [<c02fd248>] udp_queue_rcv_skb+0xb8/0x280 (28)
>  [<c02fd8e2>] udp_rcv+0x192/0x3e0 (100)
>  [<c02dc224>] ip_local_deliver+0x64/0x1c0 (32)
>  [<c02dc595>] ip_rcv+0x215/0x3f0 (56)
>  [<c02c201c>] netif_receive_skb+0x12c/0x160 (40)
>  [<c02c20ce>] process_backlog+0x7e/0x110 (32)
>  [<c02c21d2>] net_rx_action+0x72/0x130 (24)
>  [<c0122428>] ___do_softirq+0x48/0xd0 (40)
>  [<c012254b>] _do_softirq+0x1b/0x30 (8)
>  [<c0122920>] ksoftirqd+0xa0/0xf0 (28)
>  [<c01312fb>] kthread+0x8b/0xc0 (36)
>  [<c01012f5>] kernel_thread_helper+0x5/0x10 (537116692)
> ---------------------------
> | preempt count: 00000002 ]
> | 2-level deep critical section nesting:
> ----------------------------------------
> .. [<c013dd3f>] .... __do_IRQ+0xef/0x180
> .....[<c0105306>] ..   ( <= do_IRQ+0x56/0xa0)
> .. [<c0135240>] .... print_traces+0x10/0x40
> .....[<c0103e77>] ..   ( <= dump_stack+0x17/0x20)

-- 
Stephen Smalley <sds@epoch.ncsc.mil>
National Security Agency



* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-09 11:51   ` Ingo Molnar
@ 2005-02-10  2:13     ` William Weston
  2005-02-10  7:52       ` Ingo Molnar
  0 siblings, 1 reply; 125+ messages in thread
From: William Weston @ 2005-02-10  2:13 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

On Wed, 9 Feb 2005, Ingo Molnar wrote:

> > Running wmcube (an impractical, greedy, little CPU meter), even when
> > niced, causes lots of xruns.  It may be good for worst-case-scenario
> > desktop load testing.
> 
> this phenomenon is very weird.
> 
> Firstly, make sure that all relevant threads (including the soundcard
> IRQ thread, jackd threads, jack client thread, etc.) have higher RT
> priority than any other, latency-irrelevant threads in the system.

Thanks for the tip.  I have schedtool installed, and all audio/MIDI IRQ
threads, jack threads, and jack clients are now run with higher priorities
than everything else.  Before I adjusted priorities, I was getting a bunch 
of these when running latencytest (which have since disappeared):

rtc: lost some interrupts at 8192Hz.
bug in rtc_read(): called in state S_IDLE!

IRQ 8 (RTC) is still giving me some issues, even after adjusting 
priorities:

`IRQ 8'[232] is being piggy. need_resched=0, cpu=0
Read missed before next interrupt

Should the RTC IRQ be given a new priority?  If so, should it be lower, 
higher, or equal to the audio/MIDI/jack priorities?

> If everything looks OK on the priority-administration side, could you
> enable wakeup-latency tracing via:
> 
>  CONFIG_WAKEUP_TIMING=y
>  CONFIG_PREEMPT_TRACE=y
>  # CONFIG_CRITICAL_PREEMPT_TIMING is not set
>  # CONFIG_CRITICAL_IRQSOFF_TIMING is not set
>  CONFIG_LATENCY_TIMING=y
>  CONFIG_LATENCY_TRACE=y

<snip>

> what is the longest wakeup latency the tracer shows? You can start the
> measurement anew via:
> 
> 	echo 0 > /proc/sys/kernel/preempt_max_latency

Max latency is in the realm of 13-18 after runs of jack_test4.1.

> every new maximum-latency event will be logged by the kernel, and the
> trace of the latest worst-case latency path can be found under
> /proc/latency_trace.
> 
> (If the trace is very long then most of the time it's OK to just send
> the first 25 and last 10 lines. Putting the trace up to a website is a
> good solution too.)

See http://www.sysex.net/testing/ for all of the test results and 
system info on a 2.6.11-rc3-RT-V0.7.38-06 kernel.

This is from my most recent run of jack_test4.1 with wmcube and kernel 
compilation running (check /testing/dmesg for more):

(            sshd-5940 |#0): new 4 us maximum-latency wakeup.
(          IRQ 16-1803 |#0): new 5 us maximum-latency wakeup.
(            make-28375|#0): new 6 us maximum-latency wakeup.
(     ksoftirqd/0-2    |#0): new 6 us maximum-latency wakeup.
(     ksoftirqd/0-2    |#0): new 7 us maximum-latency wakeup.
(     ksoftirqd/0-2    |#0): new 8 us maximum-latency wakeup.
(     ksoftirqd/0-2    |#0): new 8 us maximum-latency wakeup.
(     ksoftirqd/0-2    |#0): new 9 us maximum-latency wakeup.
(     ksoftirqd/0-2    |#0): new 10 us maximum-latency wakeup.
(     ksoftirqd/0-2    |#0): new 10 us maximum-latency wakeup.
(           jackd-29348|#0): new 12 us maximum-latency wakeup.
(           jackd-29348|#0): new 14 us maximum-latency wakeup.
(           jackd-29348|#0): new 15 us maximum-latency wakeup.
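
Log lines in the format above are easy to post-process when a long dmesg needs
to be summarised. A minimal sketch, assuming only the "): new N ...
maximum-latency wakeup" shape visible in these lines; the function names are
made up for illustration and the unit token after the number is skipped rather
than interpreted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extract N from one "(comm-pid|#cpu): new N ... maximum-latency wakeup."
 * line. Returns 0 and stores N in *value, or -1 if the line does not match. */
int parse_wakeup_line(const char *line, unsigned long *value)
{
    const char *p;

    assert(line != NULL && value != NULL);
    p = strstr(line, "): new ");
    if (!p || !strstr(p, "maximum-latency wakeup"))
        return -1;
    return sscanf(p, "): new %lu", value) == 1 ? 0 : -1;
}

/* Largest N over an array of log lines; returns 0 if no line matches. */
unsigned long worst_wakeup(const char *const *lines, size_t n)
{
    unsigned long v, worst = 0;
    size_t i;

    for (i = 0; i < n; i++)
        if (parse_wakeup_line(lines[i], &v) == 0 && v > worst)
            worst = v;
    return worst;
}
```

Fed the lines from this run, worst_wakeup() would report the final jackd
entry, which is also the line the kernel logged last.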

> it should not matter how 'greedy' wmcube is. Does it do a lot of graphics
> activity (perhaps 3D too?) - that could in theory cause hardware
> latencies - the latency traces will tell.

Wmcube displays a 3D spinning cube, which spins faster (actually performs
larger rotations between updates) when CPU usage goes up.  When running
niced, wmcube uses about 1% to 4% of the CPU, adds about 1000 context
switches per second, and increases X load by 1% to 3% of the total CPU.  

Now that the priorities are tuned, I get no xruns while running wmcube, 
compiling a kernel, and running latencytest or jack_test4.1.

> > MIDI playback through any MPU-401 interface triggers the following
> > BUG, reported once for each outgoing MIDI event (non MPU-401 hw
> > interfaces and sw interfaces not affected):
> 
> the patch below should fix this. (also included in -38-06 and later
> kernels.)
> 
> 	Ingo
> 
> --- linux/sound/drivers/mpu401/mpu401_uart.c.orig
> +++ linux/sound/drivers/mpu401/mpu401_uart.c
> @@ -316,12 +316,12 @@ static void snd_mpu401_uart_input_trigge
>  		/* read data in advance */
>  		/* prevent double enter via rawmidi->event callback */
>  		if (atomic_dec_and_test(&mpu->rx_loop)) {
> -			local_irq_save(flags);
> +			local_irq_save_nort(flags);
>  			if (spin_trylock(&mpu->input_lock)) {
>  				snd_mpu401_uart_input_read(mpu);
>  				spin_unlock(&mpu->input_lock);
>  			}
> -			local_irq_restore(flags);
> +			local_irq_restore_nort(flags);
>  		}
>  		atomic_inc(&mpu->rx_loop);
>  	} else {
> @@ -407,12 +407,12 @@ static void snd_mpu401_uart_output_trigg
>  		/* output pending data */
>  		/* prevent double enter via rawmidi->event callback */
>  		if (atomic_dec_and_test(&mpu->tx_loop)) {
> -			local_irq_save(flags);
> +			local_irq_save_nort(flags);
>  			if (spin_trylock(&mpu->output_lock)) {
>  				snd_mpu401_uart_output_write(mpu);
>  				spin_unlock(&mpu->output_lock);
>  			}
> -			local_irq_restore(flags);
> +			local_irq_restore_nort(flags);
>  		}
>  		atomic_inc(&mpu->tx_loop);
>  	} else {

This patch does fix the MIDI playback BUG I was seeing.


Best Regards,
--William Weston <weston at sysex.net>


^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-09 12:48   ` [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 Stephen Smalley
@ 2005-02-10  2:20     ` William Weston
  0 siblings, 0 replies; 125+ messages in thread
From: William Weston @ 2005-02-10  2:20 UTC (permalink / raw)
  To: Stephen Smalley; +Cc: Ingo Molnar, lkml, James Morris

Two more of these sel_netif_lookup related BUGs were found with
-RT-2.6.11-rc3-V0.7.38-06:

BUG: sleeping function called from invalid context ksoftirqd/0(2) at 
kernel/rt.c:1448
in_atomic():1 [00000001], irqs_disabled():0
 [<c0104183>] dump_stack+0x23/0x30 (20)
 [<c011be08>] __might_sleep+0xd8/0xf0 (36)
 [<c0139008>] __spin_lock+0x38/0x60 (24)
 [<c013904d>] _spin_lock+0x1d/0x20 (16)
 [<c015089f>] kmem_cache_alloc+0x3f/0x140 (44)
 [<c01ea1e9>] sel_netif_lookup+0x69/0x160 (40)
 [<c01ea3a8>] sel_netif_sids+0x38/0xd0 (40)
 [<c01e6c13>] selinux_socket_sock_rcv_skb+0xc3/0x2a0 (152)
 [<c032da2a>] udp_queue_rcv_skb+0xca/0x2d0 (40)
 [<c032e168>] udp_rcv+0x1c8/0x430 (96)
 [<c030ab3c>] ip_local_deliver+0x6c/0x210 (36)
 [<c030af19>] ip_rcv+0x239/0x430 (56)
 [<c02ed257>] netif_receive_skb+0x147/0x180 (48)
 [<c02ed30f>] process_backlog+0x7f/0x110 (28)
 [<c02ed41c>] net_rx_action+0x7c/0x130 (32)
 [<c0124e37>] ___do_softirq+0x57/0xf0 (40)
 [<c0124f75>] _do_softirq+0x25/0x30 (8)
 [<c0125395>] ksoftirqd+0xa5/0x100 (28)
 [<c0135676>] kthread+0xa6/0xe0 (48)
 [<c0101329>] kernel_thread_helper+0x5/0xc (537116692)
---------------------------
| preempt count: 00000002 ]
| 2-level deep critical section nesting:
----------------------------------------
.. [<c0145d5b>] .... __do_IRQ+0xfb/0x1a0
.....[<c01058df>] ..   ( <= do_IRQ+0x6f/0xb0)
.. [<c013c5eb>] .... print_traces+0x1b/0x60
.....[<c0104183>] ..   ( <= dump_stack+0x23/0x30)

BUG: sleeping function called from invalid context ksoftirqd/0(2) at 
kernel/rt.c:1448
in_atomic():1 [00000001], irqs_disabled():0
 [<c0104183>] dump_stack+0x23/0x30 (20)
 [<c011be08>] __might_sleep+0xd8/0xf0 (36)
 [<c0139008>] __spin_lock+0x38/0x60 (24)
 [<c013904d>] _spin_lock+0x1d/0x20 (16)
 [<c015089f>] kmem_cache_alloc+0x3f/0x140 (44)
 [<c01ea1e9>] sel_netif_lookup+0x69/0x160 (40)
 [<c01ea3a8>] sel_netif_sids+0x38/0xd0 (40)
 [<c01e6c13>] selinux_socket_sock_rcv_skb+0xc3/0x2a0 (152)
 [<c0326c72>] tcp_v4_rcv+0x502/0x950 (76)
 [<c030ab3c>] ip_local_deliver+0x6c/0x210 (36)
 [<c030af19>] ip_rcv+0x239/0x430 (56)
 [<c02ed257>] netif_receive_skb+0x147/0x180 (48)
 [<c02ed30f>] process_backlog+0x7f/0x110 (28)
 [<c02ed41c>] net_rx_action+0x7c/0x130 (32)
 [<c0124e37>] ___do_softirq+0x57/0xf0 (40)
 [<c0124f75>] _do_softirq+0x25/0x30 (8)
 [<c0125395>] ksoftirqd+0xa5/0x100 (28)
 [<c0135676>] kthread+0xa6/0xe0 (48)
 [<c0101329>] kernel_thread_helper+0x5/0xc (537116692)
---------------------------
| preempt count: 00000002 ]
| 2-level deep critical section nesting:
----------------------------------------
.. [<c0145d5b>] .... __do_IRQ+0xfb/0x1a0
.....[<c01058df>] ..   ( <= do_IRQ+0x6f/0xb0)
.. [<c013c5eb>] .... print_traces+0x1b/0x60
.....[<c0104183>] ..   ( <= dump_stack+0x23/0x30)


Additional info about the system/kernel/config can be found at 
http://www.sysex.net/testing/


Best Regards,
--William Weston <weston at sysex.net>


On Wed, 9 Feb 2005, Stephen Smalley wrote:

> On Tue, 2005-02-08 at 16:58, William Weston wrote:
> > Hi Ingo,
> > 
> > Great work on the -RT kernel!  Here's a status report from my Athlon box
> > w/ kernel -RT-2.6.11-rc3-V0.7.38-03, realtime-lsm-0.8.5, jack-0.99.48, 
> > alsa-1.0.8, and latencytest-0.5.5:
> <snip>
> > A couple BUGs are being logged (see below), but without any ill effect
> > other than taking up space on my /var.
> <snip>
> > Network interface (via rhine) startup triggers these two BUGs:
> > 
> > BUG: sleeping function called from invalid context ksoftirqd/0(2) at 
> > kernel/rt.c:1448
> > in_atomic():1 [00000001], irqs_disabled():0
> >  [<c0103e77>] dump_stack+0x17/0x20 (12)
> >  [<c0119f89>] __might_sleep+0xd9/0xf0 (40)
> >  [<c0134816>] __spin_lock+0x36/0x50 (24)
> >  [<c0147914>] kmem_cache_alloc+0x34/0x120 (44)
> >  [<c01d3143>] sel_netif_lookup+0x63/0x150 (28)
> >  [<c01d32cd>] sel_netif_sids+0x2d/0xb0 (28)
> >  [<c01d01bc>] selinux_socket_sock_rcv_skb+0xac/0x230 (144)
> 
> I'm not sure I understand, as sel_netif_lookup passes GFP_ATOMIC to
> kmalloc.
> 
> >  [<c02fd248>] udp_queue_rcv_skb+0xb8/0x280 (28)
> >  [<c02fd8e2>] udp_rcv+0x192/0x3e0 (100)
> >  [<c02dc224>] ip_local_deliver+0x64/0x1c0 (32)
> >  [<c02dc595>] ip_rcv+0x215/0x3f0 (56)
> >  [<c02c201c>] netif_receive_skb+0x12c/0x160 (40)
> >  [<c02c20ce>] process_backlog+0x7e/0x110 (32)
> >  [<c02c21d2>] net_rx_action+0x72/0x130 (24)
> >  [<c0122428>] ___do_softirq+0x48/0xd0 (40)
> >  [<c012254b>] _do_softirq+0x1b/0x30 (8)
> >  [<c0122920>] ksoftirqd+0xa0/0xf0 (28)
> >  [<c01312fb>] kthread+0x8b/0xc0 (36)
> >  [<c01012f5>] kernel_thread_helper+0x5/0x10 (537116692)
> > ---------------------------
> > | preempt count: 00000002 ]
> > | 2-level deep critical section nesting:
> > ----------------------------------------
> > .. [<c013dd3f>] .... __do_IRQ+0xef/0x180
> > .....[<c0105306>] ..   ( <= do_IRQ+0x56/0xa0)
> > .. [<c0135240>] .... print_traces+0x10/0x40
> > .....[<c0103e77>] ..   ( <= dump_stack+0x17/0x20)
> 
> -- 
> Stephen Smalley <sds@epoch.ncsc.mil>
> National Security Agency

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-10  2:13     ` William Weston
@ 2005-02-10  7:52       ` Ingo Molnar
  2005-02-10 20:21         ` George Anzinger
  2005-03-03 19:36         ` [patch] Real-Time Preemption, deactivate() scheduling issue Eugeny S. Mints
  0 siblings, 2 replies; 125+ messages in thread
From: Ingo Molnar @ 2005-02-10  7:52 UTC (permalink / raw)
  To: William Weston; +Cc: linux-kernel


* William Weston <weston@lysdexia.org> wrote:

> > what is the longest wakeup latency the tracer shows? You can start the
> > measurement anew via:
> > 
> > 	echo 0 > /proc/sys/kernel/preempt_max_latency
> 
> Max latency is in the realm of 13-18 after runs of jack_test4.1.

that's 13-18 microsecond worst-case delay from point of wakeup to the
point the woken up task has been context-switched to - pretty good for a
generic OS ;-)
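
(as a rough illustration of what that number measures, here is a minimal
userspace sketch. It is not the tracer's code: note_wakeup_latency(),
max_latency_ns and the nanosecond timestamps are made-up names and
assumptions, only the shape of the bookkeeping is meant to match.)

```c
#include <stdint.h>
#include <stdio.h>

/* Worst wakeup latency seen so far, in nanoseconds. Zero restarts the
 * measurement. Hypothetical name, not a kernel variable. */
static uint64_t max_latency_ns;

/*
 * woken_ns:       timestamp taken when the task was woken up
 * switched_in_ns: timestamp taken when it was actually context-switched to
 *
 * Returns 1 and logs a line if this wakeup sets a new maximum, 0 otherwise.
 */
int note_wakeup_latency(const char *comm, int pid,
			uint64_t woken_ns, uint64_t switched_in_ns)
{
	uint64_t delta = switched_in_ns - woken_ns;

	if (delta <= max_latency_ns)
		return 0;

	max_latency_ns = delta;
	printf("(%16s-%-5d): new %llu us maximum-latency wakeup.\n",
	       comm, pid, (unsigned long long)(delta / 1000));
	return 1;
}
```

setting max_latency_ns back to 0 in this sketch plays the role of the
'echo 0 > /proc/sys/kernel/preempt_max_latency' reset quoted above.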

> See http://www.sysex.net/testing/ for all of the test results and
> system info on a 2.6.11-rc3-RT-V0.7.38-06 kernel.

your latency traces look perfectly fine, and the jack_test results look
good too.

> Now that the priorities are tuned, I get no xruns while running
> wmcube, compiling a kernel, and running latencytest or jack_test4.1.

ah, very good! Now that the setup is properly tuned for audio latencies,
you might want to try to push up the number of jack_test clients again,
to see how far you can go. Right now there's a ~50% DSP load with 14
clients, so maybe you can push it up to 20 clients. (for this test
you'll likely want to turn off all options in the 'Kernel Hacking' menu
- they increase overhead. Otherwise you probably want to run with the
current options, so that you can send me BUG and latency traces ;) )

> This patch does fix the MIDI playback BUG I was seeing.

ok.

	Ingo

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-10  7:52       ` Ingo Molnar
@ 2005-02-10 20:21         ` George Anzinger
  2005-02-10 20:40           ` Ingo Molnar
  2005-02-11  0:09           ` Sven Dietrich
  2005-03-03 19:36         ` [patch] Real-Time Preemption, deactivate() scheduling issue Eugeny S. Mints
  1 sibling, 2 replies; 125+ messages in thread
From: George Anzinger @ 2005-02-10 20:21 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: William Weston, linux-kernel

If I want to write a patch that will work with or without the RT patch applied 
is the following enough?

#ifndef RAW_SPIN_LOCK_UNLOCKED
typedef spinlock_t raw_spinlock_t;
#define RAW_SPIN_LOCK_UNLOCKED SPIN_LOCK_UNLOCKED
#endif


-- 
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/


^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-10 20:21         ` George Anzinger
@ 2005-02-10 20:40           ` Ingo Molnar
  2005-02-10 21:05             ` George Anzinger
  2005-02-11  0:09           ` Sven Dietrich
  1 sibling, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-02-10 20:40 UTC (permalink / raw)
  To: George Anzinger; +Cc: William Weston, linux-kernel


* George Anzinger <george@mvista.com> wrote:

> If I want to write a patch that will work with or without the RT patch 
> applied is the following enough?
> 
> #ifndef RAW_SPIN_LOCK_UNLOCKED
> typedef spinlock_t raw_spinlock_t;
> #define RAW_SPIN_LOCK_UNLOCKED SPIN_LOCK_UNLOCKED
> #endif

yeah. (but you should rather use DEFINE_SPINLOCK/DEFINE_RAW_SPINLOCK)
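
(a minimal sketch of the DEFINE_* variant, written so that it compiles in
userspace. The spinlock_t, SPIN_LOCK_UNLOCKED and DEFINE_SPINLOCK stand-ins
at the top are mock-ups, not the kernel definitions, and my_lock is a
hypothetical lock name. The shim itself assumes DEFINE_RAW_SPINLOCK is a
macro that only exists when the -RT patch is applied.)

```c
/*
 * Mock-ups so the shim can be compiled outside a kernel tree. In a real
 * patch these come from <linux/spinlock.h>.
 */
typedef struct { volatile int locked; } spinlock_t;
#define SPIN_LOCK_UNLOCKED	{ 0 }
#define DEFINE_SPINLOCK(x)	spinlock_t x = SPIN_LOCK_UNLOCKED

/*
 * Compat shim: without -RT there is no raw type, so map the raw names
 * onto the ordinary spinlock. With -RT the #ifndef is skipped and the
 * patch's own definitions are used.
 */
#ifndef DEFINE_RAW_SPINLOCK
typedef spinlock_t raw_spinlock_t;
#define DEFINE_RAW_SPINLOCK(x)	DEFINE_SPINLOCK(x)
#endif

/* hypothetical lock that must stay a real spinlock under -RT */
static DEFINE_RAW_SPINLOCK(my_lock);

/* trivial accessor so the example has something observable */
int my_lock_is_free(void)
{
	return my_lock.locked == 0;
}
```

the advantage over RAW_SPIN_LOCK_UNLOCKED is that the declaration and the
initializer stay in one place, so the shim only has to map one macro.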

	Ingo

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-10 20:40           ` Ingo Molnar
@ 2005-02-10 21:05             ` George Anzinger
  2005-02-11  8:34               ` Ingo Molnar
  0 siblings, 1 reply; 125+ messages in thread
From: George Anzinger @ 2005-02-10 21:05 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: William Weston, linux-kernel

I am seeing:
kernel/built-in.o(.text+0x4974): In function `copy_mm':
/usr/src/cvs/mvl-kernel-26/makena/linux-2.6.10/kernel/fork.c:493: undefined 
reference to `__spin_is_locked'
kernel/built-in.o(.text+0x9f5a): In function `next_thread':
/usr/src/cvs/mvl-kernel-26/makena/linux-2.6.10/kernel/exit.c:877: undefined 
reference to `__raw_rwlock_is_locked'
net/built-in.o(.text+0x1258): In function `__sock_create':
/usr/src/cvs/mvl-kernel-26/makena/linux-2.6.10/net/socket.c:175: undefined 
reference to `__spin_is_locked'
net/built-in.o(.text+0x16b54): In function `dev_deactivate':
/usr/src/cvs/mvl-kernel-26/makena/linux-2.6.10/net/sched/sch_generic.c:594: 
undefined reference to `__spin_is_locked'
make[1]: *** [vmlinux] Error 1
make: *** [bzImage] Error 2


Possibly from:
#define __raw_spin_is_locked(x)	(*(volatile signed char *)(&(x)->lock) <= 0)
#define __raw_spin_unlock_wait(x) \
	do { barrier(); } while(__spin_is_locked(x))
in asm/spinlock.h

should that be __raw_spin_is_locked(x) instead?
-- 
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/


^ permalink raw reply	[flat|nested] 125+ messages in thread

* RE: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-10 20:21         ` George Anzinger
  2005-02-10 20:40           ` Ingo Molnar
@ 2005-02-11  0:09           ` Sven Dietrich
  2005-02-11  6:01             ` George Anzinger
  2005-02-11  8:28             ` Ingo Molnar
  1 sibling, 2 replies; 125+ messages in thread
From: Sven Dietrich @ 2005-02-11  0:09 UTC (permalink / raw)
  To: george, 'Ingo Molnar'; +Cc: 'William Weston', linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1218 bytes --]


Hi George,

you may want to use this for reference.

This patch adds a config option to allow you to select whether timer IRQ runs in thread or not.

I'm not totally happy with the #ifdefs, but it may make switching back and forth easier.

Sven


> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org 
> [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of 
> George Anzinger
> Sent: Thursday, February 10, 2005 12:21 PM
> To: Ingo Molnar
> Cc: William Weston; linux-kernel@vger.kernel.org
> Subject: Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
> 
> 
> If I want to write a patch that will work with or without the 
> RT patch applied 
> is the following enough?
> 
> #ifndef RAW_SPIN_LOCK_UNLOCKED
> typedef spinlock_t raw_spinlock_t;
> #define RAW_SPIN_LOCK_UNLOCKED SPIN_LOCK_UNLOCKED
> #endif
> 
> 
> -- 
> George Anzinger   george@mvista.com
> High-res-timers:  http://sourceforge.net/projects/high-res-timers/
> 
> -
> To unsubscribe from this list: send the line "unsubscribe 
> linux-kernel" in the body of a message to 
> majordomo@vger.kernel.org More majordomo info at  
> http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

[-- Attachment #2: common_timer_irqthread.patch --]
[-- Type: application/octet-stream, Size: 1956 bytes --]

Index: linux-2.6.10-Omap1710/include/linux/time.h
===================================================================
--- linux-2.6.10-Omap1710.orig/include/linux/time.h	2005-02-03 09:06:40.378530238 +0000
+++ linux-2.6.10-Omap1710/include/linux/time.h	2005-02-03 09:20:37.703894461 +0000
@@ -80,7 +80,20 @@
 
 extern struct timespec xtime;
 extern struct timespec wall_to_monotonic;
-extern raw_seqlock_t xtime_lock;
+
+#ifndef ARCH_HAVE_XTIME_LOCK
+
+ #ifdef PREEMPT_TIMER_IRQ
+  #define XTIME_LOCK_T seqlock_t
+  #define DECLARE_XTIME_LOCK DECLARE_SEQLOCK(xtime_lock)
+ #else
+  #define XTIME_LOCK_T raw_seqlock_t
+  #define DECLARE_XTIME_LOCK DECLARE_RAW_SEQLOCK(xtime_lock)
+ #endif 
+
+extern XTIME_LOCK_T xtime_lock;
+
+#endif
 
 static inline unsigned long get_seconds(void)
 { 
Index: linux-2.6.10-Omap1710/kernel/timer.c
===================================================================
--- linux-2.6.10-Omap1710.orig/kernel/timer.c	2005-02-03 09:06:40.379529900 +0000
+++ linux-2.6.10-Omap1710/kernel/timer.c	2005-02-03 09:52:42.418866172 +0000
@@ -943,7 +943,7 @@
  * playing with xtime and avenrun.
  */
 #ifndef ARCH_HAVE_XTIME_LOCK
-DECLARE_RAW_SEQLOCK(xtime_lock);
+DECLARE_XTIME_LOCK;
 
 EXPORT_SYMBOL(xtime_lock);
 #endif
Index: linux-2.6.10-Omap1710/lib/Kconfig.RT
===================================================================
--- linux-2.6.10-Omap1710.orig/lib/Kconfig.RT	2005-02-03 09:06:40.379529900 +0000
+++ linux-2.6.10-Omap1710/lib/Kconfig.RT	2005-02-03 09:06:49.185545306 +0000
@@ -119,6 +119,14 @@
 
 	  Say N if you are unsure.
 
+config PREEMPT_TIMER_IRQ
+	bool "Run timer IRQ in a thread"
+       	default y
+	depends on PREEMPT_HARDIRQS && ARM
+	help
+	This declares the xtime_lock as a mutex and allows 
+        running the timer interrupt in a thread.
+
 config SPINLOCK_BKL
 	bool "Old-Style Big Kernel Lock"
 	depends on (PREEMPT || SMP) && !PREEMPT_RT

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-11  0:09           ` Sven Dietrich
@ 2005-02-11  6:01             ` George Anzinger
  2005-02-11  8:28             ` Ingo Molnar
  1 sibling, 0 replies; 125+ messages in thread
From: George Anzinger @ 2005-02-11  6:01 UTC (permalink / raw)
  To: Sven Dietrich
  Cc: 'Ingo Molnar', 'William Weston', linux-kernel

Sven Dietrich wrote:
> Hi George,
> 
> you may want to use this for reference.
> 
> This patch adds a config option to allow you to select whether timer IRQ runs in thread or not.
> 
> I'm not totally happy with the #ifdefs, but it may make switching back and forth easier.

Thanks, but...

You are addressing a different problem than I.  I want to code the VST patch to 
work in a system with or without the RT patch (it is easy to work with the RT 
option on or off).  The problem is setting up the spin locks it needs.  My 
solution assumes that RAW_SPIN_LOCK_UNLOCKED will not be defined unless the RT 
patch is applied.

As to your patch, in most archs the timer interrupt does accounting which 
requires input on just who was interrupted on the interrupt.  This is lost when 
threading the timer IRQ.  I think it was problems of this sort that caused Ingo 
to back away...
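
To illustrate the accounting point, here is a rough userspace sketch. It 
is not kernel code: struct task, struct tick_regs and the three functions 
are hypothetical stand-ins, and the only thing assumed about the real 
kernel is that the tick is charged to the current task based on the mode 
found in the interrupt's regs.

```c
/* Hypothetical stand-ins for the kernel's task_struct and pt_regs. */
struct task {
	const char *comm;
	unsigned long utime;	/* ticks charged as user time */
	unsigned long stime;	/* ticks charged as system time */
};

struct tick_regs {
	int user_mode;		/* 1 if the tick interrupted user-space code */
};

/* Charge one tick to whatever task and mode the caller hands in. */
void account_tick(struct task *cur, const struct tick_regs *regs)
{
	if (regs->user_mode)
		cur->utime++;
	else
		cur->stime++;
}

/* Hard-IRQ case: the task and regs still describe who was interrupted. */
void tick_in_hardirq(struct task *interrupted, const struct tick_regs *regs)
{
	account_tick(interrupted, regs);
}

/*
 * Threaded case: by the time the handler runs, the current task is the
 * IRQ thread itself, running in kernel mode, so the tick lands on the
 * wrong task unless the interrupted context is saved separately.
 */
void tick_in_irq_thread(struct task *irq_thread)
{
	struct tick_regs thread_regs = { .user_mode = 0 };

	account_tick(irq_thread, &thread_regs);
}
```

In the threaded variant the interrupted task's counters never move, which 
is the lost information described above.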

George

PS
By the way, your mailer (Microsoft Outlook????) set up your attachment in such a 
way that my mailer would not inline it.  You might want to look into this.
> 
> Sven
> 
> 
> 
>>-----Original Message-----
>>From: linux-kernel-owner@vger.kernel.org 
>>[mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of 
>>George Anzinger
>>Sent: Thursday, February 10, 2005 12:21 PM
>>To: Ingo Molnar
>>Cc: William Weston; linux-kernel@vger.kernel.org
>>Subject: Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
>>
>>
>>If I want to write a patch that will work with or without the 
>>RT patch applied 
>>is the following enough?
>>
>>#ifndef RAW_SPIN_LOCK_UNLOCKED
>>typedef spinlock_t raw_spinlock_t;
>>#define RAW_SPIN_LOCK_UNLOCKED SPIN_LOCK_UNLOCKED
>>#endif
>>
>>
>>-- 
>>George Anzinger   george@mvista.com
>>High-res-timers:  http://sourceforge.net/projects/high-res-timers/
>>

-- 
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/


^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-11  0:09           ` Sven Dietrich
  2005-02-11  6:01             ` George Anzinger
@ 2005-02-11  8:28             ` Ingo Molnar
  2005-02-11  9:53               ` Sven Dietrich
  1 sibling, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-02-11  8:28 UTC (permalink / raw)
  To: Sven Dietrich; +Cc: george, 'William Weston', linux-kernel


* Sven Dietrich <sdietrich@mvista.com> wrote:

> This patch adds a config option to allow you to select whether timer
> IRQ runs in thread or not.

this patch only changes xtime_lock back and forth - it does in no way
impact the 'threadedness' of the timer IRQ. (it does not move the timer
IRQ into an interrupt thread.)

nor do we really want to make it configurable - it's non-threaded right
now and we'll see what effect this has on the worst-case latencies. 

	Ingo

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-10 21:05             ` George Anzinger
@ 2005-02-11  8:34               ` Ingo Molnar
  2005-02-11  9:38                 ` Sven Dietrich
  0 siblings, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-02-11  8:34 UTC (permalink / raw)
  To: George Anzinger; +Cc: William Weston, linux-kernel


* George Anzinger <george@mvista.com> wrote:

> Possibly from:
> #define __raw_spin_is_locked(x)	(*(volatile signed char *)(&(x)->lock) <= 0)
> #define __raw_spin_unlock_wait(x) \
> 	do { barrier(); } while(__spin_is_locked(x))
> in asm/spinlock.h
> 
> should that be __raw_spin_is_locked(x) instead?

yeah. Is this in the ARM patch? I haven't applied the ARM patch yet,
waiting to see Thomas Gleixner's generic-hardirq based one. (which is
more compelling from an architectural and long-term maintenance POV -
but also more work to address all of RMK's concerns.)

	Ingo

^ permalink raw reply	[flat|nested] 125+ messages in thread

* RE: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-11  8:34               ` Ingo Molnar
@ 2005-02-11  9:38                 ` Sven Dietrich
  2005-02-11  9:42                   ` Ingo Molnar
  0 siblings, 1 reply; 125+ messages in thread
From: Sven Dietrich @ 2005-02-11  9:38 UTC (permalink / raw)
  To: 'Ingo Molnar', 'George Anzinger'
  Cc: 'William Weston', linux-kernel


No, this is not in arm. Here is the patch.

Index: linux-2.6.10/include/asm-i386/spinlock.h
===================================================================
--- linux-2.6.10.orig/include/asm-i386/spinlock.h      2005-02-11 09:25:39.224240321 +0000
+++ linux-2.6.10/include/asm-i386/spinlock.h   2005-02-11 09:25:58.006812173 +0000
@@ -30,7 +30,7 @@

 #define __raw_spin_is_locked(x)        (*(volatile signed char *)(&(x)->lock) <= 0)
 #define __raw_spin_unlock_wait(x) \
-       do { barrier(); } while(__spin_is_locked(x))
+       do { barrier(); } while(__raw_spin_is_locked(x))

 #define spin_lock_string \
        "\n1:\t" \




> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org 
> [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Ingo Molnar
> Sent: Friday, February 11, 2005 12:34 AM
> To: George Anzinger
> Cc: William Weston; linux-kernel@vger.kernel.org
> Subject: Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
> 
> 
> 
> * George Anzinger <george@mvista.com> wrote:
> 
> > Possibly from:
> > #define __raw_spin_is_locked(x)	(*(volatile signed char *)(&(x)->lock) <= 0)
> > #define __raw_spin_unlock_wait(x) \
> > 	do { barrier(); } while(__spin_is_locked(x))
> > in asm/spinlock.h
> > 
> > should that be __raw_spin_is_locked(x) instead?
> 
> yeah. Is this in the ARM patch? I haven't applied the ARM 
> patch yet, waiting to see Thomas Gleixner's generic-hardirq 
> based one. (which is more compelling from an architectural 
> and long-term maintenance POV - but also more work to 
> address all of RMK's concerns.)
> 
> 	Ingo


^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-11  9:38                 ` Sven Dietrich
@ 2005-02-11  9:42                   ` Ingo Molnar
  0 siblings, 0 replies; 125+ messages in thread
From: Ingo Molnar @ 2005-02-11  9:42 UTC (permalink / raw)
  To: Sven Dietrich
  Cc: 'George Anzinger', 'William Weston', linux-kernel


* Sven Dietrich <sdietrich@mvista.com> wrote:

> No, this is not in arm. Here is the patch.
> 
> Index: linux-2.6.10/include/asm-i386/spinlock.h

what version do you have? The current released patch is
2.6.11-rc3-V0.7.38-10.

	Ingo

^ permalink raw reply	[flat|nested] 125+ messages in thread

* RE: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-11  8:28             ` Ingo Molnar
@ 2005-02-11  9:53               ` Sven Dietrich
  2005-02-11 10:04                 ` Ingo Molnar
  0 siblings, 1 reply; 125+ messages in thread
From: Sven Dietrich @ 2005-02-11  9:53 UTC (permalink / raw)
  To: 'Ingo Molnar'; +Cc: george, 'William Weston', linux-kernel



Ingo wrote:

> 
> * Sven Dietrich <sdietrich@mvista.com> wrote:
> 
> > This patch adds a config option to allow you to select 
> whether timer 
> > IRQ runs in thread or not.
> 
> this patch only changes xtime_lock back and forth - it does 
> in no way impact the 'threadedness' of the timer IRQ. (it 
> does not move the timer IRQ into an interrupt thread.)
> 
> nor do we really want to make it configurable - it's 
> non-threaded right now and we'll see what effect this has on 
> the worst-case latencies. 
> 
> 	Ingo
> 

It's clear that there are all sorts of issues 
with process accounting and other race conditions
associated with running the timer in a thread.

The timer IRQ does have a noticeable impact 
especially on the slower CPUs. In this domain,
precise process time accounting may not be 
all that important, as long as the scheduler
does not get confused, and that lone NODELAY
IRQ doesn't get delayed (as much).

It would be nice if some of the process 
accounting could be pipelined or deferred,
but I don't have those answers right now.

Sven


^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-11  9:53               ` Sven Dietrich
@ 2005-02-11 10:04                 ` Ingo Molnar
  2005-02-11 21:49                   ` Steven Rostedt
  0 siblings, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-02-11 10:04 UTC (permalink / raw)
  To: Sven Dietrich; +Cc: george, 'William Weston', linux-kernel


* Sven Dietrich <sdietrich@mvista.com> wrote:

> > this patch only changes xtime_lock back and forth - it does 
> > in no way impact the 'threadedness' of the timer IRQ. (it 
> > does not move the timer IRQ into an interrupt thread.)
> > 
> > nor do we really want to make it configurable - it's 
> > non-threaded right now and we'll see what effect this has on 
> > the worst-case latencies. 
> 
> It's clear that there are all sorts of issues with process accounting
> and other race conditions associated with running the timer in a
> thread.
> 
> The timer IRQ does have a noticeable impact especially on the slower
> CPUs. In this domain, precise process time accounting may not be all
> that important, as long as the scheduler does not get confused, and
> that lone NODELAY IRQ doesn't get delayed (as much).

well, i saved the delta when i removed threaded timer IRQs, find the
patch below, apply it with -R to -RT-V0.7.37-00 to get threaded irqs
back on x86.

Right now i don't plan to reintroduce threaded timer IRQs because it
causes architecture merging problems (e.g. on x64 and MIPS) and also
caused artifacts. So the complexity vs. latency benefit is not all that
clear, especially at this stage. Also note that there were unsolved
problems wrt. time handling in the threaded setup.

(we can try it again later on. But if we do so it will have to be an
all-or-nothing item - #ifdef hell and behavioral divergence is to be
avoided.)

	Ingo

--- linux.old/Makefile	
+++ linux.new/Makefile	
@@ -1,7 +1,7 @@
 VERSION = 2
 PATCHLEVEL = 6
 SUBLEVEL = 11
-EXTRAVERSION =-rc2-RT-V0.7.36-06
+EXTRAVERSION =-rc2-RT-V0.7.37-00
 NAME=Woozy Numbat
 
 # *DOCUMENTATION*
--- linux.old/arch/i386/kernel/irq.c	
+++ linux.new/arch/i386/kernel/irq.c	
@@ -70,8 +70,6 @@ fastcall notrace unsigned int do_IRQ(str
 		}
 	}
 #endif
-	if (unlikely(!irq))
-		direct_timer_interrupt(regs);
 
 #ifdef CONFIG_4KSTACKS
 
--- linux.old/arch/i386/kernel/time.c	
+++ linux.new/arch/i386/kernel/time.c	
@@ -82,7 +82,7 @@ unsigned long cpu_khz;	/* Detected as we
 
 extern unsigned long wall_jiffies;
 
-DEFINE_SPINLOCK(rtc_lock);
+DEFINE_RAW_SPINLOCK(rtc_lock);
 
 #include <asm/i8253.h>
 
@@ -217,19 +217,6 @@ unsigned long notrace profile_pc(struct 
 EXPORT_SYMBOL(profile_pc);
 #endif
 
-#ifdef CONFIG_PREEMPT_HARDIRQS
-
-/*
- * If the timer is redirected then this is the minimal
- * interrupt-context processing we have to do:
- */
-void direct_timer_interrupt(struct pt_regs *regs)
-{
-	do_timer_interrupt_hook(regs);
-}
-
-#endif
-
 /*
  * timer_interrupt() needs to keep up the real-time clock,
  * as well as call the "do_timer()" routine every clocktick
@@ -254,9 +241,7 @@ static inline void do_timer_interrupt(in
 	}
 #endif
 
-#ifndef CONFIG_PREEMPT_HARDIRQS
 	do_timer_interrupt_hook(regs);
-#endif
 
 	/*
 	 * If we have an externally synchronized Linux clock, then update
@@ -313,7 +298,6 @@ irqreturn_t timer_interrupt(int irq, voi
 	write_seqlock(&xtime_lock);
 
 	cur_timer->mark_offset();
-	do_timer(regs);
  
 	do_timer_interrupt(irq, NULL, regs);
 
--- linux.old/arch/i386/mach-default/setup.c	
+++ linux.new/arch/i386/mach-default/setup.c	
@@ -71,7 +71,7 @@ void __init trap_init_hook(void)
 {
 }
 
-static struct irqaction irq0  = { timer_interrupt, SA_INTERRUPT, CPU_MASK_NONE, "timer", NULL, NULL};
+static struct irqaction irq0  = { timer_interrupt, SA_INTERRUPT | SA_NODELAY, CPU_MASK_NONE, "timer", NULL, NULL};
 
 /**
  * time_init_hook - do any specific initialisations for the system timer.
--- linux.old/drivers/char/rtc.c	
+++ linux.new/drivers/char/rtc.c	
@@ -380,6 +380,8 @@ static inline void rtc_close_event(void)
 
 irqreturn_t rtc_interrupt(int irq, void *dev_id, struct pt_regs *regs)
 {
+	int mod;
+
 	/*
 	 *	Can be an alarm interrupt, update complete interrupt,
 	 *	or a periodic interrupt. We store the status in the
@@ -401,10 +403,13 @@ irqreturn_t rtc_interrupt(int irq, void 
 		rtc_irq_data |= (CMOS_READ(RTC_INTR_FLAGS) & 0xF0);
 	}
 
+	mod = 0;
 	if (rtc_status & RTC_TIMER_ON)
-		mod_timer(&rtc_irq_timer, jiffies + HZ/rtc_freq + 2*HZ/100);
+		mod = 1;
 
 	spin_unlock (&rtc_lock);
+	if (mod)
+		mod_timer(&rtc_irq_timer, jiffies + HZ/rtc_freq + 2*HZ/100);
 
 	/* Now do the rest of the actions */
 	spin_lock(&rtc_task_lock);
@@ -569,8 +574,8 @@ static int rtc_do_ioctl(unsigned int cmd
 		if (rtc_status & RTC_TIMER_ON) {
 			spin_lock_irq (&rtc_lock);
 			rtc_status &= ~RTC_TIMER_ON;
-			del_timer(&rtc_irq_timer);
 			spin_unlock_irq (&rtc_lock);
+			del_timer(&rtc_irq_timer);
 		}
 		return 0;
 	}
@@ -588,9 +593,9 @@ static int rtc_do_ioctl(unsigned int cmd
 		if (!(rtc_status & RTC_TIMER_ON)) {
 			spin_lock_irq (&rtc_lock);
 			rtc_irq_timer.expires = jiffies + HZ/rtc_freq + 2*HZ/100;
-			add_timer(&rtc_irq_timer);
 			rtc_status |= RTC_TIMER_ON;
 			spin_unlock_irq (&rtc_lock);
+			add_timer(&rtc_irq_timer);
 		}
 		set_rtc_irq_bit(RTC_PIE);
 		return 0;
@@ -882,6 +887,7 @@ static int rtc_release(struct inode *ino
 {
 #ifdef RTC_IRQ
 	unsigned char tmp;
+	int del;
 
 	if (rtc_has_irq == 0)
 		goto no_irq;
@@ -900,11 +906,14 @@ static int rtc_release(struct inode *ino
 		CMOS_WRITE(tmp, RTC_CONTROL);
 		CMOS_READ(RTC_INTR_FLAGS);
 	}
+	del = 0;
 	if (rtc_status & RTC_TIMER_ON) {
 		rtc_status &= ~RTC_TIMER_ON;
-		del_timer(&rtc_irq_timer);
+		del = 1;
 	}
 	spin_unlock_irq(&rtc_lock);
+	if (del)
+		del_timer(&rtc_irq_timer);
 
 	if (file->f_flags & FASYNC) {
 		rtc_fasync (-1, file, 0);
@@ -981,6 +990,7 @@ int rtc_unregister(rtc_task_t *task)
 	return -EIO;
 #else
 	unsigned char tmp;
+	int del;
 
 	spin_lock_irq(&rtc_lock);
 	spin_lock(&rtc_task_lock);
@@ -1000,12 +1010,15 @@ int rtc_unregister(rtc_task_t *task)
 		CMOS_WRITE(tmp, RTC_CONTROL);
 		CMOS_READ(RTC_INTR_FLAGS);
 	}
+	del = 0;
 	if (rtc_status & RTC_TIMER_ON) {
 		rtc_status &= ~RTC_TIMER_ON;
-		del_timer(&rtc_irq_timer);
+		del = 1;
 	}
 	rtc_status &= ~RTC_IS_OPEN;
 	spin_unlock(&rtc_task_lock);
+	if (del)
+		del_timer(&rtc_irq_timer);
 	spin_unlock_irq(&rtc_lock);
 	return 0;
 #endif
@@ -1254,6 +1267,7 @@ module_exit(rtc_exit);
 static void rtc_dropped_irq(unsigned long data)
 {
 	unsigned long freq;
+	int mod;
 
 	spin_lock_irq (&rtc_lock);
 
@@ -1263,8 +1277,9 @@ static void rtc_dropped_irq(unsigned lon
 	}
 
 	/* Just in case someone disabled the timer from behind our back... */
+	mod = 0;
 	if (rtc_status & RTC_TIMER_ON)
-		mod_timer(&rtc_irq_timer, jiffies + HZ/rtc_freq + 2*HZ/100);
+		mod = 1;
 
 	rtc_irq_data += ((rtc_freq/HZ)<<8);
 	rtc_irq_data &= ~0xff;
@@ -1273,6 +1288,8 @@ static void rtc_dropped_irq(unsigned lon
 	freq = rtc_freq;
 
 	spin_unlock_irq(&rtc_lock);
+	if (mod)
+		mod_timer(&rtc_irq_timer, jiffies + HZ/rtc_freq + 2*HZ/100);
 
 	printk(KERN_WARNING "rtc: lost some interrupts at %ldHz.\n", freq);
 
--- linux.old/include/asm-i386/mach-default/do_timer.h	
+++ linux.new/include/asm-i386/mach-default/do_timer.h	
@@ -16,6 +16,7 @@
 
 static inline void do_timer_interrupt_hook(struct pt_regs *regs)
 {
+	do_timer(regs);
 #ifndef CONFIG_SMP
 	update_process_times(user_mode(regs));
 #endif
--- linux.old/include/linux/mc146818rtc.h	
+++ linux.new/include/linux/mc146818rtc.h	
@@ -17,7 +17,7 @@
 
 #ifdef __KERNEL__
 #include <linux/spinlock.h>		/* spinlock_t */
-extern spinlock_t rtc_lock;		/* serialize CMOS RAM access */
+extern raw_spinlock_t rtc_lock;		/* serialize CMOS RAM access */
 #endif
 
 /**********************************************************************
--- linux.old/include/linux/sched.h	
+++ linux.new/include/linux/sched.h	
@@ -39,10 +39,8 @@ extern int softirq_preemption;
 #endif
 #ifdef CONFIG_PREEMPT_HARDIRQS
 extern int hardirq_preemption;
-extern void direct_timer_interrupt(struct pt_regs *regs);
 #else
 # define hardirq_preemption 0
-# define direct_timer_interrupt(regs) do { } while (0)
 #endif
 
 #ifdef CONFIG_PREEMPT_BKL
--- linux.old/include/linux/time.h	
+++ linux.new/include/linux/time.h	
@@ -80,7 +80,7 @@ mktime (unsigned int year, unsigned int 
 
 extern struct timespec xtime;
 extern struct timespec wall_to_monotonic;
-extern seqlock_t xtime_lock;
+extern raw_seqlock_t xtime_lock;
 
 static inline unsigned long get_seconds(void)
 { 
--- linux.old/kernel/timer.c	
+++ linux.new/kernel/timer.c	
@@ -852,14 +852,7 @@ void update_process_times(int user_tick)
  */
 static unsigned long count_active_tasks(void)
 {
-#ifdef CONFIG_PREEMPT_RT
-	/*
-	 * -1 for the timer IRQ thread:
-	 */
-	return (nr_running() - 1 + nr_uninterruptible()) * FIXED_1;
-#else
 	return (nr_running() + nr_uninterruptible()) * FIXED_1;
-#endif
 }
 
 /*
@@ -899,7 +892,7 @@ unsigned long wall_jiffies = INITIAL_JIF
  * playing with xtime and avenrun.
  */
 #ifndef ARCH_HAVE_XTIME_LOCK
-DECLARE_SEQLOCK(xtime_lock);
+DECLARE_RAW_SEQLOCK(xtime_lock);
 
 EXPORT_SYMBOL(xtime_lock);
 #endif


* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-11 10:04                 ` Ingo Molnar
@ 2005-02-11 21:49                   ` Steven Rostedt
  2005-02-13 12:59                     ` Ingo Molnar
  0 siblings, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-02-11 21:49 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: LKML

Ingo,

Here's a trivial patch to keep others from freaking out when they see in
a show_trace that most of their processes are TASK_UNINTERRUPTIBLE.

Index: kernel/sched.c
===================================================================
--- kernel/sched.c	(revision 75)
+++ kernel/sched.c	(working copy)
@@ -4489,7 +4489,7 @@
 	task_t *relative;
 	unsigned state;
 	unsigned long free = 0;
-	static const char *stat_nam[] = { "R", "S", "D", "T", "t", "Z", "X" };
+	static const char *stat_nam[] = { "R", "M", "S", "D", "T", "t", "Z", "X" };
 
 	printk("%-13.13s [%p]", p->comm, p);
 	state = p->state ? __ffs(p->state) + 1 : 0;


I figure that "M" would be a good fit for TASK_RUNNING_MUTEX.
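
For anyone following along, here is a minimal userspace sketch of why the old table mislabels tasks. It assumes the -RT tree gives TASK_RUNNING_MUTEX the value 1 and moves TASK_INTERRUPTIBLE and TASK_UNINTERRUPTIBLE up to 2 and 4; those values are an assumption for illustration and are not taken from the patch above. The helper names (lowest_bit, state_index, old_name, new_name) are hypothetical, and the GCC/Clang builtin __builtin_ctzl() stands in for the kernel's __ffs().

```c
#include <stdio.h>

/* Assumed -RT state values, for illustration only. */
#define TASK_RUNNING		0UL
#define TASK_RUNNING_MUTEX	1UL
#define TASK_INTERRUPTIBLE	2UL
#define TASK_UNINTERRUPTIBLE	4UL

/* The table before and after the patch. */
static const char *old_nam[] = { "R", "S", "D", "T", "t", "Z", "X" };
static const char *new_nam[] = { "R", "M", "S", "D", "T", "t", "Z", "X" };

/* Userspace stand-in for __ffs(): index of the lowest set bit. */
static unsigned int lowest_bit(unsigned long word)
{
	return (unsigned int)__builtin_ctzl(word);
}

/* Same index computation as the show_task() line in the patch context. */
static unsigned int state_index(unsigned long state)
{
	return state ? lowest_bit(state) + 1 : 0;
}

/* Letter printed with the old table. */
static const char *old_name(unsigned long state)
{
	return old_nam[state_index(state)];
}

/* Letter printed with the patched table. */
static const char *new_name(unsigned long state)
{
	return new_nam[state_index(state)];
}
```

Under those assumed values, old_name(TASK_INTERRUPTIBLE) yields "D" while new_name(TASK_INTERRUPTIBLE) yields "S", which matches the symptom of ordinary sleeping tasks showing up as uninterruptible.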

-- Steve




* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-11 21:49                   ` Steven Rostedt
@ 2005-02-13 12:59                     ` Ingo Molnar
  2005-02-13 15:11                       ` Steven Rostedt
  0 siblings, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-02-13 12:59 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML


* Steven Rostedt <rostedt@goodmis.org> wrote:

> Ingo,
> 
> Here's a trivial patch to keep others from freaking out when they see
> in a show_trace that most of their processes are TASK_UNINTERRUPTIBLE.

thanks, applied it to -39-00.

> -	static const char *stat_nam[] = { "R", "S", "D", "T", "t", "Z", "X" };
> +	static const char *stat_nam[] = { "R", "M", "S", "D", "T", "t", "Z", "X" };

> I figure that "M" would be a good fit for TASK_RUNNING_MUTEX.

yeah - it's "M" already in fs/proc/array.c, but i missed the sched.c
case.

	Ingo


* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-13 12:59                     ` Ingo Molnar
@ 2005-02-13 15:11                       ` Steven Rostedt
  0 siblings, 0 replies; 125+ messages in thread
From: Steven Rostedt @ 2005-02-13 15:11 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: LKML

On Sun, 2005-02-13 at 13:59 +0100, Ingo Molnar wrote:

> yeah - it's "M" already in fs/proc/array.c, but i missed the sched.c
> case.
> 

You also missed the kernel/rt.c case :-)

-- Steve


Index: kernel/rt.c
===================================================================
--- kernel/rt.c	(revision 75)
+++ kernel/rt.c	(working copy)
@@ -207,6 +207,7 @@
 {
 	switch (p->state) {
 	case TASK_RUNNING:		printk("R"); break;
+	case TASK_RUNNING_MUTEX:	printk("M"); break;
 	case TASK_INTERRUPTIBLE:	printk("s"); break;
 	case TASK_UNINTERRUPTIBLE:	printk("D"); break;
 	case TASK_STOPPED:		printk("T"); break;


This is still from the 38-06.





* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-04 10:03 [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 Ingo Molnar
                   ` (4 preceding siblings ...)
  2005-02-08 21:58 ` William Weston
@ 2005-02-19  5:08 ` Lee Revell
  2005-02-19  6:47   ` Lee Revell
                     ` (2 more replies)
  2005-03-11  9:28 ` [patch] Real-Time Preemption, -RT-2.6.11-final-V0.7.40-00 Ingo Molnar
  6 siblings, 3 replies; 125+ messages in thread
From: Lee Revell @ 2005-02-19  5:08 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

On Fri, 2005-02-04 at 11:03 +0100, Ingo Molnar wrote:
>   http://redhat.com/~mingo/realtime-preempt/
> 

Testing on an all-SCSI 1.3GHz Athlon XP system, I am seeing very long
latencies in the journalling code with 2.6.11-rc4-RT-V0.7.39-02.

preemption latency trace v1.1.4 on 2.6.11-rc4-RT-V0.7.39-02
--------------------------------------------------------------------
 latency: 713 µs, #3455/3455, CPU#0 | (M:preempt VP:0, KP:1, SP:1 HP:1 #P:1)
    -----------------
    | task: ksoftirqd/0-2 (uid:0 nice:-10 policy:0 rt_prio:0)
    -----------------

                 _------=> CPU#        
                / _-----=> irqs-off        
               | / _----=> need-resched    
               || / _---=> hardirq/softirq 
               ||| / _--=> preempt-depth   
               |||| /        
               |||||     delay        
   cmd     pid ||||| time  |   caller      
      \   /    |||||   \   |   /        
kjournal-2478  0dn.4    0µs!: <756f6a6b> (<6c616e72>)
kjournal-2478  0dn.4    0µs : __trace_start_sched_wakeup (try_to_wake_up)
kjournal-2478  0dn.3    0µs : preempt_schedule (try_to_wake_up)
kjournal-2478  0dn.3    0µs : try_to_wake_up <<...>-2> (69 73): 
kjournal-2478  0dn.2    0µs : preempt_schedule (try_to_wake_up)
kjournal-2478  0dn.2    0µs : wake_up_process (do_softirq)
kjournal-2478  0dn.1    1µs < (1)

The repeating pattern is 8 of these:

kjournal-2478  0.n.1    1µs : inverted_lock (journal_commit_transaction)
kjournal-2478  0.n.1    1µs : __journal_unfile_buffer (journal_commit_transaction)
kjournal-2478  0.n.1    1µs : journal_remove_journal_head (journal_commit_transaction)
kjournal-2478  0.n.1    1µs : __journal_remove_journal_head (journal_remove_journal_head)
kjournal-2478  0.n.1    1µs : __brelse (__journal_remove_journal_head)
kjournal-2478  0.n.1    1µs : journal_free_journal_head (journal_remove_journal_head)
kjournal-2478  0.n.1    2µs : kmem_cache_free (journal_free_journal_head)

and one of these:

kjournal-2478  0dn.1    9µs : cache_flusharray (kmem_cache_free)
kjournal-2478  0dn.2    9µs : free_block (cache_flusharray)
kjournal-2478  0dn.1   11µs : preempt_schedule (cache_flusharray)
kjournal-2478  0dn.1   11µs : memmove (cache_flusharray)
kjournal-2478  0dn.1   11µs : memcpy (memmove)

etc.  Finally:

kjournal-2478  0dn.1  704µs : cache_flusharray (kmem_cache_free)
kjournal-2478  0dn.2  704µs+: free_block (cache_flusharray)
kjournal-2478  0dn.1  707µs : preempt_schedule (cache_flusharray)
kjournal-2478  0dn.1  707µs : memmove (cache_flusharray)
kjournal-2478  0dn.1  707µs : memcpy (memmove)
kjournal-2478  0.n.1  708µs : inverted_lock (journal_commit_transaction)
kjournal-2478  0.n.1  708µs : __journal_unfile_buffer (journal_commit_transaction)
kjournal-2478  0.n.1  709µs : journal_remove_journal_head (journal_commit_transaction)
kjournal-2478  0.n.1  709µs : __journal_remove_journal_head (journal_remove_journal_head)
kjournal-2478  0.n.1  709µs : __brelse (__journal_remove_journal_head)
kjournal-2478  0.n.1  709µs : journal_free_journal_head (journal_remove_journal_head)
kjournal-2478  0.n.1  709µs : kmem_cache_free (journal_free_journal_head)
kjournal-2478  0.n..  710µs : preempt_schedule (journal_commit_transaction)
kjournal-2478  0dn..  710µs : __schedule (preempt_schedule)
kjournal-2478  0dn..  710µs : profile_hit (__schedule)
kjournal-2478  0dn.1  710µs : sched_clock (__schedule)
kjournal-2478  0dn.2  711µs : dequeue_task (__schedule)
kjournal-2478  0dn.2  711µs : recalc_task_prio (__schedule)
kjournal-2478  0dn.2  711µs : effective_prio (recalc_task_prio)
kjournal-2478  0dn.2  711µs : enqueue_task (__schedule)
   <...>-2     0d..2  712µs : __switch_to (__schedule)
   <...>-2     0d..2  712µs : __schedule <kjournal-2478> (73 69):
   <...>-2     0d..2  712µs : finish_task_switch (__schedule)
   <...>-2     0d..1  712µs : trace_stop_sched_switched (finish_task_switch)
   <...>-2     0d..1  712µs : trace_stop_sched_switched <<...>-2> (69 0):
   <...>-2     0d..1  713µs : trace_stop_sched_switched (finish_task_switch)

Lee



* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-19  5:08 ` Lee Revell
@ 2005-02-19  6:47   ` Lee Revell
  2005-02-19  9:00   ` Ingo Molnar
  2005-03-10  9:37   ` Steven Rostedt
  2 siblings, 0 replies; 125+ messages in thread
From: Lee Revell @ 2005-02-19  6:47 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

On Sat, 2005-02-19 at 00:08 -0500, Lee Revell wrote:
> On Fri, 2005-02-04 at 11:03 +0100, Ingo Molnar wrote:
> >   http://redhat.com/~mingo/realtime-preempt/
> > 
> 
> Testing on an all SCSI 1.3Ghz Athlon XP system, I am seeing very long
> latencies in the journalling code with 2.6.11-rc4-RT-V0.7.39-02.

If I mount all filesystems with 'data=writeback', it works perfectly.  I
can run 'dbench 64', JACK with Hydrogen at 32 frames and have been
unable to produce a single xrun.  The maximum wakeup latency I have seen
is 139us.  With 'data=ordered', just launching a web browser can produce
an xrun.

Lee



* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-19  5:08 ` Lee Revell
  2005-02-19  6:47   ` Lee Revell
@ 2005-02-19  9:00   ` Ingo Molnar
  2005-02-19  9:03     ` Ingo Molnar
  2005-03-10  9:37   ` Steven Rostedt
  2 siblings, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-02-19  9:00 UTC (permalink / raw)
  To: Lee Revell; +Cc: linux-kernel, Andrew Morton


* Lee Revell <rlrevell@joe-job.com> wrote:

> On Fri, 2005-02-04 at 11:03 +0100, Ingo Molnar wrote:
> >   http://redhat.com/~mingo/realtime-preempt/
> > 
> 
> Testing on an all SCSI 1.3Ghz Athlon XP system, I am seeing very long
> latencies in the journalling code with 2.6.11-rc4-RT-V0.7.39-02.

could you send me the full trace?

	Ingo


* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-19  9:00   ` Ingo Molnar
@ 2005-02-19  9:03     ` Ingo Molnar
  2005-02-19 20:45       ` Lee Revell
  2005-02-23  2:22       ` Lee Revell
  0 siblings, 2 replies; 125+ messages in thread
From: Ingo Molnar @ 2005-02-19  9:03 UTC (permalink / raw)
  To: Lee Revell; +Cc: linux-kernel, Andrew Morton


* Ingo Molnar <mingo@elte.hu> wrote:

> > Testing on an all SCSI 1.3Ghz Athlon XP system, I am seeing very long
> > latencies in the journalling code with 2.6.11-rc4-RT-V0.7.39-02.
> 
> could you send me the full trace?

just in case the system in question is still running - could you also do 
a 'verbose' trace via:

	echo 1 > /proc/sys/kernel/trace_verbose

and then copy /proc/latency_trace again? (so that we can see the
precise function call offsets - journal_commit_transaction() is a long
function.)

	Ingo


* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-19  9:03     ` Ingo Molnar
@ 2005-02-19 20:45       ` Lee Revell
  2005-02-20  0:19         ` Lee Revell
  2005-03-17 16:33         ` Lee Revell
  2005-02-23  2:22       ` Lee Revell
  1 sibling, 2 replies; 125+ messages in thread
From: Lee Revell @ 2005-02-19 20:45 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 729 bytes --]

On Sat, 2005-02-19 at 10:03 +0100, Ingo Molnar wrote:
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > > Testing on an all SCSI 1.3Ghz Athlon XP system, I am seeing very long
> > > latencies in the journalling code with 2.6.11-rc4-RT-V0.7.39-02.
> > 
> > could you send me the full trace?
> 
> just in case the system in question is still running - could you also do 
> a 'verbose' trace via:
> 
> 	echo 1 > /proc/sys/kernel/trace_verbose

OK, here is a 2912us verbose latency trace with "data=ordered", gzipped.
dbench 32 or 64 is the easiest way to trigger these.

I have not tried "data=journal".  As previously stated, "data=writeback"
works perfectly - I ran JACK overnight while stressing the fs and did
not get one xrun.

Lee

[-- Attachment #2: 2912us.gz --]
[-- Type: application/x-gzip, Size: 56838 bytes --]


* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-19 20:45       ` Lee Revell
@ 2005-02-20  0:19         ` Lee Revell
  2005-03-17 16:33         ` Lee Revell
  1 sibling, 0 replies; 125+ messages in thread
From: Lee Revell @ 2005-02-20  0:19 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Andrew Morton

On Sat, 2005-02-19 at 15:45 -0500, Lee Revell wrote:
> I have not tried "data=journal".  As previously stated "data=writeback"
> works perfectly - I ran JACK overnight while stressing the fs and did
> not get one xrun.

"data=journal" has the same good performance as "data=writeback".  Only
the ordered data mode is affected.

Lee



* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-19  9:03     ` Ingo Molnar
  2005-02-19 20:45       ` Lee Revell
@ 2005-02-23  2:22       ` Lee Revell
  1 sibling, 0 replies; 125+ messages in thread
From: Lee Revell @ 2005-02-23  2:22 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Andrew Morton

On Sat, 2005-02-19 at 10:03 +0100, Ingo Molnar wrote:
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > > Testing on an all SCSI 1.3Ghz Athlon XP system, I am seeing very long
> > > latencies in the journalling code with 2.6.11-rc4-RT-V0.7.39-02.
> > 
> > could you send me the full trace?
> 

On my other machine this 333us trace is the longest latency reported in
the first few minutes with PREEMPT_DESKTOP.  It seems to be a regression
from earlier versions.  If I read the trace right, copy_pte_range is the
problem.

Lee

preemption latency trace v1.1.4 on 2.6.11-rc4-RT-V0.7.39-02
--------------------------------------------------------------------
 latency: 333 µs, #63/63, CPU#0 | (M:preempt VP:0, KP:1, SP:1 HP:1 #P:1)
    -----------------
    | task: XFree86-2593 (uid:0 nice:0 policy:0 rt_prio:0)
    -----------------

                 _------=> CPU#            
                / _-----=> irqs-off        
               | / _----=> need-resched    
               || / _---=> hardirq/softirq 
               ||| / _--=> preempt-depth   
               |||| /                      
               |||||     delay             
   cmd     pid ||||| time  |   caller      
      \   /    |||||   \   |   /           
(T1/#0)             dpkg  4362 0 5 00000006 00000000 [0000380181315825] 0.000ms (+3550398.796ms): <676b7064> (<00746500>)
(T1/#2)             dpkg  4362 0 5 00000006 00000002 [0000380181316227] 0.000ms (+0.000ms): __trace_start_sched_wakeup+0x96/0xc0 <c012cbe6> (try_to_wake_up+0x81/0x150 <c010f911>)
(T1/#3)             dpkg  4362 0 5 00000004 00000003 [0000380181316766] 0.001ms (+0.001ms): wake_up_state+0x1e/0x30 <c010fa5e> (signal_wake_up+0x2d/0x30 <c011f7bd>)
(T1/#4)             dpkg  4362 0 5 00000000 00000004 [0000380181317637] 0.003ms (+0.000ms): __wake_up+0xe/0x70 <c011059e> (mousedev_event+0xd8/0x140 <c0223ac8>)
(T1/#5)             dpkg  4362 0 5 00000001 00000005 [0000380181318080] 0.003ms (+0.001ms): __wake_up_common+0xb/0x70 <c011052b> (__wake_up+0x3b/0x70 <c01105cb>)
(T1/#6)             dpkg  4362 0 5 00000000 00000006 [0000380181318983] 0.005ms (+0.002ms): usb_submit_urb+0xe/0x2c0 <dcabaefe> (hid_irq_in+0x4e/0xe0 <dca7335e>)
(T1/#7)             dpkg  4362 0 5 00000000 00000007 [0000380181320688] 0.008ms (+0.001ms): hcd_submit_urb+0xe/0x200 <dcaba57e> (usb_submit_urb+0x1c6/0x2c0 <dcabb0b6>)
(T1/#8)             dpkg  4362 0 5 00000001 00000008 [0000380181321463] 0.009ms (+0.000ms): usb_get_dev+0x9/0x30 <dcab5939> (hcd_submit_urb+0x1a9/0x200 <dcaba719>)
(T1/#9)             dpkg  4362 0 5 00000001 00000009 [0000380181321943] 0.010ms (+0.000ms): get_device+0x8/0x30 <c02012d8> (usb_get_dev+0x19/0x30 <dcab5949>)
(T1/#10)             dpkg  4362 0 5 00000001 0000000a [0000380181322283] 0.010ms (+0.000ms): kobject_get+0x9/0x30 <c01d7869> (get_device+0x1a/0x30 <c02012ea>)
(T1/#11)             dpkg  4362 0 5 00000001 0000000b [0000380181322691] 0.011ms (+0.001ms): kref_get+0x9/0x60 <c01d8339> (kobject_get+0x19/0x30 <c01d7879>)
(T1/#12)             dpkg  4362 0 5 00000000 0000000c [0000380181323295] 0.012ms (+0.000ms): usb_get_urb+0x9/0x20 <dcabaed9> (hcd_submit_urb+0xc6/0x200 <dcaba636>)
(T1/#13)             dpkg  4362 0 5 00000000 0000000d [0000380181323566] 0.012ms (+0.001ms): kref_get+0x9/0x60 <c01d8339> (usb_get_urb+0x16/0x20 <dcabaee6>)
(T1/#14)             dpkg  4362 0 5 00000000 0000000e [0000380181324216] 0.013ms (+0.000ms): uhci_urb_enqueue+0xe/0x290 <dca6bf4e> (hcd_submit_urb+0x123/0x200 <dcaba693>)
(T1/#15)             dpkg  4362 0 5 00000001 0000000f [0000380181324743] 0.014ms (+0.000ms): uhci_find_urb_ep+0xe/0xb0 <dca6be9e> (uhci_urb_enqueue+0x7a/0x290 <dca6bfba>)
(T1/#16)             dpkg  4362 0 5 00000001 00000010 [0000380181325251] 0.015ms (+0.000ms): uhci_alloc_urb_priv+0xb/0x80 <dca6aebb> (uhci_urb_enqueue+0x87/0x290 <dca6bfc7>)
(T1/#17)             dpkg  4362 0 5 00000001 00000011 [0000380181325582] 0.016ms (+0.001ms): kmem_cache_alloc+0xb/0x70 <c013dc6b> (uhci_alloc_urb_priv+0x1c/0x80 <dca6aecc>)
(T1/#18)             dpkg  4362 0 5 00000001 00000012 [0000380181326332] 0.017ms (+0.000ms): usb_check_bandwidth+0xc/0x140 <dcaba2fc> (uhci_urb_enqueue+0x200/0x290 <dca6c140>)
(T1/#19)             dpkg  4362 0 5 00000001 00000013 [0000380181326926] 0.018ms (+0.001ms): usb_calc_bus_time+0x9/0x270 <dcaba089> (usb_check_bandwidth+0x6b/0x140 <dcaba35b>)
(T1/#20)             dpkg  4362 0 5 00000001 00000014 [0000380181327893] 0.020ms (+0.001ms): uhci_submit_common+0xe/0x380 <dca6b77e> (uhci_urb_enqueue+0x239/0x290 <dca6c179>)
(T1/#21)             dpkg  4362 0 5 00000001 00000015 [0000380181328984] 0.021ms (+0.001ms): uhci_alloc_td+0xb/0x80 <dca6a5bb> (uhci_submit_common+0xf0/0x380 <dca6b860>)
(T1/#22)             dpkg  4362 0 5 00000001 00000016 [0000380181329685] 0.023ms (+0.002ms): dma_pool_alloc+0xe/0x1a0 <c02051fe> (uhci_alloc_td+0x20/0x80 <dca6a5d0>)
(T1/#23)             dpkg  4362 0 5 00000001 00000017 [0000380181331207] 0.025ms (+0.000ms): usb_get_dev+0x9/0x30 <dcab5939> (uhci_alloc_td+0x69/0x80 <dca6a619>)
(T1/#24)             dpkg  4362 0 5 00000001 00000018 [0000380181331544] 0.026ms (+0.000ms): get_device+0x8/0x30 <c02012d8> (usb_get_dev+0x19/0x30 <dcab5949>)
(T1/#25)             dpkg  4362 0 5 00000001 00000019 [0000380181331882] 0.026ms (+0.000ms): kobject_get+0x9/0x30 <c01d7869> (get_device+0x1a/0x30 <c02012ea>)
(T1/#26)             dpkg  4362 0 5 00000001 0000001a [0000380181332215] 0.027ms (+0.000ms): kref_get+0x9/0x60 <c01d8339> (kobject_get+0x19/0x30 <c01d7879>)
(T1/#27)             dpkg  4362 0 5 00000001 0000001b [0000380181332606] 0.027ms (+0.001ms): uhci_add_td_to_urb+0x9/0x30 <dca6af39> (uhci_submit_common+0x10b/0x380 <dca6b87b>)
(T1/#28)             dpkg  4362 0 5 00000001 0000001c [0000380181333448] 0.029ms (+0.000ms): uhci_alloc_qh+0xb/0x70 <dca6a89b> (uhci_submit_common+0x1d7/0x380 <dca6b947>)
(T1/#29)             dpkg  4362 0 5 00000001 0000001d [0000380181333880] 0.030ms (+0.001ms): dma_pool_alloc+0xe/0x1a0 <c02051fe> (uhci_alloc_qh+0x20/0x70 <dca6a8b0>)
(T1/#30)             dpkg  4362 0 5 00000001 0000001e [0000380181334888] 0.031ms (+0.000ms): usb_get_dev+0x9/0x30 <dcab5939> (uhci_alloc_qh+0x60/0x70 <dca6a8f0>)
(T1/#31)             dpkg  4362 0 5 00000001 0000001f [0000380181335311] 0.032ms (+0.000ms): get_device+0x8/0x30 <c02012d8> (usb_get_dev+0x19/0x30 <dcab5949>)
(T1/#32)             dpkg  4362 0 5 00000001 00000020 [0000380181335644] 0.033ms (+0.000ms): kobject_get+0x9/0x30 <c01d7869> (get_device+0x1a/0x30 <c02012ea>)
(T1/#33)             dpkg  4362 0 5 00000001 00000021 [0000380181335972] 0.033ms (+0.000ms): kref_get+0x9/0x60 <c01d8339> (kobject_get+0x19/0x30 <c01d7879>)
(T1/#34)             dpkg  4362 0 5 00000001 00000022 [0000380181336517] 0.034ms (+0.000ms): uhci_insert_tds_in_qh+0xb/0x60 <dca6a76b> (uhci_submit_common+0x1f7/0x380 <dca6b967>)
(T1/#35)             dpkg  4362 0 5 00000001 00000023 [0000380181337025] 0.035ms (+0.001ms): uhci_insert_qh+0xb/0x90 <dca6a9ab> (uhci_submit_common+0x235/0x380 <dca6b9a5>)
(T1/#36)             dpkg  4362 0 5 00000001 00000024 [0000380181337741] 0.036ms (+0.001ms): usb_claim_bandwidth+0x8/0x40 <dcaba438> (uhci_urb_enqueue+0x178/0x290 <dca6c0b8>)
(T1/#37)             dpkg  4362 0 5 00000000 00000025 [0000380181338690] 0.038ms (+0.000ms): usb_free_urb+0x8/0x20 <dcabaeb8> (uhci_finish_urb+0x40/0x60 <dca6c9b0>)
(T1/#38)             dpkg  4362 0 5 00000000 00000026 [0000380181339041] 0.038ms (+0.001ms): kref_put+0xa/0xb0 <c01d839a> (usb_free_urb+0x1a/0x20 <dcabaeca>)
(T1/#39)             dpkg  4362 0 5 00000000 00000027 [0000380181339653] 0.039ms (+0.000ms): __wake_up+0xe/0x70 <c011059e> (uhci_irq+0x1cd/0x200 <dca6cc5d>)
(T1/#40)             dpkg  4362 0 5 00000001 00000028 [0000380181340175] 0.040ms (+0.001ms): __wake_up_common+0xb/0x70 <c011052b> (__wake_up+0x3b/0x70 <c01105cb>)
(T1/#41)             dpkg  4362 0 5 00000001 00000029 [0000380181341026] 0.042ms (+0.000ms): note_interrupt+0xb/0x90 <c01341db> (__do_IRQ+0x148/0x160 <c0133938>)
(T1/#42)             dpkg  4362 0 5 00000001 0000002a [0000380181341399] 0.042ms (+0.000ms): end_8259A_irq+0x8/0x40 <c0107c38> (__do_IRQ+0x110/0x160 <c0133900>)
(T1/#43)             dpkg  4362 0 5 00000001 0000002b [0000380181341746] 0.043ms (+0.002ms): enable_8259A_irq+0xb/0x80 <c0107d1b> (__do_IRQ+0x110/0x160 <c0133900>)
(T1/#44)             dpkg  4362 0 7 00000002 0000002c [0000380181343089] 0.045ms (+0.001ms): irq_exit+0x8/0x50 <c0119fb8> (do_IRQ+0x60/0x80 <c01041f0>)
(T6/#45)     dpkg-4362  0dn.2   46µs!< (1)
(T1/#46)             dpkg  4362 0 2 00000001 0000002e [0000380181504494] 0.314ms (+0.000ms): preempt_schedule+0xa/0x70 <c027d0ca> (copy_pte_range+0xb7/0x1c0 <c0142ad7>)
(T1/#47)             dpkg  4362 0 2 00000001 0000002f [0000380181504953] 0.315ms (+0.000ms): __cond_resched_raw_spinlock+0x8/0x50 <c0111398> (copy_pte_range+0xa7/0x1c0 <c0142ac7>)
(T1/#48)             dpkg  4362 0 2 00000000 00000030 [0000380181505442] 0.316ms (+0.001ms): __cond_resched+0x9/0x70 <c0111329> (__cond_resched_raw_spinlock+0x3d/0x50 <c01113cd>)
(T1/#49)             dpkg  4362 0 3 00000000 00000031 [0000380181506068] 0.317ms (+0.000ms): __schedule+0xe/0x630 <c027c98e> (__cond_resched+0x45/0x70 <c0111365>)
(T1/#50)             dpkg  4362 0 3 00000000 00000032 [0000380181506442] 0.317ms (+0.001ms): profile_hit+0x9/0x50 <c0115749> (__schedule+0x3a/0x630 <c027c9ba>)
(T1/#51)             dpkg  4362 0 3 00000001 00000033 [0000380181507130] 0.318ms (+0.001ms): sched_clock+0xe/0xe0 <c010c3ae> (__schedule+0x62/0x630 <c027c9e2>)
(T1/#52)             dpkg  4362 0 3 00000002 00000034 [0000380181508079] 0.320ms (+0.000ms): dequeue_task+0xa/0x50 <c010f4ea> (__schedule+0x1ab/0x630 <c027cb2b>)
(T1/#53)             dpkg  4362 0 3 00000002 00000035 [0000380181508503] 0.321ms (+0.000ms): recalc_task_prio+0xc/0x1a0 <c010f64c> (__schedule+0x1c5/0x630 <c027cb45>)
(T1/#54)             dpkg  4362 0 3 00000002 00000036 [0000380181509011] 0.321ms (+0.000ms): effective_prio+0x8/0x50 <c010f5f8> (recalc_task_prio+0xa6/0x1a0 <c010f6e6>)
(T1/#55)             dpkg  4362 0 3 00000002 00000037 [0000380181509402] 0.322ms (+0.001ms): enqueue_task+0xa/0x80 <c010f53a> (__schedule+0x1cc/0x630 <c027cb4c>)
(T4/#56) [ =>             dpkg ] 0.324ms (+0.001ms)
(T1/#57)            <...>  2593 0 1 00000002 00000039 [0000380181511577] 0.326ms (+0.002ms): __switch_to+0xb/0x1a0 <c0100f5b> (__schedule+0x2bd/0x630 <c027cc3d>)
(T3/#58)    <...>-2593  0d..2  328µs : __schedule+0x2ea/0x630 <c027cc6a> <dpkg-4362> (75 73): 
(T1/#59)            <...>  2593 0 1 00000002 0000003b [0000380181513468] 0.329ms (+0.000ms): finish_task_switch+0xc/0x90 <c010fdec> (__schedule+0x2f6/0x630 <c027cc76>)
(T1/#60)            <...>  2593 0 1 00000001 0000003c [0000380181513919] 0.330ms (+0.000ms): trace_stop_sched_switched+0xa/0x150 <c012cc1a> (finish_task_switch+0x43/0x90 <c010fe23>)
(T3/#61)    <...>-2593  0d..1  330µs : trace_stop_sched_switched+0x42/0x150 <c012cc52> <<...>-2593> (73 0): 
(T1/#62)            <...>  2593 0 1 00000001 0000003e [0000380181515016] 0.331ms (+0.000ms): trace_stop_sched_switched+0xfe/0x150 <c012cd0e> (finish_task_switch+0x43/0x90 <c010fe23>)


vim:ft=help



* Re: [patch] Real-Time Preemption, deactivate() scheduling issue
  2005-02-10  7:52       ` Ingo Molnar
  2005-02-10 20:21         ` George Anzinger
@ 2005-03-03 19:36         ` Eugeny S. Mints
  2005-03-03 22:32           ` Esben Nielsen
  2005-03-29  8:45           ` Ingo Molnar
  1 sibling, 2 replies; 125+ messages in thread
From: Eugeny S. Mints @ 2005-03-03 19:36 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2025 bytes --]

Please consider the following scenario for the full RT kernel.

Task A is running when an irq occurs, which in turn wakes up the 
irq-related thread (B), which has a higher priority than A.

My current understanding is that the actual context switch between A and 
B will occur in preempt_schedule_irq() on the "return from irq" path.

In this case the following "if" statement in __schedule() always 
evaluates to false, since preempt_schedule_irq() always sets 
PREEMPT_ACTIVE before calling __schedule().

         if ((prev->state & ~TASK_RUNNING_MUTEX) &&
                         !(preempt_count() & PREEMPT_ACTIVE)) {

As a result, deactivate() is never called for the preempted task A in 
this scenario. But if task A is preempted while not in the TASK_RUNNING 
state, this behaviour seems incorrect, since we end up with a task that 
is not in the TASK_RUNNING state linked into a run queue.

An example:

drivers/net/irda/sir_dev.c: 76 (2.6.10 kernel)

         spin_lock_irqsave(&dev->tx_lock, flags); /* serialize with other tx operations */
         while (dev->tx_buff.len > 0) {    /* wait until tx idle */
                 spin_unlock_irqrestore(&dev->tx_lock, flags);
76:             set_current_state(TASK_UNINTERRUPTIBLE);
                 schedule_timeout(msecs_to_jiffies(10));
                 spin_lock_irqsave(&dev->tx_lock, flags);
         }

At line 76 irqs are enabled and preemption is enabled.
Let us assume task A executes this code and gets preempted right after 
line 76. The task state is TASK_UNINTERRUPTIBLE, but the task will not be 
deactivated. Of course this is a bug in how this particular driver uses 
set_current_state(), but I believe the scheduler should be robust against 
such bugs. I believe there are a lot of such bugs in the kernel.

I am not sure what the actual reason for the 
!(preempt_count() & PREEMPT_ACTIVE) condition is, but if it is just a 
sort of optimization (do not remove a task from the run queue if it was 
preempted in the TASK_RUNNING state), then it should probably be removed 
for the sake of correctness. Patch attached.

	Eugeny


[-- Attachment #2: sched.c.deactivate.patch --]
[-- Type: text/plain, Size: 503 bytes --]

--- sched.c.orig	2005-03-03 22:35:16.000000000 +0300
+++ sched.c	2005-03-03 22:34:58.000000000 +0300
@@ -2891,8 +2891,7 @@
 	spin_lock_irq(&rq->lock);
 
 	switch_count = &prev->nvcsw; // TODO: temporary - to see it in vmstat
-	if ((prev->state & ~TASK_RUNNING_MUTEX) &&
-                       !(preempt_count() & PREEMPT_ACTIVE)) {
+	if ((prev->state & ~TASK_RUNNING_MUTEX)) {
  		switch_count = &prev->nvcsw;
 		if (unlikely((prev->state & TASK_INTERRUPTIBLE) &&
 				unlikely(signal_pending(prev))))


* Re: [patch] Real-Time Preemption, deactivate() scheduling issue
  2005-03-03 19:36         ` [patch] Real-Time Preemption, deactivate() scheduling issue Eugeny S. Mints
@ 2005-03-03 22:32           ` Esben Nielsen
  2005-03-04 11:56             ` Eugeny S. Mints
  2005-03-29  8:45           ` Ingo Molnar
  1 sibling, 1 reply; 125+ messages in thread
From: Esben Nielsen @ 2005-03-03 22:32 UTC (permalink / raw)
  To: Eugeny S. Mints; +Cc: Ingo Molnar, linux-kernel

As I read the code, the driver task (A) should _not_ be removed from the
runqueue. It has to be woken up to call schedule_timeout() so that it gets
back on the runqueue after 10 ms. If it is taken off the runqueue at
line 76, it will stay off the runqueue forever in the TASK_UNINTERRUPTIBLE
state!

As I read the use of PREEMPT_ACTIVE, it is there to test whether this
rescheduling is voluntary or forced (a preemption). If it is forced, the
task shall of course not go off the runqueue but stay there to run again
when it has the highest priority. That is why PREEMPT_ACTIVE is set in
preempt_schedule() and preempt_schedule_irq(). On the other hand, if the
task itself has called schedule() or schedule_timeout(), it has to go off
the runqueue and wait for some event to wake it up.

Yes, there will be tasks in states other than TASK_RUNNING on the runqueue.
The "bug" as I see it is in the scheduler interface: there is no way to
set the task state and call schedule() or schedule_timeout() atomically.
Therefore you can be preempted while the state is not TASK_RUNNING.
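
To make the argument concrete, here is a small userspace model of the dequeue decision. It is only a sketch under stated assumptions: struct task_model, schedule_model() and can_run_again() are hypothetical names, the state values and the PREEMPT_ACTIVE bit are placeholders for illustration, and the real __schedule() does much more. The honor_preempt_active flag selects between the existing check (true) and the check as changed by the proposed patch (false).

```c
#include <stdbool.h>

/* Placeholder values, for illustration only. */
#define TASK_RUNNING		0UL
#define TASK_RUNNING_MUTEX	1UL
#define TASK_UNINTERRUPTIBLE	4UL
#define PREEMPT_ACTIVE		0x10000000UL

struct task_model {
	unsigned long state;
	bool on_runqueue;
	bool timer_armed;	/* true only once schedule_timeout() has run */
};

/*
 * Model of the "if" at the top of __schedule().
 * honor_preempt_active == true:  existing check.
 * honor_preempt_active == false: check with the PREEMPT_ACTIVE test removed.
 */
static void schedule_model(struct task_model *prev,
			   unsigned long preempt_count,
			   bool honor_preempt_active)
{
	bool sleeping = (prev->state & ~TASK_RUNNING_MUTEX) != 0;
	bool forced = (preempt_count & PREEMPT_ACTIVE) != 0;

	if (sleeping && !(honor_preempt_active && forced))
		prev->on_runqueue = false;	/* models the dequeue */
}

/* A task can run again if it is queued or a timer will wake it. */
static bool can_run_again(const struct task_model *t)
{
	return t->on_runqueue || t->timer_armed;
}
```

In this model, a task preempted right after set_current_state(TASK_UNINTERRUPTIBLE) has no timer armed yet. With the existing check it stays queued and can go on to call schedule_timeout(); with the PREEMPT_ACTIVE test removed it is dequeued with nothing left to wake it.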

Esben


On Thu, 3 Mar 2005, Eugeny S. Mints wrote:

> Please consider the following scenario for the full RT kernel.
> 
> Task A is running when an irq occurs, which in turn wakes up the 
> irq-related thread (B), which has a higher priority than A.
> 
> My current understanding is that the actual context switch between A and 
> B will occur in preempt_schedule_irq() on the "return from irq" path.
> 
> In this case the following "if" statement in __schedule() always 
> evaluates to false, since preempt_schedule_irq() always sets 
> PREEMPT_ACTIVE before calling __schedule().
> 
>          if ((prev->state & ~TASK_RUNNING_MUTEX) &&
>                          !(preempt_count() & PREEMPT_ACTIVE)) {
> 
> as result the deactivate() is never called for preempted task A in this 
> scenario. BUt if the task A is preempted while not in TASK_RUNNING state 
> such behaviour seems incorrect since we get a task in not TASK_RUNNING 
> state linked into a run queue.
> 
> An example:
> 
> drivers/net/irda/sir_dev.c: 76 (2.6.10 kernel)
> 
>          spin_lock_irqsave(&dev->tx_lock, flags); /* serialize th other 
> tx operations */
>          while (dev->tx_buff.len > 0) {    /* wait until tx idle */
>                  spin_unlock_irqrestore(&dev->tx_lock, flags);
> 76:             set_current_state(TASK_UNINTERRUPTIBLE);
>                  schedule_timeout(msecs_to_jiffies(10));
>                  spin_lock_irqsave(&dev->tx_lock, flags);
>          }
> 
> At  line 76 irqs are enabled, preemption is enabled.
> Let assume the task A executes this code and gets preempted right after 
> line 76. Task state is TASK_UNINTERRUPTIBLE but it will not be 
> deactevated. Of cource this is the bug in set_current_state() 
> utilization in this particular driver but schedule stuff should be 
> robust to such bugs I believe. There are a lot such bugs in the kernel I 
> believe.
> 
> Not sure what the actual reason for !(preempt_count() & PREEMPT_ACTIVE)) 
>    condition is but if it's just a sort of optimization (not remove a 
> task from run queue if it was preemped in TASK_RUNNING state) then 
> probably it should be removed in order to save correctness. Patch attached.
> 
> 	Eugeny
> 
> 


^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch] Real-Time Preemption, deactivate() scheduling issue
  2005-03-03 22:32           ` Esben Nielsen
@ 2005-03-04 11:56             ` Eugeny S. Mints
  2005-03-04 15:45               ` George Anzinger
  0 siblings, 1 reply; 125+ messages in thread
From: Eugeny S. Mints @ 2005-03-04 11:56 UTC (permalink / raw)
  To: Esben Nielsen; +Cc: Ingo Molnar, linux-kernel

Esben Nielsen wrote:
> As I read the code the driver task (A) should _not_ be removed from the
> runqueue. It has to be waken up to call schedule_timeout() such it gets
> back on the runqueue after 10 ms. If it is taken out of the runqueue at
> line 76 it will stay off the runqueue forever in the TASK_UNINTERRUBTIBLE
> state!
Exactly. This is definitely a bug in the driver code - the developer just
didn't care about proper use of set_current_state(). The driver works
only because, as you have described, the developer is fortunate that the
scheduler doesn't remove a task that is not in TASK_RUNNING state from the
run queue.
And my main question was - does everybody think it's OK to have a task
that is not in TASK_RUNNING state on the run queue? My current feeling is
that this should not be allowed.
> As I read the use PREEMPT_ACTIVE, it is there to test on wether this
> rescheduling is volentery or forced (a preemption). If it is forced the
> task shall ofcourse not go off the runqueue but stay there to run again
> when it gets the highest priority. That is why PREEMPT_ACTIVE is set in
> preempt_schedule() and preempt_schedule_irq(). On the other hand if the
> task itself has called schedule() or schedule_timeout() it has to go out
> of the runqueue and wait for some event to wake it up.
You are right - it works perfectly - but not for my test case. I believe a
task that is not in TASK_RUNNING state should be removed from the run queue
by the first (any - voluntary or forced) execution of schedule() which
detects that the task state is not TASK_RUNNING.
> 
> Yes there will be tasks in state other that TASK_RUNNING on the runqueue.
> The "bug" as I see it is in the scheduler interface: There is no way to
> set the task state and call schedule() or schedule_timeout() atomicly.
> Therefore you can be preempted while the state is not TASK_RUNNING.
Exactly. IMO this interface is weird and needs rework. I don't understand
the reason for setting the task state before the schedule_timeout() call
rather than inside it, right before the schedule(). The actual task state
could be passed as a parameter.

As to tasks that are not in TASK_RUNNING state being on a run queue - I
have always believed the definition of a run queue is: a queue of tasks
ready to run, i.e. in TASK_RUNNING state.
	
	Eugeny
> 
> Esben
> 
> 
> On Thu, 3 Mar 2005, Eugeny S. Mints wrote:
> 
> 
>>please consider the following scenario for full RT kernel.
>>
>>Task A is running then an irq is occured which in turn wakes up irq 
>>related thread (B) of a higher priority than A.
>>
>>my current understanding that actual context switch between A and B will 
>>occure at preempt_schedule_irq() on the "return form irq " path.
>>
>>in this case the following "if" statement in __schedule() always returns 
>>false since  preempt_schedule_irq() always sets up  PREEMPT_ACTIVE 
>>before __schedule() call.
>>
>>         if ((prev->state & ~TASK_RUNNING_MUTEX) &&
>>                         !(preempt_count() & PREEMPT_ACTIVE)) {
>>
>>as result the deactivate() is never called for preempted task A in this 
>>scenario. BUt if the task A is preempted while not in TASK_RUNNING state 
>>such behaviour seems incorrect since we get a task in not TASK_RUNNING 
>>state linked into a run queue.
>>
>>An example:
>>
>>drivers/net/irda/sir_dev.c: 76 (2.6.10 kernel)
>>
>>         spin_lock_irqsave(&dev->tx_lock, flags); /* serialize th other 
>>tx operations */
>>         while (dev->tx_buff.len > 0) {    /* wait until tx idle */
>>                 spin_unlock_irqrestore(&dev->tx_lock, flags);
>>76:             set_current_state(TASK_UNINTERRUPTIBLE);
>>                 schedule_timeout(msecs_to_jiffies(10));
>>                 spin_lock_irqsave(&dev->tx_lock, flags);
>>         }
>>
>>At  line 76 irqs are enabled, preemption is enabled.
>>Let assume the task A executes this code and gets preempted right after 
>>line 76. Task state is TASK_UNINTERRUPTIBLE but it will not be 
>>deactevated. Of cource this is the bug in set_current_state() 
>>utilization in this particular driver but schedule stuff should be 
>>robust to such bugs I believe. There are a lot such bugs in the kernel I 
>>believe.
>>
>>Not sure what the actual reason for !(preempt_count() & PREEMPT_ACTIVE)) 
>>   condition is but if it's just a sort of optimization (not remove a 
>>task from run queue if it was preemped in TASK_RUNNING state) then 
>>probably it should be removed in order to save correctness. Patch attached.
>>
>>	Eugeny
>>
>>
> 
> 
> 
> 



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch] Real-Time Preemption, deactivate() scheduling issue
  2005-03-04 11:56             ` Eugeny S. Mints
@ 2005-03-04 15:45               ` George Anzinger
  0 siblings, 0 replies; 125+ messages in thread
From: George Anzinger @ 2005-03-04 15:45 UTC (permalink / raw)
  To: Eugeny S. Mints; +Cc: Esben Nielsen, Ingo Molnar, linux-kernel

Eugeny S. Mints wrote:
> Esben Nielsen wrote:
> 
>> As I read the code the driver task (A) should _not_ be removed from the
>> runqueue. It has to be waken up to call schedule_timeout() such it gets
>> back on the runqueue after 10 ms. If it is taken out of the runqueue at
>> line 76 it will stay off the runqueue forever in the TASK_UNINTERRUBTIBLE
>> state!
> 
> Exactly. This is definilty the bug in the driver code - a developer just
> didn;t care about proper utilization of set_current_state(). The driver 
> works
> just because as you have described - his fortune
> that scheduler doesn't remove task in not TASK_RUNNING state from a run 
> queue.
> And my main question was - does everybody think it's ok have task in not 
> TASK_RUNNING state in run queue. My current feeling is that this should 
> not be allowed.

This is the normal and specified way to handle this sort of thing.  There is a 
race issue that coding in this way avoids.  The coding sequence is:
a) set the task state to some state other than TASK_RUNNING.
b) set up whatever triggers the wake up.  This may be several things, for 
example, an interrupt from some device OR a timeout.
c) call schedule() to wait.

The race is getting to the schedule() call before the wake up happens.  If, for 
some reason, the wake up condition happens prior to the schedule() call, it will 
set the task state back to TASK_RUNNING so that when the schedule() call is made 
the scheduler will just return, which is the right thing (tm) to do as the 
condition being waited on has happened.  We also note that disabling interrupts 
or preemption will NOT avoid the race unless you disable interrupts on ALL cpus, 
which is a VERY expensive cross cpu call.
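
[Editorial sketch, not kernel code.] The a)/b)/c) sequence above can be
modelled in single-threaded userspace C to show why an early wake up is
harmless. All names here (struct task, wake_up(), schedule_voluntary())
are stand-ins invented for the illustration, and the model covers only a
voluntary call into the scheduler:

```c
#include <stdbool.h>

enum task_state { TASK_RUNNING, TASK_UNINTERRUPTIBLE };

/* Minimal stand-in for a task: its state and runqueue membership. */
struct task {
	enum task_state state;
	bool on_runqueue;
};

/* Model of a wake up: the task becomes runnable and is (re)queued. */
static void wake_up(struct task *t)
{
	t->state = TASK_RUNNING;
	t->on_runqueue = true;
}

/*
 * Model of a voluntary schedule(): a task is dequeued only if it is
 * still in a sleeping state. Returns true if the task went to sleep,
 * false if the wake up had already arrived and schedule() just returns.
 */
static bool schedule_voluntary(struct task *t)
{
	if (t->state == TASK_RUNNING)
		return false;
	t->on_runqueue = false;
	return true;
}
```

If the wake up fires between step a) and step c), the state is already
TASK_RUNNING again when schedule_voluntary() runs, so the task is never
dequeued and no wake up is lost.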
> 
>> As I read the use PREEMPT_ACTIVE, it is there to test on whether this
>> rescheduling is voluntary or forced (a preemption). If it is forced the
>> task shall of course not go off the runqueue but stay there to run again
>> when it gets the highest priority. That is why PREEMPT_ACTIVE is set in
>> preempt_schedule() and preempt_schedule_irq(). On the other hand if the
>> task itself has called schedule() or schedule_timeout() it has to go out
>> of the runqueue and wait for some event to wake it up.
> 
> You right - it works perfectly - but not for  my test case - I believe 
> task in not TASK_RUNNING state should be removed from a run queue by the 
> first (any - voluntary or forced) execution of the schedule() which 
> detects the task state is not TASK_RUNNIG.

This would cause the task to lose control prior to its setting up the needed 
wakeup events.
> 
>>
>> Yes there will be tasks in state other that TASK_RUNNING on the runqueue.
>> The "bug" as I see it is in the scheduler interface: There is no way to
>> set the task state and call schedule() or schedule_timeout() atomicly.
>> Therefore you can be preempted while the state is not TASK_RUNNING.
> 
> Exactly. IMO this interface is weird and needs rework. I don;t understand 
> what the reason to set task state before schedule_timeout() call but not 
> inside, right before the schedule(). The actual task state may be passed 
> as a parameter.

You are assuming that the task ONLY wants to do a timeout.  Most of the time the 
timeout indicates an error condition.   The timeout bounds the wait for what is 
really desired, i.e. a device interrupt, some other task signaling, or some such.

Surely this is covered in the various driver writing guides...
-- 
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/


^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-19  5:08 ` Lee Revell
  2005-02-19  6:47   ` Lee Revell
  2005-02-19  9:00   ` Ingo Molnar
@ 2005-03-10  9:37   ` Steven Rostedt
  2005-03-10  9:54     ` Steven Rostedt
  2 siblings, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-10  9:37 UTC (permalink / raw)
  To: Lee Revell; +Cc: Ingo Molnar, linux-kernel


Hi Ingo,

I notice a problem with the bit_spin_locks that would probably explain the
kjournald latency problems. I'm working on a custom kernel based on yours
and I needed to temporarily remove the scheduler_tick from
update_process_times to implement some special scheduling needs.  This
caused kjournald to go into an infinite loop.

Here's your bit_spin_lock:

static inline void bit_spin_lock(int bitnum, unsigned long *addr)
{
	/*
	 * Assuming the lock is uncontended, this never enters
	 * the body of the outer loop. If it is contended, then
	 * within the inner loop a non-atomic test is used to
	 * busywait with less bus contention for a good time to
	 * attempt to acquire the lock bit.
	 */
#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
	while (test_and_set_bit(bitnum, addr))
		while (test_bit(bitnum, addr))
			cpu_relax();
#endif
	__acquire(bitlock);
}


You removed the preempt disable and added the CONFIG_PREEMPT. What happens
if a lower priority process gets the bit lock and gets preempted by a
higher priority process that then tries to get this lock? It spins until
its quota runs out.  This is what is happening to kjournald. A lower
priority process gets the bit lock and kjournald preempts it, causing
kjournald to spin until its quota is up, which lets the other process
release the lock.  Now, luckily, in your kernel kjournald is not realtime
FIFO. If it were, you would then have a deadlock; try it. I just set
kjournald (using your kernel) to FIFO prio 42 (prio 58 inside the kernel),
and with a non-rt task, I did a build of the kernel.  After a minute or
two, all processes under the priority of kjournald were starved out of the
CPU, and kjournald was spinning.  Make sure your kjournald has a lower
priority than your interrupt threads.
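
[Editorial sketch, not the kernel implementation.] A bit lock of this kind
can be approximated in userspace with C11 atomics. The helper names
bit_trylock(), bit_is_locked() and bit_unlock() are made up for the
illustration. Note that the lock word records only "taken" or "free", never
an owner, so a spinning waiter has nobody to hand its priority to:

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Try to take the lock bit. Returns true if this caller set the bit,
 * false if it was already set by someone else.
 */
static bool bit_trylock(int bitnum, atomic_ulong *addr)
{
	unsigned long mask = 1UL << bitnum;

	return !(atomic_fetch_or(addr, mask) & mask);
}

/* Non-modifying test, as used in the inner busy-wait loop. */
static bool bit_is_locked(int bitnum, atomic_ulong *addr)
{
	return (atomic_load(addr) & (1UL << bitnum)) != 0;
}

/* Release the lock bit; other bits in the word are left untouched. */
static void bit_unlock(int bitnum, atomic_ulong *addr)
{
	atomic_fetch_and(addr, ~(1UL << bitnum));
}
```

A waiter would loop on bit_trylock() until it succeeds. If the holder has
been preempted by that very waiter on the same CPU, the loop can only end
when the scheduler lets the holder run again.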

The culprit is jbd_lock_bh_state and jbd_lock_bh_journal_head which call
bit_spin_lock.

Example of long latency: (or deadlock)

journal_refile_buffer
   --> spin_lock(&journal->j_list_lock);
   --> journal_remove_journal_head(bh);
         --> jbd_lock_bh_journal_head(bh);
               --> bit_spin_lock(BH_JournalHead, &bh->b_state);

The short term fix is probably to put back the preempt_disables, the long
term is to get rid of these stupid bit_spin_lock busy loops.

-- Steve


On Sat, 19 Feb 2005, Lee Revell wrote:

> On Fri, 2005-02-04 at 11:03 +0100, Ingo Molnar wrote:
> >   http://redhat.com/~mingo/realtime-preempt/
> >
>
> Testing on an all SCSI 1.3Ghz Athlon XP system, I am seeing very long
> latencies in the journalling code with 2.6.11-rc4-RT-V0.7.39-02.
>
> preemption latency trace v1.1.4 on 2.6.11-rc4-RT-V0.7.39-02
> --------------------------------------------------------------------
>  latency: 713 µs, #3455/3455, CPU#0 | (M:preempt VP:0, KP:1, SP:1 HP:1 #P:1)
>     -----------------
>     | task: ksoftirqd/0-2 (uid:0 nice:-10 policy:0 rt_prio:0)
>     -----------------
>
>                  _------=> CPU#
>                 / _-----=> irqs-off
>                | / _----=> need-resched
>                || / _---=> hardirq/softirq
>                ||| / _--=> preempt-depth
>                |||| /
>                |||||     delay
>    cmd     pid ||||| time  |   caller
>       \   /    |||||   \   |   /
> kjournal-2478  0dn.4    0µs!: <756f6a6b> (<6c616e72>)
> kjournal-2478  0dn.4    0µs : __trace_start_sched_wakeup (try_to_wake_up)
> kjournal-2478  0dn.3    0µs : preempt_schedule (try_to_wake_up)
> kjournal-2478  0dn.3    0µs : try_to_wake_up <<...>-2> (69 73):
> kjournal-2478  0dn.2    0µs : preempt_schedule (try_to_wake_up)
> kjournal-2478  0dn.2    0µs : wake_up_process (do_softirq)
> kjournal-2478  0dn.1    1µs < (1)
>
> The repeating pattern is 8 of these:
>
> kjournal-2478  0.n.1    1µs : inverted_lock (journal_commit_transaction)
> kjournal-2478  0.n.1    1µs : __journal_unfile_buffer (journal_commit_transaction)
> kjournal-2478  0.n.1    1µs : journal_remove_journal_head (journal_commit_transaction)
> kjournal-2478  0.n.1    1µs : __journal_remove_journal_head (journal_remove_journal_head)
> kjournal-2478  0.n.1    1µs : __brelse (__journal_remove_journal_head)
> kjournal-2478  0.n.1    1µs : journal_free_journal_head (journal_remove_journal_head)
> kjournal-2478  0.n.1    2µs : kmem_cache_free (journal_free_journal_head)
>
> and one of these:
>
> kjournal-2478  0dn.1    9µs : cache_flusharray (kmem_cache_free)
> kjournal-2478  0dn.2    9µs : free_block (cache_flusharray)
> kjournal-2478  0dn.1   11µs : preempt_schedule (cache_flusharray)
> kjournal-2478  0dn.1   11µs : memmove (cache_flusharray)
> kjournal-2478  0dn.1   11µs : memcpy (memmove)
>
> etc.  Finally:
>
> kjournal-2478  0dn.1  704µs : cache_flusharray (kmem_cache_free)
> kjournal-2478  0dn.2  704µs+: free_block (cache_flusharray)
> kjournal-2478  0dn.1  707µs : preempt_schedule (cache_flusharray)
> kjournal-2478  0dn.1  707µs : memmove (cache_flusharray)
> kjournal-2478  0dn.1  707µs : memcpy (memmove)
> kjournal-2478  0.n.1  708µs : inverted_lock (journal_commit_transaction)
> kjournal-2478  0.n.1  708µs : __journal_unfile_buffer (journal_commit_transaction)
> kjournal-2478  0.n.1  709µs : journal_remove_journal_head (journal_commit_transaction)
> kjournal-2478  0.n.1  709µs : __journal_remove_journal_head (journal_remove_journal_head)
> kjournal-2478  0.n.1  709µs : __brelse (__journal_remove_journal_head)
> kjournal-2478  0.n.1  709µs : journal_free_journal_head (journal_remove_journal_head)
> kjournal-2478  0.n.1  709µs : kmem_cache_free (journal_free_journal_head)
> kjournal-2478  0.n..  710µs : preempt_schedule (journal_commit_transaction)
> kjournal-2478  0dn..  710µs : __schedule (preempt_schedule)
> kjournal-2478  0dn..  710µs : profile_hit (__schedule)
> kjournal-2478  0dn.1  710µs : sched_clock (__schedule)
> kjournal-2478  0dn.2  711µs : dequeue_task (__schedule)
> kjournal-2478  0dn.2  711µs : recalc_task_prio (__schedule)
> kjournal-2478  0dn.2  711µs : effective_prio (recalc_task_prio)
> kjournal-2478  0dn.2  711µs : enqueue_task (__schedule)
>    <...>-2     0d..2  712µs : __switch_to (__schedule)
>    <...>-2     0d..2  712µs : __schedule <kjournal-2478> (73 69):
>    <...>-2     0d..2  712µs : finish_task_switch (__schedule)
>    <...>-2     0d..1  712µs : trace_stop_sched_switched (finish_task_switch)
>    <...>-2     0d..1  712µs : trace_stop_sched_switched <<...>-2> (69 0):
>    <...>-2     0d..1  713µs : trace_stop_sched_switched (finish_task_switch)
>
> Lee
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-03-10  9:37   ` Steven Rostedt
@ 2005-03-10  9:54     ` Steven Rostedt
  2005-03-11  9:57       ` Ingo Molnar
  0 siblings, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-10  9:54 UTC (permalink / raw)
  To: Lee Revell; +Cc: Ingo Molnar, linux-kernel


On Thu, 10 Mar 2005, Steven Rostedt wrote:

> The short term fix is probably to put back the preempt_disables, the long
> term is to get rid of these stupid bit_spin_lock busy loops.
>

Doing a quick search on the kernel, it looks like only kjournald uses the
bit_spin_locks. I'll start converting them to spinlocks. The use seems to
be more of a hack, since it is using bits in the state field for locking,
and these bits aren't used for anything else.

-- Steve

^ permalink raw reply	[flat|nested] 125+ messages in thread

* [patch] Real-Time Preemption, -RT-2.6.11-final-V0.7.40-00
  2005-02-04 10:03 [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 Ingo Molnar
                   ` (5 preceding siblings ...)
  2005-02-19  5:08 ` Lee Revell
@ 2005-03-11  9:28 ` Ingo Molnar
  2005-03-11 12:10   ` Andrew Walrond
  6 siblings, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-03-11  9:28 UTC (permalink / raw)
  To: linux-kernel


i have released the -V0.7.40-00 Real-Time Preemption patch, which can be
downloaded from the usual place:

  http://redhat.com/~mingo/realtime-preempt/

this is a merge to 2.6.11-final.

to create a -V0.7.40-00 tree from scratch, the patching order is:

  http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.11.tar.bz2
  http://redhat.com/~mingo/realtime-preempt/realtime-preempt-2.6.11-final-V0.7.40-00

	Ingo

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-03-10  9:54     ` Steven Rostedt
@ 2005-03-11  9:57       ` Ingo Molnar
  2005-03-11 10:15         ` Steven Rostedt
  0 siblings, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-03-11  9:57 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Lee Revell, linux-kernel


* Steven Rostedt <rostedt@goodmis.org> wrote:

> > The short term fix is probably to put back the preempt_disables, the long
> > term is to get rid of these stupid bit_spin_lock busy loops.
> 
> Doing a quick search on the kernel, it looks like only kjournald uses
> the bit_spin_locks. I'll start converting them to spinlocks. The use
> seems to be more of a hack, since it is using bits in the state field
> for locking, and these bits aren't used for anything else.

yeah. bit-spinlocks are really a hack.

	Ingo

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-03-11  9:57       ` Ingo Molnar
@ 2005-03-11 10:15         ` Steven Rostedt
  2005-03-11 10:17           ` Ingo Molnar
  0 siblings, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-11 10:15 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Lee Revell, linux-kernel


On Fri, 11 Mar 2005, Ingo Molnar wrote:

>
> * Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > > The short term fix is probably to put back the preempt_disables, the long
> > > term is to get rid of these stupid bit_spin_lock busy loops.
> >
> > Doing a quick search on the kernel, it looks like only kjournald uses
> > the bit_spin_locks. I'll start converting them to spinlocks. The use
> > seems to be more of a hack, since it is using bits in the state field
> > for locking, and these bits aren't used for anything else.
>
> yeah. bit-spinlocks are really a hack.
>
> 	Ingo
>

And this really sucks too!  I've been looking into a fix for this and have
yet to get something stable.  As you probably already know, you can't just
put back the preempt_disable since your spinlocks now schedule. So I've
been looking into finding a way to get rid of these.

I've tried making two global spinlocks, one for the state bit and one for
the journal head bit use.  But this deadlocks with j_state_lock. The
journal head lock seems to be ok to be global, but the state lock needs to
have one for every buffer head.  I'm now hacking away to do this without
touching the actual buffer head. But I'm not sure what side effects this
is having.  I'll keep you posted when I get something working.  I'm now
having a crash course in how kjournald and friends work.

-- Steve


^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-03-11 10:15         ` Steven Rostedt
@ 2005-03-11 10:17           ` Ingo Molnar
  2005-03-11 10:24             ` Steven Rostedt
  0 siblings, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-03-11 10:17 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Lee Revell, linux-kernel


* Steven Rostedt <rostedt@goodmis.org> wrote:

> > > Doing a quick search on the kernel, it looks like only kjournald uses
> > > the bit_spin_locks. I'll start converting them to spinlocks. The use
> > > seems to be more of a hack, since it is using bits in the state field
> > > for locking, and these bits aren't used for anything else.
> >
> > yeah. bit-spinlocks are really a hack.
> 
> And this really sucks too!  I've been looking into a fix for this and
> have yet to get something stable.  As you probably already know, you
> can't just put back the preempt_disable since your spinlocks now
> schedule. So I've been looking into finding a way to get rid of these.
> 
> I've tried making two global spinlocks, one for the state bit and one
> for the journal head bit use.  But this deadlocks with j_state_lock.
> The journal head lock seems to be ok to be global, but the state lock
> needs to have one for every buffer head.  I'm now hacking away to do
> this without touching the actual buffer head. But I'm not sure what
> some of the side effects this is having.  I'll keep you posted when I
> get something working.  I'm now having a crash course in how kjournal
> and friends work.

did you try the canonical way of putting a spinlock into every
buffer_head?

	Ingo

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-03-11 10:17           ` Ingo Molnar
@ 2005-03-11 10:24             ` Steven Rostedt
  2005-03-11 10:43               ` Andrew Morton
  0 siblings, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-11 10:24 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Lee Revell, linux-kernel


On Fri, 11 Mar 2005, Ingo Molnar wrote:
>
> * Steven Rostedt <rostedt@goodmis.org> wrote:

> > I've tried making two global spinlocks, one for the state bit and one
> > for the journal head bit use.  But this deadlocks with j_state_lock.
> > The journal head lock seems to be ok to be global, but the state lock
> > needs to have one for every buffer head.  I'm now hacking away to do
> > this without touching the actual buffer head. But I'm not sure what
> > some of the side effects this is having.  I'll keep you posted when I
> > get something working.  I'm now having a crash course in how kjournal
> > and friends work.
>
> did you try the canonical way of putting a spinlock into every
> buffer_head?
>

No, I'll try that now. I just didn't want to modify the buffer head struct
just for journaling.  But if it is the quickest and easiest fix, then I'll
submit it and we can change it later.

-- Steve

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-03-11 10:24             ` Steven Rostedt
@ 2005-03-11 10:43               ` Andrew Morton
  2005-03-11 10:53                 ` Steven Rostedt
  2005-03-11 14:40                 ` Steven Rostedt
  0 siblings, 2 replies; 125+ messages in thread
From: Andrew Morton @ 2005-03-11 10:43 UTC (permalink / raw)
  To: rostedt; +Cc: mingo, rlrevell, linux-kernel

Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > did you try the canonical way of putting a spinlock into every
>  > buffer_head?
>  >
> 
>  No, I'll try that now. I just didn't want to modify the buffer head struct
>  just for journaling.  But if it is the quickest and easiest fix, then I'll
>  submit it and we can change it later.

You'll need two spinlocks.  jbd_lock_bh_state() and jbd_lock_bh_journal_head().

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-03-11 10:43               ` Andrew Morton
@ 2005-03-11 10:53                 ` Steven Rostedt
  2005-03-11 14:40                 ` Steven Rostedt
  1 sibling, 0 replies; 125+ messages in thread
From: Steven Rostedt @ 2005-03-11 10:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: mingo, rlrevell, linux-kernel



On Fri, 11 Mar 2005, Andrew Morton wrote:

> Steven Rostedt <rostedt@goodmis.org> wrote:
> >  No, I'll try that now. I just didn't want to modify the buffer head struct
> >  just for journaling.  But if it is the quickest and easiest fix, then I'll
> >  submit it and we can change it later.
>
> You'll need two spinlocks.  jbd_lock_bh_state() and jbd_lock_bh_journal_head().
>

Yep, already did that. Now I need to reboot the new kernel and give it a
try.

-- Steve


^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.11-final-V0.7.40-00
  2005-03-11  9:28 ` [patch] Real-Time Preemption, -RT-2.6.11-final-V0.7.40-00 Ingo Molnar
@ 2005-03-11 12:10   ` Andrew Walrond
  2005-03-14 20:19     ` Tom Rini
  0 siblings, 1 reply; 125+ messages in thread
From: Andrew Walrond @ 2005-03-11 12:10 UTC (permalink / raw)
  To: linux-kernel

On Friday 11 March 2005 09:28, Ingo Molnar wrote:
> i have released the -V0.7.40-00 Real-Time Preemption patch, which can be
> downloaded from the usual place:
>

I've lost the thread a little; Is this still x86 only?

Andrew Walrond

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-03-11 10:43               ` Andrew Morton
  2005-03-11 10:53                 ` Steven Rostedt
@ 2005-03-11 14:40                 ` Steven Rostedt
  2005-03-11 15:08                   ` Steven Rostedt
  2005-03-11 15:38                   ` Ingo Molnar
  1 sibling, 2 replies; 125+ messages in thread
From: Steven Rostedt @ 2005-03-11 14:40 UTC (permalink / raw)
  To: Andrew Morton; +Cc: mingo, rlrevell, linux-kernel



Here's the patch. It's probably more of an overkill wrt buffer heads, but
it seems to be the easiest solution.

I also put back some of the changes you made for the
bit_spin_locks, so that they act the same as the vanilla kernel if
PREEMPT_RT is not defined.  Now I only tested this with PREEMPT_RT
configured so I hope others can test it with it off. If I get time I'll do
that as well.

I patched this against linux-2.6.11-rc4-V0.7.39-02, so I hope it goes
easily into .40.

Lee,

 Could you see what the latencies are with kjournald with this patch
applied?

Thanks,

 -- Steve


diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/fs/buffer.c linux-2.6.11-rc4-V0.7.39-02/fs/buffer.c
--- linux-2.6.11-rc4-V0.7.39-02.orig/fs/buffer.c	2005-02-12 22:06:54.000000000 -0500
+++ linux-2.6.11-rc4-V0.7.39-02/fs/buffer.c	2005-03-11 07:48:04.000000000 -0500
@@ -3002,6 +3002,10 @@
 		preempt_disable();
 		__get_cpu_var(bh_accounting).nr++;
 		recalc_bh_state();
+#ifdef CONFIG_PREEMPT_RT
+		spin_lock_init(&ret->b_jstate_lock);
+		spin_lock_init(&ret->b_jhead_lock);
+#endif
 		preempt_enable();
 	}
 	return ret;
diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/buffer_head.h linux-2.6.11-rc4-V0.7.39-02/include/linux/buffer_head.h
--- linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/buffer_head.h	2005-02-12 22:05:10.000000000 -0500
+++ linux-2.6.11-rc4-V0.7.39-02/include/linux/buffer_head.h	2005-03-11 07:59:44.000000000 -0500
@@ -62,6 +62,14 @@
 	bh_end_io_t *b_end_io;		/* I/O completion */
  	void *b_private;		/* reserved for b_end_io */
 	struct list_head b_assoc_buffers; /* associated with another mapping */
+
+#ifdef CONFIG_PREEMPT_RT
+	/*
+	 * Fixme: This should be in the journal code.
+	 */
+	spinlock_t b_jstate_lock;	/* lock for journal state. */
+	spinlock_t b_jhead_lock;	/* lock for journal head. */
+#endif
 };

 /*
diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/jbd.h linux-2.6.11-rc4-V0.7.39-02/include/linux/jbd.h
--- linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/jbd.h	2005-02-12 22:07:18.000000000 -0500
+++ linux-2.6.11-rc4-V0.7.39-02/include/linux/jbd.h	2005-03-11 07:57:47.000000000 -0500
@@ -314,6 +314,12 @@
 TAS_BUFFER_FNS(RevokeValid, revokevalid)
 BUFFER_FNS(Freed, freed)

+#ifdef CONFIG_PREEMPT_RT
+#define PICK_SPIN_LOCK(otype,bit,name) spin_##otype(&bh->b_##name##_lock)
+#else
+#define PICK_SPIN_LOCK(otype,bit,name) bit_spin_##otype(bit,&bh->b_state)
+#endif
+
 static inline struct buffer_head *jh2bh(struct journal_head *jh)
 {
 	return jh->b_bh;
@@ -326,33 +332,34 @@

 static inline void jbd_lock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_State, &bh->b_state);
+	PICK_SPIN_LOCK(lock,BH_State,jstate);
 }

 static inline int jbd_trylock_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_trylock(BH_State, &bh->b_state);
+	return PICK_SPIN_LOCK(trylock,BH_State,jstate);
 }

 static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_is_locked(BH_State, &bh->b_state);
+	return PICK_SPIN_LOCK(is_locked,BH_State,jstate);
 }

 static inline void jbd_unlock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_State, &bh->b_state);
+	PICK_SPIN_LOCK(unlock,BH_State,jstate);
 }

 static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_JournalHead, &bh->b_state);
+	PICK_SPIN_LOCK(lock,BH_JournalHead,jhead);
 }

 static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_JournalHead, &bh->b_state);
+	PICK_SPIN_LOCK(unlock,BH_JournalHead,jhead);
 }
+#undef PICK_SPIN_LOCK

 struct jbd_revoke_table_s;

diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/spinlock.h linux-2.6.11-rc4-V0.7.39-02/include/linux/spinlock.h
--- linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/spinlock.h	2005-03-10 08:47:25.000000000 -0500
+++ linux-2.6.11-rc4-V0.7.39-02/include/linux/spinlock.h	2005-03-11 09:06:26.254317378 -0500
@@ -774,6 +774,10 @@
 }))


+#ifndef CONFIG_PREEMPT_RT
+
+/* These are just plain evil! */
+
 /*
  *  bit-based spin_lock()
  *
@@ -789,10 +793,15 @@
 	 * busywait with less bus contention for a good time to
 	 * attempt to acquire the lock bit.
 	 */
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
-	while (test_and_set_bit(bitnum, addr))
-		while (test_bit(bitnum, addr))
+	preempt_disable();
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
+	while (test_and_set_bit(bitnum, addr)) {
+		while (test_bit(bitnum, addr)) {
+			preempt_enable();
 			cpu_relax();
+			preempt_disable();
+		}
+	}
 #endif
 	__acquire(bitlock);
 }
@@ -802,9 +811,12 @@
  */
 static inline int bit_spin_trylock(int bitnum, unsigned long *addr)
 {
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
-	if (test_and_set_bit(bitnum, addr))
+	preempt_disable();
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
+	if (test_and_set_bit(bitnum, addr)) {
+		preempt_enable();
 		return 0;
+	}
 #endif
 	__acquire(bitlock);
 	return 1;
@@ -815,11 +827,12 @@
  */
 static inline void bit_spin_unlock(int bitnum, unsigned long *addr)
 {
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
 	BUG_ON(!test_bit(bitnum, addr));
 	smp_mb__before_clear_bit();
 	clear_bit(bitnum, addr);
 #endif
+	preempt_enable();
 	__release(bitlock);
 }

@@ -828,12 +841,15 @@
  */
 static inline int bit_spin_is_locked(int bitnum, unsigned long *addr)
 {
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
 	return test_bit(bitnum, addr);
+#elif defined CONFIG_PREEMPT
+	return preempt_count();
 #else
 	return 1;
 #endif
 }
+#endif /* CONFIG_PREEMPT_RT */

 #define DEFINE_SPINLOCK(name) \
 	spinlock_t name __cacheline_aligned_in_smp = _SPIN_LOCK_UNLOCKED(name)


* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-03-11 14:40                 ` Steven Rostedt
@ 2005-03-11 15:08                   ` Steven Rostedt
  2005-03-11 15:30                     ` K.R. Foley
  2005-03-11 15:38                   ` Ingo Molnar
  1 sibling, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-11 15:08 UTC (permalink / raw)
  To: Andrew Morton; +Cc: mingo, rlrevell, linux-kernel


>
> +#ifdef CONFIG_PREEMPT_RT
> +#define PICK_SPIN_LOCK(otype,bit,name) spin_##otype(&bh->b_##name##_lock)
> +#else
> +#define PICK_SPIN_LOCK(otype,bit,name) bit_spin_##otype(bit,bh->b_state);
> +#endif
> +

Oops, extra semicolon on the non RT side.
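
For readers wondering why the stray semicolon matters, here is a minimal userspace sketch. It is not part of the patch: fake_bh, the fake_spin_*() helpers, PICK() and toggle() are all made-up names, and the helpers are plain non-atomic bit operations. It passes the address of the state word, as the original bit_spin_*() calls did. It shows the usual convention of leaving the terminating semicolon to the call site so the macro still works in an unbraced if/else.

```c
#include <assert.h>

/* Hypothetical stand-in for struct buffer_head; only the state word is modelled. */
struct fake_bh {
	unsigned long b_state;
};

/* Hypothetical, non-atomic stand-ins for bit_spin_lock() and friends. */
static inline void fake_spin_lock(int bit, unsigned long *addr)
{
	*addr |= 1UL << bit;
}

static inline void fake_spin_unlock(int bit, unsigned long *addr)
{
	*addr &= ~(1UL << bit);
}

static inline int fake_spin_is_locked(int bit, unsigned long *addr)
{
	return (int)((*addr >> bit) & 1UL);
}

/* No trailing semicolon here: each call site supplies its own. */
#define PICK(otype, bit) fake_spin_##otype(bit, &bh->b_state)

static int toggle(struct fake_bh *bh, int bit, int want_lock)
{
	assert(bit >= 0 && bit < 32);

	/*
	 * If the macro body ended in ';', the first branch would expand to
	 * "fake_spin_lock(...);;" and the compiler would reject the
	 * "else" that follows, because the if statement already ended.
	 */
	if (want_lock)
		PICK(lock, bit);
	else
		PICK(unlock, bit);

	return PICK(is_locked, bit);
}
#undef PICK
```

In the jbd.h wrappers every use is a lone statement or a return, so the doubled semicolon would most likely go unnoticed there; the sketch only shows where it would start to hurt.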


I'll try again.

-- Steve

diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/fs/buffer.c linux-2.6.11-rc4-V0.7.39-02/fs/buffer.c
--- linux-2.6.11-rc4-V0.7.39-02.orig/fs/buffer.c	2005-02-12 22:06:54.000000000 -0500
+++ linux-2.6.11-rc4-V0.7.39-02/fs/buffer.c	2005-03-11 07:48:04.000000000 -0500
@@ -3002,6 +3002,10 @@
 		preempt_disable();
 		__get_cpu_var(bh_accounting).nr++;
 		recalc_bh_state();
+#ifdef CONFIG_PREEMPT_RT
+		spin_lock_init(&ret->b_jstate_lock);
+		spin_lock_init(&ret->b_jhead_lock);
+#endif
 		preempt_enable();
 	}
 	return ret;
diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/buffer_head.h linux-2.6.11-rc4-V0.7.39-02/include/linux/buffer_head.h
--- linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/buffer_head.h	2005-02-12 22:05:10.000000000 -0500
+++ linux-2.6.11-rc4-V0.7.39-02/include/linux/buffer_head.h	2005-03-11 07:59:44.000000000 -0500
@@ -62,6 +62,14 @@
 	bh_end_io_t *b_end_io;		/* I/O completion */
  	void *b_private;		/* reserved for b_end_io */
 	struct list_head b_assoc_buffers; /* associated with another mapping */
+
+#ifdef CONFIG_PREEMPT_RT
+	/*
+	 * Fixme: This should be in the journal code.
+	 */
+	spinlock_t b_jstate_lock;	/* lock for journal state. */
+	spinlock_t b_jhead_lock;	/* lock for journal head. */
+#endif
 };

 /*
diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/jbd.h linux-2.6.11-rc4-V0.7.39-02/include/linux/jbd.h
--- linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/jbd.h	2005-02-12 22:07:18.000000000 -0500
+++ linux-2.6.11-rc4-V0.7.39-02/include/linux/jbd.h	2005-03-11 07:57:47.000000000 -0500
@@ -314,6 +314,12 @@
 TAS_BUFFER_FNS(RevokeValid, revokevalid)
 BUFFER_FNS(Freed, freed)

+#ifdef CONFIG_PREEMPT_RT
+#define PICK_SPIN_LOCK(otype,bit,name) spin_##otype(&bh->b_##name##_lock)
+#else
+#define PICK_SPIN_LOCK(otype,bit,name) bit_spin_##otype(bit,bh->b_state)
+#endif
+
 static inline struct buffer_head *jh2bh(struct journal_head *jh)
 {
 	return jh->b_bh;
@@ -326,33 +332,34 @@

 static inline void jbd_lock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_State, &bh->b_state);
+	PICK_SPIN_LOCK(lock,BH_State,jstate);
 }

 static inline int jbd_trylock_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_trylock(BH_State, &bh->b_state);
+	return PICK_SPIN_LOCK(trylock,BH_State,jstate);
 }

 static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_is_locked(BH_State, &bh->b_state);
+	return PICK_SPIN_LOCK(is_locked,BH_State,jstate);
 }

 static inline void jbd_unlock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_State, &bh->b_state);
+	PICK_SPIN_LOCK(unlock,BH_State,jstate);
 }

 static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_JournalHead, &bh->b_state);
+	PICK_SPIN_LOCK(lock,BH_JournalHead,jhead);
 }

 static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_JournalHead, &bh->b_state);
+	PICK_SPIN_LOCK(unlock,BH_JournalHead,jhead);
 }
+#undef PICK_SPIN_LOCK

 struct jbd_revoke_table_s;

diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/spinlock.h linux-2.6.11-rc4-V0.7.39-02/include/linux/spinlock.h
--- linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/spinlock.h	2005-03-10 08:47:25.000000000 -0500
+++ linux-2.6.11-rc4-V0.7.39-02/include/linux/spinlock.h	2005-03-11 09:06:26.254317378 -0500
@@ -774,6 +774,10 @@
 }))


+#ifndef CONFIG_PREEMPT_RT
+
+/* These are just plain evil! */
+
 /*
  *  bit-based spin_lock()
  *
@@ -789,10 +793,15 @@
 	 * busywait with less bus contention for a good time to
 	 * attempt to acquire the lock bit.
 	 */
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
-	while (test_and_set_bit(bitnum, addr))
-		while (test_bit(bitnum, addr))
+	preempt_disable();
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
+	while (test_and_set_bit(bitnum, addr)) {
+		while (test_bit(bitnum, addr)) {
+			preempt_enable();
 			cpu_relax();
+			preempt_disable();
+		}
+	}
 #endif
 	__acquire(bitlock);
 }
@@ -802,9 +811,12 @@
  */
 static inline int bit_spin_trylock(int bitnum, unsigned long *addr)
 {
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
-	if (test_and_set_bit(bitnum, addr))
+	preempt_disable();
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
+	if (test_and_set_bit(bitnum, addr)) {
+		preempt_enable();
 		return 0;
+	}
 #endif
 	__acquire(bitlock);
 	return 1;
@@ -815,11 +827,12 @@
  */
 static inline void bit_spin_unlock(int bitnum, unsigned long *addr)
 {
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
 	BUG_ON(!test_bit(bitnum, addr));
 	smp_mb__before_clear_bit();
 	clear_bit(bitnum, addr);
 #endif
+	preempt_enable();
 	__release(bitlock);
 }

@@ -828,12 +841,15 @@
  */
 static inline int bit_spin_is_locked(int bitnum, unsigned long *addr)
 {
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
 	return test_bit(bitnum, addr);
+#elif defined CONFIG_PREEMPT
+	return preempt_count();
 #else
 	return 1;
 #endif
 }
+#endif /* CONFIG_PREEMPT_RT */

 #define DEFINE_SPINLOCK(name) \
 	spinlock_t name __cacheline_aligned_in_smp = _SPIN_LOCK_UNLOCKED(name)


* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-03-11 15:08                   ` Steven Rostedt
@ 2005-03-11 15:30                     ` K.R. Foley
  0 siblings, 0 replies; 125+ messages in thread
From: K.R. Foley @ 2005-03-11 15:30 UTC (permalink / raw)
  To: rostedt; +Cc: Andrew Morton, mingo, rlrevell, linux-kernel

Steven Rostedt wrote:
>>+#ifdef CONFIG_PREEMPT_RT
>>+#define PICK_SPIN_LOCK(otype,bit,name) spin_##otype(&bh->b_##name##_lock)
>>+#else
>>+#define PICK_SPIN_LOCK(otype,bit,name) bit_spin_##otype(bit,bh->b_state);
>>+#endif
>>+
> 
> 
> Oops, extra semicolon on the non RT side.
> 
> 
> I'll try again.
> 
> -- Steve

Haven't tried it yet, but it does apply cleanly to 2.6.11-final-V0.7.40-00.

kr


* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-03-11 14:40                 ` Steven Rostedt
  2005-03-11 15:08                   ` Steven Rostedt
@ 2005-03-11 15:38                   ` Ingo Molnar
  2005-03-11 16:01                     ` Steven Rostedt
  2005-03-11 20:39                     ` Steven Rostedt
  1 sibling, 2 replies; 125+ messages in thread
From: Ingo Molnar @ 2005-03-11 15:38 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Andrew Morton, rlrevell, linux-kernel


* Steven Rostedt <rostedt@goodmis.org> wrote:

> Here's the patch. It's probably more of an overkill wrt buffer heads,
> but it seems to be the easiest solution.

isnt there some ext3-private journal structure (journal-bh) linked off 
the bh? If the lock is in that structure then the overhead would only 
affect ext3.

	Ingo


* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-03-11 15:38                   ` Ingo Molnar
@ 2005-03-11 16:01                     ` Steven Rostedt
  2005-03-11 20:39                     ` Steven Rostedt
  1 sibling, 0 replies; 125+ messages in thread
From: Steven Rostedt @ 2005-03-11 16:01 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, rlrevell, linux-kernel


On Fri, 11 Mar 2005, Ingo Molnar wrote:

>
> * Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > Here's the patch. It's probably more of an overkill wrt buffer heads,
> > but it seems to be the easiest solution.
>
> isnt there some ext3-private journal structure (journal-bh) linked off
> the bh? If the lock is in that structure then the overhead would only
> affect ext3.
>

Yes, there is, and I was trying to use it before you mentioned trying this
(which works for now).  The locks are called before and after the private
pointer of the bh is set and removed.  The journal_head lock, I was going
to make global, and the state lock would go on this structure. I would
have to do some hack in journal.c to flag the state lock when it was
removing the journal head so that it didn't do the remove there, but did
it after the state lock was released. But this still had a few crashes.

The journal_head lock was used to lock when to add or remove the private
data from the bh, so you can see why this structure can't be used for this
purpose. But the state lock seemed to be ok for this. I need to know more
about the journaling system.

I'll look into doing this too, but this fix should do for now.

-- Steve



* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-03-11 15:38                   ` Ingo Molnar
  2005-03-11 16:01                     ` Steven Rostedt
@ 2005-03-11 20:39                     ` Steven Rostedt
  2005-03-11 20:46                       ` Lee Revell
  1 sibling, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-11 20:39 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, rlrevell, linux-kernel



On Fri, 11 Mar 2005, Ingo Molnar wrote:

>
> * Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > Here's the patch. It's probably more of an overkill wrt buffer heads,
> > but it seems to be the easiest solution.
>
> isnt there some ext3-private journal structure (journal-bh) linked off
> the bh? If the lock is in that structure then the overhead would only
> affect ext3.
>

OK, here it is (Yuck!).  I was able to use the journal head (private data
of the buffer head) for the state lock.  I just decided to have the
journal head lock be one global lock for all buffer heads, since it is
used to add and remove the journal private data from the buffer head, and
thus can't be stored in the journal private data.

The state lock is now in the journal private data but we must be careful
not to free this data before we unlock it. So here's what I've done.

  static inline void jbd_lock_bh_state(struct buffer_head *bh)
  {
	BUG_ON(!bh->b_private);
	atomic_inc(&bh2jh(bh)->b_state_wait_count);
	spin_lock(&bh2jh(bh)->b_state_lock);
  }

I have a counter of those that want/have the lock, and this informs the
journal_remove_journal_head that it should not free the jh.

  static void __journal_remove_journal_head(struct buffer_head *bh)
  {
	struct journal_head *jh = bh2jh(bh);

	J_ASSERT_JH(jh, jh->b_jcount >= 0);

	get_bh(bh);
	if (jh->b_jcount == 0) {
		if (jh->b_transaction == NULL &&
				jh->b_next_transaction == NULL &&
				jh->b_cp_transaction == NULL) {
  #ifdef CONFIG_PREEMPT_RT
			if (atomic_read(&jh->b_state_wait_count)) {
				BUG_ON(buffer_journalhead(bh));
				set_buffer_journalhead(bh);
			} else
  #endif
                        {


Here the state_wait_count is checked, and if it is > 0, the bit that was
originally used for locking the journal head is set instead, to tell the
unlock of the state lock that the journal head still needs to be removed.

  static inline void jbd_unlock_bh_state(struct buffer_head *bh)
  {
	int rmjh = 0;

	BUG_ON(!atomic_read(&bh2jh(bh)->b_state_wait_count));
	atomic_dec(&bh2jh(bh)->b_state_wait_count);

	if (buffer_journalhead(bh)) {
		clear_buffer_journalhead(bh);
		rmjh = 1;
	}

	spin_unlock(&bh2jh(bh)->b_state_lock);

	if (rmjh)
		journal_remove_journal_head(bh);
  }

Now in the unlocking of the state lock, the journal head bit is tested and
if it is set, then the remove journal head function is called.
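
The control flow described above can be modelled in userspace. What follows is only a sketch under stated assumptions, not the patch itself: jh_model, bh_model and the model_*() functions are hypothetical names, the lock is a C11 atomic_flag rather than a kernel spinlock, the "remove pending" marker is a plain bool standing in for the BH_JournalHead bit, and the b_jcount and transaction checks of the real removal function are left out. It is meant to be read single-threaded, to show the ordering of the steps rather than to prove the scheme race-free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of the journal head: just the lock and the counter. */
struct jh_model {
	atomic_flag state_lock;		/* stands in for b_state_lock */
	atomic_int state_wait_count;	/* stands in for b_state_wait_count */
};

/* Hypothetical model of the buffer head. */
struct bh_model {
	struct jh_model *priv;		/* stands in for bh->b_private */
	bool remove_pending;		/* stands in for the BH_JournalHead bit */
};

/* Attach a fresh journal-head model; returns NULL if allocation fails. */
static struct jh_model *model_add_jh(struct bh_model *bh)
{
	struct jh_model *jh = malloc(sizeof(*jh));

	if (!jh)
		return NULL;
	atomic_flag_clear(&jh->state_lock);
	atomic_init(&jh->state_wait_count, 0);
	bh->priv = jh;
	bh->remove_pending = false;
	return jh;
}

/* Free the journal-head model, or defer if the state lock is wanted/held. */
static void model_remove_jh(struct bh_model *bh)
{
	struct jh_model *jh = bh->priv;

	if (!jh)
		return;
	if (atomic_load(&jh->state_wait_count) > 0) {
		assert(!bh->remove_pending);
		bh->remove_pending = true;	/* the unlock path finishes the job */
		return;
	}
	bh->priv = NULL;
	free(jh);
}

static void model_lock_state(struct bh_model *bh)
{
	assert(bh->priv);
	atomic_fetch_add(&bh->priv->state_wait_count, 1);
	while (atomic_flag_test_and_set(&bh->priv->state_lock))
		;	/* spin until the flag is released */
}

static void model_unlock_state(struct bh_model *bh)
{
	bool rmjh = false;

	assert(atomic_load(&bh->priv->state_wait_count) > 0);
	atomic_fetch_sub(&bh->priv->state_wait_count, 1);

	if (bh->remove_pending) {
		bh->remove_pending = false;
		rmjh = true;
	}

	atomic_flag_clear(&bh->priv->state_lock);

	/* Only touch the journal head again through the removal helper. */
	if (rmjh)
		model_remove_jh(bh);
}
```

A caller that takes the lock, asks for removal, and then unlocks should see the private data survive until the unlock and be gone afterwards.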


Maybe this isn't the cleanest solution, but it keeps the overhead on the
buffer heads down, so it's preferred over my last patch.

Once again, this has only been tested with full preemption enabled, but I
tried to keep it from changing the way non PREEMPT_RT works.

I'm leaving now for the weekend, so I won't be able to respond to anyone
till Monday.  I'll also run this patch over the weekend while compiling
the kernel in an endless loop

 while [ 1 ]; do
   make clean; make
 done

With kjournald running FIFO, to see if it survives.

Cheers,


-- Steve

diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/fs/jbd/journal.c linux-2.6.11-rc4-V0.7.39-02/fs/jbd/journal.c
--- linux-2.6.11-rc4-V0.7.39-02.orig/fs/jbd/journal.c	2005-02-12 22:05:29.000000000 -0500
+++ linux-2.6.11-rc4-V0.7.39-02/fs/jbd/journal.c	2005-03-11 14:54:21.000000000 -0500
@@ -80,6 +80,10 @@
 EXPORT_SYMBOL(journal_try_to_free_buffers);
 EXPORT_SYMBOL(journal_force_commit);

+#ifdef CONFIG_PREEMPT_RT
+spinlock_t jbd_journal_head_lock = SPIN_LOCK_UNLOCKED;
+#endif
+
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);

 /*
@@ -1727,6 +1731,9 @@
 		jh = new_jh;
 		new_jh = NULL;		/* We consumed it */
 		set_buffer_jbd(bh);
+#ifdef CONFIG_PREEMPT_RT
+		spin_lock_init(&jh->b_state_lock);
+#endif
 		bh->b_private = jh;
 		jh->b_bh = bh;
 		get_bh(bh);
@@ -1767,26 +1774,34 @@
 		if (jh->b_transaction == NULL &&
 				jh->b_next_transaction == NULL &&
 				jh->b_cp_transaction == NULL) {
-			J_ASSERT_BH(bh, buffer_jbd(bh));
-			J_ASSERT_BH(bh, jh2bh(jh) == bh);
-			BUFFER_TRACE(bh, "remove journal_head");
-			if (jh->b_frozen_data) {
-				printk(KERN_WARNING "%s: freeing "
-						"b_frozen_data\n",
-						__FUNCTION__);
-				kfree(jh->b_frozen_data);
-			}
-			if (jh->b_committed_data) {
-				printk(KERN_WARNING "%s: freeing "
-						"b_committed_data\n",
-						__FUNCTION__);
-				kfree(jh->b_committed_data);
+#ifdef CONFIG_PREEMPT_RT
+			if (atomic_read(&jh->b_state_wait_count)) {
+				BUG_ON(buffer_journalhead(bh));
+				set_buffer_journalhead(bh);
+			} else
+#endif
+			{
+				J_ASSERT_BH(bh, buffer_jbd(bh));
+				J_ASSERT_BH(bh, jh2bh(jh) == bh);
+				BUFFER_TRACE(bh, "remove journal_head");
+				if (jh->b_frozen_data) {
+					printk(KERN_WARNING "%s: freeing "
+					       "b_frozen_data\n",
+					       __FUNCTION__);
+					kfree(jh->b_frozen_data);
+				}
+				if (jh->b_committed_data) {
+					printk(KERN_WARNING "%s: freeing "
+					       "b_committed_data\n",
+					       __FUNCTION__);
+					kfree(jh->b_committed_data);
+				}
+				bh->b_private = NULL;
+				jh->b_bh = NULL;	/* debug, really */
+				clear_buffer_jbd(bh);
+				__brelse(bh);
+				journal_free_journal_head(jh);
 			}
-			bh->b_private = NULL;
-			jh->b_bh = NULL;	/* debug, really */
-			clear_buffer_jbd(bh);
-			__brelse(bh);
-			journal_free_journal_head(jh);
 		} else {
 			BUFFER_TRACE(bh, "journal_head was locked");
 		}
diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/fs/jbd/transaction.c linux-2.6.11-rc4-V0.7.39-02/fs/jbd/transaction.c
--- linux-2.6.11-rc4-V0.7.39-02.orig/fs/jbd/transaction.c	2005-02-12 22:05:50.000000000 -0500
+++ linux-2.6.11-rc4-V0.7.39-02/fs/jbd/transaction.c	2005-03-11 13:25:49.000000000 -0500
@@ -1207,11 +1207,17 @@

 	BUFFER_TRACE(bh, "entry");

+	/*
+	 * Is it OK to check to see if this isn't a jbd buffer outside of
+	 * locks? Now that jbd_lock_bh_state only works with jbd buffers
+	 * I sure hope so.
+	 */
+	if (!buffer_jbd(bh))
+		goto not_jbd;
+
 	jbd_lock_bh_state(bh);
 	spin_lock(&journal->j_list_lock);

-	if (!buffer_jbd(bh))
-		goto not_jbd;
 	jh = bh2jh(bh);

 	/* Critical error: attempting to delete a bitmap buffer, maybe?
@@ -1219,7 +1225,7 @@
 	if (!J_EXPECT_JH(jh, !jh->b_committed_data,
 			 "inconsistent data on disk")) {
 		err = -EIO;
-		goto not_jbd;
+		goto bad_jbd;
 	}

 	if (jh->b_transaction == handle->h_transaction) {
@@ -1274,9 +1280,11 @@
 		}
 	}

-not_jbd:
+
+bad_jbd:
 	spin_unlock(&journal->j_list_lock);
 	jbd_unlock_bh_state(bh);
+not_jbd:
 	__brelse(bh);
 	return err;
 }
diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/jbd.h linux-2.6.11-rc4-V0.7.39-02/include/linux/jbd.h
--- linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/jbd.h	2005-02-12 22:07:18.000000000 -0500
+++ linux-2.6.11-rc4-V0.7.39-02/include/linux/jbd.h	2005-03-11 14:55:31.000000000 -0500
@@ -313,6 +313,7 @@
 BUFFER_FNS(RevokeValid, revokevalid)
 TAS_BUFFER_FNS(RevokeValid, revokevalid)
 BUFFER_FNS(Freed, freed)
+BUFFER_FNS(JournalHead,journalhead)

 static inline struct buffer_head *jh2bh(struct journal_head *jh)
 {
@@ -324,6 +325,66 @@
 	return bh->b_private;
 }

+void journal_remove_journal_head(struct buffer_head *bh);
+
+#ifdef CONFIG_PREEMPT_RT
+
+extern spinlock_t jbd_journal_head_lock;
+
+static inline void jbd_lock_bh_state(struct buffer_head *bh)
+{
+	BUG_ON(!bh->b_private);
+	atomic_inc(&bh2jh(bh)->b_state_wait_count);
+	spin_lock(&bh2jh(bh)->b_state_lock);
+}
+
+static inline int jbd_trylock_bh_state(struct buffer_head *bh)
+{
+	int ret;
+
+	BUG_ON(!bh->b_private);
+
+	if ((ret = spin_trylock(&bh2jh(bh)->b_state_lock)))
+		atomic_inc(&bh2jh(bh)->b_state_wait_count);
+
+	return ret;
+}
+
+static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
+{
+	return bh2jh(bh) ? spin_is_locked(&bh2jh(bh)->b_state_lock) : 0;
+}
+
+static inline void jbd_unlock_bh_state(struct buffer_head *bh)
+{
+	int rmjh = 0;
+
+	BUG_ON(!atomic_read(&bh2jh(bh)->b_state_wait_count));
+	atomic_dec(&bh2jh(bh)->b_state_wait_count);
+
+	if (buffer_journalhead(bh)) {
+		clear_buffer_journalhead(bh);
+		rmjh = 1;
+	}
+
+	spin_unlock(&bh2jh(bh)->b_state_lock);
+
+	if (rmjh)
+		journal_remove_journal_head(bh);
+}
+
+static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
+{
+	spin_lock(&jbd_journal_head_lock);
+}
+
+static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
+{
+	spin_unlock(&jbd_journal_head_lock);
+}
+
+#else /* !CONFIG_PREEMPT_RT */
+
 static inline void jbd_lock_bh_state(struct buffer_head *bh)
 {
 	bit_spin_lock(BH_State, &bh->b_state);
@@ -354,6 +415,8 @@
 	bit_spin_unlock(BH_JournalHead, &bh->b_state);
 }

+#endif /* CONFIG_PREEMPT_RT */
+
 struct jbd_revoke_table_s;

 /**
@@ -918,7 +981,6 @@
  */
 struct journal_head *journal_add_journal_head(struct buffer_head *bh);
 struct journal_head *journal_grab_journal_head(struct buffer_head *bh);
-void journal_remove_journal_head(struct buffer_head *bh);
 void journal_put_journal_head(struct journal_head *jh);

 /*
diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/journal-head.h linux-2.6.11-rc4-V0.7.39-02/include/linux/journal-head.h
--- linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/journal-head.h	2005-02-12 22:07:39.000000000 -0500
+++ linux-2.6.11-rc4-V0.7.39-02/include/linux/journal-head.h	2005-03-11 15:14:07.774541864 -0500
@@ -80,6 +80,16 @@
 	 * [j_list_lock]
 	 */
 	struct journal_head *b_cpnext, *b_cpprev;
+
+	/*
+	 * Lock the state of the buffer head.
+	 */
+	spinlock_t b_state_lock;
+
+	/*
+	 * Count the processes that want/have the state lock.
+	 */
+	atomic_t b_state_wait_count;
 };

 #endif		/* JOURNAL_HEAD_H_INCLUDED */
diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/spinlock.h linux-2.6.11-rc4-V0.7.39-02/include/linux/spinlock.h
--- linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/spinlock.h	2005-03-10 08:47:25.000000000 -0500
+++ linux-2.6.11-rc4-V0.7.39-02/include/linux/spinlock.h	2005-03-11 09:06:26.000000000 -0500
@@ -774,6 +774,10 @@
 }))


+#ifndef CONFIG_PREEMPT_RT
+
+/* These are just plain evil! */
+
 /*
  *  bit-based spin_lock()
  *
@@ -789,10 +793,15 @@
 	 * busywait with less bus contention for a good time to
 	 * attempt to acquire the lock bit.
 	 */
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
-	while (test_and_set_bit(bitnum, addr))
-		while (test_bit(bitnum, addr))
+	preempt_disable();
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
+	while (test_and_set_bit(bitnum, addr)) {
+		while (test_bit(bitnum, addr)) {
+			preempt_enable();
 			cpu_relax();
+			preempt_disable();
+		}
+	}
 #endif
 	__acquire(bitlock);
 }
@@ -802,9 +811,12 @@
  */
 static inline int bit_spin_trylock(int bitnum, unsigned long *addr)
 {
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
-	if (test_and_set_bit(bitnum, addr))
+	preempt_disable();
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
+	if (test_and_set_bit(bitnum, addr)) {
+		preempt_enable();
 		return 0;
+	}
 #endif
 	__acquire(bitlock);
 	return 1;
@@ -815,11 +827,12 @@
  */
 static inline void bit_spin_unlock(int bitnum, unsigned long *addr)
 {
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
 	BUG_ON(!test_bit(bitnum, addr));
 	smp_mb__before_clear_bit();
 	clear_bit(bitnum, addr);
 #endif
+	preempt_enable();
 	__release(bitlock);
 }

@@ -828,12 +841,15 @@
  */
 static inline int bit_spin_is_locked(int bitnum, unsigned long *addr)
 {
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
 	return test_bit(bitnum, addr);
+#elif defined CONFIG_PREEMPT
+	return preempt_count();
 #else
 	return 1;
 #endif
 }
+#endif /* CONFIG_PREEMPT_RT */

 #define DEFINE_SPINLOCK(name) \
 	spinlock_t name __cacheline_aligned_in_smp = _SPIN_LOCK_UNLOCKED(name)



* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-03-11 20:39                     ` Steven Rostedt
@ 2005-03-11 20:46                       ` Lee Revell
  2005-03-11 22:06                         ` Lee Revell
  0 siblings, 1 reply; 125+ messages in thread
From: Lee Revell @ 2005-03-11 20:46 UTC (permalink / raw)
  To: rostedt; +Cc: Ingo Molnar, Andrew Morton, linux-kernel

On Fri, 2005-03-11 at 15:39 -0500, Steven Rostedt wrote:
> I'm leaving now for the weekend, so I won't be able to respond to anyone
> till Monday.  I'll also run this patch over the weekend while compiling
> the kernel in an endless loop

I'll test this with PREEMPT_DESKTOP and data=ordered also and see how it
goes.

Lee



* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-03-11 20:46                       ` Lee Revell
@ 2005-03-11 22:06                         ` Lee Revell
  2005-03-14  7:37                           ` Steven Rostedt
  0 siblings, 1 reply; 125+ messages in thread
From: Lee Revell @ 2005-03-11 22:06 UTC (permalink / raw)
  To: rostedt; +Cc: Ingo Molnar, Andrew Morton, linux-kernel

On Fri, 2005-03-11 at 15:46 -0500, Lee Revell wrote:
> On Fri, 2005-03-11 at 15:39 -0500, Steven Rostedt wrote:
> > I'm leaving now for the weekend, so I won't be able to respond to anyone
> > till Monday.  I'll also run this patch over the weekend while compiling
> > the kernel in an endless loop
> 
> I'll test this with PREEMPT_DESKTOP and data=ordered also and see how it
> goes.

Does not seem to work at all with the above settings.  It seemed OK
until I started X.  Then every time I launched an xterm it would
disappear as soon as I typed anything.  I could not switch consoles to
see the Oops.

Lee



* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-03-11 22:06                         ` Lee Revell
@ 2005-03-14  7:37                           ` Steven Rostedt
  2005-03-14  9:33                             ` Steven Rostedt
  0 siblings, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-14  7:37 UTC (permalink / raw)
  To: Lee Revell; +Cc: Ingo Molnar, Andrew Morton, linux-kernel


On Fri, 11 Mar 2005, Lee Revell wrote:

> On Fri, 2005-03-11 at 15:46 -0500, Lee Revell wrote:
> > On Fri, 2005-03-11 at 15:39 -0500, Steven Rostedt wrote:
> > > I'm leaving now for the weekend, so I won't be able to respond to anyone
> > > till Monday.  I'll also run this patch over the weekend while compiling
> > > the kernel in an endless loop
> >
> > I'll test this with PREEMPT_DESKTOP and data=ordered also and see how it
> > goes.
>
> Does not seem to work at all with the above settings.  It seemed OK
> until I started X.  Then every time I launched an xterm it would
> disappear as soon as I typed anything.  I could not switch consoles to
> see the Oops.
>

Hi Lee,

I just compiled PREEMPT_DESKTOP and mounted root (only disk filesystem on
my test machine) as data=ordered.  I had no problem getting to X, starting
an xterm and running a make. Actually it was a gnome-term since I didn't
have xterm. But then I su to root, apt-get xterm, ran xterm, and did a
make there with no problems.

Did you patch this against 39-02 or -40-X?

I haven't had time to upgrade to 40 yet.  Maybe, I'll work on that today.

Maybe your crash has to do with something else.  My test machine has a
serial hookup that I can look at even if the term goes down. I'll see if
40 gives me problems.

-- Steve


* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-03-14  7:37                           ` Steven Rostedt
@ 2005-03-14  9:33                             ` Steven Rostedt
  2005-03-14 10:10                               ` Steven Rostedt
  0 siblings, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-14  9:33 UTC (permalink / raw)
  To: Lee Revell; +Cc: Ingo Molnar, Andrew Morton, linux-kernel



On Mon, 14 Mar 2005, Steven Rostedt wrote:

>
> > > I'll test this with PREEMPT_DESKTOP and data=ordered also and see how it
> > > goes.
> >
> > Does not seem to work at all with the above settings.  It seemed OK
> > until I started X.  Then every time I launched an xterm it would
> > disappear as soon as I typed anything.  I could not switch consoles to
> > see the Oops.
> >
>
> Hi Lee,
>
> I just compiled PREEMPT_DESKTOP and mounted root (only disk filesystem on
> my test machine) as data=ordered.  I had no problem getting to X, starting
> an xterm and running a make. Actually it was a gnome-term since I didn't
> have xterm. But then I su to root, apt-get xterm, ran xterm, and did a
> make there with no problems.
>
> Did you patch this against 39-02 or -40-X?
>
> I haven't had time to upgrade to 40 yet.  Maybe, I'll work on that today.
>

I just downloaded -40 and applied my patch, compiled it with
PREEMPT_DESKTOP and data=ordered, ran it and everything seems OK, except
I'm getting the following...

BUG: Unable to handle kernel NULL pointer dereference at virtual address
00000000
 printing eip:
c0213438
*pde = 00000000
Oops: 0000 [#1]
Modules linked in: ipv6 af_packet tsdev mousedev evdev floppy psmouse
pcspkr snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm
snd_timer snd soundcore snd_page_alloc shpchp pci_hotplug ehci_hcd
intel_agp agpgart uhci_hcd usbcore e100 mii ide_cd cdrom unix
CPU:    0
EIP:    0060:[<c0213438>]    Not tainted VLI
EFLAGS: 00010286   (2.6.11-RT-V0.7.40-00)
EIP is at vt_ioctl+0x18/0x1ab0
eax: 00000000   ebx: 00005603   ecx: 00005603   edx: cb6c8780
esi: c0213420   edi: cc956000   ebp: cb613f18   esp: cb613e48
ds: 007b   es: 007b   ss: 0068   preempt: 00000000
Process XFree86 (pid: 4713, threadinfo=cb612000 task=cb5e0a40)
Stack: cb5e0b90 cb612000 cb5e0a40 c034494c cb5e0a40 00000246 cb613e7c
c0117217
       c0344954 00000006 00000001 00000000 00000000 cb613ebc ce0cce24
c13e1800
       cf1279b8 00000000 00000000 cb613ed4 c01707f1 cf1279b8 00000007
00000000
Call Trace:
 [<c0103cdf>] show_stack+0x7f/0xa0 (28)
 [<c0103e95>] show_registers+0x165/0x1d0 (56)
 [<c0104088>] die+0xc8/0x150 (64)
 [<c0115376>] do_page_fault+0x356/0x6c4 (216)
 [<c0103973>] error_code+0x2b/0x30 (268)
 [<c020e91b>] tty_ioctl+0x34b/0x490 (52)
 [<c016837f>] do_ioctl+0x4f/0x70 (32)
 [<c0168582>] vfs_ioctl+0x62/0x1d0 (40)
 [<c0168751>] sys_ioctl+0x61/0x90 (40)
 [<c0102ec3>] syscall_call+0x7/0xb (-8124)
Code: ff ff 8d 05 88 4d 34 c0 e8 f6 60 0a 00 e9 3a ff ff ff 90 55 89 e5 57
56 53 81 ec c4 00 00 00 8b 7d 08 8b 5d 10 8b 87 7c 09 00 00 <8b> 30 89 34
24 8b 04 b5 e0 b7 3c c0 89 45 8c e8 a4 6a 00 00 85


I'll see if this happens without the patch, and if so, then I'll look into
this further.

Thanks,

-- Steve



* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-03-14  9:33                             ` Steven Rostedt
@ 2005-03-14 10:10                               ` Steven Rostedt
  2005-03-14 15:50                                 ` Steven Rostedt
  0 siblings, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-14 10:10 UTC (permalink / raw)
  To: Lee Revell; +Cc: Ingo Molnar, Andrew Morton, linux-kernel



On Mon, 14 Mar 2005, Steven Rostedt wrote:
>
> I just downloaded -40 and applied my patch, compiled it with
> PREEMPT_DESKTOP and data=ordered, ran it and everything seems OK, except
> I'm getting the following...
>
> BUG: Unable to handle kernel NULL pointer dereference at virtual address
> 00000000
>  printing eip:
> c0213438
> *pde = 00000000

[snip]

>
>
> I'll see if this happens without the patch, and if so, then I'll look into
> this further.
>

Well, I took out my patch and this bug didn't happen, so I guess it's my
fault!  OK, I'll dig into it further.

-- Steve



* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-03-14 10:10                               ` Steven Rostedt
@ 2005-03-14 15:50                                 ` Steven Rostedt
  2005-03-14 19:02                                   ` Steven Rostedt
  2005-03-15 11:44                                   ` Steven Rostedt
  0 siblings, 2 replies; 125+ messages in thread
From: Steven Rostedt @ 2005-03-14 15:50 UTC (permalink / raw)
  To: Lee Revell; +Cc: Ingo Molnar, Andrew Morton, linux-kernel



On Mon, 14 Mar 2005, Steven Rostedt wrote:
>
> On Mon, 14 Mar 2005, Steven Rostedt wrote:
> >
> > I just downloaded -40 and applied my patch, compiled it with
> > PREEMPT_DESKTOP and data=ordered, ran it and everything seems OK, except
> > I'm getting the following...
> >
> > BUG: Unable to handle kernel NULL pointer dereference at virtual address
> > 00000000
> >  printing eip:
> > c0213438
> > *pde = 00000000
>
> [snip]
>
> >
> >
> > I'll see if this happens without the patch, and if so, then I'll look into
> > this further.
> >
>
> Well, I took out my patch and this bug didn't happen, so I guess it's my
> fault!  OK, I'll dig into it further.
>

Here's a new patch. All I did was move BUFFER_FNS(JournalHead,journalhead)
to inside the #ifdef CONFIG_PREEMPT_RT and my oops went away !?!  This
really bothers me since it just declares some functions and is not used
with CONFIG_PREEMPT_RT off.  I have no idea what's going on.

Lee, can you see if this still crashes for you?


Thanks,

-- Steve


diff -ur linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c
--- linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c	2005-03-02 02:37:49.000000000 -0500
+++ linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c	2005-03-14 09:46:41.000000000 -0500
@@ -80,6 +80,10 @@
 EXPORT_SYMBOL(journal_try_to_free_buffers);
 EXPORT_SYMBOL(journal_force_commit);

+#ifdef CONFIG_PREEMPT_RT
+spinlock_t jbd_journal_head_lock = SPIN_LOCK_UNLOCKED;
+#endif
+
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);

 /*
@@ -1727,6 +1731,9 @@
 		jh = new_jh;
 		new_jh = NULL;		/* We consumed it */
 		set_buffer_jbd(bh);
+#ifdef CONFIG_PREEMPT_RT
+		spin_lock_init(&jh->b_state_lock);
+#endif
 		bh->b_private = jh;
 		jh->b_bh = bh;
 		get_bh(bh);
@@ -1767,26 +1774,34 @@
 		if (jh->b_transaction == NULL &&
 				jh->b_next_transaction == NULL &&
 				jh->b_cp_transaction == NULL) {
-			J_ASSERT_BH(bh, buffer_jbd(bh));
-			J_ASSERT_BH(bh, jh2bh(jh) == bh);
-			BUFFER_TRACE(bh, "remove journal_head");
-			if (jh->b_frozen_data) {
-				printk(KERN_WARNING "%s: freeing "
-						"b_frozen_data\n",
-						__FUNCTION__);
-				kfree(jh->b_frozen_data);
-			}
-			if (jh->b_committed_data) {
-				printk(KERN_WARNING "%s: freeing "
-						"b_committed_data\n",
-						__FUNCTION__);
-				kfree(jh->b_committed_data);
+#ifdef CONFIG_PREEMPT_RT
+			if (atomic_read(&jh->b_state_wait_count)) {
+				BUG_ON(buffer_journalhead(bh));
+				set_buffer_journalhead(bh);
+			} else
+#endif
+			{
+				J_ASSERT_BH(bh, buffer_jbd(bh));
+				J_ASSERT_BH(bh, jh2bh(jh) == bh);
+				BUFFER_TRACE(bh, "remove journal_head");
+				if (jh->b_frozen_data) {
+					printk(KERN_WARNING "%s: freeing "
+					       "b_frozen_data\n",
+					       __FUNCTION__);
+					kfree(jh->b_frozen_data);
+				}
+				if (jh->b_committed_data) {
+					printk(KERN_WARNING "%s: freeing "
+					       "b_committed_data\n",
+					       __FUNCTION__);
+					kfree(jh->b_committed_data);
+				}
+				bh->b_private = NULL;
+				jh->b_bh = NULL;	/* debug, really */
+				clear_buffer_jbd(bh);
+				__brelse(bh);
+				journal_free_journal_head(jh);
 			}
-			bh->b_private = NULL;
-			jh->b_bh = NULL;	/* debug, really */
-			clear_buffer_jbd(bh);
-			__brelse(bh);
-			journal_free_journal_head(jh);
 		} else {
 			BUFFER_TRACE(bh, "journal_head was locked");
 		}
diff -ur linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/transaction.c linux-2.6.11-final-V0.7.40-00/fs/jbd/transaction.c
--- linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/transaction.c	2005-03-02 02:37:53.000000000 -0500
+++ linux-2.6.11-final-V0.7.40-00/fs/jbd/transaction.c	2005-03-14 09:46:41.000000000 -0500
@@ -1207,11 +1207,17 @@

 	BUFFER_TRACE(bh, "entry");

+	/*
+	 * Is it OK to check to see if this isn't a jbd buffer outside of
+	 * locks? Now that jbd_lock_bh_state only works with jbd buffers
+	 * I sure hope so.
+	 */
+	if (!buffer_jbd(bh))
+		goto not_jbd;
+
 	jbd_lock_bh_state(bh);
 	spin_lock(&journal->j_list_lock);

-	if (!buffer_jbd(bh))
-		goto not_jbd;
 	jh = bh2jh(bh);

 	/* Critical error: attempting to delete a bitmap buffer, maybe?
@@ -1219,7 +1225,7 @@
 	if (!J_EXPECT_JH(jh, !jh->b_committed_data,
 			 "inconsistent data on disk")) {
 		err = -EIO;
-		goto not_jbd;
+		goto bad_jbd;
 	}

 	if (jh->b_transaction == handle->h_transaction) {
@@ -1274,9 +1280,11 @@
 		}
 	}

-not_jbd:
+
+bad_jbd:
 	spin_unlock(&journal->j_list_lock);
 	jbd_unlock_bh_state(bh);
+not_jbd:
 	__brelse(bh);
 	return err;
 }
diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h
--- linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h	2005-03-02 02:38:19.000000000 -0500
+++ linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h	2005-03-14 09:46:57.000000000 -0500
@@ -324,6 +324,68 @@
 	return bh->b_private;
 }

+void journal_remove_journal_head(struct buffer_head *bh);
+
+#ifdef CONFIG_PREEMPT_RT
+
+BUFFER_FNS(JournalHead,journalhead)
+
+extern spinlock_t jbd_journal_head_lock;
+
+static inline void jbd_lock_bh_state(struct buffer_head *bh)
+{
+	BUG_ON(!bh->b_private);
+	atomic_inc(&bh2jh(bh)->b_state_wait_count);
+	spin_lock(&bh2jh(bh)->b_state_lock);
+}
+
+static inline int jbd_trylock_bh_state(struct buffer_head *bh)
+{
+	int ret;
+
+	BUG_ON(!bh->b_private);
+
+	if ((ret = spin_trylock(&bh2jh(bh)->b_state_lock)))
+		atomic_inc(&bh2jh(bh)->b_state_wait_count);
+
+	return ret;
+}
+
+static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
+{
+	return bh2jh(bh) ? spin_is_locked(&bh2jh(bh)->b_state_lock) : 0;
+}
+
+static inline void jbd_unlock_bh_state(struct buffer_head *bh)
+{
+	int rmjh = 0;
+
+	BUG_ON(!atomic_read(&bh2jh(bh)->b_state_wait_count));
+	atomic_dec(&bh2jh(bh)->b_state_wait_count);
+
+	if (buffer_journalhead(bh)) {
+		clear_buffer_journalhead(bh);
+		rmjh = 1;
+	}
+
+	spin_unlock(&bh2jh(bh)->b_state_lock);
+
+	if (rmjh)
+		journal_remove_journal_head(bh);
+}
+
+static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
+{
+	spin_lock(&jbd_journal_head_lock);
+}
+
+static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
+{
+	spin_unlock(&jbd_journal_head_lock);
+}
+
+#else /* !CONFIG_PREEMPT_RT */
+
 static inline void jbd_lock_bh_state(struct buffer_head *bh)
 {
 	bit_spin_lock(BH_State, &bh->b_state);
@@ -354,6 +416,8 @@
 	bit_spin_unlock(BH_JournalHead, &bh->b_state);
 }

+#endif /* CONFIG_PREEMPT_RT */
+
 struct jbd_revoke_table_s;

 /**
@@ -918,7 +982,6 @@
  */
 struct journal_head *journal_add_journal_head(struct buffer_head *bh);
 struct journal_head *journal_grab_journal_head(struct buffer_head *bh);
-void journal_remove_journal_head(struct buffer_head *bh);
 void journal_put_journal_head(struct journal_head *jh);

 /*
diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/journal-head.h linux-2.6.11-final-V0.7.40-00/include/linux/journal-head.h
--- linux-2.6.11-final-V0.7.40-00.orig/include/linux/journal-head.h	2005-03-02 02:38:25.000000000 -0500
+++ linux-2.6.11-final-V0.7.40-00/include/linux/journal-head.h	2005-03-14 09:46:41.000000000 -0500
@@ -80,6 +80,16 @@
 	 * [j_list_lock]
 	 */
 	struct journal_head *b_cpnext, *b_cpprev;
+
+	/*
+	 * Lock the state of the buffer head.
+	 */
+	spinlock_t b_state_lock;
+
+	/*
+	 * Count the processes that want/have the state lock.
+	 */
+	atomic_t b_state_wait_count;
 };

 #endif		/* JOURNAL_HEAD_H_INCLUDED */
diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/spinlock.h linux-2.6.11-final-V0.7.40-00/include/linux/spinlock.h
--- linux-2.6.11-final-V0.7.40-00.orig/include/linux/spinlock.h	2005-03-14 06:00:54.000000000 -0500
+++ linux-2.6.11-final-V0.7.40-00/include/linux/spinlock.h	2005-03-14 09:46:41.053696484 -0500
@@ -774,6 +774,10 @@
 }))


+#ifndef CONFIG_PREEMPT_RT
+
+/* These are just plain evil! */
+
 /*
  *  bit-based spin_lock()
  *
@@ -789,10 +793,15 @@
 	 * busywait with less bus contention for a good time to
 	 * attempt to acquire the lock bit.
 	 */
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
-	while (test_and_set_bit(bitnum, addr))
-		while (test_bit(bitnum, addr))
+	preempt_disable();
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
+	while (test_and_set_bit(bitnum, addr)) {
+		while (test_bit(bitnum, addr)) {
+			preempt_enable();
 			cpu_relax();
+			preempt_disable();
+		}
+	}
 #endif
 	__acquire(bitlock);
 }
@@ -802,9 +811,12 @@
  */
 static inline int bit_spin_trylock(int bitnum, unsigned long *addr)
 {
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
-	if (test_and_set_bit(bitnum, addr))
+	preempt_disable();
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
+	if (test_and_set_bit(bitnum, addr)) {
+		preempt_enable();
 		return 0;
+	}
 #endif
 	__acquire(bitlock);
 	return 1;
@@ -815,11 +827,12 @@
  */
 static inline void bit_spin_unlock(int bitnum, unsigned long *addr)
 {
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
 	BUG_ON(!test_bit(bitnum, addr));
 	smp_mb__before_clear_bit();
 	clear_bit(bitnum, addr);
 #endif
+	preempt_enable();
 	__release(bitlock);
 }

@@ -828,12 +841,15 @@
  */
 static inline int bit_spin_is_locked(int bitnum, unsigned long *addr)
 {
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
 	return test_bit(bitnum, addr);
+#elif defined CONFIG_PREEMPT
+	return preempt_count();
 #else
 	return 1;
 #endif
 }
+#endif /* CONFIG_PREEMPT_RT */

 #define DEFINE_SPINLOCK(name) \
 	spinlock_t name __cacheline_aligned_in_smp = _SPIN_LOCK_UNLOCKED(name)


* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-03-14 15:50                                 ` Steven Rostedt
@ 2005-03-14 19:02                                   ` Steven Rostedt
  2005-03-15 11:44                                   ` Steven Rostedt
  1 sibling, 0 replies; 125+ messages in thread
From: Steven Rostedt @ 2005-03-14 19:02 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Lee Revell, Andrew Morton, linux-kernel


Hi Ingo,

I've found something that is very interesting and I can't explain it.


On Mon, 14 Mar 2005, Steven Rostedt wrote:
>
>
> On Mon, 14 Mar 2005, Steven Rostedt wrote:
> >
> > On Mon, 14 Mar 2005, Steven Rostedt wrote:
> > >
> > > I just downloaded -40 and applied my patch, compiled it with
> > > PREEMPT_DESKTOP and data=ordered, ran it and everything seems OK, except
> > > I'm getting the following...
> > >
> > > BUG: Unable to handle kernel NULL pointer dereference at virtual address
> > > 00000000
> > >  printing eip:
> > > c0213438
> > > *pde = 00000000
> >
> > [snip]
> >
> > >

All I did now was to add this patch to your -40-00 kernel:

diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h
--- linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h	2005-03-02 02:38:19.000000000 -0500
+++ linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h	2005-03-14 13:22:04.000000000 -0500
@@ -324,6 +324,8 @@
 	return bh->b_private;
 }

+BUFFER_FNS(JournalHead,journalhead)
+
 static inline void jbd_lock_bh_state(struct buffer_head *bh)
 {
 	bit_spin_lock(BH_State, &bh->b_state);



And I get the following output:

BUG: Unable to handle kernel NULL pointer dereference at virtual address
00000000
 printing eip:
c0213118
*pde = 00000000
Oops: 0000 [#1]
Modules linked in: ipv6 af_packet tsdev mousedev evdev floppy psmouse
pcspkr snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm
snd_timer snd soundcore snd_page_alloc shpchp pci_hotplug ehci_hcd
intel_agp agpgart uhci_hcd usbcore e100 mii ide_cd cdrom unix
CPU:    0
EIP:    0060:[<c0213118>]    Not tainted VLI
EFLAGS: 00010286   (2.6.11-RT-V0.7.40-00)
EIP is at vt_ioctl+0x18/0x1ab0
eax: 00000000   ebx: 00005603   ecx: 00005603   edx: cee14d80
esi: c0213100   edi: cb4bd000   ebp: cc03bf18   esp: cc03be48
ds: 007b   es: 007b   ss: 0068   preempt: 00000000
Process XFree86 (pid: 4709, threadinfo=cc03a000 task=cf0d5020)
Stack: cf0d5170 cc03a000 cf0d5020 c03448ec cf0d5020 00000246 cc03be7c
c0117267
       c03448f4 00000006 00000001 00000000 00000000 cc03bebc cf1b81ec
ce820600
       ce94a9b8 00000000 00000000 cc03bed4 c01704f1 ce94a9b8 00000007
00000000
Call Trace:
 [<c0103cdf>] show_stack+0x7f/0xa0 (28)
 [<c0103e95>] show_registers+0x165/0x1d0 (56)
 [<c0104088>] die+0xc8/0x150 (64)
 [<c01153c6>] do_page_fault+0x356/0x6c4 (216)
 [<c0103973>] error_code+0x2b/0x30 (268)
 [<c020e5fb>] tty_ioctl+0x34b/0x490 (52)
 [<c016807f>] do_ioctl+0x4f/0x70 (32)
 [<c0168282>] vfs_ioctl+0x62/0x1d0 (40)
 [<c0168451>] sys_ioctl+0x61/0x90 (40)
 [<c0102ec3>] syscall_call+0x7/0xb (-8124)
Code: ff ff 8d 05 28 4d 34 c0 e8 f6 60 0a 00 e9 3a ff ff ff 90 55 89 e5 57
56 53 81 ec c4 00 00 00 8b 7d 08 8b 5d 10 8b 87 7c 09 00 00 <8b> 30 89 34
24 8b 04 b5 e0 b7 3c c0 89 45 8c e8 a4 6a 00 00 85



I don't know why. BUFFER_FNS is just defined as:

#define BUFFER_FNS(bit, name)						\
static inline void set_buffer_##name(struct buffer_head *bh)		\
{									\
	set_bit(BH_##bit, &(bh)->b_state);				\
}									\
static inline void clear_buffer_##name(struct buffer_head *bh)		\
{									\
	clear_bit(BH_##bit, &(bh)->b_state);				\
}									\
static inline int buffer_##name(const struct buffer_head *bh)		\
{									\
	return test_bit(BH_##bit, &(bh)->b_state);			\
}

So all it does is make three functions that are never used:

set_buffer_journalhead(...)
clear_buffer_journalhead(...)
buffer_journalhead(...)

Unless some macro uses them, but even then I don't know why adding that
line causes the bug output that I showed.  If I remove that line, I don't
get that output.  And this is consistent. I've recompiled the kernel several
times, and every time I compile it with this added patch I get that output.
And every time without it, it runs fine.

Oh, please note that this only happens with PREEMPT_DESKTOP, and not with
PREEMPT_RT.

I really think this is a symptom of something else and not the cause of
the bug. What do you think?
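
For reference, here is a minimal userspace sketch of what the single
BUFFER_FNS(JournalHead,journalhead) line expands to.  The struct, the bit
number, the non-atomic bit helpers and demo_journalhead_roundtrip() are
stand-ins assumed for illustration only (the kernel versions are atomic and
the real bit value differs); the macro body is the one quoted above.

```c
#include <assert.h>

/* Hypothetical userspace stand-in for the kernel's struct buffer_head. */
struct buffer_head {
	unsigned long b_state;
};

/* Bit number picked for illustration only; not the kernel's real value. */
enum { BH_JournalHead = 5 };

/* Non-atomic stand-ins for the kernel's atomic bit operations. */
static inline void set_bit(int nr, unsigned long *addr)
{
	*addr |= 1UL << nr;
}

static inline void clear_bit(int nr, unsigned long *addr)
{
	*addr &= ~(1UL << nr);
}

static inline int test_bit(int nr, const unsigned long *addr)
{
	return (int)((*addr >> nr) & 1UL);
}

/* Same shape as the macro quoted in the mail. */
#define BUFFER_FNS(bit, name)						\
static inline void set_buffer_##name(struct buffer_head *bh)		\
{									\
	set_bit(BH_##bit, &(bh)->b_state);				\
}									\
static inline void clear_buffer_##name(struct buffer_head *bh)		\
{									\
	clear_bit(BH_##bit, &(bh)->b_state);				\
}									\
static inline int buffer_##name(const struct buffer_head *bh)		\
{									\
	return test_bit(BH_##bit, &(bh)->b_state);			\
}

/* Generates set_/clear_/buffer_journalhead() and nothing else. */
BUFFER_FNS(JournalHead, journalhead)

/* Walk the bit through set/test/clear using only the generated helpers. */
static inline int demo_journalhead_roundtrip(void)
{
	struct buffer_head bh = { 0 };

	set_buffer_journalhead(&bh);
	assert(buffer_journalhead(&bh));
	clear_buffer_journalhead(&bh);
	return buffer_journalhead(&bh);	/* 0 once the bit is cleared */
}
```

Since all three helpers are static inline, an invocation that nobody calls
should emit no code by itself, which fits the suspicion that the oops is a
symptom of something else.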


-- Steve



* Re: [patch] Real-Time Preemption, -RT-2.6.11-final-V0.7.40-00
  2005-03-11 12:10   ` Andrew Walrond
@ 2005-03-14 20:19     ` Tom Rini
  0 siblings, 0 replies; 125+ messages in thread
From: Tom Rini @ 2005-03-14 20:19 UTC (permalink / raw)
  To: Andrew Walrond; +Cc: linux-kernel

On Fri, Mar 11, 2005 at 12:10:52PM +0000, Andrew Walrond wrote:
> On Friday 11 March 2005 09:28, Ingo Molnar wrote:
> > i have released the -V0.7.40-00 Real-Time Preemption patch, which can be
> > downloaded from the usual place:
> >
> 
> I've lost the thread a little; Is this still x86 only?

The patch itself contains i386, x86_64 and MIPS support.  There have been
patches posted for ARM (I _think_ one version which had a stab at
generic hardirq support for ARM and another without, and I kinda-sorta
think Ingo was waiting for the generic hardirq stuff to settle, which is
another issue) as well as PPC32.

-- 
Tom Rini
http://gate.crashing.org/~trini/


* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-03-14 15:50                                 ` Steven Rostedt
  2005-03-14 19:02                                   ` Steven Rostedt
@ 2005-03-15 11:44                                   ` Steven Rostedt
  2005-03-15 12:00                                     ` Ingo Molnar
  1 sibling, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-15 11:44 UTC (permalink / raw)
  To: Lee Revell; +Cc: Ingo Molnar, Andrew Morton, linux-kernel



I've realized that my previous patch had too many problems with the way
the journaling system works.  So I went back to my first approach but
added the journal_head lock as one global lock to keep the buffer head
size smaller. I only added the state lock to the buffer head. I've tested
this for some time now, and it works well (for the test at least). I'll
recompile it with PREEMPT_DESKTOP to see if that works too.


-- Steve



diff -ur linux-2.6.11-final-V0.7.40-00.orig/fs/buffer.c linux-2.6.11-final-V0.7.40-00/fs/buffer.c
--- linux-2.6.11-final-V0.7.40-00.orig/fs/buffer.c	2005-03-02 02:38:10.000000000 -0500
+++ linux-2.6.11-final-V0.7.40-00/fs/buffer.c	2005-03-15 03:41:15.000000000 -0500
@@ -3003,6 +3003,9 @@
 		preempt_disable();
 		__get_cpu_var(bh_accounting).nr++;
 		recalc_bh_state();
+#ifdef CONFIG_PREEMPT_RT
+		spin_lock_init(&ret->b_jstate_lock);
+#endif
 		preempt_enable();
 	}
 	return ret;
diff -ur linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c
--- linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c	2005-03-02 02:37:49.000000000 -0500
+++ linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c	2005-03-15 03:49:10.000000000 -0500
@@ -82,6 +82,8 @@

 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);

+spinlock_t journal_head_lock = SPIN_LOCK_UNLOCKED;
+
 /*
  * Helper function used to manage commit timeouts
  */
diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/buffer_head.h linux-2.6.11-final-V0.7.40-00/include/linux/buffer_head.h
--- linux-2.6.11-final-V0.7.40-00.orig/include/linux/buffer_head.h	2005-03-02 02:37:45.000000000 -0500
+++ linux-2.6.11-final-V0.7.40-00/include/linux/buffer_head.h	2005-03-15 03:42:22.000000000 -0500
@@ -62,6 +62,13 @@
 	bh_end_io_t *b_end_io;		/* I/O completion */
  	void *b_private;		/* reserved for b_end_io */
 	struct list_head b_assoc_buffers; /* associated with another mapping */
+
+#ifdef CONFIG_PREEMPT_RT
+	/*
+	 * Fixme: This should be in the journal code.
+	 */
+	spinlock_t b_jstate_lock;	/* lock for journal state. */
+#endif
 };

 /*
diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h
--- linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h	2005-03-02 02:38:19.000000000 -0500
+++ linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h	2005-03-15 03:45:33.000000000 -0500
@@ -314,6 +314,13 @@
 TAS_BUFFER_FNS(RevokeValid, revokevalid)
 BUFFER_FNS(Freed, freed)

+#ifdef CONFIG_PREEMPT_RT
+extern spinlock_t journal_head_lock;
+#define PICK_SPIN_LOCK(otype,bit,name) spin_##otype(&bh->b_##name##_lock)
+#else
+#define PICK_SPIN_LOCK(otype,bit,name) bit_spin_##otype(bit,&bh->b_state)
+#endif
+
 static inline struct buffer_head *jh2bh(struct journal_head *jh)
 {
 	return jh->b_bh;
@@ -326,24 +333,36 @@

 static inline void jbd_lock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_State, &bh->b_state);
+	PICK_SPIN_LOCK(lock,BH_State,jstate);
 }

 static inline int jbd_trylock_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_trylock(BH_State, &bh->b_state);
+	return PICK_SPIN_LOCK(trylock,BH_State,jstate);
 }

 static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_is_locked(BH_State, &bh->b_state);
+	return PICK_SPIN_LOCK(is_locked,BH_State,jstate);
 }

 static inline void jbd_unlock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_State, &bh->b_state);
+	PICK_SPIN_LOCK(unlock,BH_State,jstate);
+}
+#undef PICK_SPIN_LOCK
+
+#ifdef CONFIG_PREEMPT_RT
+static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
+{
+	spin_lock(&journal_head_lock);
 }

+static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
+{
+	spin_unlock(&journal_head_lock);
+}
+#else /* !CONFIG_PREEMPT_RT */
 static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
 {
 	bit_spin_lock(BH_JournalHead, &bh->b_state);
@@ -353,6 +372,7 @@
 {
 	bit_spin_unlock(BH_JournalHead, &bh->b_state);
 }
+#endif /* CONFIG_PREEMPT_RT */

 struct jbd_revoke_table_s;

diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/spinlock.h linux-2.6.11-final-V0.7.40-00/include/linux/spinlock.h
--- linux-2.6.11-final-V0.7.40-00.orig/include/linux/spinlock.h	2005-03-14 06:00:54.000000000 -0500
+++ linux-2.6.11-final-V0.7.40-00/include/linux/spinlock.h	2005-03-15 03:40:31.000000000 -0500
@@ -774,6 +774,10 @@
 }))


+#ifndef CONFIG_PREEMPT_RT
+
+/* These are just plain evil! */
+
 /*
  *  bit-based spin_lock()
  *
@@ -789,10 +793,15 @@
 	 * busywait with less bus contention for a good time to
 	 * attempt to acquire the lock bit.
 	 */
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
-	while (test_and_set_bit(bitnum, addr))
-		while (test_bit(bitnum, addr))
+	preempt_disable();
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
+	while (test_and_set_bit(bitnum, addr)) {
+		while (test_bit(bitnum, addr)) {
+			preempt_enable();
 			cpu_relax();
+			preempt_disable();
+		}
+	}
 #endif
 	__acquire(bitlock);
 }
@@ -802,9 +811,12 @@
  */
 static inline int bit_spin_trylock(int bitnum, unsigned long *addr)
 {
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
-	if (test_and_set_bit(bitnum, addr))
+	preempt_disable();
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
+	if (test_and_set_bit(bitnum, addr)) {
+		preempt_enable();
 		return 0;
+	}
 #endif
 	__acquire(bitlock);
 	return 1;
@@ -815,11 +827,12 @@
  */
 static inline void bit_spin_unlock(int bitnum, unsigned long *addr)
 {
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
 	BUG_ON(!test_bit(bitnum, addr));
 	smp_mb__before_clear_bit();
 	clear_bit(bitnum, addr);
 #endif
+	preempt_enable();
 	__release(bitlock);
 }

@@ -828,12 +841,15 @@
  */
 static inline int bit_spin_is_locked(int bitnum, unsigned long *addr)
 {
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
 	return test_bit(bitnum, addr);
+#elif defined CONFIG_PREEMPT
+	return preempt_count();
 #else
 	return 1;
 #endif
 }
+#endif /* CONFIG_PREEMPT_RT */

 #define DEFINE_SPINLOCK(name) \
 	spinlock_t name __cacheline_aligned_in_smp = _SPIN_LOCK_UNLOCKED(name)


* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-03-15 11:44                                   ` Steven Rostedt
@ 2005-03-15 12:00                                     ` Ingo Molnar
  2005-03-15 13:07                                       ` Steven Rostedt
  0 siblings, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-03-15 12:00 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Lee Revell, Andrew Morton, linux-kernel


* Steven Rostedt <rostedt@goodmis.org> wrote:

> I've realized that my previous patch had too many problems with the
> way the journaling system works.  So I went back to my first approach
> but added the journal_head lock as one global lock to keep the buffer
> head size smaller. I only added the state lock to the buffer head.
> I've tested this for some time now, and it works well (for the test at
> least). I'll recompile it with PREEMPT_DESKTOP to see if that works
> too.

good progress - but the global lock may be a scalability worry on
upstream though. Would it be possible to just mirror much of the current
lock logic, but with spinlocks instead of bitlocks? And there should be
no #ifdefs on PREEMPT_RT.

	Ingo


* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-03-15 12:00                                     ` Ingo Molnar
@ 2005-03-15 13:07                                       ` Steven Rostedt
  2005-03-15 13:35                                         ` Ingo Molnar
  0 siblings, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-15 13:07 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Lee Revell, Andrew Morton, linux-kernel



On Tue, 15 Mar 2005, Ingo Molnar wrote:

>
> * Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > I've realized that my previous patch had too many problems with the
> > way the journaling system works.  So I went back to my first approach
> > but added the journal_head lock as one global lock to keep the buffer
> > head size smaller. I only added the state lock to the buffer head.
> > I've tested this for some time now, and it works well (for the test at
> > least). I'll recompile it with PREEMPT_DESKTOP to see if that works
> > too.
>
> good progress - but the global lock may be a scalability worry on
> upstream though. Would it be possible to just mirror much of the current
> lock logic, but with spinlocks instead of bitlocks? And there should be
> no #ifdefs on PREEMPT_RT.
>

The first patch I had just converted the bit spinlocks to spinlocks but I
thought that adding two spinlocks was too much for every buffer head, even
if it wasn't in the ext3 file system. The journal head spinlock is just
used to add and remove the journal heads from the buffer heads, so I'm not
sure how much contention is on them. I only have a dual smp system, so I
can't test the system on a large number of CPUs. What do you think, should
we sacrifice memory for speed?

What should we use instead of #ifdef PREEMPT_RT? Or should we just keep it
the same for both?  Since this fix is only to fix spinlocks that schedule,
I figured that it would be better not to waste the memory of those not
using PREEMPT_RT.  Should I use the opposite PREEMPT_DESKTOP?

Thanks,

-- Steve



* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-03-15 13:07                                       ` Steven Rostedt
@ 2005-03-15 13:35                                         ` Ingo Molnar
  2005-03-15 13:55                                           ` Steven Rostedt
  2005-03-15 18:05                                           ` Steven Rostedt
  0 siblings, 2 replies; 125+ messages in thread
From: Ingo Molnar @ 2005-03-15 13:35 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Lee Revell, Andrew Morton, linux-kernel


* Steven Rostedt <rostedt@goodmis.org> wrote:

> > good progress - but the global lock may be a scalability worry on
> > upstream though. Would it be possible to just mirror much of the current
> > lock logic, but with spinlocks instead of bitlocks? And there should be
> > no #ifdefs on PREEMPT_RT.
> 
> The first patch I had just converted the bit spinlocks to spinlocks
> but I thought that adding two spinlocks was too much for every buffer
> head, even if it wasn't in the ext3 file system. The journal head
> spinlock is just used to add and remove the journal heads from the
> buffer heads, so I'm not sure how much contention is on them. I only
> have a dual smp system, so I can't test the system on a large number of
> CPUs. What do you think, should we sacrifice memory for speed?

there are two bad effects of global spinlocks: 1) contention 2)
cacheline bouncing. It's #2 that would affect this spinlock. While i'm
not sure this would show up in usual benchmarks, we should rather err on
the side of more scalability. Two spinlocks are just two more machine
words on most architectures, so i don't think it matters all that much,
while it removes a major wart - as long as the two extra locks are for
ext3 buffer-heads only.

> What should we use instead of #ifdef PREEMPT_RT? Or should we just
> keep it the same for both?  Since this fix is only to fix spinlocks
> that schedule, I figured that it would be better not to waste the
> memory of those not using PREEMPT_RT.  Should I use the opposite
> PREEMPT_DESKTOP?

i'd go for removing bit-spinlocks altogether, in the upstream kernel. It
would simplify things, besides making PREEMPT_RT simpler as well. The
memory overhead is not a big issue i believe. (8 more bytes per ext3 bh,
on x86)
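
To make the "8 more bytes" arithmetic concrete, here is a small userspace
sketch.  All names are hypothetical, the structs are cut down to two fields,
and the lock type is assumed to be a single 32-bit word (as with a plain
upstream x86 spinlock without debugging); a -RT mutex-based lock would be
larger.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: one 32-bit lock word per spinlock. */
typedef struct {
	uint32_t slock;
} demo_spinlock_t;

/* Cut-down, hypothetical buffer head that relies on bits in b_state. */
struct demo_bh {
	unsigned long b_state;
	void *b_private;
};

/* Same fields plus two real spinlocks replacing the two bit locks. */
struct demo_bh_locked {
	unsigned long b_state;
	void *b_private;
	demo_spinlock_t b_state_lock;
	demo_spinlock_t b_journal_head_lock;
};

/* Extra bytes paid per buffer head for the two locks. */
static inline size_t demo_lock_overhead(void)
{
	assert(sizeof(struct demo_bh_locked) > sizeof(struct demo_bh));
	return sizeof(struct demo_bh_locked) - sizeof(struct demo_bh);
}
```

Under those assumptions demo_lock_overhead() comes out as two lock words,
i.e. the 8 bytes mentioned above.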

	Ingo


* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-03-15 13:35                                         ` Ingo Molnar
@ 2005-03-15 13:55                                           ` Steven Rostedt
  2005-03-15 19:12                                             ` Andrew Morton
  2005-03-15 18:05                                           ` Steven Rostedt
  1 sibling, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-15 13:55 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Lee Revell, Andrew Morton, linux-kernel



On Tue, 15 Mar 2005, Ingo Molnar wrote:

>
> * Steven Rostedt <rostedt@goodmis.org> wrote:
>
>
> > What should we use instead of #ifdef PREEMPT_RT? Or should we just
> > keep it the same for both?  Since this fix is only to fix spinlocks
> > that schedule, I figured that it would be better not to waste the
> > memory of those not using PREEMPT_RT.  Should I use the opposite
> > PREEMPT_DESKTOP?
>
> i'd go for removing bit-spinlocks altogether, in the upstream kernel. It
> would simplify things, besides making PREEMPT_RT simpler as well. The
> memory overhead is not a big issue i believe. (8 more bytes per ext3 bh,
> on x86)
>

The problem here is that it's not ext3 bh's only. They're still the normal
buffer head.  The problem arises because the ext3 "journal head" is
allocated within these bit spin locks. I tried to monkey with putting the
locks in the journal heads and have checks to see when to free them, but
it wasn't that simple. I started having problems with some of the freeing
transactions; I might have assumed too much.

I'll give it one more try to get it into the journal heads, but after
that, (if I fail) I'll let someone who understands the ext3 system better
handle this.

-- Steve



* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-03-15 13:35                                         ` Ingo Molnar
  2005-03-15 13:55                                           ` Steven Rostedt
@ 2005-03-15 18:05                                           ` Steven Rostedt
  2005-03-15 19:09                                             ` Lee Revell
                                                               ` (2 more replies)
  1 sibling, 3 replies; 125+ messages in thread
From: Steven Rostedt @ 2005-03-15 18:05 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Lee Revell, Andrew Morton, linux-kernel



On Tue, 15 Mar 2005, Ingo Molnar wrote:
>
> i'd go for removing bit-spinlocks altogether, in the upstream kernel. It
> would simplify things, besides making PREEMPT_RT simpler as well. The
> memory overhead is not a big issue i believe. (8 more bytes per ext3 bh,
> on x86)
>

Hi Ingo,

Damn! The answer was right there in front of my eyes! Here's the cleanest
solution. I forgot about wait_on_bit_lock.  I've converted all the locks
to use this instead.  We probably need to get priority inheritance working
on this too someday, but for now it's better than wasting memory or
getting into deadlocks.
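
The idea, roughly, is a bit lock whose contenders sleep instead of spin.
Below is a userspace sketch of that pattern only, not of the kernel API: the
demo_* names are hypothetical, and a pthread mutex plus condition variable
stands in for the kernel's hashed bit wait queues behind wait_on_bit_lock()
and wake_up_bit().

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical sleeping bit lock: state bits plus one wait queue. */
struct demo_bitlock {
	atomic_ulong state;
	pthread_mutex_t wq_lock;
	pthread_cond_t wq;
};

#define DEMO_BITLOCK_INIT \
	{ 0, PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER }

/* Returns 1 if we set the bit (lock taken), 0 if it was already set. */
static inline int demo_trylock_bit(struct demo_bitlock *l, int nr)
{
	unsigned long mask = 1UL << nr;

	return !(atomic_fetch_or(&l->state, mask) & mask);
}

/* Take the bit, sleeping on the wait queue while someone else holds it. */
static inline void demo_lock_bit(struct demo_bitlock *l, int nr)
{
	if (demo_trylock_bit(l, nr))
		return;			/* uncontended fast path */

	pthread_mutex_lock(&l->wq_lock);
	while (!demo_trylock_bit(l, nr))
		pthread_cond_wait(&l->wq, &l->wq_lock);
	pthread_mutex_unlock(&l->wq_lock);
}

/* Clear the bit first, then wake any sleepers so they can retry. */
static inline void demo_unlock_bit(struct demo_bitlock *l, int nr)
{
	atomic_fetch_and(&l->state, ~(1UL << nr));

	pthread_mutex_lock(&l->wq_lock);
	pthread_cond_broadcast(&l->wq);
	pthread_mutex_unlock(&l->wq_lock);
}
```

As in the patch, the unlock side clears the bit before waking waiters, and
the lock still costs no extra memory per bit.  Like the patch, this sketch
does nothing about priority inheritance.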

-- Steve

diff -ur linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c
--- linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c	2005-03-02 02:37:49.000000000 -0500
+++ linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c	2005-03-15 11:58:14.000000000 -0500
@@ -82,6 +82,17 @@

 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);

+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+/*
+ * Used in the locking of the bh_state and bh_journalhead bit locks.
+ */
+int jbd_lock_bh_sleep(void *notused)
+{
+	schedule();
+	return 0;
+}
+#endif
+
 /*
  * Helper function used to manage commit timeouts
  */
diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h
--- linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h	2005-03-02 02:38:19.000000000 -0500
+++ linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h	2005-03-15 11:58:40.000000000 -0500
@@ -324,34 +324,63 @@
 	return bh->b_private;
 }

+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+int jbd_lock_bh_sleep(void *notused);
+#endif
+
 static inline void jbd_lock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	wait_on_bit_lock(&bh->b_state,BH_State,&jbd_lock_bh_sleep,TASK_UNINTERRUPTIBLE);
+#endif
+	__acquire(bitlock);
 }

 static inline int jbd_trylock_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_trylock(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	if (test_and_set_bit(BH_State, &bh->b_state))
+		return 0;
+#endif
+	__acquire(bitlock);
+	return 1;
 }

 static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_is_locked(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	return test_bit(BH_State, &bh->b_state);
+#else
+	return 1;
+#endif
 }

 static inline void jbd_unlock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	clear_bit(BH_State, &bh->b_state);
+	smp_mb__after_clear_bit();
+	wake_up_bit(&bh->b_state, BH_State);
+#endif
+	__release(bitlock);
 }

 static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_JournalHead, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	wait_on_bit_lock(&bh->b_state,BH_JournalHead,&jbd_lock_bh_sleep,TASK_UNINTERRUPTIBLE);
+#endif
+	__acquire(bitlock);
 }

 static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_JournalHead, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	clear_bit(BH_JournalHead, &bh->b_state);
+	smp_mb__after_clear_bit();
+	wake_up_bit(&bh->b_state, BH_JournalHead);
+#endif
+	__release(bitlock);
 }

 struct jbd_revoke_table_s;
diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/spinlock.h linux-2.6.11-final-V0.7.40-00/include/linux/spinlock.h
--- linux-2.6.11-final-V0.7.40-00.orig/include/linux/spinlock.h	2005-03-14 06:00:54.000000000 -0500
+++ linux-2.6.11-final-V0.7.40-00/include/linux/spinlock.h	2005-03-15 12:19:11.032217736 -0500
@@ -774,67 +774,6 @@
 }))


-/*
- *  bit-based spin_lock()
- *
- * Don't use this unless you really need to: spin_lock() and spin_unlock()
- * are significantly faster.
- */
-static inline void bit_spin_lock(int bitnum, unsigned long *addr)
-{
-	/*
-	 * Assuming the lock is uncontended, this never enters
-	 * the body of the outer loop. If it is contended, then
-	 * within the inner loop a non-atomic test is used to
-	 * busywait with less bus contention for a good time to
-	 * attempt to acquire the lock bit.
-	 */
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
-	while (test_and_set_bit(bitnum, addr))
-		while (test_bit(bitnum, addr))
-			cpu_relax();
-#endif
-	__acquire(bitlock);
-}
-
-/*
- * Return true if it was acquired
- */
-static inline int bit_spin_trylock(int bitnum, unsigned long *addr)
-{
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
-	if (test_and_set_bit(bitnum, addr))
-		return 0;
-#endif
-	__acquire(bitlock);
-	return 1;
-}
-
-/*
- *  bit-based spin_unlock()
- */
-static inline void bit_spin_unlock(int bitnum, unsigned long *addr)
-{
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
-	BUG_ON(!test_bit(bitnum, addr));
-	smp_mb__before_clear_bit();
-	clear_bit(bitnum, addr);
-#endif
-	__release(bitlock);
-}
-
-/*
- * Return true if the lock is held.
- */
-static inline int bit_spin_is_locked(int bitnum, unsigned long *addr)
-{
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
-	return test_bit(bitnum, addr);
-#else
-	return 1;
-#endif
-}
-
 #define DEFINE_SPINLOCK(name) \
 	spinlock_t name __cacheline_aligned_in_smp = _SPIN_LOCK_UNLOCKED(name)


^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-03-15 18:05                                           ` Steven Rostedt
@ 2005-03-15 19:09                                             ` Lee Revell
  2005-03-16  7:50                                               ` Steven Rostedt
  2005-03-16  7:31                                             ` Steven Rostedt
  2005-03-16  8:50                                             ` Ingo Molnar
  2 siblings, 1 reply; 125+ messages in thread
From: Lee Revell @ 2005-03-15 19:09 UTC (permalink / raw)
  To: rostedt; +Cc: Ingo Molnar, Andrew Morton, linux-kernel

On Tue, 2005-03-15 at 13:05 -0500, Steven Rostedt wrote:
> Damn! The answer was right there in front of my eyes! Here's the cleanest
> solution. I forgot about wait_on_bit_lock.  I've converted all the locks
> to use this instead.  We probably need to get priority inheritance working
> on this too someday, but for now it's better than wasting memory or
> getting into deadlocks.
> 

I am still not clear on why this did not hit with earlier kernels +
PREEMPT_DESKTOP.  Were the bitlocks introduced recently?  Or was another
lock-break patch dropped?

Lee


^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-03-15 13:55                                           ` Steven Rostedt
@ 2005-03-15 19:12                                             ` Andrew Morton
  0 siblings, 0 replies; 125+ messages in thread
From: Andrew Morton @ 2005-03-15 19:12 UTC (permalink / raw)
  To: rostedt; +Cc: mingo, rlrevell, linux-kernel

Steven Rostedt <rostedt@goodmis.org> wrote:
>
> The problem here is that it's not ext3 bh's only. They're still the normal
>  buffer head.  The problem arises because the ext3 "journal head" is
>  allocated within these bit spin locks.

Yes, the locks do want to live inside the buffer_head.

Stephen has pointed out that we might want to remove
jbd_lock_bh_journal_head() altogether some time, just use
jbd_lock_bh_state() for that.

In 2.4 these locks are global (or per-superblock).  Making them a global
spinlock would be acceptable for 2-ways and probably larger.


^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-03-15 18:05                                           ` Steven Rostedt
  2005-03-15 19:09                                             ` Lee Revell
@ 2005-03-16  7:31                                             ` Steven Rostedt
  2005-03-16  8:50                                             ` Ingo Molnar
  2 siblings, 0 replies; 125+ messages in thread
From: Steven Rostedt @ 2005-03-16  7:31 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Lee Revell, Andrew Morton, linux-kernel



On Tue, 15 Mar 2005, Steven Rostedt wrote:

>
>
> On Tue, 15 Mar 2005, Ingo Molnar wrote:
> >
> > i'd go for removing bit-spinlocks altogether, in the upstream kernel. It
> > would simplify things, besides making PREEMPT_RT simpler as well. The
> > memory overhead is not a big issue i believe. (8 more bytes per ext3 bh,
> > on x86)
> >
>
> Hi Ingo,
>
> Damn! The answer was right there in front of my eyes! Here's the cleanest
> solution. I forgot about wait_on_bit_lock.  I've converted all the locks
> to use this instead.  We probably need to get priority inheritance working
> on this too someday, but for now it's better than wasting memory or
> getting into deadlocks.
>

One bit of caution on these. Without PREEMPT_RT, don't the spinlocks on
SMP act as normal spinlocks, so that we must not schedule while holding
one? I believe that some of these locks are taken while holding
spin_locks, so this isn't the right solution for anything other than
PREEMPT_RT. I also forgot to add might_sleep in the locking calls. Here's
the patch with the might_sleep added.  What should we do for non
PREEMPT_RT?  Maybe put the bit_spinlocks back in for that case?

-- Steve

diff -ur linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c
--- linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c	2005-03-02 02:37:49.000000000 -0500
+++ linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c	2005-03-15 11:58:14.000000000 -0500
@@ -82,6 +82,17 @@

 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);

+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+/*
+ * Used in the locking of the bh_state and bh_journalhead bit locks.
+ */
+int jbd_lock_bh_sleep(void *notused)
+{
+	schedule();
+	return 0;
+}
+#endif
+
 /*
  * Helper function used to manage commit timeouts
  */
diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h
--- linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h	2005-03-02 02:38:19.000000000 -0500
+++ linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h	2005-03-16 02:25:31.881251828 -0500
@@ -324,34 +324,65 @@
 	return bh->b_private;
 }

+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+int jbd_lock_bh_sleep(void *notused);
+#endif
+
 static inline void jbd_lock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	might_sleep();
+	wait_on_bit_lock(&bh->b_state,BH_State,&jbd_lock_bh_sleep,TASK_UNINTERRUPTIBLE);
+#endif
+	__acquire(bitlock);
 }

 static inline int jbd_trylock_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_trylock(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	if (test_and_set_bit(BH_State, &bh->b_state))
+		return 0;
+#endif
+	__acquire(bitlock);
+	return 1;
 }

 static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_is_locked(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	return test_bit(BH_State, &bh->b_state);
+#else
+	return 1;
+#endif
 }

 static inline void jbd_unlock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	clear_bit(BH_State, &bh->b_state);
+	smp_mb__after_clear_bit();
+	wake_up_bit(&bh->b_state, BH_State);
+#endif
+	__release(bitlock);
 }

 static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_JournalHead, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	might_sleep();
+	wait_on_bit_lock(&bh->b_state,BH_JournalHead,&jbd_lock_bh_sleep,TASK_UNINTERRUPTIBLE);
+#endif
+	__acquire(bitlock);
 }

 static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_JournalHead, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	clear_bit(BH_JournalHead, &bh->b_state);
+	smp_mb__after_clear_bit();
+	wake_up_bit(&bh->b_state, BH_JournalHead);
+#endif
+	__release(bitlock);
 }

 struct jbd_revoke_table_s;
diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/spinlock.h linux-2.6.11-final-V0.7.40-00/include/linux/spinlock.h
--- linux-2.6.11-final-V0.7.40-00.orig/include/linux/spinlock.h	2005-03-14 06:00:54.000000000 -0500
+++ linux-2.6.11-final-V0.7.40-00/include/linux/spinlock.h	2005-03-15 12:19:11.000000000 -0500
@@ -774,67 +774,6 @@
 }))


-/*
- *  bit-based spin_lock()
- *
- * Don't use this unless you really need to: spin_lock() and spin_unlock()
- * are significantly faster.
- */
-static inline void bit_spin_lock(int bitnum, unsigned long *addr)
-{
-	/*
-	 * Assuming the lock is uncontended, this never enters
-	 * the body of the outer loop. If it is contended, then
-	 * within the inner loop a non-atomic test is used to
-	 * busywait with less bus contention for a good time to
-	 * attempt to acquire the lock bit.
-	 */
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
-	while (test_and_set_bit(bitnum, addr))
-		while (test_bit(bitnum, addr))
-			cpu_relax();
-#endif
-	__acquire(bitlock);
-}
-
-/*
- * Return true if it was acquired
- */
-static inline int bit_spin_trylock(int bitnum, unsigned long *addr)
-{
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
-	if (test_and_set_bit(bitnum, addr))
-		return 0;
-#endif
-	__acquire(bitlock);
-	return 1;
-}
-
-/*
- *  bit-based spin_unlock()
- */
-static inline void bit_spin_unlock(int bitnum, unsigned long *addr)
-{
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
-	BUG_ON(!test_bit(bitnum, addr));
-	smp_mb__before_clear_bit();
-	clear_bit(bitnum, addr);
-#endif
-	__release(bitlock);
-}
-
-/*
- * Return true if the lock is held.
- */
-static inline int bit_spin_is_locked(int bitnum, unsigned long *addr)
-{
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
-	return test_bit(bitnum, addr);
-#else
-	return 1;
-#endif
-}
-
 #define DEFINE_SPINLOCK(name) \
 	spinlock_t name __cacheline_aligned_in_smp = _SPIN_LOCK_UNLOCKED(name)


^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-03-15 19:09                                             ` Lee Revell
@ 2005-03-16  7:50                                               ` Steven Rostedt
  2005-03-16 18:21                                                 ` Lee Revell
  0 siblings, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-16  7:50 UTC (permalink / raw)
  To: Lee Revell; +Cc: Ingo Molnar, Andrew Morton, linux-kernel



On Tue, 15 Mar 2005, Lee Revell wrote:

> On Tue, 2005-03-15 at 13:05 -0500, Steven Rostedt wrote:
> > Damn! The answer was right there in front of my eyes! Here's the cleanest
> > solution. I forgot about wait_on_bit_lock.  I've converted all the locks
> > to use this instead.  We probably need to get priority inheritance working
> > on this too someday, but for now it's better than wasting memory or
> > getting into deadlocks.
> >
>
> I am still not clear on why this did not hit with earlier kernels +
> PREEMPT_DESKTOP.  Were the bitlocks introduced recently?  Or was another
> lock-break patch dropped?
>

When did you start seeing this? This code has been there as far back as
2.6.7 (the earliest 2.6 kernel I still have lying around) and as far
back as Ingo's realtime-preempt-2.6.9-mm1-U10. Maybe the tracing didn't
start picking this up till later, or you were just lucky that no
contention was happening on that lock.

-- Steve


^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-03-15 18:05                                           ` Steven Rostedt
  2005-03-15 19:09                                             ` Lee Revell
  2005-03-16  7:31                                             ` Steven Rostedt
@ 2005-03-16  8:50                                             ` Ingo Molnar
  2005-03-16  9:15                                               ` Andrew Morton
  2 siblings, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-03-16  8:50 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Lee Revell, Andrew Morton, linux-kernel


* Steven Rostedt <rostedt@goodmis.org> wrote:

> Damn! The answer was right there in front of my eyes! Here's the
> cleanest solution. I forgot about wait_on_bit_lock.  I've converted
> all the locks to use this instead. [...]

ah, indeed, this looks really nifty. Andrew?

	Ingo

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-03-16  8:50                                             ` Ingo Molnar
@ 2005-03-16  9:15                                               ` Andrew Morton
  2005-03-16  9:51                                                 ` [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks Ingo Molnar
  0 siblings, 1 reply; 125+ messages in thread
From: Andrew Morton @ 2005-03-16  9:15 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: rostedt, rlrevell, linux-kernel

Ingo Molnar <mingo@elte.hu> wrote:
>
> 
> * Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > Damn! The answer was right there in front of my eyes! Here's the
> > cleanest solution. I forgot about wait_on_bit_lock.  I've converted
> > all the locks to use this instead. [...]
> 
> ah, indeed, this looks really nifty. Andrew?
> 

There's a little lock ranking diagram in jbd.h which tells us that these
locks nest inside j_list_lock and j_state_lock.  So I guess you'll need to
turn those into semaphores.
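
[A toy model of the nesting problem described above, added for
illustration only. It is plain userspace C with hypothetical demo_*
names: a counter stands in for "spinlocks currently held", and a
sleeping acquire made while that counter is non-zero is recorded as a
violation, roughly what might_sleep() would complain about. It takes
no real locks.]

```c
#include <assert.h>

/* Hypothetical model state: spinlocks held, and violations observed. */
static int demo_atomic_depth;
static int demo_violations;

static void demo_spin_lock(void)
{
	demo_atomic_depth++;
}

static void demo_spin_unlock(void)
{
	assert(demo_atomic_depth > 0);
	demo_atomic_depth--;
}

/* A lock that may sleep: taking it in atomic context is a bug. */
static void demo_sleeping_lock(void)
{
	if (demo_atomic_depth > 0)
		demo_violations++;
}

static void demo_sleeping_unlock(void)
{
}

/* Outer lock is a spinlock, inner bh lock may sleep: flagged. */
static int demo_nest_inside_spinlock(void)
{
	demo_violations = 0;
	demo_spin_lock();		/* models j_state_lock as a spinlock */
	demo_sleeping_lock();		/* models a sleeping jbd_lock_bh_state() */
	demo_sleeping_unlock();
	demo_spin_unlock();
	return demo_violations;
}

/* Outer lock is itself a sleeping lock: nothing is flagged. */
static int demo_nest_inside_semaphore(void)
{
	demo_violations = 0;
	demo_sleeping_lock();		/* models the outer lock as a semaphore */
	demo_sleeping_lock();		/* models a sleeping jbd_lock_bh_state() */
	demo_sleeping_unlock();
	demo_sleeping_unlock();
	return demo_violations;
}
```

[In this model the first ordering reports one violation and the second
reports none, which is why every lock ranked outside the bh bit locks
has to become a sleeping lock as well.]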


^ permalink raw reply	[flat|nested] 125+ messages in thread

* [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
  2005-03-16  9:15                                               ` Andrew Morton
@ 2005-03-16  9:51                                                 ` Ingo Molnar
  2005-03-16  9:53                                                   ` [patch 1/3] j_state_lock -> j_state_sem Ingo Molnar
  2005-03-16 10:04                                                   ` [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks Andrew Morton
  0 siblings, 2 replies; 125+ messages in thread
From: Ingo Molnar @ 2005-03-16  9:51 UTC (permalink / raw)
  To: Andrew Morton; +Cc: rostedt, rlrevell, linux-kernel


* Andrew Morton <akpm@osdl.org> wrote:

> > > Damn! The answer was right there in front of my eyes! Here's the
> > > cleanest solution. I forgot about wait_on_bit_lock.  I've converted
> > > all the locks to use this instead. [...]
> > 
> > ah, indeed, this looks really nifty. Andrew?
> > 
> 
> There's a little lock ranking diagram in jbd.h which tells us that
> these locks nest inside j_list_lock and j_state_lock.  So I guess
> you'll need to turn those into semaphores.

indeed. I did this (see the three followup patches, against BK-curr),
and it builds/boots/works just fine on an ext3 box. Do we want to try
this in -mm?

one worry would be that while spinlocks are NOP on UP, semaphores are
not. OTOH, this could relax some of the preemptability constraints
within ext3 and could make it more hackable. These patches enabled the
removal of some of the lock-break code for example and could likely
solve some of the remaining ext3 latencies.

	Ingo
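
[A userspace sketch of a semaphore used as a mutex, added for
illustration only. The demo_* names are hypothetical and this is not
the kernel implementation: a C11 atomic counter initialised to 1 plays
the semaphore, demo_down_trylock() follows the kernel's down_trylock()
convention of returning 0 on success, and sched_yield() stands in for
sleeping on a wait queue. Unlike a spinlock on UP, the counter is
still read and updated on every down/up, which is the overhead worry
mentioned above.]

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a kernel semaphore: just a counter. */
struct demo_sem {
	atomic_int count;
};

static void demo_sema_init(struct demo_sem *s, int val)
{
	atomic_init(&s->count, val);
}

/* Returns 0 if the semaphore was acquired, 1 if it would have blocked. */
static int demo_down_trylock(struct demo_sem *s)
{
	int c = atomic_load(&s->count);

	while (c > 0) {
		/* On failure, c is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&s->count, &c, c - 1))
			return 0;
	}
	return 1;
}

/* Blocking acquire; a real semaphore would sleep rather than yield. */
static void demo_down(struct demo_sem *s)
{
	while (demo_down_trylock(s))
		sched_yield();
}

static void demo_up(struct demo_sem *s)
{
	atomic_fetch_add(&s->count, 1);
}
```

[With an initial count of 1 only one holder gets in at a time, which is
the mutex-style use the j_state_sem conversion relies on.]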

^ permalink raw reply	[flat|nested] 125+ messages in thread

* [patch 1/3] j_state_lock -> j_state_sem
  2005-03-16  9:51                                                 ` [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks Ingo Molnar
@ 2005-03-16  9:53                                                   ` Ingo Molnar
  2005-03-16  9:53                                                     ` [patch 2/3] j_list_lock -> j_list_sem Ingo Molnar
  2005-03-16 10:04                                                   ` [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks Andrew Morton
  1 sibling, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-03-16  9:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: rostedt, rlrevell, linux-kernel


this patch turns the j_state_lock spinlock into a mutex. 
Builds/boots/works fine on x86.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- linux/fs/jbd/checkpoint.c.orig
+++ linux/fs/jbd/checkpoint.c
@@ -78,25 +78,24 @@ static int __try_to_free_cp_buf(struct j
 void __log_wait_for_space(journal_t *journal)
 {
 	int nblocks;
-	assert_spin_locked(&journal->j_state_lock);
 
 	nblocks = jbd_space_needed(journal);
 	while (__log_space_left(journal) < nblocks) {
 		if (journal->j_flags & JFS_ABORT)
 			return;
-		spin_unlock(&journal->j_state_lock);
+		up(&journal->j_state_sem);
 		down(&journal->j_checkpoint_sem);
 
 		/*
 		 * Test again, another process may have checkpointed while we
 		 * were waiting for the checkpoint lock
 		 */
-		spin_lock(&journal->j_state_lock);
+		down(&journal->j_state_sem);
 		nblocks = jbd_space_needed(journal);
 		if (__log_space_left(journal) < nblocks) {
-			spin_unlock(&journal->j_state_lock);
+			up(&journal->j_state_sem);
 			log_do_checkpoint(journal);
-			spin_lock(&journal->j_state_lock);
+			down(&journal->j_state_sem);
 		}
 		up(&journal->j_checkpoint_sem);
 	}
@@ -404,7 +403,7 @@ int cleanup_journal_tail(journal_t *jour
 	 * next transaction ID we will write, and where it will
 	 * start. */
 
-	spin_lock(&journal->j_state_lock);
+	down(&journal->j_state_sem);
 	spin_lock(&journal->j_list_lock);
 	transaction = journal->j_checkpoint_transactions;
 	if (transaction) {
@@ -426,7 +425,7 @@ int cleanup_journal_tail(journal_t *jour
 	/* If the oldest pinned transaction is at the tail of the log
            already then there's not much we can do right now. */
 	if (journal->j_tail_sequence == first_tid) {
-		spin_unlock(&journal->j_state_lock);
+		up(&journal->j_state_sem);
 		return 1;
 	}
 
@@ -445,7 +444,7 @@ int cleanup_journal_tail(journal_t *jour
 	journal->j_free += freed;
 	journal->j_tail_sequence = first_tid;
 	journal->j_tail = blocknr;
-	spin_unlock(&journal->j_state_lock);
+	up(&journal->j_state_sem);
 	if (!(journal->j_flags & JFS_ABORT))
 		journal_update_superblock(journal, 1);
 	return 0;
--- linux/fs/jbd/transaction.c.orig
+++ linux/fs/jbd/transaction.c
@@ -40,7 +40,7 @@
  *	new transaction	and we can't block without protecting against other
  *	processes trying to touch the journal while it is in transition.
  *
- * Called under j_state_lock
+ * Called under j_state_sem
  */
 
 static transaction_t *
@@ -109,21 +109,21 @@ alloc_transaction:
 repeat:
 
 	/*
-	 * We need to hold j_state_lock until t_updates has been incremented,
+	 * We need to hold j_state_sem until t_updates has been incremented,
 	 * for proper journal barrier handling
 	 */
-	spin_lock(&journal->j_state_lock);
+	down(&journal->j_state_sem);
 repeat_locked:
 	if (is_journal_aborted(journal) ||
 	    (journal->j_errno != 0 && !(journal->j_flags & JFS_ACK_ERR))) {
-		spin_unlock(&journal->j_state_lock);
+		up(&journal->j_state_sem);
 		ret = -EROFS; 
 		goto out;
 	}
 
 	/* Wait on the journal's transaction barrier if necessary */
 	if (journal->j_barrier_count) {
-		spin_unlock(&journal->j_state_lock);
+		up(&journal->j_state_sem);
 		wait_event(journal->j_wait_transaction_locked,
 				journal->j_barrier_count == 0);
 		goto repeat;
@@ -131,7 +131,7 @@ repeat_locked:
 
 	if (!journal->j_running_transaction) {
 		if (!new_transaction) {
-			spin_unlock(&journal->j_state_lock);
+			up(&journal->j_state_sem);
 			goto alloc_transaction;
 		}
 		get_transaction(journal, new_transaction);
@@ -149,7 +149,7 @@ repeat_locked:
 
 		prepare_to_wait(&journal->j_wait_transaction_locked,
 					&wait, TASK_UNINTERRUPTIBLE);
-		spin_unlock(&journal->j_state_lock);
+		up(&journal->j_state_sem);
 		schedule();
 		finish_wait(&journal->j_wait_transaction_locked, &wait);
 		goto repeat;
@@ -176,7 +176,7 @@ repeat_locked:
 		prepare_to_wait(&journal->j_wait_transaction_locked, &wait,
 				TASK_UNINTERRUPTIBLE);
 		__log_start_commit(journal, transaction->t_tid);
-		spin_unlock(&journal->j_state_lock);
+		up(&journal->j_state_sem);
 		schedule();
 		finish_wait(&journal->j_wait_transaction_locked, &wait);
 		goto repeat;
@@ -225,7 +225,7 @@ repeat_locked:
 		  handle, nblocks, transaction->t_outstanding_credits,
 		  __log_space_left(journal));
 	spin_unlock(&transaction->t_handle_lock);
-	spin_unlock(&journal->j_state_lock);
+	up(&journal->j_state_sem);
 out:
 	if (new_transaction)
 		kfree(new_transaction);
@@ -321,7 +321,7 @@ int journal_extend(handle_t *handle, int
 
 	result = 1;
 
-	spin_lock(&journal->j_state_lock);
+	down(&journal->j_state_sem);
 
 	/* Don't extend a locked-down transaction! */
 	if (handle->h_transaction->t_state != T_RUNNING) {
@@ -353,7 +353,7 @@ int journal_extend(handle_t *handle, int
 unlock:
 	spin_unlock(&transaction->t_handle_lock);
 error_out:
-	spin_unlock(&journal->j_state_lock);
+	up(&journal->j_state_sem);
 out:
 	return result;
 }
@@ -392,7 +392,7 @@ int journal_restart(handle_t *handle, in
 	J_ASSERT(transaction->t_updates > 0);
 	J_ASSERT(journal_current_handle() == handle);
 
-	spin_lock(&journal->j_state_lock);
+	down(&journal->j_state_sem);
 	spin_lock(&transaction->t_handle_lock);
 	transaction->t_outstanding_credits -= handle->h_buffer_credits;
 	transaction->t_updates--;
@@ -403,7 +403,7 @@ int journal_restart(handle_t *handle, in
 
 	jbd_debug(2, "restarting handle %p\n", handle);
 	__log_start_commit(journal, transaction->t_tid);
-	spin_unlock(&journal->j_state_lock);
+	up(&journal->j_state_sem);
 
 	handle->h_buffer_credits = nblocks;
 	ret = start_this_handle(journal, handle);
@@ -425,7 +425,7 @@ void journal_lock_updates(journal_t *jou
 {
 	DEFINE_WAIT(wait);
 
-	spin_lock(&journal->j_state_lock);
+	down(&journal->j_state_sem);
 	++journal->j_barrier_count;
 
 	/* Wait until there are no running updates */
@@ -443,12 +443,12 @@ void journal_lock_updates(journal_t *jou
 		prepare_to_wait(&journal->j_wait_updates, &wait,
 				TASK_UNINTERRUPTIBLE);
 		spin_unlock(&transaction->t_handle_lock);
-		spin_unlock(&journal->j_state_lock);
+		up(&journal->j_state_sem);
 		schedule();
 		finish_wait(&journal->j_wait_updates, &wait);
-		spin_lock(&journal->j_state_lock);
+		down(&journal->j_state_sem);
 	}
-	spin_unlock(&journal->j_state_lock);
+	up(&journal->j_state_sem);
 
 	/*
 	 * We have now established a barrier against other normal updates, but
@@ -472,9 +472,9 @@ void journal_unlock_updates (journal_t *
 	J_ASSERT(journal->j_barrier_count != 0);
 
 	up(&journal->j_barrier);
-	spin_lock(&journal->j_state_lock);
+	down(&journal->j_state_sem);
 	--journal->j_barrier_count;
-	spin_unlock(&journal->j_state_lock);
+	up(&journal->j_state_sem);
 	wake_up(&journal->j_wait_transaction_locked);
 }
 
@@ -1336,7 +1336,7 @@ int journal_stop(handle_t *handle)
 	}
 
 	current->journal_info = NULL;
-	spin_lock(&journal->j_state_lock);
+	down(&journal->j_state_sem);
 	spin_lock(&transaction->t_handle_lock);
 	transaction->t_outstanding_credits -= handle->h_buffer_credits;
 	transaction->t_updates--;
@@ -1366,7 +1366,7 @@ int journal_stop(handle_t *handle)
 					"handle %p\n", handle);
 		/* This is non-blocking */
 		__log_start_commit(journal, transaction->t_tid);
-		spin_unlock(&journal->j_state_lock);
+		up(&journal->j_state_sem);
 
 		/*
 		 * Special case: JFS_SYNC synchronous updates require us
@@ -1376,7 +1376,7 @@ int journal_stop(handle_t *handle)
 			err = log_wait_commit(journal, tid);
 	} else {
 		spin_unlock(&transaction->t_handle_lock);
-		spin_unlock(&journal->j_state_lock);
+		up(&journal->j_state_sem);
 	}
 
 	jbd_free_handle(handle);
@@ -1739,7 +1739,7 @@ static int journal_unmap_buffer(journal_
 	if (!buffer_jbd(bh))
 		goto zap_buffer_unlocked;
 
-	spin_lock(&journal->j_state_lock);
+	down(&journal->j_state_sem);
 	jbd_lock_bh_state(bh);
 	spin_lock(&journal->j_list_lock);
 
@@ -1776,7 +1776,7 @@ static int journal_unmap_buffer(journal_
 					journal->j_running_transaction);
 			spin_unlock(&journal->j_list_lock);
 			jbd_unlock_bh_state(bh);
-			spin_unlock(&journal->j_state_lock);
+			up(&journal->j_state_sem);
 			journal_put_journal_head(jh);
 			return ret;
 		} else {
@@ -1790,7 +1790,7 @@ static int journal_unmap_buffer(journal_
 					journal->j_committing_transaction);
 				spin_unlock(&journal->j_list_lock);
 				jbd_unlock_bh_state(bh);
-				spin_unlock(&journal->j_state_lock);
+				up(&journal->j_state_sem);
 				journal_put_journal_head(jh);
 				return ret;
 			} else {
@@ -1814,7 +1814,7 @@ static int journal_unmap_buffer(journal_
 		}
 		spin_unlock(&journal->j_list_lock);
 		jbd_unlock_bh_state(bh);
-		spin_unlock(&journal->j_state_lock);
+		up(&journal->j_state_sem);
 		journal_put_journal_head(jh);
 		return 0;
 	} else {
@@ -1833,7 +1833,7 @@ zap_buffer:
 zap_buffer_no_jh:
 	spin_unlock(&journal->j_list_lock);
 	jbd_unlock_bh_state(bh);
-	spin_unlock(&journal->j_state_lock);
+	up(&journal->j_state_sem);
 zap_buffer_unlocked:
 	clear_buffer_dirty(bh);
 	J_ASSERT_BH(bh, !buffer_jbddirty(bh));
--- linux/fs/jbd/commit.c.orig
+++ linux/fs/jbd/commit.c
@@ -144,9 +144,9 @@ static int journal_write_commit_record(j
 			"JBD: barrier-based sync failed on %s - "
 			"disabling barriers\n",
 			bdevname(journal->j_dev, b));
-		spin_lock(&journal->j_state_lock);
+		down(&journal->j_state_sem);
 		journal->j_flags &= ~JFS_BARRIER;
-		spin_unlock(&journal->j_state_lock);
+		up(&journal->j_state_sem);
 
 		/* And try again, without the barrier */
 		clear_buffer_ordered(bh);
@@ -211,7 +211,7 @@ void journal_commit_transaction(journal_
 	jbd_debug(1, "JBD: starting commit of transaction %d\n",
 			commit_transaction->t_tid);
 
-	spin_lock(&journal->j_state_lock);
+	down(&journal->j_state_sem);
 	commit_transaction->t_state = T_LOCKED;
 
 	spin_lock(&commit_transaction->t_handle_lock);
@@ -222,9 +222,9 @@ void journal_commit_transaction(journal_
 					TASK_UNINTERRUPTIBLE);
 		if (commit_transaction->t_updates) {
 			spin_unlock(&commit_transaction->t_handle_lock);
-			spin_unlock(&journal->j_state_lock);
+			up(&journal->j_state_sem);
 			schedule();
-			spin_lock(&journal->j_state_lock);
+			down(&journal->j_state_sem);
 			spin_lock(&commit_transaction->t_handle_lock);
 		}
 		finish_wait(&journal->j_wait_updates, &wait);
@@ -291,7 +291,7 @@ void journal_commit_transaction(journal_
 	journal->j_running_transaction = NULL;
 	commit_transaction->t_log_start = journal->j_head;
 	wake_up(&journal->j_wait_transaction_locked);
-	spin_unlock(&journal->j_state_lock);
+	up(&journal->j_state_sem);
 
 	jbd_debug (3, "JBD: commit phase 2\n");
 
@@ -806,16 +806,16 @@ restart_loop:
 	/*
 	 * This is a bit sleazy.  We borrow j_list_lock to protect
 	 * journal->j_committing_transaction in __journal_remove_checkpoint.
-	 * Really, __jornal_remove_checkpoint should be using j_state_lock but
+	 * Really, __jornal_remove_checkpoint should be using j_state_sem but
 	 * it's a bit hassle to hold that across __journal_remove_checkpoint
 	 */
-	spin_lock(&journal->j_state_lock);
+	down(&journal->j_state_sem);
 	spin_lock(&journal->j_list_lock);
 	commit_transaction->t_state = T_FINISHED;
 	J_ASSERT(commit_transaction == journal->j_committing_transaction);
 	journal->j_commit_sequence = commit_transaction->t_tid;
 	journal->j_committing_transaction = NULL;
-	spin_unlock(&journal->j_state_lock);
+	up(&journal->j_state_sem);
 
 	if (commit_transaction->t_checkpoint_list == NULL) {
 		__journal_drop_transaction(journal, commit_transaction);
--- linux/fs/jbd/journal.c.orig
+++ linux/fs/jbd/journal.c
@@ -148,7 +148,7 @@ int kjournald(void *arg)
 	/*
 	 * And now, wait forever for commit wakeup events.
 	 */
-	spin_lock(&journal->j_state_lock);
+	down(&journal->j_state_sem);
 
 loop:
 	if (journal->j_flags & JFS_UNMOUNT)
@@ -159,10 +159,10 @@ loop:
 
 	if (journal->j_commit_sequence != journal->j_commit_request) {
 		jbd_debug(1, "OK, requests differ\n");
-		spin_unlock(&journal->j_state_lock);
+		up(&journal->j_state_sem);
 		del_timer_sync(journal->j_commit_timer);
 		journal_commit_transaction(journal);
-		spin_lock(&journal->j_state_lock);
+		down(&journal->j_state_sem);
 		goto loop;
 	}
 
@@ -174,9 +174,9 @@ loop:
 		 * be already stopped.
 		 */
 		jbd_debug(1, "Now suspending kjournald\n");
-		spin_unlock(&journal->j_state_lock);
+		up(&journal->j_state_sem);
 		refrigerator(PF_FREEZE);
-		spin_lock(&journal->j_state_lock);
+		down(&journal->j_state_sem);
 	} else {
 		/*
 		 * We assume on resume that commits are already there,
@@ -194,9 +194,9 @@ loop:
 						transaction->t_expires))
 			should_sleep = 0;
 		if (should_sleep) {
-			spin_unlock(&journal->j_state_lock);
+			up(&journal->j_state_sem);
 			schedule();
-			spin_lock(&journal->j_state_lock);
+			down(&journal->j_state_sem);
 		}
 		finish_wait(&journal->j_wait_commit, &wait);
 	}
@@ -214,7 +214,7 @@ loop:
 	goto loop;
 
 end_loop:
-	spin_unlock(&journal->j_state_lock);
+	up(&journal->j_state_sem);
 	del_timer_sync(journal->j_commit_timer);
 	journal->j_task = NULL;
 	wake_up(&journal->j_wait_done_commit);
@@ -230,16 +230,16 @@ static void journal_start_thread(journal
 
 static void journal_kill_thread(journal_t *journal)
 {
-	spin_lock(&journal->j_state_lock);
+	down(&journal->j_state_sem);
 	journal->j_flags |= JFS_UNMOUNT;
 
 	while (journal->j_task) {
 		wake_up(&journal->j_wait_commit);
-		spin_unlock(&journal->j_state_lock);
+		up(&journal->j_state_sem);
 		wait_event(journal->j_wait_done_commit, journal->j_task == 0);
-		spin_lock(&journal->j_state_lock);
+		down(&journal->j_state_sem);
 	}
-	spin_unlock(&journal->j_state_lock);
+	up(&journal->j_state_sem);
 }
 
 /*
@@ -408,15 +408,13 @@ repeat:
  *
  * Called with the journal already locked.
  *
- * Called under j_state_lock
+ * Called under j_state_sem
  */
 
 int __log_space_left(journal_t *journal)
 {
 	int left = journal->j_free;
 
-	assert_spin_locked(&journal->j_state_lock);
-
 	/*
 	 * Be pessimistic here about the number of those free blocks which
 	 * might be required for log descriptor control blocks.
@@ -433,7 +431,7 @@ int __log_space_left(journal_t *journal)
 }
 
 /*
- * Called under j_state_lock.  Returns true if a transaction was started.
+ * Called under j_state_sem.  Returns true if a transaction was started.
  */
 int __log_start_commit(journal_t *journal, tid_t target)
 {
@@ -460,9 +458,9 @@ int log_start_commit(journal_t *journal,
 {
 	int ret;
 
-	spin_lock(&journal->j_state_lock);
+	down(&journal->j_state_sem);
 	ret = __log_start_commit(journal, tid);
-	spin_unlock(&journal->j_state_lock);
+	up(&journal->j_state_sem);
 	return ret;
 }
 
@@ -481,7 +479,7 @@ int journal_force_commit_nested(journal_
 	transaction_t *transaction = NULL;
 	tid_t tid;
 
-	spin_lock(&journal->j_state_lock);
+	down(&journal->j_state_sem);
 	if (journal->j_running_transaction && !current->journal_info) {
 		transaction = journal->j_running_transaction;
 		__log_start_commit(journal, transaction->t_tid);
@@ -489,12 +487,12 @@ int journal_force_commit_nested(journal_
 		transaction = journal->j_committing_transaction;
 
 	if (!transaction) {
-		spin_unlock(&journal->j_state_lock);
+		up(&journal->j_state_sem);
 		return 0;	/* Nothing to retry */
 	}
 
 	tid = transaction->t_tid;
-	spin_unlock(&journal->j_state_lock);
+	up(&journal->j_state_sem);
 	log_wait_commit(journal, tid);
 	return 1;
 }
@@ -507,7 +505,7 @@ int journal_start_commit(journal_t *jour
 {
 	int ret = 0;
 
-	spin_lock(&journal->j_state_lock);
+	down(&journal->j_state_sem);
 	if (journal->j_running_transaction) {
 		tid_t tid = journal->j_running_transaction->t_tid;
 
@@ -522,7 +520,7 @@ int journal_start_commit(journal_t *jour
 		*ptid = journal->j_committing_transaction->t_tid;
 		ret = 1;
 	}
-	spin_unlock(&journal->j_state_lock);
+	up(&journal->j_state_sem);
 	return ret;
 }
 
@@ -535,25 +533,25 @@ int log_wait_commit(journal_t *journal, 
 	int err = 0;
 
 #ifdef CONFIG_JBD_DEBUG
-	spin_lock(&journal->j_state_lock);
+	down(&journal->j_state_sem);
 	if (!tid_geq(journal->j_commit_request, tid)) {
 		printk(KERN_EMERG
 		       "%s: error: j_commit_request=%d, tid=%d\n",
 		       __FUNCTION__, journal->j_commit_request, tid);
 	}
-	spin_unlock(&journal->j_state_lock);
+	up(&journal->j_state_sem);
 #endif
-	spin_lock(&journal->j_state_lock);
+	down(&journal->j_state_sem);
 	while (tid_gt(tid, journal->j_commit_sequence)) {
 		jbd_debug(1, "JBD: want %d, j_commit_sequence=%d\n",
 				  tid, journal->j_commit_sequence);
 		wake_up(&journal->j_wait_commit);
-		spin_unlock(&journal->j_state_lock);
+		up(&journal->j_state_sem);
 		wait_event(journal->j_wait_done_commit,
 				!tid_gt(tid, journal->j_commit_sequence));
-		spin_lock(&journal->j_state_lock);
+		down(&journal->j_state_sem);
 	}
-	spin_unlock(&journal->j_state_lock);
+	up(&journal->j_state_sem);
 
 	if (unlikely(is_journal_aborted(journal))) {
 		printk(KERN_EMERG "journal commit I/O error\n");
@@ -570,7 +568,7 @@ int journal_next_log_block(journal_t *jo
 {
 	unsigned long blocknr;
 
-	spin_lock(&journal->j_state_lock);
+	down(&journal->j_state_sem);
 	J_ASSERT(journal->j_free > 1);
 
 	blocknr = journal->j_head;
@@ -578,7 +576,7 @@ int journal_next_log_block(journal_t *jo
 	journal->j_free--;
 	if (journal->j_head == journal->j_last)
 		journal->j_head = journal->j_first;
-	spin_unlock(&journal->j_state_lock);
+	up(&journal->j_state_sem);
 	return journal_bmap(journal, blocknr, retp);
 }
 
@@ -675,7 +673,7 @@ static journal_t * journal_init_common (
 	init_MUTEX(&journal->j_checkpoint_sem);
 	spin_lock_init(&journal->j_revoke_lock);
 	spin_lock_init(&journal->j_list_lock);
-	spin_lock_init(&journal->j_state_lock);
+	init_MUTEX(&journal->j_state_sem);
 
 	journal->j_commit_interval = (HZ * JBD_DEFAULT_MAX_COMMIT_AGE);
 
@@ -955,14 +953,14 @@ void journal_update_superblock(journal_t
 		goto out;
 	}
 
-	spin_lock(&journal->j_state_lock);
+	down(&journal->j_state_sem);
 	jbd_debug(1,"JBD: updating superblock (start %ld, seq %d, errno %d)\n",
 		  journal->j_tail, journal->j_tail_sequence, journal->j_errno);
 
 	sb->s_sequence = cpu_to_be32(journal->j_tail_sequence);
 	sb->s_start    = cpu_to_be32(journal->j_tail);
 	sb->s_errno    = cpu_to_be32(journal->j_errno);
-	spin_unlock(&journal->j_state_lock);
+	up(&journal->j_state_sem);
 
 	BUFFER_TRACE(bh, "marking dirty");
 	mark_buffer_dirty(bh);
@@ -976,12 +974,12 @@ out:
 	 * any future commit will have to be careful to update the
 	 * superblock again to re-record the true start of the log. */
 
-	spin_lock(&journal->j_state_lock);
+	down(&journal->j_state_sem);
 	if (sb->s_start)
 		journal->j_flags &= ~JFS_FLUSHED;
 	else
 		journal->j_flags |= JFS_FLUSHED;
-	spin_unlock(&journal->j_state_lock);
+	up(&journal->j_state_sem);
 }
 
 /*
@@ -1343,7 +1341,7 @@ int journal_flush(journal_t *journal)
 	transaction_t *transaction = NULL;
 	unsigned long old_tail;
 
-	spin_lock(&journal->j_state_lock);
+	down(&journal->j_state_sem);
 
 	/* Force everything buffered to the log... */
 	if (journal->j_running_transaction) {
@@ -1356,10 +1354,10 @@ int journal_flush(journal_t *journal)
 	if (transaction) {
 		tid_t tid = transaction->t_tid;
 
-		spin_unlock(&journal->j_state_lock);
+		up(&journal->j_state_sem);
 		log_wait_commit(journal, tid);
 	} else {
-		spin_unlock(&journal->j_state_lock);
+		up(&journal->j_state_sem);
 	}
 
 	/* ...and flush everything in the log out to disk. */
@@ -1377,12 +1375,12 @@ int journal_flush(journal_t *journal)
 	 * the magic code for a fully-recovered superblock.  Any future
 	 * commits of data to the journal will restore the current
 	 * s_start value. */
-	spin_lock(&journal->j_state_lock);
+	down(&journal->j_state_sem);
 	old_tail = journal->j_tail;
 	journal->j_tail = 0;
-	spin_unlock(&journal->j_state_lock);
+	up(&journal->j_state_sem);
 	journal_update_superblock(journal, 1);
-	spin_lock(&journal->j_state_lock);
+	down(&journal->j_state_sem);
 	journal->j_tail = old_tail;
 
 	J_ASSERT(!journal->j_running_transaction);
@@ -1390,7 +1388,7 @@ int journal_flush(journal_t *journal)
 	J_ASSERT(!journal->j_checkpoint_transactions);
 	J_ASSERT(journal->j_head == journal->j_tail);
 	J_ASSERT(journal->j_tail_sequence == journal->j_transaction_sequence);
-	spin_unlock(&journal->j_state_lock);
+	up(&journal->j_state_sem);
 	return err;
 }
 
@@ -1475,12 +1473,12 @@ void __journal_abort_hard(journal_t *jou
 	printk(KERN_ERR "Aborting journal on device %s.\n",
 		journal_dev_name(journal, b));
 
-	spin_lock(&journal->j_state_lock);
+	down(&journal->j_state_sem);
 	journal->j_flags |= JFS_ABORT;
 	transaction = journal->j_running_transaction;
 	if (transaction)
 		__log_start_commit(journal, transaction->t_tid);
-	spin_unlock(&journal->j_state_lock);
+	up(&journal->j_state_sem);
 }
 
 /* Soft abort: record the abort error status in the journal superblock,
@@ -1565,12 +1563,12 @@ int journal_errno(journal_t *journal)
 {
 	int err;
 
-	spin_lock(&journal->j_state_lock);
+	down(&journal->j_state_sem);
 	if (journal->j_flags & JFS_ABORT)
 		err = -EROFS;
 	else
 		err = journal->j_errno;
-	spin_unlock(&journal->j_state_lock);
+	up(&journal->j_state_sem);
 	return err;
 }
 
@@ -1585,12 +1583,12 @@ int journal_clear_err(journal_t *journal
 {
 	int err = 0;
 
-	spin_lock(&journal->j_state_lock);
+	down(&journal->j_state_sem);
 	if (journal->j_flags & JFS_ABORT)
 		err = -EROFS;
 	else
 		journal->j_errno = 0;
-	spin_unlock(&journal->j_state_lock);
+	up(&journal->j_state_sem);
 	return err;
 }
 
@@ -1603,10 +1601,10 @@ int journal_clear_err(journal_t *journal
  */
 void journal_ack_err(journal_t *journal)
 {
-	spin_lock(&journal->j_state_lock);
+	down(&journal->j_state_sem);
 	if (journal->j_errno)
 		journal->j_flags |= JFS_ACK_ERR;
-	spin_unlock(&journal->j_state_lock);
+	up(&journal->j_state_sem);
 }
 
 int journal_blocks_per_page(struct inode *inode)
--- linux/fs/ext3/super.c.orig
+++ linux/fs/ext3/super.c
@@ -1653,12 +1653,12 @@ static void ext3_init_journal_params(str
 	 * interval here, but for now we'll just fall back to the jbd
 	 * default. */
 
-	spin_lock(&journal->j_state_lock);
+	down(&journal->j_state_sem);
 	if (test_opt(sb, BARRIER))
 		journal->j_flags |= JFS_BARRIER;
 	else
 		journal->j_flags &= ~JFS_BARRIER;
-	spin_unlock(&journal->j_state_lock);
+	up(&journal->j_state_sem);
 }
 
 static journal_t *ext3_get_journal(struct super_block *sb, int journal_inum)
--- linux/include/linux/jbd.h.orig
+++ linux/include/linux/jbd.h
@@ -416,16 +416,16 @@ struct handle_s 
  *    j_list_lock
  *      ->jbd_lock_bh_journal_head()	(This is "innermost")
  *
- *    j_state_lock
+ *    j_state_sem
  *    ->jbd_lock_bh_state()
  *
  *    jbd_lock_bh_state()
  *    ->j_list_lock
  *
- *    j_state_lock
+ *    j_state_sem
  *    ->t_handle_lock
  *
- *    j_state_lock
+ *    j_state_sem
  *    ->j_list_lock			(journal_unmap_buffer)
  *
  */
@@ -442,7 +442,7 @@ struct transaction_s 
 	 * Transaction's current state
 	 * [no locking - only kjournald alters this]
 	 * FIXME: needs barriers
-	 * KLUDGE: [use j_state_lock]
+	 * KLUDGE: [use j_state_sem]
 	 */
 	enum {
 		T_RUNNING,
@@ -562,7 +562,7 @@ struct transaction_s 
  * @j_sb_buffer: First part of superblock buffer
  * @j_superblock: Second part of superblock buffer
  * @j_format_version: Version of the superblock format
- * @j_state_lock: Protect the various scalars in the journal
+ * @j_state_sem: Protect the various scalars in the journal
  * @j_barrier_count:  Number of processes waiting to create a barrier lock
  * @j_barrier: The barrier lock itself
  * @j_running_transaction: The current running transaction..
@@ -615,12 +615,12 @@ struct transaction_s 
 
 struct journal_s
 {
-	/* General journaling state flags [j_state_lock] */
+	/* General journaling state flags [j_state_sem] */
 	unsigned long		j_flags;
 
 	/*
 	 * Is there an outstanding uncleared error on the journal (from a prior
-	 * abort)? [j_state_lock]
+	 * abort)? [j_state_sem]
 	 */
 	int			j_errno;
 
@@ -634,10 +634,10 @@ struct journal_s
 	/*
 	 * Protect the various scalars in the journal
 	 */
-	spinlock_t		j_state_lock;
+	struct semaphore	j_state_sem;
 
 	/*
-	 * Number of processes waiting to create a barrier lock [j_state_lock]
+	 * Number of processes waiting to create a barrier lock [j_state_sem]
 	 */
 	int			j_barrier_count;
 
@@ -646,13 +646,13 @@ struct journal_s
 
 	/*
 	 * Transactions: The current running transaction...
-	 * [j_state_lock] [caller holding open handle]
+	 * [j_state_sem] [caller holding open handle]
 	 */
 	transaction_t		*j_running_transaction;
 
 	/*
 	 * the transaction we are pushing to disk
-	 * [j_state_lock] [caller holding open handle]
+	 * [j_state_sem] [caller holding open handle]
 	 */
 	transaction_t		*j_committing_transaction;
 
@@ -688,25 +688,25 @@ struct journal_s
 
 	/*
 	 * Journal head: identifies the first unused block in the journal.
-	 * [j_state_lock]
+	 * [j_state_sem]
 	 */
 	unsigned long		j_head;
 
 	/*
 	 * Journal tail: identifies the oldest still-used block in the journal.
-	 * [j_state_lock]
+	 * [j_state_sem]
 	 */
 	unsigned long		j_tail;
 
 	/*
 	 * Journal free: how many free blocks are there in the journal?
-	 * [j_state_lock]
+	 * [j_state_sem]
 	 */
 	unsigned long		j_free;
 
 	/*
 	 * Journal start and end: the block numbers of the first usable block
-	 * and one beyond the last usable block in the journal. [j_state_lock]
+	 * and one beyond the last usable block in the journal. [j_state_sem]
 	 */
 	unsigned long		j_first;
 	unsigned long		j_last;
@@ -739,24 +739,24 @@ struct journal_s
 	struct inode		*j_inode;
 
 	/*
-	 * Sequence number of the oldest transaction in the log [j_state_lock]
+	 * Sequence number of the oldest transaction in the log [j_state_sem]
 	 */
 	tid_t			j_tail_sequence;
 
 	/*
-	 * Sequence number of the next transaction to grant [j_state_lock]
+	 * Sequence number of the next transaction to grant [j_state_sem]
 	 */
 	tid_t			j_transaction_sequence;
 
 	/*
 	 * Sequence number of the most recently committed transaction
-	 * [j_state_lock].
+	 * [j_state_sem].
 	 */
 	tid_t			j_commit_sequence;
 
 	/*
 	 * Sequence number of the most recent transaction wanting commit
-	 * [j_state_lock]
+	 * [j_state_sem]
 	 */
 	tid_t			j_commit_request;
 
@@ -858,7 +858,7 @@ extern void		__wait_on_journal (journal_
  *
  * We need to lock the journal during transaction state changes so that nobody
  * ever tries to take a handle on the running transaction while we are in the
- * middle of moving it to the commit phase.  j_state_lock does this.
+ * middle of moving it to the commit phase.  j_state_sem does this.
  *
  * Note that the locking is completely interrupt unsafe.  We never touch
  * journal structures from interrupts.
@@ -1039,7 +1039,7 @@ extern int journal_blocks_per_page(struc
 
 /*
  * Return the minimum number of blocks which must be free in the journal
- * before a new transaction may be started.  Must be called under j_state_lock.
+ * before a new transaction may be started.  Must be called under j_state_sem.
  */
 static inline int jbd_space_needed(journal_t *journal)
 {


* [patch 2/3] j_list_lock -> j_list_sem
  2005-03-16  9:53                                                   ` [patch 1/3] j_state_lock -> j_state_sem Ingo Molnar
@ 2005-03-16  9:53                                                     ` Ingo Molnar
  2005-03-16  9:57                                                       ` [patch 3/3] remove bitlocks Ingo Molnar
  0 siblings, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-03-16  9:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: rostedt, rlrevell, linux-kernel


this patch turns the j_list_lock spinlock into a mutex.
Builds/boots/works fine on x86.
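
for readers who want the shape of the conversion without reading the whole diff, here is a minimal userspace sketch. it is not part of the patch and makes several assumptions: a POSIX sem_t stands in for the kernel's struct semaphore, sem_init() with a count of 1 stands in for init_MUTEX(), and sem_wait()/sem_post() stand in for down()/up(). struct toy_journal, nr_filed, iters and all toy_*() functions are made up for the illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

/* hypothetical stand-in for journal_t, not the real jbd structure */
struct toy_journal {
	sem_t	j_list_sem;	/* was: spinlock_t j_list_lock */
	long	nr_filed;	/* protected by j_list_sem */
	long	iters;		/* per-thread loop count, read-only */
};

static void toy_journal_init(struct toy_journal *j, long iters)
{
	int ret;

	/* initial count 1: a binary semaphore, analogous to init_MUTEX() */
	ret = sem_init(&j->j_list_sem, 0, 1);
	assert(ret == 0);
	(void)ret;
	j->nr_filed = 0;
	j->iters = iters;
}

static void toy_file_buffer(struct toy_journal *j)
{
	sem_wait(&j->j_list_sem);	/* was: spin_lock(&journal->j_list_lock) */
	j->nr_filed++;
	/* blocking here would be legal now; under a spinlock it was not */
	sem_post(&j->j_list_sem);	/* was: spin_unlock(&journal->j_list_lock) */
}

static void *toy_worker(void *arg)
{
	struct toy_journal *j = arg;
	long i;

	for (i = 0; i < j->iters; i++)
		toy_file_buffer(j);
	return NULL;
}

/* run nthreads workers (capped at 16) and return the final counter */
long toy_run(int nthreads, long iters)
{
	struct toy_journal j;
	pthread_t tid[16];
	int i, started = 0;

	if (nthreads > 16)
		nthreads = 16;
	toy_journal_init(&j, iters);
	for (i = 0; i < nthreads; i++) {
		if (pthread_create(&tid[started], NULL, toy_worker, &j) == 0)
			started++;
	}
	for (i = 0; i < started; i++)
		pthread_join(tid[i], NULL);
	sem_destroy(&j.j_list_sem);
	return j.nr_filed;
}
```

calling toy_run(4, 10000) from a main() should return 40000 if the semaphore serializes the increments (build with -pthread). the sketch covers only the lock/unlock substitution. the patch additionally drops the assert_spin_locked() checks and the cond_resched_lock()/lock_need_resched() break-outs on j_list_lock, presumably because they apply to spinlocks and have no direct counterpart for a sleeping lock.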

Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- linux/fs/jbd/checkpoint.c.orig
+++ linux/fs/jbd/checkpoint.c
@@ -26,7 +26,7 @@
 /*
  * Unlink a buffer from a transaction. 
  *
- * Called with j_list_lock held.
+ * Called with j_list_sem held.
  */
 
 static inline void __buffer_unlink(struct journal_head *jh)
@@ -47,7 +47,7 @@ static inline void __buffer_unlink(struc
 /*
  * Try to release a checkpointed buffer from its transaction.
  * Returns 1 if we released it.
- * Requires j_list_lock
+ * Requires j_list_sem
  * Called under jbd_lock_bh_state(jh2bh(jh)), and drops it
  */
 static int __try_to_free_cp_buf(struct journal_head *jh)
@@ -102,14 +102,14 @@ void __log_wait_for_space(journal_t *jou
 }
 
 /*
- * We were unable to perform jbd_trylock_bh_state() inside j_list_lock.
+ * We were unable to perform jbd_trylock_bh_state() inside j_list_sem.
  * The caller must restart a list walk.  Wait for someone else to run
  * jbd_unlock_bh_state().
  */
 static void jbd_sync_bh(journal_t *journal, struct buffer_head *bh)
 {
 	get_bh(bh);
-	spin_unlock(&journal->j_list_lock);
+	up(&journal->j_list_sem);
 	jbd_lock_bh_state(bh);
 	jbd_unlock_bh_state(bh);
 	put_bh(bh);
@@ -125,7 +125,7 @@ static void jbd_sync_bh(journal_t *journ
  * checkpoint.  (journal_remove_checkpoint() deletes the transaction when
  * the last checkpoint buffer is cleansed)
  *
- * Called with j_list_lock held.
+ * Called with j_list_sem held.
  */
 static int __cleanup_transaction(journal_t *journal, transaction_t *transaction)
 {
@@ -133,7 +133,6 @@ static int __cleanup_transaction(journal
 	struct buffer_head *bh;
 	int ret = 0;
 
-	assert_spin_locked(&journal->j_list_lock);
 	jh = transaction->t_checkpoint_list;
 	if (!jh)
 		return 0;
@@ -145,7 +144,7 @@ static int __cleanup_transaction(journal
 		bh = jh2bh(jh);
 		if (buffer_locked(bh)) {
 			atomic_inc(&bh->b_count);
-			spin_unlock(&journal->j_list_lock);
+			up(&journal->j_list_sem);
 			wait_on_buffer(bh);
 			/* the journal_head may have gone by now */
 			BUFFER_TRACE(bh, "brelse");
@@ -165,7 +164,7 @@ static int __cleanup_transaction(journal
 			transaction_t *t = jh->b_transaction;
 			tid_t tid = t->t_tid;
 
-			spin_unlock(&journal->j_list_lock);
+			up(&journal->j_list_sem);
 			jbd_unlock_bh_state(bh);
 			log_start_commit(journal, tid);
 			log_wait_commit(journal, tid);
@@ -192,7 +191,7 @@ static int __cleanup_transaction(journal
 
 	return ret;
 out_return_1:
-	spin_lock(&journal->j_list_lock);
+	down(&journal->j_list_sem);
 	return 1;
 }
 
@@ -203,9 +202,9 @@ __flush_batch(journal_t *journal, struct
 {
 	int i;
 
-	spin_unlock(&journal->j_list_lock);
+	up(&journal->j_list_sem);
 	ll_rw_block(WRITE, *batch_count, bhs);
-	spin_lock(&journal->j_list_lock);
+	down(&journal->j_list_sem);
 	for (i = 0; i < *batch_count; i++) {
 		struct buffer_head *bh = bhs[i];
 		clear_buffer_jwrite(bh);
@@ -221,7 +220,7 @@ __flush_batch(journal_t *journal, struct
  * Return 1 if something happened which requires us to abort the current
  * scan of the checkpoint list.  
  *
- * Called with j_list_lock held.
+ * Called with j_list_sem held.
  * Called under jbd_lock_bh_state(jh2bh(jh)), and drops it
  */
 static int __flush_buffer(journal_t *journal, struct journal_head *jh,
@@ -306,7 +305,7 @@ int log_do_checkpoint(journal_t *journal
 	 * AKPM: check this code.  I had a feeling a while back that it
 	 * degenerates into a busy loop at unmount time.
 	 */
-	spin_lock(&journal->j_list_lock);
+	down(&journal->j_list_sem);
 	while (journal->j_checkpoint_transactions) {
 		transaction_t *transaction;
 		struct journal_head *jh, *last_jh, *next_jh;
@@ -327,15 +326,11 @@ int log_do_checkpoint(journal_t *journal
 			bh = jh2bh(jh);
 			if (!jbd_trylock_bh_state(bh)) {
 				jbd_sync_bh(journal, bh);
-				spin_lock(&journal->j_list_lock);
+				down(&journal->j_list_sem);
 				retry = 1;
 				break;
 			}
 			retry = __flush_buffer(journal, jh, bhs, &batch_count, &drop_count);
-			if (cond_resched_lock(&journal->j_list_lock)) {
-				retry = 1;
-				break;
-			}
 		} while (jh != last_jh && !retry);
 
 		if (batch_count)
@@ -365,7 +360,7 @@ int log_do_checkpoint(journal_t *journal
 		if (journal->j_checkpoint_transactions != transaction)
 			break;
 	}
-	spin_unlock(&journal->j_list_lock);
+	up(&journal->j_list_sem);
 	result = cleanup_journal_tail(journal);
 	if (result < 0)
 		return result;
@@ -404,7 +399,7 @@ int cleanup_journal_tail(journal_t *jour
 	 * start. */
 
 	down(&journal->j_state_sem);
-	spin_lock(&journal->j_list_lock);
+	down(&journal->j_list_sem);
 	transaction = journal->j_checkpoint_transactions;
 	if (transaction) {
 		first_tid = transaction->t_tid;
@@ -419,7 +414,7 @@ int cleanup_journal_tail(journal_t *jour
 		first_tid = journal->j_transaction_sequence;
 		blocknr = journal->j_head;
 	}
-	spin_unlock(&journal->j_list_lock);
+	up(&journal->j_list_sem);
 	J_ASSERT(blocknr != 0);
 
 	/* If the oldest pinned transaction is at the tail of the log
@@ -459,7 +454,7 @@ int cleanup_journal_tail(journal_t *jour
  * Find all the written-back checkpoint buffers in the journal and release them.
  *
  * Called with the journal locked.
- * Called with j_list_lock held.
+ * Called with j_list_sem held.
  * Returns number of bufers reaped (for debug)
  */
 
@@ -519,7 +514,7 @@ out:
  * checkpoint list.  
  *
  * This function is called with the journal locked.
- * This function is called with j_list_lock held.
+ * This function is called with j_list_sem held.
  */
 
 void __journal_remove_checkpoint(struct journal_head *jh)
@@ -573,7 +568,7 @@ out:
  * the log.
  *
  * Called with the journal locked.
- * Called with j_list_lock held.
+ * Called with j_list_sem held.
  */
 void __journal_insert_checkpoint(struct journal_head *jh, 
 			       transaction_t *transaction)
@@ -602,12 +597,11 @@ void __journal_insert_checkpoint(struct 
  * point.
  *
  * Called with the journal locked.
- * Called with j_list_lock held.
+ * Called with j_list_sem held.
  */
 
 void __journal_drop_transaction(journal_t *journal, transaction_t *transaction)
 {
-	assert_spin_locked(&journal->j_list_lock);
 	if (transaction->t_cpnext) {
 		transaction->t_cpnext->t_cpprev = transaction->t_cpprev;
 		transaction->t_cpprev->t_cpnext = transaction->t_cpnext;
--- linux/fs/jbd/transaction.c.orig
+++ linux/fs/jbd/transaction.c
@@ -485,7 +485,7 @@ void journal_unlock_updates (journal_t *
  * continuing as gracefully as possible.  #
  *
  * The caller should already hold the journal lock and
- * j_list_lock spinlock: most callers will need those anyway
+ * j_list_sem mutex: most callers will need those anyway
  * in order to probe the buffer's journaling state safely.
  */
 static void jbd_unexpected_dirty_buffer(struct journal_head *jh)
@@ -694,9 +694,9 @@ repeat:
 		J_ASSERT_JH(jh, !jh->b_next_transaction);
 		jh->b_transaction = transaction;
 		JBUFFER_TRACE(jh, "file as BJ_Reserved");
-		spin_lock(&journal->j_list_lock);
+		down(&journal->j_list_sem);
 		__journal_file_buffer(jh, transaction, BJ_Reserved);
-		spin_unlock(&journal->j_list_lock);
+		up(&journal->j_list_sem);
 	}
 
 done:
@@ -796,7 +796,7 @@ int journal_get_create_access(handle_t *
 	 * reused here.
 	 */
 	jbd_lock_bh_state(bh);
-	spin_lock(&journal->j_list_lock);
+	down(&journal->j_list_sem);
 	J_ASSERT_JH(jh, (jh->b_transaction == transaction ||
 		jh->b_transaction == NULL ||
 		(jh->b_transaction == journal->j_committing_transaction &&
@@ -813,7 +813,7 @@ int journal_get_create_access(handle_t *
 		JBUFFER_TRACE(jh, "set next transaction");
 		jh->b_next_transaction = transaction;
 	}
-	spin_unlock(&journal->j_list_lock);
+	up(&journal->j_list_sem);
 	jbd_unlock_bh_state(bh);
 
 	/*
@@ -962,7 +962,7 @@ int journal_dirty_data(handle_t *handle,
 	 * about it in this layer.
 	 */
 	jbd_lock_bh_state(bh);
-	spin_lock(&journal->j_list_lock);
+	down(&journal->j_list_sem);
 	if (jh->b_transaction) {
 		JBUFFER_TRACE(jh, "has transaction");
 		if (jh->b_transaction != handle->h_transaction) {
@@ -1018,12 +1018,12 @@ int journal_dirty_data(handle_t *handle,
 			 */
 			if (buffer_dirty(bh)) {
 				get_bh(bh);
-				spin_unlock(&journal->j_list_lock);
+				up(&journal->j_list_sem);
 				jbd_unlock_bh_state(bh);
 				need_brelse = 1;
 				sync_dirty_buffer(bh);
 				jbd_lock_bh_state(bh);
-				spin_lock(&journal->j_list_lock);
+				down(&journal->j_list_sem);
 				/* The buffer may become locked again at any
 				   time if it is redirtied */
 			}
@@ -1055,7 +1055,7 @@ int journal_dirty_data(handle_t *handle,
 		__journal_file_buffer(jh, handle->h_transaction, BJ_SyncData);
 	}
 no_journal:
-	spin_unlock(&journal->j_list_lock);
+	up(&journal->j_list_sem);
 	jbd_unlock_bh_state(bh);
 	if (need_brelse) {
 		BUFFER_TRACE(bh, "brelse");
@@ -1145,9 +1145,9 @@ int journal_dirty_metadata(handle_t *han
 	J_ASSERT_JH(jh, jh->b_frozen_data == 0);
 
 	JBUFFER_TRACE(jh, "file as BJ_Metadata");
-	spin_lock(&journal->j_list_lock);
+	down(&journal->j_list_sem);
 	__journal_file_buffer(jh, handle->h_transaction, BJ_Metadata);
-	spin_unlock(&journal->j_list_lock);
+	up(&journal->j_list_sem);
 out_unlock_bh:
 	jbd_unlock_bh_state(bh);
 out:
@@ -1194,7 +1194,7 @@ int journal_forget (handle_t *handle, st
 	BUFFER_TRACE(bh, "entry");
 
 	jbd_lock_bh_state(bh);
-	spin_lock(&journal->j_list_lock);
+	down(&journal->j_list_sem);
 
 	if (!buffer_jbd(bh))
 		goto not_jbd;
@@ -1246,7 +1246,7 @@ int journal_forget (handle_t *handle, st
 			journal_remove_journal_head(bh);
 			__brelse(bh);
 			if (!buffer_jbd(bh)) {
-				spin_unlock(&journal->j_list_lock);
+				up(&journal->j_list_sem);
 				jbd_unlock_bh_state(bh);
 				__bforget(bh);
 				goto drop;
@@ -1269,7 +1269,7 @@ int journal_forget (handle_t *handle, st
 	}
 
 not_jbd:
-	spin_unlock(&journal->j_list_lock);
+	up(&journal->j_list_sem);
 	jbd_unlock_bh_state(bh);
 	__brelse(bh);
 drop:
@@ -1416,7 +1416,7 @@ int journal_force_commit(journal_t *jour
  * Append a buffer to a transaction list, given the transaction's list head
  * pointer.
  *
- * j_list_lock is held.
+ * j_list_sem is held.
  *
  * jbd_lock_bh_state(jh2bh(jh)) is held.
  */
@@ -1440,7 +1440,7 @@ __blist_add_buffer(struct journal_head *
  * Remove a buffer from a transaction list, given the transaction's list
  * head pointer.
  *
- * Called with j_list_lock held, and the journal may not be locked.
+ * Called with j_list_sem held, and the journal may not be locked.
  *
  * jbd_lock_bh_state(jh2bh(jh)) is held.
  */
@@ -1466,7 +1466,7 @@ __blist_del_buffer(struct journal_head *
  * is holding onto a copy of one of thee pointers, it could go bad.
  * Generally the caller needs to re-read the pointer from the transaction_t.
  *
- * Called under j_list_lock.  The journal may not be locked.
+ * Called under j_list_sem.  The journal may not be locked.
  */
 void __journal_unfile_buffer(struct journal_head *jh)
 {
@@ -1476,8 +1476,6 @@ void __journal_unfile_buffer(struct jour
 
 	J_ASSERT_JH(jh, jbd_is_locked_bh_state(bh));
 	transaction = jh->b_transaction;
-	if (transaction)
-		assert_spin_locked(&transaction->t_journal->j_list_lock);
 
 	J_ASSERT_JH(jh, jh->b_jlist < BJ_Types);
 	if (jh->b_jlist != BJ_None)
@@ -1525,9 +1523,9 @@ out:
 void journal_unfile_buffer(journal_t *journal, struct journal_head *jh)
 {
 	jbd_lock_bh_state(jh2bh(jh));
-	spin_lock(&journal->j_list_lock);
+	down(&journal->j_list_sem);
 	__journal_unfile_buffer(jh);
-	spin_unlock(&journal->j_list_lock);
+	up(&journal->j_list_sem);
 	jbd_unlock_bh_state(jh2bh(jh));
 }
 
@@ -1549,7 +1547,7 @@ __journal_try_to_free_buffer(journal_t *
 	if (jh->b_next_transaction != 0)
 		goto out;
 
-	spin_lock(&journal->j_list_lock);
+	down(&journal->j_list_sem);
 	if (jh->b_transaction != 0 && jh->b_cp_transaction == 0) {
 		if (jh->b_jlist == BJ_SyncData || jh->b_jlist == BJ_Locked) {
 			/* A written-back ordered data buffer */
@@ -1567,7 +1565,7 @@ __journal_try_to_free_buffer(journal_t *
 			__brelse(bh);
 		}
 	}
-	spin_unlock(&journal->j_list_lock);
+	up(&journal->j_list_sem);
 out:
 	return;
 }
@@ -1650,7 +1648,7 @@ busy:
  * release it.
  * Returns non-zero if JBD no longer has an interest in the buffer.
  *
- * Called under j_list_lock.
+ * Called under j_list_sem.
  *
  * Called under jbd_lock_bh_state(bh).
  */
@@ -1731,7 +1729,7 @@ static int journal_unmap_buffer(journal_
 	BUFFER_TRACE(bh, "entry");
 
 	/*
-	 * It is safe to proceed here without the j_list_lock because the
+	 * It is safe to proceed here without the j_list_sem because the
 	 * buffers cannot be stolen by try_to_free_buffers as long as we are
 	 * holding the page lock. --sct
 	 */
@@ -1741,7 +1739,7 @@ static int journal_unmap_buffer(journal_
 
 	down(&journal->j_state_sem);
 	jbd_lock_bh_state(bh);
-	spin_lock(&journal->j_list_lock);
+	down(&journal->j_list_sem);
 
 	jh = journal_grab_journal_head(bh);
 	if (!jh)
@@ -1774,7 +1772,7 @@ static int journal_unmap_buffer(journal_
 			JBUFFER_TRACE(jh, "checkpointed: add to BJ_Forget");
 			ret = __dispose_buffer(jh,
 					journal->j_running_transaction);
-			spin_unlock(&journal->j_list_lock);
+			up(&journal->j_list_sem);
 			jbd_unlock_bh_state(bh);
 			up(&journal->j_state_sem);
 			journal_put_journal_head(jh);
@@ -1788,7 +1786,7 @@ static int journal_unmap_buffer(journal_
 				JBUFFER_TRACE(jh, "give to committing trans");
 				ret = __dispose_buffer(jh,
 					journal->j_committing_transaction);
-				spin_unlock(&journal->j_list_lock);
+				up(&journal->j_list_sem);
 				jbd_unlock_bh_state(bh);
 				up(&journal->j_state_sem);
 				journal_put_journal_head(jh);
@@ -1812,7 +1810,7 @@ static int journal_unmap_buffer(journal_
 					journal->j_running_transaction);
 			jh->b_next_transaction = NULL;
 		}
-		spin_unlock(&journal->j_list_lock);
+		up(&journal->j_list_sem);
 		jbd_unlock_bh_state(bh);
 		up(&journal->j_state_sem);
 		journal_put_journal_head(jh);
@@ -1831,7 +1829,7 @@ static int journal_unmap_buffer(journal_
 zap_buffer:
 	journal_put_journal_head(jh);
 zap_buffer_no_jh:
-	spin_unlock(&journal->j_list_lock);
+	up(&journal->j_list_sem);
 	jbd_unlock_bh_state(bh);
 	up(&journal->j_state_sem);
 zap_buffer_unlocked:
@@ -1907,8 +1905,6 @@ void __journal_file_buffer(struct journa
 	struct buffer_head *bh = jh2bh(jh);
 
 	J_ASSERT_JH(jh, jbd_is_locked_bh_state(bh));
-	assert_spin_locked(&transaction->t_journal->j_list_lock);
-
 	J_ASSERT_JH(jh, jh->b_jlist < BJ_Types);
 	J_ASSERT_JH(jh, jh->b_transaction == transaction ||
 				jh->b_transaction == 0);
@@ -1974,9 +1970,9 @@ void journal_file_buffer(struct journal_
 				transaction_t *transaction, int jlist)
 {
 	jbd_lock_bh_state(jh2bh(jh));
-	spin_lock(&transaction->t_journal->j_list_lock);
+	down(&transaction->t_journal->j_list_sem);
 	__journal_file_buffer(jh, transaction, jlist);
-	spin_unlock(&transaction->t_journal->j_list_lock);
+	up(&transaction->t_journal->j_list_sem);
 	jbd_unlock_bh_state(jh2bh(jh));
 }
 
@@ -1986,7 +1982,7 @@ void journal_file_buffer(struct journal_
  * already started to be used by a subsequent transaction, refile the
  * buffer on that transaction's metadata list.
  *
- * Called under journal->j_list_lock
+ * Called under journal->j_list_sem
  *
  * Called under jbd_lock_bh_state(jh2bh(jh))
  */
@@ -1996,8 +1992,6 @@ void __journal_refile_buffer(struct jour
 	struct buffer_head *bh = jh2bh(jh);
 
 	J_ASSERT_JH(jh, jbd_is_locked_bh_state(bh));
-	if (jh->b_transaction)
-		assert_spin_locked(&jh->b_transaction->t_journal->j_list_lock);
 
 	/* If the buffer is now unused, just drop it. */
 	if (jh->b_next_transaction == NULL) {
@@ -2040,12 +2034,12 @@ void journal_refile_buffer(journal_t *jo
 	struct buffer_head *bh = jh2bh(jh);
 
 	jbd_lock_bh_state(bh);
-	spin_lock(&journal->j_list_lock);
+	down(&journal->j_list_sem);
 
 	__journal_refile_buffer(jh);
 	jbd_unlock_bh_state(bh);
 	journal_remove_journal_head(bh);
 
-	spin_unlock(&journal->j_list_lock);
+	up(&journal->j_list_sem);
 	__brelse(bh);
 }
--- linux/fs/jbd/commit.c.orig
+++ linux/fs/jbd/commit.c
@@ -79,14 +79,14 @@ nope:
 }
 
 /*
- * Try to acquire jbd_lock_bh_state() against the buffer, when j_list_lock is
+ * Try to acquire jbd_lock_bh_state() against the buffer, when j_list_sem is
  * held.  For ranking reasons we must trylock.  If we lose, schedule away and
- * return 0.  j_list_lock is dropped in this case.
+ * return 0.  j_list_sem is dropped in this case.
  */
 static int inverted_lock(journal_t *journal, struct buffer_head *bh)
 {
 	if (!jbd_trylock_bh_state(bh)) {
-		spin_unlock(&journal->j_list_lock);
+		up(&journal->j_list_sem);
 		schedule();
 		return 0;
 	}
@@ -189,9 +189,9 @@ void journal_commit_transaction(journal_
 	 */
 
 #ifdef COMMIT_STATS
-	spin_lock(&journal->j_list_lock);
+	down(&journal->j_list_sem);
 	summarise_journal_usage(journal);
-	spin_unlock(&journal->j_list_lock);
+	up(&journal->j_list_sem);
 #endif
 
 	/* Do we need to erase the effects of a prior journal_flush? */
@@ -275,9 +275,9 @@ void journal_commit_transaction(journal_
 	 * checkpoint lists.  We do this *before* commit because it potentially
 	 * frees some memory
 	 */
-	spin_lock(&journal->j_list_lock);
+	down(&journal->j_list_sem);
 	__journal_clean_checkpoint_list(journal);
-	spin_unlock(&journal->j_list_lock);
+	up(&journal->j_list_sem);
 
 	jbd_debug (3, "JBD: commit phase 1\n");
 
@@ -299,7 +299,7 @@ void journal_commit_transaction(journal_
 	 * First, drop modified flag: all accesses to the buffers
 	 * will be tracked for a new trasaction only -bzzz
 	 */
-	spin_lock(&journal->j_list_lock);
+	down(&journal->j_list_sem);
 	if (commit_transaction->t_buffers) {
 		new_jh = jh = commit_transaction->t_buffers->b_tnext;
 		do {
@@ -309,7 +309,7 @@ void journal_commit_transaction(journal_
 			new_jh = new_jh->b_tnext;
 		} while (new_jh != jh);
 	}
-	spin_unlock(&journal->j_list_lock);
+	up(&journal->j_list_sem);
 
 	/*
 	 * Now start flushing things to disk, in the order they appear
@@ -329,7 +329,7 @@ void journal_commit_transaction(journal_
 	 */
 write_out_data:
 	cond_resched();
-	spin_lock(&journal->j_list_lock);
+	down(&journal->j_list_sem);
 
 	while (commit_transaction->t_sync_datalist) {
 		struct buffer_head *bh;
@@ -345,10 +345,6 @@ write_out_data:
 			__journal_file_buffer(jh, commit_transaction,
 						BJ_Locked);
 			jbd_unlock_bh_state(bh);
-			if (lock_need_resched(&journal->j_list_lock)) {
-				spin_unlock(&journal->j_list_lock);
-				goto write_out_data;
-			}
 		} else {
 			if (buffer_dirty(bh)) {
 				BUFFER_TRACE(bh, "start journal writeout");
@@ -357,7 +353,7 @@ write_out_data:
 				if (bufs == journal->j_wbufsize) {
 					jbd_debug(2, "submit %d writes\n",
 							bufs);
-					spin_unlock(&journal->j_list_lock);
+					up(&journal->j_list_sem);
 					ll_rw_block(WRITE, bufs, wbuf);
 					journal_brelse_array(wbuf, bufs);
 					bufs = 0;
@@ -371,19 +367,15 @@ write_out_data:
 				jbd_unlock_bh_state(bh);
 				journal_remove_journal_head(bh);
 				put_bh(bh);
-				if (lock_need_resched(&journal->j_list_lock)) {
-					spin_unlock(&journal->j_list_lock);
-					goto write_out_data;
-				}
 			}
 		}
 	}
 
 	if (bufs) {
-		spin_unlock(&journal->j_list_lock);
+		up(&journal->j_list_sem);
 		ll_rw_block(WRITE, bufs, wbuf);
 		journal_brelse_array(wbuf, bufs);
-		spin_lock(&journal->j_list_lock);
+		down(&journal->j_list_sem);
 	}
 
 	/*
@@ -396,15 +388,15 @@ write_out_data:
 		bh = jh2bh(jh);
 		get_bh(bh);
 		if (buffer_locked(bh)) {
-			spin_unlock(&journal->j_list_lock);
+			up(&journal->j_list_sem);
 			wait_on_buffer(bh);
 			if (unlikely(!buffer_uptodate(bh)))
 				err = -EIO;
-			spin_lock(&journal->j_list_lock);
+			down(&journal->j_list_sem);
 		}
 		if (!inverted_lock(journal, bh)) {
 			put_bh(bh);
-			spin_lock(&journal->j_list_lock);
+			down(&journal->j_list_sem);
 			continue;
 		}
 		if (buffer_jbd(bh) && jh->b_jlist == BJ_Locked) {
@@ -416,9 +408,8 @@ write_out_data:
 			jbd_unlock_bh_state(bh);
 		}
 		put_bh(bh);
-		cond_resched_lock(&journal->j_list_lock);
 	}
-	spin_unlock(&journal->j_list_lock);
+	up(&journal->j_list_sem);
 
 	if (err)
 		__journal_abort_hard(journal);
@@ -614,7 +605,7 @@ start_journal_io:
 	jbd_debug(3, "JBD: commit phase 4\n");
 
 	/*
-	 * akpm: these are BJ_IO, and j_list_lock is not needed.
+	 * akpm: these are BJ_IO, and j_list_sem is not needed.
 	 * See __journal_try_to_free_buffer.
 	 */
 wait_for_iobuf:
@@ -752,7 +743,7 @@ restart_loop:
 			jh->b_frozen_data = NULL;
 		}
 
-		spin_lock(&journal->j_list_lock);
+		down(&journal->j_list_sem);
 		cp_transaction = jh->b_cp_transaction;
 		if (cp_transaction) {
 			JBUFFER_TRACE(jh, "remove from old cp transaction");
@@ -792,7 +783,7 @@ restart_loop:
 			journal_remove_journal_head(bh);  /* needs a brelse */
 			release_buffer_page(bh);
 		}
-		spin_unlock(&journal->j_list_lock);
+		up(&journal->j_list_sem);
 		if (cond_resched())
 			goto restart_loop;
 	}
@@ -804,13 +795,13 @@ restart_loop:
 	J_ASSERT(commit_transaction->t_state == T_COMMIT);
 
 	/*
-	 * This is a bit sleazy.  We borrow j_list_lock to protect
+	 * This is a bit sleazy.  We borrow j_list_sem to protect
 	 * journal->j_committing_transaction in __journal_remove_checkpoint.
 	 * Really, __jornal_remove_checkpoint should be using j_state_sem but
 	 * it's a bit hassle to hold that across __journal_remove_checkpoint
 	 */
 	down(&journal->j_state_sem);
-	spin_lock(&journal->j_list_lock);
+	down(&journal->j_list_sem);
 	commit_transaction->t_state = T_FINISHED;
 	J_ASSERT(commit_transaction == journal->j_committing_transaction);
 	journal->j_commit_sequence = commit_transaction->t_tid;
@@ -835,7 +826,7 @@ restart_loop:
 				commit_transaction;
 		}
 	}
-	spin_unlock(&journal->j_list_lock);
+	up(&journal->j_list_sem);
 
 	jbd_debug(1, "JBD: commit %d complete, head %d\n",
 		  journal->j_commit_sequence, journal->j_tail_sequence);
--- linux/fs/jbd/journal.c.orig
+++ linux/fs/jbd/journal.c
@@ -672,7 +672,7 @@ static journal_t * journal_init_common (
 	init_MUTEX(&journal->j_barrier);
 	init_MUTEX(&journal->j_checkpoint_sem);
 	spin_lock_init(&journal->j_revoke_lock);
-	spin_lock_init(&journal->j_list_lock);
+	init_MUTEX(&journal->j_list_sem);
 	init_MUTEX(&journal->j_state_sem);
 
 	journal->j_commit_interval = (HZ * JBD_DEFAULT_MAX_COMMIT_AGE);
@@ -1139,17 +1139,17 @@ void journal_destroy(journal_t *journal)
 	/* Force any old transactions to disk */
 
 	/* Totally anal locking here... */
-	spin_lock(&journal->j_list_lock);
+	down(&journal->j_list_sem);
 	while (journal->j_checkpoint_transactions != NULL) {
-		spin_unlock(&journal->j_list_lock);
+		up(&journal->j_list_sem);
 		log_do_checkpoint(journal);
-		spin_lock(&journal->j_list_lock);
+		down(&journal->j_list_sem);
 	}
 
 	J_ASSERT(journal->j_running_transaction == NULL);
 	J_ASSERT(journal->j_committing_transaction == NULL);
 	J_ASSERT(journal->j_checkpoint_transactions == NULL);
-	spin_unlock(&journal->j_list_lock);
+	up(&journal->j_list_sem);
 
 	/* We can now mark the journal as empty. */
 	journal->j_tail = 0;
@@ -1361,13 +1361,13 @@ int journal_flush(journal_t *journal)
 	}
 
 	/* ...and flush everything in the log out to disk. */
-	spin_lock(&journal->j_list_lock);
+	down(&journal->j_list_sem);
 	while (!err && journal->j_checkpoint_transactions != NULL) {
-		spin_unlock(&journal->j_list_lock);
+		up(&journal->j_list_sem);
 		err = log_do_checkpoint(journal);
-		spin_lock(&journal->j_list_lock);
+		down(&journal->j_list_sem);
 	}
-	spin_unlock(&journal->j_list_lock);
+	up(&journal->j_list_sem);
 	cleanup_journal_tail(journal);
 
 	/* Finally, mark the journal as really needing no recovery.
--- linux/include/linux/jbd.h.orig
+++ linux/include/linux/jbd.h
@@ -413,20 +413,20 @@ struct handle_s 
 /*
  * Lock ranking:
  *
- *    j_list_lock
+ *    j_list_sem
  *      ->jbd_lock_bh_journal_head()	(This is "innermost")
  *
  *    j_state_sem
  *    ->jbd_lock_bh_state()
  *
  *    jbd_lock_bh_state()
- *    ->j_list_lock
+ *    ->j_list_sem
  *
  *    j_state_sem
  *    ->t_handle_lock
  *
  *    j_state_sem
- *    ->j_list_lock			(journal_unmap_buffer)
+ *    ->j_list_sem			(journal_unmap_buffer)
  *
  */
 
@@ -458,62 +458,62 @@ struct transaction_s 
 	 */
 	unsigned long		t_log_start;
 
-	/* Number of buffers on the t_buffers list [j_list_lock] */
+	/* Number of buffers on the t_buffers list [j_list_sem] */
 	int			t_nr_buffers;
 
 	/*
 	 * Doubly-linked circular list of all buffers reserved but not yet
-	 * modified by this transaction [j_list_lock]
+	 * modified by this transaction [j_list_sem]
 	 */
 	struct journal_head	*t_reserved_list;
 
 	/*
 	 * Doubly-linked circular list of all buffers under writeout during
-	 * commit [j_list_lock]
+	 * commit [j_list_sem]
 	 */
 	struct journal_head	*t_locked_list;
 
 	/*
 	 * Doubly-linked circular list of all metadata buffers owned by this
-	 * transaction [j_list_lock]
+	 * transaction [j_list_sem]
 	 */
 	struct journal_head	*t_buffers;
 
 	/*
 	 * Doubly-linked circular list of all data buffers still to be
-	 * flushed before this transaction can be committed [j_list_lock]
+	 * flushed before this transaction can be committed [j_list_sem]
 	 */
 	struct journal_head	*t_sync_datalist;
 
 	/*
 	 * Doubly-linked circular list of all forget buffers (superseded
 	 * buffers which we can un-checkpoint once this transaction commits)
-	 * [j_list_lock]
+	 * [j_list_sem]
 	 */
 	struct journal_head	*t_forget;
 
 	/*
 	 * Doubly-linked circular list of all buffers still to be flushed before
-	 * this transaction can be checkpointed. [j_list_lock]
+	 * this transaction can be checkpointed. [j_list_sem]
 	 */
 	struct journal_head	*t_checkpoint_list;
 
 	/*
 	 * Doubly-linked circular list of temporary buffers currently undergoing
-	 * IO in the log [j_list_lock]
+	 * IO in the log [j_list_sem]
 	 */
 	struct journal_head	*t_iobuf_list;
 
 	/*
 	 * Doubly-linked circular list of metadata buffers being shadowed by log
 	 * IO.  The IO buffers on the iobuf list and the shadow buffers on this
-	 * list match each other one for one at all times. [j_list_lock]
+	 * list match each other one for one at all times. [j_list_sem]
 	 */
 	struct journal_head	*t_shadow_list;
 
 	/*
 	 * Doubly-linked circular list of control buffers being written to the
-	 * log. [j_list_lock]
+	 * log. [j_list_sem]
 	 */
 	struct journal_head	*t_log_list;
 
@@ -536,7 +536,7 @@ struct transaction_s 
 
 	/*
 	 * Forward and backward links for the circular list of all transactions
-	 * awaiting checkpoint. [j_list_lock]
+	 * awaiting checkpoint. [j_list_sem]
 	 */
 	transaction_t		*t_cpnext, *t_cpprev;
 
@@ -590,7 +590,7 @@ struct transaction_s 
  * @j_fs_dev: Device which holds the client fs.  For internal journal this will
  *     be equal to j_dev
  * @j_maxlen: Total maximum capacity of the journal region on disk.
- * @j_list_lock: Protects the buffer lists and internal buffer state.
+ * @j_list_sem: Protects the buffer lists and internal buffer state.
  * @j_inode: Optional inode where we store the journal.  If present, all journal
  *     block numbers are mapped into this inode via bmap().
  * @j_tail_sequence:  Sequence number of the oldest transaction in the log 
@@ -658,7 +658,7 @@ struct journal_s
 
 	/*
 	 * ... and a linked circular list of all transactions waiting for
-	 * checkpointing. [j_list_lock]
+	 * checkpointing. [j_list_sem]
 	 */
 	transaction_t		*j_checkpoint_transactions;
 
@@ -731,7 +731,7 @@ struct journal_s
 	/*
 	 * Protects the buffer lists and internal buffer state.
 	 */
-	spinlock_t		j_list_lock;
+	struct semaphore	j_list_sem;
 
 	/* Optional inode where we store the journal.  If present, all */
 	/* journal block numbers are mapped into this inode via */
--- linux/include/linux/journal-head.h.orig
+++ linux/include/linux/journal-head.h
@@ -56,7 +56,7 @@ struct journal_head {
 	 * metadata: either the running transaction or the committing
 	 * transaction (if there is one).  Only applies to buffers on a
 	 * transaction's data or metadata journaling list.
-	 * [j_list_lock] [jbd_lock_bh_state()]
+	 * [j_list_sem] [jbd_lock_bh_state()]
 	 */
 	transaction_t *b_transaction;
 
@@ -77,14 +77,14 @@ struct journal_head {
 	/*
 	 * Pointer to the compound transaction against which this buffer
 	 * is checkpointed.  Only dirty buffers can be checkpointed.
-	 * [j_list_lock]
+	 * [j_list_sem]
 	 */
 	transaction_t *b_cp_transaction;
 
 	/*
 	 * Doubly-linked list of buffers still remaining to be flushed
 	 * before an old transaction can be checkpointed.
-	 * [j_list_lock]
+	 * [j_list_sem]
 	 */
 	struct journal_head *b_cpnext, *b_cpprev;
 };

^ permalink raw reply	[flat|nested] 125+ messages in thread

* [patch 3/3] remove bitlocks
  2005-03-16  9:53                                                     ` [patch 2/3] j_list_lock -> j_list_sem Ingo Molnar
@ 2005-03-16  9:57                                                       ` Ingo Molnar
  0 siblings, 0 replies; 125+ messages in thread
From: Ingo Molnar @ 2005-03-16  9:57 UTC (permalink / raw)
  To: Andrew Morton; +Cc: rostedt, rlrevell, linux-kernel


this patch is a port of Steven Rostedt's bitlock-removal patch to
BK-curr. It changes the jbd code used by ext3 to take the BH_State and
BH_JournalHead bits via wait_on_bit_lock(), with jbd_lock_bh_sleep() as
the sleep action, instead of the bit_spin_*() primitives.

Builds/boots/works fine on x86.
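
To make the behavioural change concrete, here is a minimal userspace
sketch in C11 of a lock kept in one bit of a state word. It is not the
kernel code: state_word_t and the bit_*() helpers are hypothetical
names, and sched_yield() merely stands in for the real sleep that
wait_on_bit_lock() performs through its action callback.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <sched.h>

/* Hypothetical userspace stand-in for a word such as bh->b_state. */
typedef _Atomic unsigned long state_word_t;

/* Try once to set the lock bit; returns true if we acquired it. */
static bool bit_trylock(state_word_t *word, unsigned bit)
{
	unsigned long mask = 1UL << bit;

	/* fetch_or returns the old value: bit already set means "held". */
	return (atomic_fetch_or_explicit(word, mask,
					 memory_order_acquire) & mask) == 0;
}

/* Return true if the lock bit is currently set. */
static bool bit_is_locked(state_word_t *word, unsigned bit)
{
	return (atomic_load_explicit(word, memory_order_relaxed) &
		(1UL << bit)) != 0;
}

/*
 * Acquire the bit, giving up the CPU between attempts instead of
 * busy-waiting. Assumption: sched_yield() approximates "go to sleep";
 * the kernel version blocks until wake_up_bit() is called.
 */
static void bit_lock_yielding(state_word_t *word, unsigned bit)
{
	while (!bit_trylock(word, bit))
		sched_yield();
}

/* Clear the lock bit with release ordering. */
static void bit_unlock(state_word_t *word, unsigned bit)
{
	atomic_fetch_and_explicit(word, ~(1UL << bit), memory_order_release);
}
```

Each bit of the word is an independent lock, which is why BH_State and
BH_JournalHead can share b_state. The waiter here never spins with
preemption disabled, which is the property the patch is after.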

From: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- linux/fs/jbd/journal.c.orig
+++ linux/fs/jbd/journal.c
@@ -82,6 +82,17 @@ EXPORT_SYMBOL(journal_force_commit);
 
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
 
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+/*
+ * Used in the locking of the bh_state and bh_journalhead bit locks.
+ */
+int jbd_lock_bh_sleep(void *notused)
+{
+	schedule();
+	return 0;
+}
+#endif
+
 /*
  * Helper function used to manage commit timeouts
  */
--- linux/include/linux/jbd.h.orig
+++ linux/include/linux/jbd.h
@@ -65,7 +65,6 @@ extern int journal_enable_debug;
 		}							\
 	} while (0)
 #else
-#define jbd_debug(f, a...)	/**/
 #endif
 
 extern void * __jbd_kmalloc (const char *where, size_t size, int flags, int retry);
@@ -324,34 +323,63 @@ static inline struct journal_head *bh2jh
 	return bh->b_private;
 }
 
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+int jbd_lock_bh_sleep(void *notused);
+#endif
+
 static inline void jbd_lock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	wait_on_bit_lock(&bh->b_state,BH_State,&jbd_lock_bh_sleep,TASK_UNINTERRUPTIBLE);
+#endif
+	__acquire(bitlock);
 }
 
 static inline int jbd_trylock_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_trylock(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	if (test_and_set_bit(BH_State, &bh->b_state))
+		return 0;
+#endif
+	__acquire(bitlock);
+	return 1;
 }
 
 static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_is_locked(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	return test_bit(BH_State, &bh->b_state);
+#else
+	return 1;
+#endif
 }
 
 static inline void jbd_unlock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	clear_bit(BH_State, &bh->b_state);
+	smp_mb__after_clear_bit();
+	wake_up_bit(&bh->b_state, BH_State);
+#endif
+	__release(bitlock);
 }
 
 static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_JournalHead, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	wait_on_bit_lock(&bh->b_state,BH_JournalHead,&jbd_lock_bh_sleep,TASK_UNINTERRUPTIBLE);
+#endif
+	__acquire(bitlock);
 }
 
 static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_JournalHead, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	clear_bit(BH_JournalHead, &bh->b_state);
+	smp_mb__after_clear_bit();
+	wake_up_bit(&bh->b_state, BH_JournalHead);
+#endif
+	__release(bitlock);
 }
 
 struct jbd_revoke_table_s;
--- linux/include/linux/spinlock.h.orig
+++ linux/include/linux/spinlock.h
@@ -522,78 +522,6 @@ extern int _atomic_dec_and_lock(atomic_t
 
 #define atomic_dec_and_lock(atomic,lock) __cond_lock(_atomic_dec_and_lock(atomic,lock))
 
-/*
- *  bit-based spin_lock()
- *
- * Don't use this unless you really need to: spin_lock() and spin_unlock()
- * are significantly faster.
- */
-static inline void bit_spin_lock(int bitnum, unsigned long *addr)
-{
-	/*
-	 * Assuming the lock is uncontended, this never enters
-	 * the body of the outer loop. If it is contended, then
-	 * within the inner loop a non-atomic test is used to
-	 * busywait with less bus contention for a good time to
-	 * attempt to acquire the lock bit.
-	 */
-	preempt_disable();
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
-	while (test_and_set_bit(bitnum, addr)) {
-		while (test_bit(bitnum, addr)) {
-			preempt_enable();
-			cpu_relax();
-			preempt_disable();
-		}
-	}
-#endif
-	__acquire(bitlock);
-}
-
-/*
- * Return true if it was acquired
- */
-static inline int bit_spin_trylock(int bitnum, unsigned long *addr)
-{
-	preempt_disable();	
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
-	if (test_and_set_bit(bitnum, addr)) {
-		preempt_enable();
-		return 0;
-	}
-#endif
-	__acquire(bitlock);
-	return 1;
-}
-
-/*
- *  bit-based spin_unlock()
- */
-static inline void bit_spin_unlock(int bitnum, unsigned long *addr)
-{
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
-	BUG_ON(!test_bit(bitnum, addr));
-	smp_mb__before_clear_bit();
-	clear_bit(bitnum, addr);
-#endif
-	preempt_enable();
-	__release(bitlock);
-}
-
-/*
- * Return true if the lock is held.
- */
-static inline int bit_spin_is_locked(int bitnum, unsigned long *addr)
-{
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
-	return test_bit(bitnum, addr);
-#elif defined CONFIG_PREEMPT
-	return preempt_count();
-#else
-	return 1;
-#endif
-}
-
 #define DEFINE_SPINLOCK(x) spinlock_t x = SPIN_LOCK_UNLOCKED
 #define DEFINE_RWLOCK(x) rwlock_t x = RW_LOCK_UNLOCKED
 

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
  2005-03-16  9:51                                                 ` [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks Ingo Molnar
  2005-03-16  9:53                                                   ` [patch 1/3] j_state_lock -> j_state_sem Ingo Molnar
@ 2005-03-16 10:04                                                   ` Andrew Morton
  2005-03-16 10:12                                                     ` Ingo Molnar
  2005-03-16 10:19                                                     ` Ingo Molnar
  1 sibling, 2 replies; 125+ messages in thread
From: Andrew Morton @ 2005-03-16 10:04 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: rostedt, rlrevell, linux-kernel

Ingo Molnar <mingo@elte.hu> wrote:
>
>  > There's a little lock ranking diagram in jbd.h which tells us that
>  > these locks nest inside j_list_lock and j_state_lock.  So I guess
>  > you'll need to turn those into semaphores.
> 
>  indeed. I did this (see the three followup patches, against BK-curr),
>  and it builds/boots/works just fine on an ext3 box. Do we want to try
>  this in -mm?

ooh, I'd rather not.  I spent an intense three days removing all the
sleeping locks from ext3 (and three months debugging the result).  Ended up
gaining 1000% on 16-way.

Putting them back in will really hurt the SMP performance.

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
  2005-03-16 10:04                                                   ` [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks Andrew Morton
@ 2005-03-16 10:12                                                     ` Ingo Molnar
  2005-03-16 10:23                                                       ` Steven Rostedt
  2005-03-16 10:26                                                       ` Andrew Morton
  2005-03-16 10:19                                                     ` Ingo Molnar
  1 sibling, 2 replies; 125+ messages in thread
From: Ingo Molnar @ 2005-03-16 10:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: rostedt, rlrevell, linux-kernel


* Andrew Morton <akpm@osdl.org> wrote:

> Ingo Molnar <mingo@elte.hu> wrote:
> >
> >  > There's a little lock ranking diagram in jbd.h which tells us that
> >  > these locks nest inside j_list_lock and j_state_lock.  So I guess
> >  > you'll need to turn those into semaphores.
> > 
> >  indeed. I did this (see the three followup patches, against BK-curr),
> >  and it builds/boots/works just fine on an ext3 box. Do we want to try
> >  this in -mm?
> 
> ooh, I'd rather not.  I spent an intense three days removing all the
> sleeping locks from ext3 (and three months debugging the result). 
> Ended up gaining 1000% on 16-way.
> 
> Putting them back in will really hurt the SMP performance.

ah. Yeah. Sniff.

if we gain 1000% on a 16-way then there's something really wrong about
semaphores (or scheduling) though. A semaphore is almost a spinlock, in
the uncontended case - and even under contention we really (should) just
spend the cycles that we'd spend spinning. There will be some
intermediate contention level where semaphores hurt, but 1000% sounds
truly excessive.

	Ingo

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
  2005-03-16 10:04                                                   ` [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks Andrew Morton
  2005-03-16 10:12                                                     ` Ingo Molnar
@ 2005-03-16 10:19                                                     ` Ingo Molnar
  2005-03-16 10:40                                                       ` Andrew Morton
  1 sibling, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-03-16 10:19 UTC (permalink / raw)
  To: Andrew Morton; +Cc: rostedt, rlrevell, linux-kernel


* Andrew Morton <akpm@osdl.org> wrote:

> >  > There's a little lock ranking diagram in jbd.h which tells us that
> >  > these locks nest inside j_list_lock and j_state_lock.  So I guess
> >  > you'll need to turn those into semaphores.
> > 
> >  indeed. I did this (see the three followup patches, against BK-curr),
> >  and it builds/boots/works just fine on an ext3 box. Do we want to try
> >  this in -mm?
> 
> ooh, I'd rather not.  I spent an intense three days removing all the
> sleeping locks from ext3 (and three months debugging the result). 
> Ended up gaining 1000% on 16-way.
> 
> Putting them back in will really hurt the SMP performance.

seems like turning the bitlocks into spinlocks is the best option then. 
We'd need one lock in buffer_head (j_state_lock, renamed to something
more sensible like b_private_lock), and one lock in journal_head
(j_list_lock) i guess. How much would the +4/+8 bytes size increase in
buffer_head [on SMP] be frowned upon? 

	Ingo

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
  2005-03-16 10:12                                                     ` Ingo Molnar
@ 2005-03-16 10:23                                                       ` Steven Rostedt
  2005-03-16 10:26                                                         ` Ingo Molnar
  2005-03-16 10:26                                                       ` Andrew Morton
  1 sibling, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-16 10:23 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, rlrevell, linux-kernel



On Wed, 16 Mar 2005, Ingo Molnar wrote:

>
> * Andrew Morton <akpm@osdl.org> wrote:
> >
> > ooh, I'd rather not.  I spent an intense three days removing all the
> > sleeping locks from ext3 (and three months debugging the result).
> > Ended up gaining 1000% on 16-way.
> >
> > Putting them back in will really hurt the SMP performance.
>
> ah. Yeah. Sniff.
>
> if we gain 1000% on a 16-way then there's something really wrong about
> semaphores (or scheduling) though. A semaphore is almost a spinlock, in
> the uncontended case - and even under contention we really (should) just
> spend the cycles that we'd spend spinning. There will be some
> intermediate contention level where semaphores hurt, but 1000% sounds
> truly excessive.
>

Could it possibly be that, in the process of removing all the sleeping
locks from ext3, Andrew also removed a flaw in ext3 itself that is
responsible for the 1000% improvement?

-- Steve


^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
  2005-03-16 10:23                                                       ` Steven Rostedt
@ 2005-03-16 10:26                                                         ` Ingo Molnar
  0 siblings, 0 replies; 125+ messages in thread
From: Ingo Molnar @ 2005-03-16 10:26 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Andrew Morton, rlrevell, linux-kernel


* Steven Rostedt <rostedt@goodmis.org> wrote:

> > > ooh, I'd rather not.  I spent an intense three days removing all the
> > > sleeping locks from ext3 (and three months debugging the result).
> > > Ended up gaining 1000% on 16-way.
> > >
> > > Putting them back in will really hurt the SMP performance.
> >
> > ah. Yeah. Sniff.
> >
> > if we gain 1000% on a 16-way then there's something really wrong about
> > semaphores (or scheduling) though. A semaphore is almost a spinlock, in
> > the uncontended case - and even under contention we really (should) just
> > spend the cycles that we'd spend spinning. There will be some
> > intermediate contention level where semaphores hurt, but 1000% sounds
> > truly excessive.
> >
> 
> Could it possibly be that, in the process of removing all the sleeping
> locks from ext3, Andrew also removed a flaw in ext3 itself that
> is responsible for the 1000% improvement?

i think the chances for that are really remote. I think it must have
been a workload ending up scheduling itself to death, while spinlocks
force atomicity of execution and affinity.

we should be able to see the same scenario with PREEMPT_RT on a 16-way
:-)

	Ingo

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
  2005-03-16 10:12                                                     ` Ingo Molnar
  2005-03-16 10:23                                                       ` Steven Rostedt
@ 2005-03-16 10:26                                                       ` Andrew Morton
  2005-03-16 10:29                                                         ` Ingo Molnar
  2005-03-16 10:34                                                         ` Arjan van de Ven
  1 sibling, 2 replies; 125+ messages in thread
From: Andrew Morton @ 2005-03-16 10:26 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: rostedt, rlrevell, linux-kernel

Ingo Molnar <mingo@elte.hu> wrote:
>
> > ooh, I'd rather not.  I spent an intense three days removing all the
>  > sleeping locks from ext3 (and three months debugging the result). 
>  > Ended up gaining 1000% on 16-way.
>  > 
>  > Putting them back in will really hurt the SMP performance.
> 
>  ah. Yeah. Sniff.
> 
>  if we gain 1000% on a 16-way then there's something really wrong about
>  semaphores (or scheduling) though. A semaphore is almost a spinlock, in
>  the uncontended case - and even under contention we really (should) just
>  spend the cycles that we'd spend spinning. There will be some
>  intermediate contention level where semaphores hurt, but 1000% sounds
>  truly excessive.

I forget how much of the 1000% came from that, but it was quite a lot.

Removing the BKL was the first step.  That took the context switch rate
under high load from ~10,000/sec up to ~300,000/sec.  Because the first
thing a CPU hit on entry to the fs was then a semaphore.  Performance rather
took a dive.

Of course the locks also became much finer-grained, so the contention
opportunities lessened.  But j_list_lock and j_state_lock have fs-wide
scope, so I'd expect the context switch rate to go up quite a lot again.

The hold times are short, and a context switch hurts rather more than a quick
spin.


^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
  2005-03-16 10:26                                                       ` Andrew Morton
@ 2005-03-16 10:29                                                         ` Ingo Molnar
  2005-03-16 10:41                                                           ` Andrew Morton
  2005-03-16 10:34                                                         ` Arjan van de Ven
  1 sibling, 1 reply; 125+ messages in thread
From: Ingo Molnar @ 2005-03-16 10:29 UTC (permalink / raw)
  To: Andrew Morton; +Cc: rostedt, rlrevell, linux-kernel


* Andrew Morton <akpm@osdl.org> wrote:

> I forget how much of the 1000% came from that, but it was quite a lot.
> 
> Removing the BKL was the first step.  That took the context switch
> rate under high load from ~10,000/sec up to ~300,000/sec.  Because the
> first thing a CPU hit on entry to the fs was then a semaphore. 
> Performance rather took a dive.
> 
> Of course the locks also became much finer-grained, so the contention
> opportunities lessened.  But j_list_lock and j_state_lock have fs-wide
> scope, so I'd expect the context switch rate to go up quite a lot
> again.
> 
> The hold times are short, and a context switch hurts rather more than a
> quick spin.

which particular workload was this - dbench? (I can try PREEMPT_RT on an
8-way, such effects will show up tenfold.)

	Ingo

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
  2005-03-16 10:26                                                       ` Andrew Morton
  2005-03-16 10:29                                                         ` Ingo Molnar
@ 2005-03-16 10:34                                                         ` Arjan van de Ven
  1 sibling, 0 replies; 125+ messages in thread
From: Arjan van de Ven @ 2005-03-16 10:34 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ingo Molnar, rostedt, rlrevell, linux-kernel

On Wed, 2005-03-16 at 02:26 -0800, Andrew Morton wrote:
> 
> The hold times are short, and a context switch hurts rather more than a
> quick spin.

so we need a spinaphore ;)
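
A rough userspace sketch of what such a hybrid might look like, in C11:
spin for a bounded number of attempts, and only then give up the CPU.
Nothing here comes from the thread or from the kernel; struct
spinaphore, SPIN_LIMIT and the helper names are hypothetical, and
sched_yield() stands in for a real sleep on a wait queue.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <sched.h>

#define SPIN_LIMIT 1000		/* hypothetical tuning knob, not measured */

struct spinaphore {
	atomic_flag held;	/* initialise with ATOMIC_FLAG_INIT */
};

/* Try once; returns true if the lock was acquired. */
static bool spinaphore_trylock(struct spinaphore *s)
{
	return !atomic_flag_test_and_set_explicit(&s->held,
						  memory_order_acquire);
}

/*
 * Spin up to SPIN_LIMIT times, betting that the holder releases soon;
 * if the bet fails, yield the CPU and start over. Returns the number of
 * times we had to yield, so callers can observe the contention level.
 */
static unsigned long spinaphore_lock(struct spinaphore *s)
{
	unsigned long yields = 0;

	for (;;) {
		for (int i = 0; i < SPIN_LIMIT; i++)
			if (spinaphore_trylock(s))
				return yields;
		sched_yield();
		yields++;
	}
}

/* Release the lock with release ordering. */
static void spinaphore_unlock(struct spinaphore *s)
{
	atomic_flag_clear_explicit(&s->held, memory_order_release);
}
```

With short hold times most acquisitions should finish inside the spin
phase and never pay for a context switch, while a long holder costs the
waiter at most SPIN_LIMIT attempts before it gets out of the way.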



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
  2005-03-16 10:19                                                     ` Ingo Molnar
@ 2005-03-16 10:40                                                       ` Andrew Morton
  2005-03-16 10:51                                                         ` Ingo Molnar
  2005-03-16 11:05                                                         ` Steven Rostedt
  0 siblings, 2 replies; 125+ messages in thread
From: Andrew Morton @ 2005-03-16 10:40 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: rostedt, rlrevell, linux-kernel

Ingo Molnar <mingo@elte.hu> wrote:
>
> 
> * Andrew Morton <akpm@osdl.org> wrote:
> 
> > >  > There's a little lock ranking diagram in jbd.h which tells us that
> > >  > these locks nest inside j_list_lock and j_state_lock.  So I guess
> > >  > you'll need to turn those into semaphores.
> > > 
> > >  indeed. I did this (see the three followup patches, against BK-curr),
> > >  and it builds/boots/works just fine on an ext3 box. Do we want to try
> > >  this in -mm?
> > 
> > ooh, I'd rather not.  I spent an intense three days removing all the
> > sleeping locks from ext3 (and three months debugging the result). 
> > Ended up gaining 1000% on 16-way.
> > 
> > Putting them back in will really hurt the SMP performance.
> 
> seems like turning the bitlocks into spinlocks is the best option then. 
> We'd need one lock in buffer_head (j_state_lock, renamed to something
> more sensible like b_private_lock), and one lock in journal_head
> (j_list_lock) i guess.

Those two are in the journal, actually.  You refer to jbd_lock_bh_state()
and jbd_lock_bh_journal_head().  I think they both need to be in the
buffer_head.  jbd_lock_bh_journal_head() can probably go away (just use
caller's jbd_lock_bh_state()).

Or make them global, or put them in the journal.

> How much would the +4/+8 bytes size increase in
> buffer_head [on SMP] be frowned upon? 

It wouldn't be the end of the world.  I'm not clear on which bits of the
rt-super-low-latency stuff are intended for mainline though?

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
  2005-03-16 10:29                                                         ` Ingo Molnar
@ 2005-03-16 10:41                                                           ` Andrew Morton
  0 siblings, 0 replies; 125+ messages in thread
From: Andrew Morton @ 2005-03-16 10:41 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: rostedt, rlrevell, linux-kernel

Ingo Molnar <mingo@elte.hu> wrote:
>
> 
> * Andrew Morton <akpm@osdl.org> wrote:
> 
> > I forget how much of the 1000% came from that, but it was quite a lot.
> > 
> > Removing the BKL was the first step.  That took the context switch
> > rate under high load from ~10,000/sec up to ~300,000/sec.  Because the
> > first thing a CPU hit on entry to the fs was then a semaphore. 
> > Performance rather took a dive.
> > 
> > Of course the locks also became much finer-grained, so the contention
> > opportunities lessened.  But j_list_lock and j_state_lock have fs-wide
> > scope, so I'd expect the context switch rate to go up quite a lot
> > again.
> > 
> > The hold times are short, and a context switch hurts rather more than a
> > quick spin.
> 
> which particular workload was this - dbench? (I can try PREEMPT_RT on an
> 8-way, such effects will show up tenfold.)
> 

Oh gee, that was back in the days when Martin was being useful.  SDET, I
think.


^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
  2005-03-16 10:40                                                       ` Andrew Morton
@ 2005-03-16 10:51                                                         ` Ingo Molnar
  2005-03-16 11:05                                                         ` Steven Rostedt
  1 sibling, 0 replies; 125+ messages in thread
From: Ingo Molnar @ 2005-03-16 10:51 UTC (permalink / raw)
  To: Andrew Morton; +Cc: rostedt, rlrevell, linux-kernel


* Andrew Morton <akpm@osdl.org> wrote:

> > How much would the +4/+8 bytes size increase in
> > buffer_head [on SMP] be frowned upon? 
> 
> It wouldn't be the end of the world.  I'm not clear on which bits of
> the rt-super-low-latency stuff are intended for mainline, though?

in the long run, most of it. There are no conceptual barriers so far,
the -RT tree consists of lots of small details and the PREEMPT_RT
framework itself. We are trying to solve (and merge) the small details
first (in upstream), so that PREEMPT_RT itself becomes uncontroversial.

(and it's not really the low latency that matters mainly - more valuable
is the fact that under PREEMPT_RT high latencies are statistically much
more unlikely [you need to do some really intentional and easy to see
things to introduce high latencies], while in the current upstream
kernel, high latencies are often side-effects of pretty normal kernel
coding activities, so low latencies are always a catch-up game that can
never be truly won for sure. So yes, while a 10 usec worst-case latency
under arbitrary Linux workloads [on the right hardware] is indeed sexy,
more important is that things are much more deterministic and hence much
more trustable from a hard-RT POV.)

	Ingo

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
  2005-03-16 10:40                                                       ` Andrew Morton
  2005-03-16 10:51                                                         ` Ingo Molnar
@ 2005-03-16 11:05                                                         ` Steven Rostedt
  2005-03-16 11:19                                                           ` Andrew Morton
  1 sibling, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-16 11:05 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ingo Molnar, rlrevell, linux-kernel



On Wed, 16 Mar 2005, Andrew Morton wrote:

>
> Those two are in the journal, actually.  You refer to jbd_lock_bh_state()
> and jbd_lock_bh_journal_head().  I think they both need to be in the
> buffer_head.  jbd_lock_bh_journal_head() can probably go away (just use
> caller's jbd_lock_bh_state()).
>
> Or make them global, or put them in the journal.

The jbd_lock_bh_journal_head can be one global lock without a problem. But
when I made jbd_lock_bh_state a global lock, I believe it deadlocked on
me.  So this one has to go into the buffer head.  What do you mean with
"put them in the journal", do you mean the journal_s structure? Is there a
safe way to get to that structure from the buffer head?  The state lock is
used quite a bit and it gets tricky trying to figure out how to use other
structures wrt buffer_heads at all the locations that use
jbd_lock_bh_state.

-- Steve
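
For readers unfamiliar with the bit locks under discussion, the following is
a minimal userspace model of what a bit spinlock does.  It assumes C11
atomics in place of the kernel's test_and_set_bit(); every toy_* name and
the bit number are illustrative, and the real kernel helpers additionally
disable preemption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct buffer_head: only the state word. */
struct toy_bh {
	atomic_ulong b_state;
};

#define TOY_BH_STATE 3UL	/* illustrative bit number, not the kernel's */

/* Try once: true if we set the bit, false if it was already set. */
static bool toy_bit_spin_trylock(unsigned long bit, atomic_ulong *addr)
{
	unsigned long mask = 1UL << bit;
	unsigned long old;

	old = atomic_fetch_or_explicit(addr, mask, memory_order_acquire);
	return (old & mask) == 0;
}

/* Spin until the bit is ours; a waiter burns CPU and never sleeps. */
static void toy_bit_spin_lock(unsigned long bit, atomic_ulong *addr)
{
	while (!toy_bit_spin_trylock(bit, addr))
		;
}

static bool toy_bit_spin_is_locked(unsigned long bit, atomic_ulong *addr)
{
	return (atomic_load_explicit(addr, memory_order_relaxed) >> bit) & 1UL;
}

static void toy_bit_spin_unlock(unsigned long bit, atomic_ulong *addr)
{
	atomic_fetch_and_explicit(addr, ~(1UL << bit), memory_order_release);
}
```

Because the lock lives in a bit of each object's own state word, every
buffer gets its own lock at no extra memory cost, which is the property a
single global lock gives up.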

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
  2005-03-16 11:05                                                         ` Steven Rostedt
@ 2005-03-16 11:19                                                           ` Andrew Morton
  2005-03-16 14:04                                                             ` Steven Rostedt
  0 siblings, 1 reply; 125+ messages in thread
From: Andrew Morton @ 2005-03-16 11:19 UTC (permalink / raw)
  To: rostedt; +Cc: mingo, rlrevell, linux-kernel

Steven Rostedt <rostedt@goodmis.org> wrote:
>
> 
> 
> On Wed, 16 Mar 2005, Andrew Morton wrote:
> 
> >
> > Those two are in the journal, actually.  You refer to jbd_lock_bh_state()
> > and jbd_lock_bh_journal_head().  I think they both need to be in the
> > buffer_head.  jbd_lock_bh_journal_head() can probably go away (just use
> > caller's jbd_lock_bh_state()).
> >
> > Or make them global, or put them in the journal.
> 
> The jbd_lock_bh_journal_head can be one global lock without a problem.

As I say, we can probably eliminate it.

> But
> when I made jbd_lock_bh_state a global lock, I believe it deadlocked on
> me.

That's a worry.

>  So this one has to go into the buffer head.  What do you mean with
> "put them in the journal", do you mean the journal_s structure?

Yes.

> Is there a
> safe way to get to that structure from the buffer head?

No convenient way, iirc.  But there's usually a fairly straightforward way
to get at the journal from within JBD code.

>  The state lock is
> used quite a bit and it gets tricky trying to figure out how to use other
> structures wrt buffer_heads at all the locations that use
> jbd_lock_bh_state.

That one should go into the buffer_head, I guess.

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
  2005-03-16 11:19                                                           ` Andrew Morton
@ 2005-03-16 14:04                                                             ` Steven Rostedt
  2005-03-16 16:47                                                               ` Steven Rostedt
  0 siblings, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-16 14:04 UTC (permalink / raw)
  To: Andrew Morton; +Cc: mingo, rlrevell, linux-kernel



On Wed, 16 Mar 2005, Andrew Morton wrote:

>
> > But
> > when I made jbd_lock_bh_state a global lock, I believe it deadlocked on
> > me.
>
> That's a worry.
>

OK, I was wrong here. I just tried it again and it didn't deadlock (that
must have been another lock I was dealing with).  The code does test whether
the buffer head is already locked, and asserts if it is. I'm running the
following patch with no problems so far. I still use the lock bits to
determine whether the bh state is locked.

Do you and Ingo think that this would have too much contention?

Ingo, I still get the following bug because of the added BUFFER_FNS and
PREEMPT_DESKTOP.  I haven't tried this with RT yet. I'll see if this shows
a deadlock there.


BUG: Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
c0214888
*pde = 00000000
Oops: 0000 [#1]
Modules linked in: ipv6 af_packet tsdev mousedev evdev floppy psmouse
pcspkr snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm
snd_timer snd soundcore snd_page_alloc shpchp pci_hotplug ehci_hcd
intel_agp agpgart uhci_hcd usbcore e100 mii ide_cd cdrom unix
CPU:    0
EIP:    0060:[<c0214888>]    Not tainted VLI
EFLAGS: 00010286   (2.6.11-RT-V0.7.40-00)
EIP is at vt_ioctl+0x18/0x1ab0
eax: 00000000   ebx: 00005603   ecx: 00005603   edx: cec18c80
esi: c0214870   edi: cb49e000   ebp: cb479f18   esp: cb479e48
ds: 007b   es: 007b   ss: 0068   preempt: 00000000
Process XFree86 (pid: 4744, threadinfo=cb478000 task=cb403530)
Stack: cb403680 cb478000 cb403530 c034594c cb403530 00000246 cb479e7c c0117217
       c0345954 00000006 00000001 00000000 00000000 cb479ebc cefa1c04 c13e1000
       ced6b9b8 00000000 00000000 cb479ed4 c01707f1 ced6b9b8 00000007 00000000
Call Trace:
 [<c0103cdf>] show_stack+0x7f/0xa0 (28)
 [<c0103e95>] show_registers+0x165/0x1d0 (56)
 [<c0104088>] die+0xc8/0x150 (64)
 [<c0115376>] do_page_fault+0x356/0x6c4 (216)
 [<c0103973>] error_code+0x2b/0x30 (268)
 [<c020fd6b>] tty_ioctl+0x34b/0x490 (52)
 [<c016837f>] do_ioctl+0x4f/0x70 (32)
 [<c0168582>] vfs_ioctl+0x62/0x1d0 (40)
 [<c0168751>] sys_ioctl+0x61/0x90 (40)
 [<c0102ec3>] syscall_call+0x7/0xb (-8124)
Code: ff ff 8d 05 88 5d 34 c0 e8 f6 60 0a 00 e9 3a ff ff ff 90 55 89 e5 57 56 53 81 ec c4 00 00 00 8b 7d 08 8b 5d 10 8b 87 7c 09 00 00 <8b> 30 89 34 24 8b 04 b5 e0 b7 3c c0 89 45 8c e8 a4 6a 00 00 85




Here's the patch (on Ingo's -40 kernel).

diff -ur linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c
--- linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c	2005-03-02 02:37:49.000000000 -0500
+++ linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c	2005-03-16 07:47:50.000000000 -0500
@@ -82,6 +82,9 @@

 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);

+spinlock_t jbd_state_lock = SPIN_LOCK_UNLOCKED;
+spinlock_t jbd_journal_lock = SPIN_LOCK_UNLOCKED;
+
 /*
  * Helper function used to manage commit timeouts
  */
diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h
--- linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h	2005-03-02 02:38:19.000000000 -0500
+++ linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h	2005-03-16 08:51:27.292105187 -0500
@@ -313,6 +313,8 @@
 BUFFER_FNS(RevokeValid, revokevalid)
 TAS_BUFFER_FNS(RevokeValid, revokevalid)
 BUFFER_FNS(Freed, freed)
+BUFFER_FNS(State,state)
+BUFFER_FNS(JournalHead,journal)

 static inline struct buffer_head *jh2bh(struct journal_head *jh)
 {
@@ -324,34 +326,50 @@
 	return bh->b_private;
 }

+extern spinlock_t jbd_state_lock;
+extern spinlock_t jbd_journal_lock;
+
 static inline void jbd_lock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_State, &bh->b_state);
+	spin_lock(&jbd_state_lock);
+	BUG_ON(buffer_state(bh));
+	set_buffer_state(bh);
 }

 static inline int jbd_trylock_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_trylock(BH_State, &bh->b_state);
+	if (spin_trylock(&jbd_state_lock)) {
+		BUG_ON(buffer_state(bh));
+		set_buffer_state(bh);
+		return 1;
+	}
+	return 0;
 }

 static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_is_locked(BH_State, &bh->b_state);
+	return buffer_state(bh); //spin_is_locked(&jbd_state_lock);
 }

 static inline void jbd_unlock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_State, &bh->b_state);
+	BUG_ON(!buffer_state(bh));
+	clear_buffer_state(bh);
+	spin_unlock(&jbd_state_lock);
 }

 static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_JournalHead, &bh->b_state);
+	spin_lock(&jbd_journal_lock);
+	BUG_ON(buffer_journal(bh));
+	set_buffer_journal(bh);
 }

 static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_JournalHead, &bh->b_state);
+	BUG_ON(!buffer_journal(bh));
+	clear_buffer_journal(bh);
+	spin_unlock(&jbd_journal_lock);
 }

 struct jbd_revoke_table_s;
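
The scheme in the patch above (one global spinlock, with the old lock bit
kept only as an ownership marker) can be modelled in userspace as follows.
This is a sketch, not the patch itself: it assumes a C11 atomic_flag in
place of spinlock_t, and all toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct buffer_head: only the state word. */
struct toy_bh {
	atomic_ulong b_state;
};

#define TOY_BH_STATE_MASK (1UL << 3)	/* illustrative bit, not the kernel's */

/* One lock shared by every buffer, like jbd_state_lock in the patch. */
static atomic_flag toy_state_lock = ATOMIC_FLAG_INIT;

static bool toy_trylock_bh_state(struct toy_bh *bh)
{
	if (atomic_flag_test_and_set_explicit(&toy_state_lock,
					      memory_order_acquire))
		return false;	/* somebody already holds the global lock */
	/* The bit no longer locks anything; it only records ownership. */
	assert(!(atomic_load(&bh->b_state) & TOY_BH_STATE_MASK));
	atomic_fetch_or(&bh->b_state, TOY_BH_STATE_MASK);
	return true;
}

static bool toy_is_locked_bh_state(struct toy_bh *bh)
{
	return (atomic_load(&bh->b_state) & TOY_BH_STATE_MASK) != 0;
}

static void toy_unlock_bh_state(struct toy_bh *bh)
{
	assert(toy_is_locked_bh_state(bh));
	atomic_fetch_and(&bh->b_state, ~TOY_BH_STATE_MASK);
	atomic_flag_clear_explicit(&toy_state_lock, memory_order_release);
}
```

In this model, locking one buffer makes a trylock on any other buffer fail
until the first is released.  That serialization across unrelated buffers is
the contention concern raised above.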

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
  2005-03-16 14:04                                                             ` Steven Rostedt
@ 2005-03-16 16:47                                                               ` Steven Rostedt
  2005-03-16 17:47                                                                 ` Steven Rostedt
  0 siblings, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-16 16:47 UTC (permalink / raw)
  To: Andrew Morton; +Cc: mingo, rlrevell, linux-kernel



On Wed, 16 Mar 2005, Steven Rostedt wrote:
>
> Ingo, I still get the following bug because of the added BUFFER_FNS and
> PREEMPT_DESKTOP.  I haven't tried this with RT yet. I'll see if this shows
> a deadlock there.
>
>

Hi Ingo,

I just ran this with PREEMPT_RT and it works fine.  Now, is this the best
solution, or would adding a lock to the buffer head be better?  This works,
but I don't have anything bigger than a 2-CPU machine to test it on.  If
either you or Andrew can try this on the 8-way or 16-way, that would be great.

Also, I only get the BUG with PREEMPT_DESKTOP.  I really don't understand
why this happens. I sent you a test patch earlier that only adds
BUFFER_FNS(JournalHead,journalhead) in jbd.h, and under PREEMPT_DESKTOP
that causes this bug as well. No other changes: just adding the BUFFER_FNS
call causes this. I can't find any other reference to buffer_journal
(besides reiserfs).  What do you think, and are you getting the same bug?

-- Steve



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
  2005-03-16 16:47                                                               ` Steven Rostedt
@ 2005-03-16 17:47                                                                 ` Steven Rostedt
  2005-03-16 19:20                                                                   ` Lee Revell
                                                                                     ` (2 more replies)
  0 siblings, 3 replies; 125+ messages in thread
From: Steven Rostedt @ 2005-03-16 17:47 UTC (permalink / raw)
  To: Andrew Morton; +Cc: mingo, rlrevell, linux-kernel



On Wed, 16 Mar 2005, Steven Rostedt wrote:

>
> Hi Ingo,
>
> I just ran this with PREEMPT_RT and it works fine.

Not quite, and I will assume that some of the other patches I sent have
this same problem.  The jbd_trylock_bh_state really scares me. It seems
that in fs/jbd/commit.c in journal_commit_transaction we have the
following code:


write_out_data:
	cond_resched();
	spin_lock(&journal->j_list_lock);

	while (commit_transaction->t_sync_datalist) {
		struct buffer_head *bh;

		jh = commit_transaction->t_sync_datalist;
		commit_transaction->t_sync_datalist = jh->b_tnext;
		bh = jh2bh(jh);
		if (buffer_locked(bh)) {
			BUFFER_TRACE(bh, "locked");
			if (!inverted_lock(journal, bh))
				goto write_out_data;


where inverted_lock is simply:


/*
 * Try to acquire jbd_lock_bh_state() against the buffer, when j_list_lock is
 * held.  For ranking reasons we must trylock.  If we lose, schedule away and
 * return 0.  j_list_lock is dropped in this case.
 */
static int inverted_lock(journal_t *journal, struct buffer_head *bh)
{
	if (!jbd_trylock_bh_state(bh)) {
		spin_unlock(&journal->j_list_lock);
		schedule();
		return 0;
	}
	return 1;
}


So, with kjournald running as a FIFO task, it may hit this (as it did in my
last test) and not get the lock. All it does then is release the other lock
(for ranking reasons), call schedule() and try again.  With kjournald the
highest-priority runnable process on the system (UP), this deadlocks, since
whoever holds the lock never gets a chance to run.  There are a couple of
places where jbd_trylock_bh_state is used in checkpoint.c, but this is the
one place where it definitely deadlocks the system.  I believe that the
code in checkpoint.c also has this problem.

I guess one way to solve this is to add a wait queue here (before
schedule()), and have whoever holds the lock wake up everyone on the
wait queue when they release it.
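
The starvation described above, and the effect of the proposed wait queue,
can be modelled with a toy strict-priority scheduler.  This is a sketch
under stated assumptions, not kernel code: there are two tasks on one CPU,
the lock holder needs a single timeslice to release the lock, and every
toy_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum { TOY_HOLDER, TOY_KJOURNALD, TOY_NTASKS };

struct toy_task {
	int prio;	/* larger value = more urgent */
	bool runnable;
	long runs;	/* timeslices received */
};

/* Strict priority: the most urgent runnable task always wins (-1 if none). */
static int toy_pick_next(const struct toy_task *t)
{
	int best = -1;

	for (int i = 0; i < TOY_NTASKS; i++)
		if (t[i].runnable && (best < 0 || t[i].prio > t[best].prio))
			best = i;
	return best;
}

/*
 * The holder starts out owning the lock and needs one timeslice to release
 * it.  Returns true if kjournald obtains the lock within `steps` timeslices.
 */
static bool toy_simulate(struct toy_task *t, int steps, bool use_waitqueue)
{
	bool locked = true;

	for (int s = 0; s < steps; s++) {
		int cur = toy_pick_next(t);

		if (cur < 0)
			break;
		t[cur].runs++;
		if (cur == TOY_HOLDER) {
			locked = false;			/* unlock ... */
			t[TOY_HOLDER].runnable = false;	/* ... holder is done ... */
			t[TOY_KJOURNALD].runnable = true; /* ... and wakes the waiter */
		} else if (!locked) {
			return true;			/* trylock succeeded */
		} else if (use_waitqueue) {
			t[TOY_KJOURNALD].runnable = false; /* sleep until woken */
		}
		/* otherwise: trylock failed, "schedule()" while still runnable */
	}
	return false;
}
```

With use_waitqueue false, the high-priority task is picked on every
timeslice and the holder never runs.  With it true, the waiter becomes
non-runnable, the holder gets the CPU, releases the lock and wakes the
waiter, and the next trylock succeeds.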

-- Steve


^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-03-16  7:50                                               ` Steven Rostedt
@ 2005-03-16 18:21                                                 ` Lee Revell
  0 siblings, 0 replies; 125+ messages in thread
From: Lee Revell @ 2005-03-16 18:21 UTC (permalink / raw)
  To: rostedt; +Cc: Ingo Molnar, Andrew Morton, linux-kernel

On Wed, 2005-03-16 at 02:50 -0500, Steven Rostedt wrote:
> 
> On Tue, 15 Mar 2005, Lee Revell wrote:
> 
> > On Tue, 2005-03-15 at 13:05 -0500, Steven Rostedt wrote:
> > > Damn! The answer was right there in front of my eyes! Here's the cleanest
> > > solution. I forgot about wait_on_bit_lock.  I've converted all the locks
> > > to use this instead.  We probably need to get priority inheritance working
> > > on this too someday, but for now it's better than wasting memory or
> > > getting into deadlocks.
> > >
> >
> > I am still not clear on why this did not hit with earlier kernels +
> > PREEMPT_DESKTOP.  Were the bitlocks introduced recently?  Or was another
> > lock-break patch dropped?
> >
> 
> When did you start seeing this? This code has been there as far back as
> 2.6.7 (the earliest 2.6 kernel I still have laying around) and as far
> back as Ingo's realtime-preempt-2.6.9-mm1-U10. Maybe the tracing didn't
> start picking this up till later, or that you were just lucky that no
> contention was happening on that lock.

Sometime after the RT preempt patches were rebased to mainline.

I don't see how there could be contention as I am on a UP.

Lee



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
  2005-03-16 17:47                                                                 ` Steven Rostedt
@ 2005-03-16 19:20                                                                   ` Lee Revell
  2005-03-17  7:15                                                                     ` Steven Rostedt
  2005-03-16 21:15                                                                   ` Andrew Morton
  2005-03-17  9:58                                                                   ` [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks Steven Rostedt
  2 siblings, 1 reply; 125+ messages in thread
From: Lee Revell @ 2005-03-16 19:20 UTC (permalink / raw)
  To: rostedt; +Cc: Andrew Morton, mingo, linux-kernel

On Wed, 2005-03-16 at 12:47 -0500, Steven Rostedt wrote:
> 
> On Wed, 16 Mar 2005, Steven Rostedt wrote:
> 
> >
> > Hi Ingo,
> >
> > I just ran this with PREEMPT_RT and it works fine.
> 
> Not quite, and I will assume that some of the other patches I sent have
> this same problem.  The jbd_trylock_bh_state really scares me. It seems
> that in fs/jbd/commit.c in journal_commit_transaction we have the
> following code:

I am a bit confused, big surprise.  Does this thread still have anything
to do with this trace from my "Latency regressions" bug report?

http://www.alsa-project.org/~rlrevell/2912us

The problem is only apparent with PREEMPT_DESKTOP and "data=ordered".

PREEMPT_RT has always worked perfectly.

Lee


^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
  2005-03-16 17:47                                                                 ` Steven Rostedt
  2005-03-16 19:20                                                                   ` Lee Revell
@ 2005-03-16 21:15                                                                   ` Andrew Morton
  2005-03-17  9:21                                                                     ` Steven Rostedt
  2005-03-17  9:58                                                                   ` [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks Steven Rostedt
  2 siblings, 1 reply; 125+ messages in thread
From: Andrew Morton @ 2005-03-16 21:15 UTC (permalink / raw)
  To: rostedt; +Cc: mingo, rlrevell, linux-kernel

Steven Rostedt <rostedt@goodmis.org> wrote:
>
> /*
>   * Try to acquire jbd_lock_bh_state() against the buffer, when j_list_lock is
>   * held.  For ranking reasons we must trylock.  If we lose, schedule away and
>   * return 0.  j_list_lock is dropped in this case.
>   */
>  static int inverted_lock(journal_t *journal, struct buffer_head *bh)
>  {
>  	if (!jbd_trylock_bh_state(bh)) {
>  		spin_unlock(&journal->j_list_lock);
>  		schedule();
>  		return 0;
>  	}
>  	return 1;
>  }
> 

That's very lame code, that.  The old "I don't know what the heck to do now
so I'll schedule" trick.  Sorry.

>  I guess one way to solve this is to add a wait queue here (before
>  schedule()), and have whoever holds the lock wake up everyone on the
>  wait queue when they release it.

yup.  A patch against mainline would be appropriate, please.

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
  2005-03-16 19:20                                                                   ` Lee Revell
@ 2005-03-17  7:15                                                                     ` Steven Rostedt
  2005-03-17 15:41                                                                       ` Lee Revell
  0 siblings, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-17  7:15 UTC (permalink / raw)
  To: Lee Revell; +Cc: Andrew Morton, mingo, linux-kernel



On Wed, 16 Mar 2005, Lee Revell wrote:

> I am a bit confused, big surprise.  Does this thread still have anything
> to do with this trace from my "Latency regressions" bug report?

Don't worry, I've been in a state of confusion for a long time now ;-)

>
> http://www.alsa-project.org/~rlrevell/2912us
>
> The problem is only apparent with PREEMPT_DESKTOP and "data=ordered".
>
> PREEMPT_RT has always worked perfectly.
>

I'm surprised that PREEMPT_RT does work.  I'm no longer sure that this
affects your latency.  It probably does indirectly somehow.  I
still think it has to do with the bit spinlocks, but I'm not sure. Just
let me know if you want to be taken off this thread and I'll remove you
from my CC list.  Until then, I'll keep you on.

-- Steve


^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
  2005-03-16 21:15                                                                   ` Andrew Morton
@ 2005-03-17  9:21                                                                     ` Steven Rostedt
  2005-03-18  9:23                                                                       ` [PATCH] remove lame schedule in journal inverted_lock (was: Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks) Steven Rostedt
  0 siblings, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-17  9:21 UTC (permalink / raw)
  To: Andrew Morton; +Cc: mingo, rlrevell, linux-kernel



On Wed, 16 Mar 2005, Andrew Morton wrote:

> >  I guess one way to solve this is to add a wait queue here (before
> >  schedule()), and have whoever holds the lock wake up everyone on the
> >  wait queue when they release it.
>
> yup.  A patch against mainline would be appropriate, please.
>

Hi Andrew,

Here's the patch against 2.6.11.  I tested it by adding (after making the
patch) global spinlocks for jbd_lock_bh_state and jbd_lock_bh_journal_head.
That way I have the same scenario as with Ingo's kernel, and I turned on
NEED_JOURNAL_STATE_WAIT.  I'm still running that kernel, so it looks like
it works.  Making those two locks global triggers this deadlock in kjournald
much more quickly, and I don't need to run on an SMP machine (since my SMP
machines are currently being used for other tasks).

Some comments on my patch.  I only implement the wait queue when
bit_spin_trylock is an actual lock (which is what creates the problem). I
didn't want to add this code where it isn't needed (i.e. when neither
CONFIG_SMP nor CONFIG_DEBUG_SPINLOCK is set).  So in bit_spin_trylock, I define
NEED_JOURNAL_STATE_WAIT if bit_spin_trylock is really a lock.  When
NEED_JOURNAL_STATE_WAIT is set, the wait queue is set up in the
journal code.
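
The conditional behaviour described here can be sketched in userspace as
follows.  TOY_SMP is a hypothetical macro standing in for "CONFIG_SMP or
CONFIG_DEBUG_SPINLOCK is set", and TOY_NEED_STATE_WAIT for
NEED_JOURNAL_STATE_WAIT; neither is a kernel symbol, and the preemption
handling of the real helper is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Build with -DTOY_SMP to model a kernel where the bit lock is a real lock. */
#ifdef TOY_SMP
#define TOY_NEED_STATE_WAIT 1	/* a trylock can fail, so waiters may exist */
#else
#define TOY_NEED_STATE_WAIT 0	/* a trylock cannot fail, no wait queue needed */
#endif

static bool toy_bit_spin_trylock(unsigned long bit, atomic_ulong *addr)
{
#ifdef TOY_SMP
	unsigned long mask = 1UL << bit;

	/* Real lock: fail if the bit was already set by someone else. */
	return (atomic_fetch_or(addr, mask) & mask) == 0;
#else
	/* Without the real lock the bit is never touched: always succeed. */
	(void)bit;
	(void)addr;
	return true;
#endif
}
```

Built without TOY_SMP, two trylocks in a row both succeed and the word stays
zero, so the failure branch of inverted_lock() is unreachable in that
configuration.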

Now the question is, should we make those two locks global? It would help
Ingo's cause (and mine as well). But I don't know the impact on a large
SMP configuration.  Andrew, since you have a 16-way SMP machine, could you
(if you have time) try out the effect of that? If you do have time, then I'll
send you a patch that goes on top of this one to change the two locks into
global spin locks.

Ingo, where do you want to go from here? I guess we need to wait on what
Andrew decides.

-- Steve


diff -ur linux-2.6.11.orig/fs/jbd/commit.c linux-2.6.11/fs/jbd/commit.c
--- linux-2.6.11.orig/fs/jbd/commit.c	2005-03-02 02:38:25.000000000 -0500
+++ linux-2.6.11/fs/jbd/commit.c	2005-03-17 03:40:06.000000000 -0500
@@ -80,15 +80,33 @@

 /*
  * Try to acquire jbd_lock_bh_state() against the buffer, when j_list_lock is
- * held.  For ranking reasons we must trylock.  If we lose, schedule away and
- * return 0.  j_list_lock is dropped in this case.
+ * held.  For ranking reasons we must trylock.  If we lose put ourselves on a
+ * state wait queue and we'll be woken up when it is unlocked. Then we return
+ * 0 to try this again.  j_list_lock is dropped in this case.
  */
 static int inverted_lock(journal_t *journal, struct buffer_head *bh)
 {
 	if (!jbd_trylock_bh_state(bh)) {
+		/*
+		 * jbd_trylock_bh_state always returns true unless CONFIG_SMP or
+		 * CONFIG_DEBUG_SPINLOCK, so the wait queue is not needed there.
+		 * The bit_spin_locks in jbd_lock_bh_state need to be removed anyway.
+		 */
+#ifdef NEED_JOURNAL_STATE_WAIT
+		DECLARE_WAITQUEUE(wait, current);
 		spin_unlock(&journal->j_list_lock);
-		schedule();
+		add_wait_queue_exclusive(&journal_state_wait,&wait);
+		set_current_state(TASK_UNINTERRUPTIBLE);
+		/* Check to see if the lock has been unlocked in this short time */
+		if (jbd_is_locked_bh_state(bh))
+			schedule();
+		set_current_state(TASK_RUNNING);
+		remove_wait_queue(&journal_state_wait,&wait);
 		return 0;
+#else
+		/* This should never be hit */
+		BUG();
+#endif
 	}
 	return 1;
 }
diff -ur linux-2.6.11.orig/fs/jbd/journal.c linux-2.6.11/fs/jbd/journal.c
--- linux-2.6.11.orig/fs/jbd/journal.c	2005-03-02 02:37:49.000000000 -0500
+++ linux-2.6.11/fs/jbd/journal.c	2005-03-17 03:47:40.000000000 -0500
@@ -80,6 +80,11 @@
 EXPORT_SYMBOL(journal_try_to_free_buffers);
 EXPORT_SYMBOL(journal_force_commit);

+#ifdef NEED_JOURNAL_STATE_WAIT
+EXPORT_SYMBOL(journal_state_wait);
+DECLARE_WAIT_QUEUE_HEAD(journal_state_wait);
+#endif
+
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);

 /*
diff -ur linux-2.6.11.orig/include/linux/jbd.h linux-2.6.11/include/linux/jbd.h
--- linux-2.6.11.orig/include/linux/jbd.h	2005-03-02 02:38:19.000000000 -0500
+++ linux-2.6.11/include/linux/jbd.h	2005-03-17 03:48:18.000000000 -0500
@@ -324,6 +324,20 @@
 	return bh->b_private;
 }

+#ifdef NEED_JOURNAL_STATE_WAIT
+/*
+ * The journal_state_wait is a wait queue that tasks will wait on
+ * if they fail to get the jbd_lock_bh_state while holding the j_list_lock.
+ * Instead of spinning on schedule, the task now adds itself to this wait queue
+ * and will be woken up when the jbd_lock_bh_state is released.
+ *
+ * Since the bit_spin_locks are only locks under CONFIG_SMP and
+ * CONFIG_DEBUG_SPINLOCK, this wait queue is only needed in those
+ * cases.
+ */
+extern wait_queue_head_t journal_state_wait;
+#endif
+
 static inline void jbd_lock_bh_state(struct buffer_head *bh)
 {
 	bit_spin_lock(BH_State, &bh->b_state);
@@ -342,6 +356,13 @@
 static inline void jbd_unlock_bh_state(struct buffer_head *bh)
 {
 	bit_spin_unlock(BH_State, &bh->b_state);
+#ifdef NEED_JOURNAL_STATE_WAIT
+	/*
+	 * There may be a task sleeping, and waiting to be woken up
+	 * when this is unlocked.
+	 */
+	wake_up(&journal_state_wait);
+#endif
 }

 static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
diff -ur linux-2.6.11.orig/include/linux/spinlock.h linux-2.6.11/include/linux/spinlock.h
--- linux-2.6.11.orig/include/linux/spinlock.h	2005-03-02 02:38:09.000000000 -0500
+++ linux-2.6.11/include/linux/spinlock.h	2005-03-17 03:39:13.024466071 -0500
@@ -527,6 +527,9 @@
  *
  * Don't use this unless you really need to: spin_lock() and spin_unlock()
  * are significantly faster.
+ *
+ * FIXME: These are evil and need to be removed. They are currently only
+ *  used by the journal code of ext3.
  */
 static inline void bit_spin_lock(int bitnum, unsigned long *addr)
 {
@@ -557,6 +560,13 @@
 {
 	preempt_disable();
 #if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
+	/*
+	 * This is only used by the journal code of ext3 and if this
+	 * is set then we need to tell the journal code that it needs
+	 * a wait queue to keep kjournald from spinning on a lock.
+	 */
+#define NEED_JOURNAL_STATE_WAIT
+
 	if (test_and_set_bit(bitnum, addr)) {
 		preempt_enable();
 		return 0;


* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
  2005-03-16 17:47                                                                 ` Steven Rostedt
  2005-03-16 19:20                                                                   ` Lee Revell
  2005-03-16 21:15                                                                   ` Andrew Morton
@ 2005-03-17  9:58                                                                   ` Steven Rostedt
  2 siblings, 0 replies; 125+ messages in thread
From: Steven Rostedt @ 2005-03-17  9:58 UTC (permalink / raw)
  To: Andrew Morton; +Cc: mingo, rlrevell, linux-kernel



On Wed, 16 Mar 2005, Steven Rostedt wrote:
> [...]  There's a couple of places that
> jbd_trylock_bh_state is used in checkpoint.c, but this is the one place
> that it definitely deadlocks the system.  I believe that the
> code in checkpoint.c also has this problem.
>

I've examined the code in checkpoint.c, and I now believe that it doesn't
have this problem.  When it fails to get the lock, it just falls out of the
while loops.

-- Steve


* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
  2005-03-17  7:15                                                                     ` Steven Rostedt
@ 2005-03-17 15:41                                                                       ` Lee Revell
  2005-03-17 16:23                                                                         ` Steven Rostedt
  0 siblings, 1 reply; 125+ messages in thread
From: Lee Revell @ 2005-03-17 15:41 UTC (permalink / raw)
  To: rostedt; +Cc: Andrew Morton, mingo, linux-kernel

On Thu, 2005-03-17 at 02:15 -0500, Steven Rostedt wrote:
> 
> On Wed, 16 Mar 2005, Lee Revell wrote:
> 
> > I am a bit confused, big surprise.  Does this thread still have anything
> > to do with this trace from my "Latency regressions" bug report?
> 
> Don't worry, I've been in a state of confusion for a long time now ;-)
> 
> >
> > http://www.alsa-project.org/~rlrevell/2912us
> >
> > The problem only is apparent with PREEMPT_DESKTOP and "data=ordered".
> >
> > PREEMPT_RT has always worked perfectly.
> >
> 
> I'm surprised that PREEMPT_RT does work.  I'm no longer sure that this does
> affect your latency anymore.  It probably does indirectly somehow.  I
> still think it has to do with the bitspinlocks.  But I'm not sure. Just
> let me know if you want to be taken off this thread and I'll remove you
> from my CC list.  Until then, I'll keep you on.

Sorry, it's hard to follow this thread.  Just to make sure we're all on
the same page, what exactly is the symptom of this ext3 issue you are
working on?  Is it a performance regression, or a latency issue, or a
lockup - ?

Whatever your problem is, I am not seeing it.

Lee



* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
  2005-03-17 15:41                                                                       ` Lee Revell
@ 2005-03-17 16:23                                                                         ` Steven Rostedt
  2005-03-17 16:36                                                                           ` Lee Revell
  0 siblings, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-17 16:23 UTC (permalink / raw)
  To: Lee Revell; +Cc: Andrew Morton, mingo, linux-kernel



On Thu, 17 Mar 2005, Lee Revell wrote:

>
> Sorry, it's hard to follow this thread.  Just to make sure we're all on
> the same page, what exactly is the symptom of this ext3 issue you are
> working on?  Is it a performance regression, or a latency issue, or a
> lockup - ?
>
> Whatever your problem is, I am not seeing it.
>

The root is a lockup.  I think you can get this lockup whether or not it
is PREEMPT_RT or PREEMPT_DESKTOP.  All you need is CONFIG_PREEMPT turned
on. Then this is what you want to do on a UP machine.

Set kjournald to FIFO (any realtime priority).  And then from a non-RT
task, just do a "make clean; make" on the kernel. It may take a few
minutes but your system will lock up.  That's because kjournald will wait
on the bit_spin_lock, but the task holding the lock will never get to run
and release it, because kjournald is FIFO and the holder (the kernel
compile) is not RT. Even if the holder were RT, and the same priority as
kjournald, it would still lock up, since kjournald is FIFO and will only
yield to higher priority threads.
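
A minimal sketch of the first step (switching a thread such as kjournald
to SCHED_FIFO), using the standard sched_setscheduler() call.  The helper
names fifo_param_init() and set_fifo() are made up for illustration, the
pid has to be looked up separately, and the call needs root privileges.

```c
#include <sched.h>
#include <sys/types.h>

/*
 * Fill in a sched_param for SCHED_FIFO.  Returns 0 on success, -1 if
 * prio is outside the range the system reports for SCHED_FIFO.
 * (Hypothetical helper, not part of any kernel or libc API.)
 */
static int fifo_param_init(int prio, struct sched_param *sp)
{
	int lo = sched_get_priority_min(SCHED_FIFO);
	int hi = sched_get_priority_max(SCHED_FIFO);

	if (lo == -1 || hi == -1 || prio < lo || prio > hi)
		return -1;
	sp->sched_priority = prio;
	return 0;
}

/*
 * Switch the task identified by pid to SCHED_FIFO at priority prio.
 * Returns 0 on success and -1 on failure.
 */
static int set_fifo(pid_t pid, int prio)
{
	struct sched_param sp = { 0 };

	if (fifo_param_init(prio, &sp) < 0)
		return -1;
	return sched_setscheduler(pid, SCHED_FIFO, &sp);
}
```

Calling set_fifo() with the pid of kjournald and any valid priority
corresponds to the "any realtime priority" setup described above.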

Now this lockup has uncovered other problems with ext3.  Mainly that it
uses bit spinlocks, which in and of itself is bad.  You don't want a busy
wait unless you really need it.  A normal spinlock is such a case in
vanilla SMP systems, since a schedule would take longer than the time the
lock is held.  Ingo's RT kernel removes most of these, and makes them into
mutexes.  This may slow down the overall performance but it shortens
latencies for RT tasks, which is what RT tries to do.

Now the latest problem is also bad, since you should never just call
schedule as a "yield" to let someone else release a lock.  Since the
ranking order of the locks prevents just grabbing the lock without
risking a deadlock, ext3 tries to get the lock, and if it fails, it
releases the other lock it has, calls schedule, then tries again.  This is
usually bad, since the task would most likely be rescheduled right away,
so basically it is worse than a spinlock, since it actually goes through
the schedule logic again and spins!  With Ingo's RT patch, this also
becomes a deadlock the same way as bit_spin_locks can.

Hope this helps,

-- Steve



* Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
  2005-02-19 20:45       ` Lee Revell
  2005-02-20  0:19         ` Lee Revell
@ 2005-03-17 16:33         ` Lee Revell
  1 sibling, 0 replies; 125+ messages in thread
From: Lee Revell @ 2005-03-17 16:33 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Andrew Morton

On Sat, 2005-02-19 at 15:45 -0500, Lee Revell wrote:
> On Sat, 2005-02-19 at 10:03 +0100, Ingo Molnar wrote:
> > * Ingo Molnar <mingo@elte.hu> wrote:
> > 
> > > > Testing on an all SCSI 1.3Ghz Athlon XP system, I am seeing very long
> > > > latencies in the journalling code with 2.6.11-rc4-RT-V0.7.39-02.
> > > 
> > > could you send me the full trace?
> > 
> > just in case the system in question is still running - could you also do 
> > a 'verbose' trace via:
> > 
> > 	echo 1 > /proc/sys/kernel/trace_verbose
> 
> OK, here is a 2912us verbose latency trace with "data=ordered", gzipped.
> dbench 32 or 64 is the easiest way to trigger these.
> 
> I have not tried "data=journal".  As previously stated "data=writeback"
> works perfectly - I ran JACK overnight while stressing the fs and did
> not get one xrun.

Any update on this?  The problem is still apparent in 2.6.11.  It seems
to be a regression from 2.6.10.  And now I've heard 2.6.12-rc1 mentioned
with no motion on this.

Here's the trace again in case you missed it:

http://www.alsa-project.org/~rlrevell/2912us

The "latency regressions" thread was all sub-millisecond stuff which can
be ignored IMHO.  Still interesting because they are regressions after
all, but not a real world problem.

However this one can be several milliseconds.  It's a real problem.

I'd hate to have to ship 2.6.12 with a disclaimer that ext3 with
"data=ordered" is not suitable for the desktop (as it clearly violates
the stated desktop responsiveness goal of 1ms).

Lee



* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
  2005-03-17 16:23                                                                         ` Steven Rostedt
@ 2005-03-17 16:36                                                                           ` Lee Revell
  2005-03-18  6:58                                                                             ` Steven Rostedt
  0 siblings, 1 reply; 125+ messages in thread
From: Lee Revell @ 2005-03-17 16:36 UTC (permalink / raw)
  To: rostedt; +Cc: Andrew Morton, mingo, linux-kernel

On Thu, 2005-03-17 at 11:23 -0500, Steven Rostedt wrote:
> 
> On Thu, 17 Mar 2005, Lee Revell wrote:
> 
> >
> > Sorry, it's hard to follow this thread.  Just to make sure we're all on
> > the same page, what exactly is the symptom of this ext3 issue you are
> > working on?  Is it a performance regression, or a latency issue, or a
> > lockup - ?
> >
> > Whatever your problem is, I am not seeing it.
> >
> 
> The root is a lockup.  I think you can get this lockup whether or not it
> is PREEMPT_RT or PREEMPT_DESKTOP.  All you need is CONFIG_PREEMPT turned
> on. Then this is what you want to do on a UP Machine.

OK, no need to cc: me on this one any more.  It's really low priority
IMO compared to the big latencies I am seeing with ext3 and
"data=ordered".  Unless you think there is any relation.

Lee



* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
  2005-03-17 16:36                                                                           ` Lee Revell
@ 2005-03-18  6:58                                                                             ` Steven Rostedt
  2005-03-18 18:19                                                                               ` Lee Revell
  0 siblings, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-18  6:58 UTC (permalink / raw)
  To: Lee Revell; +Cc: Andrew Morton, mingo, linux-kernel



On Thu, 17 Mar 2005, Lee Revell wrote:
>
> OK, no need to cc: me on this one any more.  It's really low priority
> IMO compared to the big latencies I am seeing with ext3 and
> "data=ordered".  Unless you think there is any relation.
>

IMO a deadlock is higher priority than a big latency :-)

I still believe that something to do with the locking in ext3 has to do
with your latencies, but I'll take you off when I send something to Andrew
or Ingo next time. Hopefully, they'll do the same.

When this problem is solved on Ingo's side, maybe this will solve your
latency problem, so I recommend that you keep trying the latest RT
kernels.  BTW what test are you running that causes these latencies?

-- Steve



* [PATCH] remove lame schedule in journal inverted_lock (was: Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks)
  2005-03-17  9:21                                                                     ` Steven Rostedt
@ 2005-03-18  9:23                                                                       ` Steven Rostedt
  2005-03-18  9:32                                                                         ` Andrew Morton
  0 siblings, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-18  9:23 UTC (permalink / raw)
  To: Andrew Morton; +Cc: mingo, linux-kernel


Andrew,

Since I haven't gotten a response from you, I figured that you may have
missed this, since the subject didn't change.  So I changed the subject to
get your attention, and I've resent this. Here's the patch to get rid of
the lame schedule that was in fs/jbd/commit.c.   Let me know if this
patch is appropriate.

Thanks,

-- Steve


On Thu, 17 Mar 2005, Steven Rostedt wrote:

>
>
> On Wed, 16 Mar 2005, Andrew Morton wrote:
>
> > >  I guess one way to solve this is to add a wait queue here (before
> > >  schedule()), and have the one holding the lock to wake up all on the
> > >  waitqueue when they release it.
> >
> > yup.  A patch against mainline would be appropriate, please.
> >
>
> Hi Andrew,
>
> Here's the patch against 2.6.11.  I tested it, by adding (after making the
> patch) global spinlocks for jbd_lock_bh_state and jbd_lock_bh_journal_head.
> That way I have the same scenario as with Ingo's kernel, and I turned on
> NEED_JOURNAL_STATE_WAIT.  I'm still running that kernel so it looks like
> it works.  Making those two locks global causes this deadlock on kjournald
> much quicker, and I don't need to run on an SMP machine (since my SMP
> machines are currently being used for other tasks).
>
> Some comments on my patch.  I only implement the wait queue when
> bit_spin_trylock is an actual lock (thus creating the problem). I didn't
> want to add this code if it wasn't needed (i.e. !(CONFIG_SMP ||
> CONFIG_DEBUG_SPINLOCK)).  So in bit_spin_trylock, I define
> NEED_JOURNAL_STATE_WAIT if bit_spin_trylock is really a lock.  When
> NEED_JOURNAL_STATE_WAIT is set, then the wait queue is set up in the
> journal code.
>
> Now the question is, should we make those two locks global? It would help
> Ingo's cause (and mine as well). But I don't know the impact on a large
> SMP configuration.  Andrew, since you have a 16xSMP machine, could you (if
> you have time) try out the effect of that. If you do have time, then I'll
> send you a patch that goes on top of this one to change the two locks into
> global spin locks.
>
> Ingo, where do you want to go from here? I guess we need to wait on what
> Andrew decides.
>
> -- Steve
>
>

diff -ur linux-2.6.11.orig/fs/jbd/commit.c linux-2.6.11/fs/jbd/commit.c
--- linux-2.6.11.orig/fs/jbd/commit.c	2005-03-02 02:38:25.000000000 -0500
+++ linux-2.6.11/fs/jbd/commit.c	2005-03-17 03:40:06.000000000 -0500
@@ -80,15 +80,33 @@

 /*
  * Try to acquire jbd_lock_bh_state() against the buffer, when j_list_lock is
- * held.  For ranking reasons we must trylock.  If we lose, schedule away and
- * return 0.  j_list_lock is dropped in this case.
+ * held.  For ranking reasons we must trylock.  If we lose put ourselves on a
+ * state wait queue and we'll be woken up when it is unlocked. Then we return
+ * 0 to try this again.  j_list_lock is dropped in this case.
  */
 static int inverted_lock(journal_t *journal, struct buffer_head *bh)
 {
 	if (!jbd_trylock_bh_state(bh)) {
+		/*
+		 * jbd_trylock_bh_state always returns true unless CONFIG_SMP or
+		 * CONFIG_DEBUG_SPINLOCK, so the wait queue is not needed there.
+		 * The bit_spin_locks in jbd_lock_bh_state need to be removed anyway.
+		 */
+#ifdef NEED_JOURNAL_STATE_WAIT
+		DECLARE_WAITQUEUE(wait, current);
 		spin_unlock(&journal->j_list_lock);
-		schedule();
+		add_wait_queue_exclusive(&journal_state_wait,&wait);
+		set_current_state(TASK_UNINTERRUPTIBLE);
+		/* Check to see if the lock has been unlocked in this short time */
+		if (jbd_is_locked_bh_state(bh))
+			schedule();
+		set_current_state(TASK_RUNNING);
+		remove_wait_queue(&journal_state_wait,&wait);
 		return 0;
+#else
+		/* This should never be hit */
+		BUG();
+#endif
 	}
 	return 1;
 }
diff -ur linux-2.6.11.orig/fs/jbd/journal.c linux-2.6.11/fs/jbd/journal.c
--- linux-2.6.11.orig/fs/jbd/journal.c	2005-03-02 02:37:49.000000000 -0500
+++ linux-2.6.11/fs/jbd/journal.c	2005-03-17 03:47:40.000000000 -0500
@@ -80,6 +80,11 @@
 EXPORT_SYMBOL(journal_try_to_free_buffers);
 EXPORT_SYMBOL(journal_force_commit);

+#ifdef NEED_JOURNAL_STATE_WAIT
+EXPORT_SYMBOL(journal_state_wait);
+DECLARE_WAIT_QUEUE_HEAD(journal_state_wait);
+#endif
+
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);

 /*
diff -ur linux-2.6.11.orig/include/linux/jbd.h linux-2.6.11/include/linux/jbd.h
--- linux-2.6.11.orig/include/linux/jbd.h	2005-03-02 02:38:19.000000000 -0500
+++ linux-2.6.11/include/linux/jbd.h	2005-03-17 03:48:18.000000000 -0500
@@ -324,6 +324,20 @@
 	return bh->b_private;
 }

+#ifdef NEED_JOURNAL_STATE_WAIT
+/*
+ * The journal_state_wait is a wait queue that tasks will wait on
+ * if they fail to get the jbd_lock_bh_state while holding the j_list_lock.
+ * Instead of spinning on schedule, the task now adds itself to this wait queue
+ * and will be woken up when the jbd_lock_bh_state is released.
+ *
+ * Since the bit_spin_locks are only locks under CONFIG_SMP and
+ * CONFIG_DEBUG_SPINLOCK, this wait queue is only needed in those
+ * cases.
+ */
+extern wait_queue_head_t journal_state_wait;
+#endif
+
 static inline void jbd_lock_bh_state(struct buffer_head *bh)
 {
 	bit_spin_lock(BH_State, &bh->b_state);
@@ -342,6 +356,13 @@
 static inline void jbd_unlock_bh_state(struct buffer_head *bh)
 {
 	bit_spin_unlock(BH_State, &bh->b_state);
+#ifdef NEED_JOURNAL_STATE_WAIT
+	/*
+	 * There may be a task sleeping, and waiting to be woken up
+	 * when this is unlocked.
+	 */
+	wake_up(&journal_state_wait);
+#endif
 }

 static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
diff -ur linux-2.6.11.orig/include/linux/spinlock.h linux-2.6.11/include/linux/spinlock.h
--- linux-2.6.11.orig/include/linux/spinlock.h	2005-03-02 02:38:09.000000000 -0500
+++ linux-2.6.11/include/linux/spinlock.h	2005-03-17 03:39:13.024466071 -0500
@@ -527,6 +527,9 @@
  *
  * Don't use this unless you really need to: spin_lock() and spin_unlock()
  * are significantly faster.
+ *
+ * FIXME: These are evil and need to be removed. They are currently only
+ *  used by the journal code of ext3.
  */
 static inline void bit_spin_lock(int bitnum, unsigned long *addr)
 {
@@ -557,6 +560,13 @@
 {
 	preempt_disable();
 #if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
+	/*
+	 * This is only used by the journal code of ext3 and if this
+	 * is set then we need to tell the journal code that it needs
+	 * a wait queue to keep kjournald from spinning on a lock.
+	 */
+#define NEED_JOURNAL_STATE_WAIT
+
 	if (test_and_set_bit(bitnum, addr)) {
 		preempt_enable();
 		return 0;


* Re: [PATCH] remove lame schedule in journal inverted_lock (was: Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks)
  2005-03-18  9:23                                                                       ` [PATCH] remove lame schedule in journal inverted_lock (was: Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks) Steven Rostedt
@ 2005-03-18  9:32                                                                         ` Andrew Morton
  2005-03-18 10:38                                                                           ` Steven Rostedt
  0 siblings, 1 reply; 125+ messages in thread
From: Andrew Morton @ 2005-03-18  9:32 UTC (permalink / raw)
  To: rostedt; +Cc: mingo, linux-kernel

Steven Rostedt <rostedt@goodmis.org> wrote:
>
> 
> Andrew,
> 
> Since I haven't gotten a response from you,

It sometimes takes me half a day to get onto looking at patches.  And if I
take them I usually don't reply (sorry).  But I don't drop stuff, so if you
don't hear, please assume the patch stuck.  If others raise objections
to the patch I'll usually duck it as well, but it's pretty obvious when that
happens.

I really should knock up a script to send out an email when I add a patch
to -mm.

> I'd figure that you may have
> missed this, since the subject didn't change.  So I changed the subject to
> get your attention, and I've resent this. Here's the patch to get rid of
> the lame schedule that was in fs/jbd/commit.c.   Let me know if this
> patch is appropriate.

I'm rather aghast at all the ifdeffery and complexity in this one.  But I
haven't looked at it closely yet.



* Re: [PATCH] remove lame schedule in journal inverted_lock (was: Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks)
  2005-03-18  9:32                                                                         ` Andrew Morton
@ 2005-03-18 10:38                                                                           ` Steven Rostedt
  2005-03-18 11:07                                                                             ` Andrew Morton
  0 siblings, 1 reply; 125+ messages in thread
From: Steven Rostedt @ 2005-03-18 10:38 UTC (permalink / raw)
  To: Andrew Morton; +Cc: mingo, linux-kernel


On Fri, 18 Mar 2005, Andrew Morton wrote:
> Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> >
> > Andrew,
> >
> > Since I haven't gotten a response from you,
>
> It sometimes takes me half a day to get onto looking at patches.  And if I
> take them I usually don't reply (sorry).  But I don't drop stuff, so if you
> don't hear, please assume the patch stuck.  If others raise objections
> to the patch I'll usually duck it as well, but it's pretty obvious when that
> happens.

Sorry, I didn't mean to be pushy. I understand that you have a lot on your
plate, and I'm sure you don't drop stuff. I just wasn't sure that you
noticed that that was a patch and not just a reply on this thread, since I
didn't flag it as such in the subject. I just didn't want it to slip under
the radar.


>
> I really should knock up a script to send out an email when I add a patch
> to -mm.
>

I thought you might have had something like that already, which was
another reason I thought you might have skipped this.


> > I'd figure that you may have
> > missed this, since the subject didn't change.  So I changed the subject to
> > get your attention, and I've resent this. Here's the patch to get rid of
> > the lame schedule that was in fs/jbd/commit.c.   Let me know if this
> > patch is appropriate.
>
> I'm rather aghast at all the ifdeffery and complexity in this one.  But I
> haven't looked at it closely yet.
>

I wanted to keep the wait logic out when it wasn't a problem. Basically,
the problem only occurs when bit_spin_trylock is defined as an actual
trylock. So I put in a define there to enable the wait queues.  I didn't
want to waste cycles checking the wait queue in jbd_unlock_bh_state when
there would never be anything on it.  Heck, I figured why even have the
wait queue wasting memory if it wasn't needed.  So that added the
ifdeffery complexity.
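
A rough user-space sketch of the wait-queue idea, for illustration only.
It swaps in a pthread condition variable for the kernel wait queue and a
pthread mutex for j_list_lock; struct journal, bit_unlock_wake(),
inverted_lock_wait() and other_cpu() are made-up names, not the code from
the patch.

```c
#include <pthread.h>
#include <stdatomic.h>

#define STATE_BIT 0x1UL			/* stand-in for the BH_State bit */

struct journal {
	pthread_mutex_t list_lock;	/* stand-in for j_list_lock */
	atomic_ulong bh_state;		/* stand-in for bh->b_state */
	pthread_mutex_t wait_lock;	/* protects sleeping on state_wait */
	pthread_cond_t state_wait;	/* stand-in for journal_state_wait */
};

/* Returns 1 if the bit lock was taken, 0 if somebody else holds it. */
static int bit_trylock(atomic_ulong *word)
{
	unsigned long old = atomic_fetch_or_explicit(word, STATE_BIT,
						     memory_order_acquire);
	return !(old & STATE_BIT);
}

/* Release the bit lock, then wake anybody sleeping in inverted_lock_wait(). */
static void bit_unlock_wake(struct journal *j)
{
	atomic_fetch_and_explicit(&j->bh_state, ~STATE_BIT,
				  memory_order_release);
	pthread_mutex_lock(&j->wait_lock);
	pthread_cond_broadcast(&j->state_wait);
	pthread_mutex_unlock(&j->wait_lock);
}

/*
 * Called with list_lock held.  Returns 1 with both locks held, or 0 with
 * list_lock dropped after sleeping until the bit lock was released.
 */
static int inverted_lock_wait(struct journal *j)
{
	if (bit_trylock(&j->bh_state))
		return 1;

	pthread_mutex_unlock(&j->list_lock);
	pthread_mutex_lock(&j->wait_lock);
	/* Re-check under wait_lock so an unlock just before we sleep is seen. */
	while (atomic_load_explicit(&j->bh_state, memory_order_acquire) & STATE_BIT)
		pthread_cond_wait(&j->state_wait, &j->wait_lock);
	pthread_mutex_unlock(&j->wait_lock);
	return 0;
}

/* Plays the lock holder: needs list_lock first, then releases the bit lock. */
static void *other_cpu(void *arg)
{
	struct journal *j = arg;

	pthread_mutex_lock(&j->list_lock);
	bit_unlock_wake(j);
	pthread_mutex_unlock(&j->list_lock);
	return NULL;
}
```

Note that bit_unlock_wake() pays for the wakeup on every unlock, whether
or not anybody is waiting, which is the cost the #ifdef was meant to avoid.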

Thanks,

-- Steve



* Re: [PATCH] remove lame schedule in journal inverted_lock (was: Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks)
  2005-03-18 10:38                                                                           ` Steven Rostedt
@ 2005-03-18 11:07                                                                             ` Andrew Morton
  2005-03-18 12:10                                                                               ` Steven Rostedt
  0 siblings, 1 reply; 125+ messages in thread
From: Andrew Morton @ 2005-03-18 11:07 UTC (permalink / raw)
  To: rostedt; +Cc: mingo, linux-kernel

Steven Rostedt <rostedt@goodmis.org> wrote:
>
> >
>  > I really should knock up a script to send out an email when I add a patch
>  > to -mm.
>  >
> 
>  I thought you might have had something like that already, which was
>  another reason I thought you might have skipped this.
>

I do now..

> 
>  > > I'd figure that you may have
>  > > missed this, since the subject didn't change.  So I changed the subject to
>  > > get your attention, and I've resent this. Here's the patch to get rid of
>  > > the lame schedule that was in fs/jbd/commit.c.   Let me know if this
>  > > patch is appropriate.
>  >
>  > I'm rather aghast at all the ifdeffery and complexity in this one.  But I
>  > haven't looked at it closely yet.
>  >
> 
>  I wanted to keep the wait logic out when it wasn't a problem. Basically,
>  the problem only occurs when bit_spin_trylock is defined as an actual
>  trylock. So I put in a define there to enable the wait queues.  I didn't
>  want to waste cycles checking the wait queue in jbd_unlock_bh_state when
>  there would never be anything on it.  Heck, I figured why even have the
>  wait queue wasting memory if it wasn't needed.  So that added the
>  ifdeffery complexity.

No, that code's just a problem.  For ranking reasons it's essentially doing
this:

repeat:
	cond_resched();
	spin_lock(j_list_lock);
	....
	if (!bit_spin_trylock(bh)) {
		spin_unlock(j_list_lock);
		schedule();
		goto repeat;
	}

Now imagine that some other CPU holds the bit_spin_lock and is spinning,
trying to get the spin_lock().  The above code assumes that the schedule()
and cond_resched() will take "long enough" for the other CPU to get the
spinlock, do its business then release the locks.

So all the schedule() is really doing is "blow a few cycles so the other
CPU can get in and grab the spinlock".  That'll work OK on normal SMP but I
suspect that on NUMA setups with really big latencies we could end up
starving the other CPU: this CPU would keep on grabbing the lock.  It
depends on how the interconnect cache and all that goop works.

So what to do?

One approach would be to spin on the bit_spin_trylock after having dropped
j_list_lock.  That'll tell us when the other CPU has moved on.

Another approach would be to sleep on a waitqueue somewhere.  But that
means that jbd_unlock_bh_state() needs to do wakeups all the time - costly.

Another approach would be to simply whack an msleep(1) in there.  That
might be OK - it should be very rare.

Probably the first approach would be the one to use.  That's for mainline. 
I don't know what the super-duper-RT fix would be.  Why did we start
discussing this anyway?
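
A rough user-space sketch of the first approach, for illustration only.
It uses a pthread mutex in place of j_list_lock and a C11 atomic word in
place of bh->b_state; struct journal, inverted_lock_spin() and other_cpu()
are made-up names, and sched_yield() merely stands in for cpu_relax().

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define STATE_BIT 0x1UL			/* stand-in for the BH_State bit */

struct journal {
	pthread_mutex_t list_lock;	/* stand-in for j_list_lock */
	atomic_ulong bh_state;		/* stand-in for bh->b_state */
};

/* Returns 1 if the bit lock was taken, 0 if somebody else holds it. */
static int bit_trylock(atomic_ulong *word)
{
	unsigned long old = atomic_fetch_or_explicit(word, STATE_BIT,
						     memory_order_acquire);
	return !(old & STATE_BIT);
}

static void bit_unlock(atomic_ulong *word)
{
	atomic_fetch_and_explicit(word, ~STATE_BIT, memory_order_release);
}

/*
 * Called with list_lock held.  Returns 1 with both locks held, or 0 with
 * list_lock dropped once the bit lock has been seen free; the caller then
 * retakes list_lock and retries.
 */
static int inverted_lock_spin(struct journal *j)
{
	if (bit_trylock(&j->bh_state))
		return 1;

	pthread_mutex_unlock(&j->list_lock);
	/* Wait until the holder has moved on, instead of one schedule(). */
	while (atomic_load_explicit(&j->bh_state, memory_order_acquire) & STATE_BIT)
		sched_yield();
	return 0;
}

/* Plays the other CPU: wants list_lock first, then releases the bit lock. */
static void *other_cpu(void *arg)
{
	struct journal *j = arg;

	pthread_mutex_lock(&j->list_lock);
	bit_unlock(&j->bh_state);
	pthread_mutex_unlock(&j->list_lock);
	return NULL;
}
```

The unlock path stays as cheap as before; only the loser of the trylock
pays, by spinning until the bit is clear.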

Oh, SCHED_FIFO.  kjournald doesn't run SCHED_FIFO, but someone may decide
to make it do so.  But even then I don't see a problem for the mainline
kernel, because this CPU's SCHED_FIFO doesn't stop the other CPU from
running.




* Re: [PATCH] remove lame schedule in journal inverted_lock (was: Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks)
  2005-03-18 11:07                                                                             ` Andrew Morton
@ 2005-03-18 12:10                                                                               ` Steven Rostedt
  0 siblings, 0 replies; 125+ messages in thread
From: Steven Rostedt @ 2005-03-18 12:10 UTC (permalink / raw)
  To: Andrew Morton; +Cc: mingo, linux-kernel


On Fri, 18 Mar 2005, Andrew Morton wrote:
> Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> >  I wanted to keep the wait logic out when it wasn't a problem. Basically,
> >  the problem only occurs when bit_spin_trylock is defined as an actual
> >  trylock. So I put in a define there to enable the wait queues.  I didn't
> >  want to waste cycles checking the wait queue in jbd_unlock_bh_state when
> >  there would never be anything on it.  Heck, I figured why even have the
> >  wait queue wasting memory if it wasn't needed.  So that added the
> >  ifdeffery complexity.
>
> No, that code's just a problem.  For ranking reasons it's essentially doing
> this:
>
> repeat:
> 	cond_resched();
> 	spin_lock(j_list_lock);
> 	....
> 	if (!bit_spin_trylock(bh)) {
> 		spin_unlock(j_list_lock);
> 		schedule();
> 		goto repeat;
> 	}
>

Yep, that I understand.

> Now imagine that some other CPU holds the bit_spin_lock and is spinning,
> trying to get the spin_lock().  The above code assumes that the schedule()
> and cond_resched() will take "long enough" for the other CPU to get the
> spinlock, do its business then release the locks.
>
> So all the schedule() is really doing is "blow a few cycles so the other
> CPU can get in and grab the spinlock".  That'll work OK on normal SMP but I
> suspect that on NUMA setups with really big latencies we could end up
> starving the other CPU: this CPU would keep on grabbing the lock.  It
> depends on how the interconnect cache and all that goop works.
>
> So what to do?
>
> One approach would be to spin on the bit_spin_trylock after having dropped
> j_list_lock.  That'll tell us when the other CPU has moved on.
>

This is probably the best for mainline, since, as you mentioned, the
above code is just bad.

> Another approach would be to sleep on a waitqueue somewhere.  But that
> means that jbd_unlock_bh_state() needs to do wakeups all the time - costly.
>

That's the approach that my patch took.

> Another approach would be to simply whack an msleep(1) in there.  That
> might be OK - it should be very rare.
>

This approach is not much better than the current implementation.

> Probably the first approach would be the one to use.  That's for mainline.
> I don't know what the super-duper-RT fix would be.  Why did we start
> discussing this anyway?
>
> Oh, SCHED_FIFO.  kjournald doesn't run SCHED_FIFO, but someone may decide
> to make it do so.  But even then I don't see a problem for the mainline
> kernel, because this CPU's SCHED_FIFO doesn't stop the other CPU from
> running.
>

So this comes down to just a problem with Ingo's PREEMPT_RT.  This means
that the latency of kjournald, even without SCHED_FIFO, will be large. If
it preempts a process that holds one of these bit spinlocks (Ingo's RT
kernel takes out the preempt_disable in them), then the kjournald thread
will spin until its timeslice is used up, causing problems for other
processes.  Even a process with a higher priority than kjournald is
affected if it blocks on one of the other locks that kjournald can hold
while attempting to get the bit locks.

I know Ingo wants to get his patch eventually into the mainline without
too much drag. But this problem needs to be solved in the mainline to
accomplish this.

What do you recommend?

-- Steve



* Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks
  2005-03-18  6:58                                                                             ` Steven Rostedt
@ 2005-03-18 18:19                                                                               ` Lee Revell
  0 siblings, 0 replies; 125+ messages in thread
From: Lee Revell @ 2005-03-18 18:19 UTC (permalink / raw)
  To: rostedt; +Cc: Andrew Morton, mingo, linux-kernel

On Fri, 2005-03-18 at 01:58 -0500, Steven Rostedt wrote:
> 
> On Thu, 17 Mar 2005, Lee Revell wrote:
> >
> > OK, no need to cc: me on this one any more.  It's really low priority
> > IMO compared to the big latencies I am seeing with ext3 and
> > "data=ordered".  Unless you think there is any relation.
> >
> 
> IMO a deadlock is higher priority than a big latency :-)
> 

Of course, if I was hitting the deadlock in normal use.

> I still believe that something to do with the locking in ext3 has to do
> with your latencies, but I'll take you off when I send something to Andrew
> or Ingo next time. Hopefully, they'll do the same.

If you suspect they are related then yes I would like to be copied.

> 
> When this problem is solved on Ingo's side, maybe this will solve your
> latency problem, so I recommend that you keep trying the latest RT
> kernels.  BTW what test are you running that causes these latencies?

dbench 16

Lee



* Re: [patch] Real-Time Preemption, deactivate() scheduling issue
  2005-03-03 19:36         ` [patch] Real-Time Preemption, deactivate() scheduling issue Eugeny S. Mints
  2005-03-03 22:32           ` Esben Nielsen
@ 2005-03-29  8:45           ` Ingo Molnar
  1 sibling, 0 replies; 125+ messages in thread
From: Ingo Molnar @ 2005-03-29  8:45 UTC (permalink / raw)
  To: Eugeny S. Mints; +Cc: linux-kernel


* Eugeny S. Mints <emints@ru.mvista.com> wrote:

> please consider the following scenario for full RT kernel.
> 
> Task A is running when an irq occurs, which in turn wakes up an irq 
> related thread (B) of a higher priority than A.
> 
> my current understanding is that the actual context switch between A and 
> B will occur at preempt_schedule_irq() on the "return from irq" path.
> 
> in this case the following "if" statement in __schedule() always returns 
> false since  preempt_schedule_irq() always sets up  PREEMPT_ACTIVE 
> before __schedule() call.
> 
>         if ((prev->state & ~TASK_RUNNING_MUTEX) &&
>                         !(preempt_count() & PREEMPT_ACTIVE)) {
> 
> as a result deactivate() is never called for the preempted task A in this 
> scenario. But if task A is preempted while not in TASK_RUNNING state, 
> such behaviour seems incorrect, since we get a task that is not in 
> TASK_RUNNING state linked into a run queue.

this behavior is intentional: 'forced preemption' (of any sort, even in 
the upstream kernel's CONFIG_PREEMPT model) should not impact the task's 
state. So it does not modify p->state. [ The TASK_RUNNING_MUTEX state 
furthermore enables wakeups to occur in an invariant way: even though 
technically the tasks are on the runqueue, a 'normal' wakeup is still 
noticed and later on acted upon.]

this is very important for forced preemption to not impact the coding 
model of kernel code that is normally tested with !PREEMPT. (the 
TASK_RUNNING_MUTEX scheduler feature furthermore enables us to preempt 
without impacting wakeup logic.)

> An example:
> 
> drivers/net/irda/sir_dev.c: 76 (2.6.10 kernel)
> 
>         spin_lock_irqsave(&dev->tx_lock, flags); /* serialize with other tx operations */
>         while (dev->tx_buff.len > 0) {    /* wait until tx idle */
>                 spin_unlock_irqrestore(&dev->tx_lock, flags);
> 76:             set_current_state(TASK_UNINTERRUPTIBLE);
>                 schedule_timeout(msecs_to_jiffies(10));
>                 spin_lock_irqsave(&dev->tx_lock, flags);
>         }
> 
> At line 76 irqs are enabled and preemption is enabled.
> Let's assume task A executes this code and gets preempted right after 
> line 76. The task state is TASK_UNINTERRUPTIBLE but the task will not be 
> deactivated. Of course this is a bug in the use of set_current_state() 
> in this particular driver, but the scheduler should be robust against 
> such bugs, I believe. There are a lot of such bugs in the kernel, I 
> believe.

it is not a problem to have tasks with TASK_UNINTERRUPTIBLE on the 
runqueue - this happens every day with CONFIG_PREEMPT kernels, and it's 
fully intentional. Can you see any bugs caused by this behavior?

	Ingo
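
The behavior discussed in this message can be modeled in a few lines of
userspace C. This is only a sketch of the quoted "if" condition, not the
scheduler itself: the constant values, struct model_task and
model_schedule() are assumptions made for illustration, and the
on_runqueue flag stands in for the deactivate() call.

```c
#include <stdbool.h>

/* Illustrative values only; the real constants live in the kernel headers. */
#define TASK_RUNNING          0x00L
#define TASK_UNINTERRUPTIBLE  0x02L
#define TASK_RUNNING_MUTEX    0x80L          /* assumed bit for this sketch */
#define PREEMPT_ACTIVE        0x10000000UL   /* assumed bit for this sketch */

/* Hypothetical stand-in for the parts of a task that matter here. */
struct model_task {
    long state;
    bool on_runqueue;
};

/* Models only the condition quoted from __schedule(). */
static void model_schedule(struct model_task *prev, unsigned long preempt_count)
{
    if ((prev->state & ~TASK_RUNNING_MUTEX) &&
        !(preempt_count & PREEMPT_ACTIVE))
        prev->on_runqueue = false;   /* stands in for deactivate() */
    /* Otherwise prev stays queued and prev->state is left untouched. */
}
```

With state set to TASK_UNINTERRUPTIBLE, a call that passes PREEMPT_ACTIVE
(the forced-preemption path) leaves the task on the runqueue with its state
unchanged. A later call with a preempt count of 0 (the task reaching
schedule_timeout() by itself) takes it off the runqueue, so the sleep the
driver asked for still happens.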


end of thread, other threads:[~2005-03-29  8:51 UTC | newest]

Thread overview: 125+ messages
2005-02-04 10:03 [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 Ingo Molnar
2005-02-04 15:19 ` Kevin Hilman
2005-02-04 17:30   ` Ingo Molnar
2005-02-04 18:19 ` Tom Rini
2005-02-07  9:03   ` Ingo Molnar
2005-02-07 14:35     ` Tom Rini
2005-02-08  8:27       ` Ingo Molnar
2005-02-06  4:19 ` Valdis.Kletnieks
2005-02-07  9:21   ` Ingo Molnar
2005-02-07 15:08     ` Real-Time Preemption and UML? Esben Nielsen
2005-02-07 18:35       ` Jeff Dike
2005-02-07 23:14         ` Esben Nielsen
2005-02-08  8:39           ` Ingo Molnar
2005-02-08 18:55             ` Jeff Dike
2005-02-08 21:20               ` Esben Nielsen
2005-02-08 21:44                 ` Ingo Molnar
2005-02-08 23:02                   ` Esben Nielsen
2005-02-08  7:55 ` [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 Valdis.Kletnieks
2005-02-08  8:45   ` Ingo Molnar
2005-02-08 10:26     ` Valdis.Kletnieks
2005-02-08 21:58 ` William Weston
2005-02-09 11:51   ` Ingo Molnar
2005-02-10  2:13     ` William Weston
2005-02-10  7:52       ` Ingo Molnar
2005-02-10 20:21         ` George Anzinger
2005-02-10 20:40           ` Ingo Molnar
2005-02-10 21:05             ` George Anzinger
2005-02-11  8:34               ` Ingo Molnar
2005-02-11  9:38                 ` Sven Dietrich
2005-02-11  9:42                   ` Ingo Molnar
2005-02-11  0:09           ` Sven Dietrich
2005-02-11  6:01             ` George Anzinger
2005-02-11  8:28             ` Ingo Molnar
2005-02-11  9:53               ` Sven Dietrich
2005-02-11 10:04                 ` Ingo Molnar
2005-02-11 21:49                   ` Steven Rostedt
2005-02-13 12:59                     ` Ingo Molnar
2005-02-13 15:11                       ` Steven Rostedt
2005-03-03 19:36         ` [patch] Real-Time Preemption, deactivate() scheduling issue Eugeny S. Mints
2005-03-03 22:32           ` Esben Nielsen
2005-03-04 11:56             ` Eugeny S. Mints
2005-03-04 15:45               ` George Anzinger
2005-03-29  8:45           ` Ingo Molnar
2005-02-09 12:48   ` [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 Stephen Smalley
2005-02-10  2:20     ` William Weston
2005-02-19  5:08 ` Lee Revell
2005-02-19  6:47   ` Lee Revell
2005-02-19  9:00   ` Ingo Molnar
2005-02-19  9:03     ` Ingo Molnar
2005-02-19 20:45       ` Lee Revell
2005-02-20  0:19         ` Lee Revell
2005-03-17 16:33         ` Lee Revell
2005-02-23  2:22       ` Lee Revell
2005-03-10  9:37   ` Steven Rostedt
2005-03-10  9:54     ` Steven Rostedt
2005-03-11  9:57       ` Ingo Molnar
2005-03-11 10:15         ` Steven Rostedt
2005-03-11 10:17           ` Ingo Molnar
2005-03-11 10:24             ` Steven Rostedt
2005-03-11 10:43               ` Andrew Morton
2005-03-11 10:53                 ` Steven Rostedt
2005-03-11 14:40                 ` Steven Rostedt
2005-03-11 15:08                   ` Steven Rostedt
2005-03-11 15:30                     ` K.R. Foley
2005-03-11 15:38                   ` Ingo Molnar
2005-03-11 16:01                     ` Steven Rostedt
2005-03-11 20:39                     ` Steven Rostedt
2005-03-11 20:46                       ` Lee Revell
2005-03-11 22:06                         ` Lee Revell
2005-03-14  7:37                           ` Steven Rostedt
2005-03-14  9:33                             ` Steven Rostedt
2005-03-14 10:10                               ` Steven Rostedt
2005-03-14 15:50                                 ` Steven Rostedt
2005-03-14 19:02                                   ` Steven Rostedt
2005-03-15 11:44                                   ` Steven Rostedt
2005-03-15 12:00                                     ` Ingo Molnar
2005-03-15 13:07                                       ` Steven Rostedt
2005-03-15 13:35                                         ` Ingo Molnar
2005-03-15 13:55                                           ` Steven Rostedt
2005-03-15 19:12                                             ` Andrew Morton
2005-03-15 18:05                                           ` Steven Rostedt
2005-03-15 19:09                                             ` Lee Revell
2005-03-16  7:50                                               ` Steven Rostedt
2005-03-16 18:21                                                 ` Lee Revell
2005-03-16  7:31                                             ` Steven Rostedt
2005-03-16  8:50                                             ` Ingo Molnar
2005-03-16  9:15                                               ` Andrew Morton
2005-03-16  9:51                                                 ` [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks Ingo Molnar
2005-03-16  9:53                                                   ` [patch 1/3] j_state_lock -> j_state_sem Ingo Molnar
2005-03-16  9:53                                                     ` [patch 2/3] j_list_lock -> j_list_sem Ingo Molnar
2005-03-16  9:57                                                       ` [patch 3/3] remove bitlocks Ingo Molnar
2005-03-16 10:04                                                   ` [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks Andrew Morton
2005-03-16 10:12                                                     ` Ingo Molnar
2005-03-16 10:23                                                       ` Steven Rostedt
2005-03-16 10:26                                                         ` Ingo Molnar
2005-03-16 10:26                                                       ` Andrew Morton
2005-03-16 10:29                                                         ` Ingo Molnar
2005-03-16 10:41                                                           ` Andrew Morton
2005-03-16 10:34                                                         ` Arjan van de Ven
2005-03-16 10:19                                                     ` Ingo Molnar
2005-03-16 10:40                                                       ` Andrew Morton
2005-03-16 10:51                                                         ` Ingo Molnar
2005-03-16 11:05                                                         ` Steven Rostedt
2005-03-16 11:19                                                           ` Andrew Morton
2005-03-16 14:04                                                             ` Steven Rostedt
2005-03-16 16:47                                                               ` Steven Rostedt
2005-03-16 17:47                                                                 ` Steven Rostedt
2005-03-16 19:20                                                                   ` Lee Revell
2005-03-17  7:15                                                                     ` Steven Rostedt
2005-03-17 15:41                                                                       ` Lee Revell
2005-03-17 16:23                                                                         ` Steven Rostedt
2005-03-17 16:36                                                                           ` Lee Revell
2005-03-18  6:58                                                                             ` Steven Rostedt
2005-03-18 18:19                                                                               ` Lee Revell
2005-03-16 21:15                                                                   ` Andrew Morton
2005-03-17  9:21                                                                     ` Steven Rostedt
2005-03-18  9:23                                                                       ` [PATCH] remove lame schedule in journal inverted_lock (was: Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks) Steven Rostedt
2005-03-18  9:32                                                                         ` Andrew Morton
2005-03-18 10:38                                                                           ` Steven Rostedt
2005-03-18 11:07                                                                             ` Andrew Morton
2005-03-18 12:10                                                                               ` Steven Rostedt
2005-03-17  9:58                                                                   ` [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks Steven Rostedt
2005-03-11  9:28 ` [patch] Real-Time Preemption, -RT-2.6.11-final-V0.7.40-00 Ingo Molnar
2005-03-11 12:10   ` Andrew Walrond
2005-03-14 20:19     ` Tom Rini
