entry.S mess with %ebx (&current pointer)

Message ID Pine.LNX.3.96.981119112030.27027A-100000@dragon.bogus
State New, archived
Headers show
Series
  • entry.S mess with %ebx (&current pointer)
Related show

Commit Message

Andrea Arcangeli Nov. 19, 1998, 10:45 a.m. UTC
On Tue, 17 Nov 1998, Andrea Arcangeli wrote:

>I run the fork-test on 2.1.127 + patch IKD. It never hanged in two hours
>(under any kind of load). It was used to hang in arca-22 after some
>minutes.  So the bug is hided very well by the IKD tracing slowdown :-(.

I am sure that the kernel run do_signal() from signal_return, even if
sigpending is 0.  This mean that %ebx got corrupted somewhere (and this
explain very well also the need_resched flood I suspected some days ago).
A patch for 2.1.128 that fix the UP hang fine here is this: 



The only important thing is the added GET_CURRENT(). All other parts are
only strightforward and sensible performance optimizations. Note that
2.1.128 was used to run a do_signal() without any signal pending many many
times so fixing this bug should improve performance a lot (plus the
improvement I made replacing all `call(function)' with `pushl retaddress;
jmp function'. 

I had to think a bit more before say that the bug is due a gcc
miscompilation. Are we sure that %ebx is supposed to been not clobbered
across functions? Could be the somewhere used regparm attribute that
caused the miscompilation? This would explain also because why the IDK
kernel is immune from the stall. In the IKD kernel we #define FASTCALL(x) 
x, and I hacked ~all function in the kernel to handle the x86 normal
calling conventions (btw, in __switch_to() I used the
__attribute__((stdcall)) to avoid having to release the stack by hand in
the asm and to avoid having to use `call' in the calling asm :-).

I suggest everybody on x86 to try this patch also in SMP machines that was
used to hang frequently... Now _everything_ works __fine__ here on UP
with an UP kernel. I don' t know right now why the SMP kernel on UP
hardware was not reproducing the problem... If there will be further
signal problems let me know...

Andrea Arcangeli

PS. This patch and a lot of other things will be in the soon arca-23.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/

Comments

Linus Torvalds Nov. 19, 1998, 6:01 p.m. UTC | #1
On Thu, 19 Nov 1998, Andrea Arcangeli wrote:
> 
> I am sure that the kernel run do_signal() from signal_return, even if
> sigpending is 0.  This mean that %ebx got corrupted somewhere (and this
> explain very well also the need_resched flood I suspected some days ago).
> A patch for 2.1.128 that fix the UP hang fine here is this: 

Good debugging, but the fix is incorrect (or at least unnecessarily slow).

I see where the problem is, and I also see why it _only_ happens on UP.

The problem is that the fork return point to the new process does not
initialize %ebx in the UP case.

It _does_ initialize it in the SMP case, and this is basically just an
oversight. 

This patch should fix it properly, please tell me whether that is true..

		Linus

-----
diff -u --recursive --new-file v2.1.129/linux/arch/i386/kernel/entry.S linux/arch/i386/kernel/entry.S
--- v2.1.129/linux/arch/i386/kernel/entry.S	Sun Nov  8 14:02:42 1998
+++ linux/arch/i386/kernel/entry.S	Thu Nov 19 09:53:08 1998
@@ -150,14 +150,14 @@
 	jmp ret_from_sys_call
 
 
-#ifdef __SMP__
 	ALIGN
 	.globl	ret_from_smpfork
-ret_from_smpfork:
+ret_from_fork:
 	GET_CURRENT(%ebx)
+#ifdef __SMP__
 	btrl	$0, SYMBOL_NAME(scheduler_lock)
-	jmp	ret_from_sys_call
 #endif /* __SMP__ */
+	jmp	ret_from_sys_call
 
 /*
  * Return to user mode is not as complex as all this looks,
diff -u --recursive --new-file v2.1.129/linux/arch/i386/kernel/process.c linux/arch/i386/kernel/process.c
--- v2.1.129/linux/arch/i386/kernel/process.c	Fri Oct  9 13:27:05 1998
+++ linux/arch/i386/kernel/process.c	Thu Nov 19 09:53:35 1998
@@ -50,11 +50,7 @@
 
 spinlock_t semaphore_wake_lock = SPIN_LOCK_UNLOCKED;
 
-#ifdef __SMP__
-asmlinkage void ret_from_fork(void) __asm__("ret_from_smpfork");
-#else
-asmlinkage void ret_from_fork(void) __asm__("ret_from_sys_call");
-#endif
+asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");
 
 #ifdef CONFIG_APM
 extern int  apm_do_idle(void);



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/
Linus Torvalds Nov. 19, 1998, 6:08 p.m. UTC | #2
On Thu, 19 Nov 1998, Linus Torvalds wrote:
> 
> This patch should fix it properly, please tell me whether that is true..

Duh, the ".globl" declaration should also obviously be fixed to be
ret_from_fork rather than ret_from_smpfork in order for this to link..

		Linus

-----
> diff -u --recursive --new-file v2.1.129/linux/arch/i386/kernel/entry.S linux/arch/i386/kernel/entry.S
> --- v2.1.129/linux/arch/i386/kernel/entry.S	Sun Nov  8 14:02:42 1998
> +++ linux/arch/i386/kernel/entry.S	Thu Nov 19 09:53:08 1998
> @@ -150,14 +150,14 @@
>  	jmp ret_from_sys_call
>  
>  
> -#ifdef __SMP__
>  	ALIGN
>  	.globl	ret_from_smpfork
		^^^^^^^^^^^^^^^^


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/
Andrea Arcangeli Nov. 19, 1998, 7:22 p.m. UTC | #3
On Thu, 19 Nov 1998, Linus Torvalds wrote:

>Good debugging, but the fix is incorrect (or at least unnecessarily slow).

Of course. I posted the working fix (that looked sane to me) because it
was late I had the train pending. I was going to discover _which_ syscall
was clobbering %ebx now, one second before to read your solution ;-). 

>This patch should fix it properly, please tell me whether that is true..

This allow the linking:

Index: arch/i386/kernel/entry.S
===================================================================
RCS file: /var/cvs/linux/arch/i386/kernel/entry.S,v
retrieving revision 1.1.1.1.16.3
diff -u -r1.1.1.1.16.3 entry.S
--- entry.S	1998/11/19 19:01:50	1.1.1.1.16.3
+++ entry.S	1998/11/19 19:16:33
@@ -151,7 +151,7 @@
 
 
 	ALIGN
-	.globl	ret_from_smpfork
+	.globl	ret_from_fork
 ret_from_fork:
 	GET_CURRENT(%ebx)
 #ifdef __SMP__


I have not yet rebooted but it' s sure right because the stall was caused
by the new forked processes a bit before return in userspace (infact this
morning I just verifyed that the asm of sys_fork() was saving %ebx right).
If there will be problems I' ll let you know... 


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/

Patch

Index: linux/arch/i386/kernel/entry.S
===================================================================
RCS file: /var/cvs/linux/arch/i386/kernel/entry.S,v
retrieving revision 1.1.1.1.16.1
diff -u -r1.1.1.1.16.1 entry.S
--- linux/arch/i386/kernel/entry.S	1998/10/31 00:35:56	1.1.1.1.16.1
+++ entry.S	1998/11/19 10:08:16
@@ -184,6 +184,7 @@ 
 	andl SYMBOL_NAME(bh_active),%eax
 	jne handle_bottom_half
 ret_with_reschedule:
+	GET_CURRENT(%ebx)
 	cmpl $0,need_resched(%ebx)
 	jne reschedule
 	cmpl $0,sigpending(%ebx)
@@ -197,16 +198,16 @@ 
 	movl %esp,%eax
 	jne v86_signal_return
 	xorl %edx,%edx
-	call SYMBOL_NAME(do_signal)
-	jmp restore_all
+	pushl $restore_all
+	jmp SYMBOL_NAME(do_signal)
 
 	ALIGN
 v86_signal_return:
 	call SYMBOL_NAME(save_v86_state)
 	movl %eax,%esp
 	xorl %edx,%edx
-	call SYMBOL_NAME(do_signal)
-	jmp restore_all
+	pushl $restore_all
+	jmp SYMBOL_NAME(do_signal)
 
 	ALIGN
 tracesys:
@@ -215,8 +216,8 @@ 
 	movl ORIG_EAX(%esp),%eax
 	call *SYMBOL_NAME(sys_call_table)(,%eax,4)
 	movl %eax,EAX(%esp)		# save the return value
-	call SYMBOL_NAME(syscall_trace)
-	jmp ret_from_sys_call
+	pushl $ret_from_sys_call
+	jmp SYMBOL_NAME(syscall_trace)
 badsys:
 	movl $-ENOSYS,EAX(%esp)
 	jmp ret_from_sys_call