Linux 0.01 disk lockup

Message ID Pine.LNX.3.96.1010927150812.28147B-100000@artax.karlin.mff.cuni.cz
State New, archived

Commit Message

Mikulas Patocka Sept. 27, 2001, 1:34 p.m. UTC
Hi.

Linux 0.01 has a bug in disk request sorting - when an interrupt happens
while sorting is active, the interrupt routine won't clear do_hd - thus
the disk will stay locked up forever.

Function add_request also lacks memory barriers - the compiler could
reorder writes to the variable sorting and writes to the request queue -
producing race conditions. Because gcc 1.40 does not have
__asm__("":::"memory"), I had to use a dummy function call as a memory
barrier.

The following patch fixes both issues.

Mikulas


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Comments

Arnaldo Carvalho de Melo Sept. 27, 2001, 1:47 p.m. UTC | #1
Em Thu, Sep 27, 2001 at 03:34:11PM +0200, Mikulas Patocka escreveu:
> Linux 0.01 has a bug in disk request sorting - when interrupt happens
> while sorting is active, the interrupt routine won't clear do_hd - thus
> the disk will stay locked up forever. 
> 
> Function add_request also lacks memory barriers - the compiler could
> reorder writes to variable sorting and writes to request queue - producing
> race conditions. Because gcc 1.40 does not have __asm__("":::"memory"), I
> had to use dummy function call as a memory barrier. 
> 
> The following patch fixes both issues.

Fantastic! who is the maintainer for the 0.x kernel series these days? I
thought that 2.0 was Dave W., 2.2 was Alan, 2.4 Linus, so now we have to
find people for 1.2 and finally get 1.2.14 released, man, how I wanted one
with the dynamic PPP code in back in those days... 8)

- Arnaldo
Richard Gooch Sept. 27, 2001, 3:12 p.m. UTC | #2
Mikulas Patocka writes:
> Linux 0.01 has a bug in disk request sorting - when interrupt
> happens while sorting is active, the interrupt routine won't clear
> do_hd - thus the disk will stay locked up forever.

Er, why bother to fix bugs in such an ancient kernel, rather than
upgrading to a more modern kernel (like 0.98:-)? It's like finding a
bug in 2.3.30 and fixing it rather than grabbing 2.4.10 and seeing if
the problem persists.

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca
Linus Torvalds Sept. 27, 2001, 3:27 p.m. UTC | #3
On Thu, 27 Sep 2001, Mikulas Patocka wrote:
>
> Linux 0.01 has a bug in disk request sorting - when interrupt happens
> while sorting is active, the interrupt routine won't clear do_hd - thus
> the disk will stay locked up forever.

Ehh..

Mikulas, do you want to be the official maintainer for the 0.01.xxx
series?

Note that much of the maintenance work is probably just to reproduce and
make all the user-level etc infrastructure available..

		Linus

Mikulas Patocka Sept. 27, 2001, 4:07 p.m. UTC | #4
> > Linux 0.01 has a bug in disk request sorting - when interrupt
> > happens while sorting is active, the interrupt routine won't clear
> > do_hd - thus the disk will stay locked up forever.
> 
> Er, why bother to fix bugs in such an ancient kernel, rather than
> upgrading to a more modern kernel (like 0.98:-)? It's like finding a
> bug in 2.3.30 and fixing it rather than grabbing 2.4.10 and seeing if
> the problem persists.

Well - why not? The disk interrupt locking algorithm in 0.01 is beautiful
(except for the bug - but it can be fixed). It's something you don't see
in 2.4.10 with __cli, __sti, __save_flags, __restore_flags everywhere. So
why not post a bug report and patch for the 10th anniversary of Linux?

Mikulas




Mikulas Patocka Sept. 27, 2001, 4:08 p.m. UTC | #5
> > Linux 0.01 has a bug in disk request sorting - when interrupt happens
> > while sorting is active, the interrupt routine won't clear do_hd - thus
> > the disk will stay locked up forever.
> 
> Ehh..
> 
> Mikulas, do you want to be the official maintainer for the 0.01.xxx
> series?
> 
> Note that much of the maintenance work is probably just to reproduce and
> make all the user-level etc infrastructure available..

It would be cool to have linux-0.01 distribution. I started to use linux
in 2.0 times, so I'm probably not the right person to maintain it. I don't
even know where to get programs for it and I doubt it would work on my 4G
disk.

Mikulas




Rob Landley Sept. 29, 2001, 9:16 p.m. UTC | #6
On Thursday 27 September 2001 12:08, Mikulas Patocka wrote:
> > > Linux 0.01 has a bug in disk request sorting - when interrupt happens
> > > while sorting is active, the interrupt routine won't clear do_hd - thus
> > > the disk will stay locked up forever.
> >
> > Ehh..
> >
> > Mikulas, do you want to be the official maintainer for the 0.01.xxx
> > series?
> >
> > Note that much of the maintenance work is probably just to reproduce and
> > make all the user-level etc infrastructure available..
>
> It would be cool to have linux-0.01 distribution. I started to use linux
> in 2.0 times, so I'm probably not the right person to maintain it. I don't
> even know where to get programs for it and I doubt it would work on my 4G
> disk.
>
> Mikulas

You might want to read the mailing list entries from 1991 and early 1992:

http://www.kclug.org/old_archives/linux-activists/

I've put together a summary of some of the more interesting early posts from 
1991 and early 1992 for the computer history book I'm writing...

http://penguicon.sourceforge.net/comphist/1991.html

http://penguicon.sourceforge.net/comphist/1992.html

Rob

Patch

diff -u -r linux-orig/kernel/hd.c linux/kernel/hd.c
--- linux-orig/kernel/hd.c	Tue Sep 17 17:05:21 1991
+++ linux/kernel/hd.c	Thu Sep 27 15:14:38 2001
@@ -55,9 +55,9 @@ 
 ((s1)->head<(s2)->head || (s1)->head==(s2)->head && \
 ((s1)->sector<(s2)->sector))))
 
-static struct hd_request * this_request = NULL;
+struct hd_request * this_request = NULL;
 
-static int sorting=0;
+int sorting=0;
 
 static void do_request(void);
 static void reset_controller(void);
@@ -293,8 +293,10 @@ 
 {
 	int i,r;
 
-	if (sorting)
+	if (sorting) {
+		do_hd=NULL;
 		return;
+	}
 	if (!this_request) {
 		do_hd=NULL;
 		return;
@@ -319,6 +321,8 @@ 
 		panic("unknown hd-command");
 }
 
+void barrier();
+
 /*
  * add-request adds a request to the linked list.
  * It sets the 'sorting'-variable when doing something
@@ -338,6 +342,7 @@ 
  * disabling interrupts.
  */
 	sorting=1;
+	barrier();
 	if (!(tmp=this_request))
 		this_request=req;
 	else {
@@ -354,15 +359,19 @@ 
 			tmp->next=req;
 		}
 	}
+	barrier();
 	sorting=0;
+	barrier();
 /*
  * NOTE! As a result of sorting, the interrupts may have died down,
  * as they aren't redone due to locking with sorting=1. They might
  * also never have started, if this is the first request in the queue,
  * so we restart them if necessary.
  */
-	if (!do_hd)
+	if (!do_hd) {
+		barrier();
 		do_request();
+	}
 }
 
 void rw_abs_hd(int rw,unsigned int nr,unsigned int sec,unsigned int head,
diff -u -r linux-orig/kernel/system_call.s linux/kernel/system_call.s
--- linux-orig/kernel/system_call.s	Tue Sep 17 17:50:52 1991
+++ linux/kernel/system_call.s	Thu Sep 27 14:59:37 2001
@@ -47,7 +47,7 @@ 
 
 nr_system_calls = 67
 
-.globl _system_call,_sys_fork,_timer_interrupt,_hd_interrupt,_sys_execve
+.globl _system_call,_sys_fork,_timer_interrupt,_hd_interrupt,_sys_execve,_barrier
 
 .align 2
 bad_sys_call:
@@ -186,6 +186,9 @@ 
 	call _copy_process
 	addl $20,%esp
 1:	ret
+
+_barrier:
+	ret
 
 _hd_interrupt:
 	pushl %eax