linux-kernel.vger.kernel.org archive mirror
* [2.4.21]: nbd ksymoops-report
@ 2003-08-07 14:04 Bernd Schubert
  2003-08-07 14:46 ` Lou Langholtz
  2003-08-07 16:53 ` Paul Clements
  0 siblings, 2 replies; 9+ messages in thread
From: Bernd Schubert @ 2003-08-07 14:04 UTC (permalink / raw)
  To: linux-kernel

Hi,

Every time nbd-client disconnects an nbd device, the oops decoded below
occurs.
This has only happened since we upgraded from 2.4.20 to 2.4.21,
so I guess the update backported from 2.5.50 causes it.
Since the changelog for 2.4.22-rc1 doesn't describe any updates to nbd,
I think this also applies to that kernel version. I will check this
later this evening.

ksymoops 2.4.8 on i686 2.4.21-tc2.  Options used
     -v /usr/src/System.maps/vmlinux__2.4.21-tc2 (specified)
     -k /proc/ksyms (specified)
     -l /proc/modules (default)
     -o /lib/modules/2.4.21-tc2/ (default)
     -m /usr/src/System.maps/System.map__2.4.21-tc2 (specified)

Aug  6 17:24:31 goedel kernel: d89e2be7
Aug  6 17:24:31 goedel kernel: Oops: 0000
Aug  6 17:24:31 goedel kernel: CPU:    0
Aug  6 17:24:31 goedel kernel: EIP:    1010:[<d89e2be7>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
Aug  6 17:24:31 goedel kernel: EFLAGS: 00010282
Aug  6 17:24:31 goedel kernel: eax: 00000000   ebx: d89e43c4   ecx: 00000001   edx: 00000001
Aug  6 17:24:31 goedel kernel: esi: 00000000   edi: d89e43a0   ebp: 00000000   esp: d61a5f14
Aug  6 17:24:31 goedel kernel: ds: 1018   es: 1018   ss: 1018
Aug  6 17:24:31 goedel kernel: Process nbd-client (pid: 650, stackpage=d61a5000)
Aug  6 17:24:31 goedel kernel: Stack: d89e367c d4cd56e0 00000400 0000ab03 ffffffe7 00000000 d61a4000 d7fe44fc
Aug  6 17:24:31 goedel kernel:        d61a4000 00098c93 00098c94 00030002 00098c96 00098c97 00098d55 00098d56 
Aug  6 17:24:31 goedel kernel:        00098d57 00098d58 00098d59 00098d5a 00098d5b 00098d5c 00098d5d 00098e1b
Aug  6 17:24:31 goedel kernel: Call Trace:    [<d89e367c>] [<c0143f94>] [<c014c157>] [<c010a013>]
Aug  6 17:24:31 goedel kernel: Code: 8b 50 08 6a 03 50 8b 42 28 ff d0 c7 86 ac 43 9e d8 00 00 00


>>EIP; d89e2be7 <[nbd]nbd_ioctl+353/480>   <=====

>>ebx; d89e43c4 <[nbd].data.end+a4d/96e9>
>>edi; d89e43a0 <[nbd].data.end+a29/96e9>
>>esp; d61a5f14 <_end+15e07790/185558dc>

Trace; d89e367c <[nbd]__module_license+5db/78b>
Trace; c0143f94 <blkdev_ioctl+28/34>
Trace; c014c157 <sys_ioctl+1bb/1f7>
Trace; c010a013 <system_call+33/40>

Code;  d89e2be7 <[nbd]nbd_ioctl+353/480>
00000000 <_EIP>:
Code;  d89e2be7 <[nbd]nbd_ioctl+353/480>   <=====
   0:   8b 50 08                  mov    0x8(%eax),%edx   <=====
Code;  d89e2bea <[nbd]nbd_ioctl+356/480>
   3:   6a 03                     push   $0x3
Code;  d89e2bec <[nbd]nbd_ioctl+358/480>
   5:   50                        push   %eax
Code;  d89e2bed <[nbd]nbd_ioctl+359/480>
   6:   8b 42 28                  mov    0x28(%edx),%eax
Code;  d89e2bf0 <[nbd]nbd_ioctl+35c/480>
   9:   ff d0                     call   *%eax
Code;  d89e2bf2 <[nbd]nbd_ioctl+35e/480>
   b:   c7 86 ac 43 9e d8 00      movl   $0x0,0xd89e43ac(%esi)
Code;  d89e2bf9 <[nbd]nbd_ioctl+365/480>
  12:   00 00 00
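
A note on reading the annotations above: in `nbd_ioctl+353/480`, the first hex number is the offset of the address into the symbol and the second is the symbol's size. The arithmetic can be sketched in C as follows; the helper names are invented for illustration and are not part of ksymoops.

```c
#include <stdint.h>

/* Hypothetical helper (not part of ksymoops): recover the start address
 * of a symbol from an annotated address such as
 *   d89e2be7 <[nbd]nbd_ioctl+353/480>
 * where 0x353 is the offset into the symbol and 0x480 is its size. */
static uint32_t symbol_start(uint32_t addr, uint32_t offset)
{
    return addr - offset;
}

/* Returns 1 if addr lies inside [start, start + size), else 0. */
static int addr_in_symbol(uint32_t addr, uint32_t start, uint32_t size)
{
    return addr >= start && addr - start < size;
}
```

With the values from this report, `symbol_start(0xd89e2be7, 0x353)` and `symbol_start(0xd89e2bf2, 0x35e)` both resolve to the same base, which is a quick consistency check that the `Code;` lines belong to one function.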



-- 
Bernd Schubert
Physikalisch Chemisches Institut / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg
e-mail: bernd.schubert@pci.uni-heidelberg.de


* Re: [2.4.21]: nbd ksymoops-report
  2003-08-07 14:04 [2.4.21]: nbd ksymoops-report Bernd Schubert
@ 2003-08-07 14:46 ` Lou Langholtz
  2003-08-07 16:53 ` Paul Clements
  1 sibling, 0 replies; 9+ messages in thread
From: Lou Langholtz @ 2003-08-07 14:46 UTC (permalink / raw)
  To: Bernd Schubert; +Cc: linux-kernel, Paul Clements

Bernd Schubert wrote:

>Hi,
>
>every time when nbd-client disconnects a nbd-device the decoded oops 
>from below will happen. 
>This only happens after we upgraded from 2.4.20 to 2.4.21, 
>so I guess the backported update from 2.5.50 causes this. 
>Since the changelog for 2.4.22-rc1 doesn't describe any updates to nbd, 
>I think this will be also valid for this kernel version. I will check this 
>later on this evening.
>  
>
>. . .
>  
>
I've also seen oopses on nbd disconnect in 2.4 (using the nbd driver
distributed with the standard kernel) when some blocks were still being
flushed. I don't know of any backports to 2.4 of the nbd fixes I've been
introducing into the 2.5+ kernels, and I have no idea what could have
changed between 2.4.20 and 2.4.21 to cause the difference you've seen
(unless you simply never tried a disconnect while blocks still had to be
flushed). But many of the nbd fixes going into 2.5+ could very well close
races and eliminate oopses in 2.4 as well. Giving these fixes exposure in
the 2.5+ kernels first has made sense, since those kernels aren't expected
to be as stable and changes can be tested there more acceptably, but at
some point back-porting starts to make sense too. Are we at that point
yet? I don't know. Paul Clements is now the NBD maintainer. We should see
what he says (I've CC'd him on this email).

Stay in touch.



* Re: [2.4.21]: nbd ksymoops-report
  2003-08-07 14:04 [2.4.21]: nbd ksymoops-report Bernd Schubert
  2003-08-07 14:46 ` Lou Langholtz
@ 2003-08-07 16:53 ` Paul Clements
  2003-08-07 17:34   ` Paul Clements
  2003-08-07 17:40   ` Lou Langholtz
  1 sibling, 2 replies; 9+ messages in thread
From: Paul Clements @ 2003-08-07 16:53 UTC (permalink / raw)
  To: Bernd Schubert; +Cc: linux-kernel

On Thu, 7 Aug 2003, Bernd Schubert wrote:

> every time when nbd-client disconnects a nbd-device the decoded oops 
> from below will happen. 
> This only happens after we upgraded from 2.4.20 to 2.4.21, 
> so I guess the backported update from 2.5.50 causes this. 

Yes, it's definitely related to this...


> Aug  6 17:24:31 goedel kernel: Process nbd-client (pid: 650, stackpage=d61a5000)

Are you using the v2.0 nbd-client from nbd.sf.net?


> Code;  d89e2be7 <[nbd]nbd_ioctl+353/480>
> 00000000 <_EIP>:
> Code;  d89e2be7 <[nbd]nbd_ioctl+353/480>   <=====
>    0:   8b 50 08                  mov    0x8(%eax),%edx   <=====
> Code;  d89e2bea <[nbd]nbd_ioctl+356/480>
>    3:   6a 03                     push   $0x3
> Code;  d89e2bec <[nbd]nbd_ioctl+358/480>
>    5:   50                        push   %eax
> Code;  d89e2bed <[nbd]nbd_ioctl+359/480>
>    6:   8b 42 28                  mov    0x28(%edx),%eax
> Code;  d89e2bf0 <[nbd]nbd_ioctl+35c/480>
>    9:   ff d0                     call   *%eax


This corresponds to the following source:

lo->sock->ops->shutdown(lo->sock, SEND_SHUTDOWN|RCV_SHUTDOWN);

Somehow, lo->sock is NULL here. The only way I see that this could
happen is if NBD_CLEAR_SOCK got called out of order (or you're 
using some non-standard nbd-client).
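
To see why the disassembly points at a NULL lo->sock, it helps to line the two loads up against structure offsets. The sketch below is an assumption-laden mock: the field order is taken from memory of the 2.4-era include/linux/net.h, not copied from kernel headers, and every pointer is modelled as a 32-bit integer so the offsets match i386 on any host. Under those assumptions, `mov 0x8(%eax),%edx` loads sock->ops and `mov 0x28(%edx),%eax` loads ops->shutdown; the `push $0x3` is SEND_SHUTDOWN|RCV_SHUTDOWN. With eax = 00000000 in the register dump, the first load reads address 0x8 and faults.

```c
#include <stddef.h>
#include <stdint.h>

/* Mock 32-bit layouts. ASSUMPTION: field order follows a recollection of
 * 2.4-era include/linux/net.h; these are NOT the kernel's definitions.
 * Pointers are modelled as uint32_t so offsets match i386 on any host. */
struct mock_socket32 {
    uint32_t state;   /* socket_state enum */
    uint32_t flags;   /* unsigned long */
    uint32_t ops;     /* struct proto_ops * */
};

struct mock_proto_ops32 {
    uint32_t family;
    uint32_t release, bind, connect, socketpair, accept;
    uint32_t getname, poll, ioctl, listen;
    uint32_t shutdown;
};

/* Address read by "mov 0x8(%eax),%edx" for a given socket pointer. */
static uint32_t ops_load_address(uint32_t sock_ptr)
{
    return sock_ptr + (uint32_t)offsetof(struct mock_socket32, ops);
}
```

If the real structures differ from this mock, the offsets would not line up, so treat the match as supporting evidence rather than proof.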

I guess it would be best to protect the NULLing of lo->sock 
in NBD_CLEAR_SOCK just in case, anyway.

Would you be willing to test a patch against 2.4.21?

--
Paul



* Re: [2.4.21]: nbd ksymoops-report
  2003-08-07 16:53 ` Paul Clements
@ 2003-08-07 17:34   ` Paul Clements
  2003-08-07 18:40     ` Bernd Schubert
  2003-08-07 22:25     ` Paul Clements
  2003-08-07 17:40   ` Lou Langholtz
  1 sibling, 2 replies; 9+ messages in thread
From: Paul Clements @ 2003-08-07 17:34 UTC (permalink / raw)
  To: Bernd Schubert, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 831 bytes --]

Paul Clements wrote:
> 
> On Thu, 7 Aug 2003, Bernd Schubert wrote:
> 
> > every time when nbd-client disconnects a nbd-device the decoded oops
> > from below will happen.
> > This only happens after we upgraded from 2.4.20 to 2.4.21,
> > so I guess the backported update from 2.5.50 causes this.

[snip]
 
> This corresponds to the following source:
> 
> lo->sock->ops->shutdown(lo->sock, SEND_SHUTDOWN|RCV_SHUTDOWN);
> 
> Somehow, lo->sock is NULL here. The only way I see that this could

Alright, looking back over the nbd-client source I now see what's going
on. You're calling "nbd-client -d" to manually disconnect?


> Would you be willing to test a patch against 2.4.21?

If you're willing to test the attached patch, I'd be grateful. Otherwise
I'll test it in the next few days and forward on to Marcelo...


Thanks,
Paul

[-- Attachment #2: nbd_sock_null_race_fix_2_4_21.diff --]
[-- Type: text/x-diff, Size: 1099 bytes --]

--- linux-2.4.21-PRISTINE/drivers/block/nbd.c	2003-06-13 10:51:32.000000000 -0400
+++ linux-2.4.21/drivers/block/nbd.c	2003-08-07 13:24:48.000000000 -0400
@@ -428,23 +428,24 @@ static int nbd_ioctl(struct inode *inode
                 return 0 ;
  
 	case NBD_CLEAR_SOCK:
+		error = 0;
+		down(&lo->tx_lock);
+		lo->sock = NULL;
+		up(&lo->tx_lock);
+		spin_lock(&lo->queue_lock);
+		file = lo->file;
+		lo->file = NULL;
+		spin_unlock(&lo->queue_lock);
 		nbd_clear_que(lo);
 		spin_lock(&lo->queue_lock);
 		if (!list_empty(&lo->queue_head)) {
-			spin_unlock(&lo->queue_lock);
-			printk(KERN_ERR "nbd: Some requests are in progress -> can not turn off.\n");
-			return -EBUSY;
+			printk(KERN_ERR "nbd: disconnect: some requests are in progress -> please try again.\n");
+			error = -EBUSY;
 		}
-		file = lo->file;
-		if (!file) {
-			spin_unlock(&lo->queue_lock);
-			return -EINVAL;
-		}
-		lo->file = NULL;
-		lo->sock = NULL;
 		spin_unlock(&lo->queue_lock);
-		fput(file);
-		return 0;
+		if (file)
+			fput(file);
+		return error;
 	case NBD_SET_SOCK:
 		if (lo->file)
 			return -EBUSY;
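
The control flow the NBD_CLEAR_SOCK hunk moves to (detach the pointers first, remember the error, release the file on every path) can be sketched in user space. Everything below is a mock with illustrative names; locking and the nbd_clear_que() call are left out because the sketch is single-threaded.

```c
#include <errno.h>
#include <stddef.h>

/* User-space mock; names are illustrative, not the driver's real types. */
struct mock_file { int refs; };

struct mock_dev {
    struct mock_file *file;
    void *sock;
    int queued;      /* stand-in for !list_empty(&lo->queue_head) */
};

static void mock_fput(struct mock_file *f) { f->refs--; }

/* Mirrors the shape of the patched NBD_CLEAR_SOCK: detach the pointers,
 * record any error, and drop the file reference regardless of the error. */
static int mock_clear_sock(struct mock_dev *dev)
{
    int error = 0;
    struct mock_file *file;

    dev->sock = NULL;          /* done under tx_lock in the patch */
    file = dev->file;          /* done under queue_lock in the patch */
    dev->file = NULL;

    if (dev->queued)
        error = -EBUSY;        /* reported, but cleanup still happens */

    if (file)
        mock_fput(file);
    return error;
}
```

Two behaviour changes are visible in the diff and reproduced here: a busy queue no longer prevents the file from being released, and a call with no file attached returns the queue status instead of -EINVAL.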


* Re: [2.4.21]: nbd ksymoops-report
  2003-08-07 16:53 ` Paul Clements
  2003-08-07 17:34   ` Paul Clements
@ 2003-08-07 17:40   ` Lou Langholtz
  1 sibling, 0 replies; 9+ messages in thread
From: Lou Langholtz @ 2003-08-07 17:40 UTC (permalink / raw)
  To: Paul.Clements; +Cc: Bernd Schubert, linux-kernel

Paul Clements wrote:

>On Thu, 7 Aug 2003, Bernd Schubert wrote:
>
>  
>
>>every time when nbd-client disconnects a nbd-device the decoded oops 
>>from below will happen. 
>>This only happens after we upgraded from 2.4.20 to 2.4.21, 
>>so I guess the backported update from 2.5.50 causes this. 
>>    
>>
>
>Yes, it's definitely related to this...
>
>
>  
>
>>Aug  6 17:24:31 goedel kernel: Process nbd-client (pid: 650, stackpage=d61a5000)
>>    
>>
>
>Are you using the v2.0 nbd-client from nbd.sf.net?
>
>
>  
>
>>Code;  d89e2be7 <[nbd]nbd_ioctl+353/480>
>>00000000 <_EIP>:
>>Code;  d89e2be7 <[nbd]nbd_ioctl+353/480>   <=====
>>   0:   8b 50 08                  mov    0x8(%eax),%edx   <=====
>>Code;  d89e2bea <[nbd]nbd_ioctl+356/480>
>>   3:   6a 03                     push   $0x3
>>Code;  d89e2bec <[nbd]nbd_ioctl+358/480>
>>   5:   50                        push   %eax
>>Code;  d89e2bed <[nbd]nbd_ioctl+359/480>
>>   6:   8b 42 28                  mov    0x28(%edx),%eax
>>Code;  d89e2bf0 <[nbd]nbd_ioctl+35c/480>
>>   9:   ff d0                     call   *%eax
>>    
>>
>
>
>This corresponds to the following source:
>
>lo->sock->ops->shutdown(lo->sock, SEND_SHUTDOWN|RCV_SHUTDOWN);
>
>Somehow, lo->sock is NULL here. The only way I see that this could
>happen is if NBD_CLEAR_SOCK got called out of order (or you're 
>using some non-standard nbd-client).
>
The out-of-order problem is due to "nbd-client -d" (the disconnect 
thread) winning a race with "nbd-client" and setting sock = NULL after 
nbd_do_it returned and before NBD_DO_IT gets into its down'd region and 
calls shutdown. This was the hazardous race that I was having a hard 
time remembering and explaining before; it needs locking as well.
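
The losing interleaving can be replayed deterministically in user space. This is a single-threaded mock with illustrative names, not driver code: the unguarded tail returns -1 at the point where the 2.4.21 driver would dereference NULL, and the guarded tail shows one possible NULL check made inside the locked region.

```c
#include <stddef.h>

/* User-space mock of the two ioctl paths; all names are illustrative. */
struct mock_sock { int shutdown_calls; };
struct mock_nbd  { struct mock_sock *sock; };

/* What "nbd-client -d" ends up doing via NBD_CLEAR_SOCK. */
static void clear_sock(struct mock_nbd *lo) { lo->sock = NULL; }

/* Tail of NBD_DO_IT as in 2.4.21. Returns -1 where the real code would
 * dereference NULL and oops; the real driver has no such check. */
static int do_it_tail_unguarded(struct mock_nbd *lo)
{
    if (lo->sock == NULL)
        return -1;
    lo->sock->shutdown_calls++;
    lo->sock = NULL;
    return 0;
}

/* Same tail with a NULL check: the shutdown is skipped when the socket
 * has already been cleared by the other path. */
static int do_it_tail_guarded(struct mock_nbd *lo)
{
    if (lo->sock) {
        lo->sock->shutdown_calls++;
        lo->sock = NULL;
    }
    return 0;
}
```

Calling clear_sock() before either tail reproduces the order described above: the unguarded version reports the would-be oops, while the guarded version returns normally without touching the socket.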



* Re: [2.4.21]: nbd ksymoops-report
  2003-08-07 17:34   ` Paul Clements
@ 2003-08-07 18:40     ` Bernd Schubert
  2003-08-07 18:45       ` Paul Clements
  2003-08-07 22:25     ` Paul Clements
  1 sibling, 1 reply; 9+ messages in thread
From: Bernd Schubert @ 2003-08-07 18:40 UTC (permalink / raw)
  To: Paul Clements; +Cc: linux-kernel

Hello!

Yes, we are using the nbd-client from sf.net (because of other problems we
replaced the Debian (non-standard) sf.net binary with one we compiled ourselves).

On Thursday 07 August 2003 19:34, you wrote:
> Paul Clements wrote:
> > On Thu, 7 Aug 2003, Bernd Schubert wrote:
> > > every time when nbd-client disconnects a nbd-device the decoded oops
> > > from below will happen.
> > > This only happens after we upgraded from 2.4.20 to 2.4.21,
> > > so I guess the backported update from 2.5.50 causes this.
>
> [snip]
>
> > This corresponds to the following source:
> >
> > lo->sock->ops->shutdown(lo->sock, SEND_SHUTDOWN|RCV_SHUTDOWN);
> >
> > Somehow, lo->sock is NULL here. The only way I see that this could
>
> Alright, looking back over the nbd-client source I now see what's going
> on. You're calling "nbd-client -d" to manually disconnect?

The Debian /etc/init.d/nbd-client script calls this when stopping nbd. 
To get nbd working again after this oops we now always need to reboot (I found 
this out after my first mail), so I'm really looking for an alternative way 
of stopping nbd. Would 'killall nbd-client' work?

>
> > Would you be willing to test a patch against 2.4.21?
>
> If you're willing to test the attached patch, I'd be grateful. Otherwise
> I'll test it in the next few days and forward on to Marcelo...

I will first test it at home. Unfortunately my laptop is being repaired at IBM, so 
I can only use nbd via localhost.
If there is a way to avoid rebooting the client, I can test it on Monday 
on our cluster at work. 

Thanks a lot for your very fast help. Since we are using nbd to have a 
fallback server of our main server, we really need a working solution.


Thanks again and best regards,
	Bernd

-- 
Bernd Schubert
Physikalisch Chemisches Institut / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg
e-mail: bernd.schubert@pci.uni-heidelberg.de


* Re: [2.4.21]: nbd ksymoops-report
  2003-08-07 18:40     ` Bernd Schubert
@ 2003-08-07 18:45       ` Paul Clements
  0 siblings, 0 replies; 9+ messages in thread
From: Paul Clements @ 2003-08-07 18:45 UTC (permalink / raw)
  To: Bernd Schubert; +Cc: linux-kernel

Bernd Schubert wrote:

> The debian /etc/init.d/nbd-client script calls this on stopping nbd.
> To make nbd working again after this oops we always need to reboot now (found
> this out after my first mail), so I'm really looking for an alternative way
> of stopping nbd. Would 'killall nbd-client' work?

Yes, "killall -9 nbd-client" would work, and would avoid this problem.
This is how I generally stop nbd-client.


> If there is a way to prevent the reboot of the client, I can test it on monday
> on our cluster at work.

With the patch, you'll no longer see this oops or need to reboot, and
"nbd-client -d" will work as intended.

 
--
Paul


* Re: [2.4.21]: nbd ksymoops-report
  2003-08-07 17:34   ` Paul Clements
  2003-08-07 18:40     ` Bernd Schubert
@ 2003-08-07 22:25     ` Paul Clements
  2003-08-08 13:10       ` Bernd Schubert
  1 sibling, 1 reply; 9+ messages in thread
From: Paul Clements @ 2003-08-07 22:25 UTC (permalink / raw)
  To: Bernd Schubert; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 665 bytes --]

Paul Clements wrote:
> 
> Paul Clements wrote:
> >
> > On Thu, 7 Aug 2003, Bernd Schubert wrote:
> >
> > > every time when nbd-client disconnects a nbd-device the decoded oops
> > > from below will happen.
> > > This only happens after we upgraded from 2.4.20 to 2.4.21,
> > > so I guess the backported update from 2.5.50 causes this.

[snip]

> > Would you be willing to test a patch against 2.4.21?
> 
> If you're willing to test the attached patch, I'd be grateful. Otherwise
> I'll test it in the next few days and forward on to Marcelo...

OK, the previous patch didn't quite do it. The attached should work (I
got a chance to test it, finally). 

Thanks,
Paul

[-- Attachment #2: nbd_sock_null_race_fix_2_4_21-2.diff --]
[-- Type: text/x-diff, Size: 1859 bytes --]

diff -up linux-2.4.21-PRISTINE/drivers/block/nbd.c linux-2.4.21/drivers/block/nbd.c
--- linux-2.4.21-PRISTINE/drivers/block/nbd.c	2003-06-13 10:51:32.000000000 -0400
+++ linux-2.4.21/drivers/block/nbd.c	2003-08-07 18:05:39.000000000 -0400
@@ -428,23 +428,24 @@ static int nbd_ioctl(struct inode *inode
                 return 0 ;
  
 	case NBD_CLEAR_SOCK:
+		error = 0;
+		down(&lo->tx_lock);
+		lo->sock = NULL;
+		up(&lo->tx_lock);
+		spin_lock(&lo->queue_lock);
+		file = lo->file;
+		lo->file = NULL;
+		spin_unlock(&lo->queue_lock);
 		nbd_clear_que(lo);
 		spin_lock(&lo->queue_lock);
 		if (!list_empty(&lo->queue_head)) {
-			spin_unlock(&lo->queue_lock);
-			printk(KERN_ERR "nbd: Some requests are in progress -> can not turn off.\n");
-			return -EBUSY;
+			printk(KERN_ERR "nbd: disconnect: some requests are in progress -> please try again.\n");
+			error = -EBUSY;
 		}
-		file = lo->file;
-		if (!file) {
-			spin_unlock(&lo->queue_lock);
-			return -EINVAL;
-		}
-		lo->file = NULL;
-		lo->sock = NULL;
 		spin_unlock(&lo->queue_lock);
-		fput(file);
-		return 0;
+		if (file)
+			fput(file);
+		return error;
 	case NBD_SET_SOCK:
 		if (lo->file)
 			return -EBUSY;
@@ -491,9 +492,12 @@ static int nbd_ioctl(struct inode *inode
 		 * there should be a more generic interface rather than
 		 * calling socket ops directly here */
 		down(&lo->tx_lock);
-		printk(KERN_WARNING "nbd: shutting down socket\n");
-		lo->sock->ops->shutdown(lo->sock, SEND_SHUTDOWN|RCV_SHUTDOWN);
-		lo->sock = NULL;
+		if (lo->sock) {
+			printk(KERN_WARNING "nbd: shutting down socket\n");
+			lo->sock->ops->shutdown(lo->sock,
+				SEND_SHUTDOWN|RCV_SHUTDOWN);
+			lo->sock = NULL;
+		}
 		up(&lo->tx_lock);
 		spin_lock(&lo->queue_lock);
 		file = lo->file;
Common subdirectories: linux-2.4.21-PRISTINE/drivers/block/paride and linux-2.4.21/drivers/block/paride


* Re: [2.4.21]: nbd ksymoops-report
  2003-08-07 22:25     ` Paul Clements
@ 2003-08-08 13:10       ` Bernd Schubert
  0 siblings, 0 replies; 9+ messages in thread
From: Bernd Schubert @ 2003-08-08 13:10 UTC (permalink / raw)
  To: linux-kernel

On Friday 08 August 2003 00:25, you wrote:
> Paul Clements wrote:
> > Paul Clements wrote:
> > > On Thu, 7 Aug 2003, Bernd Schubert wrote:
> > > > every time when nbd-client disconnects a nbd-device the decoded oops
> > > > from below will happen.
> > > > This only happens after we upgraded from 2.4.20 to 2.4.21,
> > > > so I guess the backported update from 2.5.50 causes this.
>
> [snip]
>
> > > Would you be willing to test a patch against 2.4.21?
> >
> > If you're willing to test the attached patch, I'd be grateful. Otherwise
> > I'll test it in the next few days and forward on to Marcelo...
>
> OK, the previous patch didn't quite do it. The attached should work (I
> got a chance to test it, finally).

Hello Paul,

I just tested the patch and now 'nbd-client -d device' works fine! When I'm 
back at work I will update our nbd-clients to the new module. (Now that you 
told me that 'kill -9 pid' works even with the old module, that won't be a 
problem.)


Thanks a lot,
	Bernd

