linux-kernel.vger.kernel.org archive mirror
* Crash when unmounting NFS/TCP with -f
@ 2005-04-22 12:32 Brice Goglin
  2005-04-22 13:39 ` Trond Myklebust
  2005-04-22 18:43 ` Bill Davidsen
  0 siblings, 2 replies; 6+ messages in thread
From: Brice Goglin @ 2005-04-22 12:32 UTC
  To: trond.myklebust; +Cc: linux-kernel

Hi Trond,

I'm using NFS (v2) over TCP (in an SSH tunnel).
Each time the SSH tunnel dies before I umount the NFS share, I have to
run umount -f, and I get a crash (only sysrq works).
Actually, the crash occurs a few seconds after umount -f.

It seems that killing SSH by hand does _not_ lead to a crash.
But a long network failure does.
I remember seeing this bug several times with all stable releases
from 2.6.7 to 2.6.11. I didn't try with earlier versions.

I didn't see anything in the logs (after reboot). But I can't be sure
there was nothing in dmesg since I didn't get a chance to chvt 1 and
see console messages before rebooting (with sysrq).

Do you have any idea how to debug this ?

Thanks,
Brice


* Re: Crash when unmounting NFS/TCP with -f
  2005-04-22 12:32 Crash when unmounting NFS/TCP with -f Brice Goglin
@ 2005-04-22 13:39 ` Trond Myklebust
  2005-04-22 18:43 ` Bill Davidsen
  1 sibling, 0 replies; 6+ messages in thread
From: Trond Myklebust @ 2005-04-22 13:39 UTC
  To: Brice Goglin; +Cc: linux-kernel

On Fri, 22.04.2005 at 14:32 (+0200), Brice Goglin wrote:
> Hi Trond,
> 
> I'm using NFS (v2) over TCP (in an SSH tunnel).
> Each time the SSH tunnel dies before I umount the NFS share, I have to
> run umount -f, and I get a crash (only sysrq works).
> Actually, the crash occurs a few seconds after umount -f.
> 
> It seems that killing SSH by hand does _not_ lead to a crash.
> But a long network failure does.
> I remember seeing this bug several times with all stable releases
> from 2.6.7 to 2.6.11. I didn't try with earlier versions.
> 
> I didn't see anything in the logs (after reboot). But I can't be sure
> there was nothing in dmesg since I didn't get a chance to chvt 1 and
> see console messages before rebooting (with sysrq).

I'll try to reproduce. There has just been a discussion about "umount
-f" on the NFS mailing list (nfs@lists.sourceforge.net), where Peter
Cendio said he was seeing the following Oops:

  http://www.cendio.se/~peter/fc3-umount-crash.png

I am unable to reproduce Peter's crash, but I didn't try the scenario
that you describe above.

Cheers,
  Trond
-- 
Trond Myklebust <trond.myklebust@fys.uio.no>



* Re: Crash when unmounting NFS/TCP with -f
  2005-04-22 12:32 Crash when unmounting NFS/TCP with -f Brice Goglin
  2005-04-22 13:39 ` Trond Myklebust
@ 2005-04-22 18:43 ` Bill Davidsen
  2005-04-26  7:51   ` Brice Goglin
  1 sibling, 1 reply; 6+ messages in thread
From: Bill Davidsen @ 2005-04-22 18:43 UTC
  To: Brice Goglin; +Cc: trond.myklebust, linux-kernel

Brice Goglin wrote:
> Hi Trond,
> 
> I'm using NFS (v2) over TCP (in an SSH tunnel).
> Each time the SSH tunnel dies before I umount the NFS share, I have to
> run umount -f, and I get a crash (only sysrq works).
> Actually, the crash occurs a few seconds after umount -f.
> 
> It seems that killing SSH by hand does _not_ lead to a crash.
> But a long network failure does.
> I remember seeing this bug several times with all stable releases
> from 2.6.7 to 2.6.11. I didn't try with earlier versions.
> 
> I didn't see anything in the logs (after reboot). But I can't be sure
> there was nothing in dmesg since I didn't get a chance to chvt 1 and
> see console messages before rebooting (with sysrq).
> 
> Do you have any idea how to debug this ?

No clue, but a question: is this a hard or soft mount? Could you post 
your ssh and mount commands, munged as needed for security? That might 
give someone a clue.

I did this "back when" but I don't recall having a problem with it.

-- 
    -bill davidsen (davidsen@tmr.com)
"The secret to procrastination is to put things off until the
  last possible moment - but no longer"  -me



* Re: Crash when unmounting NFS/TCP with -f
  2005-04-22 18:43 ` Bill Davidsen
@ 2005-04-26  7:51   ` Brice Goglin
  2005-05-05 10:17     ` Brice Goglin
  0 siblings, 1 reply; 6+ messages in thread
From: Brice Goglin @ 2005-04-26  7:51 UTC
  To: Bill Davidsen; +Cc: trond.myklebust, linux-kernel

Bill Davidsen wrote:
> Brice Goglin wrote:
> 
>> Hi Trond,
>>
>> I'm using NFS (v2) over TCP (in an SSH tunnel).
>> Each time the SSH tunnel dies before I umount the NFS share, I have to
>> run umount -f, and I get a crash (only sysrq works).
>> Actually, the crash occurs a few seconds after umount -f.
>>
>> It seems that killing SSH by hand does _not_ lead to a crash.
>> But a long network failure does.
>> I remember seeing this bug several times with all stable releases
>> from 2.6.7 to 2.6.11. I didn't try with earlier versions.
>>
>> I didn't see anything in the logs (after reboot). But I can't be sure
>> there was nothing in dmesg since I didn't get a chance to chvt 1 and
>> see console messages before rebooting (with sysrq).
>>
>> Do you have any idea how to debug this ?
> 
> 
> No clue, but a question: is this a hard or soft mount? Could you post 
> your ssh and mount commands, munged as needed for security? That might 
> give someone a clue.

The ssh command is just
$ ssh kwad -L 2249:localhost:2049 -L 2248:localhost:870 -N -f
(port is forwarded to 2249 while mountport is forwarded to 2248)

Options in /proc/mounts are
rw,v2,rsize=8192,wsize=8192,hard,tcp,nolock,addr=localhost

I just had another network failure. I ran umount -f from vt1 to see
kernel messages. I waited for about 1 minute but didn't get any crash.
So I switched back to X... and got the crash then.
Looks like this crash doesn't want me to see any message...

Brice


* Re: Crash when unmounting NFS/TCP with -f
  2005-04-26  7:51   ` Brice Goglin
@ 2005-05-05 10:17     ` Brice Goglin
  2005-05-05 11:58       ` Trond Myklebust
  0 siblings, 1 reply; 6+ messages in thread
From: Brice Goglin @ 2005-05-05 10:17 UTC
  To: Bill Davidsen, trond.myklebust; +Cc: linux-kernel

Brice Goglin wrote:
> The ssh command is just
> $ ssh kwad -L 2249:localhost:2049 -L 2248:localhost:870 -N -f
> (port is forwarded to 2249 while mountport is forwarded to 2248)
> 
> Options in /proc/mounts are
> rw,v2,rsize=8192,wsize=8192,hard,tcp,nolock,addr=localhost
> 
> I just had another network failure. I ran umount -f from vt1 to see
> kernel messages. I waited for about 1 minute but didn't get any crash.
> So I switched back to X... and got the crash then.
> Looks like this crash doesn't want me to see any message...

I just got it through netconsole.
Unfortunately, the call trace doesn't appear.
Maybe netconsole didn't have time to send it before the crash.
Hope this helps.

Brice


RPC: error 5 connecting to server localhost
RPC: error 5 connecting to server localhost
RPC: error 5 connecting to server localhost
RPC: error 5 connecting to server localhost
RPC: error 5 connecting to server localhost
Unable to handle kernel paging request at virtual address ffffff98
 printing eip:
e0aaa07a
*pde = 00002067
*pte = 00000000
Oops: 0002 [#1]
PREEMPT

Modules linked in: netconsole sd_mod usb_storage vfat fat loop isofs
zlib_inflate nls_cp850 nls_iso8859_15 smbfs nfs lockd sunrpc i915 tun
ipt_MASQUERADE iptable_nat ipt_state ip_conntrack iptable_filter
ip_tables floppy uhci_hcd ehci_hcd dm_mod snd_intel8x0 snd_ac97_codec

CPU:    0
EIP:    0060:[<e0aaa07a>]    Not tainted VLI
EFLAGS: 00010297   (2.6.11=Macvin)
EIP is at rpc_wake_up_status+0x6a/0x80 [sunrpc]
eax: ffffff84   ebx: d0065888   ecx: 00000001   edx: c146e000
esi: fffffffb   edi: d0065888   ebp: d0065800   esp: c146ef14
ds: 007b   es: 007b   ss: 0068
Process events/0 (pid: 3, threadinfo=c146e000 task=c1473020)
Stack: c146ef44 d0065800 00000283 fffffffb e0aa710e d0065888 fffffffb
00120dcb c1473184 00000000 d0065904
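
A few values in the oops above can be decoded by hand. The sketch below is
a hedged reading, not a diagnosis. It reinterprets esi as a signed 32-bit
value, computes the distance between the faulting address ffffff98 and eax,
and splits the i386 page-fault error code printed after "Oops:" into its low
three bits. The helper names are made up for illustration. That esi holds a
negated errno (such as -EIO, matching the "RPC: error 5" lines) is an
assumption, and which structure field sits at that offset from eax cannot be
determined from this truncated trace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reinterpret a 32-bit register dump value as a two's-complement integer. */
static int32_t reg_as_signed(uint32_t reg)
{
    if (reg & 0x80000000u)
        return -(int32_t)(uint32_t)~reg - 1;
    return (int32_t)reg;
}

/* Byte distance from a base register to the faulting address. */
static uint32_t fault_offset(uint32_t fault_addr, uint32_t base)
{
    assert(fault_addr >= base); /* only meaningful when base is below the fault */
    return fault_addr - base;
}

/* Low three bits of the i386 page-fault error code. */
struct fault_bits {
    bool protection; /* bit 0: 0 = page not present, 1 = protection fault */
    bool write;      /* bit 1: 0 = read, 1 = write */
    bool user;       /* bit 2: 0 = kernel mode, 1 = user mode */
};

static struct fault_bits decode_fault(uint32_t error_code)
{
    struct fault_bits b;

    b.protection = (error_code & 1u) != 0;
    b.write = (error_code & 2u) != 0;
    b.user = (error_code & 4u) != 0;
    return b;
}
```

With the values from the trace, reg_as_signed(0xfffffffb) gives -5,
fault_offset(0xffffff98, 0xffffff84) gives 0x14, and decode_fault(0x0002)
reports a kernel-mode write to a page that is not present, which agrees with
the "*pte = 00000000" line.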


* Re: Crash when unmounting NFS/TCP with -f
  2005-05-05 10:17     ` Brice Goglin
@ 2005-05-05 11:58       ` Trond Myklebust
  0 siblings, 0 replies; 6+ messages in thread
From: Trond Myklebust @ 2005-05-05 11:58 UTC
  To: Brice Goglin; +Cc: Bill Davidsen, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1093 bytes --]

On Thu, 05.05.2005 at 12:17 (+0200), Brice Goglin wrote:

> Unable to handle kernel paging request at virtual address ffffff98
>  printing eip:
> e0aaa07a
> *pde = 00002067
> *pte = 00000000
> Oops: 0002 [#1]
> PREEMPT
> 
> Modules linked in: netconsole sd_mod usb_storage vfat fat loop isofs
> zlib_inflate nls_cp850 nls_iso8859_15 smbfs nfs lockd sunrpc i915 tun
> ipt_MASQUERADE iptable_nat ipt_state ip_conntrack iptable_filter
> ip_tables floppy uhci_hcd ehci_hcd dm_mod snd_intel8x0 snd_ac97_codec
> 
> CPU:    0
> EIP:    0060:[<e0aaa07a>]    Not tainted VLI
> EFLAGS: 00010297   (2.6.11=Macvin)
> EIP is at rpc_wake_up_status+0x6a/0x80 [sunrpc]
> eax: ffffff84   ebx: d0065888   ecx: 00000001   edx: c146e000
> esi: fffffffb   edi: d0065888   ebp: d0065800   esp: c146ef14
> ds: 007b   es: 007b   ss: 0068
> Process events/0 (pid: 3, threadinfo=c146e000 task=c1473020)
> Stack: c146ef44 d0065800 00000283 fffffffb e0aa710e d0065888 fffffffb
> 00120dcb c1473184 00000000 d0065904

Have you tried the attached patch? Andrew has already included it in the
-mm series.

Cheers,
  Trond

[-- Attachment #2: Attached message - [PATCH 2/2] RPC: kick off socket connect operations faster --]
[-- Type: message/rfc822, Size: 3288 bytes --]

From: Chuck Lever <cel@citi.umich.edu>
To: trond.myklebust@fys.uio.no
Subject: [PATCH 2/2] RPC: kick off socket connect operations faster
Date: Fri, 29 Apr 2005 15:46:04 -0400
Message-ID: <200504291946.j3TJk4qo009300@climax.citi.umich.edu>

 Make the socket transport kick the event queue to start socket connects
 immediately.  This should improve responsiveness of applications that are
 sensitive to slow mount operations (like automounters).

 We are now also careful to cancel the connect worker before destroying
 the xprt.  This eliminates a race where xprt_destroy can finish before
 the connect worker is even allowed to run.

 Test-plan:
 Destructive testing (unplugging the network temporarily).  Connectathon
 with UDP and TCP.  Hard-code impossibly small connect timeout.

 Version: Fri, 29 Apr 2005 15:32:01 -0400
 
 Signed-off-by: Chuck Lever <cel@netapp.com>
---
 
 net/sunrpc/xprt.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletion(-)
 
 
diff -X /home/cel/src/linux/dont-diff -Naurp 10-rpc-reconnect/net/sunrpc/xprt.c 11-xprt-flush-connects/net/sunrpc/xprt.c
--- 10-rpc-reconnect/net/sunrpc/xprt.c	2005-04-29 15:18:47.677108000 -0400
+++ 11-xprt-flush-connects/net/sunrpc/xprt.c	2005-04-29 15:29:36.637250000 -0400
@@ -569,8 +569,11 @@ void xprt_connect(struct rpc_task *task)
 		if (xprt->sock != NULL)
 			schedule_delayed_work(&xprt->sock_connect,
 					RPC_REESTABLISH_TIMEOUT);
-		else
+		else {
 			schedule_work(&xprt->sock_connect);
+			if (!RPC_IS_ASYNC(task))
+				flush_scheduled_work();
+		}
 	}
 	return;
  out_write:
@@ -1666,6 +1669,10 @@ xprt_shutdown(struct rpc_xprt *xprt)
 	rpc_wake_up(&xprt->backlog);
 	wake_up(&xprt->cong_wait);
 	del_timer_sync(&xprt->timer);
+
+	/* synchronously wait for connect worker to finish */
+	cancel_delayed_work(&xprt->sock_connect);
+	flush_scheduled_work();
 }
 
 /*
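
The second hunk addresses an ordering problem: a queued connect worker still
holds a pointer to the transport, so the transport must not be freed while
that work is pending. The user-space sketch below models only that ordering
with a toy single-threaded queue. Every name in it (demo_work, demo_xprt, and
so on) is hypothetical and none of it is kernel API. Because nothing runs
concurrently here, a cancel step is enough, whereas the patch also calls
flush_scheduled_work() to wait for a worker that has already started.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical deferred-work item, loosely modelled on a workqueue entry. */
struct demo_work {
    bool pending;            /* queued but not yet run */
    void (*fn)(void *data);  /* callback to invoke later */
    void *data;              /* object the callback will touch */
};

#define DEMO_MAX_WORK 8
static struct demo_work *demo_queue[DEMO_MAX_WORK];
static size_t demo_queue_len;

/* Queue an item unless it is already pending or the queue is full. */
static bool demo_schedule_work(struct demo_work *w)
{
    if (w->pending || demo_queue_len == DEMO_MAX_WORK)
        return false;
    w->pending = true;
    demo_queue[demo_queue_len++] = w;
    return true;
}

/* Remove a pending item; returns true if it was dropped before running. */
static bool demo_cancel_work(struct demo_work *w)
{
    for (size_t i = 0; i < demo_queue_len; i++) {
        if (demo_queue[i] == w) {
            demo_queue[i] = demo_queue[--demo_queue_len];
            w->pending = false;
            return true;
        }
    }
    return false;
}

/* Run everything still queued (execution order is not modelled). */
static size_t demo_run_pending(void)
{
    size_t ran = 0;

    while (demo_queue_len > 0) {
        struct demo_work *w = demo_queue[--demo_queue_len];

        assert(w->fn != NULL);
        w->pending = false;
        w->fn(w->data);
        ran++;
    }
    return ran;
}

/* Hypothetical stand-in for the transport; not the layout of rpc_xprt. */
struct demo_xprt {
    struct demo_work connect_work;
    int connect_attempts;
};

static void demo_connect_worker(void *data)
{
    struct demo_xprt *x = data;

    x->connect_attempts++; /* a use-after-free if x had already been freed */
}

static struct demo_xprt *demo_xprt_create(void)
{
    struct demo_xprt *x = calloc(1, sizeof(*x));

    if (x != NULL) {
        x->connect_work.fn = demo_connect_worker;
        x->connect_work.data = x;
    }
    return x;
}

static void demo_xprt_destroy(struct demo_xprt *x)
{
    demo_cancel_work(&x->connect_work); /* drop the worker before freeing */
    free(x);
}
```

Scheduling the connect work, calling demo_xprt_destroy(), and then draining
the queue with demo_run_pending() runs zero items. Without the cancel step
in demo_xprt_destroy(), the worker would run against freed memory.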

