* [OT] Lite5200B w/ nfs root hangs after some time
@ 2009-04-22 17:42 Albrecht Dreß
  2009-04-22 18:10 ` Wolfgang Denk
  2009-05-18 22:04 ` Jerome Walters
  0 siblings, 2 replies; 6+ messages in thread
From: Albrecht Dreß @ 2009-04-22 17:42 UTC
  To: Linux PPC Development

Hi,

this question is maybe off-topic on this list...

I use the Lite5200B board with stock kernel 2.6.29, and boot with the  
root fs on nfs on an Ubuntu 8.10 PC.  Both the Lite5200B and the Ubuntu  
PC are part of a corporate network.  On the root fs, I have busybox and  
everything else needed.  Being connected with minicom to the serial  
console, after some time I always see a message like

[ 4912.350855] nfs: server 10.16.10.29 not responding, still trying

and then the system is *completely* dead, i.e. it doesn't respond to  
<ctrl>-<c>, it doesn't (as a desktop pc sometimes does) print more than  
one of these or any other kernel messages, and it doesn't recover from  
this condition.  Apparently it doesn't matter if just the busybox shell  
is waiting or if an application is running.
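
For anyone who wants to spot this symptom in a captured serial console log, here is a minimal Python sketch. The regular expression is modelled only on the single line quoted above, so other kernel versions may word the message differently; the function name `find_nfs_timeouts` is made up for illustration.

```python
import re

# Pattern modelled on the one console line quoted in this message.
# Assumption: the timestamp prefix and wording match that line exactly.
NFS_TIMEOUT = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+nfs: server (?P<server>\S+) "
    r"not responding, still trying"
)


def find_nfs_timeouts(lines):
    """Return (uptime_seconds, server) for every matching console line."""
    hits = []
    for line in lines:
        match = NFS_TIMEOUT.search(line)
        if match:
            hits.append((float(match.group("uptime")), match.group("server")))
    return hits


sample = ["[ 4912.350855] nfs: server 10.16.10.29 not responding, still trying"]
print(find_nfs_timeouts(sample))
```

Feeding it the lines of a saved minicom capture would show when the first timeout was logged and against which server.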

Any idea what goes wrong here, and how I could fix it?

Thanks, Albrecht.

* Re: [OT] Lite5200B w/ nfs root hangs after some time
  2009-04-22 17:42 [OT] Lite5200B w/ nfs root hangs after some time Albrecht Dreß
@ 2009-04-22 18:10 ` Wolfgang Denk
  2009-04-22 23:35   ` Roy Siu
  2009-04-23  0:05   ` Grant Likely
  2009-05-18 22:04 ` Jerome Walters
  1 sibling, 2 replies; 6+ messages in thread
From: Wolfgang Denk @ 2009-04-22 18:10 UTC
  To: Albrecht Dreß; +Cc: Linux PPC Development

Dear Albrecht Dreß,

In message <1240422181.5492.0@antares> you wrote:
>
> this question is maybe off-topic on this list...

This is not off topic (actually, if you had bothered to check the
mailing list archives before posting, you would have known - because
you would have found previous discussions of this issue, including the
necessary fix).

> I use the Lite5200B board with stock kernel 2.6.29, and boot with the  

That's known to show this problem.

> [ 4912.350855] nfs: server 10.16.10.29 not responding, still trying

Well-known symptom.

> and then the system is *completely* dead, i.e. it doesn't respond to  

No, it is NOT completely dead, just extremely slow.

> Any idea what goes wrong here, and how I could fix it?

Apply this patch:

http://patchwork.ozlabs.org/patch/24487/

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
When all is said and done, more is said than done.

* Re: [OT] Lite5200B w/ nfs root hangs after some time
  2009-04-22 18:10 ` Wolfgang Denk
@ 2009-04-22 23:35   ` Roy Siu
  2009-04-23  0:05   ` Grant Likely
  1 sibling, 0 replies; 6+ messages in thread
From: Roy Siu @ 2009-04-22 23:35 UTC
  To: Wolfgang Denk; +Cc: Albrecht Dreß, Linux PPC Development

good try for the expr., platform.

On 23 April 2009, at 2:10 AM, Wolfgang Denk wrote:

> Dear Albrecht Dreß,
>
> In message <1240422181.5492.0@antares> you wrote:
>>
>> this question is maybe off-topic on this list...
>
> This is not off topic (actually, if you had bothered to check the
> mailing list archives before posting, you would have known - because
> you would have found previous discussions of this issue, including the
> necessary fix).
>
>> I use the Lite5200B board with stock kernel 2.6.29, and boot with the
>
> That's known to show this problem.
>
>> [ 4912.350855] nfs: server 10.16.10.29 not responding, still trying
>
> well known symptom.
>
>> and then the system is *completely* dead, i.e. it doesn't respond to
>
> No, it is NOT completely dead, just extremely slow.
>
>> Any idea what goes wrong here, and how I could fix it?
>
> Apply this patch:
>
> http://patchwork.ozlabs.org/patch/24487/
>
> Best regards,
>
> Wolfgang Denk
>
> --
> DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
> When all is said and done, more is said than done.

* Re: [OT] Lite5200B w/ nfs root hangs after some time
  2009-04-22 18:10 ` Wolfgang Denk
  2009-04-22 23:35   ` Roy Siu
@ 2009-04-23  0:05   ` Grant Likely
  2009-04-23 17:23     ` Albrecht Dreß
  1 sibling, 1 reply; 6+ messages in thread
From: Grant Likely @ 2009-04-23  0:05 UTC
  To: Wolfgang Denk; +Cc: Albrecht Dreß, Linux PPC Development

On Wed, Apr 22, 2009 at 12:10 PM, Wolfgang Denk <wd@denx.de> wrote:
> In message <1240422181.5492.0@antares> you wrote:
>> Any idea what goes wrong here, and how I could fix it?
>
> Apply this patch:
>
> http://patchwork.ozlabs.org/patch/24487/

No, that's not the problem in this case.  2.6.29 networking is broken
for a lot of platforms, not just mpc5200.  Use 2.6.29.1 instead.
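
A small Python sketch of how one might flag a board still running the plain 2.6.29 release. It encodes only the advice given in this message (plain 2.6.29 is affected, 2.6.29.1 is the suggested replacement) and is not a diagnosis; the helper names `parse_release` and `is_plain_2_6_29` are hypothetical.

```python
import re


def parse_release(release):
    """Split a release string such as '2.6.29.1' into a 4-tuple of ints.

    A missing fourth (stable) component is treated as 0. Suffixes such as
    '-rc1' or '-custom' are ignored, which is a simplification.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) if part else 0 for part in match.groups())


def is_plain_2_6_29(release):
    """True only for 2.6.29 without a stable-series update."""
    return parse_release(release) == (2, 6, 29, 0)


# On the target itself one could pass platform.release() instead of a literal.
for candidate in ("2.6.29", "2.6.29.1", "2.6.29.3"):
    print(candidate, is_plain_2_6_29(candidate))
```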

g.

-- 
Grant Likely, B.Sc., P.Eng.
Secret Lab Technologies Ltd.

* Re: [OT] Lite5200B w/ nfs root hangs after some time
  2009-04-23  0:05   ` Grant Likely
@ 2009-04-23 17:23     ` Albrecht Dreß
  0 siblings, 0 replies; 6+ messages in thread
From: Albrecht Dreß @ 2009-04-23 17:23 UTC
  To: linuxppc-dev

On 23.04.09 02:05, Grant Likely wrote:
>> http://patchwork.ozlabs.org/patch/24487/

That patch is already included in 2.6.29...

> No, that's not the problem in this case.  2.6.29 networking is broken  
> for a lot of platforms, not just mpc5200.  Use 2.6.29.1 instead.

...and with 2.6.29.1, everything works flawlessly again!

Thanks a lot for your help,
Cheers, Albrecht.

* Re: [OT] Lite5200B w/ nfs root hangs after some time
  2009-04-22 17:42 [OT] Lite5200B w/ nfs root hangs after some time Albrecht Dreß
  2009-04-22 18:10 ` Wolfgang Denk
@ 2009-05-18 22:04 ` Jerome Walters
  1 sibling, 0 replies; 6+ messages in thread
From: Jerome Walters @ 2009-05-18 22:04 UTC
  To: linuxppc-dev


We experience exactly the same problem. Our client is a Debian Testing
(_Squeeze_) x86 diskless node which uses nfsroot and boots from the server,
which also runs Debian Testing (_Squeeze_) x86. While the client hangs, the
server keeps responding to everyone else's requests. Restarting nfsd on the
server doesn't solve the problem.

At first I wasn't able to capture debug information on the client side, since
/var/log was mounted over NFS, so I installed a hard drive on which I mounted
only /var/log, to be able to capture debug logs from the client as well.


Debug Logs:
http://fixity.net/tmp/client.log.gz - Kernel RPC Debug Log from the client
http://fixity.net/tmp/server.log.gz - Kernel RPC Debug Log from the server


How reproducible:
Happens from 10 to 90 minutes after booting the diskless node.


Actual results:
NFS connections stop responding; the system hangs or becomes very slow and
unresponsive (it doesn't respond to Ctrl+Alt+Del either). 60 to 90 minutes
after the first server timeout the client reports the server as OK, but the
client is still unresponsive. Immediately after that the client logs a server
connection loss again, which leads to a continuous loop. The client remains
unresponsive. Sometimes the client resumes normal operation for a couple of
hours, but then the problem repeats.


Connectivity info:
Both the client and the server are connected to a Gigabit Ethernet Cisco Metro
series manageable switch. Both of them use Intel Pro 82545GM Gigabit Ethernet
Server Controllers. Neither of them logs any Ethernet errors, and none are
logged by the switch.


Client & Server Load:
For the purposes of testing, both machines were running only the needed
daemons and weren't loaded at all.


Client & Server Kernel:
On both the client and the server, a custom-compiled Linux 2.6.29.3 kernel
was used.
Configuration file @ http://fixity.net/tmp/config-2.6.29.3.gz


Client & Server Network interface fragmented packet queue length:
net.ipv4.ipfrag_high_thresh = 524288
net.ipv4.ipfrag_low_thresh = 393216


Client Versions:
libnfsidmap2/squeeze uptodate 0.21-2
nfs-common/squeeze uptodate 1:1.1.4-1


Client Mount (cat /proc/mounts | grep nfsroot):
10.11.11.1:/nfsroot / nfs rw,vers=3,rsize=524288,wsize=524288,namlen=255,hard,nointr,nolock,proto=tcp,timeo=7,retrans=10,sec=sys,addr=10.11.11.1 0 0
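
To make the mount line above easier to read, here is a minimal Python sketch that splits a /proc/mounts entry into its fields and options. The function name `parse_mount_line` is hypothetical, and the conversion of `timeo` to seconds assumes the tenths-of-a-second unit documented in nfs(5).

```python
def parse_mount_line(line):
    """Split one /proc/mounts line into its six whitespace-separated fields."""
    device, mountpoint, fstype, options, dump, passno = line.split()
    opts = {}
    for item in options.split(","):
        key, sep, value = item.partition("=")
        # Flags without a value (rw, hard, nolock, ...) are stored as True.
        opts[key] = value if sep else True
    return {
        "device": device,
        "mountpoint": mountpoint,
        "fstype": fstype,
        "options": opts,
    }


line = (
    "10.11.11.1:/nfsroot / nfs rw,vers=3,rsize=524288,wsize=524288,"
    "namlen=255,hard,nointr,nolock,proto=tcp,timeo=7,retrans=10,"
    "sec=sys,addr=10.11.11.1 0 0"
)
entry = parse_mount_line(line)
opts = entry["options"]

# Assumption: timeo is expressed in tenths of a second, per nfs(5).
timeo_seconds = int(opts["timeo"]) / 10
print(opts["proto"], opts["vers"], timeo_seconds, opts["retrans"])
```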


Client fstab:
proc            /proc           proc    defaults        0       0
/dev/nfs        /               nfs     defaults        1       1
none            /tmp            tmpfs   defaults        0       0
none            /var/run        tmpfs   defaults        0       0
none            /var/lock       tmpfs   defaults        0       0
none            /var/tmp        tmpfs   defaults        0       0


Client Daemons:
portmap, rpc.statd, rpc.idmapd


Server Daemons:
portmap, rpc.statd, rpc.idmapd, rpc.mountd --manage-gids


Server Versions:
libnfsidmap2/squeeze uptodate 0.21-2
nfs-common/squeeze uptodate 1:1.1.4-1
nfs-kernel-server/testing uptodate 1:1.1.4-1


Server Export:
/nfsroot 10.11.11.*(rw,no_root_squash,async,no_subtree_check)


Server Options:
RPCNFSDCOUNT=16
RPCNFSDPRIORITY=0
RPCMOUNTDOPTS=--manage-gids
NEED_SVCGSSD=no
RPCSVCGSSDOPTS=no


Additional Info:
Since I have read that tweaking the nfsroot mount options could improve the
situation, I have tested with different options as follows:
rsize/wsize=1024|2048|4096|8192|32768|524288
timeo=7|15|60|600
retrans=3|10|20
None resulted in solving the problem.
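
For reference, the values listed above can be expanded into concrete option strings with a short Python sketch. The post does not say whether every combination was tried or whether rsize and wsize were always equal; both are assumptions made here, and `option_matrix` is a hypothetical name.

```python
from itertools import product

# Values copied from the list above.
SIZES = [1024, 2048, 4096, 8192, 32768, 524288]
TIMEOS = [7, 15, 60, 600]
RETRANS = [3, 10, 20]


def option_matrix():
    """Yield one mount-option string per combination (assumes rsize == wsize)."""
    for size, timeo, retrans in product(SIZES, TIMEOS, RETRANS):
        yield f"rsize={size},wsize={size},timeo={timeo},retrans={retrans}"


combos = list(option_matrix())
print(len(combos))
print(combos[0])
```

Each string could be appended to the nfsroot options for one test boot, so that the results can be recorded against the exact combination used.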


I have also tested with the following versions on the client and server end,
without any difference in behaviour:
libnfsidmap2/testing uptodate 0.21-2
nfs-common 1:1.1.6-1 newer than version in archive
nfs-kernel-server 1:1.1.6-1 newer than version in archive



Any help or suggestions on fixing the problem would be highly appreciated.
I have been messing with this problem for the last couple of weeks and have
run out of ideas.



Best Regards,
Jerome Walters