linux-kernel.vger.kernel.org archive mirror
* Strange lockup with 2.6.0
@ 2004-01-09 14:39 Wakko Warner
  2004-01-09 15:18 ` Guennadi Liakhovetski
  0 siblings, 1 reply; 7+ messages in thread
From: Wakko Warner @ 2004-01-09 14:39 UTC
  To: linux-kernel

I usually do a backup of each filesystem simply using tar.  I attempted to
back up a machine I had that's running 2.6.0 and it hard locked.

The destination is over NFS to a server also running 2.6.0 (other machines
have performed the backup to this server w/o problems).  This server is
using KNFSD with v3 enabled.

I first thought the problem could be ACPI or Preempt so I disabled both of
these.  I thought it may be a module conflicting so I booted with
init=/bin/sh.  In the kernel, I have IDE support (intel piix, no HDD support;
all ide devices are cds) and aic7xxx.  My next thought was the NIC.  I was
using a 3c990 card so I swapped it with a 3c905c card.  No change.  I
thought it could be an overloaded power supply so I removed all drives
from the power supply except the boot drive.  No change.  I removed all
irrelevant hardware (leaving only the 3c905c nic and the aha39160 card).  I
also swapped memory around (3 128mb sticks) leaving only 1 stick in the
machine.  Still no change.  (The only modules loaded were 3c59x, nfs, lockd,
and sunrpc.)  I have NFSv3 enabled in NFS, but not v4.

I simply gave up trying to back up the machine to nfs.  I have a JAZ drive
installed on this machine (external) and decided to back up to the JAZ.  I
powered off the machine, booted with init=/bin/sh and performed the exact
same steps (excluding configuring the NIC and mounting NFS) that caused the
freeze.  This time it completed the entire backup.  No modules were loaded
at this time.

Hardware:
MS6163 System board
Pentium III 700mhz
3 128mb sticks memory (2 are same brand 2 sided, other is different 1 sided)
Adaptec 39160
3com 3c905c (eth0)
3com 3c990  (eth0, not installed at same time as 3c905c)
3com 3c900  (eth1)
Generic 56k ISA PNP modem
Belkin USB2.0 5 port
Ensoniq ES1373 Sound card
Matrox G400Max dual AGP (was a G200 AGP)


Relevant kernel config available upon request.

-- 
 Lab tests show that use of micro$oft causes cancer in lab animals


* Re: Strange lockup with 2.6.0
  2004-01-09 14:39 Strange lockup with 2.6.0 Wakko Warner
@ 2004-01-09 15:18 ` Guennadi Liakhovetski
  2004-01-09 15:49   ` Wakko Warner
  0 siblings, 1 reply; 7+ messages in thread
From: Guennadi Liakhovetski @ 2004-01-09 15:18 UTC
  To: Wakko Warner; +Cc: linux-kernel

On Fri, 9 Jan 2004, Wakko Warner wrote:

> I usually do a backup of each filesystem simply using tar.  I attempted to
> back up a machine I had that's running 2.6.0 and it hard locked.

Are sysrq-keys enabled? If so, could you catch the tar backtrace during
the lock-up (ALT-SysRq-t)? What was the latest kernel-version that worked?
Can you just try to write some data over NFS? Would it lock if you write 1
byte or 1K or 1M? Does it lock immediately as you start the backup or
after some time (you could start some process in the background
periodically printing some info on the terminal, like vmstat, cat
/proc/interrupts, free, tcpdump on both ends to a file...) Can you try NFS
over TCP? Are other machines, where backup works, also running 2.6,
10/100mbps?
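
The size test asked about above (1 byte, then 1K, then 1M) could be run with
something like the following minimal sketch.  The file names and the
write_probe helper are made up for illustration, and the directory passed in
is assumed to be the NFS mount point; the demo at the bottom only writes to a
temporary directory.

```python
import os
import tempfile


def write_probe(directory, sizes=(1, 1024, 1024 * 1024)):
    """Write one file per size into `directory`, forcing each to the server.

    Returns a list of (path, size_on_disk) tuples so the caller can see how
    far the test got before anything went wrong.
    """
    written = []
    for size in sizes:
        # Hypothetical file name; any name on the mount would do.
        path = os.path.join(directory, f"probe_{size}.bin")
        with open(path, "wb") as f:
            f.write(b"\0" * size)
            f.flush()
            os.fsync(f.fileno())  # push the data out instead of leaving it cached
        written.append((path, os.path.getsize(path)))
        print(f"wrote {size} bytes to {path}")
    return written


if __name__ == "__main__":
    # Demo only: replace the temporary directory with the NFS mount point.
    with tempfile.TemporaryDirectory() as demo_dir:
        write_probe(demo_dir)
```

Going from the smallest size upwards shows whether the lockup depends on how
much data is written in one go.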

Guennadi
---
Guennadi Liakhovetski




* Re: Strange lockup with 2.6.0
  2004-01-09 15:18 ` Guennadi Liakhovetski
@ 2004-01-09 15:49   ` Wakko Warner
  2004-01-09 16:02     ` Guennadi Liakhovetski
  0 siblings, 1 reply; 7+ messages in thread
From: Wakko Warner @ 2004-01-09 15:49 UTC
  To: Guennadi Liakhovetski; +Cc: linux-kernel

Guennadi Liakhovetski wrote:
> On Fri, 9 Jan 2004, Wakko Warner wrote:
> 
> > I usually do a backup of each filesystem simply using tar.  I attempted to
> > back up a machine I had that's running 2.6.0 and it hard locked.
> 
> Are sysrq-keys enabled? If so, could you catch the tar backtrace during
> the lock-up (ALT-SysRq-t)? What was the latest kernel-version that worked?

Yes, but the machine hard locks.  sysrq does not work.  I have a small
utility I wrote that will set the state of the parport (I used this to tell
if it locks up) using outb to the port (This does not affect it in any way,
it will lockup w/o it running)

This is also the first time I backed up this machine.  2.6.0 is the first
kernel I installed on it.  I can test 2.4.23 later.

> Can you just try to write some data over NFS? Would it lock if you write 1

I am constantly accessing NFS with this machine.  Read and write.  It was
only when I backed it up with tar.  In the event it doesn't lock, tar
crashes w/o error/warning (over NFS).

> byte or 1K or 1M? Does it lock immediately as you start the backup or

It locks up usually at one point, but not always.

> after some time (you could start some process in the background
> periodically printing some info on the terminal, like vmstat, cat
> /proc/interrupts, free, tcpdump on both ends to a file...) Can you try NFS

I can do this I think.  It's fun when running with init being bash.  It will
take some time to do since I can't scroll backwards.
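
Since the screen can't be scrolled back, one option is to append timestamped
snapshots to a file instead of printing them.  This is only a sketch: the
append_snapshot and monitor names are made up, /proc/interrupts is just one of
the sources suggested above, and the log is assumed to live on a local disk
(not on the NFS mount) so that whatever was written before a lockup can be
read after reboot.

```python
import os
import time


def append_snapshot(log_path, source_path="/proc/interrupts"):
    """Append a timestamped copy of `source_path` to `log_path` and fsync it."""
    with open(source_path, "r") as src:
        data = src.read()
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a") as log:
        log.write(f"=== {stamp} {source_path} ===\n{data}\n")
        log.flush()
        os.fsync(log.fileno())  # a hard lock would otherwise lose buffered data
    return stamp


def monitor(log_path, interval=1.0, count=3, source_path="/proc/interrupts"):
    """Take `count` snapshots, `interval` seconds apart."""
    for _ in range(count):
        append_snapshot(log_path, source_path)
        time.sleep(interval)
```

Run in the background while tar is going, e.g. monitor("/lockup.log",
interval=1.0, count=600) with a path of your choosing; the last entry in the
log then gives an approximate time for the lockup.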

> over TCP? Are other machines, where backup works, also running 2.6,

I can try TCP, but I'm not sure about the server accepting TCP (was there a
compile time option for NFSD to use TCP?)  These 2 machines are the only
ones I have on 2.6.

> 10/100mbps?

100 FD always.

-- 
 Lab tests show that use of micro$oft causes cancer in lab animals


* Re: Strange lockup with 2.6.0
  2004-01-09 15:49   ` Wakko Warner
@ 2004-01-09 16:02     ` Guennadi Liakhovetski
  2004-01-09 16:45       ` Wakko Warner
                         ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Guennadi Liakhovetski @ 2004-01-09 16:02 UTC
  To: Wakko Warner; +Cc: linux-kernel

On Fri, 9 Jan 2004, Wakko Warner wrote:

> Guennadi Liakhovetski wrote:
> > On Fri, 9 Jan 2004, Wakko Warner wrote:
> >
> > > I usually do a backup of each filesystem simply using tar.  I attempted to
> > > back up a machine I had that's running 2.6.0 and it hard locked.
> >
> > Are sysrq-keys enabled? If so, could you catch the tar backtrace during
> > the lock-up (ALT-SysRq-t)? What was the latest kernel-version that worked?
>
> Yes, but the machine hard locks.  sysrq does not work.  I have a small

__THAT__ hard...:-)

> utility I wrote that will set the state of the parport (I used this to tell
> if it locks up) using outb to the port (This does not affect it in any way,
> it will lockup w/o it running)

You mean it just toggles a bit periodically?

> > Can you just try to write some data over NFS? Would it lock if you write 1
>
> I am constantly accessing NFS with this machine.  Read and write.  It was

How much data at one go (max)?

> only when I backed it up with tar.  In the event it doesn't lock, tar
> crashes w/o error/warning (over NFS).

So, it locks not always?

> > byte or 1K or 1M? Does it lock immediately as you start the backup or
>
> It locks up usually at one point, but not always.

Since you could back up to Jaz, looks like your filesystem is ok, NFS also
works in principle...

> > after some time (you could start some process in the background
> > periodically printing some info on the terminal, like vmstat, cat
> > /proc/interrupts, free, tcpdump on both ends to a file...) Can you try NFS
>
> I can do this I think.  It's fun when running with init being bash.  It will
> take some time to do since I can't scroll backwards.

You could also attach a serial console and direct the output there (then
you also can scroll).

> > over TCP? Are other machines, where backup works, also running 2.6,
>
> I can try TCP, but I'm not sure about the server accepting TCP (was there a
> compile time option for NFSD to use TCP?)  These 2 machines are the only

Yes.

> ones I have on 2.6.
>
> > 10/100mbps?
>
> 100 FD always.

Why I am interested in your experiences is that I also have a problem
transferring large (several M) files over NFS when the server is 2.6 and
both ends have 100 FD. (You can see my posts this week about 2.6 NFS.) And
in my case TCP fixed it. But I never had hard-locks, just cp hung in
D, and tcpdump showed timed out reassembly on the receiving side. But I
was reading from the server.

Guennadi
---
Guennadi Liakhovetski




* Re: Strange lockup with 2.6.0
  2004-01-09 16:02     ` Guennadi Liakhovetski
@ 2004-01-09 16:45       ` Wakko Warner
  2004-01-10  1:06       ` Wakko Warner
  2004-01-10  3:17       ` Wakko Warner
  2 siblings, 0 replies; 7+ messages in thread
From: Wakko Warner @ 2004-01-09 16:45 UTC
  To: Guennadi Liakhovetski; +Cc: linux-kernel

> > > > I usually do a backup of each filesystem simply using tar.  I attempted to
> > > > back up a machine I had that's running 2.6.0 and it hard locked.
> > >
> > > Are sysrq-keys enabled? If so, could you catch the tar backtrace during
> > > the lock-up (ALT-SysRq-t)? What was the latest kernel-version that worked?
> >
> > Yes, but the machine hard locks.  sysrq does not work.  I have a small
> 
> __THAT__ hard...:-)

Yup.  That hard.

> > utility I wrote that will set the state of the parport (I used this to tell
> > if it locks up) using outb to the port (This does not affect it in any way,
> > it will lockup w/o it running)
> 
> You mean it just toggles a bit periodically?

I have a set of LEDs attached to the parport (12) and this program writes to
it in a way that makes it bounce the 'on' led every .25 seconds.  I'll send
you the program if you're interested.
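
The bounce pattern described here can be sketched roughly as below.  This is
not the actual program: how the 12 LEDs are wired to the port registers is not
stated in this thread, so the sketch only computes which LED is lit at each
step and hands a bit pattern to a write callback (print by default) where the
real utility would do its outb.

```python
import itertools
import time


def bounce_positions(n_leds=12):
    """Cycle through LED indices 0..n-1 and back again without repeating the ends."""
    forward = list(range(n_leds))
    backward = list(range(n_leds - 2, 0, -1))
    return itertools.cycle(forward + backward)


def led_mask(position):
    """Bit mask with only the LED at `position` switched on."""
    return 1 << position


def run(steps, delay=0.25, n_leds=12, write=print):
    """Emit `steps` patterns, one every `delay` seconds; returns the masks used."""
    masks = []
    for position in itertools.islice(bounce_positions(n_leds), steps):
        mask = led_mask(position)
        write(f"{mask:0{n_leds}b}")  # stand-in for the real port write
        masks.append(mask)
        time.sleep(delay)
    return masks


if __name__ == "__main__":
    run(steps=4)
```

If the LEDs stop moving, the machine has stopped scheduling userspace, which
is how the lockup is being detected here.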

> > > Can you just try to write some data over NFS? Would it lock if you write 1
> >
> > I am constantly accessing NFS with this machine.  Read and write.  It was
> 
> How much data at one go (max)?

Dunno.  I've never given it that much thought.  I have the completed
backup on the jaz.  I can attempt to dump it to the server to see if that
makes a difference.

> > only when I backed it up with tar.  In the event it doesn't lock, tar
> > crashes w/o error/warning (over NFS).
> 
> So, it locks not always?

Most of the time, yes it does.  I'd say 90% of the time it hard locks.  If
it doesn't and I attempt it again it always hard locks (except one time I
did it).  I've done the tests numerous times.

> > > byte or 1K or 1M? Does it lock immediately as you start the backup or
> >
> > It locks up usually at one point, but not always.
> 
> Since you could back up to Jaz, looks like your filesystem is ok, NFS also
> works in principle...

Before one test, I did: cp /dev/sda /dev/null
to see if it has any problems with the disk.  It was fine.

> > > after some time (you could start some process in the background
> > > periodically printing some info on the terminal, like vmstat, cat
> > > /proc/interrupts, free, tcpdump on both ends to a file...) Can you try NFS
> >
> > I can do this I think.  It's fun when running with init being bash.  It will
> > take some time to do since I can't scroll backwards.
> 
> You could also attach a serial console and direct the output there (then
> you also can scroll).

I thought about this.  Hopefully compiling in serial doesn't add another
variable to this.  I currently have serial compiled as a module.

> > > over TCP? Are other machines, where backup works, also running 2.6,
> >
> > I can try TCP, but I'm not sure about the server accepting TCP (was there a
> > compile time option for NFSD to use TCP?)  These 2 machines are the only
> 
> Yes.

I did not compile the server with TCP support.

> > ones I have on 2.6.
> >
> > > 10/100mbps?
> >
> > 100 FD always.
> 
> Why I am interested in your experiences is that I also have a problem
> transferring large (several M) files over NFS when the server is 2.6 and
> both ends have 100 FD. (You can see my posts this week about 2.6 NFS.) And
> in my case TCP fixed it. But I never had hard-locks, just cp hung in
> D, and tcpdump showed timed out reassembly on the receiving side. But I
> was reading from the server.

That's interesting.  I hope it doesn't matter if the server is a diskless
machine.  Interesting you mention the server being 2.6.  The NFS I did above
was to a different (also diskless) server.  The 2.6 one I threw a hard disk
on so I could do backups of all my machines (and w/o shutting another down). 
Out of the 5 machines on this network, only 2 have usable IDE ports (one has
none, one's a laptop, one is full of cdroms which is the machine that's
hanging on me)

On a side note, I have a 2.4.x (x>=20) using knfsd and nohide on directories.
A 2.4.x client can see those contents, a 2.6.x client can't w/o mounting each
individually.

-- 
 Lab tests show that use of micro$oft causes cancer in lab animals


* Re: Strange lockup with 2.6.0
  2004-01-09 16:02     ` Guennadi Liakhovetski
  2004-01-09 16:45       ` Wakko Warner
@ 2004-01-10  1:06       ` Wakko Warner
  2004-01-10  3:17       ` Wakko Warner
  2 siblings, 0 replies; 7+ messages in thread
From: Wakko Warner @ 2004-01-10  1:06 UTC
  To: Guennadi Liakhovetski; +Cc: linux-kernel

> > I am constantly accessing NFS with this machine.  Read and write.  It was
> 
> How much data at one go (max)?

I just dumped the backup I made from jaz to the nfs.  I found out that some
things didn't get backed up.  I did multiple backups and one file was larger
than the last (for the same filesystem).

Once I copied to nfs (which did *NOT* crash it), I ran md5sum on both the nfs
copy and the jaz copy.  Both were exactly the same.  Then I copied from nfs
to nfs.
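
The same check as running md5sum on the two copies can be sketched as below.
The function names are made up for illustration; the files are read in chunks
so a 350mb archive doesn't have to fit in memory on a 128mb machine.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True if both files hash to the same MD5 digest."""
    return md5_of_file(path_a) == md5_of_file(path_b)
```

For example, same_content("/mnt/jaz/backup.tar", "/mnt/nfs/backup.tar") with
whatever the two paths actually are.  A match only says the two copies are
identical, not that the archive itself is complete.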

The size was about 350mb.  (Quite surprised about the jaz drive performance
=)

> > only when I backed it up with tar.  In the event it doesn't lock, tar
> > crashes w/o error/warning (over NFS).
> 
> So, it locks not always?

In the above case, still was booted with init=/bin/sh and did not lockup.  I
did several tar backups.  Sometimes I got a segmentation fault and killed
tar, other times I got my shell killed.

I have not tried enabling TCP yet.  I'm going to try a 2.4 kernel soon.  (I
want to stay with 2.6 since I have a DVD+RW drive)

> > > byte or 1K or 1M? Does it lock immediately as you start the backup or
> >
> > It locks up usually at one point, but not always.
> 
> Since you could back up to Jaz, looks like your filesystem is ok, NFS also
> works in principle...

As stated above, one of the filesystems did not completely back up.

> > > after some time (you could start some process in the background
> > > periodically printing some info on the terminal, like vmstat, cat
> > > /proc/interrupts, free, tcpdump on both ends to a file...) Can you try NFS
> >
> > I can do this I think.  It's fun when running with init being bash.  It will
> > take some time to do since I can't scroll backwards.
> 
> You could also attach a serial console and direct the output there (then
> you also can scroll).

I have not retrieved this yet.

> > > 10/100mbps?
> >
> > 100 FD always.
> 
> Why I am interested in your experiences is that I also have a problem
> transferring large (several M) files over NFS when the server is 2.6 and
> both ends have 100 FD. (You can see my posts this week about 2.6 NFS.) And
> in my case TCP fixed it. But I never had hard-locks, just cp hung in
> D, and tcpdump showed timed out reassembly on the receiving side. But I
> was reading from the server.

I have done several gig of transfers to the 2.4 server.  I was burning a
bunch of data from nfs to dvd+r.

In the tests I did above, I ran dmesg several times.  Not once did I see an
oops.  I'm not sure; I may have a hardware problem.  (It's going to be
replaced soon anyway.)

-- 
 Lab tests show that use of micro$oft causes cancer in lab animals


* Re: Strange lockup with 2.6.0
  2004-01-09 16:02     ` Guennadi Liakhovetski
  2004-01-09 16:45       ` Wakko Warner
  2004-01-10  1:06       ` Wakko Warner
@ 2004-01-10  3:17       ` Wakko Warner
  2 siblings, 0 replies; 7+ messages in thread
From: Wakko Warner @ 2004-01-10  3:17 UTC
  To: Guennadi Liakhovetski; +Cc: linux-kernel

> > > Are sysrq-keys enabled? If so, could you catch the tar backtrace during
> > > the lock-up (ALT-SysRq-t)? What was the latest kernel-version that worked?
> >
> > Yes, but the machine hard locks.  sysrq does not work.  I have a small
> 
> __THAT__ hard...:-)

I'm pretty sure I found the problem.  (well, it hasn't locked yet).  The CPU
voltage was set at 2.00v in the bios, now it's at 1.65v and I'm not having
any lockups.

-- 
 Lab tests show that use of micro$oft causes cancer in lab animals

