* USB misbehavior causes system hang
@ 2007-02-27 14:06 Eric Buddington
  2007-02-27 14:15 ` Oliver Neukum
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Eric Buddington @ 2007-02-27 14:06 UTC (permalink / raw)
  To: linux-kernel

2.6.20-mm2 #1 Mon Feb 26 13:16:04 EST 2007 i686 unknown

I have an external USB drive (WD MyBook 5000YS), which I use for backups.

When I try to back up to it, it works for a while, but inevitably
starts resetting like mad, gives I/O errors, and then (here's the
problem), the softdog module reboots the system.

---------------------------
scsi1 : SCSI emulation for USB Mass Storage devices
input: Western Digital External HDD as /class/input/input8
input: USB HID v1.11 Device [Western Digital External HDD] on usb-0000:00:03.2-6.2
scsi 1:0:0:0: Direct-Access     WD       5000YS External  106a PQ: 0 ANSI: 4
SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
sda: Write Protect is off
sda: assuming drive cache: write through
SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
sda: Write Protect is off
sda: assuming drive cache: write through
 sda: sda1 sda2
sd 1:0:0:0: Attached scsi disk sda
sd 1:0:0:0: Attached scsi generic sg0 type 0
reiser4[pdflush(194)]: disable_write_barrier (fs/reiser4/wander.c:234)[zam-1055]:
NOTICE: md1 does not support write barriers, using synchronous write instead.
reiser4: sda2: found disk format 4.0.0.

---- (works fine here for a while, then:) ----------

usb 1-6.2: reset high speed USB device using ehci_hcd and address 36
usb 1-6.2: device descriptor read/64, error -110
usb 1-6.2: device descriptor read/64, error -110
usb 1-6.2: reset high speed USB device using ehci_hcd and address 36
usb 1-6.2: device descriptor read/64, error -110
usb 1-6.2: device descriptor read/64, error -110
usb 1-6.2: reset high speed USB device using ehci_hcd and address 36
usb 1-6.2: device descriptor read/8, error -110
usb 1-6.2: device descriptor read/8, error -110
usb 1-6.2: reset high speed USB device using ehci_hcd and address 36
usb 1-6.2: device descriptor read/8, error -110
usb 1-6.2: device descriptor read/8, error -110
sd 1:0:0:0: scsi: Device offlined - not ready after error recovery
sd 1:0:0:0: SCSI error: return code = 0x00050000
end_request: I/O error, dev sda, sector 919931588
sd 1:0:0:0: rejecting I/O to offline device
sd 1:0:0:0: SCSI error: return code = 0x00010000
end_request: I/O error, dev sda, sector 919931828
sd 1:0:0:0: rejecting I/O to offline device
sd 1:0:0:0: rejecting I/O to offline device
sd 1:0:0:0: rejecting I/O to offline device
...
SoftDog: Initiating system reboot.

-----------------------------------------------------

Now, the USB problem may well be a device or cabling issue, but I
don't think that this drive failure should trigger a reboot - I assume
the drive failure is somehow constipating the entire disk I/O system,
and preventing my softdog-patting script from running.

My softdog script loops 'date >/tmp/$logfile;echo 1 > /dev/watchdog; sleep
30'. Nothing in it accesses the USB disk. /tmp is tmpfs.
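
For readers who want to reproduce the setup, the loop described above can be sketched roughly as follows. This is a minimal illustration, not the actual script: the function name `pat_watchdog`, its parameters, and the injectable `sleep` are assumptions made so the loop can be exercised without a real /dev/watchdog.

```python
import time
from datetime import datetime


def pat_watchdog(log_path, watchdog_path, interval=30.0, iterations=None,
                 sleep=time.sleep):
    """Hypothetical equivalent of the shell loop: log, pat, sleep.

    Runs forever when iterations is None; otherwise stops after that
    many passes and returns the number of passes made.
    """
    count = 0
    while iterations is None or count < iterations:
        # Mirrors `date >/tmp/$logfile`: overwrite the log with a timestamp.
        with open(log_path, "w") as log:
            log.write(datetime.now().isoformat() + "\n")
        # Mirrors `echo 1 > /dev/watchdog`: open, write, close.
        with open(watchdog_path, "w") as dog:
            dog.write("1\n")
        count += 1
        sleep(interval)
    return count
```

On the machine in question this would presumably be called as `pat_watchdog("/tmp/some-logfile", "/dev/watchdog")`, where the log file name is a placeholder. Neither path is on the USB disk, which is the point of the question.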

Should the USB failure cause this problem?

-Eric




* Re: USB misbehavior causes system hang
  2007-02-27 14:06 USB misbehavior causes system hang Eric Buddington
@ 2007-02-27 14:15 ` Oliver Neukum
  2007-02-27 21:49 ` Pete Zaitcev
  2007-03-05 10:17 ` Andrew Morton
  2 siblings, 0 replies; 6+ messages in thread
From: Oliver Neukum @ 2007-02-27 14:15 UTC (permalink / raw)
  To: ebuddington; +Cc: linux-kernel

On Tuesday, 27 February 2007 15:06, Eric Buddington wrote:

Exactly this portion of the log would hold the reason for the reset.
 
> usb 1-6.2: reset high speed USB device using ehci_hcd and address 36
> usb 1-6.2: device descriptor read/64, error -110
> usb 1-6.2: device descriptor read/64, error -110
[..]
> end_request: I/O error, dev sda, sector 919931828
> sd 1:0:0:0: rejecting I/O to offline device
> sd 1:0:0:0: rejecting I/O to offline device
> sd 1:0:0:0: rejecting I/O to offline device

By that point USB is no longer involved. These requests error out
in the SCSI layer.
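
As an aside for readers decoding the log, the two "return code" values can be split into their component bytes. The sketch below assumes the conventional layout of the SCSI result word (driver, host, message, and status bytes from high to low) and lists only a subset of the host-byte names; the helper name `decode_scsi_result` is made up for illustration.

```python
# Subset of SCSI host-byte codes (assumed to match the kernel's DID_* values).
HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
}


def decode_scsi_result(code):
    """Split a 32-bit SCSI result word into its four bytes."""
    host = (code >> 16) & 0xFF
    return {
        "driver_byte": (code >> 24) & 0xFF,
        "host_byte": host,
        "host_name": HOST_BYTE_NAMES.get(host, "unknown"),
        "msg_byte": (code >> 8) & 0xFF,
        "status_byte": code & 0xFF,
    }


if __name__ == "__main__":
    for code in (0x00050000, 0x00010000):
        print(hex(code), decode_scsi_result(code)["host_name"])
```

Under that assumption, 0x00050000 carries host byte DID_ABORT and 0x00010000 carries DID_NO_CONNECT, which fits the picture of commands being failed by the mid-layer once the device is offline. The earlier USB "error -110" is ETIMEDOUT on Linux.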

> My softdog script loops 'date >/tmp/$logfile;echo 1 > /dev/watchdog; sleep
> 30'. Nothing in it accesses the USB disk. /tmp is tmpfs.

If you've written to the USB disk a lot, you are likely to cause writes
to it as soon as you are under memory pressure.
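
One way to check how much data is still waiting to be written is to read the Dirty and Writeback fields of /proc/meminfo. The sketch below is only an illustration; the helper names `parse_meminfo` and `pending_writeout_kb` are hypothetical, and it reports a system-wide figure, not one specific to the USB disk.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def pending_writeout_kb(text):
    """Return Dirty + Writeback in kB; missing fields count as zero."""
    info = parse_meminfo(text)
    return info.get("Dirty", 0) + info.get("Writeback", 0)


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        print(pending_writeout_kb(f.read()), "kB dirty or under writeback")
```

A large and growing number here during the backup would be consistent with the memory-pressure explanation.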
 
> Should the USB failure cause this problem?

No.

	Regards
		Oliver


* Re: USB misbehavior causes system hang
  2007-02-27 14:06 USB misbehavior causes system hang Eric Buddington
  2007-02-27 14:15 ` Oliver Neukum
@ 2007-02-27 21:49 ` Pete Zaitcev
  2007-03-01 13:39   ` Eric Buddington
  2007-03-05 10:17 ` Andrew Morton
  2 siblings, 1 reply; 6+ messages in thread
From: Pete Zaitcev @ 2007-02-27 21:49 UTC (permalink / raw)
  To: ebuddington; +Cc: Eric Buddington, linux-kernel, zaitcev

On Tue, 27 Feb 2007 09:06:21 -0500, Eric Buddington <ebuddington@verizon.net> wrote:

> sd 1:0:0:0: rejecting I/O to offline device
> ...
> SoftDog: Initiating system reboot.

> Now, the USB problem may well be a device or cabling issue, but I
> don't think that this drive failure should trigger a reboot - I assume
> the drive failure is somehow constipating the entire disk I/O system,
> and preventing my softdog-patting script from running.

Have you tried ub? In theory, its threadless design is supposed to
help with just this kind of problem. Please let me know, I'm very
curious.

However, the main issue here is the OOM with all the dirty data.
We saw that before. For some weird reason, ext3 is especially good
at producing immense amounts of write-out. Are you on ext3 or
VFAT on that drive?

Please try to find the CPU traces by hitting SysRq-w, SysRq-p. The CPU
is looping under a lock somewhere and eventually it causes the watchdog
to trigger. It may be a USB issue, maybe a VM issue. I can't tell
until we get stack traces.

This does not help you deal with the unreliable drive, I'm afraid,
but it would be a great service if you pinned down the reason for the looping.

-- Pete


* Re: USB misbehavior causes system hang
  2007-02-27 21:49 ` Pete Zaitcev
@ 2007-03-01 13:39   ` Eric Buddington
  0 siblings, 0 replies; 6+ messages in thread
From: Eric Buddington @ 2007-03-01 13:39 UTC (permalink / raw)
  To: Pete Zaitcev; +Cc: linux-kernel

On Tue, Feb 27, 2007 at 01:49:40PM -0800, Pete Zaitcev wrote:
> However, the main issue here is the OOM with all the dirty data.
> We saw that before. For some weird reason, ext3 is especially good
> at producing immense amounts of write-out. Are you on ext3 or
> VFAT on that drive?

Reiser4.

> Please try to find the CPU traces by hitting SysRq-w, SysRq-p. The CPU
> is looping under a lock somewhere and eventually it causes the watchdog
> to trigger. It may be a USB issue, maybe a VM issue. I can't tell
> until we get stack traces.

I can log the dmesg output via netconsole, but I'm often not at the
computer to use SysRq. I've just discovered /proc/sysrq-trigger, which
I can maybe use from a script that watches dmesg. I'll report back if
I can catch the dumps.
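
A script along those lines might look roughly like the sketch below. It is an assumption-laden illustration, not something posted in the thread: the match strings are taken from the log above, the function names and the `max_dumps` limit are hypothetical, and the keys are the two Pete asked for. Each key is written in its own open/write/close, and writing to /proc/sysrq-trigger needs root.

```python
# Substrings from the log above that suggest the disk has gone offline.
TRIGGER_PATTERNS = (
    "rejecting I/O to offline device",
    "Device offlined",
)
SYSRQ_KEYS = ("w", "p")  # the SysRq-w and SysRq-p dumps requested above


def write_sysrq(key, path="/proc/sysrq-trigger"):
    """Send one SysRq key by writing it to the trigger file."""
    with open(path, "w") as trigger:
        trigger.write(key)


def watch_log(lines, write_key=write_sysrq, keys=SYSRQ_KEYS, max_dumps=1):
    """Scan kernel log lines and request dumps when the disk goes offline.

    Stops after max_dumps matches so a flood of identical errors does
    not flood the log with traces. Returns the number of dumps requested.
    """
    dumps = 0
    for line in lines:
        if dumps >= max_dumps:
            break
        if any(pattern in line for pattern in TRIGGER_PATTERNS):
            for key in keys:
                write_key(key)
            dumps += 1
    return dumps
```

The `lines` argument can be any iterable of strings, for example a file object reading a kernel log; the resulting traces would then show up in dmesg and, with netconsole configured, on the remote logger.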

-Eric





* Re: USB misbehavior causes system hang
  2007-02-27 14:06 USB misbehavior causes system hang Eric Buddington
  2007-02-27 14:15 ` Oliver Neukum
  2007-02-27 21:49 ` Pete Zaitcev
@ 2007-03-05 10:17 ` Andrew Morton
  2007-03-05 16:55   ` Eric Buddington
  2 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2007-03-05 10:17 UTC (permalink / raw)
  To: ebuddington; +Cc: ebuddington, linux-kernel, linux-usb-devel

> On Tue, 27 Feb 2007 09:06:21 -0500 Eric Buddington <ebuddington@verizon.net> wrote:
> 2.6.20-mm2 #1 Mon Feb 26 13:16:04 EST 2007 i686 unknown
> 
> I have an external USB drive (WD MyBook 5000YS), which I use for backups.
> 
> When I try to back up to it, it works for a while, but inevitably
> starts resetting like mad, gives I/O errors, and then (here's the
> problem), the softdog module reboots the system.
> 
> ---------------------------
> scsi1 : SCSI emulation for USB Mass Storage devices
> input: Western Digital External HDD as /class/input/input8
> input: USB HID v1.11 Device [Western Digital External HDD] on usb-0000:00:03.2-6.2
> scsi 1:0:0:0: Direct-Access     WD       5000YS External  106a PQ: 0 ANSI: 4
> SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
> sda: Write Protect is off
> sda: assuming drive cache: write through
> SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
> sda: Write Protect is off
> sda: assuming drive cache: write through
>  sda: sda1 sda2
> sd 1:0:0:0: Attached scsi disk sda
> sd 1:0:0:0: Attached scsi generic sg0 type 0
> reiser4[pdflush(194)]: disable_write_barrier (fs/reiser4/wander.c:234)[zam-1055]:
> NOTICE: md1 does not support write barriers, using synchronous write instead.
> reiser4: sda2: found disk format 4.0.0.
> 
> ---- (works fine here for a while, then:) ----------
> 
> usb 1-6.2: reset high speed USB device using ehci_hcd and address 36
> usb 1-6.2: device descriptor read/64, error -110
> usb 1-6.2: device descriptor read/64, error -110
> usb 1-6.2: reset high speed USB device using ehci_hcd and address 36
> usb 1-6.2: device descriptor read/64, error -110
> usb 1-6.2: device descriptor read/64, error -110
> usb 1-6.2: reset high speed USB device using ehci_hcd and address 36
> usb 1-6.2: device descriptor read/8, error -110
> usb 1-6.2: device descriptor read/8, error -110
> usb 1-6.2: reset high speed USB device using ehci_hcd and address 36
> usb 1-6.2: device descriptor read/8, error -110
> usb 1-6.2: device descriptor read/8, error -110
> sd 1:0:0:0: scsi: Device offlined - not ready after error recovery
> sd 1:0:0:0: SCSI error: return code = 0x00050000
> end_request: I/O error, dev sda, sector 919931588
> sd 1:0:0:0: rejecting I/O to offline device
> sd 1:0:0:0: SCSI error: return code = 0x00010000
> end_request: I/O error, dev sda, sector 919931828
> sd 1:0:0:0: rejecting I/O to offline device
> sd 1:0:0:0: rejecting I/O to offline device
> sd 1:0:0:0: rejecting I/O to offline device

Does 2.6.20 do this?  2.6.21-rc1?



* Re: USB misbehavior causes system hang
  2007-03-05 10:17 ` Andrew Morton
@ 2007-03-05 16:55   ` Eric Buddington
  0 siblings, 0 replies; 6+ messages in thread
From: Eric Buddington @ 2007-03-05 16:55 UTC (permalink / raw)
  To: Andrew Morton; +Cc: ebuddington, ebuddington, linux-kernel, linux-usb-devel

On Mon, Mar 05, 2007 at 02:17:28AM -0800, Andrew Morton wrote:
> > On Tue, 27 Feb 2007 09:06:21 -0500 Eric Buddington <ebuddington@verizon.net> wrote:
> > 2.6.20-mm2 #1 Mon Feb 26 13:16:04 EST 2007 i686 unknown
> > [..]
> 
> Does 2.6.20 do this?  2.6.21-rc1?

2.6.20-mm2 seems to be OK; I've been backing up to the disk for a few
hours now, with no problem.

-Eric
