* [BUG] firewire: mass-storage i/o-problems
@ 2007-07-22 21:32 Andreas Messer
  2007-07-22 23:34 ` Stefan Richter
  0 siblings, 1 reply; 14+ messages in thread
From: Andreas Messer @ 2007-07-22 21:32 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 6353 bytes --]

Hello,

I tried the new firewire stack with an external harddisc and an external DVD
writer and got massive I/O problems. Here is the kernel output for the
harddisc. Please Cc me on further questions. I hope it's not too much output
for an LKML email.

--------------------------------------------------------------
firewire_core: created new fw device fw1 (0 config rom retries)
firewire_core: phy config: card 0, new root=ffc0, gap_count=5
scsi2 : SBP-2 IEEE-1394
firewire_sbp2: management write failed, rcode 0x14
firewire_sbp2: orb reply timed out, rcode=0x11
firewire_sbp2: management write failed, rcode 0x10
message repeated 2 times
firewire_sbp2: management write failed, rcode 0x14
firewire_sbp2: failed to login to fw1.0
firewire_sbp2: management write failed, rcode 0x13
firewire_sbp2: removed sbp2 unit fw1.0
firewire_core: phy config: card 0, new root=ffc1, gap_count=5
scsi3 : SBP-2 IEEE-1394
firewire_core: created new fw device fw1 (0 config rom retries)
firewire_sbp2: logged in to sbp2 unit fw1.0 (0 retries)
firewire_sbp2: - management_agent_address: 0xfffff0010000
firewire_sbp2: - command_block_agent_address: 0xfffff0010020
firewire_sbp2: - status write address: 0x000100000000
scsi 3:0:0:0: Direct-Access-RBC SAMSUNG HD300LD PQ: 0 ANSI: 4
sd 3:0:0:0: [sdb] 586072368 512-byte hardware sectors (300069 MB)
firewire_sbp2: sbp2_scsi_abort
firewire_sbp2: sbp2_scsi_abort
sd 3:0:0:0: scsi: Device offlined - not ready after error recovery
sd 3:0:0:0: [sdb] Write Protect is off
sd 3:0:0:0: [sdb] Mode Sense: 00 00 00 00
sd 3:0:0:0: rejecting I/O to offline device
sd 3:0:0:0: [sdb] Asking for cache data failed
sd 3:0:0:0: [sdb] Assuming drive cache: write through
sd 3:0:0:0: [sdb] Attached SCSI disk
sd 3:0:0:0: Attached scsi generic sg3 type 14
firewire_sbp2: management write failed, rcode 0x13
firewire_sbp2: removed sbp2 unit fw1.0

replugged hdd:

firewire_core: phy config: card 0, new root=ffc1, gap_count=5
scsi4 : SBP-2 IEEE-1394
firewire_core: created new fw device fw1 (0 config rom retries)
firewire_sbp2: logged in to sbp2 unit fw1.0 (0 retries)
firewire_sbp2: - management_agent_address: 0xfffff0010000
firewire_sbp2: - command_block_agent_address: 0xfffff0010020
firewire_sbp2: - status write address: 0x000100000000
scsi 4:0:0:0: Direct-Access-RBC SAMSUNG HD300LD PQ: 0 ANSI: 4
sd 4:0:0:0: [sdb] 586072368 512-byte hardware sectors (300069 MB)
sd 4:0:0:0: [sdb] Write Protect is off
sd 4:0:0:0: [sdb] Mode Sense: 11 00 00 00
sd 4:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support 
DPO or FUA
sd 4:0:0:0: [sdb] 586072368 512-byte hardware sectors (300069 MB)
sd 4:0:0:0: [sdb] Write Protect is off
sd 4:0:0:0: [sdb] Mode Sense: 11 00 00 00
sd 4:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support 
DPO or FUA
sdb: sdb1
sd 4:0:0:0: [sdb] Attached SCSI disk
sd 4:0:0:0: Attached scsi generic sg3 type 14
firewire_sbp2: sbp2_scsi_abort
firewire_sbp2: sbp2_scsi_abort
sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
sd 4:0:0:0: [sdb] Result: hostbyte=DID_BUS_BUSY 
driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdb, sector 518
sd 4:0:0:0: rejecting I/O to offline device
sd 4:0:0:0: rejecting I/O to offline device
sd 4:0:0:0: [sdb] Result: hostbyte=DID_NO_CONNECT 
driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdb, sector 286153
FAT: FAT read failed (blocknr 455)
sd 4:0:0:0: rejecting I/O to offline device
FAT: Directory bread(block 286090) failed
sd 4:0:0:0: rejecting I/O to offline device

.... (many lines of that)

Buffer I/O error on device sdb1, logical block 143073
lost page write due to I/O error on sdb1
firewire_sbp2: management write failed, rcode 0x13
sd 4:0:0:0: [sdb] Synchronizing SCSI cache
sd 4:0:0:0: [sdb] Result: hostbyte=DID_BUS_BUSY 
driverbyte=DRIVER_OK,SUGGEST_OK
firewire_sbp2: removed sbp2 unit fw1.0

replugged again:

firewire_core: phy config: card 0, new root=ffc1, gap_count=5
scsi5 : SBP-2 IEEE-1394
firewire_core: created new fw device fw1 (0 config rom retries)
firewire_sbp2: orb reply timed out, rcode=0x11
firewire_sbp2: management write failed, rcode 0x12
message repeated 4 times
firewire_sbp2: failed to login to fw1.0
firewire_sbp2: status write for unknown orb
firewire_sbp2: management write failed, rcode 0x13
firewire_sbp2: removed sbp2 unit fw1.0

replugged again:

firewire_core: phy config: card 0, new root=ffc1, gap_count=5
scsi6 : SBP-2 IEEE-1394
firewire_core: created new fw device fw1 (0 config rom retries)
firewire_sbp2: orb reply timed out, rcode=0x11
firewire_sbp2: management write failed, rcode 0x12
message repeated 4 times
firewire_sbp2: failed to login to fw1.0
firewire_sbp2: status write for unknown orb

using old fw stack, everything fine:

ohci1394: fw-host0: OHCI-1394 1.0 (PCI): IRQ=[19] MMIO=[ee001000-ee0017ff] Max 
Packet=[2048] IR/IT contexts=[4/8]
ieee1394: The root node is not cycle master capable; selecting a new root node 
and resetting...
ieee1394: Node added: ID:BUS[0-00:1023] GUID[0050770e0000002e]
ieee1394: Host added: ID:BUS[0-01:1023] GUID[000a480000000edf]
scsi2 : SBP-2 IEEE-1394
ieee1394: sbp2: Logged into SBP-2 device
ieee1394: sbp2: Node 0-00:1023: Max speed [S400] - Max payload [2048]
scsi 2:0:0:0: Direct-Access-RBC SAMSUNG HD300LD PQ: 0 ANSI: 4
sd 2:0:0:0: [sdb] 586072368 512-byte hardware sectors (300069 MB)
sd 2:0:0:0: [sdb] Write Protect is off
sd 2:0:0:0: [sdb] Mode Sense: 11 00 00 00
sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support 
DPO or FUA
sd 2:0:0:0: [sdb] 586072368 512-byte hardware sectors (300069 MB)
sd 2:0:0:0: [sdb] Write Protect is off
sd 2:0:0:0: [sdb] Mode Sense: 11 00 00 00
sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support 
DPO or FUA
sdb: sdb1
sd 2:0:0:0: [sdb] Attached SCSI disk
sd 2:0:0:0: Attached scsi generic sg3 type 14
-------------------------------------------

Hardware: 
Via ohci1394 FW Controller (PCI), nforce2 chipset, Athlon XP-M

Software:
Kernel Vanilla 2.6.22, gcc 4.1.2. 

On another PC I see the same problem, but replugging once or twice gets the
thing working.

greeting
Andreas Messer
-- 
gnuPG keyid: 8C2BAF51
fingerprint: 28EE 8438 E688 D992 3661 C753 90B3 BAAA 8C2B AF51

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [BUG] firewire: mass-storage i/o-problems
  2007-07-22 21:32 [BUG] firewire: mass-storage i/o-problems Andreas Messer
@ 2007-07-22 23:34 ` Stefan Richter
  2007-07-23  6:37   ` Manuel Lauss
  2007-07-23  8:07   ` Andreas Messer
  0 siblings, 2 replies; 14+ messages in thread
From: Stefan Richter @ 2007-07-22 23:34 UTC (permalink / raw)
  To: Andreas Messer; +Cc: linux-kernel, linux1394-devel

(quoting in full for linux1394-devel, Cc added)

Andreas Messer wrote at LKML:
> Hello,
> 
> I tried the new firewire stack with an external harddisc and an external DVD
> writer and got massive I/O problems. Here is the kernel output for the
> harddisc. Please Cc me on further questions. I hope it's not too much output
> for an LKML email.
> 
> --------------------------------------------------------------
> firewire_core: created new fw device fw1 (0 config rom retries)
> firewire_core: phy config: card 0, new root=ffc0, gap_count=5
> scsi2 : SBP-2 IEEE-1394
> firewire_sbp2: management write failed, rcode 0x14
> firewire_sbp2: orb reply timed out, rcode=0x11
> firewire_sbp2: management write failed, rcode 0x10
> message repeated 2 times
> firewire_sbp2: management write failed, rcode 0x14
> firewire_sbp2: failed to login to fw1.0
> firewire_sbp2: management write failed, rcode 0x13

#define RCODE_SEND_ERROR		0x10
#define RCODE_GENERATION		0x13
#define RCODE_NO_ACK			0x14

There are multiple bus resets happening while fw-sbp2 tries to log in.
Normally I would say that this is a sign of an electrically unstable
bus.  But since the old drivers don't show anything like that at all,
there must be problems in the new drivers.

> firewire_sbp2: removed sbp2 unit fw1.0
> firewire_core: phy config: card 0, new root=ffc1, gap_count=5
> scsi3 : SBP-2 IEEE-1394
> firewire_core: created new fw device fw1 (0 config rom retries)

Here even the device fw1 vanished from fw-core's point of view, then
came back.

> firewire_sbp2: logged in to sbp2 unit fw1.0 (0 retries)
> firewire_sbp2: - management_agent_address: 0xfffff0010000
> firewire_sbp2: - command_block_agent_address: 0xfffff0010020
> firewire_sbp2: - status write address: 0x000100000000
> scsi 3:0:0:0: Direct-Access-RBC SAMSUNG HD300LD PQ: 0 ANSI: 4
> sd 3:0:0:0: [sdb] 586072368 512-byte hardware sectors (300069 MB)
> firewire_sbp2: sbp2_scsi_abort
> firewire_sbp2: sbp2_scsi_abort
> sd 3:0:0:0: scsi: Device offlined - not ready after error recovery
> sd 3:0:0:0: [sdb] Write Protect is off
> sd 3:0:0:0: [sdb] Mode Sense: 00 00 00 00
> sd 3:0:0:0: rejecting I/O to offline device
> sd 3:0:0:0: [sdb] Asking for cache data failed
> sd 3:0:0:0: [sdb] Assuming drive cache: write through
> sd 3:0:0:0: [sdb] Attached SCSI disk
> sd 3:0:0:0: Attached scsi generic sg3 type 14
> firewire_sbp2: management write failed, rcode 0x13
> firewire_sbp2: removed sbp2 unit fw1.0
> 
> replugged hdd:
> 
> firewire_core: phy config: card 0, new root=ffc1, gap_count=5
> scsi4 : SBP-2 IEEE-1394
> firewire_core: created new fw device fw1 (0 config rom retries)
> firewire_sbp2: logged in to sbp2 unit fw1.0 (0 retries)
> firewire_sbp2: - management_agent_address: 0xfffff0010000
> firewire_sbp2: - command_block_agent_address: 0xfffff0010020
> firewire_sbp2: - status write address: 0x000100000000
> scsi 4:0:0:0: Direct-Access-RBC SAMSUNG HD300LD PQ: 0 ANSI: 4
> sd 4:0:0:0: [sdb] 586072368 512-byte hardware sectors (300069 MB)
> sd 4:0:0:0: [sdb] Write Protect is off
> sd 4:0:0:0: [sdb] Mode Sense: 11 00 00 00
> sd 4:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support 
> DPO or FUA
> sd 4:0:0:0: [sdb] 586072368 512-byte hardware sectors (300069 MB)
> sd 4:0:0:0: [sdb] Write Protect is off
> sd 4:0:0:0: [sdb] Mode Sense: 11 00 00 00
> sd 4:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support 
> DPO or FUA
> sdb: sdb1
> sd 4:0:0:0: [sdb] Attached SCSI disk
> sd 4:0:0:0: Attached scsi generic sg3 type 14
> firewire_sbp2: sbp2_scsi_abort
> firewire_sbp2: sbp2_scsi_abort
> sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
> sd 4:0:0:0: [sdb] Result: hostbyte=DID_BUS_BUSY 
> driverbyte=DRIVER_OK,SUGGEST_OK
> end_request: I/O error, dev sdb, sector 518
> sd 4:0:0:0: rejecting I/O to offline device
> sd 4:0:0:0: rejecting I/O to offline device
> sd 4:0:0:0: [sdb] Result: hostbyte=DID_NO_CONNECT 
> driverbyte=DRIVER_OK,SUGGEST_OK
> end_request: I/O error, dev sdb, sector 286153
> FAT: FAT read failed (blocknr 455)
> sd 4:0:0:0: rejecting I/O to offline device
> FAT: Directory bread(block 286090) failed
> sd 4:0:0:0: rejecting I/O to offline device
> 
> .... (many lines of that)
> 
> Buffer I/O error on device sdb1, logical block 143073
> lost page write due to I/O error on sdb1
> firewire_sbp2: management write failed, rcode 0x13
> sd 4:0:0:0: [sdb] Synchronizing SCSI cache
> sd 4:0:0:0: [sdb] Result: hostbyte=DID_BUS_BUSY 
> driverbyte=DRIVER_OK,SUGGEST_OK
> firewire_sbp2: removed sbp2 unit fw1.0
> 
> replugged again:
> 
> firewire_core: phy config: card 0, new root=ffc1, gap_count=5
> scsi5 : SBP-2 IEEE-1394
> firewire_core: created new fw device fw1 (0 config rom retries)
> firewire_sbp2: orb reply timed out, rcode=0x11
> firewire_sbp2: management write failed, rcode 0x12
> message repeated 4 times
> firewire_sbp2: failed to login to fw1.0
> firewire_sbp2: status write for unknown orb
> firewire_sbp2: management write failed, rcode 0x13
> firewire_sbp2: removed sbp2 unit fw1.0
> 
> replugged again:
> 
> firewire_core: phy config: card 0, new root=ffc1, gap_count=5
> scsi6 : SBP-2 IEEE-1394
> firewire_core: created new fw device fw1 (0 config rom retries)
> firewire_sbp2: orb reply timed out, rcode=0x11
> firewire_sbp2: management write failed, rcode 0x12
> message repeated 4 times
> firewire_sbp2: failed to login to fw1.0
> firewire_sbp2: status write for unknown orb
> 
> using old fw stack, everything fine:
> 
> ohci1394: fw-host0: OHCI-1394 1.0 (PCI): IRQ=[19] MMIO=[ee001000-ee0017ff] Max 
> Packet=[2048] IR/IT contexts=[4/8]
> ieee1394: The root node is not cycle master capable; selecting a new root node 
> and resetting...
> ieee1394: Node added: ID:BUS[0-00:1023] GUID[0050770e0000002e]
> ieee1394: Host added: ID:BUS[0-01:1023] GUID[000a480000000edf]
> scsi2 : SBP-2 IEEE-1394
> ieee1394: sbp2: Logged into SBP-2 device
> ieee1394: sbp2: Node 0-00:1023: Max speed [S400] - Max payload [2048]
> scsi 2:0:0:0: Direct-Access-RBC SAMSUNG HD300LD PQ: 0 ANSI: 4
> sd 2:0:0:0: [sdb] 586072368 512-byte hardware sectors (300069 MB)
> sd 2:0:0:0: [sdb] Write Protect is off
> sd 2:0:0:0: [sdb] Mode Sense: 11 00 00 00
> sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support 
> DPO or FUA
> sd 2:0:0:0: [sdb] 586072368 512-byte hardware sectors (300069 MB)
> sd 2:0:0:0: [sdb] Write Protect is off
> sd 2:0:0:0: [sdb] Mode Sense: 11 00 00 00
> sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support 
> DPO or FUA
> sdb: sdb1
> sd 2:0:0:0: [sdb] Attached SCSI disk
> sd 2:0:0:0: Attached scsi generic sg3 type 14
> -------------------------------------------
> 
> Hardware: 
> Via ohci1394 FW Controller (PCI), nforce2 chipset, Athlon XP-M

Are you sure the FireWire controller is from VIA?  Check with lspci.
The NVidia nForce2 chipset has its own FireWire controller, and that one
is only "supported" by a gross hack in ohci1394 and at the moment
unsupported in firewire-ohci.  There is even a bug report in Red Hat's
bugzilla where fw-ohci hung up when trying to initialize an nForce2 chip.

> Software:
> Kernel Vanilla 2.6.22, gcc 4.1.2. 
> 
> On another PC I see the same problem, but replugging once or twice gets the
> thing working.

Which controller does this other PC have?

BTW, the GUID of the host in the above ieee1394 log is from Albatron
Technology, and the GUID of the disk is from Prolific Technology, Inc.
Some (many?) Prolific based devices were and maybe still are sold with
outdated buggy firmware.
-- 
Stefan Richter
-=====-=-=== -=== =-===
http://arcgraph.de/sr/


* Re: [BUG] firewire: mass-storage i/o-problems
  2007-07-22 23:34 ` Stefan Richter
@ 2007-07-23  6:37   ` Manuel Lauss
  2007-07-23  7:40     ` Stefan Richter
  2007-07-23  8:07   ` Andreas Messer
  1 sibling, 1 reply; 14+ messages in thread
From: Manuel Lauss @ 2007-07-23  6:37 UTC (permalink / raw)
  To: Stefan Richter; +Cc: Andreas Messer, linux-kernel, linux1394-devel

Hello,

On Mon, Jul 23, 2007 at 01:34:11AM +0200, Stefan Richter wrote:

<massive bugreport snipped>

> > Software:
> > Kernel Vanilla 2.6.22, gcc 4.1.2. 
> > 
> > On another PC I see the same problem, but replugging once or twice gets the
> > thing working.
> 
> Which controller does this other PC have?

I too experience these bugs with the new fw stack; this time with a TI
OHCI-1394a combo chip in 2 different laptops and a Via 1394 pci addon card.
The target is an external HD enclosure with an Oxford Semi chip.

Thanks,
	Manuel Lauss


* Re: [BUG] firewire: mass-storage i/o-problems
  2007-07-23  6:37   ` Manuel Lauss
@ 2007-07-23  7:40     ` Stefan Richter
  2007-07-23  8:45       ` Manuel Lauss
  0 siblings, 1 reply; 14+ messages in thread
From: Stefan Richter @ 2007-07-23  7:40 UTC (permalink / raw)
  To: Manuel Lauss; +Cc: Andreas Messer, linux-kernel, linux1394-devel

Manuel Lauss wrote:
> I too experience these bugs with the new fw stack; this time with a TI
> OHCI-1394a combo chip in 2 different laptops and a Via 1394 pci addon card.
> The target is an external HD enclosure with an Oxford Semi chip.

Thanks for the info.  Then there is definitely a driver problem.  It is
essential that I (and Kristian) find a way to reproduce it.

I have 4 test PCs but use only one of them regularly.  Testing takes
time...  So far I did almost all high-volume tests on Core 2 Duo/ i945,
kernel compiled for 32bit SMP Preempt.  I have 6 different FireWire
cards which I could test in it (right now at most 3 at once; usually I
only test 1 at once).  Sometimes I use a Core 2 Duo/ i945, 64bit SMP
Preempt, with only one card which I can't swap.  Very rarely I test an
Athlon/ KM266, kernel compiled for UP non-preempt.  I've also got an
ancient Pentium MMX notebook on which I have yet to install a new distro
before I can test it reliably.

According to Andreas' report, I suppose I should do high-volume tests on
the Athlon now.

On my main machine, I only saw very rare and recoverable "status write
for unknown orb" errors (one during many hours of continuous I/O).  I
have 10 or so different SBP-2 devices but test only a few of them
regularly.  They all work for me, except for 100% login failures on an
old CD-R/W and fw-sbp2's inability to access the 2nd logical unit of the
one dual-LU device of mine.  (Multi LU capability should be added soon.)

I always only test vanilla kernels (with ieee1394/firewire patches).  I
wanted to start testing -rt kernels because they are known to make bugs
of the old ieee1394 drivers more visible, but simply didn't find the
time to do such tests yet.
-- 
Stefan Richter
-=====-=-=== -=== =-===
http://arcgraph.de/sr/


* Re: [BUG] firewire: mass-storage i/o-problems
  2007-07-22 23:34 ` Stefan Richter
  2007-07-23  6:37   ` Manuel Lauss
@ 2007-07-23  8:07   ` Andreas Messer
  2007-07-23  9:58     ` Andreas Micklei
  2007-07-23 11:30     ` Stefan Richter
  1 sibling, 2 replies; 14+ messages in thread
From: Andreas Messer @ 2007-07-23  8:07 UTC (permalink / raw)
  To: Stefan Richter, linux1394-devel, Manuel Lauss, linux-kernel

On Monday, 23 July 2007 01:34 Stefan Richter wrote:
> (quoting in full for linux1394-devel, Cc added)
>
> Andreas Messer wrote at LKML:
> > Hello,
> >
> > I tried the new firewire stack with an external harddisc and an external
> > DVD writer and got massive I/O problems. Here is the kernel output for
> > the harddisc. Please Cc me on further questions. I hope it's not too much
> > output for an LKML email.

< bugreport snipped>
>
> Are you sure the FireWire controller is from VIA?  Check with lspci.
> The NVidia nForce2 chipset has its own FireWire controller, and that one
> is only "supported" by a gross hack in ohci1394 and at the moment
> unsupported in firewire-ohci.  There is even a bug report in Red Hat's
> bugzilla where fw-ohci hung up when trying to initialize an nForce2 chip.

The first system, where the harddisc doesn't work at all with the new stack:
Nvidia nForce2 Ultra chipset; 1GB RAM; XP-M 2600; FW controller is a
PCI add-on card; lspci says VIA.
Vanilla kernel 2.6.22, APIC, Preempt, Dynamic Ticks. Hmm, anything else
of interest?

> > On another PC I see the same problem, but replugging once or twice gets the
> > thing working.
>
> Which controller does this other PC have?

Second system, where replugging helped, tested with the harddisc and an external
DVD-RW drive:
Nvidia nForce2 (no Ultra!) chipset; 1GB RAM; XP-M 2500@2600; same FW controller.
Kernel almost the same (maybe different device drivers).


* Re: [BUG] firewire: mass-storage i/o-problems
  2007-07-23  7:40     ` Stefan Richter
@ 2007-07-23  8:45       ` Manuel Lauss
  2007-07-23 11:34         ` Stefan Richter
  0 siblings, 1 reply; 14+ messages in thread
From: Manuel Lauss @ 2007-07-23  8:45 UTC (permalink / raw)
  To: Stefan Richter; +Cc: Andreas Messer, linux-kernel, linux1394-devel

Hi,

On Mon, Jul 23, 2007 at 09:40:21AM +0200, Stefan Richter wrote:
> Manuel Lauss wrote:
> > I too experience these bugs with the new fw stack; this time with a TI
> > OHCI-1394a combo chip in 2 different laptops and a Via 1394 pci addon card.
> > The target is an external HD enclosure with an Oxford Semi chip.
> 
> Thanks for the info.  Then there is definitely a driver problem.  It is
> essential that I (and Kristian) find a way to reproduce it.

I noticed the failures start when there are 2 concurrent disk accesses
(copy something from fw disk on shell 1 and it runs fine; start to
copy something TO the fw disk on shell 2 and a "management write failed"
error appears after 1-2 sec. with the orb timeout after a looong time)
I didn't investigate it further and switched back to the old stack.

Thanks,
	Manuel Lauss


* Re: [BUG] firewire: mass-storage i/o-problems
  2007-07-23  8:07   ` Andreas Messer
@ 2007-07-23  9:58     ` Andreas Micklei
  2007-07-23 11:30     ` Stefan Richter
  1 sibling, 0 replies; 14+ messages in thread
From: Andreas Micklei @ 2007-07-23  9:58 UTC (permalink / raw)
  To: linux1394-devel, linux1394-devel
  Cc: Andreas Messer, Stefan Richter, Manuel Lauss, linux-kernel

On Monday, 23 July 2007, Andreas Messer wrote:
> The first system, where the harddisc doesn't work at all with the new stack:
> Nvidia nForce2 Ultra chipset; 1GB RAM; XP-M 2600; FW controller is a
> PCI add-on card; lspci says VIA.

Additional note: On the machine where my iPod 3G corruption occurs I have a
VIA K8T800 chipset, but I also have an Athlon (Athlon64 3200+) and a VIA
IEEE1394 host controller.

regards,
Andreas Micklei

-- 
Andreas Micklei
IVISTAR Kommunikationssysteme AG
Ehrenbergstr. 19 / 10245 Berlin, Germany
http://www.ivistar.de

Commercial register: Berlin Charlottenburg HRB 75173
VAT ID: DE207795030
Executive board: Dr.-Ing. Dirk Elias
Chairman of the supervisory board: Dipl.-Betriebsw. Frank Bindel


* Re: [BUG] firewire: mass-storage i/o-problems
  2007-07-23  8:07   ` Andreas Messer
  2007-07-23  9:58     ` Andreas Micklei
@ 2007-07-23 11:30     ` Stefan Richter
  1 sibling, 0 replies; 14+ messages in thread
From: Stefan Richter @ 2007-07-23 11:30 UTC (permalink / raw)
  To: Andreas Messer; +Cc: linux1394-devel, Manuel Lauss, linux-kernel

Andreas Messer wrote:
> On Monday, 23 July 2007 01:34 Stefan Richter wrote:
>> Are you sure the FireWire controller is from VIA?  Check with lspci.
>> The NVidia nForce2 chipset has its own FireWire controller, and that one
>> is only "supported" by a gross hack in ohci1394 and at the moment
>> unsupported in firewire-ohci.
...
> The first system, where the harddisc doesn't work at all with the new stack:
> Nvidia nForce2 Ultra chipset; 1GB RAM; XP-M 2600; FW controller is a
> PCI add-on card; lspci says VIA.

OK, if it's an add-on card, then you surely got the right information
from lspci.  I tested 3 different VIA controllers, they all worked for me.

> Vanilla kernel 2.6.22, APIC, Preempt, Dynamic Ticks. Hmm, anything else
> of interest?

I haven't tested dyntics yet, should do so eventually.
-- 
Stefan Richter
-=====-=-=== -=== =-===
http://arcgraph.de/sr/


* Re: [BUG] firewire: mass-storage i/o-problems
  2007-07-23  8:45       ` Manuel Lauss
@ 2007-07-23 11:34         ` Stefan Richter
  2007-07-23 18:33           ` Manuel Lauss
  0 siblings, 1 reply; 14+ messages in thread
From: Stefan Richter @ 2007-07-23 11:34 UTC (permalink / raw)
  To: Manuel Lauss; +Cc: Andreas Messer, linux-kernel, linux1394-devel

Manuel Lauss wrote:
> I noticed the failures start when there are 2 concurrent disk accesses
> (copy something from fw disk on shell 1 and it runs fine; start to
> copy something TO the fw disk on shell 2 and a "management write failed"
> error appears after 1-2 sec. with the orb timeout after a looong time)

Most of my tests have been with a single process, but I already tested
several processes at once in parallel too.  But probably only on the
bigger machine where neither the CPU nor the PCI bus (and this might be
important) would become a bottleneck.  I have to test this again on the
smaller machines.
-- 
Stefan Richter
-=====-=-=== -=== =-===
http://arcgraph.de/sr/


* Re: [BUG] firewire: mass-storage i/o-problems
  2007-07-23 11:34         ` Stefan Richter
@ 2007-07-23 18:33           ` Manuel Lauss
  2007-07-23 18:44             ` Manuel Lauss
  0 siblings, 1 reply; 14+ messages in thread
From: Manuel Lauss @ 2007-07-23 18:33 UTC (permalink / raw)
  To: Stefan Richter; +Cc: Andreas Messer, linux-kernel, linux1394-devel

On Mon, Jul 23, 2007 at 01:34:28PM +0200, Stefan Richter wrote:
> Manuel Lauss wrote:
> > I noticed the failures start when there are 2 concurrent disk accesses
> > (copy something from fw disk on shell 1 and it runs fine; start to
> > copy something TO the fw disk on shell 2 and a "management write failed"
> > error appears after 1-2 sec. with the orb timeout after a looong time)
> 
> Most of my tests have been with a single process, but I already tested
> several processes at once in parallel too.  But probably only on the
> bigger machine where neither the CPU nor the PCI bus (and this might be
> important) would become a bottleneck.  I have to test this again on the
> smaller machines.

I think I found a way to reliably reproduce the problem (2.6.22)

NFS-export the fw disk, mount it on another host, put a movie on it,
play it with mplayer on the other machine. Seek a little in the movie;
it locks up every time here. If it doesn't, copy data to the disk in parallel.

Thanks,
	Manuel Lauss



* Re: [BUG] firewire: mass-storage i/o-problems
  2007-07-23 18:33           ` Manuel Lauss
@ 2007-07-23 18:44             ` Manuel Lauss
  2007-07-24 19:56               ` Stefan Richter
  0 siblings, 1 reply; 14+ messages in thread
From: Manuel Lauss @ 2007-07-23 18:44 UTC (permalink / raw)
  To: Stefan Richter; +Cc: Andreas Messer, linux-kernel, linux1394-devel

On Mon, Jul 23, 2007 at 08:33:36PM +0200, Manuel Lauss wrote:
> On Mon, Jul 23, 2007 at 01:34:28PM +0200, Stefan Richter wrote:
> > Manuel Lauss wrote:
> > > I noticed the failures start when there are 2 concurrent disk accesses
> > > (copy something from fw disk on shell 1 and it runs fine; start to
> > > copy something TO the fw disk on shell 2 and a "management write failed"
> > > error appears after 1-2 sec. with the orb timeout after a looong time)
> > 
> > Most of my tests have been with a single process, but I already tested
> > several processes at once in parallel too.  But probably only on the
> > bigger machine where neither the CPU nor the PCI bus (and this might be
> > important) would become a bottleneck.  I have to test this again on the
> > smaller machines.
> 
> I think I found a way to reliably reproduce the problem (2.6.22)
> 
> NFS-export the fw disk, mount it on another host, put a movie on it,
> play it with mplayer on the other machine. Seek a little in the movie;
> it locks up every time here. If it doesn't, copy data to the disk in parallel.

Actually, copying data to the disk while playing/seeking through a moviefile
which is also located on it is already enough. Forget the NFS thing...

Afterwards the firewire_sbp2 module has to be rmmod-ed and modprobed again
or it will continue to throw errors even for single reads.

I hope this helps tracking it down...

Thanks,
	Manuel Lauss


* Re: [BUG] firewire: mass-storage i/o-problems
  2007-07-23 18:44             ` Manuel Lauss
@ 2007-07-24 19:56               ` Stefan Richter
  2007-07-25  5:12                 ` Manuel Lauss
  0 siblings, 1 reply; 14+ messages in thread
From: Stefan Richter @ 2007-07-24 19:56 UTC (permalink / raw)
  To: Manuel Lauss
  Cc: Andreas Micklei, Andreas Messer, linux-kernel, linux1394-devel

[-- Attachment #1: Type: text/plain, Size: 983 bytes --]

Manuel Lauss wrote:
> Actually, copying data to the disk while playing/seeking through a moviefile
> which is also located on it is already enough. Forget the NFS thing...
> 
> Afterwards the firewire_sbp2 module has to be rmmod-ed and modprobed again
> or it will continue to throw errors even for single reads.
> 
> I hope this helps tracking it down...

I tried this and similar tests on my main PC (PCIe based) and on an
Athlon/KM266 PC, with 1394b and 1394a hardware.  Nothing happened,
except for a single "status write for unknown orb", followed by command
abort from which the disk immediately recovered.  I did many tests and
it didn't happen again.  I.e. it's probable that the supposed bug
happens here too, but very rarely.

Could you (and everyone else who has repeated I/O errors with the new
drivers, but not with the old drivers) test the attached patches, one
patch at a time?  They apply to 2.6.22.
-- 
Stefan Richter
-=====-=-=== -=== ==---
http://arcgraph.de/sr/

[-- Attachment #2: test1-firewire-fw-sbp2-default-to-128k-transfers.patch --]
[-- Type: text/plain, Size: 853 bytes --]

firewire: fw-sbp2: default to 128k transfers

because that's what the old sbp2 driver does per default, to avoid
trouble with buggy devices.

A test on a 1394b hardware RAID0 shows a drop in bandwidth by 10% by
this patch.
---
This should not be hardwired but set by blk_queue_max_sectors() in
sbp2_scsi_slave_configure().

 drivers/firewire/fw-sbp2.c |    1 +
 1 file changed, 1 insertion(+)

Index: linux-2.6.22/drivers/firewire/fw-sbp2.c
===================================================================
--- linux-2.6.22.orig/drivers/firewire/fw-sbp2.c
+++ linux-2.6.22/drivers/firewire/fw-sbp2.c
@@ -1171,6 +1171,7 @@ static struct scsi_host_template scsi_dr
 	.this_id		= -1,
 	.sg_tablesize		= SG_ALL,
 	.use_clustering		= ENABLE_CLUSTERING,
+	.max_sectors		= 255,
 	.cmd_per_lun		= 1,
 	.can_queue		= 1,
 	.sdev_attrs		= sbp2_scsi_sysfs_attrs,

[-- Attachment #3: test2-firewire-fw-sbp2-increase-busy-timeout.patch --]
[-- Type: text/plain, Size: 2132 bytes --]

firewire: fw-sbp2: increase BUSY_TIMEOUT

Increase BUSY_TIMEOUT.retry_limit to a maximum, like the old sbp2 driver
does.  This lets targets retry more times in single phase retry if our
host adapter is too busy to accept packets.
---
 drivers/firewire/fw-sbp2.c |   30 ++++++++++++++++++++++++++----
 1 file changed, 26 insertions(+), 4 deletions(-)

Index: linux-2.6.22/drivers/firewire/fw-sbp2.c
===================================================================
--- linux-2.6.22.orig/drivers/firewire/fw-sbp2.c
+++ linux-2.6.22/drivers/firewire/fw-sbp2.c
@@ -538,6 +538,30 @@ release_sbp2_device(struct kref *kref)
 	scsi_host_put(host);
 }
 
+static void
+complete_set_busy_timeout(struct fw_card *card, int rcode,
+			  void *payload, size_t length, void *data)
+{
+	if (rcode != RCODE_COMPLETE)
+		fw_error("set_busy_timeout: rcode %x\n", rcode);
+	complete((struct completion *)data);
+}
+
+static void sbp2_set_busy_timeout(struct sbp2_device *sd)
+{
+	struct fw_device *device = fw_device(sd->unit->device.parent);
+	struct fw_transaction t;
+	struct completion done;
+	__be32 data = cpu_to_be32(0xf);
+
+	init_completion(&done);
+	fw_send_request(device->card, &t, TCODE_WRITE_QUADLET_REQUEST,
+			sd->node_id, sd->generation, device->node->max_speed,
+			0xfffff0000210ULL, &data, sizeof(data),
+			complete_set_busy_timeout, &done);
+	wait_for_completion(&done);
+}
+
 static void sbp2_login(struct work_struct *work)
 {
 	struct sbp2_device *sd =
@@ -587,10 +611,7 @@ static void sbp2_login(struct work_struc
 	fw_notify(" - status write address:        0x%012llx\n",
 		  (unsigned long long) sd->address_handler.offset);
 
-#if 0
-	/* FIXME: The linux1394 sbp2 does this last step. */
-	sbp2_set_busy_timeout(scsi_id);
-#endif
+	sbp2_set_busy_timeout(sd);
 
 	PREPARE_DELAYED_WORK(&sd->work, sbp2_reconnect);
 	sbp2_agent_reset(unit);
@@ -752,6 +773,7 @@ static void sbp2_reconnect(struct work_s
 
 	fw_notify("reconnected to unit %s (%d retries)\n",
 		  unit->device.bus_id, sd->retries);
+	sbp2_set_busy_timeout(sd);
 	sbp2_agent_reset(unit);
 	sbp2_cancel_orbs(unit);
 	kref_put(&sd->kref, release_sbp2_device);
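
For readers following the patch above, here is a minimal standalone sketch of
what the quadlet write amounts to.  It is not driver code: the helper names
are hypothetical, and it assumes that retry_limit occupies the low four bits
of the BUSY_TIMEOUT register (which is what the patch's value 0xf suggests)
and that the register sits at offset 0x210 from the CSR register base
0xfffff0000000, which matches the address 0xfffff0000210 used in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: CSR register space base plus the BUSY_TIMEOUT offset. */
#define CSR_REGISTER_BASE	0xfffff0000000ULL
#define CSR_BUSY_TIMEOUT	0x210

/* Assumption: retry_limit is a 4-bit field in the low bits of the quadlet. */
#define RETRY_LIMIT_MAX		0xfu

/* Hypothetical helper: 48-bit bus address of the BUSY_TIMEOUT register. */
static uint64_t busy_timeout_address(void)
{
	return CSR_REGISTER_BASE + CSR_BUSY_TIMEOUT;
}

/* Hypothetical helper: register value carrying only a retry_limit. */
static uint32_t busy_timeout_quadlet(uint32_t retry_limit)
{
	assert(retry_limit <= RETRY_LIMIT_MAX);
	return retry_limit;
}

/*
 * Serialize a quadlet in big-endian byte order, which is what
 * cpu_to_be32() arranges for in the patch, independent of host endianness.
 */
static void quadlet_to_wire(uint32_t q, uint8_t out[4])
{
	out[0] = (uint8_t)(q >> 24);
	out[1] = (uint8_t)(q >> 16);
	out[2] = (uint8_t)(q >> 8);
	out[3] = (uint8_t)q;
}
```

With retry_limit at its maximum, the payload on the wire is the four bytes
00 00 00 0f, written to address 0xfffff0000210 on the target node.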


* Re: [BUG] firewire: mass-storage i/o-problems
  2007-07-24 19:56               ` Stefan Richter
@ 2007-07-25  5:12                 ` Manuel Lauss
  2007-07-25 16:27                   ` Stefan Richter
  0 siblings, 1 reply; 14+ messages in thread
From: Manuel Lauss @ 2007-07-25  5:12 UTC (permalink / raw)
  To: Stefan Richter
  Cc: Andreas Micklei, Andreas Messer, linux-kernel, linux1394-devel

On Tue, Jul 24, 2007 at 09:56:59PM +0200, Stefan Richter wrote:
> Manuel Lauss wrote:
> > Actually, copying data to the disk while playing/seeking through a moviefile
> > which is also located on it is already enough. Forget the NFS thing...
> > 
> > Afterwards the firewire_sbp2 module has to be rmmod-ed and modprobed again
> > or it will continue to throw errors even for single reads.
> > 
> > I hope this helps tracking it down...
> 
> I tried this and similar tests on my main PC (PCIe based) and on an
> Athlon/KM266 PC, with 1394b and 1394a hardware.  Nothing happened,
> except for a single "status write for unknown orb", followed by command
> abort from which the disk immediately recovered.  I did many tests and
> it didn't happen again.  I.e. it's probable that the supposed bug
> happens here too, but very rarely.

I tried 2.6.23 in the meantime, it's *MUCH* harder to trigger; in fact
I had to skip through movies for ~10 minutes to get the orb timeout.
The disk was inaccessible for a few seconds then recovered fine.

> Could you (and everyone else who has repeated I/O errors with the new
> drivers, but not with the old drivers) test the attached patches, one
> patch at a time?  They apply to 2.6.22.

Will do.

Thanks,
	Manuel Lauss


* Re: [BUG] firewire: mass-storage i/o-problems
  2007-07-25  5:12                 ` Manuel Lauss
@ 2007-07-25 16:27                   ` Stefan Richter
  0 siblings, 0 replies; 14+ messages in thread
From: Stefan Richter @ 2007-07-25 16:27 UTC (permalink / raw)
  To: Manuel Lauss
  Cc: Andreas Micklei, Andreas Messer, linux-kernel, linux1394-devel

Manuel Lauss wrote:
> I tried 2.6.23 in the meantime, it's *MUCH* harder to trigger; in fact
> I had to skip through movies for ~10 minutes to get the orb timeout.
> The disk was inaccessible for a few seconds then recovered fine.
> 
>> Could you (and everyone else who has repeated I/O errors with the new
>> drivers, but not with the old drivers) test the attached patches, one
>> patch at a time?  They apply to 2.6.22.
> 
> Will do.

There are a few firewire fixes in 2.6.23-rc1 relative to 2.6.22, but I
don't believe they have anything to do with this issue.  Hence it
would be most practical to test the patches on 2.6.22, but if you prefer,
you could also test them on 2.6.23-*.  Thanks,
-- 
Stefan Richter
-=====-=-=== -=== ==--=
http://arcgraph.de/sr/


end of thread, other threads:[~2007-07-25 16:28 UTC | newest]

Thread overview: 14+ messages
2007-07-22 21:32 [BUG] firewire: mass-storage i/o-problems Andreas Messer
2007-07-22 23:34 ` Stefan Richter
2007-07-23  6:37   ` Manuel Lauss
2007-07-23  7:40     ` Stefan Richter
2007-07-23  8:45       ` Manuel Lauss
2007-07-23 11:34         ` Stefan Richter
2007-07-23 18:33           ` Manuel Lauss
2007-07-23 18:44             ` Manuel Lauss
2007-07-24 19:56               ` Stefan Richter
2007-07-25  5:12                 ` Manuel Lauss
2007-07-25 16:27                   ` Stefan Richter
2007-07-23  8:07   ` Andreas Messer
2007-07-23  9:58     ` Andreas Micklei
2007-07-23 11:30     ` Stefan Richter
