* sym53c8xx parity errors on SuSE 9.1's hwscan?
@ 2004-09-22 23:16 Matthias Andree
2004-09-23 2:39 ` Matthew Wilcox
0 siblings, 1 reply; 6+ messages in thread
From: Matthias Andree @ 2004-09-22 23:16 UTC (permalink / raw)
To: linux-scsi; +Cc: Matthew Wilcox
Greetings,
SuSE Linux 9.1 (kernel 2.6.5 + SuSE patch set, but also 2.6.7 or a bit
milder in 2.6.9-rc2-mm1) uses some SuSE-specific "hwprobe" or "hwinfo"
tool to scan for hardware.
Whenever this tool, probing the PCI bus, hits my Tekram DC-390U or
DC-310U, the box logs a SCSI parity error, some timed-out abort
messages, then the usual reset escalation: device reset (times out), bus
reset (times out), finally HBA reset (succeeds). The whole procedure
takes about 1 to 2 minutes before the bus is usable again.
If the machine is idle and has warm caches, I may occasionally see just
the parity error message, so it may look a bit like a race in sym53c8xx
or perhaps the hardware.
The problem seems to persist through current versions, although they can
_usually_ just fix up a phase error and continue immediately.
The problem did not exist when I was running 2.6.7 on SuSE 8.2.
I find it a bit intimidating that user-space (albeit with root
permissions) causes "SCSI" parity errors, and given the 2.6.9 logging
towards the end of the mail, I am wondering if SuSE's hwinfo stuff
triggers some race condition or manages to bypass the SCSI phase state
machine or if the probe confuses the chip. I haven't yet managed to
isolate (with strace) the cause.
Is there a useful debug setting for sym53c8xx that could shed some light
on what the user-space has attempted that led to the SCSI parity error?
Log from SuSE's 2.6.5-7.108-default during system boot-up:
Sep 22 15:15:09 merlin kernel: sym0: SCSI parity error detected: SCR1=3 DBC=50000000 SBCL=0
Sep 22 15:15:39 merlin kernel: sym0:1:0: ABORT operation started.
Sep 22 15:15:44 merlin kernel: sym0:1:0: ABORT operation timed-out.
Sep 22 15:15:44 merlin kernel: sym0:1:0: ABORT operation started.
Sep 22 15:15:49 merlin kernel: sym0:1:0: ABORT operation timed-out.
Sep 22 15:15:49 merlin kernel: sym0:1:0: DEVICE RESET operation started.
Sep 22 15:15:54 merlin kernel: sym0:1:0: DEVICE RESET operation timed-out.
Sep 22 15:15:54 merlin kernel: sym0:1:0: BUS RESET operation started.
Sep 22 15:15:54 merlin kernel: sym0: SCSI BUS reset detected.
Sep 22 15:15:54 merlin kernel: sym0: SCSI BUS has been reset.
Sep 22 15:15:54 merlin kernel: sym0:1:0: BUS RESET operation complete.
and a while later:
Sep 22 15:18:32 merlin kernel: sym0: SCSI parity error detected: SCR1=3 DBC=50000000 SBCL=0
Sep 22 15:18:32 merlin kernel: sym0: interrupted SCRIPT address not found.
Sep 22 15:18:32 merlin kernel: sym0: SCSI BUS reset detected.
Sep 22 15:18:32 merlin kernel: sym0: SCSI BUS has been reset.
2.6.9-rc2-mm1, although otherwise not useful for me (UDP networking),
logs this instead:
Sep 22 14:00:05 merlin kernel: sym0: SCSI parity error detected: SCR1=3 DBC=50000000 SBCL=0
Sep 22 14:00:05 merlin kernel: sym0: SCSI phase error fixup: CCB already dequeued.
Sep 22 14:00:05 merlin kernel: sym0: SCSI BUS reset detected.
Sep 22 14:00:05 merlin kernel: sym0: SCSI BUS has been reset.
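To put numbers on the recovery time, here is a minimal sketch that measures the gap between syslog timestamps, using three lines copied from the 2.6.5-7.108-default excerpt above. The helper names are hypothetical, and the year is supplied by hand because syslog timestamps omit it.

```python
from datetime import datetime

# Lines copied from the 2.6.5-7.108-default excerpt: the parity error,
# the first ABORT, and the completed bus reset.
LOG = """\
Sep 22 15:15:09 merlin kernel: sym0: SCSI parity error detected: SCR1=3 DBC=50000000 SBCL=0
Sep 22 15:15:39 merlin kernel: sym0:1:0: ABORT operation started.
Sep 22 15:15:54 merlin kernel: sym0:1:0: BUS RESET operation complete.
"""


def parse_stamp(line, year=2004):
    """Parse the leading syslog timestamp; the year is an explicit assumption."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def elapsed_seconds(lines):
    """Seconds between the first and last non-empty log line."""
    stamps = [parse_stamp(line) for line in lines if line.strip()]
    return int((stamps[-1] - stamps[0]).total_seconds())


lines = LOG.splitlines()
print(elapsed_seconds(lines))      # 45
print(elapsed_seconds(lines[:2]))  # 30
```

In this particular excerpt the bus is back 45 seconds after the parity error, of which 30 seconds pass before the first ABORT is even attempted.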
--
Matthias Andree
Encrypted mail welcome: my GnuPG key ID is 0x052E7D95 (PGP/MIME preferred)
* Re: sym53c8xx parity errors on SuSE 9.1's hwscan?
2004-09-22 23:16 sym53c8xx parity errors on SuSE 9.1's hwscan? Matthias Andree
@ 2004-09-23 2:39 ` Matthew Wilcox
2004-09-23 7:42 ` Olaf Hering
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Matthew Wilcox @ 2004-09-23 2:39 UTC (permalink / raw)
To: Matthias Andree; +Cc: linux-scsi, Matthew Wilcox
On Thu, Sep 23, 2004 at 01:16:51AM +0200, Matthias Andree wrote:
> SuSE Linux 9.1 (kernel 2.6.5 + SuSE patch set, but also 2.6.7 or a bit
> milder in 2.6.9-rc2-mm1) uses some SuSE-specific "hwprobe" or "hwinfo"
> tool to scan for hardware.
Does anyone have the source? I'd be interested to see what it's up to.
> I find it a bit intimidating that user-space (albeit with root
> permissions) causes "SCSI" parity errors, and given the 2.6.9 logging
> towards the end of the mail, I am wondering if SuSE's hwinfo stuff
> triggers some race condition or manages to bypass the SCSI phase state
> machine or if the probe confuses the chip. I haven't yet managed to
> isolate (with strace) the cause.
I have a suggestion. If the probe attempts to size the BARs of the
chip, this is a destructive process that could well cause the chip to
start spewing errors and require a reset to work again.
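A minimal sketch of what sizing a BAR conventionally involves, assuming a 32-bit memory BAR: save the register, write all ones, read back which bits stuck, restore the saved value. The class and function names and the base address are hypothetical, and the model touches no real hardware. It only illustrates why the sequence is disruptive: between the all-ones write and the restore, the register no longer holds the address the driver is using.

```python
# Toy model of a 32-bit PCI memory BAR; nothing here touches real hardware.
class ToyMemoryBar:
    """Models a memory BAR whose low address bits are hardwired to zero."""

    def __init__(self, size, base):
        assert size >= 16 and size & (size - 1) == 0, "size must be a power of two >= 16"
        assert base % size == 0, "base must be aligned to size"
        self.size = size
        self.value = base  # low four type bits left as 0

    def write(self, value):
        # Only the address bits at or above log2(size) are writable.
        self.value = value & ~(self.size - 1) & 0xFFFFFFFF

    def read(self):
        return self.value


def size_bar(bar):
    """Conventional sizing sequence: save, write all ones, read back, restore."""
    saved = bar.read()
    bar.write(0xFFFFFFFF)      # from here on the BAR holds a bogus address
    probe = bar.read()
    bar.write(saved)           # restore the original base address
    mask = probe & 0xFFFFFFF0  # drop the four memory-BAR type bits
    return (~mask + 1) & 0xFFFFFFFF


bar = ToyMemoryBar(size=8 * 1024, base=0xE0002000)  # hypothetical 8k region
print(hex(size_bar(bar)))  # 0x2000
```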
> Is there a useful debug setting for sym53c8xx that could shed some light
> on what the user-space has attempted that led to the SCSI parity error?
The trouble is that I suspect the probe is completely bypassing the driver.
It might be worth instrumenting drivers/pci/proc.c to see if it's writing
to any of the BARs (particularly the second memory BAR, the one that's 8k).
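The check such instrumentation would need is small. The sketch below is illustrative Python rather than kernel code, with a hypothetical function name, and assumes the standard type 0 configuration header, where the six 32-bit BARs occupy offsets 0x10 through 0x27. Given the offset and length of a config-space write, it reports which BAR registers the write overlaps.

```python
# Standard PCI type 0 header layout: six 32-bit BARs at offsets 0x10..0x27.
BAR_START = 0x10
BAR_COUNT = 6
BAR_WIDTH = 4


def bars_touched(offset, length):
    """Return the indices (0-5) of BAR registers overlapped by a config-space write."""
    if length <= 0:
        return []
    end = offset + length  # exclusive end of the written range
    touched = []
    for index in range(BAR_COUNT):
        bar_lo = BAR_START + index * BAR_WIDTH
        bar_hi = bar_lo + BAR_WIDTH
        if offset < bar_hi and end > bar_lo:
            touched.append(index)
    return touched


print(bars_touched(0x04, 2))  # command register only: []
print(bars_touched(0x18, 4))  # [2]
print(bars_touched(0x0E, 8))  # [0, 1]
```

A kernel-side version would log the same information from the config-space write path, so that any non-empty result during a hwinfo run would point at BAR writes.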
--
"Next the statesmen will invent cheap lies, putting the blame upon
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince
himself that the war is just, and will thank God for the better sleep
he enjoys after this process of grotesque self-deception." -- Mark Twain
* Re: sym53c8xx parity errors on SuSE 9.1's hwscan?
2004-09-23 2:39 ` Matthew Wilcox
@ 2004-09-23 7:42 ` Olaf Hering
2004-09-23 8:51 ` Matthias Andree
2004-09-23 8:45 ` Matthias Andree
2004-10-25 8:16 ` Matthias Andree
2 siblings, 1 reply; 6+ messages in thread
From: Olaf Hering @ 2004-09-23 7:42 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: Matthias Andree, linux-scsi
On Thu, Sep 23, Matthew Wilcox wrote:
> On Thu, Sep 23, 2004 at 01:16:51AM +0200, Matthias Andree wrote:
> > SuSE Linux 9.1 (kernel 2.6.5 + SuSE patch set, but also 2.6.7 or a bit
> > milder in 2.6.9-rc2-mm1) uses some SuSE-specific "hwprobe" or "hwinfo"
> > tool to scan for hardware.
>
> Does anyone have the source? I'd be interested to see what it's up to.
ftp.suse.com/pub/projects/kernel/kotd
--
USB is for mice, FireWire is for men!
sUse lINUX ag, nÜRNBERG
* Re: sym53c8xx parity errors on SuSE 9.1's hwscan?
2004-09-23 2:39 ` Matthew Wilcox
2004-09-23 7:42 ` Olaf Hering
@ 2004-09-23 8:45 ` Matthias Andree
2004-10-25 8:16 ` Matthias Andree
2 siblings, 0 replies; 6+ messages in thread
From: Matthias Andree @ 2004-09-23 8:45 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: Matthias Andree, linux-scsi
Matthew Wilcox <matthew@wil.cx> writes:
> On Thu, Sep 23, 2004 at 01:16:51AM +0200, Matthias Andree wrote:
>> SuSE Linux 9.1 (kernel 2.6.5 + SuSE patch set, but also 2.6.7 or a bit
>> milder in 2.6.9-rc2-mm1) uses some SuSE-specific "hwprobe" or "hwinfo"
>> tool to scan for hardware.
>
> Does anyone have the source? I'd be interested to see what it's up
> to.
It's GPL'd stuff; get the source (505,597 bytes) from
ftp://ftp.suse.com/pub/suse/i386/9.1/suse/src/hwinfo-8.38-0.src.rpm
Just run dd reading from some SCSI drive and hwinfo --all. Don't do
that if you need the machine in the following two minutes...
While testing, I got one more variant today:
Sep 23 10:13:58 merlin kernel: sym0: SCSI parity error detected: SCR1=132 DBC=50000000 SBCL=0
Sep 23 10:13:58 merlin kernel: sym0:1: ERROR (81:0) (8-0-0) (10/9d/0) @ (mem 48000818:ffffffff).
Sep 23 10:13:58 merlin kernel: sym0: regdump: da 00 00 9d 47 10 01 07 00 08 81 00 80 00 0f 0a ff 90 e4 0d 02 ff ff ff.
Sep 23 10:13:58 merlin kernel: sym0: SCSI BUS reset detected.
Sep 23 10:13:58 merlin kernel: sym0: SCSI BUS has been reset.
> I have a suggestion. If the probe attempts to size the BARs of the
> chip, this is a destructive process that could well cause the chip to
> start spewing errors and require a reset to work again.
The "BAR" stuff goes well over my head at this time.
--
Matthias Andree
Encrypted mail welcome: my GnuPG key ID is 0x052E7D95 (PGP/MIME preferred)
* Re: sym53c8xx parity errors on SuSE 9.1's hwscan?
2004-09-23 7:42 ` Olaf Hering
@ 2004-09-23 8:51 ` Matthias Andree
0 siblings, 0 replies; 6+ messages in thread
From: Matthias Andree @ 2004-09-23 8:51 UTC (permalink / raw)
To: Olaf Hering; +Cc: Matthew Wilcox, Matthias Andree, linux-scsi
Olaf Hering <olh@suse.de> writes:
> On Thu, Sep 23, Matthew Wilcox wrote:
>
>> On Thu, Sep 23, 2004 at 01:16:51AM +0200, Matthias Andree wrote:
>> > SuSE Linux 9.1 (kernel 2.6.5 + SuSE patch set, but also 2.6.7 or a bit
>> > milder in 2.6.9-rc2-mm1) uses some SuSE-specific "hwprobe" or "hwinfo"
>> > tool to scan for hardware.
>>
>> Does anyone have the source? I'd be interested to see what it's up to.
>
> ftp.suse.com/pub/projects/kernel/kotd
Given that a vanilla 2.6.7 kernel also shows the problem, I presume
Matthew was interested in the source of hwinfo rather than the kernel
URL. Thanks anyways.
--
Matthias Andree
Encrypted mail welcome: my GnuPG key ID is 0x052E7D95 (PGP/MIME preferred)
* Re: sym53c8xx parity errors on SuSE 9.1's hwscan?
2004-09-23 2:39 ` Matthew Wilcox
2004-09-23 7:42 ` Olaf Hering
2004-09-23 8:45 ` Matthias Andree
@ 2004-10-25 8:16 ` Matthias Andree
2 siblings, 0 replies; 6+ messages in thread
From: Matthias Andree @ 2004-10-25 8:16 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: Matthias Andree, linux-scsi
Matthew Wilcox <matthew@wil.cx> writes:
> On Thu, Sep 23, 2004 at 01:16:51AM +0200, Matthias Andree wrote:
>> SuSE Linux 9.1 (kernel 2.6.5 + SuSE patch set, but also 2.6.7 or a bit
>> milder in 2.6.9-rc2-mm1) uses some SuSE-specific "hwprobe" or "hwinfo"
>> tool to scan for hardware.
>
> Does anyone have the source? I'd be interested to see what it's up
> to.
Matthew,
this thread apparently hasn't been followed up on in the last four weeks,
so I thought I'd ask whether you have had a chance to make any progress on
this one.
--
Matthias Andree