* sym53c8xx problem
@ 2001-11-06 2:44 Berkan Eskikaya
2001-11-06 17:56 ` Gérard Roudier
0 siblings, 1 reply; 3+ messages in thread
From: Berkan Eskikaya @ 2001-11-06 2:44 UTC
To: linux-kernel; +Cc: berkan
Hi,
Hardware and driver details are at the end; first, the problem:
I've been getting messages like these in the kernel logs on one of our
colocated servers:
sym53c875E-0-<0,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)
sym53c875E-0:0: ERROR (81:0) (8-0-0) (10/9d) @ (script 38:f31c0004).
sym53c875E-0: script cmd = e21c0004
sym53c875E-0: regdump: da 00 00 9d 47 10 00 07 04 08 80 00 80 00 0f 02 63 00 00 00 02 ff ff ff.
sym53c875E-0-<0,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)
scsi : aborting command due to timeout : pid 7737, scsi0, channel 0, id 0, lun 0 Read (10) 00 00 00 3c 80 00 00 80 00
sym53c8xx_abort: pid=7737 serial_number=7752 serial_number_at_timeout=7752
SCSI host 0 abort (pid 7737) timed out - resetting
SCSI bus is being reset for host 0 channel 0.
sym53c8xx_reset: pid=7737 reset_flags=2 serial_number=7752 serial_number_at_timeout=7752
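The timed-out command in the log above can be decoded to see which blocks were being read. The following is a minimal Python sketch, assuming the kernel prints the opcode by name ("Read (10)", opcode 0x28) followed by the remaining nine CDB bytes, and assuming 512-byte sectors as reported for sda. The function name decode_read10 is made up for this illustration.

```python
# Bytes printed after "Read (10)" in the timeout message. The opcode
# itself (0x28 for READ(10)) is assumed to be shown by name only.
LOGGED = "00 00 00 3c 80 00 00 80 00"


def decode_read10(logged_bytes, sector_size=512):
    """Decode a READ(10) CDB from the bytes that follow the opcode."""
    cdb = bytes([0x28]) + bytes.fromhex(logged_bytes)
    if len(cdb) != 10:
        raise ValueError("READ(10) CDB must be 10 bytes long")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2..5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7..8: transfer length
    return {"lba": lba, "blocks": blocks, "bytes": blocks * sector_size}


info = decode_read10(LOGGED)
print(info)
```

Under these assumptions the command is a 128-block (64 KiB) read starting at block 15488, which looks like an ordinary request rather than anything unusual.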
I thought this might be due to a bad card / cable so yesterday the ISP
replaced the SCSI card and cable with another pair. This morning I've
got these in the logs:
sym53c875E-0: interrupted SCRIPT address not found.
sym53c875E-0-<0,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)
sym53c875E-0: interrupted SCRIPT address not found.
sym53c875E-0-<0,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)
sym53c875E-0:0: ERROR (81:0) (8-0-0) (10/9d) @ (script 38:f31c0004).
sym53c875E-0: script cmd = e21c0004
sym53c875E-0: regdump: da 00 00 9d 47 10 00 0f 00 08 80 00 80 00 07 02 63 00 00 00 02 ff ff ff.
sym53c875E-0-<0,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)
sym53c875E-0: interrupted SCRIPT address not found.
sym53c875E-0-<0,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)
I'd really appreciate if somebody could shed some light on this and
recommend a solution. This has already caused a filesystem corruption
and a forced reboot.
I'm not on the list, so please Cc to berkan@runtime-collective.com
if you reply.
Cheers,
Berkan
Kernel: Linux 2.2.20pre11 for Intel x86
SCSI hardware and driver:
[relevant bits from boot messages]
sym53c8xx: at PCI bus 0, device 11, function 0
sym53c8xx: setting PCI_COMMAND_PARITY...(fix-up)
sym53c8xx: 53c875E detected with Tekram NVRAM
sym53c875E-0: rev 0x26 on pci bus 0 device 11 function 0 irq 10
sym53c875E-0: Tekram format NVRAM, ID 7, Fast-20, Parity Checking
scsi0 : sym53c8xx-1.7.1-20000726
scsi : 1 host.
Vendor: IBM Model: DDYS-T09170N Rev: S80D
Type: Direct-Access ANSI SCSI revision: 03
Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
scsi : detected 1 SCSI disk total.
sym53c875E-0-<0,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)
SCSI device sda: hdwr sector= 512 bytes. Sectors= 17916240 [8748 MB] [8.7 GB]
[from lspci -v]
00:0b.0 SCSI storage controller: Symbios Logic Inc. (formerly NCR) 53c875 (rev 26)
Subsystem: Symbios Logic Inc. (formerly NCR): Unknown device 1000
Flags: bus master, medium devsel, latency 32, IRQ 10
I/O ports at b800
Memory at da800000 (32-bit, non-prefetchable)
Memory at da000000 (32-bit, non-prefetchable)
Capabilities: [40] Power Management version 1
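The negotiated transfer rate in the "FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)" lines follows directly from the synchronous period and the bus width. A small Python sketch of that arithmetic, assuming a wide bus moves 2 bytes per transfer (the function name sync_rate_mb_per_s is illustrative only):

```python
def sync_rate_mb_per_s(period_ns, width_bytes):
    """Peak synchronous SCSI rate in MB/s for a given transfer period."""
    mega_transfers_per_s = 1000 / period_ns  # 50 ns -> 20 MT/s (FAST-20)
    return mega_transfers_per_s * width_bytes


# 50 ns period on a 16-bit (2-byte) wide bus, as in the boot messages.
rate = sync_rate_mb_per_s(period_ns=50, width_bytes=2)
print(rate)
```

A 50 ns period gives 20 million transfers per second, and two bytes per transfer yields the 40.0 MB/s that the driver reports.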
* Re: sym53c8xx problem
2001-11-06 2:44 sym53c8xx problem Berkan Eskikaya
@ 2001-11-06 17:56 ` Gérard Roudier
2001-11-07 11:53 ` Berkan Eskikaya
0 siblings, 1 reply; 3+ messages in thread
From: Gérard Roudier @ 2001-11-06 17:56 UTC
To: Berkan Eskikaya; +Cc: linux-kernel
On Tue, 6 Nov 2001, Berkan Eskikaya wrote:
> Hi,
>
> Hardware and driver details are at the end; first, the problem:
>
> I've been getting messages like these in the kernel logs on one of our
> colocated servers:
>
> sym53c875E-0-<0,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)
> sym53c875E-0:0: ERROR (81:0) (8-0-0) (10/9d) @ (script 38:f31c0004).
> sym53c875E-0: script cmd = e21c0004
> sym53c875E-0: regdump: da 00 00 9d 47 10 00 07 04 08 80 00 80 00 0f 02
> 63 00 00 00 02 ff ff ff.
^^^^^^^^^^^
This IO register (DSA = Data Structure Address) has been loaded with a
corrupted value. It should look like a valid, 32-bit aligned bus
physical address pointing to main memory (assuming little-endian byte
order).
This happens in the SCSI SCRIPTS at this place:
SCR_LOAD_ABS (dsa, 4),
PADDRH (startpos), <--- Corrupted value loaded into DSA
SCR_LOAD_REL (temp, 4), <--- Instruction that faulted
4,
}/*-------------------------< GETJOB_BEGIN >------------------*/,{
SCR_STORE_ABS (temp, 4),
PADDRH (startpos),
The corrupted value (from startpos) is taken from a SCRIPTS (scripth) area
that resides in main memory. That memory location has been overwritten
with the 32-bit value 0x00000063 (byte-swapped from the dump, since
little-endian is assumed).
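The decoding described above can be reproduced from the posted register dump. This is a minimal Python sketch, assuming the dump lists the chip registers starting at offset 0 and that DSA occupies offsets 0x10 to 0x13 (which matches the "63 00 00 00" bytes underlined in the dump). The helper names and the lowest plausible address are hypothetical choices for the illustration.

```python
# Register dump from the first error report, bytes in register order.
REGDUMP = ("da 00 00 9d 47 10 00 07 04 08 80 00 "
           "80 00 0f 02 63 00 00 00 02 ff ff ff")

DSA_OFFSET = 0x10  # assumption: DSA register at offsets 0x10..0x13


def decode_dsa(regdump, offset=DSA_OFFSET):
    """Return the 32-bit DSA value, read as little-endian."""
    raw = bytes.fromhex(regdump)
    return int.from_bytes(raw[offset:offset + 4], "little")


def looks_like_bus_address(value, min_addr=0x1000):
    """Rough plausibility check: 4-byte aligned and not near zero.

    min_addr is an arbitrary illustrative threshold, not a driver rule.
    """
    return value % 4 == 0 and value >= min_addr


dsa = decode_dsa(REGDUMP)
print(hex(dsa), looks_like_bus_address(dsa))
```

The value comes out as 0x63, which is neither 4-byte aligned nor a plausible main-memory address, consistent with the diagnosis that startpos was overwritten.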
> sym53c875E-0-<0,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)
> scsi : aborting command due to timeout : pid 7737, scsi0, channel 0, id 0, lun 0 Read (10) 00 00 00 3c 80 00 00 80 00
> sym53c8xx_abort: pid=7737 serial_number=7752 serial_number_at_timeout=7752
> SCSI host 0 abort (pid 7737) timed out - resetting
> SCSI bus is being reset for host 0 channel 0.
> sym53c8xx_reset: pid=7737 reset_flags=2 serial_number=7752 serial_number_at_timeout=7752
>
>
> I thought this might be due to a bad card / cable so yesterday the ISP
> replaced the SCSI card and cable with another pair. This morning I've
> got these in the logs:
>
>
> sym53c875E-0: interrupted SCRIPT address not found.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
I have never seen the driver complaint just above since I started looking
after the ncr/sym53c8xx drivers. It can only happen if some SCRIPTS area
has been corrupted.
Which part of the kernel corrupted this memory, and more generally the
memory allocated by the driver? That is obviously the question. As main
memory is shared by all kernel code, there are numerous candidates (the
driver included, obviously).
Since this does not look like a known problem, I cannot really help you
much. Anyway, I would recommend that you first update your kernel to a
supported version. The final 2.2.20 kernel is likely what you need.
Once done, if the problem does not go away, I would recommend that you
update the driver to a more recent version. Choices are:
1) sym53c8xx-1.7.3c
2) sym-2.1.16b
Choice #2 works great and is the latest available version. It is major
version 2 of the driver, succeeding sym53c8xx, which was version 1. It is
not yet widely used, but no glitch has been reported (yet) by people
using it.
ftp://ftp.tux.org/pub/roudier/drivers/linux/experimental/sym-2.1.16b-for-linx-2.2.20.patch.gz
> sym53c875E-0-<0,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)
> sym53c875E-0: interrupted SCRIPT address not found.
> sym53c875E-0-<0,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)
> sym53c875E-0:0: ERROR (81:0) (8-0-0) (10/9d) @ (script 38:f31c0004).
> sym53c875E-0: script cmd = e21c0004
> sym53c875E-0: regdump: da 00 00 9d 47 10 00 0f 00 08 80 00 80 00 07 02 63 00 00 00 02 ff ff ff.
> sym53c875E-0-<0,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)
> sym53c875E-0: interrupted SCRIPT address not found.
> sym53c875E-0-<0,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)
>
Looks like exactly the same problem.
> I'd really appreciate if somebody could shed some light on this and
> recommend a solution. This has already caused a filesystem corruption
> and a forced reboot.
>
> I'm not on the list, so please Cc to berkan@runtime-collective.com
> if you reply.
Will do.
Regards,
Gérard.
> Cheers,
>
> Berkan
>
>
> Kernel: Linux 2.2.20pre11 for Intel x86
>
> SCSI hardware and driver:
>
> [relevant bits from boot messages]
>
> sym53c8xx: at PCI bus 0, device 11, function 0
> sym53c8xx: setting PCI_COMMAND_PARITY...(fix-up)
> sym53c8xx: 53c875E detected with Tekram NVRAM
> sym53c875E-0: rev 0x26 on pci bus 0 device 11 function 0 irq 10
> sym53c875E-0: Tekram format NVRAM, ID 7, Fast-20, Parity Checking
> scsi0 : sym53c8xx-1.7.1-20000726
> scsi : 1 host.
> Vendor: IBM Model: DDYS-T09170N Rev: S80D
> Type: Direct-Access ANSI SCSI revision: 03
> Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
> scsi : detected 1 SCSI disk total.
> sym53c875E-0-<0,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)
> SCSI device sda: hdwr sector= 512 bytes. Sectors= 17916240 [8748 MB] [8.7 GB]
>
> [from lspci -v]
>
> 00:0b.0 SCSI storage controller: Symbios Logic Inc. (formerly NCR) 53c875 (rev 26)
> Subsystem: Symbios Logic Inc. (formerly NCR): Unknown device 1000
> Flags: bus master, medium devsel, latency 32, IRQ 10
> I/O ports at b800
> Memory at da800000 (32-bit, non-prefetchable)
> Memory at da000000 (32-bit, non-prefetchable)
> Capabilities: [40] Power Management version 1
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
* Re: sym53c8xx problem
2001-11-06 17:56 ` Gérard Roudier
@ 2001-11-07 11:53 ` Berkan Eskikaya
0 siblings, 0 replies; 3+ messages in thread
From: Berkan Eskikaya @ 2001-11-07 11:53 UTC
To: Gerard Roudier; +Cc: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 413 bytes --]
Hi Gerard,
Thanks for the explanation. The kernel we're running
has been patched for better Oracle performance. The
patch --attached to this mail-- affects SHMMAX, SHMMNI,
SHMSEG, SEMMNI, SEMMSL, SEMMNS, SEMUME, SEMMNU, and SEMMAP;
and I just saw a comment in include/asm-i386/shmparam.h
saying not to touch the default SHMMAX because people
depend on it.
Maybe this explains the mystery?
Cheers,
Berkan
[-- Attachment #2: patch-2.2.19-ora8i --]
[-- Type: text/plain, Size: 2368 bytes --]
diff -ru linux/include/asm-i386/shmparam.h linux.ora8i/include/asm-i386/shmparam.h
--- linux/include/asm-i386/shmparam.h Sun Mar 25 17:31:05 2001
+++ linux.ora8i/include/asm-i386/shmparam.h Mon Oct 1 11:22:04 2001
@@ -33,14 +33,14 @@
* SHMMAX <= (PAGE_SIZE << _SHM_IDX_BITS).
*/
-#define SHMMAX 0x2000000 /* max shared seg size (bytes) */
+#define SHMMAX 0x40000000 /* max shared seg size (bytes) */
/* Try not to change the default shipped SHMMAX - people rely on it */
#define SHMMIN 1 /* really PAGE_SIZE */ /* min shared seg size (bytes) */
-#define SHMMNI (1<<_SHM_ID_BITS) /* max num of segs system wide */
+#define SHMMNI 200 /* max num of segs system wide */
#define SHMALL /* max shm system wide (pages) */ \
(1<<(_SHM_IDX_BITS+_SHM_ID_BITS))
#define SHMLBA PAGE_SIZE /* attach addr a multiple of this */
-#define SHMSEG SHMMNI /* max shared segs per process */
+#define SHMSEG 100 /* max shared segs per process */
#endif /* _ASMI386_SHMPARAM_H */
diff -ru linux/include/linux/sem.h linux.ora8i/include/linux/sem.h
--- linux/include/linux/sem.h Sun Mar 25 17:31:03 2001
+++ linux.ora8i/include/linux/sem.h Mon Oct 1 11:22:44 2001
@@ -60,17 +60,17 @@
int semaem;
};
-#define SEMMNI 128 /* ? max # of semaphore identifiers */
-#define SEMMSL 250 /* <= 512 max num of semaphores per id */
-#define SEMMNS (SEMMNI*SEMMSL) /* ? max # of semaphores in system */
+#define SEMMNI 256 /* ? max # of semaphore identifiers */
+#define SEMMSL 256 /* <= 512 max num of semaphores per id */
+#define SEMMNS 2048 /* ? max # of semaphores in system */
#define SEMOPM 32 /* ~ 100 max num of ops per semop call */
#define SEMVMX 32767 /* semaphore maximum value */
/* unused */
-#define SEMUME SEMOPM /* max num of undo entries per process */
-#define SEMMNU SEMMNS /* num of undo structures system wide */
+#define SEMUME 10 /* max num of undo entries per process */
+#define SEMMNU 30 /* num of undo structures system wide */
#define SEMAEM (SEMVMX >> 1) /* adjust on exit max value */
-#define SEMMAP SEMMNS /* # of entries in semaphore map */
+#define SEMMAP 10 /* # of entries in semaphore map */
#define SEMUSZ 20 /* sizeof struct sem_undo */
#ifdef __KERNEL__
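The context lines of the shmparam.h hunk retain the constraint "SHMMAX <= (PAGE_SIZE << _SHM_IDX_BITS)", so the patched value can be checked against it. This is a hedged Python sketch: PAGE_SIZE of 4096 is the usual i386 value, but the _SHM_IDX_BITS value used here (15) is an assumption and should be read from the actual header of the kernel in question. Whether a violation has anything to do with the SCRIPTS corruption is not established by this check.

```python
PAGE_SIZE = 4096           # usual i386 page size
ASSUMED_SHM_IDX_BITS = 15  # hypothetical; take the real value from shmparam.h


def shmmax_limit(page_size, idx_bits):
    """Upper bound on SHMMAX stated in the header comment."""
    return page_size << idx_bits


def shmmax_ok(shmmax, page_size=PAGE_SIZE, idx_bits=ASSUMED_SHM_IDX_BITS):
    """True if SHMMAX satisfies SHMMAX <= (PAGE_SIZE << _SHM_IDX_BITS)."""
    return shmmax <= shmmax_limit(page_size, idx_bits)


ORIGINAL_SHMMAX = 0x2000000   # value removed by the patch
PATCHED_SHMMAX = 0x40000000   # value added by the patch

print(hex(shmmax_limit(PAGE_SIZE, ASSUMED_SHM_IDX_BITS)))
print(shmmax_ok(ORIGINAL_SHMMAX), shmmax_ok(PATCHED_SHMMAX))
```

If the assumed _SHM_IDX_BITS holds, the original SHMMAX is within the limit and the patched one exceeds it; with a different value the result may differ.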