linux-kernel.vger.kernel.org archive mirror
* sym53c8xx problem
@ 2001-11-06  2:44 Berkan Eskikaya
  2001-11-06 17:56 ` Gérard Roudier
  0 siblings, 1 reply; 3+ messages in thread
From: Berkan Eskikaya @ 2001-11-06  2:44 UTC (permalink / raw)
  To: linux-kernel; +Cc: berkan

Hi,

Hardware and driver details are at the end; first, the problem:

I've been getting messages like these in the kernel logs on one of our
colocated servers:

sym53c875E-0-<0,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)
sym53c875E-0:0: ERROR (81:0) (8-0-0) (10/9d) @ (script 38:f31c0004).
sym53c875E-0: script cmd = e21c0004            
sym53c875E-0: regdump: da 00 00 9d 47 10 00 07 04 08 80 00 80 00 0f 02 63 00 00 00 02 ff ff ff. 
sym53c875E-0-<0,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)              
scsi : aborting command due to timeout : pid 7737, scsi0, channel 0, id 0, lun 0 Read (10) 00 00 00 3c 80 00 00 80 00 
sym53c8xx_abort: pid=7737 serial_number=7752 serial_number_at_timeout=7752                  
SCSI host 0 abort (pid 7737) timed out - resetting                                
SCSI bus is being reset for host 0 channel 0.                                     
sym53c8xx_reset: pid=7737 reset_flags=2 serial_number=7752 serial_number_at_timeout=7752      


I thought this might be due to a bad card / cable so yesterday the ISP
replaced the SCSI card and cable with another pair. This morning I've 
got these in the logs:


sym53c875E-0: interrupted SCRIPT address not found.
sym53c875E-0-<0,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)
sym53c875E-0: interrupted SCRIPT address not found.
sym53c875E-0-<0,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)
sym53c875E-0:0: ERROR (81:0) (8-0-0) (10/9d) @ (script 38:f31c0004).
sym53c875E-0: script cmd = e21c0004
sym53c875E-0: regdump: da 00 00 9d 47 10 00 0f 00 08 80 00 80 00 07 02 63 00 00 00 02 ff ff ff.
sym53c875E-0-<0,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)
sym53c875E-0: interrupted SCRIPT address not found.
sym53c875E-0-<0,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)


I'd really appreciate it if somebody could shed some light on this and
recommend a solution. This has already caused filesystem corruption
and a forced reboot.

I'm not on the list, so please Cc to berkan@runtime-collective.com
if you reply.

Cheers,

Berkan


Kernel: Linux 2.2.20pre11 for Intel x86

SCSI hardware and driver: 

[relevant bits from boot messages]

sym53c8xx: at PCI bus 0, device 11, function 0
sym53c8xx: setting PCI_COMMAND_PARITY...(fix-up)
sym53c8xx: 53c875E detected with Tekram NVRAM
sym53c875E-0: rev 0x26 on pci bus 0 device 11 function 0 irq 10
sym53c875E-0: Tekram format NVRAM, ID 7, Fast-20, Parity Checking
scsi0 : sym53c8xx-1.7.1-20000726
scsi : 1 host.
  Vendor: IBM       Model: DDYS-T09170N      Rev: S80D
  Type:   Direct-Access                      ANSI SCSI revision: 03
Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
scsi : detected 1 SCSI disk total.
sym53c875E-0-<0,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)
SCSI device sda: hdwr sector= 512 bytes. Sectors= 17916240 [8748 MB] [8.7 GB]

[from lspci -v]

00:0b.0 SCSI storage controller: Symbios Logic Inc. (formerly NCR) 53c875 (rev 26)
        Subsystem: Symbios Logic Inc. (formerly NCR): Unknown device 1000
        Flags: bus master, medium devsel, latency 32, IRQ 10
        I/O ports at b800
        Memory at da800000 (32-bit, non-prefetchable)
        Memory at da000000 (32-bit, non-prefetchable)
        Capabilities: [40] Power Management version 1




* Re: sym53c8xx problem
  2001-11-06  2:44 sym53c8xx problem Berkan Eskikaya
@ 2001-11-06 17:56 ` Gérard Roudier
  2001-11-07 11:53   ` Berkan Eskikaya
  0 siblings, 1 reply; 3+ messages in thread
From: Gérard Roudier @ 2001-11-06 17:56 UTC (permalink / raw)
  To: Berkan Eskikaya; +Cc: linux-kernel


On Tue, 6 Nov 2001, Berkan Eskikaya wrote:

> Hi,
>
> Hardware and driver details are at the end; first, the problem:
>
> I've been getting messages like these in the kernel logs on one of our
> colocated servers:
>
> sym53c875E-0-<0,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)
> sym53c875E-0:0: ERROR (81:0) (8-0-0) (10/9d) @ (script 38:f31c0004).
> sym53c875E-0: script cmd = e21c0004
> sym53c875E-0: regdump: da 00 00 9d 47 10 00 07 04 08 80 00 80 00 0f 02
>                        63 00 00 00 02 ff ff ff.
                         ^^^^^^^^^^^

This IO register value (DSA = Data Structure Address) has been loaded with
a corrupted value. It should look like a valid, 32-bit-aligned physical bus
address pointing to main memory (little-endian byte order assumed).

This happens from SCSI SCRIPTS at this place:

	SCR_LOAD_ABS (dsa, 4),
		PADDRH (startpos),   <--- Corrupted value loaded into DSA
	SCR_LOAD_REL (temp, 4),      <--- Instruction that faulted
		4,
}/*-------------------------< GETJOB_BEGIN >------------------*/,{
	SCR_STORE_ABS (temp, 4),
		PADDRH (startpos),

The corrupted value (from startpos) is read from a SCRIPTS (scripth) area
that resides in main memory. That memory location has been overwritten with
the 32-bit value 0x00000063 (the dump bytes read in reverse order, since
little-endian is assumed).

> sym53c875E-0-<0,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)
> scsi : aborting command due to timeout : pid 7737, scsi0, channel 0, id 0, lun 0 Read (10) 00 00 00 3c 80 00 00 80 00
> sym53c8xx_abort: pid=7737 serial_number=7752 serial_number_at_timeout=7752
> SCSI host 0 abort (pid 7737) timed out - resetting
> SCSI bus is being reset for host 0 channel 0.
> sym53c8xx_reset: pid=7737 reset_flags=2 serial_number=7752 serial_number_at_timeout=7752
>
>
> I thought this might be due to a bad card / cable so yesterday the ISP
> replaced the SCSI card and cable with another pair. This morning I've
> got these in the logs:
>
>
> sym53c875E-0: interrupted SCRIPT address not found.
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I have never seen the driver complaint just above in all the time I have
been looking after the ncr/sym53c8xx drivers. It can only happen if some
SCRIPTS area has been corrupted.

Which part of the kernel corrupted this memory and, more generally, memory
allocated by the driver? That is obviously the question. As main memory is
shared by all kernel code, there are numerous candidates (the driver
included, obviously).

Since this does not look like a known problem, I cannot help you much.
Still, I would recommend first updating your kernel to a supported version.
The final 2.2.20 kernel is likely what you need.

Once that is done, if the problem does not go away, I would recommend
updating the driver to a more recent version. The choices are:

1) sym53c8xx-1.7.3c
2) sym-2.1.16b

Choice #2 works great and is the latest available version. It is major
version 2 of the driver, succeeding sym53c8xx version 1. It is not yet
widely used, but no glitch has been reported (yet) by people using it.

ftp://ftp.tux.org/pub/roudier/drivers/linux/experimental/sym-2.1.16b-for-linx-2.2.20.patch.gz


> sym53c875E-0-<0,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)
> sym53c875E-0: interrupted SCRIPT address not found.
> sym53c875E-0-<0,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)
> sym53c875E-0:0: ERROR (81:0) (8-0-0) (10/9d) @ (script 38:f31c0004).
> sym53c875E-0: script cmd = e21c0004
> sym53c875E-0: regdump: da 00 00 9d 47 10 00 0f 00 08 80 00 80 00 07 02 63 00 00 00 02 ff ff ff.
> sym53c875E-0-<0,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)
> sym53c875E-0: interrupted SCRIPT address not found.
> sym53c875E-0-<0,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)
>

This looks like exactly the same problem.

> I'd really appreciate it if somebody could shed some light on this and
> recommend a solution. This has already caused filesystem corruption
> and a forced reboot.
>
> I'm not on the list, so please Cc to berkan@runtime-collective.com
> if you reply.

Will do.

Regards,
  Gérard.

> Cheers,
>
> Berkan
>
>
> Kernel: Linux 2.2.20pre11 for Intel x86
>
> SCSI hardware and driver:
>
> [relevant bits from boot messages]
>
> sym53c8xx: at PCI bus 0, device 11, function 0
> sym53c8xx: setting PCI_COMMAND_PARITY...(fix-up)
> sym53c8xx: 53c875E detected with Tekram NVRAM
> sym53c875E-0: rev 0x26 on pci bus 0 device 11 function 0 irq 10
> sym53c875E-0: Tekram format NVRAM, ID 7, Fast-20, Parity Checking
> scsi0 : sym53c8xx-1.7.1-20000726
> scsi : 1 host.
>   Vendor: IBM       Model: DDYS-T09170N      Rev: S80D
>   Type:   Direct-Access                      ANSI SCSI revision: 03
> Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
> scsi : detected 1 SCSI disk total.
> sym53c875E-0-<0,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)
> SCSI device sda: hdwr sector= 512 bytes. Sectors= 17916240 [8748 MB] [8.7 GB]
>
> [from lspci -v]
>
> 00:0b.0 SCSI storage controller: Symbios Logic Inc. (formerly NCR) 53c875 (rev 26)
>         Subsystem: Symbios Logic Inc. (formerly NCR): Unknown device 1000
>         Flags: bus master, medium devsel, latency 32, IRQ 10
>         I/O ports at b800
>         Memory at da800000 (32-bit, non-prefetchable)
>         Memory at da000000 (32-bit, non-prefetchable)
>         Capabilities: [40] Power Management version 1




* Re: sym53c8xx problem
  2001-11-06 17:56 ` Gérard Roudier
@ 2001-11-07 11:53   ` Berkan Eskikaya
  0 siblings, 0 replies; 3+ messages in thread
From: Berkan Eskikaya @ 2001-11-07 11:53 UTC (permalink / raw)
  To: Gerard Roudier; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 413 bytes --]

Hi Gerard, 

Thanks for the explanation. The kernel we're running
has been patched for better Oracle performance. The
patch (attached to this mail) affects SHMMAX, SHMMNI,
SHMSEG, SEMMNI, SEMMSL, SEMMNS, SEMUME, SEMMNU, and SEMMAP,
and I just saw a comment in include/asm-i386/shmparam.h
saying not to change the default SHMMAX because people
rely on it.

Maybe this explains the mystery?

Cheers,

Berkan

[-- Attachment #2: patch-2.2.19-ora8i --]
[-- Type: text/plain, Size: 2368 bytes --]

diff -ru linux/include/asm-i386/shmparam.h linux.ora8i/include/asm-i386/shmparam.h
--- linux/include/asm-i386/shmparam.h	Sun Mar 25 17:31:05 2001
+++ linux.ora8i/include/asm-i386/shmparam.h	Mon Oct  1 11:22:04 2001
@@ -33,14 +33,14 @@
  * SHMMAX <= (PAGE_SIZE << _SHM_IDX_BITS).
  */
 
-#define SHMMAX 0x2000000		/* max shared seg size (bytes) */
+#define SHMMAX 0x40000000		/* max shared seg size (bytes) */
 /* Try not to change the default shipped SHMMAX - people rely on it */
 
 #define SHMMIN 1 /* really PAGE_SIZE */	/* min shared seg size (bytes) */
-#define SHMMNI (1<<_SHM_ID_BITS)	/* max num of segs system wide */
+#define SHMMNI 200	/* max num of segs system wide */
 #define SHMALL				/* max shm system wide (pages) */ \
 	(1<<(_SHM_IDX_BITS+_SHM_ID_BITS))
 #define	SHMLBA PAGE_SIZE		/* attach addr a multiple of this */
-#define SHMSEG SHMMNI			/* max shared segs per process */
+#define SHMSEG 100			/* max shared segs per process */
 
 #endif /* _ASMI386_SHMPARAM_H */
diff -ru linux/include/linux/sem.h linux.ora8i/include/linux/sem.h
--- linux/include/linux/sem.h	Sun Mar 25 17:31:03 2001
+++ linux.ora8i/include/linux/sem.h	Mon Oct  1 11:22:44 2001
@@ -60,17 +60,17 @@
 	int semaem;
 };
 
-#define SEMMNI  128             /* ?  max # of semaphore identifiers */
-#define SEMMSL  250              /* <= 512 max num of semaphores per id */
-#define SEMMNS  (SEMMNI*SEMMSL) /* ? max # of semaphores in system */
+#define SEMMNI  256             /* ?  max # of semaphore identifiers */
+#define SEMMSL  256             /* <= 512 max num of semaphores per id */
+#define SEMMNS  2048            /* ? max # of semaphores in system */
 #define SEMOPM  32	        /* ~ 100 max num of ops per semop call */
 #define SEMVMX  32767           /* semaphore maximum value */
 
 /* unused */
-#define SEMUME  SEMOPM          /* max num of undo entries per process */
-#define SEMMNU  SEMMNS          /* num of undo structures system wide */
+#define SEMUME  10              /* max num of undo entries per process */
+#define SEMMNU  30              /* num of undo structures system wide */
 #define SEMAEM  (SEMVMX >> 1)   /* adjust on exit max value */
-#define SEMMAP  SEMMNS          /* # of entries in semaphore map */
+#define SEMMAP  10              /* # of entries in semaphore map */
 #define SEMUSZ  20		/* sizeof struct sem_undo */
 
 #ifdef __KERNEL__
