[Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20

* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
       [not found] <bug-11646-11613@https.bugzilla.kernel.org/>
@ 2010-08-31  6:22 ` bugzilla-daemon
  2010-08-31 13:56 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2010-08-31  6:22 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=11646

--- Comment #35 from Ninad <nindate@gmail.com>  2010-08-31 06:21:54 ---
Dear Bernd Zeimetz,

Has there been a resolution yet on this issue?
When you said "pci=nomsi seems to work" - has the problem got resolved for you
using pci=nomsi?
I am using Oracle VM 2.2.1. We do not see the Mailbox command timeout message
(in fact we get very few messages at all in /var/log/messages), but we have
observed I/O stalling to some or most of our LUNs, which are configured as
ocfs2 filesystems.
The ports do not show as down (according to the SAN logs we have checked), and
when the LUNs stall for one server, other servers can still write to those
LUNs (they are shared with other servers, being ocfs2).

For some reason I have a feeling that our problem could well have the same
cause as the one you are facing, hence I would appreciate some feedback from
you.

Thanks,
Ninad

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
       [not found] <bug-11646-11613@https.bugzilla.kernel.org/>
  2010-08-31  6:22 ` [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20 bugzilla-daemon
@ 2010-08-31 13:56 ` bugzilla-daemon
  2012-05-22 14:34 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2010-08-31 13:56 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=11646





--- Comment #36 from Bernd Zeimetz <bzed@debian.org>  2010-08-31 10:45:16 ---
On 08/31/2010 08:22 AM, bugzilla-daemon@bugzilla.kernel.org wrote:
> Has there been a resolution yet on this issue?
> When you said "pci=nomsi seems to work" - has the problem got resolved for you
> using pci=nomsi?

pci=nomsi makes the machine work fine, indeed.
See Debian bug #572322 for details - the Debian kernel now ships a patch that
allows disabling MSI(-X) for the QLogic cards.

Cheers,

Bernd


* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
       [not found] <bug-11646-11613@https.bugzilla.kernel.org/>
  2010-08-31  6:22 ` [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20 bugzilla-daemon
  2010-08-31 13:56 ` bugzilla-daemon
@ 2012-05-22 14:34 ` bugzilla-daemon
  2012-10-30 15:12 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2012-05-22 14:34 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=11646


Alan <alan@lxorguk.ukuu.org.uk> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |alan@lxorguk.ukuu.org.uk
     Kernel Version|2.6.26.5                    |2.6.32





* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
       [not found] <bug-11646-11613@https.bugzilla.kernel.org/>
                   ` (2 preceding siblings ...)
  2012-05-22 14:34 ` bugzilla-daemon
@ 2012-10-30 15:12 ` bugzilla-daemon
  2014-07-29 19:59 ` bugzilla-daemon
  2014-07-29 20:22 ` bugzilla-daemon
  5 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2012-10-30 15:12 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=11646


Alan <alan@lxorguk.ukuu.org.uk> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |OBSOLETE




--- Comment #37 from Alan <alan@lxorguk.ukuu.org.uk>  2012-10-30 15:12:24 ---
If this is still seen on modern kernels then please re-open/update


* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
       [not found] <bug-11646-11613@https.bugzilla.kernel.org/>
                   ` (3 preceding siblings ...)
  2012-10-30 15:12 ` bugzilla-daemon
@ 2014-07-29 19:59 ` bugzilla-daemon
  2014-07-29 20:22 ` bugzilla-daemon
  5 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2014-07-29 19:59 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=11646

Ravshan DM <mravshan@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mravshan@gmail.com

--- Comment #38 from Ravshan DM <mravshan@gmail.com> ---
I had this issue reproduced on my environment: 2.6.36.4 #848 SMP Thu Jul 17
19:55:17 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

It took time to figure out the root cause, which turned out to be a bad SFP.
Once I replaced it (all on a qla2462 HBA), the FC switch recognized the FC
port(s) immediately and all LUNs re-appeared on the host. I hope this info
helps.


* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
       [not found] <bug-11646-11613@https.bugzilla.kernel.org/>
                   ` (4 preceding siblings ...)
  2014-07-29 19:59 ` bugzilla-daemon
@ 2014-07-29 20:22 ` bugzilla-daemon
  5 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2014-07-29 20:22 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=11646

--- Comment #39 from Alan <alan@lxorguk.ukuu.org.uk> ---
Thanks


* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
  2008-09-25 13:55 [Bug 11646] New: " bugme-daemon
                   ` (33 preceding siblings ...)
  2010-03-03  9:59 ` bugzilla-daemon
@ 2010-03-03 10:45 ` bugzilla-daemon
  34 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2010-03-03 10:45 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11646





--- Comment #34 from Bernd Zeimetz <bzed@debian.org>  2010-03-03 10:45:41 ---
bugzilla-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=11646
> 
> 
> 
> 
> 
> --- Comment #33 from Csillag Tamas <cstamas@digitus.itk.ppke.hu>  2010-03-03 09:59:55 ---
> Dear Bernd Zeimetz,
> 
> Can you tell us what version your qlogic bios is?

The QLogic BIOS was upgraded a few days ago to the latest version from QLogic,
without any change in behaviour.

--------------------------------------------------------------------------------
HBA Instance 0: QLA2460 Port 1
--------------------------------------------------------------------------------
Product Identifier               : DS4000 FC 4Gb PCI-X Single Port HBA
Misc. Information                : PW=15W;PCI=66MHZ;PCI-X=266MHz
EFI Driver Version               : 2.04
Firmware Version                 : 4.06.02
BIOS Version                     : 2.10
FCode Version                    : 2.04

--------------------------------------------------------------------------------
HBA Instance 1: QLA2460 Port 1
--------------------------------------------------------------------------------
Product Identifier               : DS4000 FC 4Gb PCI-X Single Port HBA
Misc. Information                : PW=15W;PCI=66MHZ;PCI-X=266MHz
EFI Driver Version               : 2.04
Firmware Version                 : 4.06.02
BIOS Version                     : 2.10
FCode Version                    : 2.04


* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
  2008-09-25 13:55 [Bug 11646] New: " bugme-daemon
                   ` (32 preceding siblings ...)
  2010-03-03  9:37 ` bugzilla-daemon
@ 2010-03-03  9:59 ` bugzilla-daemon
  2010-03-03 10:45 ` bugzilla-daemon
  34 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2010-03-03  9:59 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11646





--- Comment #33 from Csillag Tamas <cstamas@digitus.itk.ppke.hu>  2010-03-03 09:59:55 ---
Dear Bernd Zeimetz,

Can you tell us what version your qlogic bios is?

If it's not recent you can try to upgrade:
http://www-947.ibm.com/systems/support/supportsite.wss/selectproduct?familyind=5305593&typeind=0&osind=0&continue.x=18&continue.y=13&brandind=5000008&oldbrand=5000008&oldfamily=5305593&oldtype=0&taskind=2&matrix=Y&psid=bm#UpdateXpress%20System%20Pack

It seems that this was the problem for me.


* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
  2008-09-25 13:55 [Bug 11646] New: " bugme-daemon
                   ` (31 preceding siblings ...)
  2010-01-31 22:06 ` bugzilla-daemon
@ 2010-03-03  9:37 ` bugzilla-daemon
  2010-03-03  9:59 ` bugzilla-daemon
  2010-03-03 10:45 ` bugzilla-daemon
  34 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2010-03-03  9:37 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11646


Bernd Zeimetz <bzed@debian.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bzed@debian.org




--- Comment #32 from Bernd Zeimetz <bzed@debian.org>  2010-03-03 09:37:28 ---
IBM x3950 machines crash badly enough due to this bug that they reboot
instantly after loading the qla2xxx module.

Feb 24 10:33:51 dbsrv01 kernel: [   64.184483] qla2xxx 0000:02:01.0: Performing
ISP error recovery - ha= ffff81086b4e85f8.
Feb 24 10:33:51 dbsrv01 kernel: [   64.324785] scsi(1): **** Load RISC code
****
Feb 24 10:33:52 dbsrv01 kernel: [   64.366386] scsi(1): Verifying Checksum of
loaded RISC code.
Feb 24 10:33:52 dbsrv01 kernel: [   64.605869] scsi(1): Checksum OK, start
firmware.
Feb 24 10:33:52 dbsrv01 kernel: [   65.357677] scsi(1): Issue init firmware.
Feb 24 10:33:55 dbsrv01 kernel: [   71.130990] scsi(2): Loop Down - aborting
the queues before time expire
Feb 24 10:33:56 dbsrv01 kernel: [   73.202082] qla2x00_mailbox_command(2):
timeout calling abort_isp
Feb 24 10:33:56 dbsrv01 kernel: [   73.238667] qla2x00_mailbox_command(2):
timeout calling abort_isp
Feb 24 10:33:56 dbsrv01 kernel: [   73.281349] qla2xxx 0000:10:01.0: Mailbox
command timeout occured. Issuing ISP abort.
Feb 24 10:33:56 dbsrv01 kernel: [   73.333347] qla2xxx 0000:10:01.0: Performing
ISP error recovery - ha= ffff81105ccf05f8.
Feb 24 10:34:12 dbsrv01 kernel: [   95.516679] qla2xxx 0000:02:01.0: Cable is
unplugged...
Feb 24 10:34:12 dbsrv01 kernel: [   95.516679] scsi(1): fw_state=4 curr
time=ffff208e.
Feb 24 10:34:12 dbsrv01 kernel: [   95.516679] scsi(1): Firmware ready ****
FAILED ****.
Feb 24 10:34:12 dbsrv01 kernel: [   95.516679] qla2x00_restart_isp(): Configure
loop done, status = 0x0
Feb 24 10:34:13 dbsrv01 kernel: [   95.516679] qla2xxx 0000:02:01.0: ISP System
Error - mbx1=65h mbx2=2h mbx3=8080h.
Feb 24 10:34:13 dbsrv01 kernel: [   95.516679] qla2xxx 0000:02:01.0: Firmware
dump saved to temp buffer (1/ffffc20007f84000).
Feb 24 10:34:13 dbsrv01 kernel: [   95.516679] qla2x00_abort_isp(1): exiting.
Feb 24 10:34:13 dbsrv01 kernel: [   95.516679] qla2x00_mailbox_command(1):
finished abort_isp
Feb 24 10:34:13 dbsrv01 kernel: [   95.516679] qla2x00_mailbox_command(1):
finished abort_isp
Feb 24 10:34:13 dbsrv01 kernel: [   95.545239] qla2x00_mailbox_command(1): ****
FAILED. mbx0=69, mbx1=8023, mbx2=ffff, cmd=69 ****
Feb 24 10:34:13 dbsrv01 kernel: [   95.613508] qla2x00_get_firmware_state(1):
failed=100.
Feb 24 10:34:13 dbsrv01 kernel: [   95.620441] scsi(1): fw_state=8023 curr
time=ffff2118.
Feb 24 10:34:13 dbsrv01 kernel: [   95.625500] scsi(1): Firmware ready ****
FAILED ****.
Feb 24 10:34:13 dbsrv01 kernel: [   95.687879] scsi(1): qla2x00_loop_resync -
end
Feb 24 10:34:13 dbsrv01 kernel: [   96.232463] scsi(1): dpc: sched
qla2x00_abort_isp ha = ffff81086b4e85f8
Feb 24 10:34:13 dbsrv01 kernel: [   96.232463] qla2xxx 0000:02:01.0: Performing
ISP error recovery - ha= ffff81086b4e85f8.
Feb 24 10:34:13 dbsrv01 kernel: [   96.236463] Calgary: DMA error on Calgary
PHB 0x2, 0x02010000@CSR 0x00008000@PLSSR


Running the kernel with pci=nomsi seems to work, although we have not tested it
under load yet. The issue still happens with Debian's 2.6.32 but, interestingly,
not with the kernels from Red Hat; I guess they still ship this patch:
http://launchpadlibrarian.net/17517188/linux-2.6-scsi-qla2xxx-disable-msi-x-by-default.patch
It's a bit disappointing that upstream still has not handled this bug properly
- it's pretty much impossible to use recent, unpatched kernels on a lot of
larger IBM machines together with QLogic hardware.
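
Whether the pci=nomsi workaround is actually in effect can be checked on a
running system. The following is only a minimal sketch and not part of the
original report: it assumes the standard /proc/cmdline and /proc/interrupts
files, and both helper names are hypothetical.

```python
def has_pci_nomsi(cmdline: str) -> bool:
    """Return True if the kernel command line carries the pci=nomsi option."""
    for token in cmdline.split():
        if token.startswith("pci="):
            # pci= accepts a comma-separated option list, e.g. pci=nomsi,noaer
            if "nomsi" in token[len("pci="):].split(","):
                return True
    return False


def qla2xxx_irq_lines(interrupts: str) -> list[tuple[str, bool]]:
    """Return (irq, uses_msi) for each /proc/interrupts line naming qla2xxx.

    Assumption: MSI and MSI-X vectors carry "MSI" in the interrupt type
    column (for example "PCI-MSI-edge"), while legacy interrupts do not.
    """
    result = []
    for line in interrupts.splitlines():
        if "qla2xxx" not in line:
            continue
        irq = line.split(":", 1)[0].strip()
        result.append((irq, "MSI" in line))
    return result
```

Feeding the first helper the contents of /proc/cmdline and the second the
contents of /proc/interrupts should show whether the HBA still uses MSI/MSI-X
vectors after the reboot.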


* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
  2008-09-25 13:55 [Bug 11646] New: " bugme-daemon
                   ` (30 preceding siblings ...)
  2010-01-29  0:46 ` bugzilla-daemon
@ 2010-01-31 22:06 ` bugzilla-daemon
  2010-03-03  9:37 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  34 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2010-01-31 22:06 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11646





--- Comment #31 from Csillag Tamas <cstamas@digitus.itk.ppke.hu>  2010-01-31 22:04:39 ---
Dear Andrew,

Now that is a good question. I remember that when I experienced this problem
the only solution that helped me was the kernel downgrade. As far as I remember
I tried the nomsi workaround but it did not help (but I am *not* 100% sure
about this).

Is there a way to get the firmware version info from a live system (maybe
without reboot?)
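
One possible approach, offered as an assumption rather than something
confirmed in this thread: the qla2xxx driver is expected to expose a
fw_version attribute for each HBA under /sys/class/scsi_host. A minimal
sketch that collects those values (the function name is hypothetical, and
hosts without the attribute are simply skipped):

```python
from pathlib import Path


def qla_firmware_versions(sysfs_root: str = "/sys/class/scsi_host") -> dict[str, str]:
    """Map each SCSI host name to the contents of its fw_version attribute."""
    versions = {}
    for attr in sorted(Path(sysfs_root).glob("host*/fw_version")):
        # The directory name (host0, host1, ...) identifies the adapter.
        versions[attr.parent.name] = attr.read_text().strip()
    return versions
```

If the attribute is absent on a given kernel, the result is an empty mapping
and the version has to be read from the boot screen instead.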

For the old one I can get it from another server which is from the same order
as this one.

For the new:
I do not know if this info is sufficient:
-rwxr-xr-x 1 root root 1048576 Dec 11  2007 i24af143.bin
72ed710f260788aec4f725659bf54dcd  i24af143.bin

this is the file from the floppy used for the flashing.

If this does not help I can schedule a reboot and get it from the boot screen.

Thanks
--
Regards,
  CSILLAG Tamas


* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
  2008-09-25 13:55 [Bug 11646] New: " bugme-daemon
                   ` (29 preceding siblings ...)
  2010-01-28 23:35 ` bugzilla-daemon
@ 2010-01-29  0:46 ` bugzilla-daemon
  2010-01-31 22:06 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  34 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2010-01-29  0:46 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11646


Andrew Vasquez <andrew.vasquez@qlogic.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrew.vasquez@qlogic.com




--- Comment #30 from Andrew Vasquez <andrew.vasquez@qlogic.com>  2010-01-29 00:46:17 ---
Csillag,

Prior to the FW update, were you seeing the failures while
using the 'pci=nomsi' workaround?  What were the firmware
versions used in your testing -- before and after?


* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
  2008-09-25 13:55 [Bug 11646] New: " bugme-daemon
                   ` (28 preceding siblings ...)
  2009-07-20  8:26 ` bugzilla-daemon
@ 2010-01-28 23:35 ` bugzilla-daemon
  2010-01-29  0:46 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  34 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2010-01-28 23:35 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11646





--- Comment #29 from Csillag Tamas <cstamas@digitus.itk.ppke.hu>  2010-01-28 23:35:43 ---
I upgraded to this kernel @ 2010-01-18:
Linux somehost 2.6.24-9-pve #1 SMP PREEMPT Tue Nov 17 09:34:41 CET 2009 x86_64 
GNU/Linux

and I experienced this issue again.

On some forum I got the idea to upgrade the card's firmware (ISP2422):

$ md5sum qlgc_flash_image_multiboot143.img
1f43310c7bb24db53d561b60080b5211  qlgc_flash_image_multiboot143.img

I have not had a problem since I did the upgrade (2010-01-20).
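
Anyone who wants to check a downloaded image against the checksum posted
above before flashing could do so along these lines. This is a sketch only:
the helper names are hypothetical, and a match shows only that the file is
identical to the one used here, not that it suits a particular card.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_posted_checksum(path: str, expected: str) -> bool:
    """Compare a file's MD5 with a checksum string as printed by md5sum."""
    return md5_of_file(path) == expected.strip().lower()
```

For the image named above, the expected value would be
1f43310c7bb24db53d561b60080b5211.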

YMMV

--
Regards, 
  cstamas


* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
  2008-09-25 13:55 [Bug 11646] New: " bugme-daemon
                   ` (27 preceding siblings ...)
  2009-07-19 14:25 ` bugzilla-daemon
@ 2009-07-20  8:26 ` bugzilla-daemon
  2010-01-28 23:35 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  34 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2009-07-20  8:26 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11646





--- Comment #28 from Ronan Guilfoyle <ronan.guilfoyle@modeva.ie>  2009-07-20 08:26:49 ---
I used the kernel command line 'pci=nomsi' and the problem has not been seen
since.
Three servers that all showed this problem have now been running fine for over
6 weeks.  This may not be a fix, but it appears to be a perfectly good
workaround for me.


* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
  2008-09-25 13:55 [Bug 11646] New: " bugme-daemon
                   ` (26 preceding siblings ...)
  2009-05-12  9:03 ` bugzilla-daemon
@ 2009-07-19 14:25 ` bugzilla-daemon
  2009-07-20  8:26 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  34 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2009-07-19 14:25 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11646


Somsak Sriprayoonsakul <somsaks@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |somsaks@gmail.com




--- Comment #27 from Somsak Sriprayoonsakul <somsaks@gmail.com>  2009-07-19 14:25:27 ---
Hi, we are having much the same problem with much the same log, and we found
a similar report with a workaround posted at

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/268242

Could this possibly be the same bug?

Anyway, we have added pci=nomsi as suggested in the above bug report. We will
report here again whether it helps (or not).


* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
  2008-09-25 13:55 [Bug 11646] New: " bugme-daemon
                   ` (25 preceding siblings ...)
  2009-03-31 16:02 ` bugzilla-daemon
@ 2009-05-12  9:03 ` bugzilla-daemon
  2009-07-19 14:25 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  34 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2009-05-12  9:03 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11646


Ronan Guilfoyle <ronan.guilfoyle@modeva.ie> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ronan.guilfoyle@modeva.ie




--- Comment #26 from Ronan Guilfoyle <ronan.guilfoyle@modeva.ie>  2009-05-12 09:03:28 ---
I had a similar crash last night.
I'm running Ubuntu 8.04, 2.6.24-23-server.

This is an IBM HS21 (Type 8853), with Q-Logic FC adapter.

The system runs MySQL in a heartbeat controlled failover pair with databases on
the SAN.

This error caused a failover but not all of the MySQL transactions were written
to disk.  The backup came up fine but with out of date data causing the slave
to fail because it could not read the correct log position.

I'd appreciate any help with (or links to) replacing integrated qla2xxx drivers
with official rpm ones.

Thanks,
Ronan Guilfoyle

Syslog below,  

May 11 20:17:01 DB1 /USR/SBIN/CRON[7036]: (root) CMD (   cd / && run-parts
--report /etc/cron.hourly)
May 11 20:22:35 DB1 kernel: [3036897.727183] APIC error on CPU1: 00(40)
May 11 20:23:48 DB1 kernel: [3036970.605923] qla2xxx 0000:08:01.1: Mailbox
command timeout occured. Issuing ISP abort.
May 11 20:23:48 DB1 kernel: [3036970.605930] qla2xxx 0000:08:01.1: Performing
ISP error recovery - ha= ffff810222830460.
May 11 20:23:50 DB1 kernel: [3036972.456812] qla2xxx 0000:08:01.1: LOOP UP
detected (4 Gbps).
May 11 20:23:50 DB1 kernel: [3036972.716091] qla2xxx 0000:08:01.1: SNS scan
failed -- assuming zero-entry result...
May 11 20:23:50 DB1 kernel: [3036972.756011] APIC error on CPU6: 00(40)
May 11 20:23:50 DB1 kernel: [3036972.775938] qla2xxx 0000:08:01.1:
scsi(1:1:10): Abort command issued -- 0 5fe51b 2002.
May 11 20:24:23 DB1 kernel: [3037005.515447]  rport-1:0-0: blocked FC remote
port time out: saving binding
May 11 20:24:23 DB1 kernel: [3037005.515512]  rport-1:0-1: blocked FC remote
port time out: saving binding
May 11 20:24:23 DB1 kernel: [3037006.019805] qla2xxx 0000:08:01.1:
scsi(1:1:10): DEVICE RESET ISSUED.
May 11 20:24:53 DB1 kernel: [3037035.942217] qla2xxx 0000:08:01.1: Mailbox
command timeout occured. Issuing ISP abort.
May 11 20:24:53 DB1 kernel: [3037035.942223] qla2xxx 0000:08:01.1: Performing
ISP error recovery - ha= ffff810222830460.
May 11 20:24:55 DB1 kernel: [3037037.801242] qla2xxx 0000:08:01.1: LOOP UP
detected (4 Gbps).
May 11 20:28:30 DB1 syslogd 1.5.0#1ubuntu1: restart.


* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
  2008-09-25 13:55 [Bug 11646] New: " bugme-daemon
                   ` (24 preceding siblings ...)
  2009-03-04 16:14 ` bugme-daemon
@ 2009-03-31 16:02 ` bugzilla-daemon
  2009-05-12  9:03 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  34 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2009-03-31 16:02 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11646


mike <mb11628@yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mb11628@yahoo.com




--- Comment #25 from mike <mb11628@yahoo.com>  2009-03-31 16:02:09 ---
Are there any updates on this bug?  We are planning on installing 4 new
database servers in the next couple of weeks using Debian Lenny amd64 (2.6.26
kernel) on servers with dual Qlogic 2460 HBAs using multipathd, connecting to
an EMC Clarion SAN.

I came across this bug, but couldn't gauge how big of a concern this should be
for us.  Is it recommended to use a kernel version of 2.6.20 or older at this
point or is the behavior seen in this bug a rare/special case?


* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
  2008-09-25 13:55 [Bug 11646] New: " bugme-daemon
                   ` (23 preceding siblings ...)
  2009-03-03 19:00 ` bugme-daemon
@ 2009-03-04 16:14 ` bugme-daemon
  2009-03-31 16:02 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  34 siblings, 0 replies; 41+ messages in thread
From: bugme-daemon @ 2009-03-04 16:14 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11646





------- Comment #24 from daniel@economicmodeling.com  2009-03-04 08:14 -------
I do not use multipathd, and the qlogic timeouts still crash my system. I
believe Seokman Ju's multipathd errors are caused by the qlogic driver. Notice
how the timestamps for the qlogic events are before the multipath errors.



* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
  2008-09-25 13:55 [Bug 11646] New: " bugme-daemon
                   ` (22 preceding siblings ...)
  2009-02-27 18:29 ` bugme-daemon
@ 2009-03-03 19:00 ` bugme-daemon
  2009-03-04 16:14 ` bugme-daemon
                   ` (10 subsequent siblings)
  34 siblings, 0 replies; 41+ messages in thread
From: bugme-daemon @ 2009-03-03 19:00 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11646





------- Comment #23 from cstamas@digitus.itk.ppke.hu  2009-03-03 11:00 -------
Well, I am not sure if the multipath issues and the original problems reported
are related.



* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
  2008-09-25 13:55 [Bug 11646] New: " bugme-daemon
                   ` (21 preceding siblings ...)
  2009-02-27 16:17 ` bugme-daemon
@ 2009-02-27 18:29 ` bugme-daemon
  2009-03-03 19:00 ` bugme-daemon
                   ` (11 subsequent siblings)
  34 siblings, 0 replies; 41+ messages in thread
From: bugme-daemon @ 2009-02-27 18:29 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11646





------- Comment #22 from seokmann.ju@qlogic.com  2009-02-27 10:29 -------
Sorry for the confusion.
I misread the log without having a clear understanding of its layout.

Could you send the log with the 'ql2xextended_error_logging' parameter turned
on?
From the information in the log file at #19, I am not sure where the failure
started or which command caused it.



* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
  2008-09-25 13:55 [Bug 11646] New: " bugme-daemon
                   ` (20 preceding siblings ...)
  2009-02-27 10:28 ` bugme-daemon
@ 2009-02-27 16:17 ` bugme-daemon
  2009-02-27 18:29 ` bugme-daemon
                   ` (12 subsequent siblings)
  34 siblings, 0 replies; 41+ messages in thread
From: bugme-daemon @ 2009-02-27 16:17 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11646

------- Comment #21 from grin@grin.hu  2009-02-27 08:17 -------
#20, enlighten me in the internals, please. Can multipathd actually _cause_
driver timeouts? 

As far as I understand, the directio checker does nothing more than read the
first and last sectors of the device with direct I/O (O_DIRECT) calls, and if
that fails it fires a uevent towards userspace to handle the situation.  Can it
affect anything "below", like the qla driver or the SCSI device itself?

I tried the 'tur' checker in the past with mixed results, and I'm not sure
whether that meant the path was really down that often or the checker failed,
but directio seemed the most generic.

I thought it was the other way around, i.e. qla timeouts that make multipathd
cry.


* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
  2008-09-25 13:55 [Bug 11646] New: " bugme-daemon
                   ` (19 preceding siblings ...)
  2009-02-27  9:50 ` bugme-daemon
@ 2009-02-27 10:28 ` bugme-daemon
  2009-02-27 16:17 ` bugme-daemon
                   ` (13 subsequent siblings)
  34 siblings, 0 replies; 41+ messages in thread
From: bugme-daemon @ 2009-02-27 10:28 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11646





------- Comment #20 from seokmann.ju@qlogic.com  2009-02-27 02:28 -------
From the log in #19, multipathd caused recursive interventions/interruptions
to the target, so that no stable path to it remains available.
---
Feb 26 23:50:42 fred multipathd: sdh: directio checker reports path is down
Feb 26 23:50:42 fred multipathd: checker failed path 8:112 in map multi_3_db
Feb 26 23:50:42 fred multipathd: multi_3_db: remaining active paths: 1
Feb 26 23:50:42 fred kernel: [14020652.739423] device-mapper: multipath:
Failing path 8:112.
Feb 26 23:50:42 fred multipathd: sdj: directio checker reports path is down
Feb 26 23:50:42 fred multipathd: checker failed path 8:144 in map mpath7
Feb 26 23:50:42 fred multipathd: mpath7: remaining active paths: 0
---


This, in turn, triggered timeout events followed by command aborts, as shown
below.
---
Feb 26 23:50:41 fred kernel: [14020651.786979] qla2xxx_eh_abort(3): aborting sp
ffff81003c1360c0 from RISC. pid=334462.
Feb 26 23:50:41 fred kernel: [14020651.787849] scsi(3): ABORT status detected
0x5-0x0.
Feb 26 23:50:41 fred kernel: [14020651.788110] qla2xxx 0000:08:01.0:
scsi(3:0:3): Abort command issued -- 1 51a7e 2002.
Feb 26 23:50:41 fred kernel: [14020651.847108] qla2xxx_eh_abort(3): aborting sp
ffff81003c136dc0 from RISC. pid=334463.
Feb 26 23:50:41 fred kernel: [14020651.847973] scsi(3): ABORT status detected
0x5-0x0.
Feb 26 23:50:41 fred kernel: [14020651.848242] qla2xxx 0000:08:01.0:
scsi(3:0:9): Abort command issued -- 1 51a7f 2002.
---
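
The syslog stamps in the two excerpts above make it possible to check the
order of events mechanically. The following is a minimal sketch, assuming the
year 2009 (taken from the date of this comment) and using only a subset of the
quoted lines with their wrapped halves rejoined; the regular expression and
the function name are illustrative, not part of any tool.

```python
import re
from datetime import datetime

# Subset of the lines quoted above, wrapped lines rejoined.
LOG = """\
Feb 26 23:50:42 fred multipathd: sdh: directio checker reports path is down
Feb 26 23:50:42 fred multipathd: checker failed path 8:112 in map multi_3_db
Feb 26 23:50:42 fred multipathd: sdj: directio checker reports path is down
Feb 26 23:50:41 fred kernel: [14020651.786979] qla2xxx_eh_abort(3): aborting sp ffff81003c1360c0 from RISC. pid=334462.
Feb 26 23:50:41 fred kernel: [14020651.847108] qla2xxx_eh_abort(3): aborting sp ffff81003c136dc0 from RISC. pid=334463.
"""

# syslog stamp, host, source tag, message
LINE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) (\S+) ([^:\[]+)(?:\[\d+\])?: (.*)$")


def ordered_events(text, year=2009):
    """Return (timestamp, source, message) tuples sorted by syslog time."""
    events = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            events.append((stamp, m.group(3), m.group(4)))
    # sorted() is stable, so lines with equal stamps keep their input order.
    return sorted(events, key=lambda e: e[0])


events = ordered_events(LOG)
for stamp, source, message in events:
    print(stamp.time(), source, message[:50])
```

In these excerpts the qla2xxx_eh_abort lines are stamped 23:50:41 and the
multipathd checker failures 23:50:42. With one-second resolution this ordering
alone does not establish which event caused the other.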



* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
  2008-09-25 13:55 [Bug 11646] New: " bugme-daemon
                   ` (18 preceding siblings ...)
  2009-02-23  0:54 ` bugme-daemon
@ 2009-02-27  9:50 ` bugme-daemon
  2009-02-27 10:28 ` bugme-daemon
                   ` (14 subsequent siblings)
  34 siblings, 0 replies; 41+ messages in thread
From: bugme-daemon @ 2009-02-27  9:50 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11646

------- Comment #19 from grin@grin.hu  2009-02-27 01:50 -------
Created an attachment (id=20377)
 --> (http://bugzilla.kernel.org/attachment.cgi?id=20377&action=view)
syslog when starting up multipath

Maybe this contains some info, because the debug values don't say anything to
me. To prevent things from crashing we actually keep one path down by switching
off one (qlogic fibre) switch. This way we only have occasional crashes (but
since I cannot conjure up an external console, I'm stuck at that point).

Someone accidentally switched the switch on, activating the multipath. I'm
attaching what happens. (It goes on and on; the end position of the log is
arbitrary.)


* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
  2008-09-25 13:55 [Bug 11646] New: " bugme-daemon
                   ` (17 preceding siblings ...)
  2008-11-23 19:21 ` bugme-daemon
@ 2009-02-23  0:54 ` bugme-daemon
  2009-02-27  9:50 ` bugme-daemon
                   ` (15 subsequent siblings)
  34 siblings, 0 replies; 41+ messages in thread
From: bugme-daemon @ 2009-02-23  0:54 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11646





------- Comment #18 from zoltan.kiss@serverside.hu  2009-02-22 16:54 -------
Hi, I also suffer from this bug.

Here is the qla section of /var/log/messages:
Feb 22 21:28:36 bafs1 kernel: [71282.592558] qla2xxx 0000:08:01.0: Mailbox
command timeout occured. Issuing ISP abort.
Feb 22 21:28:36 bafs1 kernel: [71282.592611] qla2xxx 0000:08:01.0: Performing
ISP error recovery - ha= ffff81012e5105f8.
Feb 22 21:28:37 bafs1 kernel: [71283.546086] qla2xxx 0000:08:01.0: LOOP UP
detected (4 Gbps).
Feb 22 21:28:37 bafs1 kernel: [71283.685581] qla2xxx 0000:08:01.0: scsi(0:0:2):
Abort command issued -- 0 2457f0 2002.
Feb 22 21:29:06 bafs1 kernel: [71325.253361] qla2xxx 0000:08:01.0: scsi(0:0:2):
DEVICE RESET ISSUED.
Feb 22 21:29:36 bafs1 kernel: [71412.254597] qla2xxx 0000:08:01.0: Mailbox
command timeout occured. Issuing ISP abort.
Feb 22 21:29:36 bafs1 kernel: [71412.254597] qla2xxx 0000:08:01.0: Performing
ISP error recovery - ha= ffff81012e5105f8.
Feb 22 21:29:37 bafs1 kernel: [71414.455676] qla2xxx 0000:08:01.0: LOOP UP
detected (4 Gbps).
Feb 22 21:29:37 bafs1 kernel: [71414.716347] qla2xxx 0000:08:01.0: scsi(0:0:2):
DEVICE RESET FAILED: Task management failed.
Feb 22 21:29:37 bafs1 kernel: [71414.716347] qla2xxx 0000:08:01.0: scsi(0:0:2):
TARGET RESET ISSUED.
Feb 22 21:30:07 bafs1 kernel: [71488.627662] qla2xxx 0000:08:01.0: Mailbox
command timeout occured. Issuing ISP abort.
Feb 22 21:30:07 bafs1 kernel: [71488.627662] qla2xxx 0000:08:01.0: Performing
ISP error recovery - ha= ffff81012e5105f8.
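
The spacing of the mailbox timeouts in the excerpt above can be computed
directly from the log. This is a minimal sketch under stated assumptions: the
year 2009 is taken from the date of this comment, the wrapped lines are
rejoined, and the function name timeout_gaps is made up for the example.

```python
import re
from datetime import datetime

# The three "Mailbox command timeout" lines from above, wrapped lines rejoined.
LOG = """\
Feb 22 21:28:36 bafs1 kernel: [71282.592558] qla2xxx 0000:08:01.0: Mailbox command timeout occured. Issuing ISP abort.
Feb 22 21:29:36 bafs1 kernel: [71412.254597] qla2xxx 0000:08:01.0: Mailbox command timeout occured. Issuing ISP abort.
Feb 22 21:30:07 bafs1 kernel: [71488.627662] qla2xxx 0000:08:01.0: Mailbox command timeout occured. Issuing ISP abort.
"""

PATTERN = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ kernel: \[ *(\d+\.\d+)\] .*Mailbox command timeout"
)


def timeout_gaps(text, year=2009):
    """Return gaps in seconds between timeouts: (syslog gaps, printk gaps)."""
    wall, printk = [], []
    for line in text.splitlines():
        m = PATTERN.match(line)
        if m:
            wall.append(datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S"))
            printk.append(float(m.group(2)))
    wall_gaps = [(b - a).total_seconds() for a, b in zip(wall, wall[1:])]
    printk_gaps = [round(b - a, 3) for a, b in zip(printk, printk[1:])]
    return wall_gaps, printk_gaps


wall_gaps, printk_gaps = timeout_gaps(LOG)
print(wall_gaps)    # [60.0, 31.0]
print(printk_gaps)
```

Note that in this excerpt the gaps derived from the bracketed kernel
timestamps are larger than the gaps between the syslog stamps; the sketch only
exposes that difference and does not explain it.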


This happens on an IBM BladeCenter HS21 blade running Debian Lenny with the
stock kernel:
2.6.26-1-amd64 #1 SMP Sat Jan 10 17:57:00 UTC 2009 x86_64 GNU/Linux

My storage is an IBM TS DS4300.

I am now compiling the 2.6.20 kernel with the RHEL drivers; I hope that helps.

Regards,
Zoltan Kiss
Bardi Auto - Hungary



* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
  2008-09-25 13:55 [Bug 11646] New: " bugme-daemon
                   ` (16 preceding siblings ...)
  2008-11-19 22:10 ` bugme-daemon
@ 2008-11-23 19:21 ` bugme-daemon
  2009-02-23  0:54 ` bugme-daemon
                   ` (16 subsequent siblings)
  34 siblings, 0 replies; 41+ messages in thread
From: bugme-daemon @ 2008-11-23 19:21 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11646


cstamas@digitus.itk.ppke.hu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |cstamas@digitus.itk.ppke.hu




------- Comment #17 from cstamas@digitus.itk.ppke.hu  2008-11-23 11:21 -------
I also suffer from this bug:

Linux version 2.6.24-1-pve (root@oahu) (gcc version 4.1.2 20061115 (prerelease)
(Debian 4.1.1-21)) #1 SMP PREEMPT Fri Oct 24 11:34:13 CEST 2008 (Ubuntu
2.6.24-4.6-server)

Nov 22 07:35:37 somehost kernel: qla2xxx 0000:08:01.0: Mailbox command timeout
occured. Issuing ISP abort.
Nov 22 07:35:37 somehost kernel: qla2xxx 0000:08:01.0: Performing ISP error
recovery - ha= ffff8101618f0468.
Nov 22 07:35:38 somehost kernel: qla2xxx 0000:08:01.0: LOOP UP detected (2
Gbps).
Nov 22 07:35:38 somehost kernel: qla2xxx 0000:08:01.0: SNS scan failed --
assuming zero-entry result...
Nov 22 07:35:38 somehost kernel: APIC error on CPU1: 00(40)
Nov 22 07:35:38 somehost kernel: qla2xxx 0000:08:01.0: scsi(1:0:1): Abort
command issued -- 0 1acc93 2002.
Nov 22 07:36:12 somehost kernel:  rport-1:0-4: blocked FC remote port time out:
saving binding
Nov 22 07:36:13 somehost kernel: qla2xxx 0000:08:01.0: scsi(1:0:1): DEVICE
RESET ISSUED.
Nov 22 07:36:37 somehost kernel:  rport-1:0-0: blocked FC remote port time out:
removing rport
Nov 22 07:36:37 somehost kernel:  rport-1:0-1: blocked FC remote port time out:
removing rport
Nov 22 07:36:37 somehost kernel:  rport-1:0-2: blocked FC remote port time out:
removing rport
Nov 22 07:36:37 somehost kernel:  rport-1:0-3: blocked FC remote port time out:
removing rport

This is a HS21 with a Qlogic card:
08:01.0 Fibre Channel: QLogic Corp. QLA2422 Fibre Channel Adapter (rev 02)
08:01.1 Fibre Channel: QLogic Corp. QLA2422 Fibre Channel Adapter (rev 02)

I am using a DS4700, and the other machines work fine at the same time.

Another machine connected to the same Fibre Channel switch (one that works
fine) has debugging mode enabled. I include its logs here in case they give a
hint about what event drove the HS21 machine crazy:

(log from a HS20 2.6.24.7 stock kernel)
06:01.0 Fibre Channel: QLogic Corp. ISP2312-based 2Gb Fibre Channel to PCI-X
HBA (rev 02)
06:01.1 Fibre Channel: QLogic Corp. ISP2312-based 2Gb Fibre Channel to PCI-X
HBA (rev 02)

2008-11-22_06:35:37.55516 kern.warn: scsi(1): Asynchronous RSCR UPDATE.
2008-11-22_06:35:37.55520 kern.info: scsi(1): RSCN database changed -- 0001
0600.
2008-11-22_06:35:38.18991 kern.warn: scsi(1): qla2x00_loop_resync()
2008-11-22_06:35:38.18998 kern.warn: scsi(1): F/W Ready - OK
2008-11-22_06:35:38.23641 kern.warn: scsi(1): fw_state=3 curr time=69fc0e46.
2008-11-22_06:35:38.23646 kern.warn: scsi(1): Configure loop -- dpc flags
=0x40000a0
2008-11-22_06:35:38.48711 kern.warn: scsi(1): RSCN queue entry[30] =
[00/010600].
2008-11-22_06:35:38.48713 kern.warn: scsi(1): GID_PT entry - nn
200000112593fc1c pn 210000112593fc1c portid=010100.
2008-11-22_06:35:38.48716 kern.warn: scsi(1): GID_PT entry - nn
200000112593f89c pn 210000112593f89c portid=010200.
2008-11-22_06:35:38.48718 kern.warn: scsi(1): GID_PT entry - nn
200000145e241c2c pn 210000145e241c2c portid=010400.
2008-11-22_06:35:38.48719 kern.warn: scsi(1): GID_PT entry - nn
2000001b3205b641 pn 2100001b3205b641 portid=010600.
2008-11-22_06:35:38.48720 kern.warn: scsi(1): GID_PT entry - nn
2000001b32056b41 pn 2100001b32056b41 portid=010700.
2008-11-22_06:35:38.48721 kern.warn: scsi(1): GID_PT entry - nn
200400a0b8293358 pn 202400a0b8293358 portid=010f00.
2008-11-22_06:35:38.48723 kern.warn: scsi(1): device wrap (010f00)
2008-11-22_06:35:38.48725 kern.warn: scsi(1): Trying Fabric Login w/loop id
0x0083 for port 010600.
2008-11-22_06:35:38.48726 kern.warn: scsi(1): LOOP READY
2008-11-22_06:35:38.48727 kern.warn: scsi(1): qla2x00_loop_resync - end
2008-11-22_06:35:38.54288 kern.warn: scsi(1): Asynchronous RSCR UPDATE.
2008-11-22_06:35:38.54294 kern.info: scsi(1): RSCN database changed -- 0001
0600.
2008-11-22_06:35:39.18405 kern.warn: scsi(1): qla2x00_loop_resync()
2008-11-22_06:35:39.18412 kern.warn: scsi(1): F/W Ready - OK
2008-11-22_06:35:39.21856 kern.warn: scsi(1): fw_state=3 curr time=69fc0f3d.
2008-11-22_06:35:39.21862 kern.warn: scsi(1): Configure loop -- dpc flags
=0x40000a0
2008-11-22_06:35:39.25176 kern.warn: scsi(1): RSCN queue entry[31] =
[00/010600].
2008-11-22_06:35:39.27638 kern.warn: scsi(1): GID_PT entry - nn
200000112593fc1c pn 210000112593fc1c portid=010100.
2008-11-22_06:35:39.29344 kern.warn: scsi(1): GID_PT entry - nn
200000112593f89c pn 210000112593f89c portid=010200.
2008-11-22_06:35:39.30982 kern.warn: scsi(1): GID_PT entry - nn
200000145e241c2c pn 210000145e241c2c portid=010400.
2008-11-22_06:35:39.32600 kern.warn: scsi(1): GID_PT entry - nn
2000001b3205b641 pn 2100001b3205b641 portid=010600.
2008-11-22_06:35:39.34160 kern.warn: scsi(1): GID_PT entry - nn
2000001b32056b41 pn 2100001b32056b41 portid=010700.
2008-11-22_06:35:39.35693 kern.warn: scsi(1): GID_PT entry - nn
200400a0b8293358 pn 202400a0b8293358 portid=010f00.
2008-11-22_06:35:39.35699 kern.warn: scsi(1): device wrap (010f00)
2008-11-22_06:35:39.38467 kern.warn: scsi(1): Trying Fabric Login w/loop id
0x0083 for port 010600.
2008-11-22_06:35:39.39794 kern.warn: scsi(1): LOOP READY
2008-11-22_06:35:39.39801 kern.warn: scsi(1): qla2x00_loop_resync - end
2008-11-22_06:36:43.23921 kern.warn: scsi(1): Asynchronous RSCR UPDATE.
2008-11-22_06:36:43.23925 kern.info: scsi(1): RSCN database changed -- 0001
0600.
2008-11-22_06:36:44.17780 kern.warn: scsi(1): qla2x00_loop_resync()
2008-11-22_06:36:44.17787 kern.warn: scsi(1): F/W Ready - OK
2008-11-22_06:36:44.19975 kern.warn: scsi(1): fw_state=3 curr time=69fc4eb4.
2008-11-22_06:36:44.19981 kern.warn: scsi(1): Configure loop -- dpc flags
=0x40000a0
2008-11-22_06:36:44.22040 kern.warn: scsi(1): RSCN queue entry[0] =
[00/010600].
2008-11-22_06:36:44.23846 kern.warn: scsi(1): GID_PT entry - nn
200000112593fc1c pn 210000112593fc1c portid=010100.
2008-11-22_06:36:44.24914 kern.warn: scsi(1): GID_PT entry - nn
200000112593f89c pn 210000112593f89c portid=010200.
2008-11-22_06:36:44.25921 kern.warn: scsi(1): GID_PT entry - nn
200000145e241c2c pn 210000145e241c2c portid=010400.
2008-11-22_06:36:44.26910 kern.warn: scsi(1): GID_PT entry - nn
2000001b3205b641 pn 2100001b3205b641 portid=010600.
2008-11-22_06:36:44.28411 kern.warn: scsi(1): GID_PT entry - nn
2000001b32056b41 pn 2100001b32056b41 portid=010700.
2008-11-22_06:36:44.28776 kern.warn: scsi(1): GID_PT entry - nn
200400a0b8293358 pn 202400a0b8293358 portid=010f00.
2008-11-22_06:36:44.28783 kern.warn: scsi(1): device wrap (010f00)
2008-11-22_06:36:44.30339 kern.warn: scsi(1): Trying Fabric Login w/loop id
0x0083 for port 010600.
2008-11-22_06:36:44.31129 kern.warn: scsi(1): LOOP READY
2008-11-22_06:36:44.31136 kern.warn: scsi(1): qla2x00_loop_resync - end
2008-11-22_06:36:44.34305 kern.warn: scsi(1): Asynchronous RSCR UPDATE.
2008-11-22_06:36:44.34306 kern.info: scsi(1): RSCN database changed -- 0001
0600.
2008-11-22_06:36:45.06317 kern.warn: scsi(1): Asynchronous RSCR UPDATE.
2008-11-22_06:36:45.06319 kern.info: scsi(1): RSCN database changed -- 0001
0600.
2008-11-22_06:36:45.17397 kern.warn: scsi(1): qla2x00_loop_resync()
2008-11-22_06:36:45.17404 kern.warn: scsi(1): F/W Ready - OK 
2008-11-22_06:36:45.18965 kern.warn: scsi(1): fw_state=3 curr time=69fc4fad.
2008-11-22_06:36:45.18972 kern.warn: scsi(1): Configure loop -- dpc flags
=0x40000a0
2008-11-22_06:36:45.20585 kern.warn: scsi(1): RSCN queue entry[1] =
[00/010600].
2008-11-22_06:36:45.20592 kern.warn: scsi(1): Skipping duplicate RSCN queue
entry found at [2].
2008-11-22_06:36:45.22195 kern.warn: scsi(1): RSCN queue entry[2] =
[00/010600].
2008-11-22_06:36:45.23816 kern.warn: scsi(1): GID_PT entry - nn
200000112593fc1c pn 210000112593fc1c portid=010100.
2008-11-22_06:36:45.24727 kern.warn: scsi(1): GID_PT entry - nn
200000112593f89c pn 210000112593f89c portid=010200.
2008-11-22_06:36:45.25633 kern.warn: scsi(1): GID_PT entry - nn
200000145e241c2c pn 210000145e241c2c portid=010400.
2008-11-22_06:36:45.26791 kern.warn: scsi(1): GID_PT entry - nn
2000001b3205b641 pn 2100001b3205b641 portid=010600.
2008-11-22_06:36:45.27679 kern.warn: scsi(1): GID_PT entry - nn
2000001b32056b41 pn 2100001b32056b41 portid=010700.
2008-11-22_06:36:45.28572 kern.warn: scsi(1): GID_PT entry - nn
200400a0b8293358 pn 202400a0b8293358 portid=010f00.
2008-11-22_06:36:45.28578 kern.warn: scsi(1): device wrap (010f00)
2008-11-22_06:36:45.30173 kern.warn: scsi(1): Trying Fabric Login w/loop id
0x0083 for port 010600.
2008-11-22_06:36:45.30980 kern.warn: scsi(1): LOOP READY
2008-11-22_06:36:45.30985 kern.warn: scsi(1): qla2x00_loop_resync - end
2008-11-22_06:37:45.99458 kern.warn: scsi(1): Asynchronous RSCR UPDATE.
2008-11-22_06:37:45.99466 kern.info: scsi(1): RSCN database changed -- 0001
0600.
2008-11-22_06:37:46.17424 kern.warn: scsi(1): qla2x00_loop_resync()
2008-11-22_06:37:46.17431 kern.warn: scsi(1): F/W Ready - OK 
2008-11-22_06:37:46.19055 kern.warn: scsi(1): fw_state=3 curr time=69fc8b3f.
2008-11-22_06:37:46.19062 kern.warn: scsi(1): Configure loop -- dpc flags
=0x40000a0
2008-11-22_06:37:46.20666 kern.warn: scsi(1): RSCN queue entry[3] =
[00/010600].
2008-11-22_06:37:46.22193 kern.warn: scsi(1): GID_PT entry - nn
200000112593fc1c pn 210000112593fc1c portid=010100.
2008-11-22_06:37:46.23109 kern.warn: scsi(1): GID_PT entry - nn
200000112593f89c pn 210000112593f89c portid=010200.
2008-11-22_06:37:46.24022 kern.warn: scsi(1): GID_PT entry - nn
200000145e241c2c pn 210000145e241c2c portid=010400.
2008-11-22_06:37:46.24921 kern.warn: scsi(1): GID_PT entry - nn
2000001b32056b41 pn 2100001b32056b41 portid=010700.
2008-11-22_06:37:46.25818 kern.warn: scsi(1): GID_PT entry - nn
200400a0b8293358 pn 202400a0b8293358 portid=010f00.
2008-11-22_06:37:46.25825 kern.warn: scsi(1): device wrap (010f00)
2008-11-22_06:37:46.27411 kern.warn: scsi(1): LOOP READY
2008-11-22_06:37:46.27418 kern.warn: scsi(1): qla2x00_loop_resync - end
2008-11-22_06:37:47.08468 kern.warn: scsi(1): Asynchronous RSCR UPDATE.
2008-11-22_06:37:47.08470 kern.info: scsi(1): RSCN database changed -- 0001
0600.
2008-11-22_06:37:47.17403 kern.warn: scsi(1): qla2x00_loop_resync()
2008-11-22_06:37:47.17410 kern.warn: scsi(1): F/W Ready - OK 
2008-11-22_06:37:47.18983 kern.warn: scsi(1): fw_state=3 curr time=69fc8c39.
2008-11-22_06:37:47.18994 kern.warn: scsi(1): Configure loop -- dpc flags
=0x40000a0
2008-11-22_06:37:47.20616 kern.warn: scsi(1): RSCN queue entry[4] =
[00/010600].
2008-11-22_06:37:47.22248 kern.warn: scsi(1): GID_PT entry - nn
200000112593fc1c pn 210000112593fc1c portid=010100.
2008-11-22_06:37:47.23159 kern.warn: scsi(1): GID_PT entry - nn
200000112593f89c pn 210000112593f89c portid=010200.
2008-11-22_06:37:47.24107 kern.warn: scsi(1): GID_PT entry - nn
200000145e241c2c pn 210000145e241c2c portid=010400.
2008-11-22_06:37:47.24995 kern.warn: scsi(1): GID_PT entry - nn
2000001b3205b641 pn 2100001b3205b641 portid=010600.
2008-11-22_06:37:47.25876 kern.warn: scsi(1): GID_PT entry - nn
2000001b32056b41 pn 2100001b32056b41 portid=010700.
2008-11-22_06:37:47.26760 kern.warn: scsi(1): GID_PT entry - nn
200400a0b8293358 pn 202400a0b8293358 portid=010f00.
2008-11-22_06:37:47.26766 kern.warn: scsi(1): device wrap (010f00)
2008-11-22_06:37:47.28333 kern.warn: scsi(1): Trying Fabric Login w/loop id
0x0083 for port 010600.
2008-11-22_06:37:47.29128 kern.warn: scsi(1): LOOP READY
2008-11-22_06:37:47.29135 kern.warn: scsi(1): qla2x00_loop_resync - end
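
One thing that can be read out of the debug log above is which ports each
name-server scan returned. The sketch below groups the GID_PT entries by
loop-resync pass and compares two passes; the function name `scans` and the
choice of the 06:37:46 and 06:37:47 passes as sample input are mine, and the
wrapped log lines are rejoined.

```python
import re

# The 06:37:46 and 06:37:47 passes from the log above, wrapped lines rejoined.
SAMPLE = """\
2008-11-22_06:37:46.17424 kern.warn: scsi(1): qla2x00_loop_resync()
2008-11-22_06:37:46.22193 kern.warn: scsi(1): GID_PT entry - nn 200000112593fc1c pn 210000112593fc1c portid=010100.
2008-11-22_06:37:46.23109 kern.warn: scsi(1): GID_PT entry - nn 200000112593f89c pn 210000112593f89c portid=010200.
2008-11-22_06:37:46.24022 kern.warn: scsi(1): GID_PT entry - nn 200000145e241c2c pn 210000145e241c2c portid=010400.
2008-11-22_06:37:46.24921 kern.warn: scsi(1): GID_PT entry - nn 2000001b32056b41 pn 2100001b32056b41 portid=010700.
2008-11-22_06:37:46.25818 kern.warn: scsi(1): GID_PT entry - nn 200400a0b8293358 pn 202400a0b8293358 portid=010f00.
2008-11-22_06:37:46.27418 kern.warn: scsi(1): qla2x00_loop_resync - end
2008-11-22_06:37:47.17403 kern.warn: scsi(1): qla2x00_loop_resync()
2008-11-22_06:37:47.22248 kern.warn: scsi(1): GID_PT entry - nn 200000112593fc1c pn 210000112593fc1c portid=010100.
2008-11-22_06:37:47.23159 kern.warn: scsi(1): GID_PT entry - nn 200000112593f89c pn 210000112593f89c portid=010200.
2008-11-22_06:37:47.24107 kern.warn: scsi(1): GID_PT entry - nn 200000145e241c2c pn 210000145e241c2c portid=010400.
2008-11-22_06:37:47.24995 kern.warn: scsi(1): GID_PT entry - nn 2000001b3205b641 pn 2100001b3205b641 portid=010600.
2008-11-22_06:37:47.25876 kern.warn: scsi(1): GID_PT entry - nn 2000001b32056b41 pn 2100001b32056b41 portid=010700.
2008-11-22_06:37:47.26760 kern.warn: scsi(1): GID_PT entry - nn 200400a0b8293358 pn 202400a0b8293358 portid=010f00.
2008-11-22_06:37:47.29135 kern.warn: scsi(1): qla2x00_loop_resync - end
"""


def scans(lines):
    """Group the portid values of GID_PT entries by loop-resync pass."""
    result, current = [], None
    for line in lines:
        if "qla2x00_loop_resync()" in line:
            current = set()  # a new pass starts
        elif "qla2x00_loop_resync - end" in line and current is not None:
            result.append(current)
            current = None
        elif current is not None and "GID_PT entry" in line:
            m = re.search(r"portid=([0-9a-f]{6})", line)
            if m:
                current.add(m.group(1))
    return result


passes = scans(SAMPLE.splitlines())
print(sorted(passes[1] - passes[0]))  # ports only in the later pass
```

In the log above the 06:37:46 pass lists five ports and lacks 010600, the port
the RSCN entries refer to, while the other passes list six. Whether that
corresponds to the HS21 dropping off the fabric is only a guess.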

As a workaround I downgraded the buggy machine to an earlier kernel, hoping
that it will fix the problems outlined here.

Regards,
   cstamas
--
Csillag Tamas (cstamas)
http://digitus.itk.ppke.hu/~cstamas



* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
  2008-09-25 13:55 [Bug 11646] New: " bugme-daemon
                   ` (15 preceding siblings ...)
  2008-10-21  7:13 ` bugme-daemon
@ 2008-11-19 22:10 ` bugme-daemon
  2008-11-23 19:21 ` bugme-daemon
                   ` (17 subsequent siblings)
  34 siblings, 0 replies; 41+ messages in thread
From: bugme-daemon @ 2008-11-19 22:10 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11646

------- Comment #16 from daniel@economicmodeling.com  2008-11-19 14:10 -------
I have experienced this bug on IBM HS21 Blades running Debian Lenny/2.6.22
connected to IBM DS3400 storage via qlogic switch. The crashes occurred during
cp and rsync operations from one array to another.

I solved the problem by replacing the Linux qla2xxx module with the official
qlogic RHEL/SUSE driver and hacking it to work as a module in Debian. The
mailbox timeouts stopped after switching drivers. This suggests a bug in the
current Linux qla2xxx driver, NOT a hardware problem.

Here is syslog output from a typical crash:

Jan 27 19:41:53 hqhost kernel: qla2xxx 0000:08:01.0: Mailbox command timeout
occured. Issuing ISP abort.
Jan 27 19:41:53 hqhost kernel: qla2xxx 0000:08:01.0: Performing ISP error
recovery - ha= ffff810223d0c530.
Jan 27 19:41:53 hqhost kernel: qla2xxx 0000:08:01.0: LOOP UP detected (4 Gbps).
Jan 27 19:41:54 hqhost kernel: qla2xxx 0000:08:01.0: SNS scan failed --
assuming zero-entry result...
Jan 27 19:41:54 hqhost kernel: APIC error on CPU0: 00(40)
Jan 27 19:41:54 hqhost kernel: qla2xxx 0000:08:01.0: scsi(0:0:1): Abort command
issued -- 0 9a776 2002.
Jan 27 19:42:28 hqhost kernel:  rport-0:0-0: blocked FC remote port time out:
removing target and saving binding
Jan 27 19:42:28 hqhost kernel:  rport-0:0-4: blocked FC remote port time out:
removing target and saving binding
Jan 27 19:42:28 hqhost kernel:  rport-0:0-5: blocked FC remote port time out:
removing target and saving binding
Jan 27 19:42:28 hqhost kernel: qla2xxx 0000:08:01.0: scsi(0:0:0): DEVICE RESET
ISSUED.
Jan 27 19:42:28 hqhost kernel: APIC error on CPU5: 00(40)
Jan 27 19:42:28 hqhost kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache


* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
  2008-09-25 13:55 [Bug 11646] New: " bugme-daemon
                   ` (14 preceding siblings ...)
  2008-10-13 11:45 ` bugme-daemon
@ 2008-10-21  7:13 ` bugme-daemon
  2008-11-19 22:10 ` bugme-daemon
                   ` (18 subsequent siblings)
  34 siblings, 0 replies; 41+ messages in thread
From: bugme-daemon @ 2008-10-21  7:13 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11646





------- Comment #15 from grin@grin.hu  2008-10-21 00:13 -------
Just to note that I segmented one live server from the others in question and
it did not help (the separated one keeps crashing / locking up). I am now
trying to freeze the test server (which runs a stock kernel instead of the
openvz one), but it is hard, probably due to the lack of real server load, and
so far the freezes were total. I am about to create a crashdump kernel; maybe I
can catch a glimpse of what happens. Stand by...

By the way, I have a weird "RISC paused / firmware dumped" case on the live
machine: it happens right after reboot and the system goes on just fine, but I
guess it shouldn't happen anyway. I'll send the dumps by email.



* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
  2008-09-25 13:55 [Bug 11646] New: " bugme-daemon
                   ` (13 preceding siblings ...)
  2008-10-07 21:27 ` bugme-daemon
@ 2008-10-13 11:45 ` bugme-daemon
  2008-10-21  7:13 ` bugme-daemon
                   ` (19 subsequent siblings)
  34 siblings, 0 replies; 41+ messages in thread
From: bugme-daemon @ 2008-10-13 11:45 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11646





------- Comment #14 from seokmann.ju@qlogic.com  2008-10-13 04:45 -------
If you have any updates, please let us know.



* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
  2008-09-25 13:55 [Bug 11646] New: " bugme-daemon
                   ` (12 preceding siblings ...)
  2008-10-07 20:52 ` bugme-daemon
@ 2008-10-07 21:27 ` bugme-daemon
  2008-10-13 11:45 ` bugme-daemon
                   ` (20 subsequent siblings)
  34 siblings, 0 replies; 41+ messages in thread
From: bugme-daemon @ 2008-10-07 21:27 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11646





------- Comment #13 from seokmann.ju@qlogic.com  2008-10-07 14:27 -------
OK. I see your point.
Could you provide feedback on the questions that I've raised in the email?
Let's continue to narrow down the problem further.
One more thing: we would need the console output redirected to a serial port,
as it gives us the most accurate clues.



* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
  2008-09-25 13:55 [Bug 11646] New: " bugme-daemon
                   ` (11 preceding siblings ...)
  2008-10-07 20:38 ` bugme-daemon
@ 2008-10-07 20:52 ` bugme-daemon
  2008-10-07 21:27 ` bugme-daemon
                   ` (21 subsequent siblings)
  34 siblings, 0 replies; 41+ messages in thread
From: bugme-daemon @ 2008-10-07 20:52 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11646





------- Comment #12 from grin@grin.hu  2008-10-07 13:52 -------
Okay, but _which_ part do you mean? One HBA? As I've mentioned, the problem
happened on multiple machines, and they have dual HBAs. (Or can one bad HBA
mess up the others? How could one spot which one is bad? Is there a way to
test for this particular problem?)

From the reply I guess they are talking about the dumps, and #4 was the second
card of the machine in question. But originally this wasn't the server I had
the most problems with; that one usually locks up all right on newer kernels,
and a reboot clears the firmware dumps you mentioned. So if machine #3 has a
bad HBA #2, why did machine #1 lock up every 30 minutes? Still not clear to me.



* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
  2008-09-25 13:55 [Bug 11646] New: " bugme-daemon
                   ` (10 preceding siblings ...)
  2008-10-06 19:21 ` bugme-daemon
@ 2008-10-07 20:38 ` bugme-daemon
  2008-10-07 20:52 ` bugme-daemon
                   ` (22 subsequent siblings)
  34 siblings, 0 replies; 41+ messages in thread
From: bugme-daemon @ 2008-10-07 20:38 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11646





------- Comment #11 from seokmann.ju@qlogic.com  2008-10-07 13:38 -------
Thanks.
Below is the feedback from our firmware folks.
---
Looks like you have a bad part.

b.txt (fw_dump_4) showed us reading a register to go into a jmp table.  I
should have read '1' (was at the time of the dump), but looks like I got 0.
RISC then paused on the parity error.
---

So, please go ahead and contact QLogic services to get it serviced or replaced.



* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
  2008-09-25 13:55 [Bug 11646] New: " bugme-daemon
                   ` (9 preceding siblings ...)
  2008-10-03 14:42 ` bugme-daemon
@ 2008-10-06 19:21 ` bugme-daemon
  2008-10-07 20:38 ` bugme-daemon
                   ` (23 subsequent siblings)
  34 siblings, 0 replies; 41+ messages in thread
From: bugme-daemon @ 2008-10-06 19:21 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11646





------- Comment #10 from grin@grin.hu  2008-10-06 12:21 -------
Sent by email.



* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
  2008-09-25 13:55 [Bug 11646] New: " bugme-daemon
                   ` (8 preceding siblings ...)
  2008-10-03  0:23 ` bugme-daemon
@ 2008-10-03 14:42 ` bugme-daemon
  2008-10-06 19:21 ` bugme-daemon
                   ` (24 subsequent siblings)
  34 siblings, 0 replies; 41+ messages in thread
From: bugme-daemon @ 2008-10-03 14:42 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11646





------- Comment #9 from seokmann.ju@qlogic.com  2008-10-03 07:42 -------
Yes, please forward the syslog to us.
From the log in comment #8, the RISC attempted to dump firmware right after the
RISC pause.
The dump image might contain clues explaining what was going on at that point
in time.
Could you forward the firmware dump to us?
Here are the steps to get the dump:
---
When a firmware dump is performed, a message similar to:

       Firmware dump saved to temp buffer (1/adcdabcd)

will be logged by the driver.

To retrieve the dump (do this *BEFORE* you unload the driver and
before the machine is reset), go to a console and type the following:

       $ wget ftp://ftp.qlogic.com/outgoing/linux/beta/8.x/test/qla_dmp.sh
       $ chmod 755 qla_dmp.sh
       $ ./qla_dmp.sh <host_no>

The value passed to qla_dmp.sh should be the same as the first integer
in the 'saved to temp buffer' string (in this example, 1).  If the
operation was successful, a message like the following should be
displayed:

       Firmware dumped to file fw_dump_1_20041217_023222.txt.gz

Send us the file and we can have the firmware folks take a look to see
what's going on.
---
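
To find the value to pass to qla_dmp.sh without reading the log by eye, the
"saved to temp buffer" message can be matched with a small script. This is a
sketch that assumes the message has exactly the form quoted above (the
instructions only say "similar to"); the function name dump_host_numbers is
hypothetical.

```python
import re

# Matches e.g. "Firmware dump saved to temp buffer (1/adcdabcd)".
# The first group is the host number, the second the buffer value.
DUMP_MSG = re.compile(
    r"Firmware dump saved to temp buffer \((\d+)/([0-9a-fA-F]+)\)"
)


def dump_host_numbers(log_text):
    """Return the host numbers of all firmware-dump messages in log_text."""
    return [int(m.group(1)) for m in DUMP_MSG.finditer(log_text)]


example = "kernel: qla2xxx 0000:08:01.1: Firmware dump saved to temp buffer (1/adcdabcd)"
print(dump_host_numbers(example))  # [1]
```

The prefix in the `example` string is made up for the demonstration; only the
message text itself comes from the instructions above.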



* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
  2008-09-25 13:55 [Bug 11646] New: " bugme-daemon
                   ` (7 preceding siblings ...)
  2008-10-01 22:40 ` bugme-daemon
@ 2008-10-03  0:23 ` bugme-daemon
  2008-10-03 14:42 ` bugme-daemon
                   ` (25 subsequent siblings)
  34 siblings, 0 replies; 41+ messages in thread
From: bugme-daemon @ 2008-10-03  0:23 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11646


akpm@osdl.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Regression|0                           |1





* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
  2008-09-25 13:55 [Bug 11646] New: " bugme-daemon
                   ` (6 preceding siblings ...)
  2008-09-30  7:49 ` bugme-daemon
@ 2008-10-01 22:40 ` bugme-daemon
  2008-10-03  0:23 ` bugme-daemon
                   ` (26 subsequent siblings)
  34 siblings, 0 replies; 41+ messages in thread
From: bugme-daemon @ 2008-10-01 22:40 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11646





------- Comment #8 from grin@grin.hu  2008-10-01 15:40 -------
Hm, I got some logs which contain messages like

Oct  2 00:23:05 galamb kernel: [139240.696070] qla2xxx 0000:08:01.1: RISC
paused -- HCCR=0, Dumping firmware!
Oct  2 00:23:05 galamb kernel: [139240.696097] qla2xxx 0000:08:01.1: Firmware
has been previously dumped (ffffc20000bcc000) -- ignoring request...
Oct  2 00:23:05 galamb kernel: [139241.494343] scsi(4): dpc: sched
qla2x00_abort_isp ha = ffff81007bd84460
Oct  2 00:23:05 galamb kernel: [139241.494350] qla2xxx 0000:08:01.1: Performing
ISP error recovery - ha= ffff81007bd84460.
Oct  2 00:23:05 galamb kernel: [139241.530998] scsi(4): **** Load RISC code
****
Oct  2 00:23:05 galamb kernel: [139241.547277] scsi(4): Verifying Checksum of
loaded RISC code.
Oct  2 00:23:05 galamb kernel: [139241.564201] scsi(4): Checksum OK, start
firmware.
Oct  2 00:23:06 galamb kernel: [139241.747606] scsi(4): Issue init firmware.
Oct  2 00:23:06 galamb kernel: [139242.296514] scsi(4): Asynchronous P2P MODE
received.
Oct  2 00:23:06 galamb kernel: [139242.316473] scsi(4): Asynchronous LOOP UP (4
Gbps).
Oct  2 00:23:06 galamb kernel: [139242.316479] qla2xxx 0000:08:01.1: LOOP UP
detected (4 Gbps).
Oct  2 00:23:06 galamb kernel: [139242.336435] scsi(4): Asynchronous PORT
UPDATE.
Oct  2 00:23:06 galamb kernel: [139242.336440] scsi(4): Port database changed
ffff 0006 0000.
Oct  2 00:23:06 galamb kernel: [139242.356395] scsi(4): Asynchronous PORT
UPDATE ignored 0000/0004/0600.
Oct  2 00:23:06 galamb kernel: [139242.376358] scsi(4): Asynchronous PORT
UPDATE ignored 0000/0007/0b00.
Oct  2 00:23:06 galamb kernel: [139242.396353] scsi(4): F/W Ready - OK 
Oct  2 00:23:06 galamb kernel: [139242.416315] scsi(4): fw_state=3 curr
time=100d44784.
Oct  2 00:23:06 galamb kernel: [139242.416321] qla2x00_restart_isp(): Start
configure loop, status = 0
Oct  2 00:23:06 galamb kernel: [139242.436258] scsi(4): Configure loop -- dpc
flags =0x4080048
Oct  2 00:23:06 galamb kernel: [139242.456218] scsi(4): RSCN queue entry[0] =
[00/000000].
Oct  2 00:23:06 galamb kernel: [139242.456223] scsi(4): device_resync: rscn
overflow.
Oct  2 00:23:06 galamb kernel: [139242.492382] scsi(4): fcport-0 - port retry
count: 2 remaining
Oct  2 00:23:06 galamb kernel: [139242.492406] scsi(4): RFT_ID exiting
normally.
Oct  2 00:23:06 galamb kernel: [139242.512366] scsi(4): RFF_ID exiting
normally.
Oct  2 00:23:06 galamb kernel: [139242.532324] scsi(4): RNN_ID exiting
normally.
Oct  2 00:23:06 galamb kernel: [139242.556047] scsi(4): RSNN_NN exiting
normally.
Oct  2 00:23:07 galamb kernel: [139242.632113] scsi(4): GID_PT entry - nn
200100e08bba4036 pn 210100e08bba4036 portid=010400.
Oct  2 00:23:07 galamb kernel: [139242.655856] scsi(4): GID_PT entry - nn
200400a0b8263784 pn 200500a0b8263785 portid=011300.
Oct  2 00:23:07 galamb kernel: [139242.731982] scsi(4): GPSC ext entry - fpn
200400c0dd0daf7b speeds=6000 speed=2000.
Oct  2 00:23:07 galamb kernel: [139242.755684] scsi(4): GPSC ext entry - fpn
201300c0dd0daf7b speeds=e000 speed=2000.
Oct  2 00:23:07 galamb kernel: [139242.775629] qla24xx_fabric_logout(4): failed
to complete IOCB -- completion status (31)  ioparam=a/0.
Oct  2 00:23:07 galamb kernel: [139242.775634] scsi(4): device wrap (011300)
Oct  2 00:23:07 galamb kernel: [139242.775639] scsi(4): Trying Fabric Login
w/loop id 0x0081 for port 011300.
Oct  2 00:23:07 galamb kernel: [139242.831751] qla2xxx 0000:08:01.1: iIDMA
adjusted to 4 GB/s on 200500a0b8263785.
Oct  2 00:23:07 galamb kernel: [139242.831787] scsi(4): LOOP READY
Oct  2 00:23:07 galamb kernel: [139242.831789] qla2x00_restart_isp(): Configure
loop done, status = 0x0
Oct  2 00:23:07 galamb kernel: [139242.833926] qla2xxx 0000:08:01.1:
scsi(4:0:0:6): Mid-layer underflow detected (40000 of 40000 bytes)...returning
error status.
Oct  2 00:23:07 galamb kernel: [139242.843912] qla2xxx 0000:08:01.1:
scsi(4:0:0:3): Mid-layer underflow detected (10000 of 10000 bytes)...returning
error status.

under 2.6.24+openvz. They were generated repeatedly by asking LVM to move a
whole physical volume (PV) to another one, which caused a constant, medium-rate
data flow in both directions. The link came back up later, and the move has not
crashed the machine so far.

It may be important to mention that FC#0 really is link-down, while FC#1 is
active. When FC#1 reports link down, mailbox timeouts, etc., FC#0 logs _lots_
of firmware dump requests (thousands), which I guess could eventually crash the
machine (but so far hasn't).
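
To put a number on "thousands", the dump-related messages can be counted per
HBA from the syslog. The following is only a minimal sketch: it assumes the
syslog lines are not wrapped (unlike the excerpt quoted in this mail), and the
function name count_dump_requests is made up for illustration. The two marker
strings are copied from the excerpt above.

```python
import re
from collections import Counter

# Matches the driver prefix, e.g. "qla2xxx 0000:08:01.1:", and captures the
# PCI address of the HBA port.
PCI_RE = re.compile(r"qla2xxx ([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):")

# Substrings from the quoted log excerpt that indicate a firmware dump request.
DUMP_MARKERS = ("Dumping firmware", "Firmware has been previously dumped")


def count_dump_requests(lines):
    """Return a Counter mapping PCI address -> number of dump-related lines."""
    counts = Counter()
    for line in lines:
        if not any(marker in line for marker in DUMP_MARKERS):
            continue
        match = PCI_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding it an open syslog file (for example the file object returned by
open() on the saved log) yields one counter entry per PCI address, which
should make it easy to see whether the idle port really logs far more dump
requests than the active one.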

If anyone requests it, I can provide the full syslog (though not as an
attachment).



* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
  2008-09-25 13:55 [Bug 11646] New: " bugme-daemon
                   ` (5 preceding siblings ...)
  2008-09-27  8:17 ` bugme-daemon
@ 2008-09-30  7:49 ` bugme-daemon
  2008-10-01 22:40 ` bugme-daemon
                   ` (27 subsequent siblings)
  34 siblings, 0 replies; 41+ messages in thread
From: bugme-daemon @ 2008-09-30  7:49 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11646

------- Comment #7 from grin@grin.hu  2008-09-30 00:49 -------
Created an attachment (id=18108)
 --> (http://bugzilla.kernel.org/attachment.cgi?id=18108&action=view)
Some further logfile around a lockup

I tried and failed to manually lock up a kernel (tried 10 bonnie++ and
tiobench), but another one (2.6.24 with the openvz patch, though I believe
openvz shouldn't really matter to the qla2xxx driver; you're free to disagree)
locked up. Maybe there's something useful in the logs (the log drive wasn't
stuck, and there was no OOPS).

The links seemed to go down, but they most probably did not, since all other
servers carried on completely fine at the time. The links seemed to stay down
[and locked up IO], but came up again after a reboot. I do not believe the
links were _really_ down at all, so this is, by my guess, the same problem.
Everything went to syslog.

I also attached /proc/interrupts output, a ps tree and whatnot.


* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
  2008-09-25 13:55 [Bug 11646] New: " bugme-daemon
                   ` (4 preceding siblings ...)
  2008-09-26 13:59 ` bugme-daemon
@ 2008-09-27  8:17 ` bugme-daemon
  2008-09-30  7:49 ` bugme-daemon
                   ` (28 subsequent siblings)
  34 siblings, 0 replies; 41+ messages in thread
From: bugme-daemon @ 2008-09-27  8:17 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11646

------- Comment #6 from grin@grin.hu  2008-09-27 01:17 -------
Unfortunately no. 

The system in question does not have a physical serial port. It is said to
have a serial-over-IP feature, which in fact does not work (and it's a pretty
stupid thing anyway, since you would have to telnet(!) in and capture the
output somehow; but the connection breaks after 1-2 minutes anyway).

I've tried netconsole but unfortunately [and naturally] it dies along with
eth0. 

I was thinking about a USB serial port, but it probably requires working IRQs
as well.

But, as I mentioned, I rolled the live system back to 2.4.20 to prevent further
lockups, and I do not really have a way to kill the test system manually. (So
far I've tried 2-3 runs of bonnie++ and tiobench; neither locked it up, but
I'll try to run them in an endless loop and see what happens.)
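
A minimal sketch of such an endless-loop driver, under stated assumptions: the
helper name run_until_failure and its max_runs parameter are hypothetical, and
the command passed in would be whatever bonnie++ or tiobench command line is
already in use. It prints a flushed line after every run, so the last line of
output shows roughly how many runs completed before a lockup.

```python
import subprocess
import time


def run_until_failure(cmd, max_runs=None):
    """Run cmd repeatedly and return the number of successful runs.

    Stops at the first non-zero exit status, or after max_runs iterations
    (max_runs=None means loop until a failure or a lockup).
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        started = time.time()
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"run {runs + 1} failed with status {result.returncode}",
                  flush=True)
            break
        runs += 1
        # Flush so the line reaches the terminal or log before a possible hang.
        print(f"run {runs} ok in {time.time() - started:.1f}s", flush=True)
    return runs
```

If the machine locks up, the loop obviously never returns; in that case only
the output already written survives, so it is best captured on a host or
device that does not sit behind the affected HBA.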

Which kernel version, in your opinion, has a good chance of containing the
change? I see there was a big version change somewhere; if you could point out
the kernel version, I'd try to shoot around it.


* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
  2008-09-25 13:55 [Bug 11646] New: " bugme-daemon
                   ` (3 preceding siblings ...)
  2008-09-26 13:48 ` bugme-daemon
@ 2008-09-26 13:59 ` bugme-daemon
  2008-09-27  8:17 ` bugme-daemon
                   ` (29 subsequent siblings)
  34 siblings, 0 replies; 41+ messages in thread
From: bugme-daemon @ 2008-09-26 13:59 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11646


seokmann.ju@qlogic.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |seokmann.ju@qlogic.com




------- Comment #5 from seokmann.ju@qlogic.com  2008-09-26 06:59 -------
One thing: could you redirect the console output to a serial port so that we
can grab as much information as the kernel provides?
If I understand correctly, this helps in situations where the system gets
locked up or hung.
I hope it will give us further direction.



* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
  2008-09-25 13:55 [Bug 11646] New: " bugme-daemon
                   ` (2 preceding siblings ...)
  2008-09-25 15:04 ` bugme-daemon
@ 2008-09-26 13:48 ` bugme-daemon
  2008-09-26 13:59 ` bugme-daemon
                   ` (30 subsequent siblings)
  34 siblings, 0 replies; 41+ messages in thread
From: bugme-daemon @ 2008-09-26 13:48 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11646





------- Comment #4 from grin@grin.hu  2008-09-26 06:48 -------
I have put some screenshots of the crashes in the same location, though I am
not sure anything useful can be pried out of them.

Can I provide any more info? Was there anything useful? Or any idea how I
could deliberately crash it (so I can try any kernel versions/patches you like
on a test machine)?



* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
  2008-09-25 13:55 [Bug 11646] New: " bugme-daemon
  2008-09-25 14:10 ` [Bug 11646] " bugme-daemon
  2008-09-25 15:00 ` bugme-daemon
@ 2008-09-25 15:04 ` bugme-daemon
  2008-09-26 13:48 ` bugme-daemon
                   ` (31 subsequent siblings)
  34 siblings, 0 replies; 41+ messages in thread
From: bugme-daemon @ 2008-09-25 15:04 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11646





------- Comment #3 from grin@grin.hu  2008-09-25 08:04 -------
Okay, Bugzilla doesn't let me, so please get it from
http://foobar.grin.hu/tmp/



* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
  2008-09-25 13:55 [Bug 11646] New: " bugme-daemon
  2008-09-25 14:10 ` [Bug 11646] " bugme-daemon
@ 2008-09-25 15:00 ` bugme-daemon
  2008-09-25 15:04 ` bugme-daemon
                   ` (32 subsequent siblings)
  34 siblings, 0 replies; 41+ messages in thread
From: bugme-daemon @ 2008-09-25 15:00 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11646


grin@grin.hu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |grin@grin.hu




------- Comment #2 from grin@grin.hu  2008-09-25 08:00 -------
Many updates: unfortunately, I realised that.

2.6.26.5: yes, it is broken as well. I have no knowledge of a released 2.6.27.
:-)

Messages: all right, I'll attach some, but please realise that when the IO is
blocked there is no log. :-) I have some screenshots of the crashes, but most
of them are not even relevant, only showing that processes were stuck in D
state for much too long.
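
For reference, processes in D state (uninterruptible sleep) can be listed
without any extra tools by reading /proc/<pid>/stat, where the state letter
follows the parenthesised command name. The sketch below is only an
illustration; parse_stat and list_d_state are made-up names, and on a box whose
IO is already fully blocked the script itself may not be able to start.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_idx = stat_text.index("(")
    close_idx = stat_text.rindex(")")
    pid = int(stat_text[:open_idx])
    comm = stat_text[open_idx + 1:close_idx]
    state = stat_text[close_idx + 1:].split()[0]
    return pid, comm, state


def list_d_state(proc_root="/proc"):
    """Return [(pid, comm)] for processes currently in uninterruptible sleep."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError):
            continue  # process exited meanwhile, or the entry is unreadable
        if state == "D":
            stuck.append((pid, comm))
    return stuck
```

Calling list_d_state() every few seconds and logging the result to a disk that
is not behind the qla2xxx HBA would show which processes stay in D state and
for how long.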



* [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20
  2008-09-25 13:55 [Bug 11646] New: " bugme-daemon
@ 2008-09-25 14:10 ` bugme-daemon
  2008-09-25 15:00 ` bugme-daemon
                   ` (33 subsequent siblings)
  34 siblings, 0 replies; 41+ messages in thread
From: bugme-daemon @ 2008-09-25 14:10 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11646





------- Comment #1 from seokmann.ju@qlogic.com  2008-09-25 07:10 -------
There have been many updates/changes applied to the qla2xxx module since
2.6.20.
Does it happen with the later 2.6.26.5 or the latest 2.6.27 kernels?
Yes, please provide the /var/log/messages file of the system.


