Linux-SCSI Archive on lore.kernel.org
* [Bug 208605] New: AACRAID frequent hos bus reset with intensive IO on large arrays
From: bugzilla-daemon @ 2020-07-19  6:36 UTC
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=208605

            Bug ID: 208605
           Summary: AACRAID frequent hos bus reset with intensive IO on
                    large arrays
           Product: SCSI Drivers
           Version: 2.5
    Kernel Version: 4.14 - 5.7.8
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: AACRAID
          Assignee: scsi_drivers-aacraid@kernel-bugs.osdl.org
          Reporter: janpieter.sollie@edpnet.be
        Regression: No

Created attachment 290345
  --> https://bugzilla.kernel.org/attachment.cgi?id=290345&action=edit
quick and dirty patch to fix the issue

On a large array (>15 drives), it is impossible to back up the storage to a SAS
tape without the driver detecting a lockup and triggering a bus reset.
This seems to be a false detection: the host controller is not actually
locked up, just somewhat delayed.
This issue seems to go back to 4.14.

I reverted some of the cleanup changes introduced in 4.14, and the driver then
works correctly.

I attached a patch, but it is only meant to show where the bug may be; it is
not ready for production (it works, but possibly only on Series 7 controllers).
I also have no idea what exactly causes this issue.

The bug was observed on a Series 7 controller with a 12-drive RAID6 array.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

* [Bug 208605] AACRAID frequent hos bus reset with intensive IO on large arrays
From: bugzilla-daemon @ 2020-07-19  7:32 UTC
  To: linux-scsi

--- Comment #1 from Janpieter Sollie (janpieter.sollie@edpnet.be) ---
Sorry, this patch seems to have been a false positive; the error still occurs.
scsi_eh_handler still appears, just a little later.

* [Bug 208605] AACRAID frequent hos bus reset with intensive IO on large arrays
From: bugzilla-daemon @ 2020-07-19 10:08 UTC
  To: linux-scsi

Andrey Jr. Melnikov (temnota.am@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |temnota.am@gmail.com

--- Comment #2 from Andrey Jr. Melnikov (temnota.am@gmail.com) ---
Check this: https://patchwork.kernel.org/patch/11038347/

* [Bug 208605] AACRAID frequent hos bus reset with intensive IO on large arrays
From: bugzilla-daemon @ 2020-07-19 12:31 UTC
  To: linux-scsi

--- Comment #3 from Janpieter Sollie (janpieter.sollie@edpnet.be) ---
I saw that; those modifications are included in my patch (but for the Series 7
instead of the Series 6), and they do not seem to help. There must be another
issue. The controller works fine when issuing commands such as create, erase,
repair, etc., but it fails during heavy IO. So there must be some
synchronization issue between the SCSI subsystem (or the aacraid driver) and
the adapter.

* [Bug 208605] AACRAID frequent hos bus reset with intensive IO on large arrays
From: bugzilla-daemon @ 2020-07-20 10:22 UTC
  To: linux-scsi

--- Comment #4 from Janpieter Sollie (janpieter.sollie@edpnet.be) ---
Created attachment 290373
  --> https://bugzilla.kernel.org/attachment.cgi?id=290373&action=edit
modification to make Microsemi driver work with 5.7 kernel

I know this is bad practice, but at least it produces some results: I tried
the proprietary Microsemi driver (58012). Of course it does not work with
recent kernels, but after modifying the code a bit, I made "something" that
works. The patch is attached. Any idea why this one works but the open-source
driver does not? Given the amount of abandoned/junk code in the Microsemi
driver I had to modify, I'd have expected the opposite.

* [Bug 208605] AACRAID frequent hos bus reset with intensive IO on large arrays
From: bugzilla-daemon @ 2020-07-29  7:43 UTC
  To: linux-scsi

--- Comment #5 from Janpieter Sollie (janpieter.sollie@edpnet.be) ---
I think I found a solution: when I force sync mode, the driver handles
everything perfectly. Of course, this has a performance impact, so if anyone
could help me debug this driver in async mode, it would be much appreciated.

* [Bug 208605] AACRAID frequent hos bus reset with intensive IO on large arrays
From: bugzilla-daemon @ 2020-08-12  9:10 UTC
  To: linux-scsi

Janpieter Sollie (janpieter.sollie@edpnet.be) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |INVALID

--- Comment #6 from Janpieter Sollie (janpieter.sollie@edpnet.be) ---
The previous setting was not a solution: it largely reduces the driver's
functionality. The following combination fixed the issue:

	aacraid cache=3
	arcconf setcache ld 1 coff
	echo "write through" > /sys/block/sdc/queue/write_cache

This is most probably hardware related, not a Linux bug.
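The `aacraid cache=3` setting in the fix from comment #6 is a module-parameter bitmask. A minimal sketch for decoding it, assuming the bit meanings given in the parameter description of the mainline aacraid driver (bit 0: disable FUA in WRITE commands; bit 1: disable SYNCHRONIZE_CACHE; bit 2: disable only if a battery protects the cache). `decode_aacraid_cache` is a hypothetical helper, not part of any tool, and the bit descriptions should be checked against `modinfo -p aacraid` on the running kernel.

```shell
#!/bin/sh
# Hypothetical helper: print which queue-flush commands the aacraid
# "cache" module parameter disables. Bit meanings are assumed from the
# parameter description in the mainline driver; verify them with
# `modinfo -p aacraid` on your own kernel.
decode_aacraid_cache() {
    _mask=$1
    if [ $((_mask & 1)) -ne 0 ]; then
        echo "bit 0: FUA disabled in WRITE commands"
    fi
    if [ $((_mask & 2)) -ne 0 ]; then
        echo "bit 1: SYNCHRONIZE_CACHE disabled"
    fi
    if [ $((_mask & 4)) -ne 0 ]; then
        echo "bit 2: disable only if a battery protects the cache"
    fi
}

# Example: the value used in the fix from comment #6.
decode_aacraid_cache 3
```

For `cache=3` this prints the bit 0 and bit 1 lines, i.e. under that reading both FUA and SYNCHRONIZE_CACHE are suppressed. To keep the parameter across reboots, it can be set with a line such as `options aacraid cache=3` in a file under /etc/modprobe.d/.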

Thread overview: 7+ messages
2020-07-19  6:36 [Bug 208605] New: AACRAID frequent hos bus reset with intensive IO on large arrays bugzilla-daemon
2020-07-19  7:32 ` [Bug 208605] " bugzilla-daemon
2020-07-19 10:08 ` bugzilla-daemon
2020-07-19 12:31 ` bugzilla-daemon
2020-07-20 10:22 ` bugzilla-daemon
2020-07-29  7:43 ` bugzilla-daemon
2020-08-12  9:10 ` bugzilla-daemon
