* [Bug 207359] New: MegaRAID SAS 9361 controller hang/reset
From: bugzilla-daemon @ 2020-04-19 18:25 UTC
To: linuxppc-dev
https://bugzilla.kernel.org/show_bug.cgi?id=207359
Bug ID: 207359
Summary: MegaRAID SAS 9361 controller hang/reset
Product: Platform Specific/Hardware
Version: 2.5
Kernel Version: >=v5.4
Hardware: PPC-64
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: PPC-64
Assignee: platform_ppc-64@kernel-bugs.osdl.org
Reporter: cam@neo-zeon.de
Regression: No
Created attachment 288623
--> https://bugzilla.kernel.org/attachment.cgi?id=288623&action=edit
dmesg output for controller hang
On a Talos II 2x 36-core (144-thread) POWER9 box, the MegaRAID SAS 9361-16i PCIe
controller can be made to hang fairly consistently under heavy I/O on kernel
versions later than 5.3.18.
I cannot reproduce this on a 16-core/32-thread amd64 box with a MegaRAID
SAS 9361-16i PCIe controller running the exact same firmware revision.
The box also has a Microsemi SAS HBA, which appears unaffected.
System details:
Talos II motherboard
2x 36 core (144 thread) POWER9 processors
512GB memory
4k page size
MegaRAID SAS 9361-16i PCIe controller (4-disk RAID10 volume, megaraid_sas driver)
Microsemi HBA with 4x SSDs
The relevant dmesg messages are attached.
--
You are receiving this mail because:
You are watching the assignee of the bug.
From: bugzilla-daemon @ 2020-04-19 20:24 UTC
To: linuxppc-dev
https://bugzilla.kernel.org/show_bug.cgi?id=207359
gyakovlev@gentoo.org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |gyakovlev@gentoo.org
--- Comment #1 from gyakovlev@gentoo.org ---
In my case I see a similar problem on the same motherboard, but with the aacraid
driver (the Microsemi one):
https://bugzilla.kernel.org/show_bug.cgi?id=206123
From: bugzilla-daemon @ 2020-04-19 20:55 UTC
To: linuxppc-dev
https://bugzilla.kernel.org/show_bug.cgi?id=207359
--- Comment #2 from Cameron (cam@neo-zeon.de) ---
Looking at bug 206123 above, it's worth noting that the amd64 box I'm using for
comparison has SATA disks, though this is probably still a PPC-specific issue.
From: bugzilla-daemon @ 2020-05-10 3:02 UTC
To: linuxppc-dev
https://bugzilla.kernel.org/show_bug.cgi?id=207359
--- Comment #3 from Cameron (cam@neo-zeon.de) ---
Created attachment 289041
--> https://bugzilla.kernel.org/attachment.cgi?id=289041&action=edit
5.6.11 megaraid POWER hang
Still happens with 5.6.11. There appears to be a bit more output this time, and
I've also included the output from shutting down in case it's useful.
From: bugzilla-daemon @ 2020-08-06 17:56 UTC
To: linuxppc-dev
https://bugzilla.kernel.org/show_bug.cgi?id=207359
--- Comment #4 from Cameron (cam@neo-zeon.de) ---
I converted the box's filesystems from BTRFS to XFS and switched the page size
from 4k to 64k. The problem appears to be entirely gone. I can run 5.7.13
without issue, and I had previously verified that this same version exhibits the
megaraid_sas controller hang under my old BTRFS + 4k page configuration.
Unfortunately, the conversion took a great deal of time, and I couldn't keep
the box down even longer to test whether the switch to XFS or the switch to 64k
pages resolves the issue on its own. All I can say for certain is that switching
to XFS, switching to a 64k page size, or the combination of both fixed the
problem for me.
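Since the two changes could not be tested separately, anyone trying to narrow
this down will want to record both variables on their own system. The following
is a minimal sketch, not part of the original report, assuming a Linux system:
a hypothetical Python helper (the names page_size_label, mount_fstypes and
describe_system are illustrative only) that reads the base page size via
os.sysconf("SC_PAGE_SIZE") and the filesystem type of each mount from
/proc/mounts.

```python
import os


def page_size_label(size: int) -> str:
    """Render a page size in bytes as a short label such as "4k" or "64k"."""
    return f"{size // 1024}k"


def mount_fstypes(mounts_text: str) -> dict:
    """Map each mount point to its filesystem type.

    Expects text in /proc/mounts format, one mount per line:
    device mountpoint fstype options dump pass.
    If a mount point appears more than once, the last entry wins
    (the most recent mount). Escapes such as \\040 for a space
    are not decoded.
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            result[fields[1]] = fields[2]
    return result


def describe_system(mounts_path: str = "/proc/mounts") -> str:
    """Summarize the two variables changed in this report: page size and filesystem."""
    size = os.sysconf("SC_PAGE_SIZE")  # base page size in bytes
    with open(mounts_path) as f:
        fstypes = mount_fstypes(f.read())
    # Only report the two filesystems discussed in this thread.
    relevant = {mp: fs for mp, fs in fstypes.items() if fs in ("btrfs", "xfs")}
    lines = [f"page size: {size} bytes ({page_size_label(size)})"]
    lines += [f"{mp}: {fs}" for mp, fs in sorted(relevant.items())]
    return "\n".join(lines)


if __name__ == "__main__":
    print(describe_system())
```

Running it before and after a change like the one described here gives a record
of exactly which of the two variables differed between a failing and a working
configuration.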
The backup volume is a single SATA disk that still uses BTRFS (for
snapshotting), and it is not giving me any trouble. However, if this is related
to https://bugzilla.kernel.org/show_bug.cgi?id=206123, that may not be
conclusive, since SATA disks may not trigger the issue. The single disk also
can't push as much I/O as the RAID10 volume, which may be another reason it is
unaffected.
My semi-educated, non-kernel-developer guess is that this is a bug related to
the 4k page size. The normal behavior of BTRFS may exacerbate it (making it
easier to reproduce), but that is unknown.
Hopefully someone else encountering this issue will find this helpful.