From: Thomas Fjellstrom <thomas@fjellstrom.ca>
To: adam radford <aradford@gmail.com>
Cc: lkml <linux-kernel@vger.kernel.org>, linux-scsi@vger.kernel.org
Subject: Re: stuck in megaraid_sas.c megasas_adp_reset_gen2
Date: Wed, 11 Apr 2012 14:17:28 -0600
Message-ID: <201204111417.28893.thomas@fjellstrom.ca>
In-Reply-To: <201203211752.05576.thomas@fjellstrom.ca>

On Wed Mar 21, 2012, you wrote:
> On Wed Mar 21, 2012, adam radford wrote:
> > On Wed, Mar 21, 2012 at 4:16 PM, Thomas Fjellstrom <thomas@fjellstrom.ca> wrote:
> > > I recently got an IBM M1015 (MegaRAID 9240-8i) card, and after getting
> > > a new motherboard, the system now boots, but the megaraid_sas driver
> > > seems to be getting stuck when trying to initialize the card.
> > > 
> > > Looking through the source, it seems to be stuck in the
> > > megasas_adp_reset_gen2 function, in the while loop at the end. Now,
> > > according to the code it can't actually get stuck there permanently,
> > > but it does take quite a while for the loop to finish, and the udev
> > > timeout messages to stop.
> > > 
> > > I've looked around quite a bit, but haven't found any solutions thus
> > > far. If anyone could point me in the right direction I'd appreciate
> > > it.
> > 
> > If you are getting controller resets during driver load, you must not
> > be getting interrupts or firmware is not responding to the inquiry
> > roll-call.  Make sure you have the latest firmware.
> 
> I updated to the latest on LSI's site today before emailing. It changes the
> behavior slightly. With the older firmware, it would not print any of the
> initial reset messages until udev decided to start killing modprobe. With
> the new firmware, I get:
> 
> ADP_RESET_GEN2: HostDiag=a0
> 
> followed by a bunch of:
> 
> RESET_GEN2: retry=%x, hostdiag=a4
> 
> Now I'm not sure whether the hostdiag value should differ between the two.
> If this aN identifier is similar to the aN identifiers in the MegaCli tool,
> would it mean it's trying to reset a device that doesn't exist? I only have
> a single M1015 card installed.
> 
> > The code at the end of megasas_adp_reset_gen2() just looks for
> > DIAG_RESET_ADAPTER flag to clear on the host diag register when
> > issuing a controller reset... that should happen almost immediately
> > unless there is a hardware or firmware issue.
> > 
> > Are you sure your 'new' motherboard is actually good ?
> 
> It boots and runs fine without the sas card installed. I haven't run any
> heavy load tests, but it seems ok.

Machine has been solid as a rock (sans 9240-8i) for the past month under light
to half load. It runs several virtual machines, an NFS share, my firewall, a
Minecraft server, and some other miscellaneous stuff. Not a single hiccup.

> > Can you move your megaraid 9240-8i into a 'known working' system and
> > re-test ?
> 
> Nope. This is the furthest I've gotten with this card installed.
> The old system would fail to boot into GRUB properly, let alone Linux.
> These cards seem to be /very/ picky about what motherboard you install
> them in.
> 
> > -Adam

I just got a second M1015 card in today and gave it a go. Similar issues,
different log messages. (Hand-typed from photos of the screens.)

Lots of:

megasas: Waiting for 1 commands to complete

for quite a while (5-10 minutes), along with udevd trying to kill modprobe. 
Then:

megasas: moving cmd[0]:hexstringherewithcolons queue as internal
megaraid_sas: FW detected to be in fault state, restarting it...
ADP_RESET_GEN2: HostDiag=a0
megaraid_sas: FW restarted successfully,initializing next stage...
megaraid_sas: HBA recovery state machine,state 1 starting...
(sits here for a while)
megasas: Waiting for FW to come to ready state
megasas: FW now in ready state
megaraid_sas: command hexstringhere, hexstringhere detected (something?) while HBA reset
megasas: command hexstring scsi cmd [12]detected on the internal (something?) again
megasas: reset successful
scsi:0:0:0:0: megasas: RESET cmd=12 retries=0
megaraid_sas: no pending cmds after reset
megasas: reset successful
scsi:0:0:0:0: megasas: RESET cmd=12 retries=0
megaraid_sas: no pending cmds after reset
megasas: reset successful
scsi:0:0:0:0: Device offlined - not ready after error recovery
(other scsi devices are detected)
(bootup hangs here)

Eventually there are some "hung task" timeout backtraces. At this point I tried
to kill udevd: CTRL+C didn't stop it from trying to kill modprobe, and
ALT+SYSRQ+K caused a silent oops (keyboard LEDs blinking, no backtrace or OOPS
text). If it's similar to last time, the kernel will eventually OOPS outright
without any intervention.

-- 
Thomas Fjellstrom
thomas@fjellstrom.ca
