linux-parisc.vger.kernel.org archive mirror
From: Helge Deller <deller@gmx.de>
To: James Bottomley <James.Bottomley@HansenPartnership.com>,
	linux-parisc <linux-parisc@vger.kernel.org>
Subject: Re: Kernel 5.8 and 5.9 fail to boot on C8000
Date: Wed, 21 Oct 2020 18:10:45 +0200
Message-ID: <bbdfecf6-b13f-561f-82f6-1f5e594e02b2@gmx.de>
In-Reply-To: <37ee0636688c782a59e8b50eae5c41b96926e7ab.camel@HansenPartnership.com>

On 10/21/20 5:52 PM, James Bottomley wrote:
> On Tue, 2020-10-20 at 15:45 +0200, Helge Deller wrote:
>> Latest Linux kernels v5.8 and v5.9 fail to boot for me on the C8000
>> machines with this error:
>>  mptspi: probe of 0000:40:01.0 failed with error -12
>>  mptbase: ioc1: ERROR - Insufficient memory to add adapter!
>>  mptspi: probe of 0000:40:01.1 failed with error -12
>
> I think you've already figured out that this is an allocation issue.
> However, it does seem fishy, the code is
>
> 	ioc = kzalloc(sizeof(MPT_ADAPTER), GFP_KERNEL);
> 	if (ioc == NULL) {
> 		printk(KERN_ERR MYNAM ": ERROR - Insufficient memory to add adapter!\n");
> 		return -ENOMEM;
> 	}
>
> And MPT_ADAPTER should be just under a page which looks like a very odd
> allocation to fail so early in boot.  The memory subsystem should have
> also printed out a trace explaining why it failed the allocation.

I think there are a few issues here.
First, the allocation failure seen above is from a current git head,
where memory allocation seems to be somewhat broken. For now I would ignore it
until git head stabilizes...

Then, in my machine I have two U320 drives, one "SEAGATE ST373307LW" and one
"HP 73.4GMAW3073NP". Both drives seem to be starting to fail: even in the
firmware, when running "search for boot devices", they sometimes fail to be
detected.

The good thing about bad drives is that they make it possible to debug
error code paths in the drivers. In my case the last syslog output
looks like this (I'm currently testing with Linus' plain v5.9 kernel).

+[ 1126.041880] ioc0: LSI53C1030 B2: Capabilities={Initiator,Target}
+Begin: Waiting for root file system ...
+[ 1127.069515] scsi host2: error handler thread failed to spawn, error = -4
+[ 1127.069515] mptspi: ioc0: WARNING - Unable to register controller with SCSI subsystem
+<Cpu1> 78000c6201e00000  a0e008c01100b009  CC_PAT_ENCODED_FIELD_WARNING
+<Cpu1> 76000c6801e00000  0000000000000520  CC_PAT_DATA_FIELD_WARNING
<XXX: here is something missing - serial port is often not fast enough....>
+[ 1127.069515] Backtrace:
+[ 1127.069515]  [<000000001045b7cc>] mptspi_probe+0x248/0x3d0 [mptspi]
+[ 1127.069515]  [<0000000040946470>] pci_device_probe+0x1ac/0x2d8
+[ 1127.069515]  [<0000000040add668>] really_probe+0x1bc/0x988
+[ 1127.069515]  [<0000000040ade704>] driver_probe_device+0x160/0x218
+[ 1127.069515]  [<0000000040adee24>] device_driver_attach+0x160/0x188
+[ 1127.069515]  [<0000000040adef90>] __driver_attach+0x144/0x320
+[ 1127.069515]  [<0000000040ad7c78>] bus_for_each_dev+0xd4/0x158
+[ 1127.069515]  [<0000000040adc138>] driver_attach+0x4c/0x80
+[ 1127.069515]  [<0000000040adb3ec>] bus_add_driver+0x3e0/0x498
+[ 1127.069515]  [<0000000040ae0130>] driver_register+0xf4/0x298
+[ 1127.069515]  [<00000000409450c4>] __pci_register_driver+0x78/0xa8
+[ 1127.069515]  [<000000000007d248>] mptspi_init+0x18c/0x1c4 [mptspi]
+[ 1127.069515]  [<0000000040200f18>] do_one_initcall+0x74/0x314
+[ 1127.069515]  [<00000000403528c0>] do_init_module+0xb4/0x640
+[ 1127.069515]  [<0000000040356a24>] load_module+0x3a48/0x493c
+[ 1127.069515]  [<0000000040357d58>] __do_sys_finit_module+0x120/0x1bc
+[ 1127.069515]  [<0000000040357e84>] sys_finit_module+0x30/0xa0
+[ 1127.069515]  [<0000000040210054>] syscall_exit+0x0/0x14
+[ 1127.069515]
+[ 1127.069515] Kernel Fault: Code=26 (Data memory access rights trap) at addr 00000000000007d0
+[ 1127.069515] CPU: 1 PID: 94 Comm: systemd-udevd Tainted: G            E     5.9.0-1-parisc64 #1 Debian 5.9.1-1
+[ 1127.069515] Hardware name: 9000/785/C8000
+[ 1127.069515]
+[ 1127.069515]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
+[ 1127.069515] PSW: 00001000000011101111111000001111 Tainted: G            E
+[ 1127.069515] r00-03  000000ff080efe0f 000000413a6a4d60 000000000c1f8be8 000000413a6a4e00
+[ 1127.069515] r04-07  000000000c1f7000 0000004087ce3000 000000007f41e000 0000000000000000
+[ 1127.069515] r08-11  0000004087ce3000 000000001045e500 000000001045e6f8 000000004158ea68
+[ 1127.069515] r12-15  0000000000000002 0000000000000000 000000413a6a44a0 0000000040f92680
+[ 1127.069515] r16-19  0000000000000cc0 0000000000000002 000000001045eaa0 0000000005c47000
+[ 1127.069515] r20-23  000000000800000e 000000004c2ce5ae 0000000000000384 0000000000000000
+[ 1127.069515] r24-27  0000000000000143 000000000800000e 0000000000000000 000000000c1f7000
+[ 1127.069515] r28-31  00000000000005c8 000000413a6a4e70 000000413a6a4ea0 0000000041430aa0
+[ 1127.069515] sr00-03  0000000000002800 0000000000000000 0000000000000000 0000000000019000
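
The numeric codes in the logs above are negated errno values: the -12 from
the first report matches the "return -ENOMEM" in the quoted code, and the
"error = -4" from the error handler thread corresponds to EINTR. A minimal
sketch for decoding them in user space follows; the helper name
sketch_errno_name() is made up for illustration and is not a kernel function.

```c
#include <errno.h>

/*
 * Hypothetical helper: map a kernel-style negative return value to the
 * symbolic errno name. Only the two codes seen in the logs are handled.
 */
const char *sketch_errno_name(int ret)
{
	switch (-ret) {
	case ENOMEM:
		return "ENOMEM";	/* out of memory, -12 in the log */
	case EINTR:
		return "EINTR";		/* interrupted call, -4 in the log */
	default:
		return "unknown";
	}
}
```

Calling sketch_errno_name(-4) yields "EINTR", and sketch_errno_name(-12)
yields "ENOMEM".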

The string "WARNING - Unable to register controller with SCSI subsystem" is
from drivers/message/fusion/mptspi.c: mptspi_probe():
        sh = scsi_host_alloc(&mptspi_driver_template, sizeof(MPT_SCSI_HOST));
        if (!sh) {
                printk(MYIOC_s_WARN_FMT
                        "Unable to register controller with SCSI subsystem\n",
                        ioc->name);
                error = -1;
                goto out_mptspi_probe;
        }

so, the kernel jumps to:
out_mptspi_probe:
        mptscsih_remove(pdev);
        return error;

Somewhere inside mptscsih_remove() the kernel crashes with a "Data memory access rights trap".
At first I assumed ioc->sh held an invalid value, but debugging showed that it is 0UL.
Do you have an idea what's going wrong in mptscsih_remove()?
I'd expect the kernel to free all memory, ignore those drives and continue booting (and then fail
later in the boot process because the root drive can't be found).

Any idea what I could test?

Helge
