linux-parisc.vger.kernel.org archive mirror
* Kernel 5.8 and 5.9 fail to boot on C8000
@ 2020-10-20 13:45 Helge Deller
  2020-10-20 14:23 ` John David Anglin
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Helge Deller @ 2020-10-20 13:45 UTC (permalink / raw)
  To: linux-parisc; +Cc: James Bottomley

Latest Linux kernels v5.8 and v5.9 fail to boot for me on the C8000
machines with this error:
 mptspi: probe of 0000:40:01.0 failed with error -12
 mptbase: ioc1: ERROR - Insufficient memory to add adapter!
 mptspi: probe of 0000:40:01.1 failed with error -12

The c8000 has a built-in Broadcom / LSI 53c1030 PCI-X Fusion-MPT Dual
Ultra320 SCSI controller.

Do other people see this as well?
Any idea?

Helge


* Re: Kernel 5.8 and 5.9 fail to boot on C8000
  2020-10-20 13:45 Kernel 5.8 and 5.9 fail to boot on C8000 Helge Deller
@ 2020-10-20 14:23 ` John David Anglin
  2020-10-20 17:06 ` Jeroen Roovers
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: John David Anglin @ 2020-10-20 14:23 UTC (permalink / raw)
  To: Helge Deller, linux-parisc; +Cc: James Bottomley

On 2020-10-20 9:45 a.m., Helge Deller wrote:
> Latest Linux kernels v5.8 and v5.9 fail to boot for me on the C8000
> machines with this error:
>  mptspi: probe of 0000:40:01.0 failed with error -12
>  mptbase: ioc1: ERROR - Insufficient memory to add adapter!
>  mptspi: probe of 0000:40:01.1 failed with error -12
>
> The c8000 has a built-in Broadcom / LSI 53c1030 PCI-X Fusion-MPT Dual
> Ultra320 SCSI controller.
>
> Do other people see this as well?
> Any idea?
I'm not seeing this on my c8000:

[   25.403687] mptbase alternatives: applied 0 out of 3 patches
...
[   26.362776] Fusion MPT base driver 3.04.20
...
[   27.234708] Fusion MPT SPI Host driver 3.04.20
[   27.322420] ohci_hcd alternatives: applied 0 out of 13 patches
[   27.344976] mptbase: ioc0: Initiating bringup
...

Recent versions of 5.8 don't seem very stable on rp3440:
Log Entry 105: 20 Oct 2020 03:52:27
Alert Level 2: Informational
Keyword: Type-02 127002 1208322
Soft Reset
Logged by: Baseboard Management Controller;
Sensor: System Event
0x205F8E5EFB0208D0 FFFF027000120300


Log Entry 104: 20 Oct 2020 03:52:27
Alert Level 5: Critical
Keyword: Type-02 236f01 2322177
Watchdog timer expired - hard reset
Logged by: Baseboard Management Controller;
Sensor: Watchdog 2 - Watchdog Timer
Data1: Hard Reset
Data2: EVT Ext1: 0x04
0x205F8E5EFB0208C0 FF04C16F0C230300

Earlier versions seemed better.

Dave

-- 
John David Anglin  dave.anglin@bell.net



* Re: Kernel 5.8 and 5.9 fail to boot on C8000
  2020-10-20 13:45 Kernel 5.8 and 5.9 fail to boot on C8000 Helge Deller
  2020-10-20 14:23 ` John David Anglin
@ 2020-10-20 17:06 ` Jeroen Roovers
  2020-10-21 15:52 ` James Bottomley
  2020-10-21 19:23 ` Meelis Roos
  3 siblings, 0 replies; 11+ messages in thread
From: Jeroen Roovers @ 2020-10-20 17:06 UTC (permalink / raw)
  To: Helge Deller; +Cc: linux-parisc, James Bottomley

On Tue, 20 Oct 2020 15:45:27 +0200
Helge Deller <deller@gmx.de> wrote:

> Latest Linux kernels v5.8 and v5.9 fail to boot for me on the C8000
> machines with this error:
>  mptspi: probe of 0000:40:01.0 failed with error -12
>  mptbase: ioc1: ERROR - Insufficient memory to add adapter!
>  mptspi: probe of 0000:40:01.1 failed with error -12

> Do other people see this as well?

No, it works fine for me on a C8000. I am seeing another "memory
related" problem in 5.9 that I will report shortly, though.


Regards,
     jer


* Re: Kernel 5.8 and 5.9 fail to boot on C8000
  2020-10-20 13:45 Kernel 5.8 and 5.9 fail to boot on C8000 Helge Deller
  2020-10-20 14:23 ` John David Anglin
  2020-10-20 17:06 ` Jeroen Roovers
@ 2020-10-21 15:52 ` James Bottomley
  2020-10-21 16:10   ` Helge Deller
  2020-10-21 19:23 ` Meelis Roos
  3 siblings, 1 reply; 11+ messages in thread
From: James Bottomley @ 2020-10-21 15:52 UTC (permalink / raw)
  To: Helge Deller, linux-parisc

On Tue, 2020-10-20 at 15:45 +0200, Helge Deller wrote:
> Latest Linux kernels v5.8 and v5.9 fail to boot for me on the C8000
> machines with this error:
>  mptspi: probe of 0000:40:01.0 failed with error -12
>  mptbase: ioc1: ERROR - Insufficient memory to add adapter!
>  mptspi: probe of 0000:40:01.1 failed with error -12

I think you've already figured out that this is an allocation issue. 
However, it does seem fishy, the code is

	ioc = kzalloc(sizeof(MPT_ADAPTER), GFP_KERNEL);
	if (ioc == NULL) {
		printk(KERN_ERR MYNAM ": ERROR - Insufficient memory to add adapter!\n");
		return -ENOMEM;
	}

And MPT_ADAPTER should be just under a page which looks like a very odd
allocation to fail so early in boot.  The memory subsystem should have
also printed out a trace explaining why it failed the allocation.

James
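
For readers decoding the numbers in these logs: kernel functions conventionally return negative errno values, so "failed with error -12" matches the -ENOMEM returned above. The following is a minimal userspace sketch, assuming a Linux libc where the errno numbering matches the kernel's for these values; describe_kernel_error() is a hypothetical helper name, not a kernel function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: turn a kernel-style negative return value into
 * libc's description of the matching errno. Returns NULL when the
 * value is not negative, i.e. when it is not an error code.
 */
const char *describe_kernel_error(int ret)
{
	if (ret >= 0)
		return NULL;
	return strerror(-ret);
}

/* Small self-check of the non-error case. */
void describe_kernel_error_selfcheck(void)
{
	assert(describe_kernel_error(0) == NULL);
}
```

On Linux, ENOMEM is 12 and EINTR is 4, so describe_kernel_error(-12) describes the mptspi probe failure, and the same helper applies to any "error = -4" seen elsewhere in a log.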




* Re: Kernel 5.8 and 5.9 fail to boot on C8000
  2020-10-21 15:52 ` James Bottomley
@ 2020-10-21 16:10   ` Helge Deller
  2020-10-21 17:12     ` John David Anglin
  2020-10-22  7:28     ` Helge Deller
  0 siblings, 2 replies; 11+ messages in thread
From: Helge Deller @ 2020-10-21 16:10 UTC (permalink / raw)
  To: James Bottomley, linux-parisc

On 10/21/20 5:52 PM, James Bottomley wrote:
> On Tue, 2020-10-20 at 15:45 +0200, Helge Deller wrote:
>> Latest Linux kernels v5.8 and v5.9 fail to boot for me on the C8000
>> machines with this error:
>>  mptspi: probe of 0000:40:01.0 failed with error -12
>>  mptbase: ioc1: ERROR - Insufficient memory to add adapter!
>>  mptspi: probe of 0000:40:01.1 failed with error -12
>
> I think you've already figured out that this is an allocation issue.
> However, it does seem fishy, the code is
>
> 	ioc = kzalloc(sizeof(MPT_ADAPTER), GFP_KERNEL);
> 	if (ioc == NULL) {
> 		printk(KERN_ERR MYNAM ": ERROR - Insufficient memory to add adapter!\n");
> 		return -ENOMEM;
> 	}
>
> And MPT_ADAPTER should be just under a page which looks like a very odd
> allocation to fail so early in boot.  The memory subsystem should have
> also printed out a trace explaining why it failed the allocation.

I think there are a few issues here.
First, the allocation issue as seen above is from a current git head,
where it seems memory allocation is somewhat broken. For now I would ignore it
until git head stabilizes...

Then, in my machine I have two U320 drives, one "SEAGATE ST373307LW", and one
"HP 73.4GMAW3073NP". It seems both drives are starting to fail, because
even in the firmware, when running "search for boot devices", they sometimes
fail to be detected.

The good thing about bad drives is that they make it possible to
debug error code paths in the drivers. In my case the last syslog
looks like this (I'm currently testing with Linus' plain v5.9 kernel).

+[ 1126.041880] ioc0: LSI53C1030 B2: Capabilities={Initiator,Target}
+Begin: Waiting for root file system ...
+[ 1127.069515] scsi host2: error handler thread failed to spawn, error = -4
+[ 1127.069515] mptspi: ioc0: WARNING - Unable to register controller with SCSI subsystem
+<Cpu1> 78000c6201e00000  a0e008c01100b009  CC_PAT_ENCODED_FIELD_WARNING
+<Cpu1> 76000c6801e00000  0000000000000520  CC_PAT_DATA_FIELD_WARNING
<XXX: here is something missing - serial port is often not fast enough....>
+[ 1127.069515] Backtrace:
+[ 1127.069515]  [<000000001045b7cc>] mptspi_probe+0x248/0x3d0 [mptspi]
+[ 1127.069515]  [<0000000040946470>] pci_device_probe+0x1ac/0x2d8
+[ 1127.069515]  [<0000000040add668>] really_probe+0x1bc/0x988
+[ 1127.069515]  [<0000000040ade704>] driver_probe_device+0x160/0x218
+[ 1127.069515]  [<0000000040adee24>] device_driver_attach+0x160/0x188
+[ 1127.069515]  [<0000000040adef90>] __driver_attach+0x144/0x320
+[ 1127.069515]  [<0000000040ad7c78>] bus_for_each_dev+0xd4/0x158
+[ 1127.069515]  [<0000000040adc138>] driver_attach+0x4c/0x80
+[ 1127.069515]  [<0000000040adb3ec>] bus_add_driver+0x3e0/0x498
+[ 1127.069515]  [<0000000040ae0130>] driver_register+0xf4/0x298
+[ 1127.069515]  [<00000000409450c4>] __pci_register_driver+0x78/0xa8
+[ 1127.069515]  [<000000000007d248>] mptspi_init+0x18c/0x1c4 [mptspi]
+[ 1127.069515]  [<0000000040200f18>] do_one_initcall+0x74/0x314
+[ 1127.069515]  [<00000000403528c0>] do_init_module+0xb4/0x640
+[ 1127.069515]  [<0000000040356a24>] load_module+0x3a48/0x493c
+[ 1127.069515]  [<0000000040357d58>] __do_sys_finit_module+0x120/0x1bc
+[ 1127.069515]  [<0000000040357e84>] sys_finit_module+0x30/0xa0
+[ 1127.069515]  [<0000000040210054>] syscall_exit+0x0/0x14
+[ 1127.069515]
+[ 1127.069515] Kernel Fault: Code=26 (Data memory access rights trap) at addr 00000000000007d0
+[ 1127.069515] CPU: 1 PID: 94 Comm: systemd-udevd Tainted: G            E     5.9.0-1-parisc64 #1 Debian 5.9.1-1
+[ 1127.069515] Hardware name: 9000/785/C8000
+[ 1127.069515]
+[ 1127.069515]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
+[ 1127.069515] PSW: 00001000000011101111111000001111 Tainted: G            E
+[ 1127.069515] r00-03  000000ff080efe0f 000000413a6a4d60 000000000c1f8be8 000000413a6a4e00
+[ 1127.069515] r04-07  000000000c1f7000 0000004087ce3000 000000007f41e000 0000000000000000
+[ 1127.069515] r08-11  0000004087ce3000 000000001045e500 000000001045e6f8 000000004158ea68
+[ 1127.069515] r12-15  0000000000000002 0000000000000000 000000413a6a44a0 0000000040f92680
+[ 1127.069515] r16-19  0000000000000cc0 0000000000000002 000000001045eaa0 0000000005c47000
+[ 1127.069515] r20-23  000000000800000e 000000004c2ce5ae 0000000000000384 0000000000000000
+[ 1127.069515] r24-27  0000000000000143 000000000800000e 0000000000000000 000000000c1f7000
+[ 1127.069515] r28-31  00000000000005c8 000000413a6a4e70 000000413a6a4ea0 0000000041430aa0
+[ 1127.069515] sr00-03  0000000000002800 0000000000000000 0000000000000000 0000000000019000

The string "WARNING - Unable to register controller with SCSI subsystem" is
from drivers/message/fusion/mptspi.c: mptspi_probe():
        sh = scsi_host_alloc(&mptspi_driver_template, sizeof(MPT_SCSI_HOST));
        if (!sh) {
                printk(MYIOC_s_WARN_FMT
                        "Unable to register controller with SCSI subsystem\n",
                        ioc->name);
                error = -1;
                goto out_mptspi_probe;
        }

so, the kernel jumps to:
out_mptspi_probe:
        mptscsih_remove(pdev);
        return error;

Somewhere inside mptscsih_remove() the kernel crashes with a "Data memory access rights trap".
At first I assumed ioc->sh had an invalid value, but debugging showed that it's 0UL.
Do you have an idea what's going wrong in mptscsih_remove()?
I'd expect the kernel to free all memory, ignore those drives and continue booting (and fail
later in the boot process because the root drive isn't found).

Any idea what I could test?

Helge
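
One possible explanation for why a zero ioc->sh still crashes: shost_priv() returns the address of the hostdata member at the end of struct Scsi_Host, so for a NULL host it yields a small non-zero address rather than NULL, and a check of the form "shost_priv(host) == NULL" would never fire. The sketch below models only that address arithmetic in userspace. struct fake_scsi_host and its field sizes are invented for illustration and do not reflect the real structure layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for struct Scsi_Host: fixed fields followed by
 * driver-private storage. The 512-byte size is made up.
 */
struct fake_scsi_host {
	char core_fields[512];
	unsigned long hostdata[];	/* flexible array member */
};

/*
 * Address that a shost_priv()-style accessor yields for a host located
 * at host_addr: the host address plus the offset of hostdata. Integer
 * arithmetic is used so that no NULL pointer is ever dereferenced.
 */
uintptr_t fake_priv_addr(uintptr_t host_addr)
{
	return host_addr + offsetof(struct fake_scsi_host, hostdata);
}

/* Models a check of the form: (hd = shost_priv(host)) == NULL */
int priv_null_check_catches_null_host(void)
{
	return fake_priv_addr(0) == 0;
}
```

In this model the check does not catch a NULL host, so a later access through hd (for example hd->info_kbuf) would read from an address close to zero. That would fit a fault at a low address like the one in the trace, although the exact offsets here are assumptions.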


* Re: Kernel 5.8 and 5.9 fail to boot on C8000
  2020-10-21 16:10   ` Helge Deller
@ 2020-10-21 17:12     ` John David Anglin
  2020-10-21 18:02       ` Helge Deller
  2020-10-21 22:32       ` James Bottomley
  2020-10-22  7:28     ` Helge Deller
  1 sibling, 2 replies; 11+ messages in thread
From: John David Anglin @ 2020-10-21 17:12 UTC (permalink / raw)
  To: Helge Deller, James Bottomley, linux-parisc

On 2020-10-21 12:10 p.m., Helge Deller wrote:
> Any idea what I could test?
Try a kernel build with gcc-9 or earlier.  It appears there are problem(s) with gcc-10.  I'm getting all kinds
of random issues building glibc.
https://buildd.debian.org/status/logs.php?pkg=glibc&ver=2.31-4&arch=hppa

Dave

-- 
John David Anglin  dave.anglin@bell.net



* Re: Kernel 5.8 and 5.9 fail to boot on C8000
  2020-10-21 17:12     ` John David Anglin
@ 2020-10-21 18:02       ` Helge Deller
  2020-10-21 18:29         ` John David Anglin
  2020-10-21 22:32       ` James Bottomley
  1 sibling, 1 reply; 11+ messages in thread
From: Helge Deller @ 2020-10-21 18:02 UTC (permalink / raw)
  To: John David Anglin, James Bottomley, linux-parisc

On 10/21/20 7:12 PM, John David Anglin wrote:
> On 2020-10-21 12:10 p.m., Helge Deller wrote:
>> Any idea what I could test?
> Try a kernel build with gcc-9 or earlier.  It appears there are problem(s) with gcc-10.  I'm getting all kinds
> of random issues building glibc.
> https://buildd.debian.org/status/logs.php?pkg=glibc&ver=2.31-4&arch=hppa

Thanks for this hint!
Actually, I do build my kernels with a gcc-9.2.1 cross-compiler, so
in this case that's not the reason.

Regarding the memory allocation issues, it's being discussed upstream:
https://lore.kernel.org/lkml/CAHk-=wg5-P79Hr4iaC_disKR2P+7cRVqBA9Dsria9jdVwHo0+A@mail.gmail.com/

Helge


* Re: Kernel 5.8 and 5.9 fail to boot on C8000
  2020-10-21 18:02       ` Helge Deller
@ 2020-10-21 18:29         ` John David Anglin
  0 siblings, 0 replies; 11+ messages in thread
From: John David Anglin @ 2020-10-21 18:29 UTC (permalink / raw)
  To: Helge Deller, James Bottomley, linux-parisc

On 2020-10-21 2:02 p.m., Helge Deller wrote:
> On 10/21/20 7:12 PM, John David Anglin wrote:
>> On 2020-10-21 12:10 p.m., Helge Deller wrote:
>>> Any idea what I could test?
>> Try a kernel build with gcc-9 or earlier.  It appears there are problem(s) with gcc-10.  I'm getting all kinds
>> of random issues building glibc.
>> https://buildd.debian.org/status/logs.php?pkg=glibc&ver=2.31-4&arch=hppa
> Thanks for this hint!
> Actually, I do build my kernels with a gcc-9.2.1 cross-compiler, so
> in this case that's not the reason.
What about Debian kernels?
>
> Regarding the memory allocation issues, it's being discussed upstream:
> https://lore.kernel.org/lkml/CAHk-=wg5-P79Hr4iaC_disKR2P+7cRVqBA9Dsria9jdVwHo0+A@mail.gmail.com/
>
> Helge

Dave

-- 
John David Anglin  dave.anglin@bell.net



* Re: Kernel 5.8 and 5.9 fail to boot on C8000
  2020-10-20 13:45 Kernel 5.8 and 5.9 fail to boot on C8000 Helge Deller
                   ` (2 preceding siblings ...)
  2020-10-21 15:52 ` James Bottomley
@ 2020-10-21 19:23 ` Meelis Roos
  3 siblings, 0 replies; 11+ messages in thread
From: Meelis Roos @ 2020-10-21 19:23 UTC (permalink / raw)
  To: Helge Deller; +Cc: James Bottomley, linux-parisc

20.10.20 16:45 Helge Deller wrote:
> Latest Linux kernels v5.8 and v5.9 fail to boot for me on the C8000
> machines with this error:
>   mptspi: probe of 0000:40:01.0 failed with error -12
>   mptbase: ioc1: ERROR - Insufficient memory to add adapter!
>   mptspi: probe of 0000:40:01.1 failed with error -12
> 
> The c8000 has a built-in Broadcom / LSI 53c1030 PCI-X Fusion-MPT Dual
> Ultra320 SCSI controller.
> 
> Do other people see this as well?

Works for me on a rp2470 with this card but no disks attached (gcc-10.2 from Gentoo, this test machine has usually been stable):

lspci:
00:00.0 Ethernet controller: Digital Equipment Corporation DECchip 21142/43 (rev 41)
00:01.0 SCSI storage controller: Broadcom / LSI 53C896/897 (rev 07)
00:01.1 SCSI storage controller: Broadcom / LSI 53C896/897 (rev 07)
00:02.0 SCSI storage controller: Broadcom / LSI 53c875 (rev 37)
00:02.1 SCSI storage controller: Broadcom / LSI 53c875 (rev 37)
00:04.0 System peripheral: Hewlett-Packard Company Diva [GSP] Management Board (rev 01)
00:04.1 Serial controller: Hewlett-Packard Company Diva Serial [GSP] Multiport UART (rev 03)
10:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Mach64 GT-B [3D Rage II+ DVD] (rev 9a)
20:00.0 SCSI storage controller: Broadcom / LSI 53c1010 66MHz  Ultra3 SCSI Adapter (rev 01)
20:00.1 SCSI storage controller: Broadcom / LSI 53c1010 66MHz  Ultra3 SCSI Adapter (rev 01)
30:00.0 SCSI storage controller: Adaptec AIC-7870P/7881U [AHA-2940U/UW/D/S76] (rev 01)
30:02.0 SCSI storage controller: Broadcom / LSI 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 08)
30:02.1 SCSI storage controller: Broadcom / LSI 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 08)

dmesg:
[    0.000000] Linux version 5.9.0 (mroos@rp2470) (hppa64-unknown-linux-gnu-gcc (Gentoo 10.2.0 p1) 10.2.0, GNU ld (Gentoo 2.33.1 p1) 2.33.1) #248 Mon Oct 12 19:01:45 EEST 2020
...
[    2.623778] pci 0000:30:02.0: [1000:0030] type 00 class 0x010000
[    2.625453] pci 0000:30:02.0: reg 0x10: [io  0x30000-0x300ff]
[    2.625712] pci 0000:30:02.0: reg 0x14: [mem 0x00000000-0x0001ffff 64bit]
[    2.627793] pci 0000:30:02.0: reg 0x1c: [mem 0x00000000-0x0001ffff 64bit]
[    2.630015] pci 0000:30:02.0: reg 0x30: [mem 0x00000000-0x000fffff pref]
[    2.630532] pci 0000:30:02.0: supports D1 D2
[    2.632633] pci 0000:30:02.1: [1000:0030] type 00 class 0x010000
[    2.632909] pci 0000:30:02.1: reg 0x10: [io  0x30000-0x300ff]
[    2.647585] pci 0000:30:02.1: reg 0x14: [mem 0x00000000-0x0001ffff 64bit]
[    2.650580] pci 0000:30:02.1: reg 0x1c: [mem 0x00000000-0x0001ffff 64bit]
[    2.650876] pci 0000:30:02.1: reg 0x30: [mem 0x00000000-0x000fffff pref]
[    2.655192] pci 0000:30:02.1: supports D1 D2
[    2.657716] pci 0000:30:02.0: can't claim BAR 0 [io  0x30000-0x300ff]: address conflict with 0000:30:00.0 [io  0x30000-0x300ff]
[    2.659477] pci 0000:30:02.1: can't claim BAR 0 [io  0x30000-0x300ff]: address conflict with 0000:30:00.0 [io  0x30000-0x300ff]
[    2.661856] pci 0000:30:02.0: BAR 6: assigned [mem 0xfffffffffb000000-0xfffffffffb0fffff pref]
[    2.664079] pci 0000:30:02.1: BAR 6: assigned [mem 0xfffffffffb100000-0xfffffffffb1fffff pref]
[    2.666330] pci 0000:30:02.0: BAR 1: assigned [mem 0xfffffffffb200000-0xfffffffffb21ffff 64bit]
[    2.674350] pci 0000:30:02.0: BAR 3: assigned [mem 0xfffffffffb220000-0xfffffffffb23ffff 64bit]
[    2.674679] pci 0000:30:02.1: BAR 1: assigned [mem 0xfffffffffb240000-0xfffffffffb25ffff 64bit]
[    2.676975] pci 0000:30:02.1: BAR 3: assigned [mem 0xfffffffffb260000-0xfffffffffb27ffff 64bit]
[    2.679263] pci 0000:30:00.0: BAR 6: assigned [mem 0xfffffffffb280000-0xfffffffffb28ffff pref]
[    2.681484] pci 0000:30:00.0: BAR 1: assigned [mem 0xfffffffffb290000-0xfffffffffb290fff]
[    2.683749] pci 0000:30:02.0: BAR 0: assigned [io  0x30100-0x301ff]
[    2.829064] pci 0000:30:02.1: BAR 0: assigned [io  0x30200-0x302ff]
...
[   59.788086] Fusion MPT SPI Host driver 3.04.20
[   59.788436] mptspi 0000:30:02.0: enabling device (0000 -> 0002)
[   59.912240] mptspi 0000:30:02.0: enabling SERR and PARITY (0002 -> 0142)
[   61.848400] scsi host7: ioc0: LSI53C1030 C0, FwRev=01032341h, Ports=1, MaxQ=255, IRQ=26
[   62.541761] mptspi 0000:30:02.1: enabling device (0000 -> 0002)
[   62.542024] mptspi 0000:30:02.1: enabling SERR and PARITY (0002 -> 0142)
[   62.548157] mptbase: ioc1: Initiating bringup
[   63.272365] ioc1: LSI53C1030 C0: Capabilities={Initiator,Target}
[   64.020620] scsi host8: ioc1: LSI53C1030 C0, FwRev=01032341h, Ports=1, MaxQ=255, IRQ=27

-- 
Meelis Roos <mroos@linux.ee>


* Re: Kernel 5.8 and 5.9 fail to boot on C8000
  2020-10-21 17:12     ` John David Anglin
  2020-10-21 18:02       ` Helge Deller
@ 2020-10-21 22:32       ` James Bottomley
  1 sibling, 0 replies; 11+ messages in thread
From: James Bottomley @ 2020-10-21 22:32 UTC (permalink / raw)
  To: John David Anglin, Helge Deller, linux-parisc

On Wed, 2020-10-21 at 13:12 -0400, John David Anglin wrote:
> On 2020-10-21 12:10 p.m., Helge Deller wrote:
> > Any idea what I could test?
> Try a kernel build with gcc-9 or earlier.  It appears there are
> problem(s) with gcc-10.  I'm getting all kinds
> of random issues building glibc.
> https://buildd.debian.org/status/logs.php?pkg=glibc&ver=2.31-4&arch=hppa

This version of the kernel built with gcc-10 is working for me on my
Mako system:

   Linux version 5.9.0 (jejb@ion) (hppa64-linux-gnu-gcc (GCC) 10.2.0,
   GNU ld (GNU Binutils for Debian) 2.35) #1 SMP Wed Oct 21 09:35:50
   PDT 2020

So if it is gcc-10 I'm not seeing the problem.

James




* Re: Kernel 5.8 and 5.9 fail to boot on C8000
  2020-10-21 16:10   ` Helge Deller
  2020-10-21 17:12     ` John David Anglin
@ 2020-10-22  7:28     ` Helge Deller
  1 sibling, 0 replies; 11+ messages in thread
From: Helge Deller @ 2020-10-22  7:28 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-parisc

* Helge Deller <deller@gmx.de>:
> On 10/21/20 5:52 PM, James Bottomley wrote:
> > On Tue, 2020-10-20 at 15:45 +0200, Helge Deller wrote:
> >> Latest Linux kernels v5.8 and v5.9 fail to boot for me on the C8000
> >> machines with this error:
> >>  mptspi: probe of 0000:40:01.0 failed with error -12
> >>  mptbase: ioc1: ERROR - Insufficient memory to add adapter!
> >>  mptspi: probe of 0000:40:01.1 failed with error -12
> >
> > I think you've already figured out that this is an allocation issue.
> > However, it does seem fishy, the code is
> >
> > 	ioc = kzalloc(sizeof(MPT_ADAPTER), GFP_KERNEL);
> > 	if (ioc == NULL) {
> > 		printk(KERN_ERR MYNAM ": ERROR - Insufficient memory to add adapter!\n");
> > 		return -ENOMEM;
> > 	}
> >
> > And MPT_ADAPTER should be just under a page which looks like a very odd
> > allocation to fail so early in boot.  The memory subsystem should have
> > also printed out a trace explaining why it failed the allocation.
>
> I think there are a few issues here.
> First, the allocation issue as seen above is from a current git head,
> where it seems memory allocation is somewhat broken. For now I would ignore it
> until git head stabilizes...
>
> Then, in my machine I have two U320 drives, one "SEAGATE ST373307LW", and one
> "HP 73.4GMAW3073NP". It seems both drives are starting to fail, because
> even in the firmware, when running "search for boot devices", they sometimes
> fail to be detected.
>
> The good thing about bad drives is that they make it possible to
> debug error code paths in the drivers. In my case the last syslog
> looks like this (I'm currently testing with Linus' plain v5.9 kernel).
>
> +[ 1126.041880] ioc0: LSI53C1030 B2: Capabilities={Initiator,Target}
> +Begin: Waiting for root file system ...
> +[ 1127.069515] scsi host2: error handler thread failed to spawn, error = -4
> +[ 1127.069515] mptspi: ioc0: WARNING - Unable to register controller with SCSI subsystem
> +<Cpu1> 78000c6201e00000  a0e008c01100b009  CC_PAT_ENCODED_FIELD_WARNING
> +<Cpu1> 76000c6801e00000  0000000000000520  CC_PAT_DATA_FIELD_WARNING
> <XXX: here is something missing - serial port is often not fast enough....>
> +[ 1127.069515] Backtrace:
> +[ 1127.069515]  [<000000001045b7cc>] mptspi_probe+0x248/0x3d0 [mptspi]
> +[ 1127.069515]  [<0000000040946470>] pci_device_probe+0x1ac/0x2d8
> +[ 1127.069515]  [<0000000040add668>] really_probe+0x1bc/0x988
> +[ 1127.069515]  [<0000000040ade704>] driver_probe_device+0x160/0x218
> +[ 1127.069515]  [<0000000040adee24>] device_driver_attach+0x160/0x188
> +[ 1127.069515]  [<0000000040adef90>] __driver_attach+0x144/0x320
> +[ 1127.069515]  [<0000000040ad7c78>] bus_for_each_dev+0xd4/0x158
> +[ 1127.069515]  [<0000000040adc138>] driver_attach+0x4c/0x80
> +[ 1127.069515]  [<0000000040adb3ec>] bus_add_driver+0x3e0/0x498
> +[ 1127.069515]  [<0000000040ae0130>] driver_register+0xf4/0x298
> +[ 1127.069515]  [<00000000409450c4>] __pci_register_driver+0x78/0xa8
> +[ 1127.069515]  [<000000000007d248>] mptspi_init+0x18c/0x1c4 [mptspi]
> +[ 1127.069515]  [<0000000040200f18>] do_one_initcall+0x74/0x314
> +[ 1127.069515]  [<00000000403528c0>] do_init_module+0xb4/0x640
> +[ 1127.069515]  [<0000000040356a24>] load_module+0x3a48/0x493c
> +[ 1127.069515]  [<0000000040357d58>] __do_sys_finit_module+0x120/0x1bc
> +[ 1127.069515]  [<0000000040357e84>] sys_finit_module+0x30/0xa0
> +[ 1127.069515]  [<0000000040210054>] syscall_exit+0x0/0x14
> +[ 1127.069515]
> +[ 1127.069515] Kernel Fault: Code=26 (Data memory access rights trap) at addr 00000000000007d0
> +[ 1127.069515] CPU: 1 PID: 94 Comm: systemd-udevd Tainted: G            E     5.9.0-1-parisc64 #1 Debian 5.9.1-1
> +[ 1127.069515] Hardware name: 9000/785/C8000
> +[ 1127.069515]
> +[ 1127.069515]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
> +[ 1127.069515] PSW: 00001000000011101111111000001111 Tainted: G            E
> +[ 1127.069515] r00-03  000000ff080efe0f 000000413a6a4d60 000000000c1f8be8 000000413a6a4e00
> +[ 1127.069515] r04-07  000000000c1f7000 0000004087ce3000 000000007f41e000 0000000000000000
> +[ 1127.069515] r08-11  0000004087ce3000 000000001045e500 000000001045e6f8 000000004158ea68
> +[ 1127.069515] r12-15  0000000000000002 0000000000000000 000000413a6a44a0 0000000040f92680
> +[ 1127.069515] r16-19  0000000000000cc0 0000000000000002 000000001045eaa0 0000000005c47000
> +[ 1127.069515] r20-23  000000000800000e 000000004c2ce5ae 0000000000000384 0000000000000000
> +[ 1127.069515] r24-27  0000000000000143 000000000800000e 0000000000000000 000000000c1f7000
> +[ 1127.069515] r28-31  00000000000005c8 000000413a6a4e70 000000413a6a4ea0 0000000041430aa0
> +[ 1127.069515] sr00-03  0000000000002800 0000000000000000 0000000000000000 0000000000019000
>
> The string "WARNING - Unable to register controller with SCSI subsystem" is
> from drivers/message/fusion/mptspi.c: mptspi_probe():
>         sh = scsi_host_alloc(&mptspi_driver_template, sizeof(MPT_SCSI_HOST));
>         if (!sh) {
>                 printk(MYIOC_s_WARN_FMT
>                         "Unable to register controller with SCSI subsystem\n",
>                         ioc->name);
>                 error = -1;
>                 goto out_mptspi_probe;
>         }
>
> so, the kernel jumps to:
> out_mptspi_probe:
>         mptscsih_remove(pdev);
>         return error;
>
> Somewhere inside mptscsih_remove() the kernel crashes with a "Data memory access rights trap".
> At first I assumed ioc->sh had an invalid value, but debugging showed that it's 0UL.
> Do you have an idea what's going wrong in mptscsih_remove()?
> I'd expect the kernel to free all memory, ignore those drives and continue booting (and fail
> later in the boot process because the root drive isn't found).

Anyone can trigger the fault (on any architecture) with this patch:

diff --git a/drivers/message/fusion/mptspi.c b/drivers/message/fusion/mptspi.c
index eabc4de5816c..1f26ecea4c95 100644
--- a/drivers/message/fusion/mptspi.c
+++ b/drivers/message/fusion/mptspi.c
@@ -1404,6 +1404,7 @@ mptspi_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	}

 	sh = scsi_host_alloc(&mptspi_driver_template, sizeof(MPT_SCSI_HOST));
+	sh = NULL;

 	if (!sh) {
 		printk(MYIOC_s_WARN_FMT


With the patch below, the driver now exits cleanly:

[ 1119.508147] Fusion MPT base driver 3.04.20
[ 1119.508147] Copyright (c) 1999-2008 LSI Corporation
[ 1119.508147] Fusion MPT SPI Host driver 3.04.20
[ 1119.508147] mptbase: ioc0: Initiating bringup
[ 1119.508147] sr 1:0:0:0: [sr0] scsi3-mmc drive: 40x/40x cd/rw xa/form2 cdda tray
[ 1119.508147] cdrom: Uniform CD-ROM driver Revision: 3.20
[ 1119.508147] ioc0: LSI53C1030 B2: Capabilities={Initiator,Target}
[ 1121.512619] mptspi: ioc0: WARNING - Unable to register controller with SCSI subsystem
[ 1121.512619] mptspi: probe of 0000:40:01.0 failed with error -1
[ 1121.512619] mptbase: ioc1: Initiating bringup
[ 1122.508645] ioc1: LSI53C1030 B2: Capabilities={Initiator,Target}
[ 1122.508645] mptspi: ioc1: WARNING - Unable to register controller with SCSI subsystem
[ 1123.417139] mptspi: probe of 0000:40:01.1 failed with error -1
[ 1123.487494] Fusion MPT FC Host driver 3.04.20
[ 1123.487494] Fusion MPT SAS Host driver 3.04.20
[ 1123.487494] Fusion MPT misc device (ioctl) driver 3.04.20
[ 1123.487494] mptctl: Registered with Fusion MPT base driver
[ 1123.487494] mptctl: /dev/mptctl @ (major,minor=10,220)


I'll send this patch to the scsi mailing list shortly:


[PATCH] scsi: mptfusion: Fix error paths in mptscsih_remove()

Signed-off-by: Helge Deller <deller@gmx.de>

diff --git a/drivers/message/fusion/mptscsih.c b/drivers/message/fusion/mptscsih.c
index 8543f0324d5a..0d1b2b0eb843 100644
--- a/drivers/message/fusion/mptscsih.c
+++ b/drivers/message/fusion/mptscsih.c
@@ -1176,8 +1176,10 @@ mptscsih_remove(struct pci_dev *pdev)
 	MPT_SCSI_HOST		*hd;
 	int sz1;

-	if((hd = shost_priv(host)) == NULL)
-		return;
+	if (host == NULL)
+		hd = NULL;
+	else
+		hd = shost_priv(host);

 	mptscsih_shutdown(pdev);

@@ -1193,14 +1195,15 @@ mptscsih_remove(struct pci_dev *pdev)
 	    "Free'd ScsiLookup (%d) memory\n",
 	    ioc->name, sz1));

-	kfree(hd->info_kbuf);
+	if (hd)
+		kfree(hd->info_kbuf);

 	/* NULL the Scsi_Host pointer
 	 */
 	ioc->sh = NULL;

-	scsi_host_put(host);
-
+	if (host)
+		scsi_host_put(host);
 	mpt_detach(pdev);

 }
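
The shape of this fix can be modeled outside the kernel. The following is a rough userspace sketch, assuming simplified stand-in types: mock_adapter, mock_host, mock_priv and mock_remove() are hypothetical names, free() stands in for kfree(), a plain counter stands in for scsi_host_put(), and a flag stands in for mpt_detach(). It is not the driver code, only an illustration of a teardown routine that tolerates a host that was never allocated.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver structures. */
struct mock_priv {
	char *info_kbuf;
};

struct mock_host {
	int refcount;
	struct mock_priv priv;
};

struct mock_adapter {
	struct mock_host *sh;	/* NULL when host allocation failed */
	int detached;		/* set once the adapter-level cleanup ran */
};

/*
 * Teardown that is safe on the probe error path: every access through
 * the host is guarded, while the adapter-level cleanup always runs.
 */
void mock_remove(struct mock_adapter *ioc)
{
	struct mock_host *host = ioc->sh;
	struct mock_priv *hd = host ? &host->priv : NULL;

	if (hd) {
		free(hd->info_kbuf);
		hd->info_kbuf = NULL;
	}

	/* Clear the adapter's host pointer before dropping the reference. */
	ioc->sh = NULL;

	if (host && --host->refcount == 0)
		free(host);

	ioc->detached = 1;	/* stands in for mpt_detach() */
}
```

Calling mock_remove() on an adapter whose sh is NULL skips the host-specific steps and still performs the final detach, which is the behavior the patch aims for when scsi_host_alloc() fails.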

