megaraid_sas problem for scsi_add_host() fail

From: John Garry @ 2020-02-27 18:41 UTC
  To: Kashyap Desai, Sumit Saxena, Shivasharan S; +Cc: megaraidlinux.pdl, linux-scsi

Hi guys,

While doing some testing, I noticed that the megaraid SAS driver has a
problem: it seems to corrupt my system when the call to scsi_add_host()
in the driver fails:

[   62.516871] megasas: 07.713.01.00-rc1
[   62.526189] megaraid_sas 0000:08:00.0: Adding to iommu group 1
[   62.571790] megaraid_sas 0000:08:00.0: BAR:0x0  BAR's base_addr(phys):0x0000080010000000  mapped virt_addr:0x(____ptrval____)
[   62.571802] megaraid_sas 0000:08:00.0: FW now in Ready state
[   62.583811] megaraid_sas 0000:08:00.0: 63 bit DMA mask and 63 bit consistent mask
[   62.602143] megaraid_sas 0000:08:00.0: firmware supports msix : (128)
[   62.780250] megaraid_sas 0000:08:00.0: requested/available msix 128/128
[   62.794292] megaraid_sas 0000:08:00.0: current msix/online cpus : (128/128)
[   62.809011] megaraid_sas 0000:08:00.0: RDPQ mode : (enabled)
[   62.820968] megaraid_sas 0000:08:00.0: Current firmware supports maximum commands: 4077 LDIO threshold: 0
[   62.937043] megaraid_sas 0000:08:00.0: Configured max firmware commands: 4076
[   63.509185] megaraid_sas 0000:08:00.0: Performance mode :Latency
[   63.521906] megaraid_sas 0000:08:00.0: FW supports sync cache : Yes
[   63.535148] megaraid_sas 0000:08:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
[   63.610607] megaraid_sas 0000:08:00.0: FW provided supportMaxExtLDs: 1 max_lds: 64
[   63.626618] megaraid_sas 0000:08:00.0: controller type : MR(2048MB)
[   63.639870] megaraid_sas 0000:08:00.0: Online Controller Reset(OCR) : Enabled
[   63.654945] megaraid_sas 0000:08:00.0: Secure JBOD support : Yes
[   63.667661] megaraid_sas 0000:08:00.0: NVMe passthru support : Yes
[   63.667672] megaraid_sas 0000:08:00.0: FW provided TM TaskAbort/Reset timeout : 6 secs/60 secs
[   63.698922] megaraid_sas 0000:08:00.0: JBOD sequence map support : Yes
[   63.712715] megaraid_sas 0000:08:00.0: PCI Lane Margining support : No
[   63.754764] megaraid_sas 0000:08:00.0: NVME page size : (4096)
[   63.787258] megaraid_sas 0000:08:00.0: megasas_enable_intr_fusion is called outbound_intr_mask:0x40000000
[   63.807485] megaraid_sas 0000:08:00.0: INIT adapter done
[   63.822235] megaraid_sas 0000:08:00.0: pci id : (0x1000)/(0x0016)/(0x19e5)/(0xd215)
[   63.838652] megaraid_sas 0000:08:00.0: unevenspan support : no
[   63.850980] megaraid_sas 0000:08:00.0: firmware crash dump : no
[   63.863499] megaraid_sas 0000:08:00.0: JBOD sequence map : enabled
[   63.877352] scsi host0: Avago SAS based MegaRAID driver
[   63.890398] megaraid_sas 0000:08:00.0: Failed to add host from megasas_io_attach 6802
[   63.906999] megaraid_sas 0000:08:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
[   64.591755] nvme 0000:81:00.0: Adding to iommu group 2
[   64.636476] nvme nvme0: pci function 0000:81:00.0
[   64.669635] libphy: Fixed MDIO Bus: probed
[   64.680255] tun: Universal TUN/TAP device driver, 1.6
[   64.694422] thunder_xcv, ver 1.0
[   64.702042] thunder_bgx, ver 1.0
[   64.709277] nicpf, ver 1.0
[   64.718144] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
[   64.730402] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[   64.743337] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.6.0-k
[   64.754981] nvme nvme0: Removing after probe failure status: -12
[   64.757953] igb: Copyright (c) 2007-2014 Intel Corporation.
[   64.782805] igbvf: Intel(R) Gigabit Virtual Function Network Driver - version 2.4.0-k
[   64.799423] igbvf: Copyright (c) 2009 - 2012 Intel Corporation.
[   64.813848] sky2: driver version 1.30
[   64.825564] VFIO - User Level meta-driver version: 0.3
[   64.848089] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[   64.862029] ehci-pci: EHCI PCI platform driver
[   64.873445] ehci-pci 0000:7a:01.0: Adding to iommu group 3
[   64.886700] ==================================================================
[   64.901999] BUG: KASAN: slab-out-of-bounds in run_timer_softirq+0x6f4/0xae0
[   64.916663] Write of size 8 at addr ffff0026b931aae0 by task swapper/0/0

[   64.933914] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.6.0-rc3-00005-g17ceebe3a05c-dirty #1775
[   64.952240] Hardware name: Huawei TaiShan 200 (Model 2280)/BC82AMDD, BIOS 2280-V2 CS V3.B160.01 02/24/2020
[   64.972575] Call trace:
[   64.977729]  dump_backtrace+0x0/0x298
[   64.985439]  show_stack+0x14/0x20
[   64.992418]  dump_stack+0x118/0x190
[   64.999762]  print_address_description.isra.9+0x6c/0x3b8
[   65.010953]  __kasan_report+0x134/0x23c
[   65.019029]  kasan_report+0xc/0x18
[   65.026188]  __asan_store8+0x94/0xb8
[   65.033720]  run_timer_softirq+0x6f4/0xae0
[   65.042343]  efi_header_end+0x16c/0x840
[   65.050420]  irq_exit+0x19c/0x1a8
[   65.057396]  __handle_domain_irq+0x7c/0xe0
[   65.066022]  gic_handle_irq+0x64/0x168
[   65.073917]  el1_irq+0xbc/0x180
[   65.080528]  arch_cpu_idle+0x3c/0x320
[   65.088239]  default_idle_call+0x28/0x4c
[   65.096502]  do_idle+0x278/0x348
[   65.103295]  cpu_startup_entry+0x24/0x40
[   65.111554]  rest_init+0x1c4/0x298
[   65.118718]  arch_call_rest_init+0xc/0x14
[   65.127159]  start_kernel+0x848/0x888

[   65.138006] Allocated by task 0:
[   65.144802] (stack is not available)

[   65.155465] Freed by task 0:
[   65.161530] (stack is not available)

[   65.172193] The buggy address belongs to the object at ffff0026b931aa00
  which belongs to the cache pool_workqueue of size 256
[   65.199113] The buggy address is located 224 bytes inside of
  256-byte region [ffff0026b931aa00, ffff0026b931ab00)
[   65.223840] The buggy address belongs to the page:
[   65.233931] page:fffffe009ac4c600 refcount:1 mapcount:0 mapping:ffff0026dd81c880 index:0xffff0026b931fe00 compound_mapcount: 0
[   65.257923] flags: 0x6ffff00000010200(slab|head)
[   65.267649] raw: 6ffff00000010200 fffffe009b20b208 fffffe009ac07608 ffff0026dd81c880
[   65.283959] raw: ffff0026b931fe00 0000000000400002 00000001ffffffff 0000000000000000
[   65.300270] page dumped because: kasan: bad access detected

[   65.315139] Memory state around the buggy address:
[   65.325231]  ffff0026b931a980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   65.340445]  ffff0026b931aa00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   65.355660] >ffff0026b931aa80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   65.370870]                                                        ^
[   65.384256]  ffff0026b931ab00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   65.399467]  ffff0026b931ab80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   65.414675] ==================================================================
[   65.429885] Disabling lock debugging due to kernel taint
[   65.441431] Unable to handle kernel paging request at virtual address ffffa0001013c0b0
[   65.441695] ehci-pci 0000:7a:01.0: EHCI Host Controller
[   65.458088] Mem abort info:
[   65.469183] ehci-pci 0000:7a:01.0: new USB bus registered, assigned bus number 1
[   65.474927]   ESR = 0x96000007
[   65.491201] ehci-pci 0000:7a:01.0: irq 65, io mem 0x20c101000
[   65.496913]   EC = 0x25: DABT (current EL), IL = 32 bits
[   65.496918]   SET = 0, FnV = 0
[   65.496922]   EA = 0, S1PTW = 0
[   65.522586] ehci-pci 0000:7a:01.0: USB 0.0 started, EHCI 1.00
[   65.526575] Data abort info:
[   65.526580]   ISV = 0, ISS = 0x00000007
[   65.535948] hub 1-0:1.0: USB hub found
[   65.545245]   CM = 0, WnR = 0
[   65.545251] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000052530000
[   65.545256] [ffffa0001013c0b0] pgd=00002027fffff003, pud=00002027ffffe003, pmd=00000026dda5b003, pte=0000000000000000
[   65.551519] hub 1-0:1.0: 2 ports detected
[   65.559375] Internal error: Oops: 96000007 [#1] PREEMPT SMP
[   65.559379] Modules linked in:
[   65.569534] ehci-platform: EHCI generic platform driver
[   65.573475] CPU: 34 PID: 8 Comm: kworker/u256:0 Tainted: G    B        5.6.0-rc3-00005-g17ceebe3a05c-dirty #1775
[   65.573477] Hardware name: Huawei TaiShan 200 (Model 2280)/BC82AMDD, BIOS 2280-V2 CS V3.B160.01 02/24/2020
[   65.573487] Workqueue: poll_megasas0_status megasas_fault_detect_work
[   65.573492] pstate: 80c00009 (Nzcv daif +PAN +UAO)
[   65.588048] ehci-orion: EHCI orion driver
[   65.609756] pc : megasas_readl+0x60/0x80
[   65.609759] lr : megasas_readl+0x1c/0x80
[   65.609761] sp : ffff0026d97bfc00
[   65.609763] x29: ffff0026d97bfc00 x28: ffff0026d97a9890
[   65.609767] x27: ffff0026d97a0618 x26: ffff0026d97a9880
[   65.609771] x25: ffff0026d9758808 x24: ffff0026b931aa28
[   65.609775] x23: ffff0026b931aa98 x22: ffffa0002931e000
[   65.609779] x21: ffff0026dd898800 x20: ffff0026b931dcd8
[   65.618543] ehci-exynos: EHCI Exynos driver
[   65.629840] x19: ffffa0001013c0b0 x18: 0000000000000000
[   65.629843] x17: 0000000000001d50 x16: ffffffffffffe240
[   65.629847] x15: 00000000000013a8 x14: 0000000000000000
[   65.629850] x13: 00000000000013a0 x12: 1fffe004db2f7f7c
[   65.629854] x11: ffff8004db2f7f78 x10: dfffa00000000000
[   65.629857] x9 : ffffa00028f679e8 x8 : ffffa0002a483a48
[   65.629861] x7 : ffffa00026d5ed94 x6 : 0000000000000000
[   65.629864] x5 : ffffa0002a483a48 x4 : 0000000000000000
[   65.629868] x3 : ffffa000279df03c x2 : 0000000000000000
[   65.636662] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[   65.647207] x1 : ef244e124d671400 x0 : 0000000000000004
[   65.647210] Call trace:
[   65.647214]  megasas_readl+0x60/0x80
[   65.647218]  megasas_read_fw_status_reg_fusion+0x2c/0x38
[   65.647221]  megasas_fault_detect_work+0x44/0x520
[   65.647226]  process_one_work+0x488/0xc08
[   65.647228]  worker_thread+0x68/0x5d0
[   65.647233]  kthread+0x1c8/0x1d0
[   65.669535] ohci-pci: OHCI PCI platform driver
[   65.689683]  ret_from_fork+0x10/0x18
[   65.689689] Code: 54ffff09 a94153f3 a8c27bfd d65f03c0 (b9400260)
[   65.689695] ---[ end trace 3632c7efc4f2d69c ]---

That's on 5.6-rc3.

Please have a look,

John
