All of lore.kernel.org
 help / color / mirror / Atom feed
* ahci KASAN warning in experimental arm64 allmodconfig boot
@ 2019-10-14 18:02 John Garry
  2019-10-15 14:17 ` John Garry
  0 siblings, 1 reply; 3+ messages in thread
From: John Garry @ 2019-10-14 18:02 UTC (permalink / raw)
  To: linux-ide, axboe; +Cc: linux-kernel, linux-scsi

[-- Attachment #1: Type: text/plain, Size: 8010 bytes --]

Hi guys,

I'm experimenting by trying to boot an allmodconfig arm64 kernel, as 
mentioned here:
https://lore.kernel.org/linux-arm-kernel/507325a3-030e-2843-0f46-7e18c60257de@huawei.com/

I am getting a KASAN slab-out-of-bounds warning around the ahci code, as 
below. I am also getting a crash some time after, but the stackframe is 
mixed up in the dmesg, so hard to extract that.

The kernel is v5.4-rc3.

I'll continue to look at this as time allows.

I have cc'ed the scsi list as this may be related to the SAS host (using 
libsas), but I'm doubtful, considering the log mentions ahci_init_one().

Thanks,
John

Log snippet:

[  137.529384][  T187] 
==================================================================
[  137.529403][  T187] BUG: KASAN: slab-out-of-bounds in 
ata_link_next+0x5c/0x1b0
[  137.529411][  T187] Read of size 4 at addr ffff002320073cc8 by task 
kworker/u194:0/187
[  137.529414][  T187]
[  137.529426][  T187] CPU: 38 PID: 187 Comm: kworker/u194:0 Tainted: G 
       W         5.4.0-rc3+ #1149
[  137.529433][  T187] Hardware name: Huawei D06 /D06, BIOS Hisilicon 
D06 UEFI RC0 - V1.16.01 03/15/2019
[  137.529447][  T187] Workqueue: events_unbound async_run_entry_fn
[  137.529454][  T187] Call trace:
[  137.529462][  T187]  dump_backtrace+0x0/0x298
[  137.529469][  T187]  show_stack+0x20/0x30
[  137.529477][  T187]  dump_stack+0x190/0x21c
[  137.529488][  T187]  print_address_description.isra.6+0x80/0x3d0
[  137.529496][  T187]  __kasan_report+0x174/0x23c
[  137.529503][  T187]  kasan_report+0xc/0x18
[  137.529511][  T187]  __asan_load4+0xa4/0xb0
[  137.529517][  T187]  ata_link_next+0x5c/0x1b0
[  137.529526][  T187]  ata_scsi_scan_host+0x50/0x2d0
[  137.529533][  T187]  async_port_probe+0x7c/0xa8
[  137.529541][  T187]  async_run_entry_fn+0x118/0x340
[  137.529549][  T187]  process_one_work+0x7b8/0xdb8
[  137.529556][  T187]  worker_thread+0x414/0x6b8
[  137.529562][  T187]  kthread+0x1d4/0x1f0
[  137.529570][  T187]  ret_from_fork+0x10/0x18
[  137.529574][  T187]
[  137.529580][  T187] Allocated by task 16:
[  137.529589][  T187]  save_stack+0x28/0xb0
[  137.529597][  T187]  __kasan_kmalloc.isra.9+0xa0/0xc8
[  137.529604][  T187]  kasan_slab_alloc+0x14/0x20
[  137.529612][  T187]  kmem_cache_alloc+0x25c/0x420
[  137.529620][  T187]  __proc_create+0x430/0x458
[  137.529627][  T187]  proc_mkdir_data+0x50/0xf0
[  137.529634][  T187]  proc_mkdir+0x2c/0x38
[  137.529642][  T187]  register_irq_proc+0x104/0x278
[  137.529650][  T187]  __setup_irq+0xafc/0xe80
[  137.529657][  T187]  request_threaded_irq+0x198/0x218
[  137.529666][  T187]  devm_request_threaded_irq+0xb0/0x120
[  137.529674][  T187]  ahci_host_activate+0x1cc/0x290
[  137.529682][  T187]  ahci_init_one+0x1358/0x1570
[  137.529691][  T187]  local_pci_probe+0x78/0xf0
[  137.529698][  T187]  work_for_cpu_fn+0x30/0x50
[  137.529705][  T187]  process_one_work+0x7b8/0xdb8
[  137.529713][  T187]  worker_thread+0x4a8/0x6b8
[  137.529719][  T187]  kthread+0x1d4/0x1f0
[  137.529726][  T187]  ret_from_fork+0x10/0x18
[  137.529729][  T187]
[  137.529734][  T187] Freed by task 0:
[  137.529737][  T187] (stack is not available)
[  137.529741][  T187]
[  137.529747][  T187] The buggy address belongs to the object at 
ffff002320073b68
[  137.529747][  T187]  which belongs to the cache proc_dir_entry of 
size 256
[  137.529755][  T187] The buggy address is located 96 bytes to the right of
[  137.529755][  T187]  256-byte region [ffff002320073b68, ffff002320073c68)
[  137.529759][  T187] The buggy address belongs to the page:
[  137.529768][  T187] page:fffffe008c601c00 refcount:1 mapcount:0 
mapping:ffff0020bf1b8480 index:0xffff002320070988 compound_mapcount: 0
[  137.529779][  T187] flags: 0x1ffff00000010200(slab|head)
[  137.529791][  T187] raw: 1ffff00000010200 ffff0020bf1a3cd0 
fffffe008c612208 ffff0020bf1b8480
[  137.529801][  T187] raw: ffff002320070988 000000000035000b 
00000001ffffffff 0000000000000000
[  137.529805][  T187] page dumped because: kasan: bad access detected
[  137.529809][  T187]
[  137.529812][  T187] Memory state around the buggy address:
[  137.529820][  T187]  ffff002320073b80: 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00
[  137.529827][  T187]  ffff002320073c00: 00 00 00 00 00 00 00 00 00 00 
00 00 00 fc fc fc
[  137.529835][  T187] >ffff002320073c80: fc fc fc fc fc fc fc fc fc fc 
fc fc fc fc fc fc
[  137.529839][  T187]                                               ^
[  137.529846][  T187]  ffff002320073d00: fc fc fc fc fc fc fc fc fc fc 
fc fc fc fc fc fc
[  137.529853][  T187]  ffff002320073d80: fc fc fc fc fc fc fc fc fc fc 
fc fc fc fc fc fc
[  137.529857][  T187] 
==================================================================
[  137.529874][  T187] Unable to handle kernel paging request at virtual 
address 000004a02a0040ba
[  137.529883][  T187] Mem abort info:
[  137.529893][  T187]   ESR = 0x96000004
[  137.529905][  T187]   EC = 0x25: DABT (current EL), IL = 32 bits
[  137.529915][  T187]   SET = 0, FnV = 0
[  137.529925][  T187]   EA = 0, S1PTW = 0
[  137.529934][  T187] Data abort info:
[  137.529940][   T16] kobject: '177' ((____ptrval____)): 
kobject_add_internal: parent: 'irq', set: '<NULL>'
[  137.529945][  T187]   ISV = 0, ISS = 0x00000004
[  137.529955][  T187]   CM = 0, WnR = 0
[  137.529966][  T187] [000004a02a0040ba] user address but active_mm is 
swapper
[  137.529979][  T187] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[  137.529985][  T187] Modules linked in:
[  137.529996][  T187] CPU: 38 PID: 187 Comm: kworker/u194:0 Tainted: G 
   B   W         5.4.0-rc3+ #1149
[  137.530002][  T187] Hardware name: Huawei D06 /D06, BIOS Hisilicon 
D06 UEFI RC0 - V1.16.01 03/15/2019
[  137.530011][  T187] Workqueue: events_unbound async_run_entry_fn
[  137.530020][  T187] pstate: 40800009 (nZcv daif -PAN +UAO)
[  137.530026][  T187] pc : ata_dev_next+0xf8/0x1f8
[  137.530033][  T187] lr : ata_dev_next+0xf8/0x1f8
[  137.530037][  T187] sp : ffff0020bddafb80
[  137.530042][  T187] x29: ffff0020bddafb80 x28: 0000000000000000
[  137.530050][  T187] x27: 0000000000003cc0 x26: ffffa00015016000
[  137.530059][  T187] x25: 00000000000020c0 x24: ffffa000113102a8
[  137.530068][  T187] x23: 350004a02a0003fa x22: ffffa00011335db0
[  137.530077][  T187] x21: ffffa000113356f0 x20: 0000000000000000
[  137.530085][  T187] x19: ffffa00011336830 x18: 00000000000025a8
[  137.530094][  T187] x17: 00000000000025a0 x16: 00000000000026b0
[  137.530102][  T187] x15: 0000000000001470 x14: 3d3d3d3d3d3d3d3d
[  137.530111][  T187] x13: 3d3d3d3d3d3d3d3d x12: 1fffe00417bb5f3e
[  137.530120][  T187] x11: ffff800417bb5f3e x10: dfffa00000000000
[  137.530129][  T187] x9 : ffff800417bb5f3f x8 : 0000000000000001
[  137.530138][  T187] x7 : 0000000000003cd0 x6 : 0000000000000000
[  137.530146][  T187] x5 : ffffa00015015cc8 x4 : 0000000000000000
[  137.530155][  T187] x3 : ffffa00011301608 x2 : 0000000000000000
[  137.530163][  T187] x1 : 21f045c4027f9500 x0 : 0000000000000000
[  137.530171][  T187] Call trace:
[  137.530178][  T187]  ata_dev_next+0xf8/0x1f8
[  137.530187][  T187]  ata_scsi_scan_host+0x6c/0x2d0
[  137.530194][  T187]  async_port_probe+0x7c/0xa8
[  137.530202][  T187]  async_run_entry_fn+0x118/0x340
[  137.530209][  T187]  process_one_work+0x7b8/0xdb8
[  137.530216][  T187]  worker_thread+0x414/0x6b8
[  137.530223][  T187]  kthread+0x1d4/0x1f0
[  137.530230][  T187]  ret_from_fork+0x10/0x18
[  137.530242][  T187] Code: 97c544f6 d2879801 8b0102e0 97cd4fb5 (f95e62e0)
[  137.530472][  T187] ---[ end trace 1702441cf7c2cd89 ]---
[  137.530479][  T187] Kernel panic - not syncing: Fatal exception
[  137.530592][   T16] kobject: '178' ((____ptrval____)): 
kobject_add_internal: parent: 'irq', set: '<NULL>'
[  137.530653][  T187] SMP: stopping secondary CPUs
[  137.531078][  T187] Kernel Offset: disabled
[  137.531085][  T187] CPU features: 0x0002,23208a38
[  137.531089][  T187] Memory Limit: none
[  143.699079][  T187] ---[ end Kernel panic - not syncing: Fatal 
exception ]---






[-- Attachment #2: ata kasan issue.log --]
[-- Type: text/plain, Size: 7693 bytes --]

[  137.529187][    T8] kobject: '0:0:4:0' ((____ptrval____)): fill_kobj_path: path = '/devices/pci0000:74/0000:74:02.0/host0/port-0:0/expander-0:1/port-0:1:24/end_device-0:1:24/target0:0:4/0:0:4:0/bsg/0:0:4:0'
[  137.529377][   T16] kobject: '176' ((____ptrval____)): kobject_add_internal: parent: 'irq', set: '<NULL>'
[  137.529384][  T187] ==================================================================
[  137.529403][  T187] BUG: KASAN: slab-out-of-bounds in ata_link_next+0x5c/0x1b0
[  137.529411][  T187] Read of size 4 at addr ffff002320073cc8 by task kworker/u194:0/187
[  137.529414][  T187] 
[  137.529426][  T187] CPU: 38 PID: 187 Comm: kworker/u194:0 Tainted: G        W         5.4.0-rc3+ #1149
[  137.529433][  T187] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 - V1.16.01 03/15/2019
[  137.529447][  T187] Workqueue: events_unbound async_run_entry_fn
[  137.529454][  T187] Call trace:
[  137.529462][  T187]  dump_backtrace+0x0/0x298
[  137.529469][  T187]  show_stack+0x20/0x30
[  137.529477][  T187]  dump_stack+0x190/0x21c
[  137.529488][  T187]  print_address_description.isra.6+0x80/0x3d0
[  137.529496][  T187]  __kasan_report+0x174/0x23c
[  137.529503][  T187]  kasan_report+0xc/0x18
[  137.529511][  T187]  __asan_load4+0xa4/0xb0
[  137.529517][  T187]  ata_link_next+0x5c/0x1b0
[  137.529526][  T187]  ata_scsi_scan_host+0x50/0x2d0
[  137.529533][  T187]  async_port_probe+0x7c/0xa8
[  137.529541][  T187]  async_run_entry_fn+0x118/0x340
[  137.529549][  T187]  process_one_work+0x7b8/0xdb8
[  137.529556][  T187]  worker_thread+0x414/0x6b8
[  137.529562][  T187]  kthread+0x1d4/0x1f0
[  137.529570][  T187]  ret_from_fork+0x10/0x18
[  137.529574][  T187] 
[  137.529580][  T187] Allocated by task 16:
[  137.529589][  T187]  save_stack+0x28/0xb0
[  137.529597][  T187]  __kasan_kmalloc.isra.9+0xa0/0xc8
[  137.529604][  T187]  kasan_slab_alloc+0x14/0x20
[  137.529612][  T187]  kmem_cache_alloc+0x25c/0x420
[  137.529620][  T187]  __proc_create+0x430/0x458
[  137.529627][  T187]  proc_mkdir_data+0x50/0xf0
[  137.529634][  T187]  proc_mkdir+0x2c/0x38
[  137.529642][  T187]  register_irq_proc+0x104/0x278
[  137.529650][  T187]  __setup_irq+0xafc/0xe80
[  137.529657][  T187]  request_threaded_irq+0x198/0x218
[  137.529666][  T187]  devm_request_threaded_irq+0xb0/0x120
[  137.529674][  T187]  ahci_host_activate+0x1cc/0x290
[  137.529682][  T187]  ahci_init_one+0x1358/0x1570
[  137.529691][  T187]  local_pci_probe+0x78/0xf0
[  137.529698][  T187]  work_for_cpu_fn+0x30/0x50
[  137.529705][  T187]  process_one_work+0x7b8/0xdb8
[  137.529713][  T187]  worker_thread+0x4a8/0x6b8
[  137.529719][  T187]  kthread+0x1d4/0x1f0
[  137.529726][  T187]  ret_from_fork+0x10/0x18
[  137.529729][  T187] 
[  137.529734][  T187] Freed by task 0:
[  137.529737][  T187] (stack is not available)
[  137.529741][  T187] 
[  137.529747][  T187] The buggy address belongs to the object at ffff002320073b68
[  137.529747][  T187]  which belongs to the cache proc_dir_entry of size 256
[  137.529755][  T187] The buggy address is located 96 bytes to the right of
[  137.529755][  T187]  256-byte region [ffff002320073b68, ffff002320073c68)
[  137.529759][  T187] The buggy address belongs to the page:
[  137.529768][  T187] page:fffffe008c601c00 refcount:1 mapcount:0 mapping:ffff0020bf1b8480 index:0xffff002320070988 compound_mapcount: 0
[  137.529779][  T187] flags: 0x1ffff00000010200(slab|head)
[  137.529791][  T187] raw: 1ffff00000010200 ffff0020bf1a3cd0 fffffe008c612208 ffff0020bf1b8480
[  137.529801][  T187] raw: ffff002320070988 000000000035000b 00000001ffffffff 0000000000000000
[  137.529805][  T187] page dumped because: kasan: bad access detected
[  137.529809][  T187] 
[  137.529812][  T187] Memory state around the buggy address:
[  137.529820][  T187]  ffff002320073b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  137.529827][  T187]  ffff002320073c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc
[  137.529835][  T187] >ffff002320073c80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  137.529839][  T187]                                               ^
[  137.529846][  T187]  ffff002320073d00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  137.529853][  T187]  ffff002320073d80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  137.529857][  T187] ==================================================================
[  137.529874][  T187] Unable to handle kernel paging request at virtual address 000004a02a0040ba
[  137.529883][  T187] Mem abort info:
[  137.529893][  T187]   ESR = 0x96000004
[  137.529905][  T187]   EC = 0x25: DABT (current EL), IL = 32 bits
[  137.529915][  T187]   SET = 0, FnV = 0
[  137.529925][  T187]   EA = 0, S1PTW = 0
[  137.529934][  T187] Data abort info:
[  137.529940][   T16] kobject: '177' ((____ptrval____)): kobject_add_internal: parent: 'irq', set: '<NULL>'
[  137.529945][  T187]   ISV = 0, ISS = 0x00000004
[  137.529955][  T187]   CM = 0, WnR = 0
[  137.529966][  T187] [000004a02a0040ba] user address but active_mm is swapper
[  137.529979][  T187] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[  137.529985][  T187] Modules linked in:
[  137.529996][  T187] CPU: 38 PID: 187 Comm: kworker/u194:0 Tainted: G    B   W         5.4.0-rc3+ #1149
[  137.530002][  T187] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 - V1.16.01 03/15/2019
[  137.530011][  T187] Workqueue: events_unbound async_run_entry_fn
[  137.530020][  T187] pstate: 40800009 (nZcv daif -PAN +UAO)
[  137.530026][  T187] pc : ata_dev_next+0xf8/0x1f8
[  137.530033][  T187] lr : ata_dev_next+0xf8/0x1f8
[  137.530037][  T187] sp : ffff0020bddafb80
[  137.530042][  T187] x29: ffff0020bddafb80 x28: 0000000000000000 
[  137.530050][  T187] x27: 0000000000003cc0 x26: ffffa00015016000 
[  137.530059][  T187] x25: 00000000000020c0 x24: ffffa000113102a8 
[  137.530068][  T187] x23: 350004a02a0003fa x22: ffffa00011335db0 
[  137.530077][  T187] x21: ffffa000113356f0 x20: 0000000000000000 
[  137.530085][  T187] x19: ffffa00011336830 x18: 00000000000025a8 
[  137.530094][  T187] x17: 00000000000025a0 x16: 00000000000026b0 
[  137.530102][  T187] x15: 0000000000001470 x14: 3d3d3d3d3d3d3d3d 
[  137.530111][  T187] x13: 3d3d3d3d3d3d3d3d x12: 1fffe00417bb5f3e 
[  137.530120][  T187] x11: ffff800417bb5f3e x10: dfffa00000000000 
[  137.530129][  T187] x9 : ffff800417bb5f3f x8 : 0000000000000001 
[  137.530138][  T187] x7 : 0000000000003cd0 x6 : 0000000000000000 
[  137.530146][  T187] x5 : ffffa00015015cc8 x4 : 0000000000000000 
[  137.530155][  T187] x3 : ffffa00011301608 x2 : 0000000000000000 
[  137.530163][  T187] x1 : 21f045c4027f9500 x0 : 0000000000000000 
[  137.530171][  T187] Call trace:
[  137.530178][  T187]  ata_dev_next+0xf8/0x1f8
[  137.530187][  T187]  ata_scsi_scan_host+0x6c/0x2d0
[  137.530194][  T187]  async_port_probe+0x7c/0xa8
[  137.530202][  T187]  async_run_entry_fn+0x118/0x340
[  137.530209][  T187]  process_one_work+0x7b8/0xdb8
[  137.530216][  T187]  worker_thread+0x414/0x6b8
[  137.530223][  T187]  kthread+0x1d4/0x1f0
[  137.530230][  T187]  ret_from_fork+0x10/0x18
[  137.530242][  T187] Code: 97c544f6 d2879801 8b0102e0 97cd4fb5 (f95e62e0) 
[  137.530472][  T187] ---[ end trace 1702441cf7c2cd89 ]---
[  137.530479][  T187] Kernel panic - not syncing: Fatal exception
[  137.530592][   T16] kobject: '178' ((____ptrval____)): kobject_add_internal: parent: 'irq', set: '<NULL>'
[  137.530653][  T187] SMP: stopping secondary CPUs
[  137.531078][  T187] Kernel Offset: disabled
[  137.531085][  T187] CPU features: 0x0002,23208a38
[  137.531089][  T187] Memory Limit: none
[  143.699079][  T187] ---[ end Kernel panic - not syncing: Fatal exception ]---



^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: ahci KASAN warning in experimental arm64 allmodconfig boot
  2019-10-14 18:02 ahci KASAN warning in experimental arm64 allmodconfig boot John Garry
@ 2019-10-15 14:17 ` John Garry
  2019-10-15 16:17   ` John Garry
  0 siblings, 1 reply; 3+ messages in thread
From: John Garry @ 2019-10-15 14:17 UTC (permalink / raw)
  To: linux-ide, axboe; +Cc: linux-kernel, linux-scsi

On 14/10/2019 19:02, John Garry wrote:
> Hi guys,
>
> I'm experimenting by trying to boot an allmodconfig arm64 kernel, as
> mentioned here:
> https://lore.kernel.org/linux-arm-kernel/507325a3-030e-2843-0f46-7e18c60257de@huawei.com/
>
>
> I am getting a KASAN slab-out-of-bounds warning around the ahci code, as
> below. I am also getting a crash some time after, but the stackframe is
> mixed up in the dmesg, so hard to extract that.
>
> The kernel is v5.4-rc3.
>
> I'll continue to look at this as time allows.
>
> I have cc'ed the scsi list as this may be related to the SAS host (using
> libsas), but I'm doubtful, considering the log mentions ahci_init_one().
>

It may very well be related - just CONFIG_DEBUG_TEST_DRIVER_REMOVE 
triggers this WARN:

[   23.452574] ------------[ cut here ]------------
[   23.457190] WARNING: CPU: 59 PID: 1 at drivers/ata/libata-core.c:6676 
ata_host_detach+0x15c/0x168
[   23.466047] Modules linked in:
[   23.469092] CPU: 59 PID: 1 Comm: swapper/0 Not tainted 
5.4.0-rc1-00010-g5b83fd27752b-dirty #296
[   23.477776] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI 
RC0 - V1.16.01 03/15/2019
[   23.486286] pstate: a0c00009 (NzCv daif +PAN +UAO)
[   23.491065] pc : ata_host_detach+0x15c/0x168
[   23.495322] lr : ata_host_detach+0x88/0x168
[   23.499491] sp : ffff800011cabb50
[   23.502792] x29: ffff800011cabb50 x28: 0000000000000007
[   23.508091] x27: ffff80001137f068 x26: ffff8000112c0c28
[   23.513390] x25: 0000000000003848 x24: ffff0023ea185300
[   23.518689] x23: 0000000000000001 x22: 00000000000014c0
[   23.523987] x21: 0000000000013740 x20: ffff0023bdc20000
[   23.529286] x19: 0000000000000000 x18: 0000000000000004
[   23.534584] x17: 0000000000000001 x16: 00000000000000f0
[   23.539883] x15: ffff0023eac13790 x14: ffff0023eb76c408
[   23.545181] x13: 0000000000000000 x12: ffff0023eac13790
[   23.550480] x11: ffff0023eb76c228 x10: 0000000000000000
[   23.555779] x9 : ffff0023eac13798 x8 : 0000000040000000
[   23.561077] x7 : 0000000000000002 x6 : 0000000000000001
[   23.566376] x5 : 0000000000000002 x4 : 0000000000000000
[   23.571674] x3 : ffff0023bf08a0bc x2 : 0000000000000000
[   23.576972] x1 : 3099674201f72700 x0 : 0000000000400284
[   23.582272] Call trace:
[   23.584706]  ata_host_detach+0x15c/0x168
[   23.588616]  ata_pci_remove_one+0x10/0x18
[   23.592615]  ahci_remove_one+0x20/0x40
[   23.596356]  pci_device_remove+0x3c/0xe0
[   23.600267]  really_probe+0xdc/0x3e0
[   23.603830]  driver_probe_device+0x58/0x100
[   23.608000]  device_driver_attach+0x6c/0x90
[   23.612169]  __driver_attach+0x84/0xc8
[   23.615908]  bus_for_each_dev+0x74/0xc8
[   23.619730]  driver_attach+0x20/0x28
[   23.623292]  bus_add_driver+0x148/0x1f0
[   23.627115]  driver_register+0x60/0x110
[   23.630938]  __pci_register_driver+0x40/0x48
[   23.635199]  ahci_pci_driver_init+0x20/0x28
[   23.639372]  do_one_initcall+0x5c/0x1b0
[   23.643199]  kernel_init_freeable+0x1a4/0x24c
[   23.647546]  kernel_init+0x10/0x108
[   23.651023]  ret_from_fork+0x10/0x18
[   23.654590] ---[ end trace 634a14b675b71c13 ]---

I'll dig deeper if I get a chance.

John

> Thanks,
> John
>
> Log snippet:
>
> [  137.529384][  T187]
> ==================================================================
> [  137.529403][  T187] BUG: KASAN: slab-out-of-bounds in
> ata_link_next+0x5c/0x1b0
> [  137.529411][  T187] Read of size 4 at addr ffff002320073cc8 by task
> kworker/u194:0/187
> [  137.529414][  T187]
> [  137.529426][  T187] CPU: 38 PID: 187 Comm: kworker/u194:0 Tainted: G
>       W         5.4.0-rc3+ #1149
> [  137.529433][  T187] Hardware name: Huawei D06 /D06, BIOS Hisilicon
> D06 UEFI RC0 - V1.16.01 03/15/2019
> [  137.529447][  T187] Workqueue: events_unbound async_run_entry_fn
> [  137.529454][  T187] Call trace:
> [  137.529462][  T187]  dump_backtrace+0x0/0x298
> [  137.529469][  T187]  show_stack+0x20/0x30
> [  137.529477][  T187]  dump_stack+0x190/0x21c
> [  137.529488][  T187]  print_address_description.isra.6+0x80/0x3d0
> [  137.529496][  T187]  __kasan_report+0x174/0x23c
> [  137.529503][  T187]  kasan_report+0xc/0x18
> [  137.529511][  T187]  __asan_load4+0xa4/0xb0
> [  137.529517][  T187]  ata_link_next+0x5c/0x1b0
> [  137.529526][  T187]  ata_scsi_scan_host+0x50/0x2d0
> [  137.529533][  T187]  async_port_probe+0x7c/0xa8
> [  137.529541][  T187]  async_run_entry_fn+0x118/0x340
> [  137.529549][  T187]  process_one_work+0x7b8/0xdb8
> [  137.529556][  T187]  worker_thread+0x414/0x6b8
> [  137.529562][  T187]  kthread+0x1d4/0x1f0
> [  137.529570][  T187]  ret_from_fork+0x10/0x18
> [  137.529574][  T187]
> [  137.529580][  T187] Allocated by task 16:
> [  137.529589][  T187]  save_stack+0x28/0xb0
> [  137.529597][  T187]  __kasan_kmalloc.isra.9+0xa0/0xc8
> [  137.529604][  T187]  kasan_slab_alloc+0x14/0x20
> [  137.529612][  T187]  kmem_cache_alloc+0x25c/0x420
> [  137.529620][  T187]  __proc_create+0x430/0x458
> [  137.529627][  T187]  proc_mkdir_data+0x50/0xf0
> [  137.529634][  T187]  proc_mkdir+0x2c/0x38
> [  137.529642][  T187]  register_irq_proc+0x104/0x278
> [  137.529650][  T187]  __setup_irq+0xafc/0xe80
> [  137.529657][  T187]  request_threaded_irq+0x198/0x218
> [  137.529666][  T187]  devm_request_threaded_irq+0xb0/0x120
> [  137.529674][  T187]  ahci_host_activate+0x1cc/0x290
> [  137.529682][  T187]  ahci_init_one+0x1358/0x1570
> [  137.529691][  T187]  local_pci_probe+0x78/0xf0
> [  137.529698][  T187]  work_for_cpu_fn+0x30/0x50
> [  137.529705][  T187]  process_one_work+0x7b8/0xdb8
> [  137.529713][  T187]  worker_thread+0x4a8/0x6b8
> [  137.529719][  T187]  kthread+0x1d4/0x1f0
> [  137.529726][  T187]  ret_from_fork+0x10/0x18
> [  137.529729][  T187]
> [  137.529734][  T187] Freed by task 0:
> [  137.529737][  T187] (stack is not available)
> [  137.529741][  T187]
> [  137.529747][  T187] The buggy address belongs to the object at
> ffff002320073b68
> [  137.529747][  T187]  which belongs to the cache proc_dir_entry of
> size 256
> [  137.529755][  T187] The buggy address is located 96 bytes to the
> right of
> [  137.529755][  T187]  256-byte region [ffff002320073b68,
> ffff002320073c68)
> [  137.529759][  T187] The buggy address belongs to the page:
> [  137.529768][  T187] page:fffffe008c601c00 refcount:1 mapcount:0
> mapping:ffff0020bf1b8480 index:0xffff002320070988 compound_mapcount: 0
> [  137.529779][  T187] flags: 0x1ffff00000010200(slab|head)
> [  137.529791][  T187] raw: 1ffff00000010200 ffff0020bf1a3cd0
> fffffe008c612208 ffff0020bf1b8480
> [  137.529801][  T187] raw: ffff002320070988 000000000035000b
> 00000001ffffffff 0000000000000000
> [  137.529805][  T187] page dumped because: kasan: bad access detected
> [  137.529809][  T187]
> [  137.529812][  T187] Memory state around the buggy address:
> [  137.529820][  T187]  ffff002320073b80: 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00 00 00
> [  137.529827][  T187]  ffff002320073c00: 00 00 00 00 00 00 00 00 00 00
> 00 00 00 fc fc fc
> [  137.529835][  T187] >ffff002320073c80: fc fc fc fc fc fc fc fc fc fc
> fc fc fc fc fc fc
> [  137.529839][  T187]                                               ^
> [  137.529846][  T187]  ffff002320073d00: fc fc fc fc fc fc fc fc fc fc
> fc fc fc fc fc fc
> [  137.529853][  T187]  ffff002320073d80: fc fc fc fc fc fc fc fc fc fc
> fc fc fc fc fc fc
> [  137.529857][  T187]
> ==================================================================
> [  137.529874][  T187] Unable to handle kernel paging request at virtual
> address 000004a02a0040ba
> [  137.529883][  T187] Mem abort info:
> [  137.529893][  T187]   ESR = 0x96000004
> [  137.529905][  T187]   EC = 0x25: DABT (current EL), IL = 32 bits
> [  137.529915][  T187]   SET = 0, FnV = 0
> [  137.529925][  T187]   EA = 0, S1PTW = 0
> [  137.529934][  T187] Data abort info:
> [  137.529940][   T16] kobject: '177' ((____ptrval____)):
> kobject_add_internal: parent: 'irq', set: '<NULL>'
> [  137.529945][  T187]   ISV = 0, ISS = 0x00000004
> [  137.529955][  T187]   CM = 0, WnR = 0
> [  137.529966][  T187] [000004a02a0040ba] user address but active_mm is
> swapper
> [  137.529979][  T187] Internal error: Oops: 96000004 [#1] PREEMPT SMP
> [  137.529985][  T187] Modules linked in:
> [  137.529996][  T187] CPU: 38 PID: 187 Comm: kworker/u194:0 Tainted: G
>   B   W         5.4.0-rc3+ #1149
> [  137.530002][  T187] Hardware name: Huawei D06 /D06, BIOS Hisilicon
> D06 UEFI RC0 - V1.16.01 03/15/2019
> [  137.530011][  T187] Workqueue: events_unbound async_run_entry_fn
> [  137.530020][  T187] pstate: 40800009 (nZcv daif -PAN +UAO)
> [  137.530026][  T187] pc : ata_dev_next+0xf8/0x1f8
> [  137.530033][  T187] lr : ata_dev_next+0xf8/0x1f8
> [  137.530037][  T187] sp : ffff0020bddafb80
> [  137.530042][  T187] x29: ffff0020bddafb80 x28: 0000000000000000
> [  137.530050][  T187] x27: 0000000000003cc0 x26: ffffa00015016000
> [  137.530059][  T187] x25: 00000000000020c0 x24: ffffa000113102a8
> [  137.530068][  T187] x23: 350004a02a0003fa x22: ffffa00011335db0
> [  137.530077][  T187] x21: ffffa000113356f0 x20: 0000000000000000
> [  137.530085][  T187] x19: ffffa00011336830 x18: 00000000000025a8
> [  137.530094][  T187] x17: 00000000000025a0 x16: 00000000000026b0
> [  137.530102][  T187] x15: 0000000000001470 x14: 3d3d3d3d3d3d3d3d
> [  137.530111][  T187] x13: 3d3d3d3d3d3d3d3d x12: 1fffe00417bb5f3e
> [  137.530120][  T187] x11: ffff800417bb5f3e x10: dfffa00000000000
> [  137.530129][  T187] x9 : ffff800417bb5f3f x8 : 0000000000000001
> [  137.530138][  T187] x7 : 0000000000003cd0 x6 : 0000000000000000
> [  137.530146][  T187] x5 : ffffa00015015cc8 x4 : 0000000000000000
> [  137.530155][  T187] x3 : ffffa00011301608 x2 : 0000000000000000
> [  137.530163][  T187] x1 : 21f045c4027f9500 x0 : 0000000000000000
> [  137.530171][  T187] Call trace:
> [  137.530178][  T187]  ata_dev_next+0xf8/0x1f8
> [  137.530187][  T187]  ata_scsi_scan_host+0x6c/0x2d0
> [  137.530194][  T187]  async_port_probe+0x7c/0xa8
> [  137.530202][  T187]  async_run_entry_fn+0x118/0x340
> [  137.530209][  T187]  process_one_work+0x7b8/0xdb8
> [  137.530216][  T187]  worker_thread+0x414/0x6b8
> [  137.530223][  T187]  kthread+0x1d4/0x1f0
> [  137.530230][  T187]  ret_from_fork+0x10/0x18
> [  137.530242][  T187] Code: 97c544f6 d2879801 8b0102e0 97cd4fb5 (f95e62e0)
> [  137.530472][  T187] ---[ end trace 1702441cf7c2cd89 ]---
> [  137.530479][  T187] Kernel panic - not syncing: Fatal exception
> [  137.530592][   T16] kobject: '178' ((____ptrval____)):
> kobject_add_internal: parent: 'irq', set: '<NULL>'
> [  137.530653][  T187] SMP: stopping secondary CPUs
> [  137.531078][  T187] Kernel Offset: disabled
> [  137.531085][  T187] CPU features: 0x0002,23208a38
> [  137.531089][  T187] Memory Limit: none
> [  143.699079][  T187] ---[ end Kernel panic - not syncing: Fatal
> exception ]---
>
>
>
>
>



^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: ahci KASAN warning in experimental arm64 allmodconfig boot
  2019-10-15 14:17 ` John Garry
@ 2019-10-15 16:17   ` John Garry
  0 siblings, 0 replies; 3+ messages in thread
From: John Garry @ 2019-10-15 16:17 UTC (permalink / raw)
  To: linux-ide, axboe; +Cc: linux-kernel, linux-scsi

On 15/10/2019 15:17, John Garry wrote:
>>
>>
>> I'll continue to look at this as time allows.
>>
>> I have cc'ed the scsi list as this may be related to the SAS host (using
>> libsas), but I'm doubtful, considering the log mentions ahci_init_one().
>>
>
> It may very well be related - just CONFIG_DEBUG_TEST_DRIVER_REMOVE
> triggers this WARN:
>
> [   23.452574] ------------[ cut here ]------------
> [   23.457190] WARNING: CPU: 59 PID: 1 at drivers/ata/libata-core.c:6676
> ata_host_detach+0x15c/0x168
> [   23.466047] Modules linked in:
> [   23.469092] CPU: 59 PID: 1 Comm: swapper/0 Not tainted
> 5.4.0-rc1-00010-g5b83fd27752b-dirty #296
> [   23.477776] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI
> RC0 - V1.16.01 03/15/2019
> [   23.486286] pstate: a0c00009 (NzCv daif +PAN +UAO)
> [   23.491065] pc : ata_host_detach+0x15c/0x168
> [   23.495322] lr : ata_host_detach+0x88/0x168
> [   23.499491] sp : ffff800011cabb50
> [   23.502792] x29: ffff800011cabb50 x28: 0000000000000007
> [   23.508091] x27: ffff80001137f068 x26: ffff8000112c0c28
> [   23.513390] x25: 0000000000003848 x24: ffff0023ea185300
> [   23.518689] x23: 0000000000000001 x22: 00000000000014c0
> [   23.523987] x21: 0000000000013740 x20: ffff0023bdc20000
> [   23.529286] x19: 0000000000000000 x18: 0000000000000004
> [   23.534584] x17: 0000000000000001 x16: 00000000000000f0
> [   23.539883] x15: ffff0023eac13790 x14: ffff0023eb76c408
> [   23.545181] x13: 0000000000000000 x12: ffff0023eac13790
> [   23.550480] x11: ffff0023eb76c228 x10: 0000000000000000
> [   23.555779] x9 : ffff0023eac13798 x8 : 0000000040000000
> [   23.561077] x7 : 0000000000000002 x6 : 0000000000000001
> [   23.566376] x5 : 0000000000000002 x4 : 0000000000000000
> [   23.571674] x3 : ffff0023bf08a0bc x2 : 0000000000000000
> [   23.576972] x1 : 3099674201f72700 x0 : 0000000000400284
> [   23.582272] Call trace:
> [   23.584706]  ata_host_detach+0x15c/0x168
> [   23.588616]  ata_pci_remove_one+0x10/0x18
> [   23.592615]  ahci_remove_one+0x20/0x40
> [   23.596356]  pci_device_remove+0x3c/0xe0
> [   23.600267]  really_probe+0xdc/0x3e0
> [   23.603830]  driver_probe_device+0x58/0x100
> [   23.608000]  device_driver_attach+0x6c/0x90
> [   23.612169]  __driver_attach+0x84/0xc8
> [   23.615908]  bus_for_each_dev+0x74/0xc8
> [   23.619730]  driver_attach+0x20/0x28
> [   23.623292]  bus_add_driver+0x148/0x1f0
> [   23.627115]  driver_register+0x60/0x110
> [   23.630938]  __pci_register_driver+0x40/0x48
> [   23.635199]  ahci_pci_driver_init+0x20/0x28
> [   23.639372]  do_one_initcall+0x5c/0x1b0
> [   23.643199]  kernel_init_freeable+0x1a4/0x24c
> [   23.647546]  kernel_init+0x10/0x108
> [   23.651023]  ret_from_fork+0x10/0x18
> [   23.654590] ---[ end trace 634a14b675b71c13 ]---
>
> I'll dig deeper if I get a chance.
>

So it seems that problem comes from the way we async probe the ata_port.

What I find is that at the point when we async probe the ata_port, with 
CONFIG_DEBUG_TEST_DRIVER_REMOVE set we start to detach the driver and 
tear everything down before the probe runs.

There's no synchronisation in detach to ensure that the probe has 
actually completed, and when it does, use-after-free, WARNs, etc...

This change fixes the issue for me:

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 28c492be0a57..ff3d4ab81818 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -6708,6 +6708,8 @@ void ata_host_detach(struct ata_host *host)
  {
         int i;

+       async_synchronize_full();
+
         for (i = 0; i < host->n_ports; i++)
                 ata_port_detach(host->ports[i]);



> John
>
>> Thanks,



^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2019-10-15 16:18 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-14 18:02 ahci KASAN warning in experimental arm64 allmodconfig boot John Garry
2019-10-15 14:17 ` John Garry
2019-10-15 16:17   ` John Garry

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.