linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* use-after-free in bfq_idle_slice_timer()
@ 2019-05-08 19:50 Guenter Roeck
  2019-05-09 15:14 ` Paolo Valente
  0 siblings, 1 reply; 5+ messages in thread
From: Guenter Roeck @ 2019-05-08 19:50 UTC (permalink / raw)
  To: Paolo Valente; +Cc: Jens Axboe, linux-block, linux-kernel

Hi,

we have seen the attached use-after-free in bfq_idle_slice_timer()
while running reboot tests. The code in these tests includes all
bfq patches up to and including commit eed47d19d936 ("block, bfq:
fix use after free in bfq_bfqq_expire").

gdb points to the dereference of struct bfq_queue *bfqq in
bfq_bfqq_budget_timeout(). My suspicion is that the cleanup path
in bfq_put_queue() may not clear bfqd->in_service_queue, but I don't
understand the code well enough to be sure.

Any thoughts / comments ?

Thanks,
Guenter

---
[   25.307269] ==================================================================
[   25.314555] BUG: KASAN: use-after-free in bfq_idle_slice_timer+0x88/0x1d4
[   25.321359] Read of size 8 at addr fffffff089360440 by task swapper/0/0
[   25.327975] 
[   25.329487] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G S                4.19.38 #37
[   25.336974] Hardware name: <...> rev2 board (DT)
[   25.342378] Call trace:
[   25.344844]  dump_backtrace+0x0/0x358
[   25.348521]  show_stack+0x20/0x2c
[   25.351849]  dump_stack+0x130/0x19c
[   25.355353]  print_address_description+0x74/0x250
[   25.360068]  kasan_report+0x27c/0x2a0
[   25.363742]  __asan_report_load8_noabort+0x2c/0x38
[   25.368546]  bfq_idle_slice_timer+0x88/0x1d4
[   25.372829]  __hrtimer_run_queues+0x794/0xa34
[   25.377197]  hrtimer_interrupt+0x278/0x600
[   25.381310]  arch_timer_handler_phys+0x5c/0x6c
[   25.385767]  handle_percpu_devid_irq+0x2e4/0x754
[   25.390399]  __handle_domain_irq+0xd4/0x158
[   25.394593]  gic_handle_irq+0x208/0x260
[   25.398437]  el1_irq+0xb0/0x128
[   25.401591]  arch_cpu_idle+0x20c/0x548
[   25.405352]  do_idle+0x184/0x4dc
[   25.408590]  cpu_startup_entry+0x24/0x28
[   25.412526]  rest_init+0x114/0x148
[   25.415939]  start_kernel+0x4c8/0x5c4
[   25.419605] 
[   25.421105] Allocated by task 1430:
[   25.424606]  kasan_kmalloc+0xe0/0x1ac
[   25.428279]  kasan_slab_alloc+0x14/0x1c
[   25.432127]  kmem_cache_alloc+0x178/0x278
[   25.436149]  bfq_get_queue+0x160/0x650
[   25.439911]  bfq_get_bfqq_handle_split+0xcc/0x2fc
[   25.444627]  bfq_init_rq+0x254/0x18c0
[   25.448301]  bfq_insert_requests+0x5d0/0x1048
[   25.452669]  blk_mq_sched_insert_requests+0x130/0x204
[   25.457734]  blk_mq_flush_plug_list+0x844/0x91c
[   25.462278]  blk_flush_plug_list+0x3e4/0x778
[   25.466559]  blk_finish_plug+0x54/0x78
[   25.470322]  read_pages+0x294/0x2f0
[   25.473824]  __do_page_cache_readahead+0x1a4/0x354
[   25.478628]  filemap_fault+0x8ec/0xbb4
[   25.482389]  ext4_filemap_fault+0x84/0xa4
[   25.486409]  __do_fault+0x128/0x338
[   25.489909]  handle_mm_fault+0x1de0/0x2588
[   25.494017]  do_page_fault+0x464/0x8d8
[   25.497777]  do_translation_fault+0x6c/0x88
[   25.501969]  do_mem_abort+0xd8/0x2d0
[   25.505554]  do_el0_ia_bp_hardening+0x13c/0x1a8
[   25.510094]  el0_ia+0x18/0x1c
[   25.513065] 
[   25.514562] Freed by task 1430:
[   25.517715]  __kasan_slab_free+0x13c/0x21c
[   25.521821]  kasan_slab_free+0x10/0x1c
[   25.525582]  kmem_cache_free+0x7c/0x5f8
[   25.529429]  bfq_put_queue+0x19c/0x2e4
[   25.533191]  bfq_exit_icq_bfqq+0x108/0x228
[   25.537299]  bfq_exit_icq+0x20/0x38
[   25.540798]  ioc_exit_icq+0xe4/0x16c
[   25.544384]  put_io_context_active+0x174/0x234
[   25.548836]  exit_io_context+0x84/0x94
[   25.552599]  do_exit+0x13b4/0x18e4
[   25.556013]  do_group_exit+0x1cc/0x204
[   25.559775]  __wake_up_parent+0x0/0x5c
[   25.563537]  __se_sys_exit_group+0x0/0x24
[   25.567558]  el0_svc_common+0x124/0x1ec
[   25.571407]  el0_svc_compat_handler+0x84/0xb0
[   25.575774]  el0_svc_compat+0x8/0x18
[   25.579351] 
[   25.580854] The buggy address belongs to the object at fffffff089360338
[   25.580854]  which belongs to the cache bfq_queue of size 464
[   25.587350] cros-ec-spi spi2.0: SPI transfer timed out
[   25.593209] The buggy address is located 264 bytes inside of
[   25.593209]  464-byte region [fffffff089360338, fffffff089360508)
[   25.593216] The buggy address belongs to the page:
[   25.593234] page:ffffffbfc224d800 count:1 mapcount:0 mapping:fffffff09916d880 index:0xfffffff089360cc8
[   25.598388] cros-ec-spi spi2.0: spi transfer failed: -110
[   25.610092]  compound_mapcount: 0
[   25.610109] flags: 0x4000000000008100(slab|head)
[   25.610133] raw: 4000000000008100 ffffffbfc14c1908 fffffff0991668a0 fffffff09916d880
[   25.615021] cros-ec-spi spi2.0: Command xfer error (err:-110)
[   25.624225] raw: fffffff089360cc8 000000000014000d 00000001ffffffff 0000000000000000
[   25.624233] page dumped because: kasan: bad access detected
[   25.624237] 
[   25.624243] Memory state around the buggy address:
[   25.624257]  fffffff089360300: fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb fb
[   25.624271]  fffffff089360380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   25.630223] cros-ec-i2c-tunnel 11012000.spi:ec@0:i2c-tunnel: Error transferring EC i2c message -110
[   25.632977] >fffffff089360400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   25.632985]                                            ^
[   25.632997]  fffffff089360480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   25.633010]  fffffff089360500: fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   25.633016] ==================================================================


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: use-after-free in bfq_idle_slice_timer()
  2019-05-08 19:50 use-after-free in bfq_idle_slice_timer() Guenter Roeck
@ 2019-05-09 15:14 ` Paolo Valente
  2019-05-09 16:17   ` Guenter Roeck
  0 siblings, 1 reply; 5+ messages in thread
From: Paolo Valente @ 2019-05-09 15:14 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: Jens Axboe, linux-block, linux-kernel

I couldn't find any explanation.  Would you be willing to apply a
patch that adds runtime checks in bfq?  It'd take me a few days to
prepare it.

Thanks,
Paolo

> Il giorno 8 mag 2019, alle ore 21:50, Guenter Roeck <linux@roeck-us.net> ha scritto:
> 
> Hi,
> 
> we have seen the attached use-after-free in bfq_idle_slice_timer()
> while running reboot tests. The code in these tests includes all
> bfq patches up to and including commit eed47d19d936 ("block, bfq:
> fix use after free in bfq_bfqq_expire").
> 
> gdb points to the dereference of struct bfq_queue *bfqq in
> bfq_bfqq_budget_timeout(). My suspicion is that the cleanup path
> in bfq_put_queue() may not clear bfqd->in_service_queue, but I don't
> understand the code well enough to be sure.
> 
> Any thoughts / comments ?
> 
> Thanks,
> Guenter
> 
> [ KASAN report snipped; identical to the log in the first message ]
> 


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: use-after-free in bfq_idle_slice_timer()
  2019-05-09 15:14 ` Paolo Valente
@ 2019-05-09 16:17   ` Guenter Roeck
  2019-05-15 18:20     ` Paolo Valente
  0 siblings, 1 reply; 5+ messages in thread
From: Guenter Roeck @ 2019-05-09 16:17 UTC (permalink / raw)
  To: Paolo Valente; +Cc: Jens Axboe, linux-block, linux-kernel

Hi Paolo,

On Thu, May 09, 2019 at 05:14:57PM +0200, Paolo Valente wrote:
> I couldn't find any explanation.  Would you be willing to apply a
> patch that adds runtime checks in bfq?  It'd take me a few days to
> prepare it.
> 

Sure, we can do that.

Thanks,
Guenter

> Thanks,
> Paolo
> 
> > Il giorno 8 mag 2019, alle ore 21:50, Guenter Roeck <linux@roeck-us.net> ha scritto:
> > 
> > Hi,
> > 
> > we have seen the attached use-after-free in bfq_idle_slice_timer()
> > while running reboot tests. The code in these tests includes all
> > bfq patches up to and including commit eed47d19d936 ("block, bfq:
> > fix use after free in bfq_bfqq_expire").
> > 
> > gdb points to the dereference of struct bfq_queue *bfqq in
> > bfq_bfqq_budget_timeout(). My suspicion is that the cleanup path
> > in bfq_put_queue() may not clear bfqd->in_service_queue, but I don't
> > understand the code well enough to be sure.
> > 
> > Any thoughts / comments ?
> > 
> > Thanks,
> > Guenter
> > 
> > [ KASAN report snipped; identical to the log in the first message ]
> > 
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: use-after-free in bfq_idle_slice_timer()
  2019-05-09 16:17   ` Guenter Roeck
@ 2019-05-15 18:20     ` Paolo Valente
  2019-05-16  0:15       ` Guenter Roeck
  0 siblings, 1 reply; 5+ messages in thread
From: Paolo Valente @ 2019-05-15 18:20 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: Jens Axboe, linux-block, linux-kernel


[-- Attachment #1.1: Type: text/plain, Size: 758 bytes --]



> Il giorno 9 mag 2019, alle ore 18:17, Guenter Roeck <linux@roeck-us.net> ha scritto:
> 
> Hi Paolo,
> 
> On Thu, May 09, 2019 at 05:14:57PM +0200, Paolo Valente wrote:
>> I couldn't find any explanation.  Would you be willing to apply a
>> patch that adds runtime checks in bfq?  It'd take me a few days to
>> prepare it.
>> 
> 
> Sure, we can do that.
> 

Ok, (compressed) patch attached. It applies cleanly on top of current
mainline HEAD, so it should apply on top of your branch too. I'm sorry,
but I couldn't test the patch thoroughly, because the current mainline
kernel systematically corrupts the fs on my main test machine. If the
patch does introduce unwanted crashes, I'll fix it.

Looking forward to your feedback,
Paolo


[-- Attachment #1.2: 0001-block-bfq-add-logs-and-BUG_ONs.patch.gz --]
[-- Type: application/x-gzip, Size: 27287 bytes --]

[-- Attachment #1.3: Type: text/plain, Size: 5968 bytes --]



> Thanks,
> Guenter
> 
>> Thanks,
>> Paolo
>> 
>>> Il giorno 8 mag 2019, alle ore 21:50, Guenter Roeck <linux@roeck-us.net> ha scritto:
>>> 
>>> Hi,
>>> 
>>> we have seen the attached use-after-free in bfq_idle_slice_timer()
>>> while running reboot tests. The code in these tests includes all
>>> bfq patches up to and including commit eed47d19d936 ("block, bfq:
>>> fix use after free in bfq_bfqq_expire").
>>> 
>>> gdb points to the dereference of struct bfq_queue *bfqq in
>>> bfq_bfqq_budget_timeout(). My suspicion is that the cleanup path
>>> in bfq_put_queue() may not clear bfqd->in_service_queue, but I don't
>>> understand the code well enough to be sure.
>>> 
>>> Any thoughts / comments ?
>>> 
>>> Thanks,
>>> Guenter
>>> 
>>> [ KASAN report snipped; identical to the log in the first message ]
>>> 
>> 


[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: use-after-free in bfq_idle_slice_timer()
  2019-05-15 18:20     ` Paolo Valente
@ 2019-05-16  0:15       ` Guenter Roeck
  0 siblings, 0 replies; 5+ messages in thread
From: Guenter Roeck @ 2019-05-16  0:15 UTC (permalink / raw)
  To: Paolo Valente; +Cc: Jens Axboe, linux-block, linux-kernel

Hi Paolo,

On 5/15/19 11:20 AM, Paolo Valente wrote:
> 
> 
>> Il giorno 9 mag 2019, alle ore 18:17, Guenter Roeck <linux@roeck-us.net> ha scritto:
>>
>> Hi Paolo,
>>
>> On Thu, May 09, 2019 at 05:14:57PM +0200, Paolo Valente wrote:
>>> I couldn't find any explanation.  Would you be willing to apply a
>>> patch that adds runtime checks in bfq?  It'd take me a few days to
>>> prepare it.
>>>
>>
>> Sure, we can do that.
>>
> 
> Ok, (compressed) patch attached. It applies cleanly on top of current
> mainline HEAD, so it should apply on top of your branch too. I'm sorry,
> but I couldn't test the patch thoroughly, because the current mainline
> kernel systematically corrupts the fs on my main test machine. If the
> patch does introduce unwanted crashes, I'll fix it.
> 
Ouch.

> Looking forward to your feedback,

Unfortunately I'll be off on vacation starting tomorrow and all next week.
I'll only be able to look into this after I am back.

Guenter

> Paolo
> 
> 
> 
> 
>> Thanks,
>> Guenter
>>
>>> Thanks,
>>> Paolo
>>>
>>>> Il giorno 8 mag 2019, alle ore 21:50, Guenter Roeck <linux@roeck-us.net> ha scritto:
>>>>
>>>> Hi,
>>>>
>>>> we have seen the attached use-after-free in bfq_idle_slice_timer()
>>>> while running reboot tests. The code in these tests includes all
>>>> bfq patches up to and including commit eed47d19d936 ("block, bfq:
>>>> fix use after free in bfq_bfqq_expire").
>>>>
>>>> gdb points to the dereference of struct bfq_queue *bfqq in
>>>> bfq_bfqq_budget_timeout(). My suspicion is that the cleanup path
>>>> in bfq_put_queue() may not clear bfqd->in_service_queue, but I don't
>>>> understand the code well enough to be sure.
>>>>
>>>> Any thoughts / comments ?
>>>>
>>>> Thanks,
>>>> Guenter
>>>>
>>>> [ KASAN report snipped; identical to the log in the first message ]
>>>>
>>>
> 


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-05-16  1:48 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-08 19:50 use-after-free in bfq_idle_slice_timer() Guenter Roeck
2019-05-09 15:14 ` Paolo Valente
2019-05-09 16:17   ` Guenter Roeck
2019-05-15 18:20     ` Paolo Valente
2019-05-16  0:15       ` Guenter Roeck

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).