All of lore.kernel.org
 help / color / mirror / Atom feed
* hung task in 4.14 (syzbot bug from 2018 April 17)
@ 2018-10-03 23:29 Sebastian Kuzminsky
  2018-10-05 21:56 ` Sebastian Kuzminsky
  0 siblings, 1 reply; 2+ messages in thread
From: Sebastian Kuzminsky @ 2018-10-03 23:29 UTC (permalink / raw)
  To: linux-kernel

I think i've run into the bug described here:

https://lkml.org/lkml/2018/4/18/188

I've got a binder-free system that reports a hung task with this backtrace:

[76800.726654] INFO: task systemd:1 blocked for more than 60 seconds.
[76800.726657]       Tainted: G           OE   4.14.67-solidfire1 #1
[76800.726657] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[76800.726659] systemd         D    0     1      0 0x00000000
[76800.726662] Call Trace:
[76800.726673]  ? __schedule+0x27f/0x870
[76800.726676]  schedule+0x28/0x80
[76800.726679]  schedule_timeout+0x1e7/0x340
[76800.726685]  ? check_preempt_wakeup+0x102/0x230
[76800.726687]  ? wait_for_completion+0xb0/0x120
[76800.726689]  wait_for_completion+0xb0/0x120
[76800.726693]  ? wake_up_q+0x70/0x70
[76800.726698]  flush_work+0x10d/0x1c0
[76800.726700]  ? worker_detach_from_pool+0xa0/0xa0
[76800.726706]  fsnotify_destroy_group+0x34/0xa0
[76800.726708]  ? SyS_epoll_ctl+0x1d4/0xe50
[76800.726710]  inotify_release+0x1a/0x50
[76800.726714]  __fput+0xd8/0x220
[76800.726717]  task_work_run+0x8a/0xb0
[76800.726721]  exit_to_usermode_loop+0xb9/0xc0
[76800.726723]  do_syscall_64+0x10b/0x120
[76800.726727]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[76800.726730] RIP: 0033:0x7fb6957ff900
[76800.726731] RSP: 002b:00007ffc685fdd60 EFLAGS: 00000293 ORIG_RAX:
0000000000000003
[76800.726733] RAX: 0000000000000000 RBX: 0000000000000012 RCX: 00007fb6957ff900
[76800.726735] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000012
[76800.726736] RBP: 00007fb697167088 R08: 000055ae6c9224c0 R09: 000055ae6ace92ad
[76800.726737] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
[76800.726738] R13: 0000000000000000 R14: 0000000000079de4 R15: 0000000000000000
[76800.727130] INFO: task kworker/u113:1:29214 blocked for more than 60 seconds.
[76800.727132]       Tainted: G           OE   4.14.67-solidfire1 #1
[76800.727132] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[76800.727134] kworker/u113:1  D    0 29214      2 0x80000000
[76800.727139] Workqueue: events_unbound fsnotify_mark_destroy_workfn
[76800.727141] Call Trace:
[76800.727144]  ? __schedule+0x27f/0x870
[76800.727146]  schedule+0x28/0x80
[76800.727148]  schedule_timeout+0x1e7/0x340
[76800.727151]  ? __switch_to_asm+0x40/0x70
[76800.727153]  ? update_curr+0xe1/0x1a0
[76800.727156]  ? wait_for_completion+0xb0/0x120
[76800.727157]  wait_for_completion+0xb0/0x120
[76800.727160]  ? wake_up_q+0x70/0x70
[76800.727164]  __synchronize_srcu.part.13+0x76/0x90
[76800.727167]  ? trace_raw_output_rcu_utilization+0x40/0x40
[76800.727169]  ? try_to_wake_up+0x44/0x460
[76800.727172]  ? fsnotify_mark_destroy_workfn+0x67/0xb0
[76800.727174]  fsnotify_mark_destroy_workfn+0x67/0xb0
[76800.727177]  process_one_work+0x1da/0x3d0
[76800.727179]  worker_thread+0x21f/0x3f0
[76800.727181]  ? process_one_work+0x3d0/0x3d0
[76800.727184]  kthread+0x119/0x130
[76800.727186]  ? kthread_create_on_node+0x40/0x40
[76800.727188]  ret_from_fork+0x35/0x40

The kernel is a stock 4.14.67, plus some minor local patches mostly
related to fibre channel, which i believe is not implicated here.

I have a crash dump of this failure, the reaper_work struct has these contents:

        crash> print reaper_work
        $2 = {
          work = {
            data = {
              counter = -108686497013755
            },
            entry = {
              next = 0xffff9d2671a395f0,
              prev = 0xffffb2624006fdf8
            },
            func = 0xffffffffb5249df0
          },
          timer = {
            entry = {
              next = 0xdead000000000200,
              pprev = 0x0
            },
            expires = 4302608557,
            function = 0xffffffffb50778c0,
            data = 18446744072468618080,
            flags = 195035137
          },
          wq = 0xffff9d2680411400,
          cpu = 128
        }

I'd appreciate help or pointers on how to debug and fix this problem.


-- 
Sebastian Kuzminsky

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: hung task in 4.14 (syzbot bug from 2018 April 17)
  2018-10-03 23:29 hung task in 4.14 (syzbot bug from 2018 April 17) Sebastian Kuzminsky
@ 2018-10-05 21:56 ` Sebastian Kuzminsky
  0 siblings, 0 replies; 2+ messages in thread
From: Sebastian Kuzminsky @ 2018-10-05 21:56 UTC (permalink / raw)
  To: linux-kernel

Another instance of the same problem (also on 4.14.67), this time with
the hung task timeout restored to its 120 second default:

[  980.954129] INFO: task systemd:1 blocked for more than 120 seconds.
[  980.954132]       Tainted: G           OE   4.14.67-solidfire1 #1
[  980.954133] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  980.954135] systemd         D    0     1      0 0x00000000
[  980.954137] Call Trace:
[  980.954150]  ? __schedule+0x27f/0x870
[  980.954154]  ? check_preempt_wakeup+0x102/0x230
[  980.954156]  schedule+0x28/0x80
[  980.954159]  schedule_timeout+0x1e7/0x340
[  980.954163]  ? radix_tree_node_ctor+0x20/0x20
[  980.954165]  ? delete_node+0x1b8/0x1f0
[  980.954167]  ? wait_for_completion+0xb0/0x120
[  980.954169]  wait_for_completion+0xb0/0x120
[  980.954172]  ? wake_up_q+0x70/0x70
[  980.954176]  flush_work+0x10d/0x1c0
[  980.954179]  ? worker_detach_from_pool+0xa0/0xa0
[  980.954183]  fsnotify_destroy_group+0x34/0xa0
[  980.954185]  inotify_release+0x1a/0x50
[  980.954189]  __fput+0xd8/0x220
[  980.954194]  task_work_run+0x8a/0xb0
[  980.954198]  exit_to_usermode_loop+0xb9/0xc0
[  980.954200]  do_syscall_64+0x10b/0x120
[  980.954203]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[  980.954206] RIP: 0033:0x7f031a552900
[  980.954208] RSP: 002b:00007fff93fd18b0 EFLAGS: 00000293 ORIG_RAX:
0000000000000003
[  980.954210] RAX: 0000000000000000 RBX: 0000000000000010 RCX: 00007f031a552900
[  980.954211] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000010
[  980.954212] RBP: 00007f031beba088 R08: 0000000000000020 R09: 000055ac5a21e2ad
[  980.954213] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
[  980.954214] R13: 000055ac5a488b20 R14: 000055ac5c0907f0 R15: 0000000000000000
[  980.954724] INFO: task kworker/u114:4:41909 blocked for more than
120 seconds.
[  980.954725]       Tainted: G           OE   4.14.67-solidfire1 #1
[  980.954726] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  980.954727] kworker/u114:4  D    0 41909      2 0x80000000
[  980.954733] Workqueue: events_unbound fsnotify_mark_destroy_workfn
[  980.954734] Call Trace:
[  980.954739]  ? __schedule+0x27f/0x870
[  980.954741]  schedule+0x28/0x80
[  980.954743]  schedule_timeout+0x1e7/0x340
[  980.954746]  ? __switch_to_asm+0x40/0x70
[  980.954747]  ? update_curr+0xe1/0x1a0
[  980.954750]  ? wait_for_completion+0xb0/0x120
[  980.954751]  wait_for_completion+0xb0/0x120
[  980.954753]  ? wake_up_q+0x70/0x70
[  980.954758]  __synchronize_srcu.part.13+0x76/0x90
[  980.954761]  ? trace_raw_output_rcu_utilization+0x40/0x40
[  980.954762]  ? try_to_wake_up+0x44/0x460
[  980.954765]  ? fsnotify_mark_destroy_workfn+0x67/0xb0
[  980.954767]  fsnotify_mark_destroy_workfn+0x67/0xb0
[  980.954769]  process_one_work+0x1da/0x3d0
[  980.954772]  worker_thread+0x21f/0x3f0
[  980.954775]  ? process_one_work+0x3d0/0x3d0
[  980.954777]  kthread+0x119/0x130
[  980.954779]  ? kthread_create_on_node+0x40/0x40
[  980.954781]  ret_from_fork+0x35/0x40

crash> print reaper_work
$3 = {
  work = {
    data = {
      counter = -117148584927483
    },
    entry = {
      next = 0xffff95743772b770,
      prev = 0xffffa2aa800a3df8
    },
    func = 0xffffffffa8249df0
  },
  timer = {
    entry = {
      next = 0xdead000000000200,
      pprev = 0x0
    },
    expires = 4295020559,
    function = 0xffffffffa80778c0,
    data = 18446744072250514272,
    flags = 69206020
  },
  wq = 0xffff95443f011400,
  cpu = 128
}

Any input is welcome.

-- 
Sebastian Kuzminsky

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2018-10-05 21:56 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-03 23:29 hung task in 4.14 (syzbot bug from 2018 April 17) Sebastian Kuzminsky
2018-10-05 21:56 ` Sebastian Kuzminsky

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.