* [PATCH] locking/hung_task: Show all hung tasks before panic @ 2018-04-02 14:12 Tetsuo Handa 2018-04-02 15:35 ` Paul E. McKenney 0 siblings, 1 reply; 12+ messages in thread From: Tetsuo Handa @ 2018-04-02 14:12 UTC (permalink / raw) To: linux-kernel Cc: Tetsuo Handa, Andrew Morton, Ingo Molnar, Linus Torvalds, Mandeep Singh Baines, Paul E. McKenney, Peter Zijlstra, Thomas Gleixner, Vegard Nossum When we get a hung task it can often be valuable to see _all_ the hung tasks on the system before calling panic(). Quoting from https://syzkaller.appspot.com/text?tag=CrashReport&id=5412451675799552 ---------------------------------------- INFO: task syz-executor3:13421 blocked for more than 120 seconds. Not tainted 4.16.0-rc7+ #9 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. syz-executor3 D24672 13421 4481 0x00000004 Call Trace: context_switch kernel/sched/core.c:2862 [inline] __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440 schedule+0xf5/0x430 kernel/sched/core.c:3499 __rwsem_down_read_failed_common kernel/locking/rwsem-xadd.c:269 [inline] rwsem_down_read_failed+0x401/0x6e0 kernel/locking/rwsem-xadd.c:286 call_rwsem_down_read_failed+0x18/0x30 arch/x86/lib/rwsem.S:94 __down_read arch/x86/include/asm/rwsem.h:83 [inline] down_read+0xa4/0x150 kernel/locking/rwsem.c:26 __get_super.part.9+0x1d3/0x280 fs/super.c:663 __get_super include/linux/spinlock.h:310 [inline] get_super+0x2d/0x40 fs/super.c:692 fsync_bdev+0x19/0x80 fs/block_dev.c:468 invalidate_partition+0x35/0x60 block/genhd.c:1566 drop_partitions.isra.12+0xcd/0x1d0 block/partition-generic.c:440 rescan_partitions+0x72/0x900 block/partition-generic.c:513 __blkdev_reread_part+0x15f/0x1e0 block/ioctl.c:173 blkdev_reread_part+0x26/0x40 block/ioctl.c:193 loop_reread_partitions+0x12f/0x1a0 drivers/block/loop.c:619 loop_set_status+0x9bb/0xf60 drivers/block/loop.c:1161 loop_set_status64+0x9d/0x110 drivers/block/loop.c:1271 lo_ioctl+0xd86/0x1b70 drivers/block/loop.c:1381 (...snipped...) 
Showing all locks held in the system: (...snipped...) 3 locks held by syz-executor3/13421: #0: (&lo->lo_ctl_mutex/1){+.+.}, at: [<00000000834f78af>] lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355 /* mutex_lock_nested(&lo->lo_ctl_mutex, 1); */ #1: (&bdev->bd_mutex){+.+.}, at: [<0000000003605603>] blkdev_reread_part+0x1e/0x40 block/ioctl.c:192 #2: (&type->s_umount_key#77){.+.+}, at: [<0000000077701649>] __get_super.part.9+0x1d3/0x280 fs/super.c:663 /* down_read(&sb->s_umount); */ (...snipped...) 2 locks held by syz-executor0/13428: #0: (&type->s_umount_key#76/1){+.+.}, at: [<00000000d25ba33a>] alloc_super fs/super.c:211 [inline] #0: (&type->s_umount_key#76/1){+.+.}, at: [<00000000d25ba33a>] sget_userns+0x3a1/0xe40 fs/super.c:502 /* down_write_nested(&s->s_umount, SINGLE_DEPTH_NESTING); */ #1: (&lo->lo_ctl_mutex/1){+.+.}, at: [<00000000834f78af>] lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355 /* mutex_lock_nested(&lo->lo_ctl_mutex, 1); */ ---------------------------------------- In addition to showing hashed address of lock instances, it would be nice if trace of 13428 is printed as well as 13421. Showing hung tasks up to /proc/sys/kernel/hung_task_warnings could delay calling panic() but normally there should not be so many hung tasks. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mandeep Singh Baines <msb@chromium.org> Cc: Paul E. 
McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> --- kernel/hung_task.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/kernel/hung_task.c b/kernel/hung_task.c index 751593e..32b4794 100644 --- a/kernel/hung_task.c +++ b/kernel/hung_task.c @@ -44,6 +44,7 @@ static int __read_mostly did_panic; static bool hung_task_show_lock; +static bool hung_task_call_panic; static struct task_struct *watchdog_task; @@ -127,10 +128,8 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout) touch_nmi_watchdog(); if (sysctl_hung_task_panic) { - if (hung_task_show_lock) - debug_show_all_locks(); - trigger_all_cpu_backtrace(); - panic("hung_task: blocked tasks"); + hung_task_show_lock = true; + hung_task_call_panic = true; } } @@ -193,6 +192,10 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout) rcu_read_unlock(); if (hung_task_show_lock) debug_show_all_locks(); + if (hung_task_call_panic) { + trigger_all_cpu_backtrace(); + panic("hung_task: blocked tasks"); + } } static long hung_timeout_jiffies(unsigned long last_checked, -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 12+ messages in thread
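[Editorial illustration — not part of the original thread.] The behavioral change in the patch above can be made concrete with a small userspace sketch (Python, with invented task data; this is a model of the control flow, not the kernel code). The old code panicked inside the per-task check on the first hung task; the patch only records that a panic is due, so the scan finishes and every hung task is reported first:

```python
# Model of check_hung_uninterruptible_tasks() after the patch: defer the
# panic until the whole task list has been walked. Task names are invented.
def scan_for_hung_tasks(tasks, hung_task_panic=True):
    reported = []
    call_panic = False
    for name, is_hung in tasks:
        if not is_hung:
            continue
        reported.append(name)     # stands in for sched_show_task(t)
        if hung_task_panic:
            call_panic = True     # the old code called panic() right here
    # debug_show_all_locks(), trigger_all_cpu_backtrace() and panic()
    # now run only after the loop, once all hung tasks are reported.
    return reported, call_panic

tasks = [("A", False), ("B", True), ("C", True), ("D", True), ("E", False)]
reported, call_panic = scan_for_hung_tasks(tasks)
```

With the old ordering only "B" would have been shown before panic(); with the deferred flag all three hung tasks are.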
* Re: [PATCH] locking/hung_task: Show all hung tasks before panic 2018-04-02 14:12 [PATCH] locking/hung_task: Show all hung tasks before panic Tetsuo Handa @ 2018-04-02 15:35 ` Paul E. McKenney 2018-04-02 21:09 ` Tetsuo Handa 2018-04-03 15:23 ` Dmitry Vyukov 0 siblings, 2 replies; 12+ messages in thread From: Paul E. McKenney @ 2018-04-02 15:35 UTC (permalink / raw) To: Tetsuo Handa Cc: linux-kernel, Andrew Morton, Ingo Molnar, Linus Torvalds, Mandeep Singh Baines, Peter Zijlstra, Thomas Gleixner, Vegard Nossum On Mon, Apr 02, 2018 at 11:12:04PM +0900, Tetsuo Handa wrote: > When we get a hung task it can often be valuable to see _all_ the hung > tasks on the system before calling panic(). > > Quoting from https://syzkaller.appspot.com/text?tag=CrashReport&id=5412451675799552 > ---------------------------------------- > INFO: task syz-executor3:13421 blocked for more than 120 seconds. > Not tainted 4.16.0-rc7+ #9 > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
> syz-executor3 D24672 13421 4481 0x00000004 > Call Trace: > context_switch kernel/sched/core.c:2862 [inline] > __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440 > schedule+0xf5/0x430 kernel/sched/core.c:3499 > __rwsem_down_read_failed_common kernel/locking/rwsem-xadd.c:269 [inline] > rwsem_down_read_failed+0x401/0x6e0 kernel/locking/rwsem-xadd.c:286 > call_rwsem_down_read_failed+0x18/0x30 arch/x86/lib/rwsem.S:94 > __down_read arch/x86/include/asm/rwsem.h:83 [inline] > down_read+0xa4/0x150 kernel/locking/rwsem.c:26 > __get_super.part.9+0x1d3/0x280 fs/super.c:663 > __get_super include/linux/spinlock.h:310 [inline] > get_super+0x2d/0x40 fs/super.c:692 > fsync_bdev+0x19/0x80 fs/block_dev.c:468 > invalidate_partition+0x35/0x60 block/genhd.c:1566 > drop_partitions.isra.12+0xcd/0x1d0 block/partition-generic.c:440 > rescan_partitions+0x72/0x900 block/partition-generic.c:513 > __blkdev_reread_part+0x15f/0x1e0 block/ioctl.c:173 > blkdev_reread_part+0x26/0x40 block/ioctl.c:193 > loop_reread_partitions+0x12f/0x1a0 drivers/block/loop.c:619 > loop_set_status+0x9bb/0xf60 drivers/block/loop.c:1161 > loop_set_status64+0x9d/0x110 drivers/block/loop.c:1271 > lo_ioctl+0xd86/0x1b70 drivers/block/loop.c:1381 > (...snipped...) > Showing all locks held in the system: > (...snipped...) > 3 locks held by syz-executor3/13421: > #0: (&lo->lo_ctl_mutex/1){+.+.}, at: [<00000000834f78af>] lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355 /* mutex_lock_nested(&lo->lo_ctl_mutex, 1); */ > #1: (&bdev->bd_mutex){+.+.}, at: [<0000000003605603>] blkdev_reread_part+0x1e/0x40 block/ioctl.c:192 > #2: (&type->s_umount_key#77){.+.+}, at: [<0000000077701649>] __get_super.part.9+0x1d3/0x280 fs/super.c:663 /* down_read(&sb->s_umount); */ > (...snipped...) 
> 2 locks held by syz-executor0/13428: > #0: (&type->s_umount_key#76/1){+.+.}, at: [<00000000d25ba33a>] alloc_super fs/super.c:211 [inline] > #0: (&type->s_umount_key#76/1){+.+.}, at: [<00000000d25ba33a>] sget_userns+0x3a1/0xe40 fs/super.c:502 /* down_write_nested(&s->s_umount, SINGLE_DEPTH_NESTING); */ > #1: (&lo->lo_ctl_mutex/1){+.+.}, at: [<00000000834f78af>] lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355 /* mutex_lock_nested(&lo->lo_ctl_mutex, 1); */ > ---------------------------------------- > > In addition to showing hashed address of lock instances, it would be > nice if trace of 13428 is printed as well as 13421. > > Showing hung tasks up to /proc/sys/kernel/hung_task_warnings could delay > calling panic() but normally there should not be so many hung tasks. > > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > Cc: Vegard Nossum <vegard.nossum@oracle.com> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Linus Torvalds <torvalds@linux-foundation.org> > Cc: Mandeep Singh Baines <msb@chromium.org> > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> I just know that I am going to regret this the first time this happens on a low-speed console port, but... Acked-by: Paul E. 
McKenney <paulmck@linux.vnet.ibm.com> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Ingo Molnar <mingo@kernel.org> > --- > kernel/hung_task.c | 11 +++++++---- > 1 file changed, 7 insertions(+), 4 deletions(-) > > diff --git a/kernel/hung_task.c b/kernel/hung_task.c > index 751593e..32b4794 100644 > --- a/kernel/hung_task.c > +++ b/kernel/hung_task.c > @@ -44,6 +44,7 @@ > > static int __read_mostly did_panic; > static bool hung_task_show_lock; > +static bool hung_task_call_panic; > > static struct task_struct *watchdog_task; > > @@ -127,10 +128,8 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout) > touch_nmi_watchdog(); > > if (sysctl_hung_task_panic) { > - if (hung_task_show_lock) > - debug_show_all_locks(); > - trigger_all_cpu_backtrace(); > - panic("hung_task: blocked tasks"); > + hung_task_show_lock = true; > + hung_task_call_panic = true; > } > } > > @@ -193,6 +192,10 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout) > rcu_read_unlock(); > if (hung_task_show_lock) > debug_show_all_locks(); > + if (hung_task_call_panic) { > + trigger_all_cpu_backtrace(); > + panic("hung_task: blocked tasks"); > + } > } > > static long hung_timeout_jiffies(unsigned long last_checked, > -- > 1.8.3.1 > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] locking/hung_task: Show all hung tasks before panic 2018-04-02 15:35 ` Paul E. McKenney @ 2018-04-02 21:09 ` Tetsuo Handa 2018-04-03 15:23 ` Dmitry Vyukov 1 sibling, 0 replies; 12+ messages in thread From: Tetsuo Handa @ 2018-04-02 21:09 UTC (permalink / raw) To: paulmck Cc: linux-kernel, akpm, mingo, torvalds, msb, peterz, tglx, vegard.nossum Paul E. McKenney wrote: > On Mon, Apr 02, 2018 at 11:12:04PM +0900, Tetsuo Handa wrote: > > When we get a hung task it can often be valuable to see _all_ the hung > > tasks on the system before calling panic(). > > > > Quoting from https://syzkaller.appspot.com/text?tag=CrashReport&id=5412451675799552 > > ---------------------------------------- > > INFO: task syz-executor3:13421 blocked for more than 120 seconds. > > Not tainted 4.16.0-rc7+ #9 > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > > syz-executor3 D24672 13421 4481 0x00000004 > > Call Trace: > > context_switch kernel/sched/core.c:2862 [inline] > > __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440 > > schedule+0xf5/0x430 kernel/sched/core.c:3499 > > __rwsem_down_read_failed_common kernel/locking/rwsem-xadd.c:269 [inline] > > rwsem_down_read_failed+0x401/0x6e0 kernel/locking/rwsem-xadd.c:286 > > call_rwsem_down_read_failed+0x18/0x30 arch/x86/lib/rwsem.S:94 > > __down_read arch/x86/include/asm/rwsem.h:83 [inline] > > down_read+0xa4/0x150 kernel/locking/rwsem.c:26 > > __get_super.part.9+0x1d3/0x280 fs/super.c:663 > > __get_super include/linux/spinlock.h:310 [inline] > > get_super+0x2d/0x40 fs/super.c:692 > > fsync_bdev+0x19/0x80 fs/block_dev.c:468 > > invalidate_partition+0x35/0x60 block/genhd.c:1566 > > drop_partitions.isra.12+0xcd/0x1d0 block/partition-generic.c:440 > > rescan_partitions+0x72/0x900 block/partition-generic.c:513 > > __blkdev_reread_part+0x15f/0x1e0 block/ioctl.c:173 > > blkdev_reread_part+0x26/0x40 block/ioctl.c:193 > > loop_reread_partitions+0x12f/0x1a0 drivers/block/loop.c:619 > > 
loop_set_status+0x9bb/0xf60 drivers/block/loop.c:1161 > > loop_set_status64+0x9d/0x110 drivers/block/loop.c:1271 > > lo_ioctl+0xd86/0x1b70 drivers/block/loop.c:1381 > > (...snipped...) > > Showing all locks held in the system: > > (...snipped...) > > 3 locks held by syz-executor3/13421: > > #0: (&lo->lo_ctl_mutex/1){+.+.}, at: [<00000000834f78af>] lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355 /* mutex_lock_nested(&lo->lo_ctl_mutex, 1); */ > > #1: (&bdev->bd_mutex){+.+.}, at: [<0000000003605603>] blkdev_reread_part+0x1e/0x40 block/ioctl.c:192 > > #2: (&type->s_umount_key#77){.+.+}, at: [<0000000077701649>] __get_super.part.9+0x1d3/0x280 fs/super.c:663 /* down_read(&sb->s_umount); */ > > (...snipped...) > > 2 locks held by syz-executor0/13428: > > #0: (&type->s_umount_key#76/1){+.+.}, at: [<00000000d25ba33a>] alloc_super fs/super.c:211 [inline] > > #0: (&type->s_umount_key#76/1){+.+.}, at: [<00000000d25ba33a>] sget_userns+0x3a1/0xe40 fs/super.c:502 /* down_write_nested(&s->s_umount, SINGLE_DEPTH_NESTING); */ > > #1: (&lo->lo_ctl_mutex/1){+.+.}, at: [<00000000834f78af>] lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355 /* mutex_lock_nested(&lo->lo_ctl_mutex, 1); */ > > ---------------------------------------- > > > > In addition to showing hashed address of lock instances, it would be > > nice if trace of 13428 is printed as well as 13421. > > > > Showing hung tasks up to /proc/sys/kernel/hung_task_warnings could delay > > calling panic() but normally there should not be so many hung tasks. > > > > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > > Cc: Vegard Nossum <vegard.nossum@oracle.com> > > Cc: Andrew Morton <akpm@linux-foundation.org> > > Cc: Linus Torvalds <torvalds@linux-foundation.org> > > Cc: Mandeep Singh Baines <msb@chromium.org> > > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > > I just know that I am going to regret this the first time this happens > on a low-speed console port, but... > > Acked-by: Paul E. 
McKenney <paulmck@linux.vnet.ibm.com> > Thank you. Currently sysctl_hung_task_panic case can show only the first task which was blocked for more than hung_task_timeout_secs. Suppose "there were A, B, C, D, E, F on the task_list in that order", "B was blocked for e.g. 130 seconds due to C", and "C and D were blocked for e.g. 180 seconds due to deadlock". Showing B, C, D would be more helpful than showing only B (who is just a collateral victim of a problem between C and D). While vmcore would tell everything, it is not realistic to share vmcore of development kernels; different config / different compiler / different source / too large debuginfo / possibly confidential information etc... Thus, I hope we can get as much information as possible via printk(). > > Cc: Peter Zijlstra <peterz@infradead.org> > > Cc: Thomas Gleixner <tglx@linutronix.de> > > Cc: Ingo Molnar <mingo@kernel.org> > > --- > > kernel/hung_task.c | 11 +++++++---- > > 1 file changed, 7 insertions(+), 4 deletions(-) > > > > diff --git a/kernel/hung_task.c b/kernel/hung_task.c > > index 751593e..32b4794 100644 > > --- a/kernel/hung_task.c > > +++ b/kernel/hung_task.c > > @@ -44,6 +44,7 @@ > > > > static int __read_mostly did_panic; > > static bool hung_task_show_lock; > > +static bool hung_task_call_panic; > > > > static struct task_struct *watchdog_task; > > > > @@ -127,10 +128,8 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout) > > touch_nmi_watchdog(); > > > > if (sysctl_hung_task_panic) { > > - if (hung_task_show_lock) > > - debug_show_all_locks(); > > - trigger_all_cpu_backtrace(); > > - panic("hung_task: blocked tasks"); > > + hung_task_show_lock = true; > > + hung_task_call_panic = true; > > } > > } > > > > @@ -193,6 +192,10 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout) > > rcu_read_unlock(); > > if (hung_task_show_lock) > > debug_show_all_locks(); > > + if (hung_task_call_panic) { > > + trigger_all_cpu_backtrace(); > > + panic("hung_task: 
blocked tasks"); > > + } > > } > > > > static long hung_timeout_jiffies(unsigned long last_checked, > > -- > > 1.8.3.1 > > > > ^ permalink raw reply [flat|nested] 12+ messages in thread
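[Editorial illustration — not part of the original thread.] Tetsuo's point that B is only a collateral victim of the C/D deadlock can be sketched as a wait-for-graph walk (hypothetical task and lock names; this is a simplified model, not what lockdep actually computes):

```python
# Each task is (name, lock_it_holds, lock_it_wants). Following the
# wait-for chain from any hung task finds the cycle members (C, D) and
# shows that B merely waits on the cycle without being part of it.
def find_deadlock_cycle(tasks):
    holder = {held: name for name, held, _ in tasks if held}
    wants = {name: wanted for name, _, wanted in tasks}
    for name, _, _ in tasks:
        path = [name]
        cur = name
        while True:
            nxt = holder.get(wants.get(cur))
            if nxt is None:
                break
            if nxt in path:                  # chain loops back: a cycle
                return path[path.index(nxt):]
            path.append(nxt)
            cur = nxt
    return []

tasks = [("B", None, "L1"),   # B waits for L1 but holds nothing relevant
         ("C", "L1", "L2"),   # C holds L1, wants L2
         ("D", "L2", "L1")]   # D holds L2, wants L1 -> AB-BA deadlock
```

Reporting only B (the first task over the timeout) hides the C/D pair that actually explains the hang.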
* Re: [PATCH] locking/hung_task: Show all hung tasks before panic 2018-04-02 15:35 ` Paul E. McKenney 2018-04-02 21:09 ` Tetsuo Handa @ 2018-04-03 15:23 ` Dmitry Vyukov 2018-04-04 22:05 ` [PATCH v2] " Tetsuo Handa 1 sibling, 1 reply; 12+ messages in thread From: Dmitry Vyukov @ 2018-04-03 15:23 UTC (permalink / raw) To: Paul McKenney Cc: Tetsuo Handa, LKML, Andrew Morton, Ingo Molnar, Linus Torvalds, Mandeep Singh Baines, Peter Zijlstra, Thomas Gleixner, Vegard Nossum On Mon, Apr 2, 2018 at 5:35 PM, Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > On Mon, Apr 02, 2018 at 11:12:04PM +0900, Tetsuo Handa wrote: >> When we get a hung task it can often be valuable to see _all_ the hung >> tasks on the system before calling panic(). >> >> Quoting from https://syzkaller.appspot.com/text?tag=CrashReport&id=5412451675799552 >> ---------------------------------------- >> INFO: task syz-executor3:13421 blocked for more than 120 seconds. >> Not tainted 4.16.0-rc7+ #9 >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
>> syz-executor3 D24672 13421 4481 0x00000004 >> Call Trace: >> context_switch kernel/sched/core.c:2862 [inline] >> __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440 >> schedule+0xf5/0x430 kernel/sched/core.c:3499 >> __rwsem_down_read_failed_common kernel/locking/rwsem-xadd.c:269 [inline] >> rwsem_down_read_failed+0x401/0x6e0 kernel/locking/rwsem-xadd.c:286 >> call_rwsem_down_read_failed+0x18/0x30 arch/x86/lib/rwsem.S:94 >> __down_read arch/x86/include/asm/rwsem.h:83 [inline] >> down_read+0xa4/0x150 kernel/locking/rwsem.c:26 >> __get_super.part.9+0x1d3/0x280 fs/super.c:663 >> __get_super include/linux/spinlock.h:310 [inline] >> get_super+0x2d/0x40 fs/super.c:692 >> fsync_bdev+0x19/0x80 fs/block_dev.c:468 >> invalidate_partition+0x35/0x60 block/genhd.c:1566 >> drop_partitions.isra.12+0xcd/0x1d0 block/partition-generic.c:440 >> rescan_partitions+0x72/0x900 block/partition-generic.c:513 >> __blkdev_reread_part+0x15f/0x1e0 block/ioctl.c:173 >> blkdev_reread_part+0x26/0x40 block/ioctl.c:193 >> loop_reread_partitions+0x12f/0x1a0 drivers/block/loop.c:619 >> loop_set_status+0x9bb/0xf60 drivers/block/loop.c:1161 >> loop_set_status64+0x9d/0x110 drivers/block/loop.c:1271 >> lo_ioctl+0xd86/0x1b70 drivers/block/loop.c:1381 >> (...snipped...) >> Showing all locks held in the system: >> (...snipped...) >> 3 locks held by syz-executor3/13421: >> #0: (&lo->lo_ctl_mutex/1){+.+.}, at: [<00000000834f78af>] lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355 /* mutex_lock_nested(&lo->lo_ctl_mutex, 1); */ >> #1: (&bdev->bd_mutex){+.+.}, at: [<0000000003605603>] blkdev_reread_part+0x1e/0x40 block/ioctl.c:192 >> #2: (&type->s_umount_key#77){.+.+}, at: [<0000000077701649>] __get_super.part.9+0x1d3/0x280 fs/super.c:663 /* down_read(&sb->s_umount); */ >> (...snipped...) 
>> 2 locks held by syz-executor0/13428: >> #0: (&type->s_umount_key#76/1){+.+.}, at: [<00000000d25ba33a>] alloc_super fs/super.c:211 [inline] >> #0: (&type->s_umount_key#76/1){+.+.}, at: [<00000000d25ba33a>] sget_userns+0x3a1/0xe40 fs/super.c:502 /* down_write_nested(&s->s_umount, SINGLE_DEPTH_NESTING); */ >> #1: (&lo->lo_ctl_mutex/1){+.+.}, at: [<00000000834f78af>] lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355 /* mutex_lock_nested(&lo->lo_ctl_mutex, 1); */ >> ---------------------------------------- >> >> In addition to showing hashed address of lock instances, it would be >> nice if trace of 13428 is printed as well as 13421. >> >> Showing hung tasks up to /proc/sys/kernel/hung_task_warnings could delay >> calling panic() but normally there should not be so many hung tasks. >> >> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> >> Cc: Vegard Nossum <vegard.nossum@oracle.com> >> Cc: Andrew Morton <akpm@linux-foundation.org> >> Cc: Linus Torvalds <torvalds@linux-foundation.org> >> Cc: Mandeep Singh Baines <msb@chromium.org> >> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > > I just know that I am going to regret this the first time this happens > on a low-speed console port, but... > > Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Thanks! I think getting these last bits of debugging tools to be more useful is very important in the context of syzbot. 
So: Acked-by: Dmitry Vyukov <dvyukov@google.com> >> Cc: Peter Zijlstra <peterz@infradead.org> >> Cc: Thomas Gleixner <tglx@linutronix.de> >> Cc: Ingo Molnar <mingo@kernel.org> >> --- >> kernel/hung_task.c | 11 +++++++---- >> 1 file changed, 7 insertions(+), 4 deletions(-) >> >> diff --git a/kernel/hung_task.c b/kernel/hung_task.c >> index 751593e..32b4794 100644 >> --- a/kernel/hung_task.c >> +++ b/kernel/hung_task.c >> @@ -44,6 +44,7 @@ >> >> static int __read_mostly did_panic; >> static bool hung_task_show_lock; >> +static bool hung_task_call_panic; >> >> static struct task_struct *watchdog_task; >> >> @@ -127,10 +128,8 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout) >> touch_nmi_watchdog(); >> >> if (sysctl_hung_task_panic) { >> - if (hung_task_show_lock) >> - debug_show_all_locks(); >> - trigger_all_cpu_backtrace(); >> - panic("hung_task: blocked tasks"); >> + hung_task_show_lock = true; >> + hung_task_call_panic = true; >> } >> } >> >> @@ -193,6 +192,10 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout) >> rcu_read_unlock(); >> if (hung_task_show_lock) >> debug_show_all_locks(); >> + if (hung_task_call_panic) { >> + trigger_all_cpu_backtrace(); >> + panic("hung_task: blocked tasks"); >> + } >> } >> >> static long hung_timeout_jiffies(unsigned long last_checked, >> -- >> 1.8.3.1 >> > ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v2] locking/hung_task: Show all hung tasks before panic 2018-04-03 15:23 ` Dmitry Vyukov @ 2018-04-04 22:05 ` Tetsuo Handa 2018-04-07 12:31 ` Tetsuo Handa 0 siblings, 1 reply; 12+ messages in thread From: Tetsuo Handa @ 2018-04-04 22:05 UTC (permalink / raw) To: akpm Cc: dvyukov, paulmck, linux-kernel, mingo, torvalds, msb, peterz, tglx, vegard.nossum When we get a hung task it can often be valuable to see _all_ the hung tasks on the system before calling panic(). Quoting from https://syzkaller.appspot.com/text?tag=CrashReport&id=5316056503549952 ---------------------------------------- INFO: task syz-executor0:6540 blocked for more than 120 seconds. Not tainted 4.16.0+ #13 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. syz-executor0 D23560 6540 4521 0x80000004 Call Trace: context_switch kernel/sched/core.c:2848 [inline] __schedule+0x8fb/0x1ef0 kernel/sched/core.c:3490 schedule+0xf5/0x430 kernel/sched/core.c:3549 schedule_preempt_disabled+0x10/0x20 kernel/sched/core.c:3607 __mutex_lock_common kernel/locking/mutex.c:833 [inline] __mutex_lock+0xb7f/0x1810 kernel/locking/mutex.c:893 mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908 lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355 __blkdev_driver_ioctl block/ioctl.c:303 [inline] blkdev_ioctl+0x1759/0x1e00 block/ioctl.c:601 ioctl_by_bdev+0xa5/0x110 fs/block_dev.c:2060 isofs_get_last_session fs/isofs/inode.c:567 [inline] isofs_fill_super+0x2ba9/0x3bc0 fs/isofs/inode.c:660 mount_bdev+0x2b7/0x370 fs/super.c:1119 isofs_mount+0x34/0x40 fs/isofs/inode.c:1560 mount_fs+0x66/0x2d0 fs/super.c:1222 vfs_kern_mount.part.26+0xc6/0x4a0 fs/namespace.c:1037 vfs_kern_mount fs/namespace.c:2514 [inline] do_new_mount fs/namespace.c:2517 [inline] do_mount+0xea4/0x2b90 fs/namespace.c:2847 ksys_mount+0xab/0x120 fs/namespace.c:3063 SYSC_mount fs/namespace.c:3077 [inline] SyS_mount+0x39/0x50 fs/namespace.c:3074 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 
(...snipped...) Showing all locks held in the system: (...snipped...) 2 locks held by syz-executor0/6540: #0: 00000000566d4c39 (&type->s_umount_key#49/1){+.+.}, at: alloc_super fs/super.c:211 [inline] #0: 00000000566d4c39 (&type->s_umount_key#49/1){+.+.}, at: sget_userns+0x3b2/0xe60 fs/super.c:502 /* down_write_nested(&s->s_umount, SINGLE_DEPTH_NESTING); */ #1: 0000000043ca8836 (&lo->lo_ctl_mutex/1){+.+.}, at: lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355 /* mutex_lock_nested(&lo->lo_ctl_mutex, 1); */ (...snipped...) 3 locks held by syz-executor7/6541: #0: 0000000043ca8836 (&lo->lo_ctl_mutex/1){+.+.}, at: lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355 /* mutex_lock_nested(&lo->lo_ctl_mutex, 1); */ #1: 000000007bf3d3f9 (&bdev->bd_mutex){+.+.}, at: blkdev_reread_part+0x1e/0x40 block/ioctl.c:192 #2: 00000000566d4c39 (&type->s_umount_key#50){.+.+}, at: __get_super.part.10+0x1d3/0x280 fs/super.c:663 /* down_read(&sb->s_umount); */ ---------------------------------------- When reporting an AB-BA deadlock like shown above, it would be nice if trace of PID=6541 is printed as well as trace of PID=6540 before calling panic(). Showing hung tasks up to /proc/sys/kernel/hung_task_warnings could delay calling panic() but normally there should not be so many hung tasks. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Paul E. 
McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dmitry Vyukov <dvyukov@google.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mandeep Singh Baines <msb@chromium.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> --- kernel/hung_task.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/kernel/hung_task.c b/kernel/hung_task.c index 751593e..32b4794 100644 --- a/kernel/hung_task.c +++ b/kernel/hung_task.c @@ -44,6 +44,7 @@ static int __read_mostly did_panic; static bool hung_task_show_lock; +static bool hung_task_call_panic; static struct task_struct *watchdog_task; @@ -127,10 +128,8 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout) touch_nmi_watchdog(); if (sysctl_hung_task_panic) { - if (hung_task_show_lock) - debug_show_all_locks(); - trigger_all_cpu_backtrace(); - panic("hung_task: blocked tasks"); + hung_task_show_lock = true; + hung_task_call_panic = true; } } @@ -193,6 +192,10 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout) rcu_read_unlock(); if (hung_task_show_lock) debug_show_all_locks(); + if (hung_task_call_panic) { + trigger_all_cpu_backtrace(); + panic("hung_task: blocked tasks"); + } } static long hung_timeout_jiffies(unsigned long last_checked, -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v2] locking/hung_task: Show all hung tasks before panic 2018-04-04 22:05 ` [PATCH v2] " Tetsuo Handa @ 2018-04-07 12:31 ` Tetsuo Handa 2018-04-07 15:39 ` Peter Zijlstra 0 siblings, 1 reply; 12+ messages in thread From: Tetsuo Handa @ 2018-04-07 12:31 UTC (permalink / raw) To: peterz, mingo Cc: akpm, dvyukov, paulmck, linux-kernel, torvalds, msb, tglx, vegard.nossum Hello. Can we add lockdep functions that print the traces of threads with locks held? For example, void debug_show_map_users(struct lockdep_map *map) { struct task_struct *g, *p; struct held_lock *hlock; int i, depth; rcu_read_lock(); for_each_process_thread(g, p) { depth = p->lockdep_depth; hlock = p->held_locks; for (i = 0; i < depth; i++) if (map == hlock[i].instance) { touch_nmi_watchdog(); touch_all_softlockup_watchdogs(); sched_show_task(p); lockdep_print_held_locks(p); break; } } rcu_read_unlock(); } is for replacing debug_show_all_locks() in oom_reap_task() because we are interested only in threads holding a specific mm->mmap_sem.
For example void debug_show_relevant_tasks(struct task_struct *origin) { struct task_struct *g, *p; struct held_lock *i_hlock, *j_hlock; int i, j, i_depth, j_depth; rcu_read_lock(); i_depth = origin->lockdep_depth; i_hlock = origin->held_locks; for_each_process_thread(g, p) { j_depth = p->lockdep_depth; j_hlock = p->held_locks; for (i = 0; i < i_depth; i++) for (j = 0; j < j_depth; j++) if (i_hlock[i].instance == j_hlock[j].instance) goto hit; continue; hit: touch_nmi_watchdog(); touch_all_softlockup_watchdogs(); sched_show_task(p); lockdep_print_held_locks(p); } rcu_read_unlock(); } or void debug_show_all_locked_tasks(void) { struct task_struct *g, *p; rcu_read_lock(); for_each_process_thread(g, p) { if (p->lockdep_depth == 0) continue; touch_nmi_watchdog(); touch_all_softlockup_watchdogs(); sched_show_task(p); lockdep_print_held_locks(p); } rcu_read_unlock(); } are for replacing debug_show_all_locks() in check_hung_task() for cases like https://syzkaller.appspot.com/bug?id=26aa22915f5e3b7ca2cfca76a939f12c25d624db because we are interested only in threads holding locks. SysRq-t is too much but SysRq-w is useless for killable/interruptible threads... ^ permalink raw reply [flat|nested] 12+ messages in thread
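[Editorial illustration — not part of the original thread.] The filtering idea behind the proposed debug_show_relevant_tasks() — report only tasks whose held lock instances intersect the origin task's — can be modeled in userspace like this (task and lock names invented, loosely echoing the syzbot reports above):

```python
# Toy model of debug_show_relevant_tasks(): given a mapping of task name
# to the lock instances it holds, keep only tasks sharing at least one
# lock instance with the origin task.
def relevant_tasks(origin, held_locks):
    origin_locks = set(held_locks[origin])
    return sorted(name for name, locks in held_locks.items()
                  if origin_locks & set(locks))

held_locks = {
    "syz-executor3": ["lo_ctl_mutex", "bd_mutex", "s_umount#77"],
    "syz-executor0": ["s_umount#76", "lo_ctl_mutex"],
    "kworker/0:1":   [],
    "bash":          ["tty_mutex"],
}
```

Here only the two loop-ioctl tasks share an instance (lo_ctl_mutex), so a report would show just those two instead of every locked task in the system.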
* Re: [PATCH v2] locking/hung_task: Show all hung tasks before panic 2018-04-07 12:31 ` Tetsuo Handa @ 2018-04-07 15:39 ` Peter Zijlstra 2018-04-07 16:03 ` Dmitry Vyukov 0 siblings, 1 reply; 12+ messages in thread From: Peter Zijlstra @ 2018-04-07 15:39 UTC (permalink / raw) To: Tetsuo Handa Cc: mingo, akpm, dvyukov, paulmck, linux-kernel, torvalds, msb, tglx, vegard.nossum On Sat, Apr 07, 2018 at 09:31:19PM +0900, Tetsuo Handa wrote: > are for replacing debug_show_all_locks() in check_hung_task() for cases like > https://syzkaller.appspot.com/bug?id=26aa22915f5e3b7ca2cfca76a939f12c25d624db > because we are interested in only threads holding locks. > > SysRq-t is too much but SysRq-w is useless for killable/interruptible threads... Or use a script to process the sysrq-t output? I mean, we can add all sorts, but where does it end? ^ permalink raw reply [flat|nested] 12+ messages in thread
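[Editorial illustration — not part of the original thread.] A sketch of the userspace-script route Peter suggests: post-process a captured SysRq-t dump instead of growing the kernel. The input format here is an assumption modeled on the task lines quoted earlier in this thread; real SysRq-t output varies by kernel version and config:

```python
import re

def uninterruptible_tasks(dump):
    """Return task names whose state field is D (uninterruptible)."""
    hung = []
    for line in dump.splitlines():
        # e.g. "syz-executor3 D24672 13421 4481 0x00000004" -- the state
        # letter may be fused with the stack-usage number, hence D[0-9 ].
        m = re.match(r"(\S+)\s+D[0-9 ]", line)
        if m:
            hung.append(m.group(1))
    return hung

sample = """\
syz-executor3   D24672 13421   4481 0x00000004
kworker/0:1     S    0    42      2 0x80000000
syz-executor0   D23560  6540   4521 0x80000004
"""
```

A script like this keeps the "where does it end?" concern in userspace: the kernel dumps everything once, and policy about which tasks matter lives outside it.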
* Re: [PATCH v2] locking/hung_task: Show all hung tasks before panic 2018-04-07 15:39 ` Peter Zijlstra @ 2018-04-07 16:03 ` Dmitry Vyukov 2018-04-07 16:24 ` Tetsuo Handa 0 siblings, 1 reply; 12+ messages in thread From: Dmitry Vyukov @ 2018-04-07 16:03 UTC (permalink / raw) To: Peter Zijlstra Cc: Tetsuo Handa, Ingo Molnar, Andrew Morton, Paul McKenney, LKML, Linus Torvalds, Mandeep Singh Baines, Thomas Gleixner, Vegard Nossum On Sat, Apr 7, 2018 at 5:39 PM, Peter Zijlstra <peterz@infradead.org> wrote: > On Sat, Apr 07, 2018 at 09:31:19PM +0900, Tetsuo Handa wrote: >> are for replacing debug_show_all_locks() in check_hung_task() for cases like >> https://syzkaller.appspot.com/bug?id=26aa22915f5e3b7ca2cfca76a939f12c25d624db >> because we are interested only in threads holding locks. >> >> SysRq-t is too much but SysRq-w is useless for killable/interruptible threads... > > Or use a script to process the sysrq-t output? I mean, we can add all > sorts, but where does it end? Good question. We are talking about a few dozen more stacks, right? Not all kernel bugs are well reproducible, so it's not always possible to go back and hit sysrq-t. And this came up in the context of syzbot, which is an automated system. It reported a bunch of hangs and most of them are real bugs, but not all of them are easily actionable. Can it be a config or a command-line argument that will make syzbot capture more useful context for each such hang? ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] locking/hung_task: Show all hung tasks before panic 2018-04-07 16:03 ` Dmitry Vyukov @ 2018-04-07 16:24 ` Tetsuo Handa 2018-04-09 9:03 ` Dmitry Vyukov 0 siblings, 1 reply; 12+ messages in thread From: Tetsuo Handa @ 2018-04-07 16:24 UTC (permalink / raw) To: dvyukov, peterz Cc: mingo, akpm, paulmck, linux-kernel, torvalds, msb, tglx, vegard.nossum Dmitry Vyukov wrote: > On Sat, Apr 7, 2018 at 5:39 PM, Peter Zijlstra <peterz@infradead.org> wrote: > > On Sat, Apr 07, 2018 at 09:31:19PM +0900, Tetsuo Handa wrote: > >> are for replacing debug_show_all_locks() in check_hung_task() for cases like > >> https://syzkaller.appspot.com/bug?id=26aa22915f5e3b7ca2cfca76a939f12c25d624db > >> because we are interested only in threads holding locks. > >> > >> SysRq-t is too much but SysRq-w is useless for killable/interruptible threads... > > > > Or use a script to process the sysrq-t output? I mean, we can add all > > sorts, but where does it end? Maybe allow khungtaskd to call call_usermodehelper() to run arbitrary operations instead of just calling panic()? > > Good question. > We are talking about a few dozen more stacks, right? > > Not all kernel bugs are well reproducible, so it's not always possible > to go back and hit sysrq-t. And this came up in the context of syzbot, > which is an automated system. It reported a bunch of hangs and most of > them are real bugs, but not all of them are easily actionable. > Can it be a config or a command-line argument that will make syzbot > capture more useful context for each such hang? > It would be nice if syzbot testing were done with kdump configured, and the result of automated scripting on vmcore (such as "foreach bt -s -l") were available. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] locking/hung_task: Show all hung tasks before panic
  2018-04-07 16:24 ` Tetsuo Handa
@ 2018-04-09  9:03 ` Dmitry Vyukov
  2018-04-09 11:13   ` Tetsuo Handa
  0 siblings, 1 reply; 12+ messages in thread
From: Dmitry Vyukov @ 2018-04-09 9:03 UTC (permalink / raw)
To: Tetsuo Handa
Cc: Peter Zijlstra, Ingo Molnar, Andrew Morton, Paul McKenney, LKML,
    Linus Torvalds, Mandeep Singh Baines, Thomas Gleixner, Vegard Nossum

On Sat, Apr 7, 2018 at 6:24 PM, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
> Dmitry Vyukov wrote:
>> On Sat, Apr 7, 2018 at 5:39 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Sat, Apr 07, 2018 at 09:31:19PM +0900, Tetsuo Handa wrote:
>> >> are for replacing debug_show_all_locks() in check_hung_task() for cases like
>> >> https://syzkaller.appspot.com/bug?id=26aa22915f5e3b7ca2cfca76a939f12c25d624db
>> >> because we are interested in only threads holding locks.
>> >>
>> >> SysRq-t is too much but SysRq-w is useless for killable/interruptible threads...
>> >
>> > Or use a script to process the sysrq-t output? I mean, we can add all
>> > sorts, but where does it end?
>
> Maybe allow khungtaskd to call call_usermode_helper() to run arbitrary
> operations instead of just calling panic()?

This would probably work for syzbot too.

>> Good question.
>> We are talking about few dozen more stacks, right?
>>
>> Not all kernel bugs are well reproducible, so it's not always possible
>> to go back and hit sysrq-t. And this come up in the context of syzbot,
>> which is an automated system. It reported a bunch of hangs and most of
>> them are real bugs, but not all of them are easily actionable.
>> Can it be a config or a command line argument, which will make syzbot
>> capture more useful context for each such hang?
>
> It will be nice if syzbot testing is done with kdump configured, and the
> result of automated scripting on vmcore (such as "foreach bt -s -l") is
> available.
kdump has popped up several times already
(https://github.com/google/syzkaller/issues/491). But this will require a
non-trivial amount of work to pipe it through the whole system (starting
from investigation and testing, through the second kernel, to storing the
dumps and exposing them).
* Re: [PATCH v2] locking/hung_task: Show all hung tasks before panic
  2018-04-09  9:03 ` Dmitry Vyukov
@ 2018-04-09 11:13 ` Tetsuo Handa
  2018-04-09 11:50   ` Dmitry Vyukov
  0 siblings, 1 reply; 12+ messages in thread
From: Tetsuo Handa @ 2018-04-09 11:13 UTC (permalink / raw)
To: dvyukov
Cc: peterz, mingo, akpm, paulmck, linux-kernel, torvalds, msb, tglx,
    vegard.nossum

Dmitry Vyukov wrote:
> On Sat, Apr 7, 2018 at 6:24 PM, Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
> > Dmitry Vyukov wrote:
> >> On Sat, Apr 7, 2018 at 5:39 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> >> > On Sat, Apr 07, 2018 at 09:31:19PM +0900, Tetsuo Handa wrote:
> >> >> are for replacing debug_show_all_locks() in check_hung_task() for cases like
> >> >> https://syzkaller.appspot.com/bug?id=26aa22915f5e3b7ca2cfca76a939f12c25d624db
> >> >> because we are interested in only threads holding locks.
> >> >>
> >> >> SysRq-t is too much but SysRq-w is useless for killable/interruptible threads...
> >> >
> >> > Or use a script to process the sysrq-t output? I mean, we can add all
> >> > sorts, but where does it end?
> >
> > Maybe allow khungtaskd to call call_usermode_helper() to run arbitrary
> > operations instead of just calling panic()?
>
> This would probably work for syzbot too.

Yes, it should work in many cases. Something like below...
----------
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -18,6 +18,7 @@
 #include <linux/utsname.h>
 #include <linux/sched/signal.h>
 #include <linux/sched/debug.h>
+#include <linux/kmod.h>

 #include <trace/events/sched.h>

@@ -44,6 +45,7 @@
 static int __read_mostly did_panic;
 static bool hung_task_show_lock;
+static bool hung_task_call_panic;

 static struct task_struct *watchdog_task;

@@ -127,10 +129,8 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
 	touch_nmi_watchdog();

 	if (sysctl_hung_task_panic) {
-		if (hung_task_show_lock)
-			debug_show_all_locks();
-		trigger_all_cpu_backtrace();
-		panic("hung_task: blocked tasks");
+		hung_task_show_lock = true;
+		hung_task_call_panic = true;
 	}
 }

@@ -193,6 +193,23 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 	rcu_read_unlock();
 	if (hung_task_show_lock)
 		debug_show_all_locks();
+	if (hung_task_call_panic) {
+		char *argv[2];
+		char *envp[3];
+
+		trigger_all_cpu_backtrace();
+
+		argv[0] = (char *) "/sbin/khungtaskd_panic";
+		argv[1] = NULL;
+		envp[0] = "HOME=/";
+		envp[1] = "PATH=/sbin:/bin:/usr/sbin:/usr/bin";
+		envp[2] = NULL;
+		pr_emerg("Calling %s with 60 seconds timeout.\n", argv[0]);
+		call_usermodehelper(argv[0], argv, envp, UMH_NO_WAIT);
+		schedule_timeout_interruptible(60 * HZ);
+		panic("hung_task: blocked tasks");
+	}
 }

 static long hung_timeout_jiffies(unsigned long last_checked,
----------

What is unfortunate is that the above won't work for "panic due to stall"
cases. If available, kdump is preferable...

>
> >> Good question.
> >> We are talking about few dozen more stacks, right?
> >>
> >> Not all kernel bugs are well reproducible, so it's not always possible
> >> to go back and hit sysrq-t. And this come up in the context of syzbot,
> >> which is an automated system. It reported a bunch of hangs and most of
> >> them are real bugs, but not all of them are easily actionable.
> >> Can it be a config or a command line argument, which will make syzbot
> >> capture more useful context for each such hang?
> >>
> >
> > It will be nice if syzbot testing is done with kdump configured, and the
> > result of automated scripting on vmcore (such as "foreach bt -s -l") is
> > available.

We can use different kernels for testing and kdump, can't we? Then I
think it is not difficult to load the kdump kernel from the local disk.
And kdump (kexec-tools) already supports dumping via ssh. Is there still
a non-trivial amount of work, then? Just a remote server for temporarily
holding the kernel under test, and for running scripted analysis
commands?
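For reference, the patch quoted earlier in this message leaves the behavior of /sbin/khungtaskd_panic entirely to userspace. A minimal hypothetical helper might look like the sketch below; everything in it (the log path, the choice of SysRq keys) is an assumption, and the trigger path is parameterized so the sketch can be exercised outside a real root environment:

```shell
#!/bin/sh
# Hypothetical /sbin/khungtaskd_panic: the kernel side gives us about
# 60 seconds before it calls panic(), so gather as much context as we
# can.  SYSRQ_TRIGGER and LOG are overridable to keep this testable.
SYSRQ_TRIGGER=${SYSRQ_TRIGGER:-/proc/sysrq-trigger}
LOG=${LOG:-./khungtaskd_panic.log}

echo "khungtaskd_panic invoked at $(date)" >> "$LOG"

# Only poke sysrq if the trigger is writable (it won't be for
# non-root users or in a sandbox).
if [ -w "$SYSRQ_TRIGGER" ]; then
    for key in w t m; do    # blocked tasks, all tasks, memory info
        echo "$key" > "$SYSRQ_TRIGGER"
    done
    echo "sysrq keys w/t/m triggered" >> "$LOG"
fi

# Flush filesystem buffers while the machine is still alive.
sync
```

The SysRq output itself lands in the kernel ring buffer, so a real helper would likely also capture dmesg to persistent storage before the panic.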
* Re: [PATCH v2] locking/hung_task: Show all hung tasks before panic
  2018-04-09 11:13 ` Tetsuo Handa
@ 2018-04-09 11:50 ` Dmitry Vyukov
  0 siblings, 0 replies; 12+ messages in thread
From: Dmitry Vyukov @ 2018-04-09 11:50 UTC (permalink / raw)
To: Tetsuo Handa
Cc: Peter Zijlstra, Ingo Molnar, Andrew Morton, Paul McKenney, LKML,
    Linus Torvalds, Mandeep Singh Baines, Thomas Gleixner, Vegard Nossum

On Mon, Apr 9, 2018 at 1:13 PM, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>> > It will be nice if syzbot testing is done with kdump configured, and the
>> > result of automated scripting on vmcore (such as "foreach bt -s -l") is
>> > available.
>>
>> kdump's popped up several times already
>> (https://github.com/google/syzkaller/issues/491). But this will
>> require some non-trivial amount of work to pipe it through the whole
>> system (starting from investigation/testing, second kernel to storing
>> them and exposing).
>>
>
> We can use different kernels for testing and kdump, can't we? Then,
> I think it is not difficult to load kernel for kdump from local disk.
> And kdump (kexec-tools) already supports dumping via ssh. Then, is there
> still non-trivial amount of work? Just a remote server for temporarily
> holding kernel for testing and run scripted analyzing commands ?

It's just that fully automating something is usually a much larger
amount of work than doing it manually as a one-off thing. I also need to
figure out how much time and space it takes to reboot into the kdump
kernel and extract the dump. I don't think it's feasible to persistently
store all kdumps, because we are getting ~1 crash/sec. Then, the web
server that serves the syzbot UI and sends emails is an App Engine web
app which does not have direct access to the test machines and/or git,
but it seems that only it can decide when we need to store dumps
persistently. In the current architecture, test machines are disposable
and are long gone by the time the crash is uploaded to the dashboard.
So machines need to be preserved until after the dashboard says whether
we need the dump or not. Or maybe always extract dumps and store them
locally, temporarily, until we know whether we need to persist them or
not. I don't know yet what will work better. This also needs to be
carefully threaded through the crash reproduction process, which has
different logic from the main testing loop. And in the end, interfaces
between multiple systems need to be extended, the database format needs
to be extended, lots of testing done, and we need to figure out what a
good config for the kdump kernel is; the image build process needs to be
extended to package the kdump kernel, the configs of multiple systems
need to be extended, and probably a bunch of other small things here and
there.

Then we also need vmlinux to make dumps actionable, right? And vmlinux
is nice in itself because it allows one to do objdump -d. So it probably
makes sense to separate vmlinux uploading and persistence from dumps,
because vmlinux'es are probably better uploaded once per kernel build
(which is like once per day). So those will be separate paths through
the system.

It also probably makes sense to consider whether
https://github.com/google/syzkaller/issues/466 can be bundled with this
work (at least the data paths; what exactly is captured can of course be
extended later). We also need to figure out whether at least part of all
this can be unit-tested, and write the tests.

So, yes, nothing extraordinary. But I feel this is not doable within a
day, and it will preferably require several uninterrupted days with
nothing else urgent; I am having trouble finding such days lately...
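For context on the ssh dumping mentioned earlier in the thread: with kexec-tools, the dump target is typically configured in kdump.conf. A sketch is shown below; the hostname, key path, and makedumpfile options are placeholders, and exact directive names vary by distribution:

```
# /etc/kdump.conf (illustrative)
# Send the vmcore to a collection server over ssh:
ssh kdump@dump-server.example.com
sshkey /root/.ssh/kdump_id_rsa
path /var/crash
# -F (flattened format) is required for ssh targets; -d 31 excludes
# zero, cache, and free pages to keep the dump small:
core_collector makedumpfile -F -l --message-level 7 -d 31
```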
end of thread, other threads:[~2018-04-09 11:50 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-02 14:12 [PATCH] locking/hung_task: Show all hung tasks before panic Tetsuo Handa
2018-04-02 15:35 ` Paul E. McKenney
2018-04-02 21:09 ` Tetsuo Handa
2018-04-03 15:23 ` Dmitry Vyukov
2018-04-04 22:05 ` [PATCH v2] " Tetsuo Handa
2018-04-07 12:31 ` Tetsuo Handa
2018-04-07 15:39 ` Peter Zijlstra
2018-04-07 16:03 ` Dmitry Vyukov
2018-04-07 16:24 ` Tetsuo Handa
2018-04-09  9:03 ` Dmitry Vyukov
2018-04-09 11:13 ` Tetsuo Handa
2018-04-09 11:50 ` Dmitry Vyukov