* rcu_preempt detected stalls.
@ 2014-10-13 17:35 Dave Jones
  2014-10-15  2:35 ` Sasha Levin
                   ` (2 more replies)
  0 siblings, 3 replies; 46+ messages in thread
From: Dave Jones @ 2014-10-13 17:35 UTC (permalink / raw)
  To: Linux Kernel; +Cc: paulmck, htejun

Today in "rcu stall while fuzzing" news:

INFO: rcu_preempt detected stalls on CPUs/tasks:
	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
	(detected by 0, t=6502 jiffies, g=75434, c=75433, q=0)
trinity-c342    R  running task    13384   766  32295 0x00000000
 ffff880068943d58 0000000000000002 0000000000000002 ffff880193c8c680
 00000000001d4100 0000000000000000 ffff880068943fd8 00000000001d4100
 ffff88024302c680 ffff880193c8c680 ffff880068943fd8 0000000000000000
Call Trace:
 [<ffffffff888368e2>] preempt_schedule_irq+0x52/0xb0
 [<ffffffff8883df10>] retint_kernel+0x20/0x30
 [<ffffffff880d9424>] ? lock_acquire+0xd4/0x2b0
 [<ffffffff8808d495>] ? kill_pid_info+0x5/0x130
 [<ffffffff8808d4d5>] kill_pid_info+0x45/0x130
 [<ffffffff8808d495>] ? kill_pid_info+0x5/0x130
 [<ffffffff8808d6d2>] SYSC_kill+0xf2/0x2f0
 [<ffffffff8808d67b>] ? SYSC_kill+0x9b/0x2f0
 [<ffffffff8819c2b7>] ? context_tracking_user_exit+0x57/0x280
 [<ffffffff880136bd>] ? syscall_trace_enter+0x13d/0x310
 [<ffffffff8808fd9e>] SyS_kill+0xe/0x10
 [<ffffffff8883d3a4>] tracesys+0xdd/0xe2
trinity-c225    R  running task    13448   646  32295 0x00000000
 ffff880161ccfb28 0000000000000002 ffff880161ccfe10 ffff88000bf85e00
 00000000001d4100 0000000000000003 ffff880161ccffd8 00000000001d4100
 ffff880030124680 ffff88000bf85e00 ffff880161ccffd8 0000000000000000
Call Trace:
 [<ffffffff888368e2>] preempt_schedule_irq+0x52/0xb0
 [<ffffffff8883df10>] retint_kernel+0x20/0x30
 [<ffffffff88233d41>] ? __d_lookup_rcu+0xd1/0x1e0
 [<ffffffff88233dd6>] ? __d_lookup_rcu+0x166/0x1e0
 [<ffffffff88222f9f>] lookup_fast+0x4f/0x3d0
 [<ffffffff88224857>] link_path_walk+0x1a7/0x8a0
 [<ffffffff88224f95>] ? path_lookupat+0x45/0x7b0
 [<ffffffff88224fb7>] path_lookupat+0x67/0x7b0
 [<ffffffff880d385d>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff8883dda4>] ? retint_restore_args+0xe/0xe
 [<ffffffff8822572b>] filename_lookup+0x2b/0xc0
 [<ffffffff88229c77>] user_path_at_empty+0x67/0xc0
 [<ffffffff880d3dbe>] ? put_lock_stats.isra.27+0xe/0x30
 [<ffffffff880d42a6>] ? lock_release_holdtime.part.28+0xe6/0x160
 [<ffffffff880b15ad>] ? get_parent_ip+0xd/0x50
 [<ffffffff88229ce1>] user_path_at+0x11/0x20
 [<ffffffff8824fac1>] do_utimes+0xd1/0x180
 [<ffffffff8824fbef>] SyS_utime+0x7f/0xc0
 [<ffffffff8883d345>] ? tracesys+0x7e/0xe2
 [<ffffffff8883d3a4>] tracesys+0xdd/0xe2
trinity-c342    R  running task    13384   766  32295 0x00000000
 ffff880068943d58 0000000000000002 0000000000000002 ffff880193c8c680
 00000000001d4100 0000000000000000 ffff880068943fd8 00000000001d4100
 ffff88024302c680 ffff880193c8c680 ffff880068943fd8 0000000000000000
Call Trace:
 [<ffffffff888368e2>] preempt_schedule_irq+0x52/0xb0
 [<ffffffff8883df10>] retint_kernel+0x20/0x30
 [<ffffffff880d9424>] ? lock_acquire+0xd4/0x2b0
 [<ffffffff8808d495>] ? kill_pid_info+0x5/0x130
 [<ffffffff8808d4d5>] kill_pid_info+0x45/0x130
 [<ffffffff8808d495>] ? kill_pid_info+0x5/0x130
 [<ffffffff8808d6d2>] SYSC_kill+0xf2/0x2f0
 [<ffffffff8808d67b>] ? SYSC_kill+0x9b/0x2f0
 [<ffffffff8819c2b7>] ? context_tracking_user_exit+0x57/0x280
 [<ffffffff880136bd>] ? syscall_trace_enter+0x13d/0x310
 [<ffffffff8808fd9e>] SyS_kill+0xe/0x10
 [<ffffffff8883d3a4>] tracesys+0xdd/0xe2
trinity-c225    R  running task    13448   646  32295 0x00000000
 ffff880161ccfb28 0000000000000002 ffff880161ccfe10 ffff88000bf85e00
 00000000001d4100 0000000000000003 ffff880161ccffd8 00000000001d4100
 ffff880030124680 ffff88000bf85e00 ffff880161ccffd8 0000000000000000
Call Trace:
 [<ffffffff888368e2>] preempt_schedule_irq+0x52/0xb0
 [<ffffffff8883df10>] retint_kernel+0x20/0x30
 [<ffffffff88233d41>] ? __d_lookup_rcu+0xd1/0x1e0
 [<ffffffff88233dd6>] ? __d_lookup_rcu+0x166/0x1e0
 [<ffffffff88222f9f>] lookup_fast+0x4f/0x3d0
 [<ffffffff88224857>] link_path_walk+0x1a7/0x8a0
 [<ffffffff88224f95>] ? path_lookupat+0x45/0x7b0
 [<ffffffff88224fb7>] path_lookupat+0x67/0x7b0
 [<ffffffff880d385d>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff8883dda4>] ? retint_restore_args+0xe/0xe
 [<ffffffff8822572b>] filename_lookup+0x2b/0xc0
 [<ffffffff88229c77>] user_path_at_empty+0x67/0xc0
 [<ffffffff880d3dbe>] ? put_lock_stats.isra.27+0xe/0x30
 [<ffffffff880d42a6>] ? lock_release_holdtime.part.28+0xe6/0x160
 [<ffffffff880b15ad>] ? get_parent_ip+0xd/0x50
 [<ffffffff88229ce1>] user_path_at+0x11/0x20
 [<ffffffff8824fac1>] do_utimes+0xd1/0x180
 [<ffffffff8824fbef>] SyS_utime+0x7f/0xc0
 [<ffffffff8883d345>] ? tracesys+0x7e/0xe2
 [<ffffffff8883d3a4>] tracesys+0xdd/0xe2
INFO: rcu_preempt detected stalls on CPUs/tasks:
	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
	(detected by 3, t=26007 jiffies, g=75434, c=75433, q=0)
trinity-c342    R  running task    13384   766  32295 0x00000000
 ffff880068943d98 0000000000000002 0000000000000000 ffff880193c8c680
 00000000001d4100 0000000000000000 ffff880068943fd8 00000000001d4100
 ffff88000188af00 ffff880193c8c680 ffff880068943fd8 0000000000000000
Call Trace:
 [<ffffffff888368e2>] preempt_schedule_irq+0x52/0xb0
 [<ffffffff8883df10>] retint_kernel+0x20/0x30
 [<ffffffff8809f767>] ? pid_task+0x47/0xa0
 [<ffffffff8809f73d>] ? pid_task+0x1d/0xa0
 [<ffffffff8808d4f1>] kill_pid_info+0x61/0x130
 [<ffffffff8808d495>] ? kill_pid_info+0x5/0x130
 [<ffffffff8808d6d2>] SYSC_kill+0xf2/0x2f0
 [<ffffffff8808d67b>] ? SYSC_kill+0x9b/0x2f0
 [<ffffffff8819c2b7>] ? context_tracking_user_exit+0x57/0x280
 [<ffffffff880136bd>] ? syscall_trace_enter+0x13d/0x310
 [<ffffffff8808fd9e>] SyS_kill+0xe/0x10
 [<ffffffff8883d3a4>] tracesys+0xdd/0xe2
trinity-c225    R  running task    13448   646  32295 0x00000000
 ffff880161ccfb78 0000000000000002 ffffffff88c993ed ffff88000bf85e00
 00000000001d4100 0000000000000003 ffff880161ccffd8 00000000001d4100
 ffff88005ea89780 ffff88000bf85e00 ffff880161ccffd8 0000000000000000
Call Trace:
 [<ffffffff888368e2>] preempt_schedule_irq+0x52/0xb0
 [<ffffffff8883df10>] retint_kernel+0x20/0x30
 [<ffffffff8822303a>] ? lookup_fast+0xea/0x3d0
 [<ffffffff88223025>] ? lookup_fast+0xd5/0x3d0
 [<ffffffff88224857>] link_path_walk+0x1a7/0x8a0
 [<ffffffff88224f95>] ? path_lookupat+0x45/0x7b0
 [<ffffffff88224fb7>] path_lookupat+0x67/0x7b0
 [<ffffffff880d385d>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff8883dda4>] ? retint_restore_args+0xe/0xe
 [<ffffffff8822572b>] filename_lookup+0x2b/0xc0
 [<ffffffff88229c77>] user_path_at_empty+0x67/0xc0
 [<ffffffff880d3dbe>] ? put_lock_stats.isra.27+0xe/0x30
 [<ffffffff880d42a6>] ? lock_release_holdtime.part.28+0xe6/0x160
 [<ffffffff880b15ad>] ? get_parent_ip+0xd/0x50
 [<ffffffff88229ce1>] user_path_at+0x11/0x20
 [<ffffffff8824fac1>] do_utimes+0xd1/0x180
 [<ffffffff8824fbef>] SyS_utime+0x7f/0xc0
 [<ffffffff8883d345>] ? tracesys+0x7e/0xe2
 [<ffffffff8883d3a4>] tracesys+0xdd/0xe2
trinity-c342    R  running task    13384   766  32295 0x00000000
 ffff880068943d98 0000000000000002 0000000000000000 ffff880193c8c680
 00000000001d4100 0000000000000000 ffff880068943fd8 00000000001d4100
 ffff88000188af00 ffff880193c8c680 ffff880068943fd8 0000000000000000
Call Trace:
 [<ffffffff888368e2>] preempt_schedule_irq+0x52/0xb0
 [<ffffffff8883df10>] retint_kernel+0x20/0x30
 [<ffffffff8809f767>] ? pid_task+0x47/0xa0
 [<ffffffff8809f73d>] ? pid_task+0x1d/0xa0
 [<ffffffff8808d4f1>] kill_pid_info+0x61/0x130
 [<ffffffff8808d495>] ? kill_pid_info+0x5/0x130
 [<ffffffff8808d6d2>] SYSC_kill+0xf2/0x2f0
 [<ffffffff8808d67b>] ? SYSC_kill+0x9b/0x2f0
 [<ffffffff8819c2b7>] ? context_tracking_user_exit+0x57/0x280
 [<ffffffff880136bd>] ? syscall_trace_enter+0x13d/0x310
 [<ffffffff8808fd9e>] SyS_kill+0xe/0x10
 [<ffffffff8883d3a4>] tracesys+0xdd/0xe2
trinity-c225    R  running task    13448   646  32295 0x00000000
 ffff880161ccfb78 0000000000000002 ffffffff88c993ed ffff88000bf85e00
 00000000001d4100 0000000000000003 ffff880161ccffd8 00000000001d4100
 ffff88005ea89780 ffff88000bf85e00 ffff880161ccffd8 0000000000000000
Call Trace:
 [<ffffffff888368e2>] preempt_schedule_irq+0x52/0xb0
 [<ffffffff8883df10>] retint_kernel+0x20/0x30
 [<ffffffff8822303a>] ? lookup_fast+0xea/0x3d0
 [<ffffffff88223025>] ? lookup_fast+0xd5/0x3d0
 [<ffffffff88224857>] link_path_walk+0x1a7/0x8a0
 [<ffffffff88224f95>] ? path_lookupat+0x45/0x7b0
 [<ffffffff88224fb7>] path_lookupat+0x67/0x7b0
 [<ffffffff880d385d>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff8883dda4>] ? retint_restore_args+0xe/0xe
 [<ffffffff8822572b>] filename_lookup+0x2b/0xc0
 [<ffffffff88229c77>] user_path_at_empty+0x67/0xc0
 [<ffffffff880d3dbe>] ? put_lock_stats.isra.27+0xe/0x30
 [<ffffffff880d42a6>] ? lock_release_holdtime.part.28+0xe6/0x160
 [<ffffffff880b15ad>] ? get_parent_ip+0xd/0x50
 [<ffffffff88229ce1>] user_path_at+0x11/0x20
 [<ffffffff8824fac1>] do_utimes+0xd1/0x180
 [<ffffffff8824fbef>] SyS_utime+0x7f/0xc0
 [<ffffffff8883d345>] ? tracesys+0x7e/0xe2
 [<ffffffff8883d3a4>] tracesys+0xdd/0xe2

This is on Linus' current tree, with the new CONFIG_TASKS_RCU unset.

	Dave


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls.
  2014-10-13 17:35 rcu_preempt detected stalls Dave Jones
@ 2014-10-15  2:35 ` Sasha Levin
  2014-10-23 18:39   ` Paul E. McKenney
  2014-10-23 18:32 ` Paul E. McKenney
  2014-10-25  3:16 ` Dâniel Fraga
  2 siblings, 1 reply; 46+ messages in thread
From: Sasha Levin @ 2014-10-15  2:35 UTC (permalink / raw)
  To: Dave Jones, Linux Kernel, paulmck, htejun

On 10/13/2014 01:35 PM, Dave Jones wrote:
> Today in "rcu stall while fuzzing" news:
> 
> INFO: rcu_preempt detected stalls on CPUs/tasks:
> 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> 	(detected by 0, t=6502 jiffies, g=75434, c=75433, q=0)

I complained about RCU stalls a couple of days ago (in a different context)
on -next. I guess whatever is causing them made it into Linus's tree?

https://lkml.org/lkml/2014/10/11/64


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls.
  2014-10-13 17:35 rcu_preempt detected stalls Dave Jones
  2014-10-15  2:35 ` Sasha Levin
@ 2014-10-23 18:32 ` Paul E. McKenney
  2014-10-23 18:40   ` Dave Jones
  2014-10-23 19:13   ` Oleg Nesterov
  2014-10-25  3:16 ` Dâniel Fraga
  2 siblings, 2 replies; 46+ messages in thread
From: Paul E. McKenney @ 2014-10-23 18:32 UTC (permalink / raw)
  To: Dave Jones, Linux Kernel, htejun; +Cc: oleg

On Mon, Oct 13, 2014 at 01:35:04PM -0400, Dave Jones wrote:
> Today in "rcu stall while fuzzing" news:
> 
> INFO: rcu_preempt detected stalls on CPUs/tasks:
> 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> 	(detected by 0, t=6502 jiffies, g=75434, c=75433, q=0)
> trinity-c342    R  running task    13384   766  32295 0x00000000
>  ffff880068943d58 0000000000000002 0000000000000002 ffff880193c8c680
>  00000000001d4100 0000000000000000 ffff880068943fd8 00000000001d4100
>  ffff88024302c680 ffff880193c8c680 ffff880068943fd8 0000000000000000
> Call Trace:
>  [<ffffffff888368e2>] preempt_schedule_irq+0x52/0xb0
>  [<ffffffff8883df10>] retint_kernel+0x20/0x30
>  [<ffffffff880d9424>] ? lock_acquire+0xd4/0x2b0
>  [<ffffffff8808d495>] ? kill_pid_info+0x5/0x130
>  [<ffffffff8808d4d5>] kill_pid_info+0x45/0x130
>  [<ffffffff8808d495>] ? kill_pid_info+0x5/0x130
>  [<ffffffff8808d6d2>] SYSC_kill+0xf2/0x2f0
>  [<ffffffff8808d67b>] ? SYSC_kill+0x9b/0x2f0
>  [<ffffffff8819c2b7>] ? context_tracking_user_exit+0x57/0x280
>  [<ffffffff880136bd>] ? syscall_trace_enter+0x13d/0x310
>  [<ffffffff8808fd9e>] SyS_kill+0xe/0x10
>  [<ffffffff8883d3a4>] tracesys+0xdd/0xe2

Well, there is a loop in kill_pid_info().  I am surprised that it
would loop indefinitely, but if it did, you would certainly get
RCU CPU stalls.  Please see patch below, adding Oleg for his thoughts.
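
For reference, the loop in question currently keeps every retry inside a
single RCU read-side critical section, roughly like this (sketch trimmed
from kernel/signal.c, comments elided):

	rcu_read_lock();
retry:
	p = pid_task(pid, PIDTYPE_PID);
	if (p) {
		error = group_send_sig_info(sig, info, p);
		if (unlikely(error == -ESRCH))
			goto retry;	/* still inside the same critical section */
	}
	rcu_read_unlock();

If group_send_sig_info() keeps returning -ESRCH, that read-side critical
section never ends, which is exactly the sort of thing that produces the
stall warnings above.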

> trinity-c225    R  running task    13448   646  32295 0x00000000
>  ffff880161ccfb28 0000000000000002 ffff880161ccfe10 ffff88000bf85e00
>  00000000001d4100 0000000000000003 ffff880161ccffd8 00000000001d4100
>  ffff880030124680 ffff88000bf85e00 ffff880161ccffd8 0000000000000000
> Call Trace:
>  [<ffffffff888368e2>] preempt_schedule_irq+0x52/0xb0
>  [<ffffffff8883df10>] retint_kernel+0x20/0x30
>  [<ffffffff88233d41>] ? __d_lookup_rcu+0xd1/0x1e0
>  [<ffffffff88233dd6>] ? __d_lookup_rcu+0x166/0x1e0
>  [<ffffffff88222f9f>] lookup_fast+0x4f/0x3d0
>  [<ffffffff88224857>] link_path_walk+0x1a7/0x8a0
>  [<ffffffff88224f95>] ? path_lookupat+0x45/0x7b0
>  [<ffffffff88224fb7>] path_lookupat+0x67/0x7b0
>  [<ffffffff880d385d>] ? trace_hardirqs_off+0xd/0x10
>  [<ffffffff8883dda4>] ? retint_restore_args+0xe/0xe
>  [<ffffffff8822572b>] filename_lookup+0x2b/0xc0
>  [<ffffffff88229c77>] user_path_at_empty+0x67/0xc0
>  [<ffffffff880d3dbe>] ? put_lock_stats.isra.27+0xe/0x30
>  [<ffffffff880d42a6>] ? lock_release_holdtime.part.28+0xe6/0x160
>  [<ffffffff880b15ad>] ? get_parent_ip+0xd/0x50
>  [<ffffffff88229ce1>] user_path_at+0x11/0x20
>  [<ffffffff8824fac1>] do_utimes+0xd1/0x180
>  [<ffffffff8824fbef>] SyS_utime+0x7f/0xc0
>  [<ffffffff8883d345>] ? tracesys+0x7e/0xe2
>  [<ffffffff8883d3a4>] tracesys+0xdd/0xe2

This one will require more looking.  But did you do something like
create a pair of mutually recursive symlinks or something?  ;-)

							Thanx, Paul

> trinity-c342    R  running task    13384   766  32295 0x00000000
>  ffff880068943d58 0000000000000002 0000000000000002 ffff880193c8c680
>  00000000001d4100 0000000000000000 ffff880068943fd8 00000000001d4100
>  ffff88024302c680 ffff880193c8c680 ffff880068943fd8 0000000000000000
> Call Trace:
>  [<ffffffff888368e2>] preempt_schedule_irq+0x52/0xb0
>  [<ffffffff8883df10>] retint_kernel+0x20/0x30
>  [<ffffffff880d9424>] ? lock_acquire+0xd4/0x2b0
>  [<ffffffff8808d495>] ? kill_pid_info+0x5/0x130
>  [<ffffffff8808d4d5>] kill_pid_info+0x45/0x130
>  [<ffffffff8808d495>] ? kill_pid_info+0x5/0x130
>  [<ffffffff8808d6d2>] SYSC_kill+0xf2/0x2f0
>  [<ffffffff8808d67b>] ? SYSC_kill+0x9b/0x2f0
>  [<ffffffff8819c2b7>] ? context_tracking_user_exit+0x57/0x280
>  [<ffffffff880136bd>] ? syscall_trace_enter+0x13d/0x310
>  [<ffffffff8808fd9e>] SyS_kill+0xe/0x10
>  [<ffffffff8883d3a4>] tracesys+0xdd/0xe2
> trinity-c225    R  running task    13448   646  32295 0x00000000
>  ffff880161ccfb28 0000000000000002 ffff880161ccfe10 ffff88000bf85e00
>  00000000001d4100 0000000000000003 ffff880161ccffd8 00000000001d4100
>  ffff880030124680 ffff88000bf85e00 ffff880161ccffd8 0000000000000000
> Call Trace:
>  [<ffffffff888368e2>] preempt_schedule_irq+0x52/0xb0
>  [<ffffffff8883df10>] retint_kernel+0x20/0x30
>  [<ffffffff88233d41>] ? __d_lookup_rcu+0xd1/0x1e0
>  [<ffffffff88233dd6>] ? __d_lookup_rcu+0x166/0x1e0
>  [<ffffffff88222f9f>] lookup_fast+0x4f/0x3d0
>  [<ffffffff88224857>] link_path_walk+0x1a7/0x8a0
>  [<ffffffff88224f95>] ? path_lookupat+0x45/0x7b0
>  [<ffffffff88224fb7>] path_lookupat+0x67/0x7b0
>  [<ffffffff880d385d>] ? trace_hardirqs_off+0xd/0x10
>  [<ffffffff8883dda4>] ? retint_restore_args+0xe/0xe
>  [<ffffffff8822572b>] filename_lookup+0x2b/0xc0
>  [<ffffffff88229c77>] user_path_at_empty+0x67/0xc0
>  [<ffffffff880d3dbe>] ? put_lock_stats.isra.27+0xe/0x30
>  [<ffffffff880d42a6>] ? lock_release_holdtime.part.28+0xe6/0x160
>  [<ffffffff880b15ad>] ? get_parent_ip+0xd/0x50
>  [<ffffffff88229ce1>] user_path_at+0x11/0x20
>  [<ffffffff8824fac1>] do_utimes+0xd1/0x180
>  [<ffffffff8824fbef>] SyS_utime+0x7f/0xc0
>  [<ffffffff8883d345>] ? tracesys+0x7e/0xe2
>  [<ffffffff8883d3a4>] tracesys+0xdd/0xe2
> INFO: rcu_preempt detected stalls on CPUs/tasks:
> 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> 	(detected by 3, t=26007 jiffies, g=75434, c=75433, q=0)
> trinity-c342    R  running task    13384   766  32295 0x00000000
>  ffff880068943d98 0000000000000002 0000000000000000 ffff880193c8c680
>  00000000001d4100 0000000000000000 ffff880068943fd8 00000000001d4100
>  ffff88000188af00 ffff880193c8c680 ffff880068943fd8 0000000000000000
> Call Trace:
>  [<ffffffff888368e2>] preempt_schedule_irq+0x52/0xb0
>  [<ffffffff8883df10>] retint_kernel+0x20/0x30
>  [<ffffffff8809f767>] ? pid_task+0x47/0xa0
>  [<ffffffff8809f73d>] ? pid_task+0x1d/0xa0
>  [<ffffffff8808d4f1>] kill_pid_info+0x61/0x130
>  [<ffffffff8808d495>] ? kill_pid_info+0x5/0x130
>  [<ffffffff8808d6d2>] SYSC_kill+0xf2/0x2f0
>  [<ffffffff8808d67b>] ? SYSC_kill+0x9b/0x2f0
>  [<ffffffff8819c2b7>] ? context_tracking_user_exit+0x57/0x280
>  [<ffffffff880136bd>] ? syscall_trace_enter+0x13d/0x310
>  [<ffffffff8808fd9e>] SyS_kill+0xe/0x10
>  [<ffffffff8883d3a4>] tracesys+0xdd/0xe2
> trinity-c225    R  running task    13448   646  32295 0x00000000
>  ffff880161ccfb78 0000000000000002 ffffffff88c993ed ffff88000bf85e00
>  00000000001d4100 0000000000000003 ffff880161ccffd8 00000000001d4100
>  ffff88005ea89780 ffff88000bf85e00 ffff880161ccffd8 0000000000000000
> Call Trace:
>  [<ffffffff888368e2>] preempt_schedule_irq+0x52/0xb0
>  [<ffffffff8883df10>] retint_kernel+0x20/0x30
>  [<ffffffff8822303a>] ? lookup_fast+0xea/0x3d0
>  [<ffffffff88223025>] ? lookup_fast+0xd5/0x3d0
>  [<ffffffff88224857>] link_path_walk+0x1a7/0x8a0
>  [<ffffffff88224f95>] ? path_lookupat+0x45/0x7b0
>  [<ffffffff88224fb7>] path_lookupat+0x67/0x7b0
>  [<ffffffff880d385d>] ? trace_hardirqs_off+0xd/0x10
>  [<ffffffff8883dda4>] ? retint_restore_args+0xe/0xe
>  [<ffffffff8822572b>] filename_lookup+0x2b/0xc0
>  [<ffffffff88229c77>] user_path_at_empty+0x67/0xc0
>  [<ffffffff880d3dbe>] ? put_lock_stats.isra.27+0xe/0x30
>  [<ffffffff880d42a6>] ? lock_release_holdtime.part.28+0xe6/0x160
>  [<ffffffff880b15ad>] ? get_parent_ip+0xd/0x50
>  [<ffffffff88229ce1>] user_path_at+0x11/0x20
>  [<ffffffff8824fac1>] do_utimes+0xd1/0x180
>  [<ffffffff8824fbef>] SyS_utime+0x7f/0xc0
>  [<ffffffff8883d345>] ? tracesys+0x7e/0xe2
>  [<ffffffff8883d3a4>] tracesys+0xdd/0xe2
> trinity-c342    R  running task    13384   766  32295 0x00000000
>  ffff880068943d98 0000000000000002 0000000000000000 ffff880193c8c680
>  00000000001d4100 0000000000000000 ffff880068943fd8 00000000001d4100
>  ffff88000188af00 ffff880193c8c680 ffff880068943fd8 0000000000000000
> Call Trace:
>  [<ffffffff888368e2>] preempt_schedule_irq+0x52/0xb0
>  [<ffffffff8883df10>] retint_kernel+0x20/0x30
>  [<ffffffff8809f767>] ? pid_task+0x47/0xa0
>  [<ffffffff8809f73d>] ? pid_task+0x1d/0xa0
>  [<ffffffff8808d4f1>] kill_pid_info+0x61/0x130
>  [<ffffffff8808d495>] ? kill_pid_info+0x5/0x130
>  [<ffffffff8808d6d2>] SYSC_kill+0xf2/0x2f0
>  [<ffffffff8808d67b>] ? SYSC_kill+0x9b/0x2f0
>  [<ffffffff8819c2b7>] ? context_tracking_user_exit+0x57/0x280
>  [<ffffffff880136bd>] ? syscall_trace_enter+0x13d/0x310
>  [<ffffffff8808fd9e>] SyS_kill+0xe/0x10
>  [<ffffffff8883d3a4>] tracesys+0xdd/0xe2
> trinity-c225    R  running task    13448   646  32295 0x00000000
>  ffff880161ccfb78 0000000000000002 ffffffff88c993ed ffff88000bf85e00
>  00000000001d4100 0000000000000003 ffff880161ccffd8 00000000001d4100
>  ffff88005ea89780 ffff88000bf85e00 ffff880161ccffd8 0000000000000000
> Call Trace:
>  [<ffffffff888368e2>] preempt_schedule_irq+0x52/0xb0
>  [<ffffffff8883df10>] retint_kernel+0x20/0x30
>  [<ffffffff8822303a>] ? lookup_fast+0xea/0x3d0
>  [<ffffffff88223025>] ? lookup_fast+0xd5/0x3d0
>  [<ffffffff88224857>] link_path_walk+0x1a7/0x8a0
>  [<ffffffff88224f95>] ? path_lookupat+0x45/0x7b0
>  [<ffffffff88224fb7>] path_lookupat+0x67/0x7b0
>  [<ffffffff880d385d>] ? trace_hardirqs_off+0xd/0x10
>  [<ffffffff8883dda4>] ? retint_restore_args+0xe/0xe
>  [<ffffffff8822572b>] filename_lookup+0x2b/0xc0
>  [<ffffffff88229c77>] user_path_at_empty+0x67/0xc0
>  [<ffffffff880d3dbe>] ? put_lock_stats.isra.27+0xe/0x30
>  [<ffffffff880d42a6>] ? lock_release_holdtime.part.28+0xe6/0x160
>  [<ffffffff880b15ad>] ? get_parent_ip+0xd/0x50
>  [<ffffffff88229ce1>] user_path_at+0x11/0x20
>  [<ffffffff8824fac1>] do_utimes+0xd1/0x180
>  [<ffffffff8824fbef>] SyS_utime+0x7f/0xc0
>  [<ffffffff8883d345>] ? tracesys+0x7e/0xe2
>  [<ffffffff8883d3a4>] tracesys+0xdd/0xe2
> 
> This is on Linus' current tree, with the new CONFIG_TASKS_RCU unset.

------------------------------------------------------------------------

diff --git a/kernel/signal.c b/kernel/signal.c
index 8f0876f9f6dd..ef6525d0ca73 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1331,8 +1331,8 @@ int kill_pid_info(int sig, struct siginfo *info, struct pid *pid)
 	int error = -ESRCH;
 	struct task_struct *p;
 
-	rcu_read_lock();
 retry:
+	rcu_read_lock();
 	p = pid_task(pid, PIDTYPE_PID);
 	if (p) {
 		error = group_send_sig_info(sig, info, p);
@@ -1343,6 +1343,7 @@ retry:
 			 * if we race with de_thread() it will find the
 			 * new leader.
 			 */
+			rcu_read_unlock();
 			goto retry;
 	}
 	rcu_read_unlock();


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls.
  2014-10-15  2:35 ` Sasha Levin
@ 2014-10-23 18:39   ` Paul E. McKenney
  2014-10-23 18:55     ` Sasha Levin
  0 siblings, 1 reply; 46+ messages in thread
From: Paul E. McKenney @ 2014-10-23 18:39 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Dave Jones, Linux Kernel, htejun

On Tue, Oct 14, 2014 at 10:35:10PM -0400, Sasha Levin wrote:
> On 10/13/2014 01:35 PM, Dave Jones wrote:
> > Today in "rcu stall while fuzzing" news:
> > 
> > INFO: rcu_preempt detected stalls on CPUs/tasks:
> > 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> > 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> > 	(detected by 0, t=6502 jiffies, g=75434, c=75433, q=0)
> 
> I've complained about RCU stalls couple days ago (in a different context)
> on -next. I guess whatever causing them made it into Linus's tree?
> 
> https://lkml.org/lkml/2014/10/11/64

And on that one, I must confess that I don't see where the RCU read-side
critical section might be.

Hmmm...  Maybe someone forgot to put an rcu_read_unlock() somewhere.
Can you reproduce this with CONFIG_PROVE_RCU=y?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls.
  2014-10-23 18:32 ` Paul E. McKenney
@ 2014-10-23 18:40   ` Dave Jones
  2014-10-23 19:28     ` Paul E. McKenney
  2014-10-23 19:13   ` Oleg Nesterov
  1 sibling, 1 reply; 46+ messages in thread
From: Dave Jones @ 2014-10-23 18:40 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Linux Kernel, htejun, oleg

On Thu, Oct 23, 2014 at 11:32:32AM -0700, Paul E. McKenney wrote:
 
 > > trinity-c225    R  running task    13448   646  32295 0x00000000
 > >  ffff880161ccfb28 0000000000000002 ffff880161ccfe10 ffff88000bf85e00
 > >  00000000001d4100 0000000000000003 ffff880161ccffd8 00000000001d4100
 > >  ffff880030124680 ffff88000bf85e00 ffff880161ccffd8 0000000000000000
 > > Call Trace:
 > >  [<ffffffff888368e2>] preempt_schedule_irq+0x52/0xb0
 > >  [<ffffffff8883df10>] retint_kernel+0x20/0x30
 > >  [<ffffffff88233d41>] ? __d_lookup_rcu+0xd1/0x1e0
 > >  [<ffffffff88233dd6>] ? __d_lookup_rcu+0x166/0x1e0
 > >  [<ffffffff88222f9f>] lookup_fast+0x4f/0x3d0
 > >  [<ffffffff88224857>] link_path_walk+0x1a7/0x8a0
 > >  [<ffffffff88224f95>] ? path_lookupat+0x45/0x7b0
 > >  [<ffffffff88224fb7>] path_lookupat+0x67/0x7b0
 > >  [<ffffffff880d385d>] ? trace_hardirqs_off+0xd/0x10
 > >  [<ffffffff8883dda4>] ? retint_restore_args+0xe/0xe
 > >  [<ffffffff8822572b>] filename_lookup+0x2b/0xc0
 > >  [<ffffffff88229c77>] user_path_at_empty+0x67/0xc0
 > >  [<ffffffff880d3dbe>] ? put_lock_stats.isra.27+0xe/0x30
 > >  [<ffffffff880d42a6>] ? lock_release_holdtime.part.28+0xe6/0x160
 > >  [<ffffffff880b15ad>] ? get_parent_ip+0xd/0x50
 > >  [<ffffffff88229ce1>] user_path_at+0x11/0x20
 > >  [<ffffffff8824fac1>] do_utimes+0xd1/0x180
 > >  [<ffffffff8824fbef>] SyS_utime+0x7f/0xc0
 > >  [<ffffffff8883d345>] ? tracesys+0x7e/0xe2
 > >  [<ffffffff8883d3a4>] tracesys+0xdd/0xe2
 > 
 > This one will require more looking.  But did you do something like
 > create a pair of mutually recursive symlinks or something?  ;-)

I'm not 100% sure, but this may have been on a box where I was running
tests over NFS. So maybe the server had disappeared with the mount
still active...

Just a guess tbh.

	Dave


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls.
  2014-10-23 18:39   ` Paul E. McKenney
@ 2014-10-23 18:55     ` Sasha Levin
  2014-10-23 19:58       ` Paul E. McKenney
  0 siblings, 1 reply; 46+ messages in thread
From: Sasha Levin @ 2014-10-23 18:55 UTC (permalink / raw)
  To: paulmck; +Cc: Dave Jones, Linux Kernel, htejun

On 10/23/2014 02:39 PM, Paul E. McKenney wrote:
> On Tue, Oct 14, 2014 at 10:35:10PM -0400, Sasha Levin wrote:
>> On 10/13/2014 01:35 PM, Dave Jones wrote:
>>> Today in "rcu stall while fuzzing" news:
>>>
>>> INFO: rcu_preempt detected stalls on CPUs/tasks:
>>> 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
>>> 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
>>> 	(detected by 0, t=6502 jiffies, g=75434, c=75433, q=0)
>>
>> I've complained about RCU stalls couple days ago (in a different context)
>> on -next. I guess whatever causing them made it into Linus's tree?
>>
>> https://lkml.org/lkml/2014/10/11/64
> 
> And on that one, I must confess that I don't see where the RCU read-side
> critical section might be.
> 
> Hmmm...  Maybe someone forgot to put an rcu_read_unlock() somewhere.
> Can you reproduce this with CONFIG_PROVE_RCU=y?

Paul, if that was directed to me: yes, I see stalls with CONFIG_PROVE_RCU
set, and nothing else is showing up before/after that.


Thanks,
Sasha


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls.
  2014-10-23 18:32 ` Paul E. McKenney
  2014-10-23 18:40   ` Dave Jones
@ 2014-10-23 19:13   ` Oleg Nesterov
  2014-10-23 19:38     ` Paul E. McKenney
  1 sibling, 1 reply; 46+ messages in thread
From: Oleg Nesterov @ 2014-10-23 19:13 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Dave Jones, Linux Kernel, htejun

On 10/23, Paul E. McKenney wrote:
>
> On Mon, Oct 13, 2014 at 01:35:04PM -0400, Dave Jones wrote:
> > Today in "rcu stall while fuzzing" news:
> >
> > INFO: rcu_preempt detected stalls on CPUs/tasks:
> > 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> > 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> > 	(detected by 0, t=6502 jiffies, g=75434, c=75433, q=0)
> > trinity-c342    R  running task    13384   766  32295 0x00000000
> >  ffff880068943d58 0000000000000002 0000000000000002 ffff880193c8c680
> >  00000000001d4100 0000000000000000 ffff880068943fd8 00000000001d4100
> >  ffff88024302c680 ffff880193c8c680 ffff880068943fd8 0000000000000000
> > Call Trace:
> >  [<ffffffff888368e2>] preempt_schedule_irq+0x52/0xb0
> >  [<ffffffff8883df10>] retint_kernel+0x20/0x30
> >  [<ffffffff880d9424>] ? lock_acquire+0xd4/0x2b0
> >  [<ffffffff8808d495>] ? kill_pid_info+0x5/0x130
> >  [<ffffffff8808d4d5>] kill_pid_info+0x45/0x130
> >  [<ffffffff8808d495>] ? kill_pid_info+0x5/0x130
> >  [<ffffffff8808d6d2>] SYSC_kill+0xf2/0x2f0
> >  [<ffffffff8808d67b>] ? SYSC_kill+0x9b/0x2f0
> >  [<ffffffff8819c2b7>] ? context_tracking_user_exit+0x57/0x280
> >  [<ffffffff880136bd>] ? syscall_trace_enter+0x13d/0x310
> >  [<ffffffff8808fd9e>] SyS_kill+0xe/0x10
> >  [<ffffffff8883d3a4>] tracesys+0xdd/0xe2
>
> Well, there is a loop in kill_pid_info().  I am surprised that it
> would loop indefinitely, but if it did, you would certainly get
> RCU CPU stalls.  Please see patch below, adding Oleg for his thoughts.

Yes, this loop should not be a problem; we only restart if we race with
a multi-threaded exec from a non-leader thread.

But I have already seen a couple of bug reports which look like task_struct
corruption (->signal/creds == NULL); it looks like something was broken
recently. Perhaps an unbalanced put_task_struct...

_Perhaps_ this is another case. If ->sighand was nullified then it will
loop forever.
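
To spell that out: group_send_sig_info() ends up in do_send_sig_info(),
which does roughly this (just a sketch, not the exact source):

	int ret = -ESRCH;

	if (lock_task_sighand(p, &flags)) {	/* fails if p->sighand == NULL */
		ret = send_signal(sig, info, p, group);
		unlock_task_sighand(p, &flags);
	}
	return ret;

So with a corrupted task whose ->sighand stays NULL, pid_task() keeps
returning the same task, every attempt returns -ESRCH, and the retry loop
in kill_pid_info() spins forever inside its RCU read-side critical section.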

Oleg.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls.
  2014-10-23 18:40   ` Dave Jones
@ 2014-10-23 19:28     ` Paul E. McKenney
  2014-10-23 19:37       ` Dave Jones
  0 siblings, 1 reply; 46+ messages in thread
From: Paul E. McKenney @ 2014-10-23 19:28 UTC (permalink / raw)
  To: Dave Jones, Linux Kernel, htejun, oleg

On Thu, Oct 23, 2014 at 02:40:18PM -0400, Dave Jones wrote:
> On Thu, Oct 23, 2014 at 11:32:32AM -0700, Paul E. McKenney wrote:
> 
>  > > trinity-c225    R  running task    13448   646  32295 0x00000000
>  > >  ffff880161ccfb28 0000000000000002 ffff880161ccfe10 ffff88000bf85e00
>  > >  00000000001d4100 0000000000000003 ffff880161ccffd8 00000000001d4100
>  > >  ffff880030124680 ffff88000bf85e00 ffff880161ccffd8 0000000000000000
>  > > Call Trace:
>  > >  [<ffffffff888368e2>] preempt_schedule_irq+0x52/0xb0
>  > >  [<ffffffff8883df10>] retint_kernel+0x20/0x30
>  > >  [<ffffffff88233d41>] ? __d_lookup_rcu+0xd1/0x1e0
>  > >  [<ffffffff88233dd6>] ? __d_lookup_rcu+0x166/0x1e0
>  > >  [<ffffffff88222f9f>] lookup_fast+0x4f/0x3d0
>  > >  [<ffffffff88224857>] link_path_walk+0x1a7/0x8a0
>  > >  [<ffffffff88224f95>] ? path_lookupat+0x45/0x7b0
>  > >  [<ffffffff88224fb7>] path_lookupat+0x67/0x7b0
>  > >  [<ffffffff880d385d>] ? trace_hardirqs_off+0xd/0x10
>  > >  [<ffffffff8883dda4>] ? retint_restore_args+0xe/0xe
>  > >  [<ffffffff8822572b>] filename_lookup+0x2b/0xc0
>  > >  [<ffffffff88229c77>] user_path_at_empty+0x67/0xc0
>  > >  [<ffffffff880d3dbe>] ? put_lock_stats.isra.27+0xe/0x30
>  > >  [<ffffffff880d42a6>] ? lock_release_holdtime.part.28+0xe6/0x160
>  > >  [<ffffffff880b15ad>] ? get_parent_ip+0xd/0x50
>  > >  [<ffffffff88229ce1>] user_path_at+0x11/0x20
>  > >  [<ffffffff8824fac1>] do_utimes+0xd1/0x180
>  > >  [<ffffffff8824fbef>] SyS_utime+0x7f/0xc0
>  > >  [<ffffffff8883d345>] ? tracesys+0x7e/0xe2
>  > >  [<ffffffff8883d3a4>] tracesys+0xdd/0xe2
>  > 
>  > This one will require more looking.  But did you do something like
>  > create a pair of mutually recursive symlinks or something?  ;-)
> 
> I'm not 100% sure, but this may have been on a box that I was running
> tests on NFS. So maybe the server had disappeared with the mount
> still active..
> 
> Just a guess tbh.

Another possibility might be that the box was so overloaded that tasks
were getting preempted for 21 seconds as a matter of course, and sometimes
within RCU read-side critical sections.  Or did the box have ample idle
time?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls.
  2014-10-23 19:28     ` Paul E. McKenney
@ 2014-10-23 19:37       ` Dave Jones
  2014-10-23 19:52         ` Paul E. McKenney
  0 siblings, 1 reply; 46+ messages in thread
From: Dave Jones @ 2014-10-23 19:37 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Linux Kernel, htejun, oleg

On Thu, Oct 23, 2014 at 12:28:07PM -0700, Paul E. McKenney wrote:
 
 > >  > This one will require more looking.  But did you do something like
 > >  > create a pair of mutually recursive symlinks or something?  ;-)
 > > 
 > > I'm not 100% sure, but this may have been on a box that I was running
 > > tests on NFS. So maybe the server had disappeared with the mount
 > > still active..
 > > 
 > > Just a guess tbh.
 > 
 > Another possibility might be that the box was so overloaded that tasks
 > were getting preempted for 21 seconds as a matter of course, and sometimes
 > within RCU read-side critical sections.  Or did the box have ample idle
 > time?

I fairly recently upped the number of child processes I typically run
with, so it being overloaded does sound highly likely.

	Dave

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls.
  2014-10-23 19:13   ` Oleg Nesterov
@ 2014-10-23 19:38     ` Paul E. McKenney
  2014-10-23 19:53       ` Oleg Nesterov
  0 siblings, 1 reply; 46+ messages in thread
From: Paul E. McKenney @ 2014-10-23 19:38 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Dave Jones, Linux Kernel, htejun

On Thu, Oct 23, 2014 at 09:13:19PM +0200, Oleg Nesterov wrote:
> On 10/23, Paul E. McKenney wrote:
> >
> > On Mon, Oct 13, 2014 at 01:35:04PM -0400, Dave Jones wrote:
> > > Today in "rcu stall while fuzzing" news:
> > >
> > > INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> > > 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> > > 	(detected by 0, t=6502 jiffies, g=75434, c=75433, q=0)
> > > trinity-c342    R  running task    13384   766  32295 0x00000000
> > >  ffff880068943d58 0000000000000002 0000000000000002 ffff880193c8c680
> > >  00000000001d4100 0000000000000000 ffff880068943fd8 00000000001d4100
> > >  ffff88024302c680 ffff880193c8c680 ffff880068943fd8 0000000000000000
> > > Call Trace:
> > >  [<ffffffff888368e2>] preempt_schedule_irq+0x52/0xb0
> > >  [<ffffffff8883df10>] retint_kernel+0x20/0x30
> > >  [<ffffffff880d9424>] ? lock_acquire+0xd4/0x2b0
> > >  [<ffffffff8808d495>] ? kill_pid_info+0x5/0x130
> > >  [<ffffffff8808d4d5>] kill_pid_info+0x45/0x130
> > >  [<ffffffff8808d495>] ? kill_pid_info+0x5/0x130
> > >  [<ffffffff8808d6d2>] SYSC_kill+0xf2/0x2f0
> > >  [<ffffffff8808d67b>] ? SYSC_kill+0x9b/0x2f0
> > >  [<ffffffff8819c2b7>] ? context_tracking_user_exit+0x57/0x280
> > >  [<ffffffff880136bd>] ? syscall_trace_enter+0x13d/0x310
> > >  [<ffffffff8808fd9e>] SyS_kill+0xe/0x10
> > >  [<ffffffff8883d3a4>] tracesys+0xdd/0xe2
> >
> > Well, there is a loop in kill_pid_info().  I am surprised that it
> > would loop indefinitely, but if it did, you would certainly get
> > RCU CPU stalls.  Please see patch below, adding Oleg for his thoughts.
> 
> Yes, this loops should not be a problem, we only restart if we race with
> a multi-threaded exec from a non-leader thread.
> 
> But I already saw a couple of bug-reports which look as a task_struct
> corruption (->signal/creds == NULL), looks like something was broken
> recently. Perhaps an unbalanced put_task_struct...
> 
> _Perhaps_ this is another case. If ->sighand was nullified then it will
> loop forever.

OK, so making each pass through the loop a separate RCU read-side critical
section might be considered to be suppressing notification of an error
condition?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls.
  2014-10-23 19:37       ` Dave Jones
@ 2014-10-23 19:52         ` Paul E. McKenney
  2014-10-23 20:28           ` Dave Jones
  0 siblings, 1 reply; 46+ messages in thread
From: Paul E. McKenney @ 2014-10-23 19:52 UTC (permalink / raw)
  To: Dave Jones, Linux Kernel, htejun, oleg

On Thu, Oct 23, 2014 at 03:37:59PM -0400, Dave Jones wrote:
> On Thu, Oct 23, 2014 at 12:28:07PM -0700, Paul E. McKenney wrote:
> 
>  > >  > This one will require more looking.  But did you do something like
>  > >  > create a pair of mutually recursive symlinks or something?  ;-)
>  > > 
>  > > I'm not 100% sure, but this may have been on a box that I was running
>  > > tests on NFS. So maybe the server had disappeared with the mount
>  > > still active..
>  > > 
>  > > Just a guess tbh.
>  > 
>  > Another possibility might be that the box was so overloaded that tasks
>  > were getting preempted for 21 seconds as a matter of course, and sometimes
>  > within RCU read-side critical sections.  Or did the box have ample idle
>  > time?
> 
> I fairly recently upped the number of child processes I typically run
> with, so it being overloaded does sound highly likely.

Ah, that could do it!  One way to test extreme loads without triggering
RCU CPU stall warnings might be to make all of your child processes
sleep for a few hundred milliseconds during each
ten-second interval.  Would that work for you?
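
In case it helps, something like this in each child's main loop would do
it (just a sketch; the helper name and the exact numbers are made up, not
taken from trinity):

	#include <time.h>
	#include <unistd.h>

	/* Illustrative only: sleep ~300 ms once per ten-second interval
	 * so that RCU grace periods can complete even when the CPUs are
	 * heavily overcommitted. */
	static void throttle_for_rcu(void)
	{
		static time_t last_sleep;
		time_t now = time(NULL);

		if (now - last_sleep >= 10) {
			usleep(300 * 1000);
			last_sleep = now;
		}
	}

with throttle_for_rcu() called once per iteration of the syscall loop.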

							Thanx, Paul


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls.
  2014-10-23 19:38     ` Paul E. McKenney
@ 2014-10-23 19:53       ` Oleg Nesterov
  2014-10-23 20:24         ` Paul E. McKenney
  0 siblings, 1 reply; 46+ messages in thread
From: Oleg Nesterov @ 2014-10-23 19:53 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Dave Jones, Linux Kernel, htejun

On 10/23, Paul E. McKenney wrote:
>
> OK, so making each pass through the loop a separate RCU read-side critical
> section might be considered to be suppressing notification of an error
> condition?

I agree, this change probably makes sense anyway. Personally I'd prefer
the version below (somehow I hate multiple unlocks), but I won't insist.

Oleg.

--- x/kernel/signal.c
+++ x/kernel/signal.c
@@ -1331,21 +1331,19 @@ int kill_pid_info(int sig, struct siginf
 	int error = -ESRCH;
 	struct task_struct *p;
 
-	rcu_read_lock();
 retry:
+	rcu_read_lock();
 	p = pid_task(pid, PIDTYPE_PID);
-	if (p) {
+	if (p)
 		error = group_send_sig_info(sig, info, p);
-		if (unlikely(error == -ESRCH))
-			/*
-			 * The task was unhashed in between, try again.
-			 * If it is dead, pid_task() will return NULL,
-			 * if we race with de_thread() it will find the
-			 * new leader.
-			 */
-			goto retry;
-	}
 	rcu_read_unlock();
+	/*
+	 * The task was unhashed in between, try again. If it is dead,
+	 * pid_task() will return NULL, if we race with de_thread() it
+	 * will find the new leader.
+	 */
+	if (p && error == -ESRCH)
+		goto retry;
 
 	return error;
 }


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls.
  2014-10-23 18:55     ` Sasha Levin
@ 2014-10-23 19:58       ` Paul E. McKenney
  2014-10-24 12:28         ` Sasha Levin
  0 siblings, 1 reply; 46+ messages in thread
From: Paul E. McKenney @ 2014-10-23 19:58 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Dave Jones, Linux Kernel, htejun

On Thu, Oct 23, 2014 at 02:55:43PM -0400, Sasha Levin wrote:
> On 10/23/2014 02:39 PM, Paul E. McKenney wrote:
> > On Tue, Oct 14, 2014 at 10:35:10PM -0400, Sasha Levin wrote:
> >> On 10/13/2014 01:35 PM, Dave Jones wrote:
> >>> Today in "rcu stall while fuzzing" news:
> >>>
> >>> INFO: rcu_preempt detected stalls on CPUs/tasks:
> >>> 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> >>> 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> >>> 	(detected by 0, t=6502 jiffies, g=75434, c=75433, q=0)
> >>
> >> I've complained about RCU stalls couple days ago (in a different context)
> >> on -next. I guess whatever causing them made it into Linus's tree?
> >>
> >> https://lkml.org/lkml/2014/10/11/64
> > 
> > And on that one, I must confess that I don't see where the RCU read-side
> > critical section might be.
> > 
> > Hmmm...  Maybe someone forgot to put an rcu_read_unlock() somewhere.
> > Can you reproduce this with CONFIG_PROVE_RCU=y?
> 
> Paul, if that was directed to me - Yes, I see stalls with CONFIG_PROVE_RCU
> set and nothing else is showing up before/after that.

Indeed it was directed to you.  ;-)

Does the following crude diagnostic patch turn up anything?

							Thanx, Paul

------------------------------------------------------------------------

softirq: Check for RCU read-side misnesting in softirq handlers

This commit adds checks for RCU read-side misnesting in softirq handlers.
Please note that this works only for CONFIG_TREE_PREEMPT_RCU=y because
the other RCU flavors have no way of knowing how deeply nested they are.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 501baa9ac1be..c6b63a4c576d 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -257,11 +257,13 @@ restart:
 	while ((softirq_bit = ffs(pending))) {
 		unsigned int vec_nr;
 		int prev_count;
+		int rcu_depth;
 
 		h += softirq_bit - 1;
 
 		vec_nr = h - softirq_vec;
 		prev_count = preempt_count();
+		rcu_depth = rcu_preempt_depth();
 
 		kstat_incr_softirqs_this_cpu(vec_nr);
 
@@ -274,6 +276,11 @@ restart:
 			       prev_count, preempt_count());
 			preempt_count_set(prev_count);
 		}
+		if (IS_ENABLED(CONFIG_PROVE_RCU) &&
+		    rcu_depth != rcu_preempt_depth())
+			pr_err("huh, entered softirq %u %s %p with RCU nesting %08x, exited with %08x?\n",
+			       vec_nr, softirq_to_name[vec_nr], h->action,
+			       rcu_depth, rcu_preempt_depth());
 		h++;
 		pending >>= softirq_bit;
 	}


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls.
  2014-10-23 19:53       ` Oleg Nesterov
@ 2014-10-23 20:24         ` Paul E. McKenney
  2014-10-23 21:13           ` Oleg Nesterov
  0 siblings, 1 reply; 46+ messages in thread
From: Paul E. McKenney @ 2014-10-23 20:24 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Dave Jones, Linux Kernel, htejun

On Thu, Oct 23, 2014 at 09:53:37PM +0200, Oleg Nesterov wrote:
> On 10/23, Paul E. McKenney wrote:
> >
> > OK, so making each pass through the loop a separate RCU read-side critical
> > section might be considered to be suppressing notification of an error
> > condition?
> 
> I agree, this change probably makes sense anyway. Personally I'd prefer
> the version below (somehow I hate multiple unlock's), but I won't insist.

Your code, your rules.  ;-)

But given this structure, why not use a for() loop replace the
"goto retry" with an inverted condition and a "return error"?
Maybe something like the following patch?

							Thanx, Paul

------------------------------------------------------------------------

signal: Exit RCU read-side critical section on each pass through loop

The kill_pid_info() function can loop indefinitely if tasks are created
and deleted sufficiently quickly, and if this happens, this function
will remain in a single RCU read-side critical section indefinitely.
This commit therefore exits the RCU read-side critical section on each
pass through the loop.  Because a race must happen to retry the loop,
this should have no performance impact in the common case.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>

diff --git a/kernel/signal.c b/kernel/signal.c
index 8f0876f9f6dd..54820984a872 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1331,23 +1331,21 @@ int kill_pid_info(int sig, struct siginfo *info, struct pid *pid)
 	int error = -ESRCH;
 	struct task_struct *p;
 
-	rcu_read_lock();
-retry:
-	p = pid_task(pid, PIDTYPE_PID);
-	if (p) {
-		error = group_send_sig_info(sig, info, p);
-		if (unlikely(error == -ESRCH))
-			/*
-			 * The task was unhashed in between, try again.
-			 * If it is dead, pid_task() will return NULL,
-			 * if we race with de_thread() it will find the
-			 * new leader.
-			 */
-			goto retry;
-	}
-	rcu_read_unlock();
+	for (;;) {
+		rcu_read_lock();
+		p = pid_task(pid, PIDTYPE_PID);
+		if (p)
+			error = group_send_sig_info(sig, info, p);
+		rcu_read_unlock();
+		if (likely(!p || error != -ESRCH))
+			return error;
 
-	return error;
+		/*
+		 * The task was unhashed in between, try again.  If it
+		 * is dead, pid_task() will return NULL, if we race with
+		 * de_thread() it will find the new leader.
+		 */
+	}
 }
 
 int kill_proc_info(int sig, struct siginfo *info, pid_t pid)


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls.
  2014-10-23 19:52         ` Paul E. McKenney
@ 2014-10-23 20:28           ` Dave Jones
  2014-10-23 20:44             ` Paul E. McKenney
  0 siblings, 1 reply; 46+ messages in thread
From: Dave Jones @ 2014-10-23 20:28 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Linux Kernel, htejun, oleg

On Thu, Oct 23, 2014 at 12:52:21PM -0700, Paul E. McKenney wrote:
 > On Thu, Oct 23, 2014 at 03:37:59PM -0400, Dave Jones wrote:
 > > On Thu, Oct 23, 2014 at 12:28:07PM -0700, Paul E. McKenney wrote:
 > > 
 > >  > >  > This one will require more looking.  But did you do something like
 > >  > >  > create a pair of mutually recursive symlinks or something?  ;-)
 > >  > > 
 > >  > > I'm not 100% sure, but this may have been on a box that I was running
 > >  > > tests on NFS. So maybe the server had disappeared with the mount
 > >  > > still active..
 > >  > > 
 > >  > > Just a guess tbh.
 > >  > 
 > >  > Another possibility might be that the box was so overloaded that tasks
 > >  > were getting preempted for 21 seconds as a matter of course, and sometimes
 > >  > within RCU read-side critical sections.  Or did the box have ample idle
 > >  > time?
 > > 
 > > I fairly recently upped the number of child processes I typically run
 > > with, so it being overloaded does sound highly likely.
 > 
 > Ah, that could do it!  One way to test extreme loads and not trigger
 > RCU CPU stall warnings might be to make all of your child processes all
 > sleep during a given interval of a few hundred milliseconds during each
 > ten-second interval.  Would that work for you?

This feels like hiding from the problem rather than fixing it.
I'm not sure it even makes sense to add sleeps to the fuzzer, other than
to slow things down, and if I were to do that, I may as well just run
it with fewer threads instead.

While the fuzzer is doing pretty crazy stuff, what's different about it
from any other application that overcommits the CPU with too many threads?

We impose rlimits to stop people from forkbombing and the like, but this
doesn't even need that many processes to trigger, and with some effort
could probably be done with even fewer if I found ways to keep other cores
busy in the kernel for long enough.

That all said, I don't have easy reproducers for this right now, due
to other bugs manifesting long before this gets to be a problem.

	Dave


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls.
  2014-10-23 20:28           ` Dave Jones
@ 2014-10-23 20:44             ` Paul E. McKenney
  0 siblings, 0 replies; 46+ messages in thread
From: Paul E. McKenney @ 2014-10-23 20:44 UTC (permalink / raw)
  To: Dave Jones, Linux Kernel, htejun, oleg

On Thu, Oct 23, 2014 at 04:28:16PM -0400, Dave Jones wrote:
> On Thu, Oct 23, 2014 at 12:52:21PM -0700, Paul E. McKenney wrote:
>  > On Thu, Oct 23, 2014 at 03:37:59PM -0400, Dave Jones wrote:
>  > > On Thu, Oct 23, 2014 at 12:28:07PM -0700, Paul E. McKenney wrote:
>  > > 
>  > >  > >  > This one will require more looking.  But did you do something like
>  > >  > >  > create a pair of mutually recursive symlinks or something?  ;-)
>  > >  > > 
>  > >  > > I'm not 100% sure, but this may have been on a box that I was running
>  > >  > > tests on NFS. So maybe the server had disappeared with the mount
>  > >  > > still active..
>  > >  > > 
>  > >  > > Just a guess tbh.
>  > >  > 
>  > >  > Another possibility might be that the box was so overloaded that tasks
>  > >  > were getting preempted for 21 seconds as a matter of course, and sometimes
>  > >  > within RCU read-side critical sections.  Or did the box have ample idle
>  > >  > time?
>  > > 
>  > > I fairly recently upped the number of child processes I typically run
>  > > with, so it being overloaded does sound highly likely.
>  > 
>  > Ah, that could do it!  One way to test extreme loads and not trigger
>  > RCU CPU stall warnings might be to make all of your child processes all
>  > sleep during a given interval of a few hundred milliseconds during each
>  > ten-second interval.  Would that work for you?
> 
> This feels like hiding from the problem rather than fixing it.
> I'm not sure it even makes sense to add sleeps to the fuzzer, other than
> to slow things down, and if I were to do that, I may as well just run
> it with fewer threads instead.

I was thinking of the RCU CPU stall warnings that were strictly due to
overload as being false positives.  If trinity caused a kthread to loop
within an RCU read-side critical section, you would still get the RCU
CPU stall warning even with the sleeps.

But just a suggestion, no strong feelings.  Might change if there is an
excess of false-positive RCU CPU stall warnings, of course.  ;-)

> While the fuzzer is doing pretty crazy stuff, what's different about it
> from any other application that overcommits the CPU with too many threads?

The (presumably) much higher probability of being preempted in the kernel,
and thus within an RCU read-side critical section.

> We impose rlimits to stop people from forkbombing and the like, but this
> doesn't even need that many processes to trigger, and with some effort
> could probably done with even fewer if I found ways to keep other cores
> busy in the kernel for long enough.
> 
> That all said, I don't have easy reproducers for this right now, due
> to other bugs manifesting long before this gets to be a problem.

Fair enough!  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls.
  2014-10-23 20:24         ` Paul E. McKenney
@ 2014-10-23 21:13           ` Oleg Nesterov
  2014-10-23 21:38             ` Paul E. McKenney
  0 siblings, 1 reply; 46+ messages in thread
From: Oleg Nesterov @ 2014-10-23 21:13 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Dave Jones, Linux Kernel, htejun

On 10/23, Paul E. McKenney wrote:
>
> Your code, your rules.  ;-)

Heh, no. I do not trust my (perverted) taste, I never-never
argue with cosmetic issues ;)

Cough... and at the same time I have a small nit.

> But given this structure, why not use a for() loop replace the
> "goto retry" with an inverted condition and a "return error"?
> Maybe something like the following patch?

Thanks,

Acked-by: Oleg Nesterov <oleg@redhat.com>


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls.
  2014-10-23 21:13           ` Oleg Nesterov
@ 2014-10-23 21:38             ` Paul E. McKenney
  0 siblings, 0 replies; 46+ messages in thread
From: Paul E. McKenney @ 2014-10-23 21:38 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Dave Jones, Linux Kernel, htejun

On Thu, Oct 23, 2014 at 11:13:04PM +0200, Oleg Nesterov wrote:
> On 10/23, Paul E. McKenney wrote:
> >
> > Your code, your rules.  ;-)
> 
> Heh, no. I do not trust my (perverted) taste, I never-never
> argue with cosmetic issues ;)
> 
> Cough... and at the same time I have a small nit.

;-) ;-) ;-)

> > But given this structure, why not use a for() loop replace the
> > "goto retry" with an inverted condition and a "return error"?
> > Maybe something like the following patch?
> 
> Thanks,
> 
> Acked-by: Oleg Nesterov <oleg@redhat.com>

Got it, thank you!

							Thanx, Paul


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls.
  2014-10-23 19:58       ` Paul E. McKenney
@ 2014-10-24 12:28         ` Sasha Levin
  2014-10-24 16:13           ` Paul E. McKenney
  0 siblings, 1 reply; 46+ messages in thread
From: Sasha Levin @ 2014-10-24 12:28 UTC (permalink / raw)
  To: paulmck; +Cc: Dave Jones, Linux Kernel, htejun

On 10/23/2014 03:58 PM, Paul E. McKenney wrote:
> On Thu, Oct 23, 2014 at 02:55:43PM -0400, Sasha Levin wrote:
>> > On 10/23/2014 02:39 PM, Paul E. McKenney wrote:
>>> > > On Tue, Oct 14, 2014 at 10:35:10PM -0400, Sasha Levin wrote:
>>>> > >> On 10/13/2014 01:35 PM, Dave Jones wrote:
>>>>> > >>> Today in "rcu stall while fuzzing" news:
>>>>> > >>>
>>>>> > >>> INFO: rcu_preempt detected stalls on CPUs/tasks:
>>>>> > >>> 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
>>>>> > >>> 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
>>>>> > >>> 	(detected by 0, t=6502 jiffies, g=75434, c=75433, q=0)
>>>> > >>
>>>> > >> I've complained about RCU stalls couple days ago (in a different context)
>>>> > >> on -next. I guess whatever causing them made it into Linus's tree?
>>>> > >>
>>>> > >> https://lkml.org/lkml/2014/10/11/64
>>> > > 
>>> > > And on that one, I must confess that I don't see where the RCU read-side
>>> > > critical section might be.
>>> > > 
>>> > > Hmmm...  Maybe someone forgot to put an rcu_read_unlock() somewhere.
>>> > > Can you reproduce this with CONFIG_PROVE_RCU=y?
>> > 
>> > Paul, if that was directed to me - Yes, I see stalls with CONFIG_PROVE_RCU
>> > set and nothing else is showing up before/after that.
> Indeed it was directed to you.  ;-)
> 
> Does the following crude diagnostic patch turn up anything?

Nope, seeing stalls but not seeing that pr_err() you added.

[ 5107.395916] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 5107.395916]  0: (776 ticks this GP) idle=a8d/140000000000002/0 softirq=16356/16356 last_accelerate: f5b7/55e5, nonlazy_posted: 24252, ..
[ 5107.395916]  (detected by 1, t=20502 jiffies, g=13949, c=13948, q=0)
[ 5107.395916] Task dump for CPU 0:
[ 5107.395916] trinity-c0      R  running task    12848 20357   9041 0x0008000e
[ 5107.395916]  0000000000000000 ffff88006bfd76c0 ffff88065722b988 ffffffffa10af964
[ 5107.395916]  ffff88065722b998 ffffffffa106ad23 ffff88065722b9c8 ffffffffa119dce5
[ 5107.395916]  00000000001d76c0 ffff88006bfd76c0 00000000001d76c0 ffff8806473cbd10
[ 5107.395916] Call Trace:
[ 5107.395916]  [<ffffffffa10af964>] ? kvm_clock_read+0x24/0x40
[ 5107.395916]  [<ffffffffa106ad23>] ? sched_clock+0x13/0x30
[ 5107.395916]  [<ffffffffa119dce5>] ? sched_clock_local+0x25/0x90
[ 5107.395916]  [<ffffffffa1303dfb>] ? __slab_free+0xbb/0x3a0
[ 5107.395916]  [<ffffffffa1b71167>] ? debug_smp_processor_id+0x17/0x20
[ 5107.395916]  [<ffffffffa451cb64>] ? _raw_spin_unlock_irqrestore+0x64/0xa0
[ 5107.395916]  [<ffffffffa1303dfb>] ? __slab_free+0xbb/0x3a0
[ 5107.395916]  [<ffffffffa1b71bce>] ? __debug_check_no_obj_freed+0x10e/0x210
[ 5107.395916]  [<ffffffffa1305871>] ? kmem_cache_free+0xb1/0x4f0
[ 5107.395916]  [<ffffffffa1305883>] ? kmem_cache_free+0xc3/0x4f0
[ 5107.395916]  [<ffffffffa1305bb2>] ? kmem_cache_free+0x3f2/0x4f0
[ 5107.395916]  [<ffffffffa12e0cbe>] ? unlink_anon_vmas+0x10e/0x180
[ 5107.395916]  [<ffffffffa12e0cbe>] ? unlink_anon_vmas+0x10e/0x180
[ 5107.395916]  [<ffffffffa12cfbdf>] ? free_pgtables+0x3f/0x130
[ 5107.395916]  [<ffffffffa12dc1a4>] ? exit_mmap+0xc4/0x180
[ 5107.395916]  [<ffffffffa13143fe>] ? __khugepaged_exit+0xbe/0x120
[ 5107.395916]  [<ffffffffa115bbb3>] ? mmput+0x73/0x110
[ 5107.395916]  [<ffffffffa1162eb7>] ? do_exit+0x2c7/0xd30
[ 5107.395916]  [<ffffffffa1173fb9>] ? get_signal+0x3c9/0xaf0
[ 5107.395916]  [<ffffffffa1b71167>] ? debug_smp_processor_id+0x17/0x20
[ 5107.395916]  [<ffffffffa11bccbe>] ? put_lock_stats.isra.13+0xe/0x30
[ 5107.395916]  [<ffffffffa451c810>] ? _raw_spin_unlock_irq+0x30/0x70
[ 5107.395916]  [<ffffffffa11639c2>] ? do_group_exit+0x52/0xe0
[ 5107.395916]  [<ffffffffa1173ef6>] ? get_signal+0x306/0xaf0
[ 5107.395916]  [<ffffffffa119dce5>] ? sched_clock_local+0x25/0x90
[ 5107.395916]  [<ffffffffa105f2f0>] ? do_signal+0x20/0x130
[ 5107.395916]  [<ffffffffa1298558>] ? context_tracking_user_exit+0x78/0x2d0
[ 5107.395916]  [<ffffffffa1b71183>] ? __this_cpu_preempt_check+0x13/0x20
[ 5107.395916]  [<ffffffffa11c04cb>] ? trace_hardirqs_on_caller+0xfb/0x280
[ 5107.395916]  [<ffffffffa11c065d>] ? trace_hardirqs_on+0xd/0x10
[ 5107.395916]  [<ffffffffa105f469>] ? do_notify_resume+0x69/0xb0
[ 5107.395916]  [<ffffffffa451d74f>] ? int_signal+0x12/0x17


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls.
  2014-10-24 12:28         ` Sasha Levin
@ 2014-10-24 16:13           ` Paul E. McKenney
  2014-10-24 16:39             ` Sasha Levin
  0 siblings, 1 reply; 46+ messages in thread
From: Paul E. McKenney @ 2014-10-24 16:13 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Dave Jones, Linux Kernel, htejun

On Fri, Oct 24, 2014 at 08:28:40AM -0400, Sasha Levin wrote:
> On 10/23/2014 03:58 PM, Paul E. McKenney wrote:
> > On Thu, Oct 23, 2014 at 02:55:43PM -0400, Sasha Levin wrote:
> >> > On 10/23/2014 02:39 PM, Paul E. McKenney wrote:
> >>> > > On Tue, Oct 14, 2014 at 10:35:10PM -0400, Sasha Levin wrote:
> >>>> > >> On 10/13/2014 01:35 PM, Dave Jones wrote:
> >>>>> > >>> Today in "rcu stall while fuzzing" news:
> >>>>> > >>>
> >>>>> > >>> INFO: rcu_preempt detected stalls on CPUs/tasks:
> >>>>> > >>> 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> >>>>> > >>> 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> >>>>> > >>> 	(detected by 0, t=6502 jiffies, g=75434, c=75433, q=0)
> >>>> > >>
> >>>> > >> I've complained about RCU stalls couple days ago (in a different context)
> >>>> > >> on -next. I guess whatever causing them made it into Linus's tree?
> >>>> > >>
> >>>> > >> https://lkml.org/lkml/2014/10/11/64
> >>> > > 
> >>> > > And on that one, I must confess that I don't see where the RCU read-side
> >>> > > critical section might be.
> >>> > > 
> >>> > > Hmmm...  Maybe someone forgot to put an rcu_read_unlock() somewhere.
> >>> > > Can you reproduce this with CONFIG_PROVE_RCU=y?
> >> > 
> >> > Paul, if that was directed to me - Yes, I see stalls with CONFIG_PROVE_RCU
> >> > set and nothing else is showing up before/after that.
> > Indeed it was directed to you.  ;-)
> > 
> > Does the following crude diagnostic patch turn up anything?
> 
> Nope, seeing stalls but not seeing that pr_err() you added.

OK, color me confused.  Could you please send me the full dmesg or a
pointer to it?

							Thanx, Paul

> [ 5107.395916] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 5107.395916]  0: (776 ticks this GP) idle=a8d/140000000000002/0 softirq=16356/16356 last_accelerate: f5b7/55e5, nonlazy_posted: 24252, ..
> [ 5107.395916]  (detected by 1, t=20502 jiffies, g=13949, c=13948, q=0)
> [ 5107.395916] Task dump for CPU 0:
> [ 5107.395916] trinity-c0      R  running task    12848 20357   9041 0x0008000e
> [ 5107.395916]  0000000000000000 ffff88006bfd76c0 ffff88065722b988 ffffffffa10af964
> [ 5107.395916]  ffff88065722b998 ffffffffa106ad23 ffff88065722b9c8 ffffffffa119dce5
> [ 5107.395916]  00000000001d76c0 ffff88006bfd76c0 00000000001d76c0 ffff8806473cbd10
> [ 5107.395916] Call Trace:
> [ 5107.395916]  [<ffffffffa10af964>] ? kvm_clock_read+0x24/0x40
> [ 5107.395916]  [<ffffffffa106ad23>] ? sched_clock+0x13/0x30
> [ 5107.395916]  [<ffffffffa119dce5>] ? sched_clock_local+0x25/0x90
> [ 5107.395916]  [<ffffffffa1303dfb>] ? __slab_free+0xbb/0x3a0
> [ 5107.395916]  [<ffffffffa1b71167>] ? debug_smp_processor_id+0x17/0x20
> [ 5107.395916]  [<ffffffffa451cb64>] ? _raw_spin_unlock_irqrestore+0x64/0xa0
> [ 5107.395916]  [<ffffffffa1303dfb>] ? __slab_free+0xbb/0x3a0
> [ 5107.395916]  [<ffffffffa1b71bce>] ? __debug_check_no_obj_freed+0x10e/0x210
> [ 5107.395916]  [<ffffffffa1305871>] ? kmem_cache_free+0xb1/0x4f0
> [ 5107.395916]  [<ffffffffa1305883>] ? kmem_cache_free+0xc3/0x4f0
> [ 5107.395916]  [<ffffffffa1305bb2>] ? kmem_cache_free+0x3f2/0x4f0
> [ 5107.395916]  [<ffffffffa12e0cbe>] ? unlink_anon_vmas+0x10e/0x180
> [ 5107.395916]  [<ffffffffa12e0cbe>] ? unlink_anon_vmas+0x10e/0x180
> [ 5107.395916]  [<ffffffffa12cfbdf>] ? free_pgtables+0x3f/0x130
> [ 5107.395916]  [<ffffffffa12dc1a4>] ? exit_mmap+0xc4/0x180
> [ 5107.395916]  [<ffffffffa13143fe>] ? __khugepaged_exit+0xbe/0x120
> [ 5107.395916]  [<ffffffffa115bbb3>] ? mmput+0x73/0x110
> [ 5107.395916]  [<ffffffffa1162eb7>] ? do_exit+0x2c7/0xd30
> [ 5107.395916]  [<ffffffffa1173fb9>] ? get_signal+0x3c9/0xaf0
> [ 5107.395916]  [<ffffffffa1b71167>] ? debug_smp_processor_id+0x17/0x20
> [ 5107.395916]  [<ffffffffa11bccbe>] ? put_lock_stats.isra.13+0xe/0x30
> [ 5107.395916]  [<ffffffffa451c810>] ? _raw_spin_unlock_irq+0x30/0x70
> [ 5107.395916]  [<ffffffffa11639c2>] ? do_group_exit+0x52/0xe0
> [ 5107.395916]  [<ffffffffa1173ef6>] ? get_signal+0x306/0xaf0
> [ 5107.395916]  [<ffffffffa119dce5>] ? sched_clock_local+0x25/0x90
> [ 5107.395916]  [<ffffffffa105f2f0>] ? do_signal+0x20/0x130
> [ 5107.395916]  [<ffffffffa1298558>] ? context_tracking_user_exit+0x78/0x2d0
> [ 5107.395916]  [<ffffffffa1b71183>] ? __this_cpu_preempt_check+0x13/0x20
> [ 5107.395916]  [<ffffffffa11c04cb>] ? trace_hardirqs_on_caller+0xfb/0x280
> [ 5107.395916]  [<ffffffffa11c065d>] ? trace_hardirqs_on+0xd/0x10
> [ 5107.395916]  [<ffffffffa105f469>] ? do_notify_resume+0x69/0xb0
> [ 5107.395916]  [<ffffffffa451d74f>] ? int_signal+0x12/0x17
> 
> 
> Thanks,
> Sasha
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls.
  2014-10-24 16:13           ` Paul E. McKenney
@ 2014-10-24 16:39             ` Sasha Levin
  2014-10-27 21:13               ` Paul E. McKenney
  0 siblings, 1 reply; 46+ messages in thread
From: Sasha Levin @ 2014-10-24 16:39 UTC (permalink / raw)
  To: paulmck; +Cc: Dave Jones, Linux Kernel, htejun

[-- Attachment #1: Type: text/plain, Size: 1942 bytes --]

On 10/24/2014 12:13 PM, Paul E. McKenney wrote:
> On Fri, Oct 24, 2014 at 08:28:40AM -0400, Sasha Levin wrote:
>> > On 10/23/2014 03:58 PM, Paul E. McKenney wrote:
>>> > > On Thu, Oct 23, 2014 at 02:55:43PM -0400, Sasha Levin wrote:
>>>>> > >> > On 10/23/2014 02:39 PM, Paul E. McKenney wrote:
>>>>>>> > >>> > > On Tue, Oct 14, 2014 at 10:35:10PM -0400, Sasha Levin wrote:
>>>>>>>>> > >>>> > >> On 10/13/2014 01:35 PM, Dave Jones wrote:
>>>>>>>>>>> > >>>>> > >>> oday in "rcu stall while fuzzing" news:
>>>>>>>>>>> > >>>>> > >>>
>>>>>>>>>>> > >>>>> > >>> INFO: rcu_preempt detected stalls on CPUs/tasks:
>>>>>>>>>>> > >>>>> > >>> 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
>>>>>>>>>>> > >>>>> > >>> 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
>>>>>>>>>>> > >>>>> > >>> 	(detected by 0, t=6502 jiffies, g=75434, c=75433, q=0)
>>>>>>>>> > >>>> > >>
>>>>>>>>> > >>>> > >> I've complained about RCU stalls couple days ago (in a different context)
>>>>>>>>> > >>>> > >> on -next. I guess whatever causing them made it into Linus's tree?
>>>>>>>>> > >>>> > >>
>>>>>>>>> > >>>> > >> https://lkml.org/lkml/2014/10/11/64
>>>>>>> > >>> > > 
>>>>>>> > >>> > > And on that one, I must confess that I don't see where the RCU read-side
>>>>>>> > >>> > > critical section might be.
>>>>>>> > >>> > > 
>>>>>>> > >>> > > Hmmm...  Maybe someone forgot to put an rcu_read_unlock() somewhere.
>>>>>>> > >>> > > Can you reproduce this with CONFIG_PROVE_RCU=y?
>>>>> > >> > 
>>>>> > >> > Paul, if that was directed to me - Yes, I see stalls with CONFIG_PROVE_RCU
>>>>> > >> > set and nothing else is showing up before/after that.
>>> > > Indeed it was directed to you.  ;-)
>>> > > 
>>> > > Does the following crude diagnostic patch turn up anything?
>> > 
>> > Nope, seeing stalls but not seeing that pr_err() you added.
> OK, color me confused.  Could you please send me the full dmesg or a
> pointer to it?

Attached.


Thanks,
Sasha

[-- Attachment #2: log.xz --]
[-- Type: application/x-xz, Size: 174240 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls.
  2014-10-13 17:35 rcu_preempt detected stalls Dave Jones
  2014-10-15  2:35 ` Sasha Levin
  2014-10-23 18:32 ` Paul E. McKenney
@ 2014-10-25  3:16 ` Dâniel Fraga
  2 siblings, 0 replies; 46+ messages in thread
From: Dâniel Fraga @ 2014-10-25  3:16 UTC (permalink / raw)
  To: linux-kernel

On Mon, 13 Oct 2014 13:35:04 -0400
Dave Jones <davej@redhat.com> wrote:

> Today in "rcu stall while fuzzing" news:

	My bug report seems to be related to this topic:

Regression: kernel 3.17 halts sometimes (with Call trace)

https://bugzilla.kernel.org/show_bug.cgi?id=85941

	Could someone take a look?

	Thanks.

-- 
Linux 3.16.0-00115-g19583ca: Shuffling Zombie Juror
http://www.youtube.com/DanielFragaBR
http://exchangewar.info
Bitcoin: 12H6661yoLDUZaYPdah6urZS5WiXwTAUgL



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls.
  2014-10-24 16:39             ` Sasha Levin
@ 2014-10-27 21:13               ` Paul E. McKenney
  2014-10-27 23:44                   ` Paul E. McKenney
  2014-10-30 23:41                 ` Sasha Levin
  0 siblings, 2 replies; 46+ messages in thread
From: Paul E. McKenney @ 2014-10-27 21:13 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Dave Jones, Linux Kernel, htejun

On Fri, Oct 24, 2014 at 12:39:15PM -0400, Sasha Levin wrote:
> On 10/24/2014 12:13 PM, Paul E. McKenney wrote:
> > On Fri, Oct 24, 2014 at 08:28:40AM -0400, Sasha Levin wrote:
> >> > On 10/23/2014 03:58 PM, Paul E. McKenney wrote:
> >>> > > On Thu, Oct 23, 2014 at 02:55:43PM -0400, Sasha Levin wrote:
> >>>>> > >> > On 10/23/2014 02:39 PM, Paul E. McKenney wrote:
> >>>>>>> > >>> > > On Tue, Oct 14, 2014 at 10:35:10PM -0400, Sasha Levin wrote:
> >>>>>>>>> > >>>> > >> On 10/13/2014 01:35 PM, Dave Jones wrote:
> >>>>>>>>>>> > >>>>> > >>> oday in "rcu stall while fuzzing" news:
> >>>>>>>>>>> > >>>>> > >>>
> >>>>>>>>>>> > >>>>> > >>> INFO: rcu_preempt detected stalls on CPUs/tasks:
> >>>>>>>>>>> > >>>>> > >>> 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> >>>>>>>>>>> > >>>>> > >>> 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> >>>>>>>>>>> > >>>>> > >>> 	(detected by 0, t=6502 jiffies, g=75434, c=75433, q=0)
> >>>>>>>>> > >>>> > >>
> >>>>>>>>> > >>>> > >> I've complained about RCU stalls couple days ago (in a different context)
> >>>>>>>>> > >>>> > >> on -next. I guess whatever causing them made it into Linus's tree?
> >>>>>>>>> > >>>> > >>
> >>>>>>>>> > >>>> > >> https://lkml.org/lkml/2014/10/11/64
> >>>>>>> > >>> > > 
> >>>>>>> > >>> > > And on that one, I must confess that I don't see where the RCU read-side
> >>>>>>> > >>> > > critical section might be.
> >>>>>>> > >>> > > 
> >>>>>>> > >>> > > Hmmm...  Maybe someone forgot to put an rcu_read_unlock() somewhere.
> >>>>>>> > >>> > > Can you reproduce this with CONFIG_PROVE_RCU=y?
> >>>>> > >> > 
> >>>>> > >> > Paul, if that was directed to me - Yes, I see stalls with CONFIG_PROVE_RCU
> >>>>> > >> > set and nothing else is showing up before/after that.
> >>> > > Indeed it was directed to you.  ;-)
> >>> > > 
> >>> > > Does the following crude diagnostic patch turn up anything?
> >> > 
> >> > Nope, seeing stalls but not seeing that pr_err() you added.
> > OK, color me confused.  Could you please send me the full dmesg or a
> > pointer to it?
> 
> Attached.

Thank you!  I would complain about the FAULT_INJECTION messages, but
they don't appear to be happening all that frequently.

The stack dumps do look different here.  I suspect that this is a real
issue in the VM code.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls.
  2014-10-27 21:13               ` Paul E. McKenney
@ 2014-10-27 23:44                   ` Paul E. McKenney
  2014-10-30 23:41                 ` Sasha Levin
  1 sibling, 0 replies; 46+ messages in thread
From: Paul E. McKenney @ 2014-10-27 23:44 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Dave Jones, Linux Kernel, htejun, linux-mm

On Mon, Oct 27, 2014 at 02:13:29PM -0700, Paul E. McKenney wrote:
> On Fri, Oct 24, 2014 at 12:39:15PM -0400, Sasha Levin wrote:
> > On 10/24/2014 12:13 PM, Paul E. McKenney wrote:
> > > On Fri, Oct 24, 2014 at 08:28:40AM -0400, Sasha Levin wrote:
> > >> > On 10/23/2014 03:58 PM, Paul E. McKenney wrote:
> > >>> > > On Thu, Oct 23, 2014 at 02:55:43PM -0400, Sasha Levin wrote:
> > >>>>> > >> > On 10/23/2014 02:39 PM, Paul E. McKenney wrote:
> > >>>>>>> > >>> > > On Tue, Oct 14, 2014 at 10:35:10PM -0400, Sasha Levin wrote:
> > >>>>>>>>> > >>>> > >> On 10/13/2014 01:35 PM, Dave Jones wrote:
> > >>>>>>>>>>> > >>>>> > >>> oday in "rcu stall while fuzzing" news:
> > >>>>>>>>>>> > >>>>> > >>>
> > >>>>>>>>>>> > >>>>> > >>> INFO: rcu_preempt detected stalls on CPUs/tasks:
> > >>>>>>>>>>> > >>>>> > >>> 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> > >>>>>>>>>>> > >>>>> > >>> 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> > >>>>>>>>>>> > >>>>> > >>> 	(detected by 0, t=6502 jiffies, g=75434, c=75433, q=0)
> > >>>>>>>>> > >>>> > >>
> > >>>>>>>>> > >>>> > >> I've complained about RCU stalls couple days ago (in a different context)
> > >>>>>>>>> > >>>> > >> on -next. I guess whatever causing them made it into Linus's tree?
> > >>>>>>>>> > >>>> > >>
> > >>>>>>>>> > >>>> > >> https://lkml.org/lkml/2014/10/11/64
> > >>>>>>> > >>> > > 
> > >>>>>>> > >>> > > And on that one, I must confess that I don't see where the RCU read-side
> > >>>>>>> > >>> > > critical section might be.
> > >>>>>>> > >>> > > 
> > >>>>>>> > >>> > > Hmmm...  Maybe someone forgot to put an rcu_read_unlock() somewhere.
> > >>>>>>> > >>> > > Can you reproduce this with CONFIG_PROVE_RCU=y?
> > >>>>> > >> > 
> > >>>>> > >> > Paul, if that was directed to me - Yes, I see stalls with CONFIG_PROVE_RCU
> > >>>>> > >> > set and nothing else is showing up before/after that.
> > >>> > > Indeed it was directed to you.  ;-)
> > >>> > > 
> > >>> > > Does the following crude diagnostic patch turn up anything?
> > >> > 
> > >> > Nope, seeing stalls but not seeing that pr_err() you added.
> > > OK, color me confused.  Could you please send me the full dmesg or a
> > > pointer to it?
> > 
> > Attached.
> 
> Thank you!  I would complain about the FAULT_INJECTION messages, but
> they don't appear to be happening all that frequently.
> 
> The stack dumps do look different here.  I suspect that this is a real
> issue in the VM code.

And to that end...  The filemap_map_pages() function does have a loop over
a list of pages.  I wonder if the rcu_read_lock() should be moved into
the radix_tree_for_each_slot() loop.  CCing linux-mm for their thoughts,
though it looks to me like the current radix_tree_for_each_slot() wants
to be under RCU protection.  But I am not seeing anything that requires
all iterations of the loop to be under the same RCU read-side critical
section.  Maybe something like the following patch?

							Thanx, Paul

------------------------------------------------------------------------

mm: Attempted fix for RCU CPU stall warning

It appears that filemap_map_pages() can stay in a single RCU read-side
critical section for a very long time if given a large area to map.
This could result in RCU CPU stall warnings.  This commit therefore breaks
the read-side critical section into per-iteration critical sections, taking
care to make sure that the radix_tree_for_each_slot() call itself remains
in an RCU read-side critical section, as required.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/mm/filemap.c b/mm/filemap.c
index 14b4642279f1..f78f144fb41f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2055,6 +2055,8 @@ skip:
 next:
 		if (iter.index == vmf->max_pgoff)
 			break;
+		rcu_read_unlock();
+		rcu_read_lock();
 	}
 	rcu_read_unlock();
 }
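
For anyone unfamiliar with the idiom, here is a minimal sketch (ordinary
kernel C, not the actual mm/filemap.c code) of the pattern the patch
applies: each loop iteration gets its own RCU read-side critical section
instead of one long section spanning the whole walk.  The item type and
process_item() helper below are hypothetical, purely for illustration.

#include <linux/rcupdate.h>

struct item;
void process_item(struct item *it);	/* hypothetical consumer */

/*
 * Hold the RCU read lock around each individual lookup, but drop and
 * re-acquire it between iterations so that a long walk cannot block
 * RCU grace periods and trigger CPU stall warnings.
 */
static void walk_items(struct item __rcu **slots, unsigned long nr)
{
	unsigned long i;

	rcu_read_lock();
	for (i = 0; i < nr; i++) {
		struct item *it = rcu_dereference(slots[i]);

		if (it)
			process_item(it);
		/* End this critical section and immediately start a new one. */
		rcu_read_unlock();
		rcu_read_lock();
	}
	rcu_read_unlock();
}

The unlock/lock pair at the bottom of the loop is what the diff above
adds to filemap_map_pages(), with the additional constraint there that
the radix_tree_for_each_slot() iterator itself must still run under RCU
protection.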


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls.
@ 2014-10-27 23:44                   ` Paul E. McKenney
  0 siblings, 0 replies; 46+ messages in thread
From: Paul E. McKenney @ 2014-10-27 23:44 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Dave Jones, Linux Kernel, htejun, linux-mm

On Mon, Oct 27, 2014 at 02:13:29PM -0700, Paul E. McKenney wrote:
> On Fri, Oct 24, 2014 at 12:39:15PM -0400, Sasha Levin wrote:
> > On 10/24/2014 12:13 PM, Paul E. McKenney wrote:
> > > On Fri, Oct 24, 2014 at 08:28:40AM -0400, Sasha Levin wrote:
> > >> > On 10/23/2014 03:58 PM, Paul E. McKenney wrote:
> > >>> > > On Thu, Oct 23, 2014 at 02:55:43PM -0400, Sasha Levin wrote:
> > >>>>> > >> > On 10/23/2014 02:39 PM, Paul E. McKenney wrote:
> > >>>>>>> > >>> > > On Tue, Oct 14, 2014 at 10:35:10PM -0400, Sasha Levin wrote:
> > >>>>>>>>> > >>>> > >> On 10/13/2014 01:35 PM, Dave Jones wrote:
> > >>>>>>>>>>> > >>>>> > >>> oday in "rcu stall while fuzzing" news:
> > >>>>>>>>>>> > >>>>> > >>>
> > >>>>>>>>>>> > >>>>> > >>> INFO: rcu_preempt detected stalls on CPUs/tasks:
> > >>>>>>>>>>> > >>>>> > >>> 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> > >>>>>>>>>>> > >>>>> > >>> 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> > >>>>>>>>>>> > >>>>> > >>> 	(detected by 0, t=6502 jiffies, g=75434, c=75433, q=0)
> > >>>>>>>>> > >>>> > >>
> > >>>>>>>>> > >>>> > >> I've complained about RCU stalls couple days ago (in a different context)
> > >>>>>>>>> > >>>> > >> on -next. I guess whatever causing them made it into Linus's tree?
> > >>>>>>>>> > >>>> > >>
> > >>>>>>>>> > >>>> > >> https://lkml.org/lkml/2014/10/11/64
> > >>>>>>> > >>> > > 
> > >>>>>>> > >>> > > And on that one, I must confess that I don't see where the RCU read-side
> > >>>>>>> > >>> > > critical section might be.
> > >>>>>>> > >>> > > 
> > >>>>>>> > >>> > > Hmmm...  Maybe someone forgot to put an rcu_read_unlock() somewhere.
> > >>>>>>> > >>> > > Can you reproduce this with CONFIG_PROVE_RCU=y?
> > >>>>> > >> > 
> > >>>>> > >> > Paul, if that was directed to me - Yes, I see stalls with CONFIG_PROVE_RCU
> > >>>>> > >> > set and nothing else is showing up before/after that.
> > >>> > > Indeed it was directed to you.  ;-)
> > >>> > > 
> > >>> > > Does the following crude diagnostic patch turn up anything?
> > >> > 
> > >> > Nope, seeing stalls but not seeing that pr_err() you added.
> > > OK, color me confused.  Could you please send me the full dmesg or a
> > > pointer to it?
> > 
> > Attached.
> 
> Thank you!  I would complain about the FAULT_INJECTION messages, but
> they don't appear to be happening all that frequently.
> 
> The stack dumps do look different here.  I suspect that this is a real
> issue in the VM code.

And to that end...  The filemap_map_pages() function does have a loop over
a list of pages.  I wonder if the rcu_read_lock() should be moved into
the radix_tree_for_each_slot() loop.  CCing linux-mm for their thoughts,
though it looks to me like the current radix_tree_for_each_slot() wants
to be under RCU protection.  But I am not seeing anything that requires
all iterations of the loop to be under the same RCU read-side critical
section.  Maybe something like the following patch?

							Thanx, Paul

------------------------------------------------------------------------

mm: Attempted fix for RCU CPU stall warning

It appears that filemap_map_pages() can stay in a single RCU read-side
critical section for a very long time if given a large area to map.
This could result in RCU CPU stall warnings.  This commit therefore breaks
the read-side critical section into per-iteration critical sections, taking
care to make sure that the radix_tree_for_each_slot() call itself remains
in an RCU read-side critical section, as required.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/mm/filemap.c b/mm/filemap.c
index 14b4642279f1..f78f144fb41f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2055,6 +2055,8 @@ skip:
 next:
 		if (iter.index == vmf->max_pgoff)
 			break;
+		rcu_read_unlock();
+		rcu_read_lock();
 	}
 	rcu_read_unlock();
 }


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls.
  2014-10-27 21:13               ` Paul E. McKenney
  2014-10-27 23:44                   ` Paul E. McKenney
@ 2014-10-30 23:41                 ` Sasha Levin
  1 sibling, 0 replies; 46+ messages in thread
From: Sasha Levin @ 2014-10-30 23:41 UTC (permalink / raw)
  To: paulmck; +Cc: Dave Jones, Linux Kernel, htejun

On 10/27/2014 05:13 PM, Paul E. McKenney wrote:
> Thank you!  I would complain about the FAULT_INJECTION messages, but
> they don't appear to be happening all that frequently.
> 
> The stack dumps do look different here.  I suspect that this is a real
> issue in the VM code.

The stack dumps are pretty much different with every trace.

I just got a "double trace" for the first time, maybe it'll help:

[ 5887.980026] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 5887.980028] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 5887.980029]
[ 5887.980030]
[ 5887.980046] 	0: (210 ticks this GP) idle=f11/140000000000002/0 softirq=18456/18456 last_accelerate: 36b9/86cf, nonlazy_posted: 38271, ..
[ 5887.980054] 	0: (210 ticks this GP) idle=f11/140000000000002/0 softirq=18456/18456 last_accelerate: 36b9/86cf, nonlazy_posted: 38271, ..
[ 5887.980060] 	
[ 5887.980073] 	
[ 5887.980074] (detected by 11, t=20502 jiffies, g=16314, c=16313, q=0)
[ 5887.980079] (detected by 28, t=20502 jiffies, g=-284, c=-285, q=0)
[ 5887.980081] Task dump for CPU 0:
[ 5887.980084] Task dump for CPU 0:
[ 5887.980090] trinity-c175    R
[ 5887.980092] trinity-c175    R
[ 5887.980094]   running task
[ 5887.980116]   running task    12864 17785   9032 0x0008000c
[ 5887.980120] 12864 17785   9032 0x0008000c
[ 5887.980125]  ffffffff940ff458
[ 5887.980127]  ffffffff940ff458
[ 5887.980130]  0000000000000286
[ 5887.980132]  0000000000000286
[ 5887.980134]  ffffffff9d28b638
[ 5887.980137]  ffffffff9d28b638
[ 5887.980139]  ffff88014b88b000
[ 5887.980141]  ffff88014b88b000
[ 5887.980141]
[ 5887.980142]
[ 5887.980146]  ffff88014b88b000
[ 5887.980148]  ffff88014b88b000
[ 5887.980151]  ffffffff9d300b30
[ 5887.980153]  ffffffff9d300b30
[ 5887.980155]  0000000000000000
[ 5887.980157]  0000000000000000
[ 5887.980160]  ffff88013dc476a8
[ 5887.980161]  ffff88013dc476a8
[ 5887.980161]
[ 5887.980163]
[ 5887.980167]  ffffffff9423b078
[ 5887.980169]  ffffffff9423b078
[ 5887.980171]  0000000000000002
[ 5887.980173]  0000000000000002
[ 5887.980175]  0000000000000000
[ 5887.980177]  0000000000000000
[ 5887.980180]  ffff88013dc477c0
[ 5887.980181]  ffff88013dc477c0
[ 5887.980181]
[ 5887.980183]
[ 5887.980185] Call Trace:
[ 5887.980187] Call Trace:
[ 5887.980208] ? __change_page_attr_set_clr (arch/x86/mm/pageattr.c:1109 arch/x86/mm/pageattr.c:1279)
[ 5887.980219] ? __change_page_attr_set_clr (arch/x86/mm/pageattr.c:1109 arch/x86/mm/pageattr.c:1279)
[ 5887.980232] ? check_flags (kernel/locking/lockdep.c:3532)
[ 5887.980241] ? check_flags (kernel/locking/lockdep.c:3532)
[ 5887.980253] ? preempt_count_sub (kernel/sched/core.c:2641)
[ 5887.980262] ? preempt_count_sub (kernel/sched/core.c:2641)
[ 5887.980269] ? debug_check_no_locks_freed (kernel/locking/lockdep.c:4076)
[ 5887.980278] ? debug_check_no_locks_freed (kernel/locking/lockdep.c:4076)
[ 5887.980294] ? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:161 kernel/locking/spinlock.c:191)
[ 5887.980303] ? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:161 kernel/locking/spinlock.c:191)
[ 5887.980312] ? kernel_map_pages (arch/x86/mm/pageattr.c:1801 arch/x86/mm/pageattr.c:1821)
[ 5887.980319] ? kernel_map_pages (arch/x86/mm/pageattr.c:1801 arch/x86/mm/pageattr.c:1821)
[ 5887.980334] ? free_pages_prepare (mm/page_alloc.c:779)
[ 5887.980344] ? free_pages_prepare (mm/page_alloc.c:779)
[ 5887.980351] ? __free_pages_ok (mm/page_alloc.c:787)
[ 5887.980358] ? __free_pages_ok (mm/page_alloc.c:787)
[ 5887.980365] ? free_compound_page (mm/page_alloc.c:364)
[ 5887.980373] ? free_compound_page (mm/page_alloc.c:364)
[ 5887.980380] ? put_compound_page (mm/swap.c:262)
[ 5887.980389] ? put_compound_page (mm/swap.c:262)
[ 5887.980395] ? preempt_count_sub (kernel/sched/core.c:2641)
[ 5887.980404] ? preempt_count_sub (kernel/sched/core.c:2641)
[ 5887.980411] ? release_pages (mm/swap.c:917)
[ 5887.980419] ? release_pages (mm/swap.c:917)
[ 5887.980430] ? free_pages_and_swap_cache (mm/swap_state.c:274)
[ 5887.980440] ? free_pages_and_swap_cache (mm/swap_state.c:274)
[ 5887.980449] ? tlb_flush_mmu_free (mm/memory.c:254 (discriminator 3))
[ 5887.980458] ? tlb_flush_mmu_free (mm/memory.c:254 (discriminator 3))
[ 5887.980465] ? unmap_page_range (mm/memory.c:1220 mm/memory.c:1263 mm/memory.c:1284 mm/memory.c:1308)
[ 5887.980473] ? unmap_page_range (mm/memory.c:1220 mm/memory.c:1263 mm/memory.c:1284 mm/memory.c:1308)
[ 5887.980481] ? unmap_single_vma (mm/memory.c:1355)
[ 5887.980489] ? unmap_single_vma (mm/memory.c:1355)
[ 5887.980496] ? unmap_vmas (mm/memory.c:1382 (discriminator 3))
[ 5887.980503] ? unmap_vmas (mm/memory.c:1382 (discriminator 3))
[ 5887.980512] ? exit_mmap (mm/mmap.c:2855)
[ 5887.980521] ? exit_mmap (mm/mmap.c:2855)
[ 5887.980530] ? kmem_cache_free (include/trace/events/kmem.h:143 mm/slub.c:2715)
[ 5887.980538] ? kmem_cache_free (include/trace/events/kmem.h:143 mm/slub.c:2715)
[ 5887.980546] ? mmput (kernel/fork.c:674)
[ 5887.980554] ? mmput (kernel/fork.c:674)
[ 5887.980561] ? do_exit (./arch/x86/include/asm/thread_info.h:168 kernel/exit.c:462 kernel/exit.c:747)
[ 5887.980569] ? do_exit (./arch/x86/include/asm/thread_info.h:168 kernel/exit.c:462 kernel/exit.c:747)
[ 5887.980578] ? rcu_start_future_gp (kernel/rcu/tree.c:1364)
[ 5887.980589] ? rcu_start_future_gp (kernel/rcu/tree.c:1364)
[ 5887.980596] ? recalc_sigpending_tsk (include/linux/thread_info.h:71 include/linux/sched.h:2733 kernel/signal.c:137)
[ 5887.980605] ? recalc_sigpending_tsk (include/linux/thread_info.h:71 include/linux/sched.h:2733 kernel/signal.c:137)
[ 5887.980613] ? debug_lockdep_rcu_enabled (kernel/rcu/update.c:136 (discriminator 1))
[ 5887.980621] ? debug_lockdep_rcu_enabled (kernel/rcu/update.c:136 (discriminator 1))
[ 5887.980630] ? do_group_exit (./arch/x86/include/asm/current.h:14 kernel/exit.c:875)
[ 5887.980637] ? do_group_exit (./arch/x86/include/asm/current.h:14 kernel/exit.c:875)
[ 5887.980645] ? preempt_count_sub (kernel/sched/core.c:2641)
[ 5887.980651] ? preempt_count_sub (kernel/sched/core.c:2641)
[ 5887.980660] ? get_signal (kernel/signal.c:2358)
[ 5887.980667] ? get_signal (kernel/signal.c:2358)
[ 5887.980676] ? trace_hardirqs_off_caller (kernel/locking/lockdep.c:2621)
[ 5887.980683] ? trace_hardirqs_off_caller (kernel/locking/lockdep.c:2621)
[ 5887.980692] ? do_signal (arch/x86/kernel/signal.c:703)
[ 5887.980701] ? do_signal (arch/x86/kernel/signal.c:703)
[ 5887.980707] ? get_parent_ip (kernel/sched/core.c:2585)
[ 5887.980714] ? get_parent_ip (kernel/sched/core.c:2585)
[ 5887.980723] ? _raw_spin_unlock (./arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:152 kernel/locking/spinlock.c:183)
[ 5887.980731] ? _raw_spin_unlock (./arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:152 kernel/locking/spinlock.c:183)
[ 5887.980740] ? context_tracking_user_exit (include/linux/vtime.h:89 include/linux/jump_label.h:114 include/trace/events/context_tracking.h:47 kernel/context_tracking.c:140)
[ 5887.980748] ? context_tracking_user_exit (include/linux/vtime.h:89 include/linux/jump_label.h:114 include/trace/events/context_tracking.h:47 kernel/context_tracking.c:140)
[ 5887.980757] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2566)
[ 5887.980764] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2566)
[ 5887.980772] ? trace_hardirqs_on (kernel/locking/lockdep.c:2609)
[ 5887.980778] ? trace_hardirqs_on (kernel/locking/lockdep.c:2609)
[ 5887.980787] ? do_notify_resume (arch/x86/kernel/signal.c:756)
[ 5887.980793] ? do_notify_resume (arch/x86/kernel/signal.c:756)
[ 5887.980802] ? int_signal (arch/x86/kernel/entry_64.S:587)
[ 5887.980809] ? int_signal (arch/x86/kernel/entry_64.S:587)


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls.
  2014-10-27 23:44                   ` Paul E. McKenney
@ 2014-11-13 23:07                     ` Paul E. McKenney
  -1 siblings, 0 replies; 46+ messages in thread
From: Paul E. McKenney @ 2014-11-13 23:07 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Dave Jones, Linux Kernel, htejun, linux-mm

On Mon, Oct 27, 2014 at 04:44:25PM -0700, Paul E. McKenney wrote:
> On Mon, Oct 27, 2014 at 02:13:29PM -0700, Paul E. McKenney wrote:
> > On Fri, Oct 24, 2014 at 12:39:15PM -0400, Sasha Levin wrote:
> > > On 10/24/2014 12:13 PM, Paul E. McKenney wrote:
> > > > On Fri, Oct 24, 2014 at 08:28:40AM -0400, Sasha Levin wrote:
> > > >> > On 10/23/2014 03:58 PM, Paul E. McKenney wrote:
> > > >>> > > On Thu, Oct 23, 2014 at 02:55:43PM -0400, Sasha Levin wrote:
> > > >>>>> > >> > On 10/23/2014 02:39 PM, Paul E. McKenney wrote:
> > > >>>>>>> > >>> > > On Tue, Oct 14, 2014 at 10:35:10PM -0400, Sasha Levin wrote:
> > > >>>>>>>>> > >>>> > >> On 10/13/2014 01:35 PM, Dave Jones wrote:
> > > >>>>>>>>>>> > >>>>> > >>> oday in "rcu stall while fuzzing" news:
> > > >>>>>>>>>>> > >>>>> > >>>
> > > >>>>>>>>>>> > >>>>> > >>> INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > >>>>>>>>>>> > >>>>> > >>> 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> > > >>>>>>>>>>> > >>>>> > >>> 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> > > >>>>>>>>>>> > >>>>> > >>> 	(detected by 0, t=6502 jiffies, g=75434, c=75433, q=0)
> > > >>>>>>>>> > >>>> > >>
> > > >>>>>>>>> > >>>> > >> I've complained about RCU stalls couple days ago (in a different context)
> > > >>>>>>>>> > >>>> > >> on -next. I guess whatever causing them made it into Linus's tree?
> > > >>>>>>>>> > >>>> > >>
> > > >>>>>>>>> > >>>> > >> https://lkml.org/lkml/2014/10/11/64
> > > >>>>>>> > >>> > > 
> > > >>>>>>> > >>> > > And on that one, I must confess that I don't see where the RCU read-side
> > > >>>>>>> > >>> > > critical section might be.
> > > >>>>>>> > >>> > > 
> > > >>>>>>> > >>> > > Hmmm...  Maybe someone forgot to put an rcu_read_unlock() somewhere.
> > > >>>>>>> > >>> > > Can you reproduce this with CONFIG_PROVE_RCU=y?
> > > >>>>> > >> > 
> > > >>>>> > >> > Paul, if that was directed to me - Yes, I see stalls with CONFIG_PROVE_RCU
> > > >>>>> > >> > set and nothing else is showing up before/after that.
> > > >>> > > Indeed it was directed to you.  ;-)
> > > >>> > > 
> > > >>> > > Does the following crude diagnostic patch turn up anything?
> > > >> > 
> > > >> > Nope, seeing stalls but not seeing that pr_err() you added.
> > > > OK, color me confused.  Could you please send me the full dmesg or a
> > > > pointer to it?
> > > 
> > > Attached.
> > 
> > Thank you!  I would complain about the FAULT_INJECTION messages, but
> > they don't appear to be happening all that frequently.
> > 
> > The stack dumps do look different here.  I suspect that this is a real
> > issue in the VM code.
> 
> And to that end...  The filemap_map_pages() function does have loop over
> a list of pages.  I wonder if the rcu_read_lock() should be moved into
> the radix_tree_for_each_slot() loop.  CCing linux-mm for their thoughts,
> though it looks to me like the current radix_tree_for_each_slot() wants
> to be under RCU protection.  But I am not seeing anything that requires
> all iterations of the loop to be under the same RCU read-side critical
> section.  Maybe something like the following patch?

Just following up, did the patch below help?

							Thanx, Paul

> ------------------------------------------------------------------------
> 
> mm: Attempted fix for RCU CPU stall warning
> 
> It appears that filemap_map_pages() can stay in a single RCU read-side
> critical section for a very long time if given a large area to map.
> This could result in RCU CPU stall warnings.  This commit therefore breaks
> the read-side critical section into per-iteration critical sections, taking
> care to make sure that the radix_tree_for_each_slot() call itself remains
> in an RCU read-side critical section, as required.
> 
> Reported-by: Sasha Levin <sasha.levin@oracle.com>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 14b4642279f1..f78f144fb41f 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2055,6 +2055,8 @@ skip:
>  next:
>  		if (iter.index == vmf->max_pgoff)
>  			break;
> +		rcu_read_unlock();
> +		rcu_read_lock();
>  	}
>  	rcu_read_unlock();
>  }


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls.
@ 2014-11-13 23:07                     ` Paul E. McKenney
  0 siblings, 0 replies; 46+ messages in thread
From: Paul E. McKenney @ 2014-11-13 23:07 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Dave Jones, Linux Kernel, htejun, linux-mm

On Mon, Oct 27, 2014 at 04:44:25PM -0700, Paul E. McKenney wrote:
> On Mon, Oct 27, 2014 at 02:13:29PM -0700, Paul E. McKenney wrote:
> > On Fri, Oct 24, 2014 at 12:39:15PM -0400, Sasha Levin wrote:
> > > On 10/24/2014 12:13 PM, Paul E. McKenney wrote:
> > > > On Fri, Oct 24, 2014 at 08:28:40AM -0400, Sasha Levin wrote:
> > > >> > On 10/23/2014 03:58 PM, Paul E. McKenney wrote:
> > > >>> > > On Thu, Oct 23, 2014 at 02:55:43PM -0400, Sasha Levin wrote:
> > > >>>>> > >> > On 10/23/2014 02:39 PM, Paul E. McKenney wrote:
> > > >>>>>>> > >>> > > On Tue, Oct 14, 2014 at 10:35:10PM -0400, Sasha Levin wrote:
> > > >>>>>>>>> > >>>> > >> On 10/13/2014 01:35 PM, Dave Jones wrote:
> > > >>>>>>>>>>> > >>>>> > >>> oday in "rcu stall while fuzzing" news:
> > > >>>>>>>>>>> > >>>>> > >>>
> > > >>>>>>>>>>> > >>>>> > >>> INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > >>>>>>>>>>> > >>>>> > >>> 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> > > >>>>>>>>>>> > >>>>> > >>> 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> > > >>>>>>>>>>> > >>>>> > >>> 	(detected by 0, t=6502 jiffies, g=75434, c=75433, q=0)
> > > >>>>>>>>> > >>>> > >>
> > > >>>>>>>>> > >>>> > >> I've complained about RCU stalls couple days ago (in a different context)
> > > >>>>>>>>> > >>>> > >> on -next. I guess whatever causing them made it into Linus's tree?
> > > >>>>>>>>> > >>>> > >>
> > > >>>>>>>>> > >>>> > >> https://lkml.org/lkml/2014/10/11/64
> > > >>>>>>> > >>> > > 
> > > >>>>>>> > >>> > > And on that one, I must confess that I don't see where the RCU read-side
> > > >>>>>>> > >>> > > critical section might be.
> > > >>>>>>> > >>> > > 
> > > >>>>>>> > >>> > > Hmmm...  Maybe someone forgot to put an rcu_read_unlock() somewhere.
> > > >>>>>>> > >>> > > Can you reproduce this with CONFIG_PROVE_RCU=y?
> > > >>>>> > >> > 
> > > >>>>> > >> > Paul, if that was directed to me - Yes, I see stalls with CONFIG_PROVE_RCU
> > > >>>>> > >> > set and nothing else is showing up before/after that.
> > > >>> > > Indeed it was directed to you.  ;-)
> > > >>> > > 
> > > >>> > > Does the following crude diagnostic patch turn up anything?
> > > >> > 
> > > >> > Nope, seeing stalls but not seeing that pr_err() you added.
> > > > OK, color me confused.  Could you please send me the full dmesg or a
> > > > pointer to it?
> > > 
> > > Attached.
> > 
> > Thank you!  I would complain about the FAULT_INJECTION messages, but
> > they don't appear to be happening all that frequently.
> > 
> > The stack dumps do look different here.  I suspect that this is a real
> > issue in the VM code.
> 
> And to that end...  The filemap_map_pages() function does have loop over
> a list of pages.  I wonder if the rcu_read_lock() should be moved into
> the radix_tree_for_each_slot() loop.  CCing linux-mm for their thoughts,
> though it looks to me like the current radix_tree_for_each_slot() wants
> to be under RCU protection.  But I am not seeing anything that requires
> all iterations of the loop to be under the same RCU read-side critical
> section.  Maybe something like the following patch?

Just following up, did the patch below help?

							Thanx, Paul

> ------------------------------------------------------------------------
> 
> mm: Attempted fix for RCU CPU stall warning
> 
> It appears that filemap_map_pages() can stay in a single RCU read-side
> critical section for a very long time if given a large area to map.
> This could result in RCU CPU stall warnings.  This commit therefore breaks
> the read-side critical section into per-iteration critical sections, taking
> care to make sure that the radix_tree_for_each_slot() call itself remains
> in an RCU read-side critical section, as required.
> 
> Reported-by: Sasha Levin <sasha.levin@oracle.com>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 14b4642279f1..f78f144fb41f 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2055,6 +2055,8 @@ skip:
>  next:
>  		if (iter.index == vmf->max_pgoff)
>  			break;
> +		rcu_read_unlock();
> +		rcu_read_lock();
>  	}
>  	rcu_read_unlock();
>  }


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls.
  2014-11-13 23:07                     ` Paul E. McKenney
@ 2014-11-13 23:10                       ` Sasha Levin
  -1 siblings, 0 replies; 46+ messages in thread
From: Sasha Levin @ 2014-11-13 23:10 UTC (permalink / raw)
  To: paulmck; +Cc: Dave Jones, Linux Kernel, htejun, linux-mm

On 11/13/2014 06:07 PM, Paul E. McKenney wrote:
> On Mon, Oct 27, 2014 at 04:44:25PM -0700, Paul E. McKenney wrote:
>> > On Mon, Oct 27, 2014 at 02:13:29PM -0700, Paul E. McKenney wrote:
>>> > > On Fri, Oct 24, 2014 at 12:39:15PM -0400, Sasha Levin wrote:
>>>> > > > On 10/24/2014 12:13 PM, Paul E. McKenney wrote:
>>>>> > > > > On Fri, Oct 24, 2014 at 08:28:40AM -0400, Sasha Levin wrote:
>>>>>>> > > > >> > On 10/23/2014 03:58 PM, Paul E. McKenney wrote:
>>>>>>>>> > > > >>> > > On Thu, Oct 23, 2014 at 02:55:43PM -0400, Sasha Levin wrote:
>>>>>>>>>>>>> > > > >>>>> > >> > On 10/23/2014 02:39 PM, Paul E. McKenney wrote:
>>>>>>>>>>>>>>>>> > > > >>>>>>> > >>> > > On Tue, Oct 14, 2014 at 10:35:10PM -0400, Sasha Levin wrote:
>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>> > >>>> > >> On 10/13/2014 01:35 PM, Dave Jones wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>> > >>>>> > >>> oday in "rcu stall while fuzzing" news:
>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>> > >>>>> > >>>
>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>> > >>>>> > >>> INFO: rcu_preempt detected stalls on CPUs/tasks:
>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>> > >>>>> > >>> 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>> > >>>>> > >>> 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>> > >>>>> > >>> 	(detected by 0, t=6502 jiffies, g=75434, c=75433, q=0)
>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>> > >>>> > >>
>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>> > >>>> > >> I've complained about RCU stalls couple days ago (in a different context)
>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>> > >>>> > >> on -next. I guess whatever causing them made it into Linus's tree?
>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>> > >>>> > >>
>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>> > >>>> > >> https://lkml.org/lkml/2014/10/11/64
>>>>>>>>>>>>>>>>> > > > >>>>>>> > >>> > > 
>>>>>>>>>>>>>>>>> > > > >>>>>>> > >>> > > And on that one, I must confess that I don't see where the RCU read-side
>>>>>>>>>>>>>>>>> > > > >>>>>>> > >>> > > critical section might be.
>>>>>>>>>>>>>>>>> > > > >>>>>>> > >>> > > 
>>>>>>>>>>>>>>>>> > > > >>>>>>> > >>> > > Hmmm...  Maybe someone forgot to put an rcu_read_unlock() somewhere.
>>>>>>>>>>>>>>>>> > > > >>>>>>> > >>> > > Can you reproduce this with CONFIG_PROVE_RCU=y?
>>>>>>>>>>>>> > > > >>>>> > >> > 
>>>>>>>>>>>>> > > > >>>>> > >> > Paul, if that was directed to me - Yes, I see stalls with CONFIG_PROVE_RCU
>>>>>>>>>>>>> > > > >>>>> > >> > set and nothing else is showing up before/after that.
>>>>>>>>> > > > >>> > > Indeed it was directed to you.  ;-)
>>>>>>>>> > > > >>> > > 
>>>>>>>>> > > > >>> > > Does the following crude diagnostic patch turn up anything?
>>>>>>> > > > >> > 
>>>>>>> > > > >> > Nope, seeing stalls but not seeing that pr_err() you added.
>>>>> > > > > OK, color me confused.  Could you please send me the full dmesg or a
>>>>> > > > > pointer to it?
>>>> > > > 
>>>> > > > Attached.
>>> > > 
>>> > > Thank you!  I would complain about the FAULT_INJECTION messages, but
>>> > > they don't appear to be happening all that frequently.
>>> > > 
>>> > > The stack dumps do look different here.  I suspect that this is a real
>>> > > issue in the VM code.
>> > 
>> > And to that end...  The filemap_map_pages() function does have loop over
>> > a list of pages.  I wonder if the rcu_read_lock() should be moved into
>> > the radix_tree_for_each_slot() loop.  CCing linux-mm for their thoughts,
>> > though it looks to me like the current radix_tree_for_each_slot() wants
>> > to be under RCU protection.  But I am not seeing anything that requires
>> > all iterations of the loop to be under the same RCU read-side critical
>> > section.  Maybe something like the following patch?
> Just following up, did the patch below help?

I'm not seeing any more stalls with filemap in them, but I do see different
traces.


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls.
@ 2014-11-13 23:10                       ` Sasha Levin
  0 siblings, 0 replies; 46+ messages in thread
From: Sasha Levin @ 2014-11-13 23:10 UTC (permalink / raw)
  To: paulmck; +Cc: Dave Jones, Linux Kernel, htejun, linux-mm

On 11/13/2014 06:07 PM, Paul E. McKenney wrote:
> On Mon, Oct 27, 2014 at 04:44:25PM -0700, Paul E. McKenney wrote:
>> > On Mon, Oct 27, 2014 at 02:13:29PM -0700, Paul E. McKenney wrote:
>>> > > On Fri, Oct 24, 2014 at 12:39:15PM -0400, Sasha Levin wrote:
>>>> > > > On 10/24/2014 12:13 PM, Paul E. McKenney wrote:
>>>>> > > > > On Fri, Oct 24, 2014 at 08:28:40AM -0400, Sasha Levin wrote:
>>>>>>> > > > >> > On 10/23/2014 03:58 PM, Paul E. McKenney wrote:
>>>>>>>>> > > > >>> > > On Thu, Oct 23, 2014 at 02:55:43PM -0400, Sasha Levin wrote:
>>>>>>>>>>>>> > > > >>>>> > >> > On 10/23/2014 02:39 PM, Paul E. McKenney wrote:
>>>>>>>>>>>>>>>>> > > > >>>>>>> > >>> > > On Tue, Oct 14, 2014 at 10:35:10PM -0400, Sasha Levin wrote:
>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>> > >>>> > >> On 10/13/2014 01:35 PM, Dave Jones wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>> > >>>>> > >>> oday in "rcu stall while fuzzing" news:
>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>> > >>>>> > >>>
>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>> > >>>>> > >>> INFO: rcu_preempt detected stalls on CPUs/tasks:
>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>> > >>>>> > >>> 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>> > >>>>> > >>> 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
>>>>>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>> > >>>>> > >>> 	(detected by 0, t=6502 jiffies, g=75434, c=75433, q=0)
>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>> > >>>> > >>
>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>> > >>>> > >> I've complained about RCU stalls couple days ago (in a different context)
>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>> > >>>> > >> on -next. I guess whatever causing them made it into Linus's tree?
>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>> > >>>> > >>
>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>> > >>>> > >> https://lkml.org/lkml/2014/10/11/64
>>>>>>>>>>>>>>>>> > > > >>>>>>> > >>> > > 
>>>>>>>>>>>>>>>>> > > > >>>>>>> > >>> > > And on that one, I must confess that I don't see where the RCU read-side
>>>>>>>>>>>>>>>>> > > > >>>>>>> > >>> > > critical section might be.
>>>>>>>>>>>>>>>>> > > > >>>>>>> > >>> > > 
>>>>>>>>>>>>>>>>> > > > >>>>>>> > >>> > > Hmmm...  Maybe someone forgot to put an rcu_read_unlock() somewhere.
>>>>>>>>>>>>>>>>> > > > >>>>>>> > >>> > > Can you reproduce this with CONFIG_PROVE_RCU=y?
>>>>>>>>>>>>> > > > >>>>> > >> > 
>>>>>>>>>>>>> > > > >>>>> > >> > Paul, if that was directed to me - Yes, I see stalls with CONFIG_PROVE_RCU
>>>>>>>>>>>>> > > > >>>>> > >> > set and nothing else is showing up before/after that.
>>>>>>>>> > > > >>> > > Indeed it was directed to you.  ;-)
>>>>>>>>> > > > >>> > > 
>>>>>>>>> > > > >>> > > Does the following crude diagnostic patch turn up anything?
>>>>>>> > > > >> > 
>>>>>>> > > > >> > Nope, seeing stalls but not seeing that pr_err() you added.
>>>>> > > > > OK, color me confused.  Could you please send me the full dmesg or a
>>>>> > > > > pointer to it?
>>>> > > > 
>>>> > > > Attached.
>>> > > 
>>> > > Thank you!  I would complain about the FAULT_INJECTION messages, but
>>> > > they don't appear to be happening all that frequently.
>>> > > 
>>> > > The stack dumps do look different here.  I suspect that this is a real
>>> > > issue in the VM code.
>> > 
>> > And to that end...  The filemap_map_pages() function does have loop over
>> > a list of pages.  I wonder if the rcu_read_lock() should be moved into
>> > the radix_tree_for_each_slot() loop.  CCing linux-mm for their thoughts,
>> > though it looks to me like the current radix_tree_for_each_slot() wants
>> > to be under RCU protection.  But I am not seeing anything that requires
>> > all iterations of the loop to be under the same RCU read-side critical
>> > section.  Maybe something like the following patch?
> Just following up, did the patch below help?

I'm not seeing any more stalls with filemap in them, but I do see different
traces.


Thanks,
Sasha


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls
@ 2021-09-01  9:17                   ` Zhouyi Zhou
  0 siblings, 0 replies; 46+ messages in thread
From: Zhouyi Zhou @ 2021-09-01  9:17 UTC (permalink / raw)
  To: Jorge Ramirez-Ortiz, Foundries, Neeraj Upadhyay
  Cc: paulmck, Josh Triplett, rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Joel Fernandes, Google, rcu, soc,
	linux-arm-kernel

On Wed, Sep 1, 2021 at 4:23 PM Jorge Ramirez-Ortiz, Foundries
<jorge@foundries.io> wrote:
>
> On 01/09/21, Zhouyi Zhou wrote:
> > Hi,
> >
> > I  perform following two new rounds of experiments:
> >
> >
> > Test environment (x86_64 debian10 virtual machine: kvm -cpu host -smp
> > 8 -hda ./debian10.qcow2 -m 4096 -net
> > user,hostfwd=tcp::5556-:22,hostfwd=tcp::5555-:19 -net nic,model=e1000
> > -vnc :30)
> >
> > 1.   CONFIG_RCU_BOOST=y
> > 1.1 as root, run #stress-ng --sequential 100  --class scheduler -t 5m --times
> > 1.2 as regular user at the same time, run $stress-ng --sequential 100
> > --class scheduler -t 5m --times
> >
> > System begin OOM kill after 6 minutes:
> > 31 19:41:12 debian kernel: [  847.171884] task:kworker/1:0     state:D
> > stack:    0 pid: 1634 ppid:     2 flag\
> > s:0x00004000
> > Aug 31 19:41:12 debian kernel: [  847.171890] Workqueue: ipv6_addrconf
> > addrconf_verify_work
> > Aug 31 19:41:12 debian kernel: [  847.171897] Call Trace:
> > Aug 31 19:41:12 debian kernel: [  847.171903]  __schedule+0x368/0xa40
> > Aug 31 19:41:12 debian kernel: [  847.171915]  schedule+0x44/0xe0
> > Aug 31 19:41:12 debian kernel: [  847.171921]
> > schedule_preempt_disabled+0x14/0x20
> > Aug 31 19:41:12 debian kernel: [  847.171924]  __mutex_lock+0x4b1/0xa10
> > Aug 31 19:41:12 debian kernel: [  847.171935]  ? addrconf_verify_work+0xa/0x20
> > Aug 31 19:41:12 debian kernel: [  847.171948]  ? addrconf_verify_work+0xa/0x20
> > Aug 31 19:41:12 debian kernel: [  847.171951]  addrconf_verify_work+0xa/0x20
> > Aug 31 19:41:12 debian kernel: [  847.171955]  process_one_work+0x1fa/0x5b0
> > Aug 31 19:41:12 debian kernel: [  847.171967]  worker_thread+0x64/0x3d0
> > Aug 31 19:41:12 debian kernel: [  847.171974]  ? process_one_work+0x5b0/0x5b0
> > Aug 31 19:41:12 debian kernel: [  847.171978]  kthread+0x131/0x180
> > Aug 31 19:41:12 debian kernel: [  847.171982]  ? set_kthread_struct+0x40/0x40
> > Aug 31 19:41:12 debian kernel: [  847.171989]  ret_from_fork+0x1f/0x30
> > Aug 31 19:41:12 debian kernel: [  847.176007]
> > Aug 31 19:41:12 debian kernel: [  847.176007] Showing all locks held
> > in the system:
> > Aug 31 19:41:12 debian kernel: [  847.176016] 1 lock held by khungtaskd/56:
> > Aug 31 19:41:12 debian kernel: [  847.176018]  #0: ffffffff82918b60
> > (rcu_read_lock){....}-{1:2}, at: debug_show_a\
> > ll_locks+0xe/0x1a0
> >
> > 2.  # CONFIG_RCU_BOOST is not set
> > 2.1 as root, run #stress-ng --sequential 100  --class scheduler -t 5m --times
> > 2.2 as regular user at the same time, run $stress-ng --sequential 100
> > --class scheduler -t 5m --times
> > System begin OOM kill after 6 minutes:
> > The system is so dead, that I can't save the backtrace to file nor did
> > kernel has a chance to save the log to /var/log/messages
> >
>
> all,
>
> Thanks for testing on x86. we can also reproduce on qemu arm64. So I
> think it will point out to the stress-ng test itself; I will debug it
> early next week - didnt expect so much support so fast TBH, it took me
> by surprise - and will report then (thanks again)

You are very welcome ;-)
I'm very glad that our effort can be of some help to you,
I've learned a lot from both of you during this process.
Looking forward to seeing your report.

Thanks
Zhouyi

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls
@ 2021-09-01  9:17                   ` Zhouyi Zhou
  0 siblings, 0 replies; 46+ messages in thread
From: Zhouyi Zhou @ 2021-09-01  9:17 UTC (permalink / raw)
  To: Jorge Ramirez-Ortiz, Foundries, Neeraj Upadhyay
  Cc: paulmck, Josh Triplett, rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Joel Fernandes, Google, rcu, soc,
	linux-arm-kernel

On Wed, Sep 1, 2021 at 4:23 PM Jorge Ramirez-Ortiz, Foundries
<jorge@foundries.io> wrote:
>
> On 01/09/21, Zhouyi Zhou wrote:
> > Hi,
> >
> > I  perform following two new rounds of experiments:
> >
> >
> > Test environment (x86_64 debian10 virtual machine: kvm -cpu host -smp
> > 8 -hda ./debian10.qcow2 -m 4096 -net
> > user,hostfwd=tcp::5556-:22,hostfwd=tcp::5555-:19 -net nic,model=e1000
> > -vnc :30)
> >
> > 1.   CONFIG_RCU_BOOST=y
> > 1.1 as root, run #stress-ng --sequential 100  --class scheduler -t 5m --times
> > 1.2 as regular user at the same time, run $stress-ng --sequential 100
> > --class scheduler -t 5m --times
> >
> > System begin OOM kill after 6 minutes:
> > 31 19:41:12 debian kernel: [  847.171884] task:kworker/1:0     state:D
> > stack:    0 pid: 1634 ppid:     2 flag\
> > s:0x00004000
> > Aug 31 19:41:12 debian kernel: [  847.171890] Workqueue: ipv6_addrconf
> > addrconf_verify_work
> > Aug 31 19:41:12 debian kernel: [  847.171897] Call Trace:
> > Aug 31 19:41:12 debian kernel: [  847.171903]  __schedule+0x368/0xa40
> > Aug 31 19:41:12 debian kernel: [  847.171915]  schedule+0x44/0xe0
> > Aug 31 19:41:12 debian kernel: [  847.171921]
> > schedule_preempt_disabled+0x14/0x20
> > Aug 31 19:41:12 debian kernel: [  847.171924]  __mutex_lock+0x4b1/0xa10
> > Aug 31 19:41:12 debian kernel: [  847.171935]  ? addrconf_verify_work+0xa/0x20
> > Aug 31 19:41:12 debian kernel: [  847.171948]  ? addrconf_verify_work+0xa/0x20
> > Aug 31 19:41:12 debian kernel: [  847.171951]  addrconf_verify_work+0xa/0x20
> > Aug 31 19:41:12 debian kernel: [  847.171955]  process_one_work+0x1fa/0x5b0
> > Aug 31 19:41:12 debian kernel: [  847.171967]  worker_thread+0x64/0x3d0
> > Aug 31 19:41:12 debian kernel: [  847.171974]  ? process_one_work+0x5b0/0x5b0
> > Aug 31 19:41:12 debian kernel: [  847.171978]  kthread+0x131/0x180
> > Aug 31 19:41:12 debian kernel: [  847.171982]  ? set_kthread_struct+0x40/0x40
> > Aug 31 19:41:12 debian kernel: [  847.171989]  ret_from_fork+0x1f/0x30
> > Aug 31 19:41:12 debian kernel: [  847.176007]
> > Aug 31 19:41:12 debian kernel: [  847.176007] Showing all locks held
> > in the system:
> > Aug 31 19:41:12 debian kernel: [  847.176016] 1 lock held by khungtaskd/56:
> > Aug 31 19:41:12 debian kernel: [  847.176018]  #0: ffffffff82918b60
> > (rcu_read_lock){....}-{1:2}, at: debug_show_a\
> > ll_locks+0xe/0x1a0
> >
> > 2.  # CONFIG_RCU_BOOST is not set
> > 2.1 as root, run #stress-ng --sequential 100  --class scheduler -t 5m --times
> > 2.2 as regular user at the same time, run $stress-ng --sequential 100
> > --class scheduler -t 5m --times
> > System begin OOM kill after 6 minutes:
> > The system is so dead, that I can't save the backtrace to file nor did
> > kernel has a chance to save the log to /var/log/messages
> >
>
> all,
>
> Thanks for testing on x86. we can also reproduce on qemu arm64. So I
> think it will point out to the stress-ng test itself; I will debug it
> early next week - didnt expect so much support so fast TBH, it took me
> by surprise - and will report then (thanks again)

You are very welcome ;-)
I'm very glad that our effort can be of some help to you,
I've learned a lot from both of you during this process.
Looking forward to seeing your report.

Thanks
Zhouyi


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls
@ 2021-09-01  8:23               ` Jorge Ramirez-Ortiz, Foundries
  0 siblings, 0 replies; 46+ messages in thread
From: Jorge Ramirez-Ortiz, Foundries @ 2021-09-01  8:23 UTC (permalink / raw)
  To: Zhouyi Zhou
  Cc: Jorge Ramirez-Ortiz, Foundries, paulmck, Josh Triplett, rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes, Google, rcu,
	soc, linux-arm-kernel

On 01/09/21, Zhouyi Zhou wrote:
> Hi,
> 
> I  perform following two new rounds of experiments:
> 
> 
> Test environment (x86_64 debian10 virtual machine: kvm -cpu host -smp
> 8 -hda ./debian10.qcow2 -m 4096 -net
> user,hostfwd=tcp::5556-:22,hostfwd=tcp::5555-:19 -net nic,model=e1000
> -vnc :30)
> 
> 1.   CONFIG_RCU_BOOST=y
> 1.1 as root, run #stress-ng --sequential 100  --class scheduler -t 5m --times
> 1.2 as regular user at the same time, run $stress-ng --sequential 100
> --class scheduler -t 5m --times
> 
> System begin OOM kill after 6 minutes:
> 31 19:41:12 debian kernel: [  847.171884] task:kworker/1:0     state:D
> stack:    0 pid: 1634 ppid:     2 flag\
> s:0x00004000
> Aug 31 19:41:12 debian kernel: [  847.171890] Workqueue: ipv6_addrconf
> addrconf_verify_work
> Aug 31 19:41:12 debian kernel: [  847.171897] Call Trace:
> Aug 31 19:41:12 debian kernel: [  847.171903]  __schedule+0x368/0xa40
> Aug 31 19:41:12 debian kernel: [  847.171915]  schedule+0x44/0xe0
> Aug 31 19:41:12 debian kernel: [  847.171921]
> schedule_preempt_disabled+0x14/0x20
> Aug 31 19:41:12 debian kernel: [  847.171924]  __mutex_lock+0x4b1/0xa10
> Aug 31 19:41:12 debian kernel: [  847.171935]  ? addrconf_verify_work+0xa/0x20
> Aug 31 19:41:12 debian kernel: [  847.171948]  ? addrconf_verify_work+0xa/0x20
> Aug 31 19:41:12 debian kernel: [  847.171951]  addrconf_verify_work+0xa/0x20
> Aug 31 19:41:12 debian kernel: [  847.171955]  process_one_work+0x1fa/0x5b0
> Aug 31 19:41:12 debian kernel: [  847.171967]  worker_thread+0x64/0x3d0
> Aug 31 19:41:12 debian kernel: [  847.171974]  ? process_one_work+0x5b0/0x5b0
> Aug 31 19:41:12 debian kernel: [  847.171978]  kthread+0x131/0x180
> Aug 31 19:41:12 debian kernel: [  847.171982]  ? set_kthread_struct+0x40/0x40
> Aug 31 19:41:12 debian kernel: [  847.171989]  ret_from_fork+0x1f/0x30
> Aug 31 19:41:12 debian kernel: [  847.176007]
> Aug 31 19:41:12 debian kernel: [  847.176007] Showing all locks held
> in the system:
> Aug 31 19:41:12 debian kernel: [  847.176016] 1 lock held by khungtaskd/56:
> Aug 31 19:41:12 debian kernel: [  847.176018]  #0: ffffffff82918b60
> (rcu_read_lock){....}-{1:2}, at: debug_show_a\
> ll_locks+0xe/0x1a0
> 
> 2.  # CONFIG_RCU_BOOST is not set
> 2.1 as root, run #stress-ng --sequential 100  --class scheduler -t 5m --times
> 2.2 as regular user at the same time, run $stress-ng --sequential 100
> --class scheduler -t 5m --times
> System begin OOM kill after 6 minutes:
> The system is so dead, that I can't save the backtrace to file nor did
> kernel has a chance to save the log to /var/log/messages
> 

all,

Thanks for testing on x86. We can also reproduce on qemu arm64, so I
think this points to the stress-ng test itself; I will debug it early
next week - didn't expect so much support so fast TBH, it took me by
surprise - and will report then (thanks again).





^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls
@ 2021-09-01  8:23               ` Jorge Ramirez-Ortiz, Foundries
  0 siblings, 0 replies; 46+ messages in thread
From: Jorge Ramirez-Ortiz, Foundries @ 2021-09-01  8:23 UTC (permalink / raw)
  To: Zhouyi Zhou
  Cc: Jorge Ramirez-Ortiz, Foundries, paulmck, Josh Triplett, rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes, Google, rcu,
	soc, linux-arm-kernel

On 01/09/21, Zhouyi Zhou wrote:
> Hi,
> 
> I  perform following two new rounds of experiments:
> 
> 
> Test environment (x86_64 debian10 virtual machine: kvm -cpu host -smp
> 8 -hda ./debian10.qcow2 -m 4096 -net
> user,hostfwd=tcp::5556-:22,hostfwd=tcp::5555-:19 -net nic,model=e1000
> -vnc :30)
> 
> 1.   CONFIG_RCU_BOOST=y
> 1.1 as root, run #stress-ng --sequential 100  --class scheduler -t 5m --times
> 1.2 as regular user at the same time, run $stress-ng --sequential 100
> --class scheduler -t 5m --times
> 
> System begin OOM kill after 6 minutes:
> 31 19:41:12 debian kernel: [  847.171884] task:kworker/1:0     state:D
> stack:    0 pid: 1634 ppid:     2 flag\
> s:0x00004000
> Aug 31 19:41:12 debian kernel: [  847.171890] Workqueue: ipv6_addrconf
> addrconf_verify_work
> Aug 31 19:41:12 debian kernel: [  847.171897] Call Trace:
> Aug 31 19:41:12 debian kernel: [  847.171903]  __schedule+0x368/0xa40
> Aug 31 19:41:12 debian kernel: [  847.171915]  schedule+0x44/0xe0
> Aug 31 19:41:12 debian kernel: [  847.171921]
> schedule_preempt_disabled+0x14/0x20
> Aug 31 19:41:12 debian kernel: [  847.171924]  __mutex_lock+0x4b1/0xa10
> Aug 31 19:41:12 debian kernel: [  847.171935]  ? addrconf_verify_work+0xa/0x20
> Aug 31 19:41:12 debian kernel: [  847.171948]  ? addrconf_verify_work+0xa/0x20
> Aug 31 19:41:12 debian kernel: [  847.171951]  addrconf_verify_work+0xa/0x20
> Aug 31 19:41:12 debian kernel: [  847.171955]  process_one_work+0x1fa/0x5b0
> Aug 31 19:41:12 debian kernel: [  847.171967]  worker_thread+0x64/0x3d0
> Aug 31 19:41:12 debian kernel: [  847.171974]  ? process_one_work+0x5b0/0x5b0
> Aug 31 19:41:12 debian kernel: [  847.171978]  kthread+0x131/0x180
> Aug 31 19:41:12 debian kernel: [  847.171982]  ? set_kthread_struct+0x40/0x40
> Aug 31 19:41:12 debian kernel: [  847.171989]  ret_from_fork+0x1f/0x30
> Aug 31 19:41:12 debian kernel: [  847.176007]
> Aug 31 19:41:12 debian kernel: [  847.176007] Showing all locks held
> in the system:
> Aug 31 19:41:12 debian kernel: [  847.176016] 1 lock held by khungtaskd/56:
> Aug 31 19:41:12 debian kernel: [  847.176018]  #0: ffffffff82918b60
> (rcu_read_lock){....}-{1:2}, at: debug_show_a\
> ll_locks+0xe/0x1a0
> 
> 2.  # CONFIG_RCU_BOOST is not set
> 2.1 as root, run #stress-ng --sequential 100  --class scheduler -t 5m --times
> 2.2 as regular user at the same time, run $stress-ng --sequential 100
> --class scheduler -t 5m --times
> System begin OOM kill after 6 minutes:
> The system is so dead, that I can't save the backtrace to file nor did
> kernel has a chance to save the log to /var/log/messages
> 

All,

Thanks for testing on x86. We can also reproduce this on qemu arm64, so I
think it points to the stress-ng test itself. I will debug it early next
week - I didn't expect so much support so fast, to be honest; it took me
by surprise - and will report back then (thanks again).






^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls
@ 2021-09-01  6:47                   ` Zhouyi Zhou
  0 siblings, 0 replies; 46+ messages in thread
From: Zhouyi Zhou @ 2021-09-01  6:47 UTC (permalink / raw)
  To: Neeraj Upadhyay
  Cc: Jorge Ramirez-Ortiz, Foundries, paulmck, Josh Triplett, rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes, Google, rcu,
	soc, linux-arm-kernel

Hi Neeraj

Thank you for reviewing the original logs provided by Jorge and my new
test logs.

On Wed, Sep 1, 2021 at 12:08 PM Neeraj Upadhyay <neeraju@codeaurora.org> wrote:
>
> Hi,
>
>  >[   62.603086] rcu: rcu_preempt kthread starved for 5258 jiffies!
> g3017 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
>  >[   62.613246] rcu:     Unless rcu_preempt kthread gets sufficient CPU
> time, OOM is now expected behavior.
>
> These logs tells us, rcu_preempt thread is in RUNNING state, however,
> the thread is not scheduled for long. So, can you please check the
> activity (scheduler traces, runqueue) on the CPU where rcu_preempt
> kthread is running (->cpu=1 in this case) , to see why rcu_preempt
> is starved of CPU cycles?

I start the qemu virtual machine with gdb remote debugging enabled:
kvm -cpu host -smp 8 -s -S -hda ./debian10.qcow2 -m 8192 -net
user,hostfwd=tcp::5556-:22,hostfwd=tcp::5555-:19 -net nic,model=e1000
-vnc :30 &
then attach gdb to the guest kernel:
gdb ./vmlinux
(gdb) target remote :1234
(gdb) break rcu_check_gp_kthread_starvation
(gdb) continue

Then I rerun my second test case (2. # CONFIG_RCU_BOOST is not set);
the gdb breakpoint never gets hit!
When I then run the gdb command "info threads", the system seems to be
struggling in the memory subsystem:
(gdb) info threads
  Id   Target Id         Frame
  8    Thread 8 (CPU#7 [running]) check_preemption_disabled (
    what1=0xffffffff82604e07 "__this_cpu_",
    what2=0xffffffff82589772 "add_return") at lib/smp_processor_id.c:14
  7    Thread 7 (CPU#6 [running]) __lock_is_held (read=<optimized out>,
    lock=<optimized out>) at kernel/locking/lockdep.c:5366
  6    Thread 6 (CPU#5 [running]) nfsd_reply_cache_count (
    shrink=0xffff88804d10fd10, sc=0xffffc9000eb4f788) at fs/nfsd/nfscache.c:295
  5    Thread 5 (CPU#4 [running]) preempt_count ()
    at ./arch/x86/include/asm/preempt.h:27
  4    Thread 4 (CPU#3 [running]) rcu_read_lock ()
    at ./include/linux/rcupdate.h:688
  3    Thread 3 (CPU#2 [running]) 0xffffffff812c3b34 in find_next_bit (
    offset=<optimized out>, size=<optimized out>, addr=<optimized out>)
    at ./include/asm-generic/bitops/find.h:36
* 2    Thread 2 (CPU#1 [running]) do_shrink_slab (
    shrinkctl=0xffffc9000031f820, shrinker=0xffff888191c2c510, priority=1)
    at mm/vmscan.c:659
  1    Thread 1 (CPU#0 [running]) check_wait_context (next=<optimized out>,
    curr=<optimized out>) at kernel/locking/lockdep.c:4715
(gdb) bt
#0  do_shrink_slab (shrinkctl=0xffffc9000031f820, shrinker=0xffff888191c2c510,
    priority=1) at mm/vmscan.c:659
#1  0xffffffff812c3c75 in shrink_slab (gfp_mask=17829066, nid=0,
    memcg=0xffff888100322000, priority=1) at mm/vmscan.c:872
#2  0xffffffff812c656f in shrink_node_memcgs (sc=<optimized out>,
    pgdat=<optimized out>) at mm/vmscan.c:2879
#3  shrink_node (pgdat=0xffffc9000031f820, sc=0xffffc9000031f978)
    at mm/vmscan.c:2994
#4  0xffffffff812c6b9d in shrink_zones (sc=<optimized out>,
    zonelist=<optimized out>) at mm/vmscan.c:3197
#5  do_try_to_free_pages (zonelist=0xffff88823fffba00, sc=0xffffc9000031f978)
    at mm/vmscan.c:3252
#6  0xffffffff812c8357 in try_to_free_pages (zonelist=0xffff88823fffba00,
    order=0, gfp_mask=<optimized out>, nodemask=<optimized out>)
    at mm/vmscan.c:3491
#7  0xffffffff81315a9e in __perform_reclaim (ac=<optimized out>,
    ac=<optimized out>, order=<optimized out>, gfp_mask=<optimized out>)
    at mm/page_alloc.c:4607
#8  __alloc_pages_direct_reclaim (did_some_progress=<optimized out>,
    ac=<optimized out>, alloc_flags=<optimized out>, order=<optimized out>,
    gfp_mask=<optimized out>) at mm/page_alloc.c:4628
#9  __alloc_pages_slowpath (gfp_mask=3274784, order=17829066,
    ac=0xffffc9000031faf8) at mm/page_alloc.c:5032
---Type <return> to continue, or q <return> to quit---
#10 0xffffffff81316b53 in __alloc_pages (gfp=17829066, order=0,
    preferred_nid=<optimized out>, nodemask=0x0 <fixed_percpu_data>)
    at mm/page_alloc.c:5403
#11 0xffffffff81334f13 in alloc_pages_vma (gfp=17829066, order=0,
    vma=0xffff888107ed4cc0, addr=<optimized out>, node=0,
    hugepage=<optimized out>) at mm/mempolicy.c:2208
#12 0xffffffff8131ee1e in __read_swap_cache_async (entry=...,
    gfp_mask=17829066, vma=0x1 <fixed_percpu_data+1>, addr=0,
    new_page_allocated=0x1 <fixed_percpu_data+1>) at mm/swap_state.c:459
#13 0xffffffff8131f0b8 in read_swap_cache_async (entry=...,
    gfp_mask=<optimized out>, vma=<optimized out>, addr=<optimized out>,
    do_poll=true) at mm/swap_state.c:525
#14 0xffffffff8131f2bd in swap_cluster_readahead (entry=...,
    gfp_mask=17829066, vmf=0x6d3b4) at mm/swap_state.c:661
#15 0xffffffff8131f4c5 in swapin_readahead (entry=..., gfp_mask=2445460752,
    vmf=0x1 <fixed_percpu_data+1>) at mm/swap_state.c:853
#16 0xffffffff812efcf0 in do_swap_page (vmf=0xffffc9000031fe00)
    at mm/memory.c:3551
#17 0xffffffff812f3a7d in handle_pte_fault (vmf=<optimized out>)
    at mm/memory.c:4562
#18 __handle_mm_fault (vma=<optimized out>, address=<optimized out>,
    flags=<optimized out>) at mm/memory.c:4693
#19 0xffffffff812f4633 in handle_mm_fault (vma=0xffff888107ed4cc0,
---Type <return> to continue, or q <return> to quit---
    address=140202217193320, flags=596, regs=0xffffc9000031ff58)
    at mm/memory.c:4791
#20 0xffffffff8110e36e in do_user_addr_fault (regs=0xffffc9000031f820,
    error_code=18446612688810460432, address=1) at arch/x86/mm/fault.c:1390
#21 0xffffffff81c13d69 in handle_page_fault (address=<optimized out>,
    error_code=<optimized out>, regs=<optimized out>)
    at arch/x86/mm/fault.c:1475
#22 exc_page_fault (regs=0xffffc9000031f820, error_code=18446612688810460432)
    at arch/x86/mm/fault.c:1531
#23 0xffffffff81e00ace in asm_exc_page_fault ()
    at ./arch/x86/include/asm/idtentry.h:568
#24 0x0000000000000016 in fixed_percpu_data ()
#25 0x0000000000000000 in ?? ()

So, I guess Jorge's test may be more instructive than mine ;-)

>
>
> Thanks
> Neeraj
>
> On 9/1/2021 6:33 AM, Zhouyi Zhou wrote:
> > Hi,
> >
> > I  perform following two new rounds of experiments:
> >
> >
> > Test environment (x86_64 debian10 virtual machine: kvm -cpu host -smp
> > 8 -hda ./debian10.qcow2 -m 4096 -net
> > user,hostfwd=tcp::5556-:22,hostfwd=tcp::5555-:19 -net nic,model=e1000
> > -vnc :30)
> >
> > 1.   CONFIG_RCU_BOOST=y
> > 1.1 as root, run #stress-ng --sequential 100  --class scheduler -t 5m --times
> > 1.2 as regular user at the same time, run $stress-ng --sequential 100
> > --class scheduler -t 5m --times
> >
> > System begin OOM kill after 6 minutes:
> > 31 19:41:12 debian kernel: [  847.171884] task:kworker/1:0     state:D
> > stack:    0 pid: 1634 ppid:     2 flag\
> > s:0x00004000
> > Aug 31 19:41:12 debian kernel: [  847.171890] Workqueue: ipv6_addrconf
> > addrconf_verify_work
> > Aug 31 19:41:12 debian kernel: [  847.171897] Call Trace:
> > Aug 31 19:41:12 debian kernel: [  847.171903]  __schedule+0x368/0xa40
> > Aug 31 19:41:12 debian kernel: [  847.171915]  schedule+0x44/0xe0
> > Aug 31 19:41:12 debian kernel: [  847.171921]
> > schedule_preempt_disabled+0x14/0x20
> > Aug 31 19:41:12 debian kernel: [  847.171924]  __mutex_lock+0x4b1/0xa10
> > Aug 31 19:41:12 debian kernel: [  847.171935]  ? addrconf_verify_work+0xa/0x20
> > Aug 31 19:41:12 debian kernel: [  847.171948]  ? addrconf_verify_work+0xa/0x20
> > Aug 31 19:41:12 debian kernel: [  847.171951]  addrconf_verify_work+0xa/0x20
> > Aug 31 19:41:12 debian kernel: [  847.171955]  process_one_work+0x1fa/0x5b0
> > Aug 31 19:41:12 debian kernel: [  847.171967]  worker_thread+0x64/0x3d0
> > Aug 31 19:41:12 debian kernel: [  847.171974]  ? process_one_work+0x5b0/0x5b0
> > Aug 31 19:41:12 debian kernel: [  847.171978]  kthread+0x131/0x180
> > Aug 31 19:41:12 debian kernel: [  847.171982]  ? set_kthread_struct+0x40/0x40
> > Aug 31 19:41:12 debian kernel: [  847.171989]  ret_from_fork+0x1f/0x30
> > Aug 31 19:41:12 debian kernel: [  847.176007]
> > Aug 31 19:41:12 debian kernel: [  847.176007] Showing all locks held
> > in the system:
> > Aug 31 19:41:12 debian kernel: [  847.176016] 1 lock held by khungtaskd/56:
> > Aug 31 19:41:12 debian kernel: [  847.176018]  #0: ffffffff82918b60
> > (rcu_read_lock){....}-{1:2}, at: debug_show_a\
> > ll_locks+0xe/0x1a0
> >
> > 2.  # CONFIG_RCU_BOOST is not set
> > 2.1 as root, run #stress-ng --sequential 100  --class scheduler -t 5m --times
> > 2.2 as regular user at the same time, run $stress-ng --sequential 100
> > --class scheduler -t 5m --times
> > System begin OOM kill after 6 minutes:
> > The system is so dead, that I can't save the backtrace to file nor did
> > kernel has a chance to save the log to /var/log/messages
> >
> > Thanks
> > Zhouyi
> >
> > On Wed, Sep 1, 2021 at 1:11 AM Zhouyi Zhou <zhouzhouyi@gmail.com> wrote:
> >>
> >> followings are some of my kernel config options:
> >> CONFIG_PREEMPT=y
> >> CONFIG_PREEMPT_COUNT=y
> >> CONFIG_PREEMPTION=y
> >> CONFIG_PREEMPT_DYNAMIC=y
> >> CONFIG_PREEMPT_RCU=y
> >> CONFIG_HAVE_PREEMPT_DYNAMIC=y
> >> CONFIG_TREE_RCU=y
> >> CONFIG_PREEMPT_RCU=y
> >> CONFIG_RCU_EXPERT=y
> >> CONFIG_SRCU=y
> >> CONFIG_TREE_SRCU=y
> >> CONFIG_TASKS_RCU_GENERIC=y
> >> CONFIG_TASKS_RCU=y
> >> CONFIG_TASKS_RUDE_RCU=y
> >> CONFIG_TASKS_TRACE_RCU=y
> >> CONFIG_RCU_STALL_COMMON=y
> >> CONFIG_RCU_NEED_SEGCBLIST=y
> >> CONFIG_RCU_FANOUT=64
> >> CONFIG_RCU_FANOUT_LEAF=16
> >> # CONFIG_RCU_FAST_NO_HZ is not set
> >> # CONFIG_RCU_BOOST is not set
> >> CONFIG_RCU_NOCB_CPU=y
> >> # CONFIG_TASKS_TRACE_RCU_READ_MB is not set
> >> # end of RCU Subsystem
> >> CONFIG_MMU_GATHER_RCU_TABLE_FREE=y
> >> CONFIG_LOCKDEP_CIRCULAR_QUEUE_BITS=12
> >> # RCU Debugging
> >> CONFIG_PROVE_RCU=y
> >> # CONFIG_PROVE_RCU_LIST is not set
> >> # CONFIG_RCU_SCALE_TEST is not set
> >> CONFIG_RCU_TORTURE_TEST=m
> >> # CONFIG_RCU_REF_SCALE_TEST is not set
> >> CONFIG_RCU_CPU_STALL_TIMEOUT=21
> >> # CONFIG_RCU_TRACE is not set
> >> # CONFIG_RCU_EQS_DEBUG is not set
> >>
> >> Thanks
> >> Zhouyi
> >>
> >> On Wed, Sep 1, 2021 at 1:01 AM Zhouyi Zhou <zhouzhouyi@gmail.com> wrote:
> >>>
> >>> I did an experiment just now on x86_64 virtual machines, rcu did not
> >>> complain after 10 minutes's test, I hope my effort can provide some
> >>> clue.
> >>>
> >>> 1. I clone the fresh new linux kernel (git clone
> >>> https://kernel.source.codeaurora.cn/pub/scm/linux/kernel/git/torvalds/linux.git)
> >>> 2. compile the kernel without CONFIG_RCU_BOOST (: # CONFIG_RCU_BOOST is not set)
> >>> 3. boot the kernel on a x86_64 VM (kvm -cpu host -smp 16  -hda
> >>> ./debian10.qcow2 -m 4096 -net
> >>> user,hostfwd=tcp::5556-:22,hostfwd=tcp::5555-:19 -net nic,model=e1000
> >>> -vnc :30)
> >>> 4. run the test (stress-ng --sequential 16  --class scheduler -t 5m --times)
> >>> 5. monitor the system by constantly typing top and dmesg
> >>> 6. after 10 minutes, nothing else happens except that the dmesg report
> >>> following two messages
> >>> [  672.528192] sched: DL replenish lagged too much
> >>> [  751.127790] hrtimer: interrupt took 12143 ns
> >>>
> >>> So, I guess CONFIG_RCU_BOOST is not necessary for x86_64 virtual machines
> >>>
> >>> Zhouyi
> >>>
> >>> On Tue, Aug 31, 2021 at 11:24 PM Jorge Ramirez-Ortiz, Foundries
> >>> <jorge@foundries.io> wrote:
> >>>>
> >>>> Hi
> >>>>
> >>>> When enabling CONFIG_PREEMPT and running the stress-ng scheduler class
> >>>> tests on arm64 (xilinx zynqmp and imx imx8mm SoCs) we are observing the following.
> >>>>
> >>>> [   62.578917] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> >>>> [   62.585015]  (detected by 0, t=5253 jiffies, g=3017, q=2972)
> >>>> [   62.590663] rcu: All QSes seen, last rcu_preempt kthread activity 5254 (4294907943-4294902689), jiffies_till_next_fqs=1, root
> >>>> +->qsmask 0x0
> >>>> [   62.603086] rcu: rcu_preempt kthread starved for 5258 jiffies! g3017 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
> >>>> [   62.613246] rcu:     Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> >>>> [   62.622359] rcu: RCU grace-period kthread stack dump:
> >>>> [   62.627395] task:rcu_preempt     state:R  running task     stack:    0 pid:   14 ppid:     2 flags:0x00000028
> >>>> [   62.637308] Call trace:
> >>>> [   62.639748]  __switch_to+0x11c/0x190
> >>>> [   62.643319]  __schedule+0x3b8/0x8d8
> >>>> [   62.646796]  schedule+0x4c/0x108
> >>>> [   62.650018]  schedule_timeout+0x1ac/0x358
> >>>> [   62.654021]  rcu_gp_kthread+0x6a8/0x12b8
> >>>> [   62.657933]  kthread+0x14c/0x158
> >>>> [   62.661153]  ret_from_fork+0x10/0x18
> >>>> [   62.682919] BUG: scheduling while atomic: stress-ng-hrtim/831/0x00000002
> >>>> [   62.689604] Preemption disabled at:
> >>>> [   62.689614] [<ffffffc010059418>] irq_enter_rcu+0x30/0x58
> >>>> [   62.698393] CPU: 0 PID: 831 Comm: stress-ng-hrtim Not tainted 5.10.42+ #5
> >>>> [   62.706296] Hardware name: Zynqmp new (DT)
> >>>> [   62.710115] Call trace:
> >>>> [   62.712548]  dump_backtrace+0x0/0x240
> >>>> [   62.716202]  show_stack+0x2c/0x38
> >>>> [   62.719510]  dump_stack+0xcc/0x104
> >>>> [   62.722904]  __schedule_bug+0x78/0xc8
> >>>> [   62.726556]  __schedule+0x70c/0x8d8
> >>>> [   62.730037]  schedule+0x4c/0x108
> >>>> [   62.733259]  do_notify_resume+0x224/0x5d8
> >>>> [   62.737259]  work_pending+0xc/0x2a4
> >>>>
> >>>> The error results in OOM eventually.
> >>>>
> >>>> RCU priority boosting does work around this issue but it seems to me
> >>>> a workaround more than a fix (otherwise boosting would be enabled
> >>>> by CONFIG_PREEMPT for arm64 I guess?).
> >>>>
> >>>> The question is: is this an arm64 bug that should be investigated? or
> >>>> is this some known corner case of running stress-ng that is already
> >>>> understood?
> >>>>
> >>>> thanks
> >>>> Jorge
> >>>>
> >>>>
> >>>>
>
> --
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
> member of the Code Aurora Forum, hosted by The Linux Foundation
Thanks
Zhouyi

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls
  2021-09-01  1:03             ` Zhouyi Zhou
  (?)
@ 2021-09-01  4:08             ` Neeraj Upadhyay
  2021-09-01  6:47                   ` Zhouyi Zhou
  -1 siblings, 1 reply; 46+ messages in thread
From: Neeraj Upadhyay @ 2021-09-01  4:08 UTC (permalink / raw)
  To: Zhouyi Zhou, Jorge Ramirez-Ortiz, Foundries
  Cc: paulmck, Josh Triplett, rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Joel Fernandes, Google, rcu, soc,
	linux-arm-kernel

Hi,

 >[   62.603086] rcu: rcu_preempt kthread starved for 5258 jiffies! 
g3017 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
 >[   62.613246] rcu:     Unless rcu_preempt kthread gets sufficient CPU 
time, OOM is now expected behavior.

These logs tell us that the rcu_preempt kthread is in the RUNNING state;
however, it has not been scheduled for a long time. So, can you please
check the activity (scheduler traces, runqueue) on the CPU where the
rcu_preempt kthread is running (->cpu=1 in this case), to see why
rcu_preempt is starved of CPU cycles?
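
For reference, one way to capture that per-CPU scheduler activity is
ftrace; a minimal sketch, assuming debugfs is mounted at
/sys/kernel/debug and CPU 1 is the CPU of interest:

  # enable the scheduler tracepoints and start tracing
  echo 1 > /sys/kernel/debug/tracing/events/sched/sched_switch/enable
  echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable
  echo 1 > /sys/kernel/debug/tracing/tracing_on
  # ... reproduce the stall with stress-ng ...
  # stop tracing and save CPU 1's buffer for inspection
  echo 0 > /sys/kernel/debug/tracing/tracing_on
  cat /sys/kernel/debug/tracing/per_cpu/cpu1/trace > cpu1-sched.txt

The per-CPU trace should show which tasks CPU 1 keeps running while the
rcu_preempt kthread stays runnable but off-CPU.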


Thanks
Neeraj

On 9/1/2021 6:33 AM, Zhouyi Zhou wrote:
> Hi,
> 
> I  perform following two new rounds of experiments:
> 
> 
> Test environment (x86_64 debian10 virtual machine: kvm -cpu host -smp
> 8 -hda ./debian10.qcow2 -m 4096 -net
> user,hostfwd=tcp::5556-:22,hostfwd=tcp::5555-:19 -net nic,model=e1000
> -vnc :30)
> 
> 1.   CONFIG_RCU_BOOST=y
> 1.1 as root, run #stress-ng --sequential 100  --class scheduler -t 5m --times
> 1.2 as regular user at the same time, run $stress-ng --sequential 100
> --class scheduler -t 5m --times
> 
> System begin OOM kill after 6 minutes:
> 31 19:41:12 debian kernel: [  847.171884] task:kworker/1:0     state:D
> stack:    0 pid: 1634 ppid:     2 flag\
> s:0x00004000
> Aug 31 19:41:12 debian kernel: [  847.171890] Workqueue: ipv6_addrconf
> addrconf_verify_work
> Aug 31 19:41:12 debian kernel: [  847.171897] Call Trace:
> Aug 31 19:41:12 debian kernel: [  847.171903]  __schedule+0x368/0xa40
> Aug 31 19:41:12 debian kernel: [  847.171915]  schedule+0x44/0xe0
> Aug 31 19:41:12 debian kernel: [  847.171921]
> schedule_preempt_disabled+0x14/0x20
> Aug 31 19:41:12 debian kernel: [  847.171924]  __mutex_lock+0x4b1/0xa10
> Aug 31 19:41:12 debian kernel: [  847.171935]  ? addrconf_verify_work+0xa/0x20
> Aug 31 19:41:12 debian kernel: [  847.171948]  ? addrconf_verify_work+0xa/0x20
> Aug 31 19:41:12 debian kernel: [  847.171951]  addrconf_verify_work+0xa/0x20
> Aug 31 19:41:12 debian kernel: [  847.171955]  process_one_work+0x1fa/0x5b0
> Aug 31 19:41:12 debian kernel: [  847.171967]  worker_thread+0x64/0x3d0
> Aug 31 19:41:12 debian kernel: [  847.171974]  ? process_one_work+0x5b0/0x5b0
> Aug 31 19:41:12 debian kernel: [  847.171978]  kthread+0x131/0x180
> Aug 31 19:41:12 debian kernel: [  847.171982]  ? set_kthread_struct+0x40/0x40
> Aug 31 19:41:12 debian kernel: [  847.171989]  ret_from_fork+0x1f/0x30
> Aug 31 19:41:12 debian kernel: [  847.176007]
> Aug 31 19:41:12 debian kernel: [  847.176007] Showing all locks held
> in the system:
> Aug 31 19:41:12 debian kernel: [  847.176016] 1 lock held by khungtaskd/56:
> Aug 31 19:41:12 debian kernel: [  847.176018]  #0: ffffffff82918b60
> (rcu_read_lock){....}-{1:2}, at: debug_show_a\
> ll_locks+0xe/0x1a0
> 
> 2.  # CONFIG_RCU_BOOST is not set
> 2.1 as root, run #stress-ng --sequential 100  --class scheduler -t 5m --times
> 2.2 as regular user at the same time, run $stress-ng --sequential 100
> --class scheduler -t 5m --times
> System begin OOM kill after 6 minutes:
> The system is so dead, that I can't save the backtrace to file nor did
> kernel has a chance to save the log to /var/log/messages
> 
> Thanks
> Zhouyi
> 
> On Wed, Sep 1, 2021 at 1:11 AM Zhouyi Zhou <zhouzhouyi@gmail.com> wrote:
>>
>> followings are some of my kernel config options:
>> CONFIG_PREEMPT=y
>> CONFIG_PREEMPT_COUNT=y
>> CONFIG_PREEMPTION=y
>> CONFIG_PREEMPT_DYNAMIC=y
>> CONFIG_PREEMPT_RCU=y
>> CONFIG_HAVE_PREEMPT_DYNAMIC=y
>> CONFIG_TREE_RCU=y
>> CONFIG_PREEMPT_RCU=y
>> CONFIG_RCU_EXPERT=y
>> CONFIG_SRCU=y
>> CONFIG_TREE_SRCU=y
>> CONFIG_TASKS_RCU_GENERIC=y
>> CONFIG_TASKS_RCU=y
>> CONFIG_TASKS_RUDE_RCU=y
>> CONFIG_TASKS_TRACE_RCU=y
>> CONFIG_RCU_STALL_COMMON=y
>> CONFIG_RCU_NEED_SEGCBLIST=y
>> CONFIG_RCU_FANOUT=64
>> CONFIG_RCU_FANOUT_LEAF=16
>> # CONFIG_RCU_FAST_NO_HZ is not set
>> # CONFIG_RCU_BOOST is not set
>> CONFIG_RCU_NOCB_CPU=y
>> # CONFIG_TASKS_TRACE_RCU_READ_MB is not set
>> # end of RCU Subsystem
>> CONFIG_MMU_GATHER_RCU_TABLE_FREE=y
>> CONFIG_LOCKDEP_CIRCULAR_QUEUE_BITS=12
>> # RCU Debugging
>> CONFIG_PROVE_RCU=y
>> # CONFIG_PROVE_RCU_LIST is not set
>> # CONFIG_RCU_SCALE_TEST is not set
>> CONFIG_RCU_TORTURE_TEST=m
>> # CONFIG_RCU_REF_SCALE_TEST is not set
>> CONFIG_RCU_CPU_STALL_TIMEOUT=21
>> # CONFIG_RCU_TRACE is not set
>> # CONFIG_RCU_EQS_DEBUG is not set
>>
>> Thanks
>> Zhouyi
>>
>> On Wed, Sep 1, 2021 at 1:01 AM Zhouyi Zhou <zhouzhouyi@gmail.com> wrote:
>>>
>>> I did an experiment just now on x86_64 virtual machines, rcu did not
>>> complain after 10 minutes's test, I hope my effort can provide some
>>> clue.
>>>
>>> 1. I clone the fresh new linux kernel (git clone
>>> https://kernel.source.codeaurora.cn/pub/scm/linux/kernel/git/torvalds/linux.git)
>>> 2. compile the kernel without CONFIG_RCU_BOOST (: # CONFIG_RCU_BOOST is not set)
>>> 3. boot the kernel on a x86_64 VM (kvm -cpu host -smp 16  -hda
>>> ./debian10.qcow2 -m 4096 -net
>>> user,hostfwd=tcp::5556-:22,hostfwd=tcp::5555-:19 -net nic,model=e1000
>>> -vnc :30)
>>> 4. run the test (stress-ng --sequential 16  --class scheduler -t 5m --times)
>>> 5. monitor the system by constantly typing top and dmesg
>>> 6. after 10 minutes, nothing else happens except that the dmesg report
>>> following two messages
>>> [  672.528192] sched: DL replenish lagged too much
>>> [  751.127790] hrtimer: interrupt took 12143 ns
>>>
>>> So, I guess CONFIG_RCU_BOOST is not necessary for x86_64 virtual machines
>>>
>>> Zhouyi
>>>
>>> On Tue, Aug 31, 2021 at 11:24 PM Jorge Ramirez-Ortiz, Foundries
>>> <jorge@foundries.io> wrote:
>>>>
>>>> Hi
>>>>
>>>> When enabling CONFIG_PREEMPT and running the stress-ng scheduler class
>>>> tests on arm64 (xilinx zynqmp and imx imx8mm SoCs) we are observing the following.
>>>>
>>>> [   62.578917] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
>>>> [   62.585015]  (detected by 0, t=5253 jiffies, g=3017, q=2972)
>>>> [   62.590663] rcu: All QSes seen, last rcu_preempt kthread activity 5254 (4294907943-4294902689), jiffies_till_next_fqs=1, root
>>>> +->qsmask 0x0
>>>> [   62.603086] rcu: rcu_preempt kthread starved for 5258 jiffies! g3017 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
>>>> [   62.613246] rcu:     Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
>>>> [   62.622359] rcu: RCU grace-period kthread stack dump:
>>>> [   62.627395] task:rcu_preempt     state:R  running task     stack:    0 pid:   14 ppid:     2 flags:0x00000028
>>>> [   62.637308] Call trace:
>>>> [   62.639748]  __switch_to+0x11c/0x190
>>>> [   62.643319]  __schedule+0x3b8/0x8d8
>>>> [   62.646796]  schedule+0x4c/0x108
>>>> [   62.650018]  schedule_timeout+0x1ac/0x358
>>>> [   62.654021]  rcu_gp_kthread+0x6a8/0x12b8
>>>> [   62.657933]  kthread+0x14c/0x158
>>>> [   62.661153]  ret_from_fork+0x10/0x18
>>>> [   62.682919] BUG: scheduling while atomic: stress-ng-hrtim/831/0x00000002
>>>> [   62.689604] Preemption disabled at:
>>>> [   62.689614] [<ffffffc010059418>] irq_enter_rcu+0x30/0x58
>>>> [   62.698393] CPU: 0 PID: 831 Comm: stress-ng-hrtim Not tainted 5.10.42+ #5
>>>> [   62.706296] Hardware name: Zynqmp new (DT)
>>>> [   62.710115] Call trace:
>>>> [   62.712548]  dump_backtrace+0x0/0x240
>>>> [   62.716202]  show_stack+0x2c/0x38
>>>> [   62.719510]  dump_stack+0xcc/0x104
>>>> [   62.722904]  __schedule_bug+0x78/0xc8
>>>> [   62.726556]  __schedule+0x70c/0x8d8
>>>> [   62.730037]  schedule+0x4c/0x108
>>>> [   62.733259]  do_notify_resume+0x224/0x5d8
>>>> [   62.737259]  work_pending+0xc/0x2a4
>>>>
>>>> The error results in OOM eventually.
>>>>
>>>> RCU priority boosting does work around this issue but it seems to me
>>>> a workaround more than a fix (otherwise boosting would be enabled
>>>> by CONFIG_PREEMPT for arm64 I guess?).
>>>>
>>>> The question is: is this an arm64 bug that should be investigated? or
>>>> is this some known corner case of running stress-ng that is already
>>>> understood?
>>>>
>>>> thanks
>>>> Jorge
>>>>
>>>>
>>>>

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls
@ 2021-09-01  1:03             ` Zhouyi Zhou
  0 siblings, 0 replies; 46+ messages in thread
From: Zhouyi Zhou @ 2021-09-01  1:03 UTC (permalink / raw)
  To: Jorge Ramirez-Ortiz, Foundries
  Cc: paulmck, Josh Triplett, rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Joel Fernandes, Google, rcu, soc,
	linux-arm-kernel

Hi,

I performed the following two new rounds of experiments:


Test environment (x86_64 debian10 virtual machine: kvm -cpu host -smp
8 -hda ./debian10.qcow2 -m 4096 -net
user,hostfwd=tcp::5556-:22,hostfwd=tcp::5555-:19 -net nic,model=e1000
-vnc :30)

1.   CONFIG_RCU_BOOST=y
1.1 as root, run #stress-ng --sequential 100  --class scheduler -t 5m --times
1.2 as regular user at the same time, run $stress-ng --sequential 100
--class scheduler -t 5m --times

The system began OOM killing after 6 minutes:
31 19:41:12 debian kernel: [  847.171884] task:kworker/1:0     state:D
stack:    0 pid: 1634 ppid:     2 flag\
s:0x00004000
Aug 31 19:41:12 debian kernel: [  847.171890] Workqueue: ipv6_addrconf
addrconf_verify_work
Aug 31 19:41:12 debian kernel: [  847.171897] Call Trace:
Aug 31 19:41:12 debian kernel: [  847.171903]  __schedule+0x368/0xa40
Aug 31 19:41:12 debian kernel: [  847.171915]  schedule+0x44/0xe0
Aug 31 19:41:12 debian kernel: [  847.171921]
schedule_preempt_disabled+0x14/0x20
Aug 31 19:41:12 debian kernel: [  847.171924]  __mutex_lock+0x4b1/0xa10
Aug 31 19:41:12 debian kernel: [  847.171935]  ? addrconf_verify_work+0xa/0x20
Aug 31 19:41:12 debian kernel: [  847.171948]  ? addrconf_verify_work+0xa/0x20
Aug 31 19:41:12 debian kernel: [  847.171951]  addrconf_verify_work+0xa/0x20
Aug 31 19:41:12 debian kernel: [  847.171955]  process_one_work+0x1fa/0x5b0
Aug 31 19:41:12 debian kernel: [  847.171967]  worker_thread+0x64/0x3d0
Aug 31 19:41:12 debian kernel: [  847.171974]  ? process_one_work+0x5b0/0x5b0
Aug 31 19:41:12 debian kernel: [  847.171978]  kthread+0x131/0x180
Aug 31 19:41:12 debian kernel: [  847.171982]  ? set_kthread_struct+0x40/0x40
Aug 31 19:41:12 debian kernel: [  847.171989]  ret_from_fork+0x1f/0x30
Aug 31 19:41:12 debian kernel: [  847.176007]
Aug 31 19:41:12 debian kernel: [  847.176007] Showing all locks held
in the system:
Aug 31 19:41:12 debian kernel: [  847.176016] 1 lock held by khungtaskd/56:
Aug 31 19:41:12 debian kernel: [  847.176018]  #0: ffffffff82918b60
(rcu_read_lock){....}-{1:2}, at: debug_show_a\
ll_locks+0xe/0x1a0

2.  # CONFIG_RCU_BOOST is not set
2.1 as root, run #stress-ng --sequential 100  --class scheduler -t 5m --times
2.2 as regular user at the same time, run $stress-ng --sequential 100
--class scheduler -t 5m --times
The system began OOM killing after 6 minutes:
The system is so dead that I can't save the backtrace to a file, nor did
the kernel have a chance to save the log to /var/log/messages.
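
When the box is that far gone, one possible workaround for capturing the
log is netconsole, which streams printk output over UDP to another
machine as it is printed. A minimal sketch - the interface name and IP
addresses below are placeholders, not the actual test setup:

  # on the guest kernel command line (or via the netconsole module parameters)
  netconsole=6665@10.0.2.15/eth0,6666@10.0.2.2/
  # on the receiving host, collect the messages
  nc -u -l 6666 > guest-oom.log    # adjust for your netcat variant

That way the OOM and stall backtraces can survive even when the guest
can no longer write to its own disk.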

Thanks
Zhouyi

On Wed, Sep 1, 2021 at 1:11 AM Zhouyi Zhou <zhouzhouyi@gmail.com> wrote:
>
> followings are some of my kernel config options:
> CONFIG_PREEMPT=y
> CONFIG_PREEMPT_COUNT=y
> CONFIG_PREEMPTION=y
> CONFIG_PREEMPT_DYNAMIC=y
> CONFIG_PREEMPT_RCU=y
> CONFIG_HAVE_PREEMPT_DYNAMIC=y
> CONFIG_TREE_RCU=y
> CONFIG_PREEMPT_RCU=y
> CONFIG_RCU_EXPERT=y
> CONFIG_SRCU=y
> CONFIG_TREE_SRCU=y
> CONFIG_TASKS_RCU_GENERIC=y
> CONFIG_TASKS_RCU=y
> CONFIG_TASKS_RUDE_RCU=y
> CONFIG_TASKS_TRACE_RCU=y
> CONFIG_RCU_STALL_COMMON=y
> CONFIG_RCU_NEED_SEGCBLIST=y
> CONFIG_RCU_FANOUT=64
> CONFIG_RCU_FANOUT_LEAF=16
> # CONFIG_RCU_FAST_NO_HZ is not set
> # CONFIG_RCU_BOOST is not set
> CONFIG_RCU_NOCB_CPU=y
> # CONFIG_TASKS_TRACE_RCU_READ_MB is not set
> # end of RCU Subsystem
> CONFIG_MMU_GATHER_RCU_TABLE_FREE=y
> CONFIG_LOCKDEP_CIRCULAR_QUEUE_BITS=12
> # RCU Debugging
> CONFIG_PROVE_RCU=y
> # CONFIG_PROVE_RCU_LIST is not set
> # CONFIG_RCU_SCALE_TEST is not set
> CONFIG_RCU_TORTURE_TEST=m
> # CONFIG_RCU_REF_SCALE_TEST is not set
> CONFIG_RCU_CPU_STALL_TIMEOUT=21
> # CONFIG_RCU_TRACE is not set
> # CONFIG_RCU_EQS_DEBUG is not set
>
> Thanks
> Zhouyi
>
> On Wed, Sep 1, 2021 at 1:01 AM Zhouyi Zhou <zhouzhouyi@gmail.com> wrote:
> >
> > I did an experiment just now on x86_64 virtual machines, rcu did not
> > complain after 10 minutes's test, I hope my effort can provide some
> > clue.
> >
> > 1. I clone the fresh new linux kernel (git clone
> > https://kernel.source.codeaurora.cn/pub/scm/linux/kernel/git/torvalds/linux.git)
> > 2. compile the kernel without CONFIG_RCU_BOOST (: # CONFIG_RCU_BOOST is not set)
> > 3. boot the kernel on a x86_64 VM (kvm -cpu host -smp 16  -hda
> > ./debian10.qcow2 -m 4096 -net
> > user,hostfwd=tcp::5556-:22,hostfwd=tcp::5555-:19 -net nic,model=e1000
> > -vnc :30)
> > 4. run the test (stress-ng --sequential 16  --class scheduler -t 5m --times)
> > 5. monitor the system by constantly typing top and dmesg
> > 6. after 10 minutes, nothing else happens except that the dmesg report
> > following two messages
> > [  672.528192] sched: DL replenish lagged too much
> > [  751.127790] hrtimer: interrupt took 12143 ns
> >
> > So, I guess CONFIG_RCU_BOOST is not necessary for x86_64 virtual machines
> >
> > Zhouyi
> >
> > On Tue, Aug 31, 2021 at 11:24 PM Jorge Ramirez-Ortiz, Foundries
> > <jorge@foundries.io> wrote:
> > >
> > > Hi
> > >
> > > When enabling CONFIG_PREEMPT and running the stress-ng scheduler class
> > > tests on arm64 (xilinx zynqmp and imx imx8mm SoCs) we are observing the following.
> > >
> > > [   62.578917] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > [   62.585015]  (detected by 0, t=5253 jiffies, g=3017, q=2972)
> > > [   62.590663] rcu: All QSes seen, last rcu_preempt kthread activity 5254 (4294907943-4294902689), jiffies_till_next_fqs=1, root
> > > +->qsmask 0x0
> > > [   62.603086] rcu: rcu_preempt kthread starved for 5258 jiffies! g3017 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
> > > [   62.613246] rcu:     Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> > > [   62.622359] rcu: RCU grace-period kthread stack dump:
> > > [   62.627395] task:rcu_preempt     state:R  running task     stack:    0 pid:   14 ppid:     2 flags:0x00000028
> > > [   62.637308] Call trace:
> > > [   62.639748]  __switch_to+0x11c/0x190
> > > [   62.643319]  __schedule+0x3b8/0x8d8
> > > [   62.646796]  schedule+0x4c/0x108
> > > [   62.650018]  schedule_timeout+0x1ac/0x358
> > > [   62.654021]  rcu_gp_kthread+0x6a8/0x12b8
> > > [   62.657933]  kthread+0x14c/0x158
> > > [   62.661153]  ret_from_fork+0x10/0x18
> > > [   62.682919] BUG: scheduling while atomic: stress-ng-hrtim/831/0x00000002
> > > [   62.689604] Preemption disabled at:
> > > [   62.689614] [<ffffffc010059418>] irq_enter_rcu+0x30/0x58
> > > [   62.698393] CPU: 0 PID: 831 Comm: stress-ng-hrtim Not tainted 5.10.42+ #5
> > > [   62.706296] Hardware name: Zynqmp new (DT)
> > > [   62.710115] Call trace:
> > > [   62.712548]  dump_backtrace+0x0/0x240
> > > [   62.716202]  show_stack+0x2c/0x38
> > > [   62.719510]  dump_stack+0xcc/0x104
> > > [   62.722904]  __schedule_bug+0x78/0xc8
> > > [   62.726556]  __schedule+0x70c/0x8d8
> > > [   62.730037]  schedule+0x4c/0x108
> > > [   62.733259]  do_notify_resume+0x224/0x5d8
> > > [   62.737259]  work_pending+0xc/0x2a4
> > >
> > > The error results in OOM eventually.
> > >
> > > RCU priority boosting does work around this issue but it seems to me
> > > a workaround more than a fix (otherwise boosting would be enabled
> > > by CONFIG_PREEMPT for arm64 I guess?).
> > >
> > > The question is: is this an arm64 bug that should be investigated? or
> > > is this some known corner case of running stress-ng that is already
> > > understood?
> > >
> > > thanks
> > > Jorge
> > >
> > >
> > >

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls
@ 2021-08-31 17:11         ` Zhouyi Zhou
  0 siblings, 0 replies; 46+ messages in thread
From: Zhouyi Zhou @ 2021-08-31 17:11 UTC (permalink / raw)
  To: Jorge Ramirez-Ortiz, Foundries
  Cc: paulmck, Josh Triplett, rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Joel Fernandes, Google, rcu, soc,
	linux-arm-kernel

The following are some of my kernel config options:
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPTION=y
CONFIG_PREEMPT_DYNAMIC=y
CONFIG_PREEMPT_RCU=y
CONFIG_HAVE_PREEMPT_DYNAMIC=y
CONFIG_TREE_RCU=y
CONFIG_PREEMPT_RCU=y
CONFIG_RCU_EXPERT=y
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_TASKS_RCU_GENERIC=y
CONFIG_TASKS_RCU=y
CONFIG_TASKS_RUDE_RCU=y
CONFIG_TASKS_TRACE_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
CONFIG_RCU_FANOUT=64
CONFIG_RCU_FANOUT_LEAF=16
# CONFIG_RCU_FAST_NO_HZ is not set
# CONFIG_RCU_BOOST is not set
CONFIG_RCU_NOCB_CPU=y
# CONFIG_TASKS_TRACE_RCU_READ_MB is not set
# end of RCU Subsystem
CONFIG_MMU_GATHER_RCU_TABLE_FREE=y
CONFIG_LOCKDEP_CIRCULAR_QUEUE_BITS=12
# RCU Debugging
CONFIG_PROVE_RCU=y
# CONFIG_PROVE_RCU_LIST is not set
# CONFIG_RCU_SCALE_TEST is not set
CONFIG_RCU_TORTURE_TEST=m
# CONFIG_RCU_REF_SCALE_TEST is not set
CONFIG_RCU_CPU_STALL_TIMEOUT=21
# CONFIG_RCU_TRACE is not set
# CONFIG_RCU_EQS_DEBUG is not set
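
For the CONFIG_RCU_BOOST=y comparison build, the option does not have to
be edited into .config by hand; a minimal sketch using the kernel's
scripts/config helper, assuming it is run from the top of the configured
source tree:

  # flip RCU_BOOST on and let Kconfig re-resolve dependent options
  ./scripts/config --file .config --enable RCU_BOOST
  make olddefconfig
  make -j$(nproc)

(RCU_BOOST depends on PREEMPT_RCU and RCU_EXPERT, both of which are
already enabled in the options above.)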

Thanks
Zhouyi

On Wed, Sep 1, 2021 at 1:01 AM Zhouyi Zhou <zhouzhouyi@gmail.com> wrote:
>
> I did an experiment just now on x86_64 virtual machines, rcu did not
> complain after 10 minutes's test, I hope my effort can provide some
> clue.
>
> 1. I clone the fresh new linux kernel (git clone
> https://kernel.source.codeaurora.cn/pub/scm/linux/kernel/git/torvalds/linux.git)
> 2. compile the kernel without CONFIG_RCU_BOOST (: # CONFIG_RCU_BOOST is not set)
> 3. boot the kernel on a x86_64 VM (kvm -cpu host -smp 16  -hda
> ./debian10.qcow2 -m 4096 -net
> user,hostfwd=tcp::5556-:22,hostfwd=tcp::5555-:19 -net nic,model=e1000
> -vnc :30)
> 4. run the test (stress-ng --sequential 16  --class scheduler -t 5m --times)
> 5. monitor the system by constantly typing top and dmesg
> 6. after 10 minutes, nothing else happens except that the dmesg report
> following two messages
> [  672.528192] sched: DL replenish lagged too much
> [  751.127790] hrtimer: interrupt took 12143 ns
>
> So, I guess CONFIG_RCU_BOOST is not necessary for x86_64 virtual machines
>
> Zhouyi
>
> On Tue, Aug 31, 2021 at 11:24 PM Jorge Ramirez-Ortiz, Foundries
> <jorge@foundries.io> wrote:
> >
> > Hi
> >
> > When enabling CONFIG_PREEMPT and running the stress-ng scheduler class
> > tests on arm64 (xilinx zynqmp and imx imx8mm SoCs) we are observing the following.
> >
> > [   62.578917] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > [   62.585015]  (detected by 0, t=5253 jiffies, g=3017, q=2972)
> > [   62.590663] rcu: All QSes seen, last rcu_preempt kthread activity 5254 (4294907943-4294902689), jiffies_till_next_fqs=1, root
> > +->qsmask 0x0
> > [   62.603086] rcu: rcu_preempt kthread starved for 5258 jiffies! g3017 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
> > [   62.613246] rcu:     Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> > [   62.622359] rcu: RCU grace-period kthread stack dump:
> > [   62.627395] task:rcu_preempt     state:R  running task     stack:    0 pid:   14 ppid:     2 flags:0x00000028
> > [   62.637308] Call trace:
> > [   62.639748]  __switch_to+0x11c/0x190
> > [   62.643319]  __schedule+0x3b8/0x8d8
> > [   62.646796]  schedule+0x4c/0x108
> > [   62.650018]  schedule_timeout+0x1ac/0x358
> > [   62.654021]  rcu_gp_kthread+0x6a8/0x12b8
> > [   62.657933]  kthread+0x14c/0x158
> > [   62.661153]  ret_from_fork+0x10/0x18
> > [   62.682919] BUG: scheduling while atomic: stress-ng-hrtim/831/0x00000002
> > [   62.689604] Preemption disabled at:
> > [   62.689614] [<ffffffc010059418>] irq_enter_rcu+0x30/0x58
> > [   62.698393] CPU: 0 PID: 831 Comm: stress-ng-hrtim Not tainted 5.10.42+ #5
> > [   62.706296] Hardware name: Zynqmp new (DT)
> > [   62.710115] Call trace:
> > [   62.712548]  dump_backtrace+0x0/0x240
> > [   62.716202]  show_stack+0x2c/0x38
> > [   62.719510]  dump_stack+0xcc/0x104
> > [   62.722904]  __schedule_bug+0x78/0xc8
> > [   62.726556]  __schedule+0x70c/0x8d8
> > [   62.730037]  schedule+0x4c/0x108
> > [   62.733259]  do_notify_resume+0x224/0x5d8
> > [   62.737259]  work_pending+0xc/0x2a4
> >
> > The error results in OOM eventually.
> >
> > RCU priority boosting does work around this issue but it seems to me
> > a workaround more than a fix (otherwise boosting would be enabled
> > by CONFIG_PREEMPT for arm64 I guess?).
> >
> > The question is: is this an arm64 bug that should be investigated? or
> > is this some known corner case of running stress-ng that is already
> > understood?
> >
> > thanks
> > Jorge
> >
> >
> >

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls
@ 2021-08-31 17:01     ` Zhouyi Zhou
  0 siblings, 0 replies; 46+ messages in thread
From: Zhouyi Zhou @ 2021-08-31 17:01 UTC (permalink / raw)
  To: Jorge Ramirez-Ortiz, Foundries
  Cc: paulmck, Josh Triplett, rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Joel Fernandes, Google, rcu, soc,
	linux-arm-kernel

I did an experiment just now on an x86_64 virtual machine; RCU did not
complain after a 10-minute test. I hope my effort can provide some
clue.

1. I cloned a fresh Linux kernel (git clone
https://kernel.source.codeaurora.cn/pub/scm/linux/kernel/git/torvalds/linux.git)
2. compiled the kernel without CONFIG_RCU_BOOST (# CONFIG_RCU_BOOST is not set)
3. booted the kernel on an x86_64 VM (kvm -cpu host -smp 16  -hda
./debian10.qcow2 -m 4096 -net
user,hostfwd=tcp::5556-:22,hostfwd=tcp::5555-:19 -net nic,model=e1000
-vnc :30)
4. ran the test (stress-ng --sequential 16  --class scheduler -t 5m --times)
5. monitored the system by repeatedly running top and dmesg
6. after 10 minutes, nothing else had happened except that dmesg reported the
following two messages
[  672.528192] sched: DL replenish lagged too much
[  751.127790] hrtimer: interrupt took 12143 ns

So, I guess CONFIG_RCU_BOOST is not necessary for x86_64 virtual machines
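
In case it helps, here are the steps above condensed into a rough script
(only a sketch: the image name, port forwards and -j value come from my
setup, and enabling CONFIG_PREEMPT is my reading of the config I posted
elsewhere in this thread -- adjust as needed):

  # build a kernel with full preemption but without RCU priority boosting
  git clone https://kernel.source.codeaurora.cn/pub/scm/linux/kernel/git/torvalds/linux.git
  cd linux
  make defconfig
  ./scripts/config --enable PREEMPT --disable RCU_BOOST
  make olddefconfig
  make -j16
  # boot the guest with the kvm invocation from step 3 above, then inside it:
  stress-ng --sequential 16 --class scheduler -t 5m --times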

Zhouyi

On Tue, Aug 31, 2021 at 11:24 PM Jorge Ramirez-Ortiz, Foundries
<jorge@foundries.io> wrote:
>
> Hi
>
> When enabling CONFIG_PREEMPT and running the stress-ng scheduler class
> tests on arm64 (xilinx zynqmp and imx imx8mm SoCs) we are observing the following.
>
> [   62.578917] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [   62.585015]  (detected by 0, t=5253 jiffies, g=3017, q=2972)
> [   62.590663] rcu: All QSes seen, last rcu_preempt kthread activity 5254 (4294907943-4294902689), jiffies_till_next_fqs=1, root
> +->qsmask 0x0
> [   62.603086] rcu: rcu_preempt kthread starved for 5258 jiffies! g3017 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
> [   62.613246] rcu:     Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> [   62.622359] rcu: RCU grace-period kthread stack dump:
> [   62.627395] task:rcu_preempt     state:R  running task     stack:    0 pid:   14 ppid:     2 flags:0x00000028
> [   62.637308] Call trace:
> [   62.639748]  __switch_to+0x11c/0x190
> [   62.643319]  __schedule+0x3b8/0x8d8
> [   62.646796]  schedule+0x4c/0x108
> [   62.650018]  schedule_timeout+0x1ac/0x358
> [   62.654021]  rcu_gp_kthread+0x6a8/0x12b8
> [   62.657933]  kthread+0x14c/0x158
> [   62.661153]  ret_from_fork+0x10/0x18
> [   62.682919] BUG: scheduling while atomic: stress-ng-hrtim/831/0x00000002
> [   62.689604] Preemption disabled at:
> [   62.689614] [<ffffffc010059418>] irq_enter_rcu+0x30/0x58
> [   62.698393] CPU: 0 PID: 831 Comm: stress-ng-hrtim Not tainted 5.10.42+ #5
> [   62.706296] Hardware name: Zynqmp new (DT)
> [   62.710115] Call trace:
> [   62.712548]  dump_backtrace+0x0/0x240
> [   62.716202]  show_stack+0x2c/0x38
> [   62.719510]  dump_stack+0xcc/0x104
> [   62.722904]  __schedule_bug+0x78/0xc8
> [   62.726556]  __schedule+0x70c/0x8d8
> [   62.730037]  schedule+0x4c/0x108
> [   62.733259]  do_notify_resume+0x224/0x5d8
> [   62.737259]  work_pending+0xc/0x2a4
>
> The error results in OOM eventually.
>
> RCU priority boosting does work around this issue but it seems to me
> a workaround more than a fix (otherwise boosting would be enabled
> by CONFIG_PREEMPT for arm64 I guess?).
>
> The question is: is this an arm64 bug that should be investigated? or
> is this some known corner case of running stress-ng that is already
> understood?
>
> thanks
> Jorge
>
>
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: rcu_preempt detected stalls
@ 2021-08-31 15:53   ` Paul E. McKenney
  0 siblings, 0 replies; 46+ messages in thread
From: Paul E. McKenney @ 2021-08-31 15:53 UTC (permalink / raw)
  To: Jorge Ramirez-Ortiz, Foundries
  Cc: josh, rostedt, mathieu.desnoyers, jiangshanlai, joel, rcu, soc,
	linux-arm-kernel

On Tue, Aug 31, 2021 at 05:21:44PM +0200, Jorge Ramirez-Ortiz, Foundries wrote:
> Hi
>
> When enabling CONFIG_PREEMPT and running the stress-ng scheduler class
> tests on arm64 (xilinx zynqmp and imx imx8mm SoCs) we are observing the following.
>
> [   62.578917] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [   62.585015]  (detected by 0, t=5253 jiffies, g=3017, q=2972)
> [   62.590663] rcu: All QSes seen, last rcu_preempt kthread activity 5254 (4294907943-4294902689), jiffies_till_next_fqs=1, root
> +->qsmask 0x0
> [   62.603086] rcu: rcu_preempt kthread starved for 5258 jiffies! g3017 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
> [   62.613246] rcu:     Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.

The message above really does mean what it says:  If your workload
prevents RCU's grace-period kthread ("rcu_preempt" in this case) from
running, you just bought yourself an OOM.
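
(A quick way to check whether the kthread is getting any CPU at all --
just a sketch, assuming procps-style ps and pgrep on the target:

  ps -o pid,cls,rtprio,pcpu,comm -p "$(pgrep -x rcu_preempt)"

The cls and rtprio columns also show whether it is still plain SCHED_OTHER
or has been boosted to a real-time priority.)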

> [   62.622359] rcu: RCU grace-period kthread stack dump:
> [   62.627395] task:rcu_preempt     state:R  running task     stack:    0 pid:   14 ppid:     2 flags:0x00000028
> [   62.637308] Call trace:
> [   62.639748]  __switch_to+0x11c/0x190
> [   62.643319]  __schedule+0x3b8/0x8d8
> [   62.646796]  schedule+0x4c/0x108
> [   62.650018]  schedule_timeout+0x1ac/0x358
> [   62.654021]  rcu_gp_kthread+0x6a8/0x12b8
> [   62.657933]  kthread+0x14c/0x158
> [   62.661153]  ret_from_fork+0x10/0x18
> [   62.682919] BUG: scheduling while atomic: stress-ng-hrtim/831/0x00000002
> [   62.689604] Preemption disabled at:
> [   62.689614] [<ffffffc010059418>] irq_enter_rcu+0x30/0x58
> [   62.698393] CPU: 0 PID: 831 Comm: stress-ng-hrtim Not tainted 5.10.42+ #5
> [   62.706296] Hardware name: Zynqmp new (DT)
> [   62.710115] Call trace:
> [   62.712548]  dump_backtrace+0x0/0x240
> [   62.716202]  show_stack+0x2c/0x38
> [   62.719510]  dump_stack+0xcc/0x104
> [   62.722904]  __schedule_bug+0x78/0xc8
> [   62.726556]  __schedule+0x70c/0x8d8
> [   62.730037]  schedule+0x4c/0x108
> [   62.733259]  do_notify_resume+0x224/0x5d8
> [   62.737259]  work_pending+0xc/0x2a4
>
> The error results in OOM eventually.
>
> RCU priority boosting does work around this issue but it seems to me
> a workaround more than a fix (otherwise boosting would be enabled
> by CONFIG_PREEMPT for arm64 I guess?).

RCU priority boosting sets the rcu_preempt kthread's scheduling priority
to SCHED_FIFO priority level 1 instead of the normal SCHED_OTHER.
Therefore, if you build with CONFIG_RCU_BOOST=n, but manually set the
priority of rcu_preempt to SCHED_FIFO priority level 1, you might also
see this RCU CPU stall warning go away.
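
For example, something along these lines (a sketch; assumes chrt and pgrep
are available on the target):

  chrt -f -p 1 "$(pgrep -x rcu_preempt)"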

> The question is: is this an arm64 bug that should be investigated? or
> is this some known corner case of running stress-ng that is already
> understood?

I have not looked at stress-ng, but it is possible to configure your
system so that rcu_preempt gets little or no CPU time, for example,
by placing it into a CPU-poor cgroup on the one hand or by disabling
throttling and running a heavy real-time workload on the other.
Is stress-ng doing something like this?
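
(For the disable-throttling case, a minimal sketch of such a setup, shown
only to illustrate the failure mode rather than as something to deploy:

  # turn off RT throttling, then pin a SCHED_FIFO CPU hog onto the CPU the
  # grace-period kthread last ran on (cpu=1 in the splat above); the
  # rcu_preempt kthread then gets essentially no CPU time
  sysctl -w kernel.sched_rt_runtime_us=-1
  taskset -c 1 chrt -f 50 sh -c 'while :; do :; done' &
)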

There could of course also be an arm64 problem that affects scheduling,
but I suggest looking closely at what stress-ng is doing first.

Please let me know how it goes!

							Thanx, Paul

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* rcu_preempt detected stalls
@ 2021-08-31 15:21 ` Jorge Ramirez-Ortiz, Foundries
  0 siblings, 0 replies; 46+ messages in thread
From: Jorge Ramirez-Ortiz, Foundries @ 2021-08-31 15:21 UTC (permalink / raw)
  To: paulmck
  Cc: josh, rostedt, mathieu.desnoyers, jiangshanlai, joel, rcu, soc,
	linux-arm-kernel

Hi

When enabling CONFIG_PREEMPT and running the stress-ng scheduler class
tests on arm64 (xilinx zynqmp and imx imx8mm SoCs) we are observing the following.

[   62.578917] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:                                                              
[   62.585015]  (detected by 0, t=5253 jiffies, g=3017, q=2972)                                                                   
[   62.590663] rcu: All QSes seen, last rcu_preempt kthread activity 5254 (4294907943-4294902689), jiffies_till_next_fqs=1, root  
+->qsmask 0x0                                                                                                                     
[   62.603086] rcu: rcu_preempt kthread starved for 5258 jiffies! g3017 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1               
[   62.613246] rcu:     Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.                        
[   62.622359] rcu: RCU grace-period kthread stack dump:                                                                          
[   62.627395] task:rcu_preempt     state:R  running task     stack:    0 pid:   14 ppid:     2 flags:0x00000028                  
[   62.637308] Call trace:                                                                                                        
[   62.639748]  __switch_to+0x11c/0x190                                                                                           
[   62.643319]  __schedule+0x3b8/0x8d8                                                                                            
[   62.646796]  schedule+0x4c/0x108                                                                                               
[   62.650018]  schedule_timeout+0x1ac/0x358                                                                                      
[   62.654021]  rcu_gp_kthread+0x6a8/0x12b8                                                                                       
[   62.657933]  kthread+0x14c/0x158                                                                                               
[   62.661153]  ret_from_fork+0x10/0x18                                                                                           
[   62.682919] BUG: scheduling while atomic: stress-ng-hrtim/831/0x00000002                                                       
[   62.689604] Preemption disabled at:                                                                                            
[   62.689614] [<ffffffc010059418>] irq_enter_rcu+0x30/0x58                                                                       
[   62.698393] CPU: 0 PID: 831 Comm: stress-ng-hrtim Not tainted 5.10.42+ #5                                         
[   62.706296] Hardware name: Zynqmp new (DT)                                                                                        
[   62.710115] Call trace:                                                                                                        
[   62.712548]  dump_backtrace+0x0/0x240                                                                                          
[   62.716202]  show_stack+0x2c/0x38                                                                                              
[   62.719510]  dump_stack+0xcc/0x104                                                                                             
[   62.722904]  __schedule_bug+0x78/0xc8                                                                                          
[   62.726556]  __schedule+0x70c/0x8d8                                                                                            
[   62.730037]  schedule+0x4c/0x108                                                                                               
[   62.733259]  do_notify_resume+0x224/0x5d8                                                                                      
[   62.737259]  work_pending+0xc/0x2a4

The error results in OOM eventually.

RCU priority boosting does work around this issue, but it seems to me
more a workaround than a fix (otherwise boosting would be enabled
by CONFIG_PREEMPT for arm64, I guess?).

The question is: is this an arm64 bug that should be investigated, or
is this some known corner case of running stress-ng that is already
understood?

thanks
Jorge




^ permalink raw reply	[flat|nested] 46+ messages in thread

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-10-13 17:35 rcu_preempt detected stalls Dave Jones
2014-10-15  2:35 ` Sasha Levin
2014-10-23 18:39   ` Paul E. McKenney
2014-10-23 18:55     ` Sasha Levin
2014-10-23 19:58       ` Paul E. McKenney
2014-10-24 12:28         ` Sasha Levin
2014-10-24 16:13           ` Paul E. McKenney
2014-10-24 16:39             ` Sasha Levin
2014-10-27 21:13               ` Paul E. McKenney
2014-10-27 23:44                 ` Paul E. McKenney
2014-10-27 23:44                   ` Paul E. McKenney
2014-11-13 23:07                   ` Paul E. McKenney
2014-11-13 23:07                     ` Paul E. McKenney
2014-11-13 23:10                     ` Sasha Levin
2014-11-13 23:10                       ` Sasha Levin
2014-10-30 23:41                 ` Sasha Levin
2014-10-23 18:32 ` Paul E. McKenney
2014-10-23 18:40   ` Dave Jones
2014-10-23 19:28     ` Paul E. McKenney
2014-10-23 19:37       ` Dave Jones
2014-10-23 19:52         ` Paul E. McKenney
2014-10-23 20:28           ` Dave Jones
2014-10-23 20:44             ` Paul E. McKenney
2014-10-23 19:13   ` Oleg Nesterov
2014-10-23 19:38     ` Paul E. McKenney
2014-10-23 19:53       ` Oleg Nesterov
2014-10-23 20:24         ` Paul E. McKenney
2014-10-23 21:13           ` Oleg Nesterov
2014-10-23 21:38             ` Paul E. McKenney
2014-10-25  3:16 ` Dâniel Fraga
2021-08-31 15:21 Jorge Ramirez-Ortiz, Foundries
2021-08-31 15:21 ` Jorge Ramirez-Ortiz, Foundries
2021-08-31 15:53 ` Paul E. McKenney
2021-08-31 15:53   ` Paul E. McKenney
2021-08-31 17:01 ` Zhouyi Zhou
2021-08-31 17:01   ` Zhouyi Zhou
2021-08-31 17:01     ` Zhouyi Zhou
2021-08-31 17:11     ` Zhouyi Zhou
2021-08-31 17:11       ` Zhouyi Zhou
2021-08-31 17:11         ` Zhouyi Zhou
2021-09-01  1:03         ` Zhouyi Zhou
2021-09-01  1:03           ` Zhouyi Zhou
2021-09-01  1:03             ` Zhouyi Zhou
2021-09-01  4:08             ` Neeraj Upadhyay
2021-09-01  6:47               ` Zhouyi Zhou
2021-09-01  6:47                 ` Zhouyi Zhou
2021-09-01  6:47                   ` Zhouyi Zhou
2021-09-01  8:23             ` Jorge Ramirez-Ortiz, Foundries
2021-09-01  8:23               ` Jorge Ramirez-Ortiz, Foundries
2021-09-01  9:17               ` Zhouyi Zhou
2021-09-01  9:17                 ` Zhouyi Zhou
2021-09-01  9:17                   ` Zhouyi Zhou
