* sched_rt_period_timer causing large latencies
@ 2018-04-04 23:11 Nicholas Piggin
  2018-04-05  6:46 ` Mike Galbraith
  2018-04-05  8:27 ` Peter Zijlstra
  0 siblings, 2 replies; 9+ messages in thread
From: Nicholas Piggin @ 2018-04-04 23:11 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Peter Zijlstra, Steven Rostedt

Hi,

I'm seeing some pretty big latencies on a ~idle system when a CPU wakes
out of a nohz idle. Looks like it's due to taking a lot of remote locks
and touching remote cache lines. irqsoff trace:

latency: 407 us, #608/608, CPU#3 | (M:server VP:0, KP:0, SP:0 HP:0 #P:176)

  <idle>-0       3d...    0us : decrementer_common
  <idle>-0       3d...    1us : timer_interrupt <-decrementer_common
  <idle>-0       3d...    2us : irq_enter <-timer_interrupt
  <idle>-0       3d...    2us : rcu_irq_enter <-irq_enter
  <idle>-0       3d...    3us : rcu_nmi_enter <-irq_enter
  <idle>-0       3d...    4us : rcu_dynticks_eqs_exit <-rcu_nmi_enter
  <idle>-0       3d...    4us : __local_bh_disable_ip <-irq_enter
  <idle>-0       3d...    5us : tick_irq_enter <-irq_enter
  <idle>-0       3d...    6us : tick_check_oneshot_broadcast_this_cpu <-tick_irq_enter
  <idle>-0       3d...    6us : ktime_get <-tick_irq_enter
  <idle>-0       3d...    7us : update_ts_time_stats <-tick_irq_enter
  <idle>-0       3d...    8us : nr_iowait_cpu <-update_ts_time_stats
  <idle>-0       3d...    9us : _local_bh_enable <-irq_enter
  <idle>-0       3d...   10us : __local_bh_enable <-irq_enter
  <idle>-0       3d.h.   10us : __timer_interrupt <-timer_interrupt
  <idle>-0       3d.h.   11us : hrtimer_interrupt <-__timer_interrupt
  <idle>-0       3d.h.   12us : _raw_spin_lock_irqsave <-hrtimer_interrupt
  <idle>-0       3d.h.   13us : ktime_get_update_offsets_now <-hrtimer_interrupt
  <idle>-0       3d.h.   13us : __hrtimer_run_queues <-hrtimer_interrupt
  <idle>-0       3d.h.   14us : __remove_hrtimer <-__hrtimer_run_queues
  <idle>-0       3d.h.   15us : _raw_spin_unlock_irqrestore <-__hrtimer_run_queues
  <idle>-0       3d.h.   16us : tick_sched_timer <-__hrtimer_run_queues
  <idle>-0       3d.h.   16us : ktime_get <-tick_sched_timer
  <idle>-0       3d.h.   17us : tick_sched_do_timer <-tick_sched_timer
  <idle>-0       3d.h.   18us : tick_sched_handle.isra.5 <-tick_sched_timer
  <idle>-0       3d.h.   19us : update_process_times <-tick_sched_handle.isra.5
  <idle>-0       3d.h.   19us : account_process_tick <-update_process_times
  <idle>-0       3d.h.   20us : run_local_timers <-update_process_times
  <idle>-0       3d.h.   21us : hrtimer_run_queues <-run_local_timers
  <idle>-0       3d.h.   21us : raise_softirq <-run_local_timers
  <idle>-0       3d.h.   22us : __raise_softirq_irqoff <-raise_softirq
  <idle>-0       3d.h.   23us : rcu_check_callbacks <-update_process_times
  <idle>-0       3d.h.   24us : rcu_sched_qs <-rcu_check_callbacks
  <idle>-0       3d.h.   25us : rcu_bh_qs <-rcu_check_callbacks
  <idle>-0       3d.h.   25us : rcu_segcblist_ready_cbs <-rcu_check_callbacks
  <idle>-0       3d.h.   26us : cpu_needs_another_gp <-rcu_check_callbacks
  <idle>-0       3d.h.   27us : invoke_rcu_core <-rcu_check_callbacks
  <idle>-0       3d.h.   28us : raise_softirq <-invoke_rcu_core
  <idle>-0       3d.h.   28us : __raise_softirq_irqoff <-raise_softirq
  <idle>-0       3d.h.   29us : scheduler_tick <-update_process_times
  <idle>-0       3d.h.   30us : _raw_spin_lock <-scheduler_tick
  <idle>-0       3d.h.   31us : update_rq_clock <-scheduler_tick
  <idle>-0       3d.h.   31us : task_tick_idle <-scheduler_tick
  <idle>-0       3d.h.   32us : cpu_load_update_active <-scheduler_tick
  <idle>-0       3d.h.   33us : tick_nohz_tick_stopped <-cpu_load_update_active
  <idle>-0       3d.h.   33us : cpu_load_update <-scheduler_tick
  <idle>-0       3d.h.   34us : sched_avg_update <-cpu_load_update
  <idle>-0       3d.h.   35us : calc_global_load_tick <-scheduler_tick
  <idle>-0       3d.h.   36us : trigger_load_balance <-scheduler_tick
  <idle>-0       3d.h.   36us : raise_softirq <-trigger_load_balance
  <idle>-0       3d.h.   37us : __raise_softirq_irqoff <-raise_softirq
  <idle>-0       3d.h.   38us : run_posix_cpu_timers <-update_process_times
  <idle>-0       3d.h.   39us : profile_tick <-tick_sched_handle.isra.5
  <idle>-0       3d.h.   39us : hrtimer_forward <-tick_sched_timer
  <idle>-0       3d.h.   40us : _raw_spin_lock_irq <-__hrtimer_run_queues
  <idle>-0       3d.h.   41us : enqueue_hrtimer <-__hrtimer_run_queues
  <idle>-0       3d.h.   42us : __remove_hrtimer <-__hrtimer_run_queues
  <idle>-0       3d.h.   42us : _raw_spin_unlock_irqrestore <-__hrtimer_run_queues
  <idle>-0       3d.h.   44us : sched_rt_period_timer <-__hrtimer_run_queues
  <idle>-0       3d.h.   44us : _raw_spin_lock <-sched_rt_period_timer
  <idle>-0       3d.h.   45us : ktime_get <-sched_rt_period_timer
  <idle>-0       3d.h.   46us : hrtimer_forward <-sched_rt_period_timer
  <idle>-0       3d.h.   47us : _raw_spin_lock <-sched_rt_period_timer
  <idle>-0       3d.h.   48us : _raw_spin_lock <-sched_rt_period_timer
  ... 527 more locks snipped ...
  <idle>-0       3d.h.  403us : ktime_get <-sched_rt_period_timer
  <idle>-0       3d.h.  403us : hrtimer_forward <-sched_rt_period_timer
  <idle>-0       3d.h.  404us : _raw_spin_lock_irq <-__hrtimer_run_queues
  <idle>-0       3d.h.  404us : __hrtimer_get_next_event <-hrtimer_interrupt
  <idle>-0       3d.h.  404us : __hrtimer_next_event_base <-__hrtimer_get_next_event
  <idle>-0       3d.h.  405us : __hrtimer_next_event_base <-hrtimer_interrupt
  <idle>-0       3d.h.  405us : _raw_spin_unlock_irqrestore <-hrtimer_interrupt
  <idle>-0       3d.h.  405us : tick_program_event <-hrtimer_interrupt
  <idle>-0       3d.h.  406us : clockevents_program_event <-tick_program_event
  <idle>-0       3d.h.  406us : ktime_get <-clockevents_program_event
  <idle>-0       3d.h.  406us : decrementer_set_next_event <-clockevents_program_event
  <idle>-0       3d.h.  407us : irq_exit <-timer_interrupt
  <idle>-0       3d...  407us : __do_softirq <-irq_exit
  <idle>-0       3d...  407us : __local_bh_disable_ip <-__do_softirq
  <idle>-0       3d.s.  408us : __do_softirq
  <idle>-0       3d.s.  408us : trace_hardirqs_on <-__do_softirq
  <idle>-0       3d.s.  411us : <stack trace>

This is only a 2-socket system with 176 CPUs, so it's not huge and remote
latencies aren't extreme. I suspect this contributes to significantly
higher occasional latency spikes when it hits some runqueue lock
contention.

Is there any way this can be improved? Are there any config options or
settings that should be changed?
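
For reference, the trace above came from the irqsoff tracer, roughly like
this (assuming CONFIG_IRQSOFF_TRACER=y and debugfs mounted at
/sys/kernel/debug; the exact steps used here may have differed):

    echo irqsoff > /sys/kernel/debug/tracing/current_tracer
    echo 0 > /sys/kernel/debug/tracing/tracing_max_latency
    # wait for a new max latency to be recorded, then:
    cat /sys/kernel/debug/tracing/trace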

Thanks,
Nick

* Re: sched_rt_period_timer causing large latencies
  2018-04-04 23:11 sched_rt_period_timer causing large latencies Nicholas Piggin
@ 2018-04-05  6:46 ` Mike Galbraith
  2018-04-05  7:44   ` Nicholas Piggin
  2018-04-05  8:27 ` Peter Zijlstra
  1 sibling, 1 reply; 9+ messages in thread
From: Mike Galbraith @ 2018-04-05  6:46 UTC (permalink / raw)
  To: Nicholas Piggin, Linux Kernel Mailing List; +Cc: Peter Zijlstra, Steven Rostedt

On Thu, 2018-04-05 at 09:11 +1000, Nicholas Piggin wrote:
> Hi,
> 
> I'm seeing some pretty big latencies on a ~idle system when a CPU wakes
> out of a nohz idle. Looks like it's due to taking a lot of remote locks
> and touching remote cache lines. irqsoff trace:
> 
> [ 407 us irqsoff trace snipped; see the original message above ]
> 
> This is only a 2-socket system with 176 CPUs, so it's not huge and remote
> latencies aren't extreme. I suspect this contributes to significantly
> higher occasional latency spikes when it hits some runqueue lock
> contention.
> 
> Is there any way this can be improved? Are there any config options or
> settings that should be changed?

Settings-wise, you probably want skew_tick=1 on the command line or in
your .config via CONFIG_CMDLINE.
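
For example, one illustrative way of setting it (where the command line is
configured varies by distro, bootloader and arch, so treat this as a sketch):

    # appended to the kernel command line in the bootloader config:
    linux ... skew_tick=1

    # or built into the kernel image via .config (option names vary a
    # little between arches):
    CONFIG_CMDLINE_BOOL=y
    CONFIG_CMDLINE="skew_tick=1"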

My method of dealing with the throttle beast from hell for ~big box RT
is to stomp it flat during boot, as otherwise jitter is awful.

	-Mike

* Re: sched_rt_period_timer causing large latencies
  2018-04-05  6:46 ` Mike Galbraith
@ 2018-04-05  7:44   ` Nicholas Piggin
  2018-04-05  8:11     ` Mike Galbraith
  0 siblings, 1 reply; 9+ messages in thread
From: Nicholas Piggin @ 2018-04-05  7:44 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Linux Kernel Mailing List, Peter Zijlstra, Steven Rostedt

On Thu, 05 Apr 2018 08:46:57 +0200
Mike Galbraith <efault@gmx.de> wrote:

> On Thu, 2018-04-05 at 09:11 +1000, Nicholas Piggin wrote:
> > Hi,
> > 
> > I'm seeing some pretty big latencies on a ~idle system when a CPU wakes
> > out of a nohz idle. Looks like it's due to taking a lot of remote locks
> > and touching remote cache lines. irqsoff trace:
> > 
> > [ 407 us irqsoff trace snipped; see the original message above ]
> > 
> > This is only a 2-socket system with 176 CPUs, so it's not huge and remote
> > latencies aren't extreme. I suspect this contributes to significantly
> > higher occasional latency spikes when it hits some runqueue lock
> > contention.
> > 
> > Is there any way this can be improved? Are there any config options or
> > settings that should be changed?
> 
> Settings-wise, you probably want skew_tick=1 on the command line or in
> your .config via CONFIG_CMDLINE.

Hey, thanks for the suggestion; I'll give it a test.

> My method of dealing with the throttle beast from hell for ~big box RT
> is to stomp it flat during boot, as otherwise jitter is awful.

How do you stomp it flat?

If there could be some basic per-CPU detection of throttling before
invoking the beast from hell, would it be possible to move this out of
the normal paths?

Alternatively, I think most/all of the RT runtime is coming from
kernel threads; could they be exempted so this doesn't hit? Although
it would be sad to suddenly inflict these latencies on someone who
starts up an RT program :)

I guess this is a lot of CPUs, mainly due to SMT4, but everyone's counts
are slowly going up. Intel, ARM and AMD are all increasing cores and
threads quite steadily in average servers.

I accept that the defaults are fine for the average desktop/workstation
though... Maybe a CONFIG_ option for it? (Just because some environments
are averse to adding command line options for some reason.)

Thanks,
Nick

* Re: sched_rt_period_timer causing large latencies
  2018-04-05  7:44   ` Nicholas Piggin
@ 2018-04-05  8:11     ` Mike Galbraith
  0 siblings, 0 replies; 9+ messages in thread
From: Mike Galbraith @ 2018-04-05  8:11 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: Linux Kernel Mailing List, Peter Zijlstra, Steven Rostedt

On Thu, 2018-04-05 at 17:44 +1000, Nicholas Piggin wrote:
> 
> > My method of dealing with the throttle beast from hell for ~big box RT
> > is to stomp it flat during boot, as otherwise jitter is awful.
> 
> How do you stomp it flat?

With a size 12 boot originally from SGI.  Their extra hairy beasts
tended to grind themselves to fine powder.

---
 Documentation/admin-guide/kernel-parameters.txt |    2 ++
 kernel/sched/core.c                             |   11 +++++++++++
 2 files changed, 13 insertions(+)

--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2788,6 +2788,8 @@
 	noresume	[SWSUSP] Disables resume and restores original swap
 			space.
 
+	nortsched       [KNL] Disables realtime task bandwidth control.
+
 	no-scroll	[VGA] Disables scrollback.
 			This is required for the Braillex ib80-piezo Braille
 			reader made by F.H. Papenmeier (Germany).
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7091,3 +7091,14 @@ const u32 sched_prio_to_wmult[40] = {
 };
 
 #undef CREATE_TRACE_POINTS
+
+/*
+ * Bootline option to disable sched_rt_runtime.
+ */
+static int __init parse_nortsched(char *arg)
+{
+	sysctl_sched_rt_runtime = -1;
+	return 0;
+}
+early_param("nortsched", parse_nortsched);
+

> If there could be some basic per-CPU detection of throttling before
> invoking the beast from hell, would it be possible to move this out of
> the normal paths?

It's one beast mauling the entire box; that's the problem.  I just
consider it a debugging tool, and turn it off (if it kicks in, the RT
game was already over).  RT disables RT group scheduling, so I don't
have to care about that evilness.

> Alternatively, I think most/all of the RT runtime is coming from
> kernel threads; could they be exempted so this doesn't hit? Although
> it would be sad to suddenly inflict these latencies on someone who
> starts up an RT program :)

Yup, very sad.  It should probably become percpu.

	-Mike

* Re: sched_rt_period_timer causing large latencies
  2018-04-04 23:11 sched_rt_period_timer causing large latencies Nicholas Piggin
  2018-04-05  6:46 ` Mike Galbraith
@ 2018-04-05  8:27 ` Peter Zijlstra
  2018-04-05  8:40   ` Mike Galbraith
  1 sibling, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2018-04-05  8:27 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: Linux Kernel Mailing List, Steven Rostedt

On Thu, Apr 05, 2018 at 09:11:38AM +1000, Nicholas Piggin wrote:
> Hi,
> 
> I'm seeing some pretty big latencies on a ~idle system when a CPU wakes
> out of a nohz idle. Looks like it's due to taking a lot of remote locks
> and touching remote cache lines. irqsoff trace:

On RT I think we default RT_RUNTIME_SHARE to false, maybe we should do
the same for mainline. It's always been a dodgy feature and if we ever
figure out how to do FIFO servers this code will be deleted entirely.
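
For testing, that feature bit can be flipped at runtime, assuming
CONFIG_SCHED_DEBUG=y and debugfs mounted at /sys/kernel/debug:

    echo NO_RT_RUNTIME_SHARE > /sys/kernel/debug/sched_features

Writing RT_RUNTIME_SHARE back restores the default.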

* Re: sched_rt_period_timer causing large latencies
  2018-04-05  8:27 ` Peter Zijlstra
@ 2018-04-05  8:40   ` Mike Galbraith
  2018-04-05 10:08     ` Nicholas Piggin
  0 siblings, 1 reply; 9+ messages in thread
From: Mike Galbraith @ 2018-04-05  8:40 UTC (permalink / raw)
  To: Peter Zijlstra, Nicholas Piggin; +Cc: Linux Kernel Mailing List, Steven Rostedt

On Thu, 2018-04-05 at 10:27 +0200, Peter Zijlstra wrote:
> On Thu, Apr 05, 2018 at 09:11:38AM +1000, Nicholas Piggin wrote:
> > Hi,
> > 
> > I'm seeing some pretty big latencies on a ~idle system when a CPU wakes
> > out of a nohz idle. Looks like it's due to taking a lot of remote locks
> > and touching remote cache lines. irqsoff trace:
> 
> On RT I think we default RT_RUNTIME_SHARE to false, maybe we should do
> the same for mainline.

Probably.  My very first enterprise encounter with the thing was it NOT
saving a box from its not-so-clever driver, due to that.

	-Mike

* Re: sched_rt_period_timer causing large latencies
  2018-04-05  8:40   ` Mike Galbraith
@ 2018-04-05 10:08     ` Nicholas Piggin
  2018-04-05 10:44       ` Nicholas Piggin
  2018-04-06 16:23       ` Peter Zijlstra
  0 siblings, 2 replies; 9+ messages in thread
From: Nicholas Piggin @ 2018-04-05 10:08 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Peter Zijlstra, Linux Kernel Mailing List, Steven Rostedt

On Thu, 05 Apr 2018 10:40:20 +0200
Mike Galbraith <efault@gmx.de> wrote:

> On Thu, 2018-04-05 at 10:27 +0200, Peter Zijlstra wrote:
> > On Thu, Apr 05, 2018 at 09:11:38AM +1000, Nicholas Piggin wrote:  
> > > Hi,
> > > 
> > > I'm seeing some pretty big latencies on a ~idle system when a CPU wakes
> > > out of a nohz idle. Looks like it's due to taking a lot of remote locks
> > > and touching remote cache lines. irqsoff trace:
> > 
> > On RT I think we default RT_RUNTIME_SHARE to false, maybe we should do
> > the same for mainline.  
> 
> Probably.  My very first enterprise encounter with the thing was it NOT
> saving a box from its not-so-clever driver, due to that.

Well I would think a simpler per-cpu limiter might actually stand a
better chance of saving you there. Or even something attached to the
softlockup watchdog.

I'm still getting a lot of locks coming from sched_rt_period_timer
with RT_RUNTIME_SHARE false, it's just that it's now down to about
NR_CPUS locks rather than 3*NR_CPUS.

Thanks,
Nick

* Re: sched_rt_period_timer causing large latencies
  2018-04-05 10:08     ` Nicholas Piggin
@ 2018-04-05 10:44       ` Nicholas Piggin
  2018-04-06 16:23       ` Peter Zijlstra
  1 sibling, 0 replies; 9+ messages in thread
From: Nicholas Piggin @ 2018-04-05 10:44 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Peter Zijlstra, Linux Kernel Mailing List, Steven Rostedt

On Thu, 5 Apr 2018 20:08:59 +1000
Nicholas Piggin <npiggin@gmail.com> wrote:

> On Thu, 05 Apr 2018 10:40:20 +0200
> Mike Galbraith <efault@gmx.de> wrote:
> 
> > On Thu, 2018-04-05 at 10:27 +0200, Peter Zijlstra wrote:  
> > > On Thu, Apr 05, 2018 at 09:11:38AM +1000, Nicholas Piggin wrote:    
> > > > Hi,
> > > > 
> > > > I'm seeing some pretty big latencies on a ~idle system when a CPU wakes
> > > > out of a nohz idle. Looks like it's due to taking a lot of remote locks
> > > > and touching remote cache lines. irqsoff trace:
> > > 
> > > On RT I think we default RT_RUNTIME_SHARE to false, maybe we should do
> > > the same for mainline.    
> > 
> > Probably.  My very first enterprise encounter with the thing was it NOT
> > saving a box from its not-so-clever driver, due to that.
> 
> Well I would think a simpler per-cpu limiter might actually stand a
> better chance of saving you there. Or even something attached to the
> softlockup watchdog.
> 
> I'm still getting a lot of locks coming from sched_rt_period_timer
> with RT_RUNTIME_SHARE false, it's just that it's now down to about
> NR_CPUS locks rather than 3*NR_CPUS.

Oh yeah, putting -1 into sched_rt_runtime_us looks like it fixes it; I
didn't look at Mike's patch closely enough.
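
That's just the existing sysctl knob, i.e. something like:

    # disable RT throttling at runtime
    sysctl -w kernel.sched_rt_runtime_us=-1
    # or equivalently
    echo -1 > /proc/sys/kernel/sched_rt_runtime_us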

If the code stays, it would be nice to be able to default it off at
least.

Thanks,
Nick

* Re: sched_rt_period_timer causing large latencies
  2018-04-05 10:08     ` Nicholas Piggin
  2018-04-05 10:44       ` Nicholas Piggin
@ 2018-04-06 16:23       ` Peter Zijlstra
  1 sibling, 0 replies; 9+ messages in thread
From: Peter Zijlstra @ 2018-04-06 16:23 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: Mike Galbraith, Linux Kernel Mailing List, Steven Rostedt

On Thu, Apr 05, 2018 at 08:08:59PM +1000, Nicholas Piggin wrote:
> On Thu, 05 Apr 2018 10:40:20 +0200
> Mike Galbraith <efault@gmx.de> wrote:
> 
> > On Thu, 2018-04-05 at 10:27 +0200, Peter Zijlstra wrote:
> > > On Thu, Apr 05, 2018 at 09:11:38AM +1000, Nicholas Piggin wrote:  
> > > > Hi,
> > > > 
> > > > I'm seeing some pretty big latencies on a ~idle system when a CPU wakes
> > > > out of a nohz idle. Looks like it's due to taking a lot of remote locks
> > > > and touching remote cache lines. irqsoff trace:
> > > 
> > > On RT I think we default RT_RUNTIME_SHARE to false, maybe we should do
> > > the same for mainline.  
> > 
> > Probably.  My very first enterprise encounter with the thing was it NOT
> > saving a box from its not-so-clever driver, due to that.
> 
> Well I would think a simpler per-cpu limiter might actually stand a
> better chance of saving you there. Or even something attached to the
> softlockup watchdog.
> 
> I'm still getting a lot of locks coming from sched_rt_period_timer
> with RT_RUNTIME_SHARE false, it's just that it's now down to about
> NR_CPUS locks rather than 3*NR_CPUS.

Argh, right you are. I had a brief look at what it would take to fix
that, and while it's not completely horrible, it takes a bit of effort.

I'm currently a bit tied up with things, but I'll try not to forget.
