All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] hrtimer: avoid retrigger_next_event IPI
@ 2021-04-07 13:53 Marcelo Tosatti
  2021-04-07 19:28   ` kernel test robot
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Marcelo Tosatti @ 2021-04-07 13:53 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Frederic Weisbecker, Peter Xu, Nitesh Narayan Lal,
	Alex Belits


Setting the realtime clock triggers an IPI to all CPUs to reprogram
hrtimers.

However, only base, boottime and tai clocks have their offsets updated
(and therefore potentially require a reprogram).

If the CPU is a nohz_full one, check if it only has 
monotonic active timers, and in that case update the 
realtime base offsets, skipping the IPI.

This reduces interruptions to nohz_full CPUs.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 743c852e10f2..b42b1a434b22 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -853,6 +853,28 @@ static void hrtimer_reprogram(struct hrtimer *timer, bool reprogram)
 	tick_program_event(expires, 1);
 }
 
+#define CLOCK_SET_BASES ((1U << HRTIMER_BASE_REALTIME)|		\
+			 (1U << HRTIMER_BASE_REALTIME_SOFT)|	\
+			 (1U << HRTIMER_BASE_BOOTTIME)|		\
+			 (1U << HRTIMER_BASE_BOOTTIME_SOFT)|	\
+			 (1U << HRTIMER_BASE_TAI)|		\
+			 (1U << HRTIMER_BASE_TAI_SOFT))
+
+static bool need_reprogram_timer(struct hrtimer_cpu_base *cpu_base)
+{
+	unsigned int active = 0;
+
+	if (!cpu_base->softirq_activated)
+		active = cpu_base->active_bases & HRTIMER_ACTIVE_SOFT;
+
+	active = active | (cpu_base->active_bases & HRTIMER_ACTIVE_HARD);
+
+	if ((active & CLOCK_SET_BASES) == 0)
+		return false;
+
+	return true;
+}
+
 /*
  * Clock realtime was set
  *
@@ -867,9 +889,41 @@ static void hrtimer_reprogram(struct hrtimer *timer, bool reprogram)
 void clock_was_set(void)
 {
 #ifdef CONFIG_HIGH_RES_TIMERS
-	/* Retrigger the CPU local events everywhere */
-	on_each_cpu(retrigger_next_event, NULL, 1);
+	cpumask_var_t mask;
+	int cpu;
+
+	if (!tick_nohz_full_enabled()) {
+		/* Retrigger the CPU local events everywhere */
+		on_each_cpu(retrigger_next_event, NULL, 1);
+		goto set_timerfd;
+	}
+
+	if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) {
+		on_each_cpu(retrigger_next_event, NULL, 1);
+		goto set_timerfd;
+	}
+
+	/* Avoid interrupting nohz_full CPUs if possible */
+	preempt_disable();
+	for_each_online_cpu(cpu) {
+		if (tick_nohz_full_cpu(cpu)) {
+			unsigned long flags;
+			struct hrtimer_cpu_base *cpu_base = &per_cpu(hrtimer_bases, cpu);
+
+			raw_spin_lock_irqsave(&cpu_base->lock, flags);
+			if (need_reprogram_timer(cpu_base))
+				cpumask_set_cpu(cpu, mask);
+			else
+				hrtimer_update_base(cpu_base);
+			raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
+		}
+	}
+
+	smp_call_function_many(mask, retrigger_next_event, NULL, 1);
+	preempt_enable();
+	free_cpumask_var(mask);
 #endif
+set_timerfd:
 	timerfd_clock_was_set();
 }
 


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] hrtimer: avoid retrigger_next_event IPI
  2021-04-07 13:53 [PATCH] hrtimer: avoid retrigger_next_event IPI Marcelo Tosatti
@ 2021-04-07 19:28   ` kernel test robot
  2021-04-07 22:14 ` Frederic Weisbecker
  2021-04-09 14:15 ` Thomas Gleixner
  2 siblings, 0 replies; 20+ messages in thread
From: kernel test robot @ 2021-04-07 19:28 UTC (permalink / raw)
  To: Marcelo Tosatti, Thomas Gleixner
  Cc: kbuild-all, clang-built-linux, linux-kernel, Frederic Weisbecker,
	Peter Xu, Nitesh Narayan Lal, Alex Belits

[-- Attachment #1: Type: text/plain, Size: 6205 bytes --]

Hi Marcelo,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on tip/timers/core]
[also build test WARNING on linux/master linus/master v5.12-rc6 next-20210407]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Marcelo-Tosatti/hrtimer-avoid-retrigger_next_event-IPI/20210407-233005
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git d4c7c28806616809e3baa0b7cd8c665516b2726d
config: arm64-randconfig-r032-20210407 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project c060945b23a1c54d4b2a053ff4b093a2277b303d)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm64 cross compiling tool for clang build
        # apt-get install binutils-aarch64-linux-gnu
        # https://github.com/0day-ci/linux/commit/defd4db9d63d1f16e3e21862bd9c9105a9f6a7e9
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Marcelo-Tosatti/hrtimer-avoid-retrigger_next_event-IPI/20210407-233005
        git checkout defd4db9d63d1f16e3e21862bd9c9105a9f6a7e9
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   kernel/time/hrtimer.c:120:21: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
           [CLOCK_REALTIME]        = HRTIMER_BASE_REALTIME,
                                     ^~~~~~~~~~~~~~~~~~~~~
   kernel/time/hrtimer.c:118:27: note: previous initialization is here
           [0 ... MAX_CLOCKS - 1]  = HRTIMER_MAX_CLOCK_BASES,
                                     ^~~~~~~~~~~~~~~~~~~~~~~
   kernel/time/hrtimer.c:121:22: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
           [CLOCK_MONOTONIC]       = HRTIMER_BASE_MONOTONIC,
                                     ^~~~~~~~~~~~~~~~~~~~~~
   kernel/time/hrtimer.c:118:27: note: previous initialization is here
           [0 ... MAX_CLOCKS - 1]  = HRTIMER_MAX_CLOCK_BASES,
                                     ^~~~~~~~~~~~~~~~~~~~~~~
   kernel/time/hrtimer.c:122:21: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
           [CLOCK_BOOTTIME]        = HRTIMER_BASE_BOOTTIME,
                                     ^~~~~~~~~~~~~~~~~~~~~
   kernel/time/hrtimer.c:118:27: note: previous initialization is here
           [0 ... MAX_CLOCKS - 1]  = HRTIMER_MAX_CLOCK_BASES,
                                     ^~~~~~~~~~~~~~~~~~~~~~~
   kernel/time/hrtimer.c:123:17: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
           [CLOCK_TAI]             = HRTIMER_BASE_TAI,
                                     ^~~~~~~~~~~~~~~~
   kernel/time/hrtimer.c:118:27: note: previous initialization is here
           [0 ... MAX_CLOCKS - 1]  = HRTIMER_MAX_CLOCK_BASES,
                                     ^~~~~~~~~~~~~~~~~~~~~~~
>> kernel/time/hrtimer.c:944:1: warning: unused label 'set_timerfd' [-Wunused-label]
   set_timerfd:
   ^~~~~~~~~~~~
   kernel/time/hrtimer.c:147:20: warning: unused function 'is_migration_base' [-Wunused-function]
   static inline bool is_migration_base(struct hrtimer_clock_base *base)
                      ^
   kernel/time/hrtimer.c:650:19: warning: unused function 'hrtimer_hres_active' [-Wunused-function]
   static inline int hrtimer_hres_active(void)
                     ^
   kernel/time/hrtimer.c:881:13: warning: unused function 'need_reprogram_timer' [-Wunused-function]
   static bool need_reprogram_timer(struct hrtimer_cpu_base *cpu_base)
               ^
   kernel/time/hrtimer.c:1793:20: warning: unused function '__hrtimer_peek_ahead_timers' [-Wunused-function]
   static inline void __hrtimer_peek_ahead_timers(void) { }
                      ^
   9 warnings generated.


vim +/set_timerfd +944 kernel/time/hrtimer.c

   895	
   896	/*
   897	 * Clock realtime was set
   898	 *
   899	 * Change the offset of the realtime clock vs. the monotonic
   900	 * clock.
   901	 *
   902	 * We might have to reprogram the high resolution timer interrupt. On
   903	 * SMP we call the architecture specific code to retrigger _all_ high
   904	 * resolution timer interrupts. On UP we just disable interrupts and
   905	 * call the high resolution interrupt code.
   906	 */
   907	void clock_was_set(void)
   908	{
   909	#ifdef CONFIG_HIGH_RES_TIMERS
   910		cpumask_var_t mask;
   911		int cpu;
   912	
   913		if (!tick_nohz_full_enabled()) {
   914			/* Retrigger the CPU local events everywhere */
   915			on_each_cpu(retrigger_next_event, NULL, 1);
   916			goto set_timerfd;
   917		}
   918	
   919		if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) {
   920			on_each_cpu(retrigger_next_event, NULL, 1);
   921			goto set_timerfd;
   922		}
   923	
   924		/* Avoid interrupting nohz_full CPUs if possible */
   925		preempt_disable();
   926		for_each_online_cpu(cpu) {
   927			if (tick_nohz_full_cpu(cpu)) {
   928				unsigned long flags;
   929				struct hrtimer_cpu_base *cpu_base = &per_cpu(hrtimer_bases, cpu);
   930	
   931				raw_spin_lock_irqsave(&cpu_base->lock, flags);
   932				if (need_reprogram_timer(cpu_base))
   933					cpumask_set_cpu(cpu, mask);
   934				else
   935					hrtimer_update_base(cpu_base);
   936				raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
   937			}
   938		}
   939	
   940		smp_call_function_many(mask, retrigger_next_event, NULL, 1);
   941		preempt_enable();
   942		free_cpumask_var(mask);
   943	#endif
 > 944	set_timerfd:
   945		timerfd_clock_was_set();
   946	}
   947	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 27570 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] hrtimer: avoid retrigger_next_event IPI
@ 2021-04-07 19:28   ` kernel test robot
  0 siblings, 0 replies; 20+ messages in thread
From: kernel test robot @ 2021-04-07 19:28 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 6339 bytes --]

Hi Marcelo,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on tip/timers/core]
[also build test WARNING on linux/master linus/master v5.12-rc6 next-20210407]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Marcelo-Tosatti/hrtimer-avoid-retrigger_next_event-IPI/20210407-233005
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git d4c7c28806616809e3baa0b7cd8c665516b2726d
config: arm64-randconfig-r032-20210407 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project c060945b23a1c54d4b2a053ff4b093a2277b303d)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm64 cross compiling tool for clang build
        # apt-get install binutils-aarch64-linux-gnu
        # https://github.com/0day-ci/linux/commit/defd4db9d63d1f16e3e21862bd9c9105a9f6a7e9
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Marcelo-Tosatti/hrtimer-avoid-retrigger_next_event-IPI/20210407-233005
        git checkout defd4db9d63d1f16e3e21862bd9c9105a9f6a7e9
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   kernel/time/hrtimer.c:120:21: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
           [CLOCK_REALTIME]        = HRTIMER_BASE_REALTIME,
                                     ^~~~~~~~~~~~~~~~~~~~~
   kernel/time/hrtimer.c:118:27: note: previous initialization is here
           [0 ... MAX_CLOCKS - 1]  = HRTIMER_MAX_CLOCK_BASES,
                                     ^~~~~~~~~~~~~~~~~~~~~~~
   kernel/time/hrtimer.c:121:22: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
           [CLOCK_MONOTONIC]       = HRTIMER_BASE_MONOTONIC,
                                     ^~~~~~~~~~~~~~~~~~~~~~
   kernel/time/hrtimer.c:118:27: note: previous initialization is here
           [0 ... MAX_CLOCKS - 1]  = HRTIMER_MAX_CLOCK_BASES,
                                     ^~~~~~~~~~~~~~~~~~~~~~~
   kernel/time/hrtimer.c:122:21: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
           [CLOCK_BOOTTIME]        = HRTIMER_BASE_BOOTTIME,
                                     ^~~~~~~~~~~~~~~~~~~~~
   kernel/time/hrtimer.c:118:27: note: previous initialization is here
           [0 ... MAX_CLOCKS - 1]  = HRTIMER_MAX_CLOCK_BASES,
                                     ^~~~~~~~~~~~~~~~~~~~~~~
   kernel/time/hrtimer.c:123:17: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
           [CLOCK_TAI]             = HRTIMER_BASE_TAI,
                                     ^~~~~~~~~~~~~~~~
   kernel/time/hrtimer.c:118:27: note: previous initialization is here
           [0 ... MAX_CLOCKS - 1]  = HRTIMER_MAX_CLOCK_BASES,
                                     ^~~~~~~~~~~~~~~~~~~~~~~
>> kernel/time/hrtimer.c:944:1: warning: unused label 'set_timerfd' [-Wunused-label]
   set_timerfd:
   ^~~~~~~~~~~~
   kernel/time/hrtimer.c:147:20: warning: unused function 'is_migration_base' [-Wunused-function]
   static inline bool is_migration_base(struct hrtimer_clock_base *base)
                      ^
   kernel/time/hrtimer.c:650:19: warning: unused function 'hrtimer_hres_active' [-Wunused-function]
   static inline int hrtimer_hres_active(void)
                     ^
   kernel/time/hrtimer.c:881:13: warning: unused function 'need_reprogram_timer' [-Wunused-function]
   static bool need_reprogram_timer(struct hrtimer_cpu_base *cpu_base)
               ^
   kernel/time/hrtimer.c:1793:20: warning: unused function '__hrtimer_peek_ahead_timers' [-Wunused-function]
   static inline void __hrtimer_peek_ahead_timers(void) { }
                      ^
   9 warnings generated.


vim +/set_timerfd +944 kernel/time/hrtimer.c

   895	
   896	/*
   897	 * Clock realtime was set
   898	 *
   899	 * Change the offset of the realtime clock vs. the monotonic
   900	 * clock.
   901	 *
   902	 * We might have to reprogram the high resolution timer interrupt. On
   903	 * SMP we call the architecture specific code to retrigger _all_ high
   904	 * resolution timer interrupts. On UP we just disable interrupts and
   905	 * call the high resolution interrupt code.
   906	 */
   907	void clock_was_set(void)
   908	{
   909	#ifdef CONFIG_HIGH_RES_TIMERS
   910		cpumask_var_t mask;
   911		int cpu;
   912	
   913		if (!tick_nohz_full_enabled()) {
   914			/* Retrigger the CPU local events everywhere */
   915			on_each_cpu(retrigger_next_event, NULL, 1);
   916			goto set_timerfd;
   917		}
   918	
   919		if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) {
   920			on_each_cpu(retrigger_next_event, NULL, 1);
   921			goto set_timerfd;
   922		}
   923	
   924		/* Avoid interrupting nohz_full CPUs if possible */
   925		preempt_disable();
   926		for_each_online_cpu(cpu) {
   927			if (tick_nohz_full_cpu(cpu)) {
   928				unsigned long flags;
   929				struct hrtimer_cpu_base *cpu_base = &per_cpu(hrtimer_bases, cpu);
   930	
   931				raw_spin_lock_irqsave(&cpu_base->lock, flags);
   932				if (need_reprogram_timer(cpu_base))
   933					cpumask_set_cpu(cpu, mask);
   934				else
   935					hrtimer_update_base(cpu_base);
   936				raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
   937			}
   938		}
   939	
   940		smp_call_function_many(mask, retrigger_next_event, NULL, 1);
   941		preempt_enable();
   942		free_cpumask_var(mask);
   943	#endif
 > 944	set_timerfd:
   945		timerfd_clock_was_set();
   946	}
   947	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 27570 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] hrtimer: avoid retrigger_next_event IPI
  2021-04-07 13:53 [PATCH] hrtimer: avoid retrigger_next_event IPI Marcelo Tosatti
  2021-04-07 19:28   ` kernel test robot
@ 2021-04-07 22:14 ` Frederic Weisbecker
  2021-04-08 12:27   ` Marcelo Tosatti
  2021-04-09 14:15 ` Thomas Gleixner
  2 siblings, 1 reply; 20+ messages in thread
From: Frederic Weisbecker @ 2021-04-07 22:14 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Thomas Gleixner, linux-kernel, Peter Xu, Nitesh Narayan Lal, Alex Belits

On Wed, Apr 07, 2021 at 10:53:01AM -0300, Marcelo Tosatti wrote:
> 
> Setting the realtime clock triggers an IPI to all CPUs to reprogram
> hrtimers.
> 
> However, only base, boottime and tai clocks have their offsets updated
> (and therefore potentially require a reprogram).
> 
> If the CPU is a nohz_full one, check if it only has 
> monotonic active timers, and in that case update the 
> realtime base offsets, skipping the IPI.
> 
> This reduces interruptions to nohz_full CPUs.
> 
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> 
> diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
> index 743c852e10f2..b42b1a434b22 100644
> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -853,6 +853,28 @@ static void hrtimer_reprogram(struct hrtimer *timer, bool reprogram)
>  	tick_program_event(expires, 1);
>  }
>  
> +#define CLOCK_SET_BASES ((1U << HRTIMER_BASE_REALTIME)|		\
> +			 (1U << HRTIMER_BASE_REALTIME_SOFT)|	\
> +			 (1U << HRTIMER_BASE_BOOTTIME)|		\
> +			 (1U << HRTIMER_BASE_BOOTTIME_SOFT)|	\
> +			 (1U << HRTIMER_BASE_TAI)|		\
> +			 (1U << HRTIMER_BASE_TAI_SOFT))
> +
> +static bool need_reprogram_timer(struct hrtimer_cpu_base *cpu_base)
> +{
> +	unsigned int active = 0;
> +
> +	if (!cpu_base->softirq_activated)
> +		active = cpu_base->active_bases & HRTIMER_ACTIVE_SOFT;
> +
> +	active = active | (cpu_base->active_bases & HRTIMER_ACTIVE_HARD);
> +
> +	if ((active & CLOCK_SET_BASES) == 0)
> +		return false;
> +
> +	return true;
> +}
> +
>  /*
>   * Clock realtime was set
>   *
> @@ -867,9 +889,41 @@ static void hrtimer_reprogram(struct hrtimer *timer, bool reprogram)
>  void clock_was_set(void)
>  {
>  #ifdef CONFIG_HIGH_RES_TIMERS
> -	/* Retrigger the CPU local events everywhere */
> -	on_each_cpu(retrigger_next_event, NULL, 1);
> +	cpumask_var_t mask;
> +	int cpu;
> +
> +	if (!tick_nohz_full_enabled()) {
> +		/* Retrigger the CPU local events everywhere */
> +		on_each_cpu(retrigger_next_event, NULL, 1);
> +		goto set_timerfd;
> +	}
> +
> +	if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) {
> +		on_each_cpu(retrigger_next_event, NULL, 1);
> +		goto set_timerfd;
> +	}
> +
> +	/* Avoid interrupting nohz_full CPUs if possible */
> +	preempt_disable();
> +	for_each_online_cpu(cpu) {
> +		if (tick_nohz_full_cpu(cpu)) {
> +			unsigned long flags;
> +			struct hrtimer_cpu_base *cpu_base = &per_cpu(hrtimer_bases, cpu);
> +
> +			raw_spin_lock_irqsave(&cpu_base->lock, flags);
> +			if (need_reprogram_timer(cpu_base))
> +				cpumask_set_cpu(cpu, mask);
> +			else
> +				hrtimer_update_base(cpu_base);
> +			raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
> +		}

You forgot to add the housekeeping CPUs to the mask.
As for the need_reprogram_timer() trick, I'll rather defer to Thomas review...

Thanks.

> +	}
> +
> +	smp_call_function_many(mask, retrigger_next_event, NULL, 1);
> +	preempt_enable();
> +	free_cpumask_var(mask);
>  #endif
> +set_timerfd:
>  	timerfd_clock_was_set();
>  }
>  
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] hrtimer: avoid retrigger_next_event IPI
  2021-04-07 22:14 ` Frederic Weisbecker
@ 2021-04-08 12:27   ` Marcelo Tosatti
  0 siblings, 0 replies; 20+ messages in thread
From: Marcelo Tosatti @ 2021-04-08 12:27 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Thomas Gleixner, linux-kernel, Peter Xu, Nitesh Narayan Lal, Alex Belits

On Thu, Apr 08, 2021 at 12:14:57AM +0200, Frederic Weisbecker wrote:
> On Wed, Apr 07, 2021 at 10:53:01AM -0300, Marcelo Tosatti wrote:
> > 
> > Setting the realtime clock triggers an IPI to all CPUs to reprogram
> > hrtimers.
> > 
> > However, only base, boottime and tai clocks have their offsets updated
> > (and therefore potentially require a reprogram).
> > 
> > If the CPU is a nohz_full one, check if it only has 
> > monotonic active timers, and in that case update the 
> > realtime base offsets, skipping the IPI.
> > 
> > This reduces interruptions to nohz_full CPUs.
> > 
> > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> > 
> > diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
> > index 743c852e10f2..b42b1a434b22 100644
> > --- a/kernel/time/hrtimer.c
> > +++ b/kernel/time/hrtimer.c
> > @@ -853,6 +853,28 @@ static void hrtimer_reprogram(struct hrtimer *timer, bool reprogram)
> >  	tick_program_event(expires, 1);
> >  }
> >  
> > +#define CLOCK_SET_BASES ((1U << HRTIMER_BASE_REALTIME)|		\
> > +			 (1U << HRTIMER_BASE_REALTIME_SOFT)|	\
> > +			 (1U << HRTIMER_BASE_BOOTTIME)|		\
> > +			 (1U << HRTIMER_BASE_BOOTTIME_SOFT)|	\
> > +			 (1U << HRTIMER_BASE_TAI)|		\
> > +			 (1U << HRTIMER_BASE_TAI_SOFT))
> > +
> > +static bool need_reprogram_timer(struct hrtimer_cpu_base *cpu_base)
> > +{
> > +	unsigned int active = 0;
> > +
> > +	if (!cpu_base->softirq_activated)
> > +		active = cpu_base->active_bases & HRTIMER_ACTIVE_SOFT;

If cpu_base->softirq_activated == 1, should IPI as well.

> > +	active = active | (cpu_base->active_bases & HRTIMER_ACTIVE_HARD);
> > +
> > +	if ((active & CLOCK_SET_BASES) == 0)
> > +		return false;
> > +
> > +	return true;
> > +}
> > +
> >  /*
> >   * Clock realtime was set
> >   *
> > @@ -867,9 +889,41 @@ static void hrtimer_reprogram(struct hrtimer *timer, bool reprogram)
> >  void clock_was_set(void)
> >  {
> >  #ifdef CONFIG_HIGH_RES_TIMERS
> > -	/* Retrigger the CPU local events everywhere */
> > -	on_each_cpu(retrigger_next_event, NULL, 1);
> > +	cpumask_var_t mask;
> > +	int cpu;
> > +
> > +	if (!tick_nohz_full_enabled()) {
> > +		/* Retrigger the CPU local events everywhere */
> > +		on_each_cpu(retrigger_next_event, NULL, 1);
> > +		goto set_timerfd;
> > +	}
> > +
> > +	if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) {
> > +		on_each_cpu(retrigger_next_event, NULL, 1);
> > +		goto set_timerfd;
> > +	}
> > +
> > +	/* Avoid interrupting nohz_full CPUs if possible */
> > +	preempt_disable();
> > +	for_each_online_cpu(cpu) {
> > +		if (tick_nohz_full_cpu(cpu)) {
> > +			unsigned long flags;
> > +			struct hrtimer_cpu_base *cpu_base = &per_cpu(hrtimer_bases, cpu);
> > +
> > +			raw_spin_lock_irqsave(&cpu_base->lock, flags);
> > +			if (need_reprogram_timer(cpu_base))
> > +				cpumask_set_cpu(cpu, mask);
> > +			else
> > +				hrtimer_update_base(cpu_base);
> > +			raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
> > +		}
> 
> You forgot to add the housekeeping CPUs to the mask.

So people are using:

console=tty0 console=ttyS0,115200n8 skew_tick=1 nohz=on rcu_nocbs=8-31 tuned.non_isolcpus=000000ff intel_pstate=disable nosoftlockup tsc=nowatchdog intel_iommu=on iommu=pt isolcpus=managed_irq,8-31 systemd.cpu_affinity=0,1,2,3,4,5,6,7 default_hugepagesz=1G hugepagesz=2M hugepages=128 nohz_full=8-31

And using the nohz_full= CPUs (or subsets of nohz_full= CPUs) in two modes:

Either "generic non-isolated applications" 
(with load-balancing enabled for those CPUs), or for 
latency sensitive applications. And switching between the modes.

In this case, it would only be possible to check for
housekeeping CPUs of type MANAGED_IRQ, which would be strange.

> As for the need_reprogram_timer() trick, I'll rather defer to Thomas review...
> 
> Thanks.

Thanks!

> 
> > +	}
> > +
> > +	smp_call_function_many(mask, retrigger_next_event, NULL, 1);
> > +	preempt_enable();
> > +	free_cpumask_var(mask);
> >  #endif
> > +set_timerfd:
> >  	timerfd_clock_was_set();
> >  }
> >  
> > 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] hrtimer: avoid retrigger_next_event IPI
  2021-04-07 13:53 [PATCH] hrtimer: avoid retrigger_next_event IPI Marcelo Tosatti
  2021-04-07 19:28   ` kernel test robot
  2021-04-07 22:14 ` Frederic Weisbecker
@ 2021-04-09 14:15 ` Thomas Gleixner
  2021-04-09 16:51   ` Marcelo Tosatti
  2 siblings, 1 reply; 20+ messages in thread
From: Thomas Gleixner @ 2021-04-09 14:15 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: linux-kernel, Frederic Weisbecker, Peter Xu, Nitesh Narayan Lal,
	Alex Belits

On Wed, Apr 07 2021 at 10:53, Marcelo Tosatti wrote:
> Setting the realtime clock triggers an IPI to all CPUs to reprogram
> hrtimers.
>
> However, only base, boottime and tai clocks have their offsets updated

base clock? And why boottime? Boottime is not affected by a clock
realtime set. It's clock REALTIME and TAI, nothing else.

> +#define CLOCK_SET_BASES ((1U << HRTIMER_BASE_REALTIME)|		\
> +			 (1U << HRTIMER_BASE_REALTIME_SOFT)|	\
> +			 (1U << HRTIMER_BASE_BOOTTIME)|		\
> +			 (1U << HRTIMER_BASE_BOOTTIME_SOFT)|	\
> +			 (1U << HRTIMER_BASE_TAI)|		\
> +			 (1U << HRTIMER_BASE_TAI_SOFT))
> +
> +static bool need_reprogram_timer(struct hrtimer_cpu_base *cpu_base)
> +{
> +	unsigned int active = 0;
> +
> +	if (!cpu_base->softirq_activated)
> +		active = cpu_base->active_bases & HRTIMER_ACTIVE_SOFT;
> +
> +	active = active | (cpu_base->active_bases & HRTIMER_ACTIVE_HARD);
> +
> +	if ((active & CLOCK_SET_BASES) == 0)
> +		return false;
> +
> +	return true;
> +}

Errm. 

> +	/* Avoid interrupting nohz_full CPUs if possible */
> +	preempt_disable();
> +	for_each_online_cpu(cpu) {
> +		if (tick_nohz_full_cpu(cpu)) {
> +			unsigned long flags;
> +			struct hrtimer_cpu_base *cpu_base = &per_cpu(hrtimer_bases, cpu);
> +
> +			raw_spin_lock_irqsave(&cpu_base->lock, flags);
> +			if (need_reprogram_timer(cpu_base))
> +				cpumask_set_cpu(cpu, mask);
> +			else
> +				hrtimer_update_base(cpu_base);
> +			raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
> +		}
> +	}

How is that supposed to be correct?

CPU0                    	CPU1

clock_was_set()                 hrtimer_start(CLOCK_REALTIME)

  if (!active_mask[CPU1] & XXX)
  	continue;
                                active_mask |= REALTIME;

---> fail because that newly started timer is on the old offset.

Thanks,

        tglx



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] hrtimer: avoid retrigger_next_event IPI
  2021-04-09 14:15 ` Thomas Gleixner
@ 2021-04-09 16:51   ` Marcelo Tosatti
  2021-04-10  7:53     ` Thomas Gleixner
  0 siblings, 1 reply; 20+ messages in thread
From: Marcelo Tosatti @ 2021-04-09 16:51 UTC (permalink / raw)
  To: Thomas Gleixner, Anna-Maria Behnsen
  Cc: linux-kernel, Frederic Weisbecker, Peter Xu, Nitesh Narayan Lal,
	Alex Belits


+CC Anna-Maria.

On Fri, Apr 09, 2021 at 04:15:13PM +0200, Thomas Gleixner wrote:
> On Wed, Apr 07 2021 at 10:53, Marcelo Tosatti wrote:
> > Setting the realtime clock triggers an IPI to all CPUs to reprogram
> > hrtimers.
> >
> > However, only base, boottime and tai clocks have their offsets updated
> 
> base clock? 

Heh...

> And why boottime? Boottime is not affected by a clock
> realtime set. It's clock REALTIME and TAI, nothing else.

OK!

> > +#define CLOCK_SET_BASES ((1U << HRTIMER_BASE_REALTIME)|		\
> > +			 (1U << HRTIMER_BASE_REALTIME_SOFT)|	\
> > +			 (1U << HRTIMER_BASE_BOOTTIME)|		\
> > +			 (1U << HRTIMER_BASE_BOOTTIME_SOFT)|	\
> > +			 (1U << HRTIMER_BASE_TAI)|		\
> > +			 (1U << HRTIMER_BASE_TAI_SOFT))
> > +
> > +static bool need_reprogram_timer(struct hrtimer_cpu_base *cpu_base)
> > +{
> > +	unsigned int active = 0;
> > +
> > +	if (!cpu_base->softirq_activated)
> > +		active = cpu_base->active_bases & HRTIMER_ACTIVE_SOFT;

Again, if (cpu_base->softirq_activated), need to IPI (will resend).

> > +	active = active | (cpu_base->active_bases & HRTIMER_ACTIVE_HARD);
> > +
> > +	if ((active & CLOCK_SET_BASES) == 0)
> > +		return false;
> > +
> > +	return true;
> > +}
> 
> Errm. 

What?

> > +	/* Avoid interrupting nohz_full CPUs if possible */
> > +	preempt_disable();
> > +	for_each_online_cpu(cpu) {
> > +		if (tick_nohz_full_cpu(cpu)) {
> > +			unsigned long flags;
> > +			struct hrtimer_cpu_base *cpu_base = &per_cpu(hrtimer_bases, cpu);
> > +
> > +			raw_spin_lock_irqsave(&cpu_base->lock, flags);
> > +			if (need_reprogram_timer(cpu_base))
> > +				cpumask_set_cpu(cpu, mask);
> > +			else
> > +				hrtimer_update_base(cpu_base);
> > +			raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
> > +		}
> > +	}
> 
> How is that supposed to be correct?
> 
> CPU0                    	CPU1
> 
> clock_was_set()                 hrtimer_start(CLOCK_REALTIME)
> 
>   if (!active_mask[CPU1] & XXX)
>   	continue;
>                                 active_mask |= REALTIME;
> 
> ---> fail because that newly started timer is on the old offset.

CPU0								CPU1


clock_was_set()
							Case-1: CPU-1 grabs base->lock before CPU-0:
							CPU-0 sees active_mask[CPU1] and IPIs.

							base = lock_hrtimer_base(timer, &flags);
        						if (__hrtimer_start_range_ns(timer, tim, ...
                						hrtimer_reprogram(timer, true);

        						unlock_hrtimer_base(timer, &flags);


raw_spin_lock_irqsave(&cpu_base->lock, flags);
if (need_reprogram_timer(cpu_base))
        cpumask_set_cpu(cpu, mask);
else
        hrtimer_update_base(cpu_base);
raw_spin_unlock_irqrestore(&cpu_base->lock, flags);

							Case-2: CPU-1 grabs base->lock after CPU-0:
							CPU-0 will have updated the offsets remotely.

							base = lock_hrtimer_base(timer, &flags);
        						if (__hrtimer_start_range_ns(timer, tim, ...
                						hrtimer_reprogram(timer, true);

        						unlock_hrtimer_base(timer, &flags);


No?


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] hrtimer: avoid retrigger_next_event IPI
  2021-04-09 16:51   ` Marcelo Tosatti
@ 2021-04-10  7:53     ` Thomas Gleixner
  2021-04-13 17:04       ` [PATCH v2] " Marcelo Tosatti
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Gleixner @ 2021-04-10  7:53 UTC (permalink / raw)
  To: Marcelo Tosatti, Anna-Maria Behnsen
  Cc: linux-kernel, Frederic Weisbecker, Peter Xu, Nitesh Narayan Lal,
	Alex Belits

On Fri, Apr 09 2021 at 13:51, Marcelo Tosatti wrote:
> On Fri, Apr 09, 2021 at 04:15:13PM +0200, Thomas Gleixner wrote:
>> On Wed, Apr 07 2021 at 10:53, Marcelo Tosatti wrote:
>> ---> fail because that newly started timer is on the old offset.
>
> CPU0								CPU1
>
>
> clock_was_set()
> 							Case-1: CPU-1 grabs base->lock before CPU-0:
> 							CPU-0 sees active_mask[CPU1] and IPIs.
>
> 							base = lock_hrtimer_base(timer, &flags);
>         						if (__hrtimer_start_range_ns(timer, tim, ...
>                 						hrtimer_reprogram(timer, true);
>
>         						unlock_hrtimer_base(timer, &flags);
>
>
> raw_spin_lock_irqsave(&cpu_base->lock, flags);
> if (need_reprogram_timer(cpu_base))
>         cpumask_set_cpu(cpu, mask);
> else
>         hrtimer_update_base(cpu_base);
> raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
>
> 							Case-2: CPU-1 grabs base->lock after CPU-0:
> 							CPU-0 will have updated the offsets remotely.
>
> 							base = lock_hrtimer_base(timer, &flags);
>         						if (__hrtimer_start_range_ns(timer, tim, ...
>                 						hrtimer_reprogram(timer, true);
>
>         						unlock_hrtimer_base(timer, &flags);
>
>
> No?

Yeah, you're right. I misread the loop logic.

Can we please make that unconditional independent of nohz full. There is
no reason to special case it.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2] hrtimer: avoid retrigger_next_event IPI
  2021-04-10  7:53     ` Thomas Gleixner
@ 2021-04-13 17:04       ` Marcelo Tosatti
  2021-04-14 17:19         ` Thomas Gleixner
  2021-04-15 15:39         ` [PATCH v3] " Marcelo Tosatti
  0 siblings, 2 replies; 20+ messages in thread
From: Marcelo Tosatti @ 2021-04-13 17:04 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Anna-Maria Behnsen, linux-kernel, Frederic Weisbecker, Peter Xu,
	Nitesh Narayan Lal, Alex Belits



Setting the realtime clock triggers an IPI to all CPUs to reprogram
hrtimers.

However, only realtime and TAI clocks have their offsets updated
(and therefore potentially require a reprogram).

Check if it only has monotonic active timers, and in that case 
update the realtime and TAI base offsets remotely, skipping the IPI.

This reduces interruptions to latency sensitive applications.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

---

v2:
   - Only REALTIME and TAI bases are affected by offset-to-monotonic changes (Thomas).
   - Don't special case nohz_full CPUs (Thomas).
   

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 5c9d968187ae..be21b85c679d 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -871,6 +871,28 @@ static void hrtimer_reprogram(struct hrtimer *timer, bool reprogram)
 	tick_program_event(expires, 1);
 }
 
+#define CLOCK_SET_BASES ((1U << HRTIMER_BASE_REALTIME)|		\
+			 (1U << HRTIMER_BASE_REALTIME_SOFT)|	\
+			 (1U << HRTIMER_BASE_TAI)|		\
+			 (1U << HRTIMER_BASE_TAI_SOFT))
+
+static bool need_reprogram_timer(struct hrtimer_cpu_base *cpu_base)
+{
+	unsigned int active = 0;
+
+	if (cpu_base->softirq_activated)
+		return true;
+
+	active = cpu_base->active_bases & HRTIMER_ACTIVE_SOFT;
+
+	active = active | (cpu_base->active_bases & HRTIMER_ACTIVE_HARD);
+
+	if ((active & CLOCK_SET_BASES) == 0)
+		return false;
+
+	return true;
+}
+
 /*
  * Clock realtime was set
  *
@@ -885,9 +907,31 @@ static void hrtimer_reprogram(struct hrtimer *timer, bool reprogram)
 void clock_was_set(void)
 {
 #ifdef CONFIG_HIGH_RES_TIMERS
-	/* Retrigger the CPU local events everywhere */
-	on_each_cpu(retrigger_next_event, NULL, 1);
+	cpumask_var_t mask;
+	int cpu;
+
+	if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) {
+		on_each_cpu(retrigger_next_event, NULL, 1);
+		goto set_timerfd;
+	}
+
+	/* Avoid interrupting CPUs if possible */
+	preempt_disable();
+	for_each_online_cpu(cpu) {
+		unsigned long flags;
+		struct hrtimer_cpu_base *cpu_base = &per_cpu(hrtimer_bases, cpu);
+
+		raw_spin_lock_irqsave(&cpu_base->lock, flags);
+		if (need_reprogram_timer(cpu_base))
+			cpumask_set_cpu(cpu, mask);
+		raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
+	}
+
+	smp_call_function_many(mask, retrigger_next_event, NULL, 1);
+	preempt_enable();
+	free_cpumask_var(mask);
 #endif
+set_timerfd:
 	timerfd_clock_was_set();
 }


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v2] hrtimer: avoid retrigger_next_event IPI
  2021-04-13 17:04       ` [PATCH v2] " Marcelo Tosatti
@ 2021-04-14 17:19         ` Thomas Gleixner
  2021-04-15 15:39         ` [PATCH v3] " Marcelo Tosatti
  1 sibling, 0 replies; 20+ messages in thread
From: Thomas Gleixner @ 2021-04-14 17:19 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Anna-Maria Behnsen, linux-kernel, Frederic Weisbecker, Peter Xu,
	Nitesh Narayan Lal, Alex Belits

Marcelo,

On Tue, Apr 13 2021 at 14:04, Marcelo Tosatti wrote:
> Setting the realtime clock triggers an IPI to all CPUs to reprogram
> hrtimers.

s/hrtimers/clock event device/

> However, only realtime and TAI clocks have their offsets updated
> (and therefore potentially require a reprogram).
>
> Check if it only has monotonic active timers, and in that case

boottime != monotonic 

> update the realtime and TAI base offsets remotely, skipping the IPI.

Something like this perhaps:

Instead of sending an IPI unconditionally, check each per CPU hrtimer base
whether it has active timers in the CLOCK_REALTIME and CLOCK_TAI bases. If
that's not the case, update the realtime and TAI base offsets remotely and
skip the IPI. This ensures that any subsequently armed timers on
CLOCK_REALTIME and CLOCK_TAI are evaluated with the correct offsets.

Hmm?

> +#define CLOCK_SET_BASES ((1U << HRTIMER_BASE_REALTIME)|		\

Missing space between ) and |

> +			 (1U << HRTIMER_BASE_REALTIME_SOFT)|	\
> +			 (1U << HRTIMER_BASE_TAI)|		\
> +			 (1U << HRTIMER_BASE_TAI_SOFT))
> +
> +static bool need_reprogram_timer(struct hrtimer_cpu_base *cpu_base)
> +{
> +	unsigned int active = 0;
> +
> +	if (cpu_base->softirq_activated)
> +		return true;
> +
> +	active = cpu_base->active_bases & HRTIMER_ACTIVE_SOFT;
> +
> +	active = active | (cpu_base->active_bases & HRTIMER_ACTIVE_HARD);
> +
> +	if ((active & CLOCK_SET_BASES) == 0)
> +		return false;
> +
> +	return true;

That's a pretty elaborate way of writing:

       return (cpu_base->active_bases & CLOCK_SET_BASES) != 0;

> +}
> +
>  /*
>   * Clock realtime was set
>   *
> @@ -885,9 +907,31 @@ static void hrtimer_reprogram(struct hrtimer *timer, bool reprogram)
>  void clock_was_set(void)
>  {
>  #ifdef CONFIG_HIGH_RES_TIMERS
> -	/* Retrigger the CPU local events everywhere */
> -	on_each_cpu(retrigger_next_event, NULL, 1);
> +	cpumask_var_t mask;
> +	int cpu;
> +
> +	if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) {
> +		on_each_cpu(retrigger_next_event, NULL, 1);
> +		goto set_timerfd;
> +	}
> +
> +	/* Avoid interrupting CPUs if possible */
> +	preempt_disable();

That preempt disable is only required around the function call, for the
loop cpus_read_lock() is sufficient.

> +	for_each_online_cpu(cpu) {
> +		unsigned long flags;
> +		struct hrtimer_cpu_base *cpu_base = &per_cpu(hrtimer_bases, cpu);
> +
> +		raw_spin_lock_irqsave(&cpu_base->lock, flags);
> +		if (need_reprogram_timer(cpu_base))
> +			cpumask_set_cpu(cpu, mask);
> +		raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
> +	}
> +
> +	smp_call_function_many(mask, retrigger_next_event, NULL, 1);
> +	preempt_enable();
> +	free_cpumask_var(mask);
>  #endif
> +set_timerfd:

My brain compiler tells me that with CONFIG_HIGH_RES_TIMERS=n a real
compiler will emit a warning about a defined, but unused label....

>  	timerfd_clock_was_set();
>  }

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v3] hrtimer: avoid retrigger_next_event IPI
  2021-04-13 17:04       ` [PATCH v2] " Marcelo Tosatti
  2021-04-14 17:19         ` Thomas Gleixner
@ 2021-04-15 15:39         ` Marcelo Tosatti
  2021-04-15 18:59           ` Thomas Gleixner
  1 sibling, 1 reply; 20+ messages in thread
From: Marcelo Tosatti @ 2021-04-15 15:39 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Anna-Maria Behnsen, linux-kernel, Frederic Weisbecker, Peter Xu,
	Nitesh Narayan Lal, Alex Belits


Setting the realtime clock triggers an IPI to all CPUs to reprogram
the clock event device.

However, only realtime and TAI clocks have their offsets updated
(and therefore potentially require a reprogram).

Instead of sending an IPI unconditionally, check each per CPU hrtimer base
whether it has active timers in the CLOCK_REALTIME and CLOCK_TAI bases. If
that's not the case, update the realtime and TAI base offsets remotely and
skip the IPI. This ensures that any subsequently armed timers on
CLOCK_REALTIME and CLOCK_TAI are evaluated with the correct offsets.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

---

v3:
   - Nicer changelog  (Thomas).
   - Code style fixes (Thomas).
   - Compilation warning with CONFIG_HIGH_RES_TIMERS=n (Thomas).
   - Shrink preemption disabled section (Thomas).

v2:
   - Only REALTIME and TAI bases are affected by offset-to-monotonic changes (Thomas).
   - Don't special case nohz_full CPUs (Thomas).

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 5c9d968187ae..dd9c0d2f469f 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -871,6 +871,24 @@ static void hrtimer_reprogram(struct hrtimer *timer, bool reprogram)
 	tick_program_event(expires, 1);
 }
 
+#define CLOCK_SET_BASES ((1U << HRTIMER_BASE_REALTIME) |	\
+			 (1U << HRTIMER_BASE_REALTIME_SOFT) |	\
+			 (1U << HRTIMER_BASE_TAI) |		\
+			 (1U << HRTIMER_BASE_TAI_SOFT))
+
+static bool need_reprogram_timer(struct hrtimer_cpu_base *cpu_base)
+{
+	unsigned int active = 0;
+
+	if (cpu_base->softirq_activated)
+		return true;
+
+	active = cpu_base->active_bases & HRTIMER_ACTIVE_SOFT;
+	active = active | (cpu_base->active_bases & HRTIMER_ACTIVE_HARD);
+
+	return (cpu_base->active_bases & CLOCK_SET_BASES) != 0;
+}
+
 /*
  * Clock realtime was set
  *
@@ -885,8 +903,32 @@ static void hrtimer_reprogram(struct hrtimer *timer, bool reprogram)
 void clock_was_set(void)
 {
 #ifdef CONFIG_HIGH_RES_TIMERS
-	/* Retrigger the CPU local events everywhere */
-	on_each_cpu(retrigger_next_event, NULL, 1);
+	cpumask_var_t mask;
+	int cpu;
+
+	if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) {
+		on_each_cpu(retrigger_next_event, NULL, 1);
+		goto set_timerfd;
+	}
+
+	/* Avoid interrupting CPUs if possible */
+	cpus_read_lock();
+	for_each_online_cpu(cpu) {
+		unsigned long flags;
+		struct hrtimer_cpu_base *cpu_base = &per_cpu(hrtimer_bases, cpu);
+
+		raw_spin_lock_irqsave(&cpu_base->lock, flags);
+		if (need_reprogram_timer(cpu_base))
+			cpumask_set_cpu(cpu, mask);
+		raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
+	}
+
+	preempt_disable();
+	smp_call_function_many(mask, retrigger_next_event, NULL, 1);
+	preempt_enable();
+	cpus_read_unlock();
+	free_cpumask_var(mask);
+set_timerfd:
 #endif
 	timerfd_clock_was_set();
 }





^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] hrtimer: avoid retrigger_next_event IPI
  2021-04-15 15:39         ` [PATCH v3] " Marcelo Tosatti
@ 2021-04-15 18:59           ` Thomas Gleixner
  2021-04-15 20:40             ` [PATCH v4] " Marcelo Tosatti
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Gleixner @ 2021-04-15 18:59 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Anna-Maria Behnsen, linux-kernel, Frederic Weisbecker, Peter Xu,
	Nitesh Narayan Lal, Alex Belits

On Thu, Apr 15 2021 at 12:39, Marcelo Tosatti wrote:
> +static bool need_reprogram_timer(struct hrtimer_cpu_base *cpu_base)
> +{
> +	unsigned int active = 0;
> +
> +	if (cpu_base->softirq_activated)
> +		return true;
> +
> +	active = cpu_base->active_bases & HRTIMER_ACTIVE_SOFT;
> +	active = active | (cpu_base->active_bases & HRTIMER_ACTIVE_HARD);

It's not entirely clear to me what that 'active' variable business above
is doing and why it's needed at all. But I might be missing something.

> +	return (cpu_base->active_bases & CLOCK_SET_BASES) != 0;
> +}

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v4] hrtimer: avoid retrigger_next_event IPI
  2021-04-15 18:59           ` Thomas Gleixner
@ 2021-04-15 20:40             ` Marcelo Tosatti
  2021-04-16 16:00               ` [PATCH v5] " Marcelo Tosatti
  0 siblings, 1 reply; 20+ messages in thread
From: Marcelo Tosatti @ 2021-04-15 20:40 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Anna-Maria Behnsen, linux-kernel, Frederic Weisbecker, Peter Xu,
	Nitesh Narayan Lal, Alex Belits

Setting the realtime clock triggers an IPI to all CPUs to reprogram
the clock event device.

However, only realtime and TAI clocks have their offsets updated
(and therefore potentially require a reprogram).

Instead of sending an IPI unconditionally, check each per CPU hrtimer base
whether it has active timers in the CLOCK_REALTIME and CLOCK_TAI bases. If
that's not the case, update the realtime and TAI base offsets remotely and
skip the IPI. This ensures that any subsequently armed timers on
CLOCK_REALTIME and CLOCK_TAI are evaluated with the correct offsets.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

---

v4:
   - Drop unused code (Thomas).

v3:
   - Nicer changelog  (Thomas).
   - Code style fixes (Thomas).
   - Compilation warning with CONFIG_HIGH_RES_TIMERS=n (Thomas).
   - Shrink preemption disabled section (Thomas).

v2:
   - Only REALTIME and TAI bases are affected by offset-to-monotonic changes (Thomas).
   - Don't special case nohz_full CPUs (Thomas).

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 5c9d968187ae..e228c0a0c98f 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -871,6 +871,19 @@ static void hrtimer_reprogram(struct hrtimer *timer, bool reprogram)
 	tick_program_event(expires, 1);
 }
 
+#define CLOCK_SET_BASES ((1U << HRTIMER_BASE_REALTIME) |	\
+			 (1U << HRTIMER_BASE_REALTIME_SOFT) |	\
+			 (1U << HRTIMER_BASE_TAI) |		\
+			 (1U << HRTIMER_BASE_TAI_SOFT))
+
+static bool need_reprogram_timer(struct hrtimer_cpu_base *cpu_base)
+{
+	if (cpu_base->softirq_activated)
+		return true;
+
+	return (cpu_base->active_bases & CLOCK_SET_BASES) != 0;
+}
+
 /*
  * Clock realtime was set
  *
@@ -885,8 +898,32 @@ static void hrtimer_reprogram(struct hrtimer *timer, bool reprogram)
 void clock_was_set(void)
 {
 #ifdef CONFIG_HIGH_RES_TIMERS
-	/* Retrigger the CPU local events everywhere */
-	on_each_cpu(retrigger_next_event, NULL, 1);
+	cpumask_var_t mask;
+	int cpu;
+
+	if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) {
+		on_each_cpu(retrigger_next_event, NULL, 1);
+		goto set_timerfd;
+	}
+
+	/* Avoid interrupting CPUs if possible */
+	cpus_read_lock();
+	for_each_online_cpu(cpu) {
+		unsigned long flags;
+		struct hrtimer_cpu_base *cpu_base = &per_cpu(hrtimer_bases, cpu);
+
+		raw_spin_lock_irqsave(&cpu_base->lock, flags);
+		if (need_reprogram_timer(cpu_base))
+			cpumask_set_cpu(cpu, mask);
+		raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
+	}
+
+	preempt_disable();
+	smp_call_function_many(mask, retrigger_next_event, NULL, 1);
+	preempt_enable();
+	cpus_read_unlock();
+	free_cpumask_var(mask);
+set_timerfd:
 #endif
 	timerfd_clock_was_set();
 }


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v5] hrtimer: avoid retrigger_next_event IPI
  2021-04-15 20:40             ` [PATCH v4] " Marcelo Tosatti
@ 2021-04-16 16:00               ` Marcelo Tosatti
  2021-04-16 17:13                 ` Peter Xu
  2021-04-19 19:39                 ` [PATCH v6] " Marcelo Tosatti
  0 siblings, 2 replies; 20+ messages in thread
From: Marcelo Tosatti @ 2021-04-16 16:00 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Anna-Maria Behnsen, linux-kernel, Frederic Weisbecker, Peter Xu,
	Nitesh Narayan Lal, Alex Belits


Setting the realtime clock triggers an IPI to all CPUs to reprogram
the clock event device.

However, only realtime and TAI clocks have their offsets updated
(and therefore potentially require a reprogram).

Instead of sending an IPI unconditionally, check each per CPU hrtimer base
whether it has active timers in the CLOCK_REALTIME and CLOCK_TAI bases. If
that's not the case, update the realtime and TAI base offsets remotely and
skip the IPI. This ensures that any subsequently armed timers on
CLOCK_REALTIME and CLOCK_TAI are evaluated with the correct offsets.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

---

v5:
  - Add missing hrtimer_update_base (Peter Xu).

v4:
   - Drop unused code (Thomas).

v3:
   - Nicer changelog  (Thomas).
   - Code style fixes (Thomas).
   - Compilation warning with CONFIG_HIGH_RES_TIMERS=n (Thomas).
   - Shrink preemption disabled section (Thomas).

v2:
   - Only REALTIME and TAI bases are affected by offset-to-monotonic changes (Thomas).
   - Don't special case nohz_full CPUs (Thomas).


diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 5c9d968187ae..06fcc272e28d 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -871,6 +871,19 @@ static void hrtimer_reprogram(struct hrtimer *timer, bool reprogram)
 	tick_program_event(expires, 1);
 }
 
+#define CLOCK_SET_BASES ((1U << HRTIMER_BASE_REALTIME) |	\
+			 (1U << HRTIMER_BASE_REALTIME_SOFT) |	\
+			 (1U << HRTIMER_BASE_TAI) |		\
+			 (1U << HRTIMER_BASE_TAI_SOFT))
+
+static bool need_reprogram_timer(struct hrtimer_cpu_base *cpu_base)
+{
+	if (cpu_base->softirq_activated)
+		return true;
+
+	return (cpu_base->active_bases & CLOCK_SET_BASES) != 0;
+}
+
 /*
  * Clock realtime was set
  *
@@ -885,8 +898,34 @@ static void hrtimer_reprogram(struct hrtimer *timer, bool reprogram)
 void clock_was_set(void)
 {
 #ifdef CONFIG_HIGH_RES_TIMERS
-	/* Retrigger the CPU local events everywhere */
-	on_each_cpu(retrigger_next_event, NULL, 1);
+	cpumask_var_t mask;
+	int cpu;
+
+	if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) {
+		on_each_cpu(retrigger_next_event, NULL, 1);
+		goto set_timerfd;
+	}
+
+	/* Avoid interrupting CPUs if possible */
+	cpus_read_lock();
+	for_each_online_cpu(cpu) {
+		unsigned long flags;
+		struct hrtimer_cpu_base *cpu_base = &per_cpu(hrtimer_bases, cpu);
+
+		raw_spin_lock_irqsave(&cpu_base->lock, flags);
+		if (need_reprogram_timer(cpu_base))
+			cpumask_set_cpu(cpu, mask);
+		else
+			hrtimer_update_base(cpu_base);
+		raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
+	}
+
+	preempt_disable();
+	smp_call_function_many(mask, retrigger_next_event, NULL, 1);
+	preempt_enable();
+	cpus_read_unlock();
+	free_cpumask_var(mask);
+set_timerfd:
 #endif
 	timerfd_clock_was_set();
 }


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v5] hrtimer: avoid retrigger_next_event IPI
  2021-04-16 16:00               ` [PATCH v5] " Marcelo Tosatti
@ 2021-04-16 17:13                 ` Peter Xu
  2021-04-17 16:24                   ` Thomas Gleixner
  2021-04-19 19:39                 ` [PATCH v6] " Marcelo Tosatti
  1 sibling, 1 reply; 20+ messages in thread
From: Peter Xu @ 2021-04-16 17:13 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Thomas Gleixner, Anna-Maria Behnsen, linux-kernel,
	Frederic Weisbecker, Nitesh Narayan Lal, Alex Belits

Hi, all,

On Fri, Apr 16, 2021 at 01:00:23PM -0300, Marcelo Tosatti wrote:
> 
> Setting the realtime clock triggers an IPI to all CPUs to reprogram
> the clock event device.
> 
> However, only realtime and TAI clocks have their offsets updated
> (and therefore potentially require a reprogram).
> 
> Instead of sending an IPI unconditionally, check each per CPU hrtimer base
> whether it has active timers in the CLOCK_REALTIME and CLOCK_TAI bases. If
> that's not the case, update the realtime and TAI base offsets remotely and
> skip the IPI. This ensures that any subsequently armed timers on
> CLOCK_REALTIME and CLOCK_TAI are evaluated with the correct offsets.
> 
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> 
> ---
> 
> v5:
>   - Add missing hrtimer_update_base (Peter Xu).
> 
> v4:
>    - Drop unused code (Thomas).
> 
> v3:
>    - Nicer changelog  (Thomas).
>    - Code style fixes (Thomas).
>    - Compilation warning with CONFIG_HIGH_RES_TIMERS=n (Thomas).
>    - Shrink preemption disabled section (Thomas).
> 
> v2:
>    - Only REALTIME and TAI bases are affected by offset-to-monotonic changes (Thomas).
>    - Don't special case nohz_full CPUs (Thomas).
> 
> 
> diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
> index 5c9d968187ae..06fcc272e28d 100644
> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -871,6 +871,19 @@ static void hrtimer_reprogram(struct hrtimer *timer, bool reprogram)
>  	tick_program_event(expires, 1);
>  }
>  
> +#define CLOCK_SET_BASES ((1U << HRTIMER_BASE_REALTIME) |	\
> +			 (1U << HRTIMER_BASE_REALTIME_SOFT) |	\
> +			 (1U << HRTIMER_BASE_TAI) |		\
> +			 (1U << HRTIMER_BASE_TAI_SOFT))
> +
> +static bool need_reprogram_timer(struct hrtimer_cpu_base *cpu_base)
> +{
> +	if (cpu_base->softirq_activated)
> +		return true;

A pure question on whether this check is needed...

Here even if softirq_activated==1 (as softirq is going to happen), as long as
(cpu_base->active_bases & CLOCK_SET_BASES)==0, shouldn't it already mean that
"yes indeed clock was set, but no need to kick this cpu as no relevant timer"?
As that question seems to be orthogonal to whether a softirq is going to
trigger on that cpu.

Thanks,

> +
> +	return (cpu_base->active_bases & CLOCK_SET_BASES) != 0;
> +}

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5] hrtimer: avoid retrigger_next_event IPI
  2021-04-16 17:13                 ` Peter Xu
@ 2021-04-17 16:24                   ` Thomas Gleixner
  2021-04-17 16:51                     ` Thomas Gleixner
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Gleixner @ 2021-04-17 16:24 UTC (permalink / raw)
  To: Peter Xu, Marcelo Tosatti
  Cc: Anna-Maria Behnsen, linux-kernel, Frederic Weisbecker,
	Nitesh Narayan Lal, Alex Belits

On Fri, Apr 16 2021 at 13:13, Peter Xu wrote:
> On Fri, Apr 16, 2021 at 01:00:23PM -0300, Marcelo Tosatti wrote:
>>  
>> +#define CLOCK_SET_BASES ((1U << HRTIMER_BASE_REALTIME) |	\
>> +			 (1U << HRTIMER_BASE_REALTIME_SOFT) |	\
>> +			 (1U << HRTIMER_BASE_TAI) |		\
>> +			 (1U << HRTIMER_BASE_TAI_SOFT))
>> +
>> +static bool need_reprogram_timer(struct hrtimer_cpu_base *cpu_base)
>> +{
>> +	if (cpu_base->softirq_activated)
>> +		return true;
>
> A pure question on whether this check is needed...
>
> Here even if softirq_activated==1 (as softirq is going to happen), as long as
> (cpu_base->active_bases & CLOCK_SET_BASES)==0, shouldn't it already mean that
> "yes indeed clock was set, but no need to kick this cpu as no relevant timer"?
> As that question seems to be orthogonal to whether a softirq is going to
> trigger on that cpu.

That's correct and it's not any different from firing the IPI because in
both cases the update happens with the base lock of the CPU in question
held. And if there are no active timers in any of the affected bases,
then there is no need to reevaluate the next expiry because the offset
update does not affect any armed timers. It just makes sure that the
next enqueu of a timer on such a base will see the the correct offset.

I'll just zap it.

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5] hrtimer: avoid retrigger_next_event IPI
  2021-04-17 16:24                   ` Thomas Gleixner
@ 2021-04-17 16:51                     ` Thomas Gleixner
  2021-04-19 18:56                       ` Marcelo Tosatti
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Gleixner @ 2021-04-17 16:51 UTC (permalink / raw)
  To: Peter Xu, Marcelo Tosatti
  Cc: Anna-Maria Behnsen, linux-kernel, Frederic Weisbecker,
	Nitesh Narayan Lal, Alex Belits

On Sat, Apr 17 2021 at 18:24, Thomas Gleixner wrote:
> On Fri, Apr 16 2021 at 13:13, Peter Xu wrote:
>> On Fri, Apr 16, 2021 at 01:00:23PM -0300, Marcelo Tosatti wrote:
>>>  
>>> +#define CLOCK_SET_BASES ((1U << HRTIMER_BASE_REALTIME) |	\
>>> +			 (1U << HRTIMER_BASE_REALTIME_SOFT) |	\
>>> +			 (1U << HRTIMER_BASE_TAI) |		\
>>> +			 (1U << HRTIMER_BASE_TAI_SOFT))
>>> +
>>> +static bool need_reprogram_timer(struct hrtimer_cpu_base *cpu_base)
>>> +{
>>> +	if (cpu_base->softirq_activated)
>>> +		return true;
>>
>> A pure question on whether this check is needed...
>>
>> Here even if softirq_activated==1 (as softirq is going to happen), as long as
>> (cpu_base->active_bases & CLOCK_SET_BASES)==0, shouldn't it already mean that
>> "yes indeed clock was set, but no need to kick this cpu as no relevant timer"?
>> As that question seems to be orthogonal to whether a softirq is going to
>> trigger on that cpu.
>
> That's correct and it's not any different from firing the IPI because in
> both cases the update happens with the base lock of the CPU in question
> held. And if there are no active timers in any of the affected bases,
> then there is no need to reevaluate the next expiry because the offset
> update does not affect any armed timers. It just makes sure that the
> next enqueu of a timer on such a base will see the the correct offset.
>
> I'll just zap it.

But the whole thing is still wrong in two aspects:

    1) BOOTTIME can be one of the affected clocks when sleep time
       (suspended time) is injected because that uses the same mechanism.

       Sorry for missing that earlier when I asked to remove it, but
       that's trivial to fix by adding the BOOTTIME base back.

    2) What's worse is that on resume this might break because that
       mechanism is also used to enforce the reprogramming of the clock
       event devices and there we cannot be selective on clock bases.

       I need to dig deeper into that because suspend/resume has changed
       a lot over time, so this might be just a historical leftover. But
       without proper analysis we might end up with subtle and hard to
       debug wreckage.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5] hrtimer: avoid retrigger_next_event IPI
  2021-04-17 16:51                     ` Thomas Gleixner
@ 2021-04-19 18:56                       ` Marcelo Tosatti
  0 siblings, 0 replies; 20+ messages in thread
From: Marcelo Tosatti @ 2021-04-19 18:56 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Xu, Anna-Maria Behnsen, linux-kernel, Frederic Weisbecker,
	Nitesh Narayan Lal, Alex Belits

On Sat, Apr 17, 2021 at 06:51:08PM +0200, Thomas Gleixner wrote:
> On Sat, Apr 17 2021 at 18:24, Thomas Gleixner wrote:
> > On Fri, Apr 16 2021 at 13:13, Peter Xu wrote:
> >> On Fri, Apr 16, 2021 at 01:00:23PM -0300, Marcelo Tosatti wrote:
> >>>  
> >>> +#define CLOCK_SET_BASES ((1U << HRTIMER_BASE_REALTIME) |	\
> >>> +			 (1U << HRTIMER_BASE_REALTIME_SOFT) |	\
> >>> +			 (1U << HRTIMER_BASE_TAI) |		\
> >>> +			 (1U << HRTIMER_BASE_TAI_SOFT))
> >>> +
> >>> +static bool need_reprogram_timer(struct hrtimer_cpu_base *cpu_base)
> >>> +{
> >>> +	if (cpu_base->softirq_activated)
> >>> +		return true;
> >>
> >> A pure question on whether this check is needed...
> >>
> >> Here even if softirq_activated==1 (as softirq is going to happen), as long as
> >> (cpu_base->active_bases & CLOCK_SET_BASES)==0, shouldn't it already mean that
> >> "yes indeed clock was set, but no need to kick this cpu as no relevant timer"?
> >> As that question seems to be orthogonal to whether a softirq is going to
> >> trigger on that cpu.
> >
> > That's correct and it's not any different from firing the IPI because in
> > both cases the update happens with the base lock of the CPU in question
> > held. And if there are no active timers in any of the affected bases,
> > then there is no need to reevaluate the next expiry because the offset
> > update does not affect any armed timers. It just makes sure that the
> > next enqueu of a timer on such a base will see the the correct offset.
> >
> > I'll just zap it.
> 
> But the whole thing is still wrong in two aspects:
> 
>     1) BOOTTIME can be one of the affected clocks when sleep time
>        (suspended time) is injected because that uses the same mechanism.
> 
>        Sorry for missing that earlier when I asked to remove it, but
>        that's trivial to fix by adding the BOOTTIME base back.
> 
>     2) What's worse is that on resume this might break because that
>        mechanism is also used to enforce the reprogramming of the clock
>        event devices and there we cannot be selective on clock bases.
> 
>        I need to dig deeper into that because suspend/resume has changed
>        a lot over time, so this might be just a historical leftover. But
>        without proper analysis we might end up with subtle and hard to
>        debug wreckage.
> 
> Thanks,
> 
>         tglx

Thomas,

There is no gain in avoiding the IPIs for the suspend/resume case 
(since suspending is a large interruption anyway). To avoid 
the potential complexity (and associated bugs), one option would 
be to NOT skip IPIs for the resume case.

Sending -v6 with that (and other suggestions/fixes).


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v6] hrtimer: avoid retrigger_next_event IPI
  2021-04-16 16:00               ` [PATCH v5] " Marcelo Tosatti
  2021-04-16 17:13                 ` Peter Xu
@ 2021-04-19 19:39                 ` Marcelo Tosatti
  2021-04-19 20:52                   ` Thomas Gleixner
  1 sibling, 1 reply; 20+ messages in thread
From: Marcelo Tosatti @ 2021-04-19 19:39 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Anna-Maria Behnsen, linux-kernel, Frederic Weisbecker, Peter Xu,
	Nitesh Narayan Lal, Alex Belits


Setting the realtime clock triggers an IPI to all CPUs to reprogram
the clock event device.

However, only realtime and TAI clocks have their offsets updated
(and therefore potentially require a reprogram).

Instead of sending an IPI unconditionally, check each per CPU hrtimer base
whether it has active timers in the CLOCK_REALTIME and CLOCK_TAI bases. If
that's not the case, update the realtime and TAI base offsets remotely and
skip the IPI. This ensures that any subsequently armed timers on
CLOCK_REALTIME and CLOCK_TAI are evaluated with the correct offsets.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

---

v6:
  - Do not take softirq_raised into account (Peter Xu).
  - Include BOOTTIME as base that requires IPI (Thomas).
  - Unconditional reprogram on resume path, since there is
    nothing to gain in such path anyway.

v5:
  - Add missing hrtimer_update_base (Peter Xu).

v4:
   - Drop unused code (Thomas).

v3:
   - Nicer changelog  (Thomas).
   - Code style fixes (Thomas).
   - Compilation warning with CONFIG_HIGH_RES_TIMERS=n (Thomas).
   - Shrink preemption disabled section (Thomas).

v2:
   - Only REALTIME and TAI bases are affected by offset-to-monotonic changes (Thomas).
   - Don't special case nohz_full CPUs (Thomas).


diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index bb5e7b0a4274..14a6e449b221 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -318,7 +318,7 @@ struct clock_event_device;
 
 extern void hrtimer_interrupt(struct clock_event_device *dev);
 
-extern void clock_was_set_delayed(void);
+extern void clock_was_set_delayed(bool force_reprogram);
 
 extern unsigned int hrtimer_resolution;
 
@@ -326,7 +326,7 @@ extern unsigned int hrtimer_resolution;
 
 #define hrtimer_resolution	(unsigned int)LOW_RES_NSEC
 
-static inline void clock_was_set_delayed(void) { }
+static inline void clock_was_set_delayed(bool force_reprogram) { }
 
 #endif
 
@@ -351,7 +351,7 @@ hrtimer_expires_remaining_adjusted(const struct hrtimer *timer)
 						    timer->base->get_time());
 }
 
-extern void clock_was_set(void);
+extern void clock_was_set(bool);
 #ifdef CONFIG_TIMERFD
 extern void timerfd_clock_was_set(void);
 #else
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 5c9d968187ae..2258782fd714 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -758,9 +758,17 @@ static void hrtimer_switch_to_hres(void)
 	retrigger_next_event(NULL);
 }
 
+static void clock_was_set_force_reprogram_work(struct work_struct *work)
+{
+	clock_was_set(true);
+}
+
+static DECLARE_WORK(hrtimer_force_reprogram_work, clock_was_set_force_reprogram_work);
+
+
 static void clock_was_set_work(struct work_struct *work)
 {
-	clock_was_set();
+	clock_was_set(false);
 }
 
 static DECLARE_WORK(hrtimer_work, clock_was_set_work);
@@ -769,9 +777,12 @@ static DECLARE_WORK(hrtimer_work, clock_was_set_work);
  * Called from timekeeping and resume code to reprogram the hrtimer
  * interrupt device on all cpus.
  */
-void clock_was_set_delayed(void)
+void clock_was_set_delayed(bool force_reprogram)
 {
-	schedule_work(&hrtimer_work);
+	if (force_reprogram)
+		schedule_work(&hrtimer_force_reprogram_work);
+	else
+		schedule_work(&hrtimer_work);
 }
 
 #else
@@ -871,6 +882,18 @@ static void hrtimer_reprogram(struct hrtimer *timer, bool reprogram)
 	tick_program_event(expires, 1);
 }
 
+#define CLOCK_SET_BASES ((1U << HRTIMER_BASE_REALTIME) |	\
+			 (1U << HRTIMER_BASE_REALTIME_SOFT) |	\
+			 (1U << HRTIMER_BASE_TAI) |		\
+			 (1U << HRTIMER_BASE_TAI_SOFT) |	\
+			 (1U << HRTIMER_BASE_BOOTTIME) |	\
+			 (1U << HRTIMER_BASE_BOOTTIME_SOFT))
+
+static bool need_reprogram_timer(struct hrtimer_cpu_base *cpu_base)
+{
+	return (cpu_base->active_bases & CLOCK_SET_BASES) != 0;
+}
+
 /*
  * Clock realtime was set
  *
@@ -882,11 +905,42 @@ static void hrtimer_reprogram(struct hrtimer *timer, bool reprogram)
  * resolution timer interrupts. On UP we just disable interrupts and
  * call the high resolution interrupt code.
  */
-void clock_was_set(void)
+void clock_was_set(bool force_reprogram)
 {
 #ifdef CONFIG_HIGH_RES_TIMERS
-	/* Retrigger the CPU local events everywhere */
-	on_each_cpu(retrigger_next_event, NULL, 1);
+	cpumask_var_t mask;
+	int cpu;
+
+	if (force_reprogram == true) {
+		on_each_cpu(retrigger_next_event, NULL, 1);
+		goto set_timerfd;
+	}
+
+	if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) {
+		on_each_cpu(retrigger_next_event, NULL, 1);
+		goto set_timerfd;
+	}
+
+	/* Avoid interrupting CPUs if possible */
+	cpus_read_lock();
+	for_each_online_cpu(cpu) {
+		unsigned long flags;
+		struct hrtimer_cpu_base *cpu_base = &per_cpu(hrtimer_bases, cpu);
+
+		raw_spin_lock_irqsave(&cpu_base->lock, flags);
+		if (need_reprogram_timer(cpu_base))
+			cpumask_set_cpu(cpu, mask);
+		else
+			hrtimer_update_base(cpu_base);
+		raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
+	}
+
+	preempt_disable();
+	smp_call_function_many(mask, retrigger_next_event, NULL, 1);
+	preempt_enable();
+	cpus_read_unlock();
+	free_cpumask_var(mask);
+set_timerfd:
 #endif
 	timerfd_clock_was_set();
 }
@@ -903,7 +957,7 @@ void hrtimers_resume(void)
 	/* Retrigger on the local CPU */
 	retrigger_next_event(NULL);
 	/* And schedule a retrigger for all others */
-	clock_was_set_delayed();
+	clock_was_set_delayed(true);
 }
 
 /*
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 6aee5768c86f..3fef237267bd 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1323,7 +1323,7 @@ int do_settimeofday64(const struct timespec64 *ts)
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
 	/* signal hrtimers about time change */
-	clock_was_set();
+	clock_was_set(false);
 
 	if (!ret)
 		audit_tk_injoffset(ts_delta);
@@ -1371,7 +1371,7 @@ static int timekeeping_inject_offset(const struct timespec64 *ts)
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
 	/* signal hrtimers about time change */
-	clock_was_set();
+	clock_was_set(false);
 
 	return ret;
 }
@@ -1736,7 +1736,7 @@ void timekeeping_inject_sleeptime64(const struct timespec64 *delta)
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
 	/* signal hrtimers about time change */
-	clock_was_set();
+	clock_was_set(true);
 }
 #endif
 
@@ -2187,7 +2187,7 @@ static void timekeeping_advance(enum timekeeping_adv_mode mode)
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 	if (clock_set)
 		/* Have to call _delayed version, since in irq context*/
-		clock_was_set_delayed();
+		clock_was_set_delayed(false);
 }
 
 /**
@@ -2425,7 +2425,7 @@ int do_adjtimex(struct __kernel_timex *txc)
 		timekeeping_advance(TK_ADV_FREQ);
 
 	if (tai != orig_tai)
-		clock_was_set();
+		clock_was_set(false);
 
 	ntp_notify_cmos_timer();
 


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v6] hrtimer: avoid retrigger_next_event IPI
  2021-04-19 19:39                 ` [PATCH v6] " Marcelo Tosatti
@ 2021-04-19 20:52                   ` Thomas Gleixner
  0 siblings, 0 replies; 20+ messages in thread
From: Thomas Gleixner @ 2021-04-19 20:52 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Anna-Maria Behnsen, linux-kernel, Frederic Weisbecker, Peter Xu,
	Nitesh Narayan Lal, Alex Belits

On Mon, Apr 19 2021 at 16:39, Marcelo Tosatti wrote:
>  
> +static void clock_was_set_force_reprogram_work(struct work_struct *work)
> +{
> +	clock_was_set(true);
> +}
> +
> +static DECLARE_WORK(hrtimer_force_reprogram_work, clock_was_set_force_reprogram_work);
> +
> +
>  static void clock_was_set_work(struct work_struct *work)
>  {
> -	clock_was_set();
> +	clock_was_set(false);
>  }
>  
>  static DECLARE_WORK(hrtimer_work, clock_was_set_work);
> @@ -769,9 +777,12 @@ static DECLARE_WORK(hrtimer_work, clock_was_set_work);
>   * Called from timekeeping and resume code to reprogram the hrtimer
>   * interrupt device on all cpus.
>   */
> -void clock_was_set_delayed(void)
> +void clock_was_set_delayed(bool force_reprogram)
>  {
> -	schedule_work(&hrtimer_work);
> +	if (force_reprogram)
> +		schedule_work(&hrtimer_force_reprogram_work);
> +	else
> +		schedule_work(&hrtimer_work);
>  }
>  
>  #else
> @@ -871,6 +882,18 @@ static void hrtimer_reprogram(struct hrtimer *timer, bool reprogram)
>  	tick_program_event(expires, 1);
>  }
>  
> +#define CLOCK_SET_BASES ((1U << HRTIMER_BASE_REALTIME) |	\
> +			 (1U << HRTIMER_BASE_REALTIME_SOFT) |	\
> +			 (1U << HRTIMER_BASE_TAI) |		\
> +			 (1U << HRTIMER_BASE_TAI_SOFT) |	\
> +			 (1U << HRTIMER_BASE_BOOTTIME) |	\
> +			 (1U << HRTIMER_BASE_BOOTTIME_SOFT))
> +
> +static bool need_reprogram_timer(struct hrtimer_cpu_base *cpu_base)
> +{
> +	return (cpu_base->active_bases & CLOCK_SET_BASES) != 0;
> +}
> +
>  /*
>   * Clock realtime was set
>   *
> @@ -882,11 +905,42 @@ static void hrtimer_reprogram(struct hrtimer *timer, bool reprogram)
>   * resolution timer interrupts. On UP we just disable interrupts and
>   * call the high resolution interrupt code.
>   */
> -void clock_was_set(void)
> +void clock_was_set(bool force_reprogram)
>  {
>  #ifdef CONFIG_HIGH_RES_TIMERS
> -	/* Retrigger the CPU local events everywhere */
> -	on_each_cpu(retrigger_next_event, NULL, 1);
> +	cpumask_var_t mask;
> +	int cpu;
> +
> +	if (force_reprogram == true) {
> +		on_each_cpu(retrigger_next_event, NULL, 1);
> +		goto set_timerfd;
> +	}
> +
> +	if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) {
> +		on_each_cpu(retrigger_next_event, NULL, 1);
> +		goto set_timerfd;
> +	}

This is really horrible and the proper thing to do is to seperate the
s2idle/resume related functionality which allows to do some other
cleanups as well.

I already started to look at that over the weekend, but got stuck due to
a couple of correctness issues I found with the whole clock_was_set()
mechanism while looking. That stuff needs to be fixed first.

Back to the above. For the s2idle resume path there is no reason for an
IPI at all. It's just that way because it has been bolted on the
existing machinery.

So today it does:

   tick_unfreeze() {
     if (first_cpu_to_unfreeze()) {
        timekeeping_resume();
          tick_resume();
            tick_resume_local();
          clock_was_set_delayed();
     } else {
        tick_resume_local();
     }
   }

Then after everything is unfrozen including workqueues the delayed
clock_was_set() runs and does the IPI dance. That's just silly.

We can be smarter than that:

   tick_unfreeze() {
     if (first_cpu_to_unfreeze())
        timekeeping_resume();
          tick_resume();
            tick_resume_local();
              hrtimer_resume_local();
     } else {
        tick_resume_local();
          hrtimer_resume_local();
     }
   }

See? No IPI required at all and no magic force argument and as a bonus
we get also understandable code.

Splitting this properly allows to be smarter about the affected clocks
because most of the clock_was_set() events only care about
CLOCK_REALTIME and CLOCK_TAI and just the sleep time injection affects
CLOCK_BOOTTIME as well.

There are a few other things which can be done to avoid even more IPIs,
but let me address the correctness issues of clock_was_set() first.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2021-04-19 20:54 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-07 13:53 [PATCH] hrtimer: avoid retrigger_next_event IPI Marcelo Tosatti
2021-04-07 19:28 ` kernel test robot
2021-04-07 19:28   ` kernel test robot
2021-04-07 22:14 ` Frederic Weisbecker
2021-04-08 12:27   ` Marcelo Tosatti
2021-04-09 14:15 ` Thomas Gleixner
2021-04-09 16:51   ` Marcelo Tosatti
2021-04-10  7:53     ` Thomas Gleixner
2021-04-13 17:04       ` [PATCH v2] " Marcelo Tosatti
2021-04-14 17:19         ` Thomas Gleixner
2021-04-15 15:39         ` [PATCH v3] " Marcelo Tosatti
2021-04-15 18:59           ` Thomas Gleixner
2021-04-15 20:40             ` [PATCH v4] " Marcelo Tosatti
2021-04-16 16:00               ` [PATCH v5] " Marcelo Tosatti
2021-04-16 17:13                 ` Peter Xu
2021-04-17 16:24                   ` Thomas Gleixner
2021-04-17 16:51                     ` Thomas Gleixner
2021-04-19 18:56                       ` Marcelo Tosatti
2021-04-19 19:39                 ` [PATCH v6] " Marcelo Tosatti
2021-04-19 20:52                   ` Thomas Gleixner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.