linux-kernel.vger.kernel.org archive mirror
* [PATCH 1/2] sched/nohz: add debugfs control over sched_tick_max_deferment
@ 2013-12-17 21:23 Kevin Hilman
  2013-12-17 21:23 ` [PATCH 2/2] sched/nohz: fix overflow error in scheduler_tick_max_deferment() Kevin Hilman
  2014-01-05 13:21 ` [PATCH 1/2] sched/nohz: add debugfs control over sched_tick_max_deferment Frederic Weisbecker
  0 siblings, 2 replies; 11+ messages in thread
From: Kevin Hilman @ 2013-12-17 21:23 UTC
  To: Frederic Weisbecker, Thomas Gleixner; +Cc: linux-kernel, linaro-kernel

Allow debugfs override of sched_tick_max_deferment in order to ease
finding/fixing the remaining issues with full nohz.

The value to be written is in jiffies, and -1 means the max deferment
is disabled (scheduler_tick_max_deferment() returns KTIME_MAX.)

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Kevin Hilman <khilman@linaro.org>
---
 kernel/sched/core.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5ac63c9a995a..4b1fe3e69fe4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2175,6 +2175,8 @@ void scheduler_tick(void)
 }
 
 #ifdef CONFIG_NO_HZ_FULL
+static u32 sched_tick_max_deferment = HZ;
+
 /**
  * scheduler_tick_max_deferment
  *
@@ -2193,13 +2195,25 @@ u64 scheduler_tick_max_deferment(void)
 	struct rq *rq = this_rq();
 	unsigned long next, now = ACCESS_ONCE(jiffies);
 
-	next = rq->last_sched_tick + HZ;
+	if (sched_tick_max_deferment == -1)
+		return KTIME_MAX;
+
+	next = rq->last_sched_tick + sched_tick_max_deferment;
 
 	if (time_before_eq(next, now))
 		return 0;
 
 	return jiffies_to_usecs(next - now) * NSEC_PER_USEC;
 }
+
+static __init int sched_nohz_full_init_debug(void)
+{
+	debugfs_create_u32("sched_tick_max_deferment", 0644, NULL,
+			   &sched_tick_max_deferment);
+
+	return 0;
+}
+late_initcall(sched_nohz_full_init_debug);
 #endif
 
 notrace unsigned long get_parent_ip(unsigned long addr)
-- 
1.8.3


* [PATCH 2/2] sched/nohz: fix overflow error in scheduler_tick_max_deferment()
  2013-12-17 21:23 [PATCH 1/2] sched/nohz: add debugfs control over sched_tick_max_deferment Kevin Hilman
@ 2013-12-17 21:23 ` Kevin Hilman
  2014-01-05 13:06   ` Frederic Weisbecker
  2014-01-25 14:22   ` [tip:timers/urgent] sched/nohz: Fix " tip-bot for Kevin Hilman
  2014-01-05 13:21 ` [PATCH 1/2] sched/nohz: add debugfs control over sched_tick_max_deferment Frederic Weisbecker
  1 sibling, 2 replies; 11+ messages in thread
From: Kevin Hilman @ 2013-12-17 21:23 UTC
  To: Frederic Weisbecker, Thomas Gleixner; +Cc: linux-kernel, linaro-kernel

The conversion of the max deferment from usecs to nsecs can easily
overflow on platforms where a long is 32 bits.  To fix, cast the usecs
value to u64 before multiplying by NSEC_PER_USEC.

This was discovered on a 32-bit ARM platform when extending the max
deferment value.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Kevin Hilman <khilman@linaro.org>
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4b1fe3e69fe4..3d7c80e1c4d9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2203,7 +2203,7 @@ u64 scheduler_tick_max_deferment(void)
 	if (time_before_eq(next, now))
 		return 0;
 
-	return jiffies_to_usecs(next - now) * NSEC_PER_USEC;
+	return (u64)jiffies_to_usecs(next - now) * NSEC_PER_USEC;
 }
 
 static __init int sched_nohz_full_init_debug(void)
-- 
1.8.3


* Re: [PATCH 2/2] sched/nohz: fix overflow error in scheduler_tick_max_deferment()
  2013-12-17 21:23 ` [PATCH 2/2] sched/nohz: fix overflow error in scheduler_tick_max_deferment() Kevin Hilman
@ 2014-01-05 13:06   ` Frederic Weisbecker
  2014-01-06 18:27     ` Kevin Hilman
  2014-01-25 14:22   ` [tip:timers/urgent] sched/nohz: Fix " tip-bot for Kevin Hilman
  1 sibling, 1 reply; 11+ messages in thread
From: Frederic Weisbecker @ 2014-01-05 13:06 UTC
  To: Kevin Hilman
  Cc: Thomas Gleixner, linux-kernel, linaro-kernel, Peter Zijlstra,
	Ingo Molnar

On Tue, Dec 17, 2013 at 01:23:08PM -0800, Kevin Hilman wrote:
> The conversion of the max deferment from usecs to nsecs can easily
> overflow on platforms where a long is 32 bits.  To fix, cast the usecs
> value to u64 before multiplying by NSEC_PER_USEC.
> 
> This was discovered on a 32-bit ARM platform when extending the max
> deferment value.
> 
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Signed-off-by: Kevin Hilman <khilman@linaro.org>
> ---
>  kernel/sched/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 4b1fe3e69fe4..3d7c80e1c4d9 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2203,7 +2203,7 @@ u64 scheduler_tick_max_deferment(void)
>  	if (time_before_eq(next, now))
>  		return 0;
>  
> -	return jiffies_to_usecs(next - now) * NSEC_PER_USEC;
> +	return (u64)jiffies_to_usecs(next - now) * NSEC_PER_USEC;

Just to be sure I understand the issue: the problem is that jiffies_to_usecs()
returns an unsigned int which is then multiplied by NSEC_PER_USEC. If the result
of the mul is too big to be stored in an unsigned int, we overflow and may lose
some high part of the result. Right?

>  }
>  
>  static __init int sched_nohz_full_init_debug(void)
> -- 
> 1.8.3
> 

* Re: [PATCH 1/2] sched/nohz: add debugfs control over sched_tick_max_deferment
  2013-12-17 21:23 [PATCH 1/2] sched/nohz: add debugfs control over sched_tick_max_deferment Kevin Hilman
  2013-12-17 21:23 ` [PATCH 2/2] sched/nohz: fix overflow error in scheduler_tick_max_deferment() Kevin Hilman
@ 2014-01-05 13:21 ` Frederic Weisbecker
  2014-01-06 18:37   ` Kevin Hilman
  1 sibling, 1 reply; 11+ messages in thread
From: Frederic Weisbecker @ 2014-01-05 13:21 UTC
  To: Kevin Hilman
  Cc: Thomas Gleixner, linux-kernel, linaro-kernel, Ingo Molnar,
	Peter Zijlstra

On Tue, Dec 17, 2013 at 01:23:07PM -0800, Kevin Hilman wrote:
> Allow debugfs override of sched_tick_max_deferment in order to ease
> finding/fixing the remaining issues with full nohz.
> 
> The value to be written is in jiffies, and -1 means the max deferment
> is disabled (scheduler_tick_max_deferment() returns KTIME_MAX.)
> 
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Signed-off-by: Kevin Hilman <khilman@linaro.org>
> ---
>  kernel/sched/core.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 5ac63c9a995a..4b1fe3e69fe4 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2175,6 +2175,8 @@ void scheduler_tick(void)
>  }
>  
>  #ifdef CONFIG_NO_HZ_FULL
> +static u32 sched_tick_max_deferment = HZ;
> +
>  /**
>   * scheduler_tick_max_deferment
>   *
> @@ -2193,13 +2195,25 @@ u64 scheduler_tick_max_deferment(void)
>  	struct rq *rq = this_rq();
>  	unsigned long next, now = ACCESS_ONCE(jiffies);
>  
> -	next = rq->last_sched_tick + HZ;
> +	if (sched_tick_max_deferment == -1)
> +		return KTIME_MAX;
> +
> +	next = rq->last_sched_tick + sched_tick_max_deferment;
>  
>  	if (time_before_eq(next, now))
>  		return 0;
>  
>  	return jiffies_to_usecs(next - now) * NSEC_PER_USEC;
>  }
> +
> +static __init int sched_nohz_full_init_debug(void)
> +{
> +	debugfs_create_u32("sched_tick_max_deferment", 0644, NULL,
> +			   &sched_tick_max_deferment);
> +
> +	return 0;
> +}
> +late_initcall(sched_nohz_full_init_debug);

If the goal is mostly to turn off sched_tick_max_deferment (set to -1), we should
perhaps make it a boolean sched feature (see kernel/sched/features.h) as it's a pretty
well consolidated interface.

>  #endif
>  
>  notrace unsigned long get_parent_ip(unsigned long addr)
> -- 
> 1.8.3
> 

* Re: [PATCH 2/2] sched/nohz: fix overflow error in scheduler_tick_max_deferment()
  2014-01-05 13:06   ` Frederic Weisbecker
@ 2014-01-06 18:27     ` Kevin Hilman
  0 siblings, 0 replies; 11+ messages in thread
From: Kevin Hilman @ 2014-01-06 18:27 UTC
  To: Frederic Weisbecker
  Cc: Thomas Gleixner, linux-kernel, linaro-kernel, Peter Zijlstra,
	Ingo Molnar

Frederic Weisbecker <fweisbec@gmail.com> writes:

> On Tue, Dec 17, 2013 at 01:23:08PM -0800, Kevin Hilman wrote:
>> The conversion of the max deferment from usecs to nsecs can easily
>> overflow on platforms where a long is 32 bits.  To fix, cast the usecs
>> value to u64 before multiplying by NSEC_PER_USEC.
>> 
>> This was discovered on a 32-bit ARM platform when extending the max
>> deferment value.
>> 
>> Cc: Frederic Weisbecker <fweisbec@gmail.com>
>> Signed-off-by: Kevin Hilman <khilman@linaro.org>
>> ---
>>  kernel/sched/core.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 4b1fe3e69fe4..3d7c80e1c4d9 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -2203,7 +2203,7 @@ u64 scheduler_tick_max_deferment(void)
>>  	if (time_before_eq(next, now))
>>  		return 0;
>>  
>> -	return jiffies_to_usecs(next - now) * NSEC_PER_USEC;
>> +	return (u64)jiffies_to_usecs(next - now) * NSEC_PER_USEC;
>
> Just to be sure I understand the issue: the problem is that jiffies_to_usecs()
> returns an unsigned int which is then multiplied by NSEC_PER_USEC. If the result
> of the mul is too big to be stored in an unsigned int, we overflow and may lose
> some high part of the result. Right?

Correct.

Kevin

* Re: [PATCH 1/2] sched/nohz: add debugfs control over sched_tick_max_deferment
  2014-01-05 13:21 ` [PATCH 1/2] sched/nohz: add debugfs control over sched_tick_max_deferment Frederic Weisbecker
@ 2014-01-06 18:37   ` Kevin Hilman
  2014-01-10 15:17     ` Frederic Weisbecker
  0 siblings, 1 reply; 11+ messages in thread
From: Kevin Hilman @ 2014-01-06 18:37 UTC
  To: Frederic Weisbecker
  Cc: Thomas Gleixner, linux-kernel, linaro-kernel, Ingo Molnar,
	Peter Zijlstra

Frederic Weisbecker <fweisbec@gmail.com> writes:

> On Tue, Dec 17, 2013 at 01:23:07PM -0800, Kevin Hilman wrote:
>> Allow debugfs override of sched_tick_max_deferment in order to ease
>> finding/fixing the remaining issues with full nohz.
>> 
>> The value to be written is in jiffies, and -1 means the max deferment
>> is disabled (scheduler_tick_max_deferment() returns KTIME_MAX.)
>> 
>> Cc: Frederic Weisbecker <fweisbec@gmail.com>
>> Signed-off-by: Kevin Hilman <khilman@linaro.org>
>> ---
>>  kernel/sched/core.c | 16 +++++++++++++++-
>>  1 file changed, 15 insertions(+), 1 deletion(-)
>> 
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 5ac63c9a995a..4b1fe3e69fe4 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -2175,6 +2175,8 @@ void scheduler_tick(void)
>>  }
>>  
>>  #ifdef CONFIG_NO_HZ_FULL
>> +static u32 sched_tick_max_deferment = HZ;
>> +
>>  /**
>>   * scheduler_tick_max_deferment
>>   *
>> @@ -2193,13 +2195,25 @@ u64 scheduler_tick_max_deferment(void)
>>  	struct rq *rq = this_rq();
>>  	unsigned long next, now = ACCESS_ONCE(jiffies);
>>  
>> -	next = rq->last_sched_tick + HZ;
>> +	if (sched_tick_max_deferment == -1)
>> +		return KTIME_MAX;
>> +
>> +	next = rq->last_sched_tick + sched_tick_max_deferment;
>>  
>>  	if (time_before_eq(next, now))
>>  		return 0;
>>  
>>  	return jiffies_to_usecs(next - now) * NSEC_PER_USEC;
>>  }
>> +
>> +static __init int sched_nohz_full_init_debug(void)
>> +{
>> +	debugfs_create_u32("sched_tick_max_deferment", 0644, NULL,
>> +			   &sched_tick_max_deferment);
>> +
>> +	return 0;
>> +}
>> +late_initcall(sched_nohz_full_init_debug);
>
> If the goal is mostly to turn off sched_tick_max_deferment (set to -1), we should
> perhaps make it a boolean sched feature (see kernel/sched/features.h) as it's a pretty
> well consolidated interface.

Well, I suspect folks may want to set it to various values, depending on
workload to experiment with the results.

Also, my first attempt was to add control over this via sysctl[1] (though
not sched_features) and you suggested[2] I use debugfs instead since this
should be a temporary hack until we can remove the 1Hz residual tick.

Kevin

[1] http://marc.info/?l=linux-kernel&m=137159992306877&w=2
[2] http://marc.info/?l=linux-kernel&m=137166737830821&w=2

* Re: [PATCH 1/2] sched/nohz: add debugfs control over sched_tick_max_deferment
  2014-01-06 18:37   ` Kevin Hilman
@ 2014-01-10 15:17     ` Frederic Weisbecker
  2014-01-14 20:46       ` Kevin Hilman
  0 siblings, 1 reply; 11+ messages in thread
From: Frederic Weisbecker @ 2014-01-10 15:17 UTC
  To: Kevin Hilman
  Cc: Thomas Gleixner, linux-kernel, linaro-kernel, Ingo Molnar,
	Peter Zijlstra

On Mon, Jan 06, 2014 at 10:37:27AM -0800, Kevin Hilman wrote:
> Frederic Weisbecker <fweisbec@gmail.com> writes:
> 
> > On Tue, Dec 17, 2013 at 01:23:07PM -0800, Kevin Hilman wrote:
> >> Allow debugfs override of sched_tick_max_deferment in order to ease
> >> finding/fixing the remaining issues with full nohz.
> >> 
> >> The value to be written is in jiffies, and -1 means the max deferment
> >> is disabled (scheduler_tick_max_deferment() returns KTIME_MAX.)
> >> 
> >> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> >> Signed-off-by: Kevin Hilman <khilman@linaro.org>
> >> ---
> >>  kernel/sched/core.c | 16 +++++++++++++++-
> >>  1 file changed, 15 insertions(+), 1 deletion(-)
> >> 
> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> >> index 5ac63c9a995a..4b1fe3e69fe4 100644
> >> --- a/kernel/sched/core.c
> >> +++ b/kernel/sched/core.c
> >> @@ -2175,6 +2175,8 @@ void scheduler_tick(void)
> >>  }
> >>  
> >>  #ifdef CONFIG_NO_HZ_FULL
> >> +static u32 sched_tick_max_deferment = HZ;
> >> +
> >>  /**
> >>   * scheduler_tick_max_deferment
> >>   *
> >> @@ -2193,13 +2195,25 @@ u64 scheduler_tick_max_deferment(void)
> >>  	struct rq *rq = this_rq();
> >>  	unsigned long next, now = ACCESS_ONCE(jiffies);
> >>  
> >> -	next = rq->last_sched_tick + HZ;
> >> +	if (sched_tick_max_deferment == -1)
> >> +		return KTIME_MAX;
> >> +
> >> +	next = rq->last_sched_tick + sched_tick_max_deferment;
> >>  
> >>  	if (time_before_eq(next, now))
> >>  		return 0;
> >>  
> >>  	return jiffies_to_usecs(next - now) * NSEC_PER_USEC;
> >>  }
> >> +
> >> +static __init int sched_nohz_full_init_debug(void)
> >> +{
> >> +	debugfs_create_u32("sched_tick_max_deferment", 0644, NULL,
> >> +			   &sched_tick_max_deferment);
> >> +
> >> +	return 0;
> >> +}
> >> +late_initcall(sched_nohz_full_init_debug);
> >
> > If the goal is mostly to turn off sched_tick_max_deferment (set to -1), we should
> > perhaps make it a boolean sched feature (see kernel/sched/features.h) as it's a pretty
> > well consolidated interface.
> 
> Well, I suspect folks may want to set it to various values, depending on
> workload to experiment with the results.

Another option is to add an integer file in the sched_features/ debugfs directory. But all other
files there are boolean, so that wouldn't integrate there very well.

One of the things I would like to try is to offload sched_class(current[$CPU])::scheduler_tick()
to the timekeeper or any housekeeping CPU.

So the housekeeper could handle the periodic tick on behalf of full dynticks CPUs. And then
being able to tune the frequency of this sounds interesting.

So yeah, having a tunable integer makes sense after all.

> 
> Also, my first attempt was to add control over this via sysctl[1] (though
> not sched_features) and you suggested[2] I use debugfs instead since this
> should be a temporary hack until we can remove the 1Hz residual tick.

Right, but SCHED_FEAT is in debugfs :)  And I thought we could either reuse it
or reuse the sched feature debugfs directory. But again I realize it's all made of bool values
so it's not very welcoming for consistency.

Anyway, thinking about it more, perhaps we should actually use your patch that uses sysctl since
the rest of the scheduler does that for tunable numbers.

Now since it's sysctl, I'm kind of more picky about correctness limits: what if people set high values,
thinking the kernel can handle them just fine, while it obviously can't yet? Should we ignore values that
go too far? And how do we draw the line?

Thoughts?

> Kevin
> 
> [1] http://marc.info/?l=linux-kernel&m=137159992306877&w=2
> [2] http://marc.info/?l=linux-kernel&m=137166737830821&w=2

* Re: [PATCH 1/2] sched/nohz: add debugfs control over sched_tick_max_deferment
  2014-01-10 15:17     ` Frederic Weisbecker
@ 2014-01-14 20:46       ` Kevin Hilman
  0 siblings, 0 replies; 11+ messages in thread
From: Kevin Hilman @ 2014-01-14 20:46 UTC
  To: Frederic Weisbecker
  Cc: Thomas Gleixner, linux-kernel, linaro-kernel, Ingo Molnar,
	Peter Zijlstra

Frederic Weisbecker <fweisbec@gmail.com> writes:

> On Mon, Jan 06, 2014 at 10:37:27AM -0800, Kevin Hilman wrote:
>> Frederic Weisbecker <fweisbec@gmail.com> writes:
>> 
>> > On Tue, Dec 17, 2013 at 01:23:07PM -0800, Kevin Hilman wrote:
>> >> Allow debugfs override of sched_tick_max_deferment in order to ease
>> >> finding/fixing the remaining issues with full nohz.
>> >> 
>> >> The value to be written is in jiffies, and -1 means the max deferment
>> >> is disabled (scheduler_tick_max_deferment() returns KTIME_MAX.)
>> >> 
>> >> Cc: Frederic Weisbecker <fweisbec@gmail.com>
>> >> Signed-off-by: Kevin Hilman <khilman@linaro.org>
>> >> ---
>> >>  kernel/sched/core.c | 16 +++++++++++++++-
>> >>  1 file changed, 15 insertions(+), 1 deletion(-)
>> >> 
>> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> >> index 5ac63c9a995a..4b1fe3e69fe4 100644
>> >> --- a/kernel/sched/core.c
>> >> +++ b/kernel/sched/core.c
>> >> @@ -2175,6 +2175,8 @@ void scheduler_tick(void)
>> >>  }
>> >>  
>> >>  #ifdef CONFIG_NO_HZ_FULL
>> >> +static u32 sched_tick_max_deferment = HZ;
>> >> +
>> >>  /**
>> >>   * scheduler_tick_max_deferment
>> >>   *
>> >> @@ -2193,13 +2195,25 @@ u64 scheduler_tick_max_deferment(void)
>> >>  	struct rq *rq = this_rq();
>> >>  	unsigned long next, now = ACCESS_ONCE(jiffies);
>> >>  
>> >> -	next = rq->last_sched_tick + HZ;
>> >> +	if (sched_tick_max_deferment == -1)
>> >> +		return KTIME_MAX;
>> >> +
>> >> +	next = rq->last_sched_tick + sched_tick_max_deferment;
>> >>  
>> >>  	if (time_before_eq(next, now))
>> >>  		return 0;
>> >>  
>> >>  	return jiffies_to_usecs(next - now) * NSEC_PER_USEC;
>> >>  }
>> >> +
>> >> +static __init int sched_nohz_full_init_debug(void)
>> >> +{
>> >> +	debugfs_create_u32("sched_tick_max_deferment", 0644, NULL,
>> >> +			   &sched_tick_max_deferment);
>> >> +
>> >> +	return 0;
>> >> +}
>> >> +late_initcall(sched_nohz_full_init_debug);
>> >
>> > If the goal is mostly to turn off sched_tick_max_deferment (set to -1), we should
>> > perhaps make it a boolean sched feature (see kernel/sched/features.h) as it's a pretty
>> > well consolidated interface.
>> 
>> Well, I suspect folks may want to set it to various values, depending on
>> workload to experiment with the results.
>
> Another option is to add an integer file in the sched_features/ debugfs directory. But all other
> files there are boolean, so that wouldn't integrate there very well.
>
> One of the things I would like to try is to offload sched_class(current[$CPU])::scheduler_tick()
> to the timekeeper or any housekeeping CPU.
>
> So the housekeeper could handle the periodic tick on behalf of full dynticks CPUs. And then
> being able to tune the frequency of this sounds interesting.
>
> So yeah, having a tunable integer makes sense after all.
>
>> 
>> Also, my first attempt was to add control over this via sysctl[1] (though
>> not sched_features) and you suggested[2] I use debugfs instead since this
>> should be a temporary hack until we can remove the 1Hz residual tick.
>
> Right, but SCHED_FEAT is in debugfs :)  And I thought we could either reuse it
> or reuse the sched feature debugfs directory. But again I realize it's all made of bool values
> so it's not very welcoming for consistency.
>
> Anyway, thinking about it more, perhaps we should actually use your patch that uses sysctl since
> the rest of the scheduler does that for tunable numbers.
>
> Now since it's sysctl, I'm kind of more picky about correctness limits: what if people set high values,
> thinking the kernel can handle them just fine, while it obviously can't yet? Should we ignore values that
> go too far? And how do we draw the line?
>
> Thoughts?

Well, since I think it's more of a debug feature, I don't think we
should draw a line.  For example, the current patch lets you disable it
completely by setting it to -1.  Surely this will break some use cases,
but breaking things is kinda the point so we are better able to see
where the problems are.

Also, you previously made the case that it shouldn't be a sysctl
option since this will (hopefully) be going away in the not too distant
future.  Isn't that still the case?

Kevin


* [tip:timers/urgent] sched/nohz: Fix overflow error in scheduler_tick_max_deferment()
  2013-12-17 21:23 ` [PATCH 2/2] sched/nohz: fix overflow error in scheduler_tick_max_deferment() Kevin Hilman
  2014-01-05 13:06   ` Frederic Weisbecker
@ 2014-01-25 14:22   ` tip-bot for Kevin Hilman
  1 sibling, 0 replies; 11+ messages in thread
From: tip-bot for Kevin Hilman @ 2014-01-25 14:22 UTC
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, peterz, john.stultz, alex.shi, paulmck,
	fweisbec, rostedt, tglx, khilman

Commit-ID:  8fe8ff09ce3b5750e1f3e45a1f4a81d59c7ff1f1
Gitweb:     http://git.kernel.org/tip/8fe8ff09ce3b5750e1f3e45a1f4a81d59c7ff1f1
Author:     Kevin Hilman <khilman@linaro.org>
AuthorDate: Wed, 15 Jan 2014 14:51:38 +0100
Committer:  Frederic Weisbecker <fweisbec@gmail.com>
CommitDate: Thu, 16 Jan 2014 00:08:12 +0100

sched/nohz: Fix overflow error in scheduler_tick_max_deferment()

While calculating the scheduler tick max deferment, the delta is
converted from microseconds to nanoseconds through a multiplication
against NSEC_PER_USEC.

But this microseconds operand is an unsigned int, thus the result may
overflow. The result is cast to u64, but only once the operation
is completed, which is too late to avoid an overflowed result.

This is currently not a problem because the scheduler tick max deferment
is 1 second. But this may become an issue as we plan to make this
value tunable.

So let's fix this by casting the usecs value to u64 before multiplying by
NSEC_PER_USEC.

Also, to prevent this kind of mistake from happening again, move this
ad-hoc jiffies -> nsecs conversion to a new helper.

Signed-off-by: Kevin Hilman <khilman@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Kevin Hilman <khilman@linaro.org>
Link: http://lkml.kernel.org/r/1387315388-31676-2-git-send-email-khilman@linaro.org
[move ad-hoc conversion to jiffies_to_nsecs helper]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/jiffies.h | 6 ++++++
 kernel/sched/core.c     | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/linux/jiffies.h b/include/linux/jiffies.h
index d235e88..1f44466 100644
--- a/include/linux/jiffies.h
+++ b/include/linux/jiffies.h
@@ -294,6 +294,12 @@ extern unsigned long preset_lpj;
  */
 extern unsigned int jiffies_to_msecs(const unsigned long j);
 extern unsigned int jiffies_to_usecs(const unsigned long j);
+
+static inline u64 jiffies_to_nsecs(const unsigned long j)
+{
+	return (u64)jiffies_to_usecs(j) * NSEC_PER_USEC;
+}
+
 extern unsigned long msecs_to_jiffies(const unsigned int m);
 extern unsigned long usecs_to_jiffies(const unsigned int u);
 extern unsigned long timespec_to_jiffies(const struct timespec *value);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a88f4a4..61e601f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2325,7 +2325,7 @@ u64 scheduler_tick_max_deferment(void)
 	if (time_before_eq(next, now))
 		return 0;
 
-	return jiffies_to_usecs(next - now) * NSEC_PER_USEC;
+	return jiffies_to_nsecs(next - now);
 }
 #endif
 

* Re: [PATCH 1/2] sched/nohz: add debugfs control over sched_tick_max_deferment
  2013-09-16 22:43 Kevin Hilman
@ 2013-11-18 21:42 ` Kevin Hilman
  0 siblings, 0 replies; 11+ messages in thread
From: Kevin Hilman @ 2013-11-18 21:42 UTC
  To: Frederic Weisbecker; +Cc: linux-arm-kernel, linaro-kernel, Paul McKenney, LKML

Frederic,

On Mon, Sep 16, 2013 at 3:43 PM, Kevin Hilman <khilman@linaro.org> wrote:
> Allow debugfs override of sched_tick_max_deferment in order to ease
> finding/fixing the remaining issues with full nohz.
>
> The value to be written is in jiffies, and -1 means the max deferment
> is disabled (scheduler_tick_max_deferment() returns KTIME_MAX.)
>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Signed-off-by: Kevin Hilman <khilman@linaro.org>

Any objections to this series for ease of debugging and tracking down
issues with the residual tick?

Kevin

> ---
>  kernel/sched/core.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 5ac63c9..4b1fe3e 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2175,6 +2175,8 @@ void scheduler_tick(void)
>  }
>
>  #ifdef CONFIG_NO_HZ_FULL
> +static u32 sched_tick_max_deferment = HZ;
> +
>  /**
>   * scheduler_tick_max_deferment
>   *
> @@ -2193,13 +2195,25 @@ u64 scheduler_tick_max_deferment(void)
>         struct rq *rq = this_rq();
>         unsigned long next, now = ACCESS_ONCE(jiffies);
>
> -       next = rq->last_sched_tick + HZ;
> +       if (sched_tick_max_deferment == -1)
> +               return KTIME_MAX;
> +
> +       next = rq->last_sched_tick + sched_tick_max_deferment;
>
>         if (time_before_eq(next, now))
>                 return 0;
>
>         return jiffies_to_usecs(next - now) * NSEC_PER_USEC;
>  }
> +
> +static __init int sched_nohz_full_init_debug(void)
> +{
> +       debugfs_create_u32("sched_tick_max_deferment", 0644, NULL,
> +                          &sched_tick_max_deferment);
> +
> +       return 0;
> +}
> +late_initcall(sched_nohz_full_init_debug);
>  #endif
>
>  notrace unsigned long get_parent_ip(unsigned long addr)
> --
> 1.8.3
>

* [PATCH 1/2] sched/nohz: add debugfs control over sched_tick_max_deferment
@ 2013-09-16 22:43 Kevin Hilman
  2013-11-18 21:42 ` Kevin Hilman
  0 siblings, 1 reply; 11+ messages in thread
From: Kevin Hilman @ 2013-09-16 22:43 UTC
  To: Frederic Weisbecker
  Cc: linux-arm-kernel, linaro-kernel, Paul McKenney, linux-kernel

Allow debugfs override of sched_tick_max_deferment in order to ease
finding/fixing the remaining issues with full nohz.

The value to be written is in jiffies, and -1 means the max deferment
is disabled (scheduler_tick_max_deferment() returns KTIME_MAX.)

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Kevin Hilman <khilman@linaro.org>
---
 kernel/sched/core.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5ac63c9..4b1fe3e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2175,6 +2175,8 @@ void scheduler_tick(void)
 }
 
 #ifdef CONFIG_NO_HZ_FULL
+static u32 sched_tick_max_deferment = HZ;
+
 /**
  * scheduler_tick_max_deferment
  *
@@ -2193,13 +2195,25 @@ u64 scheduler_tick_max_deferment(void)
 	struct rq *rq = this_rq();
 	unsigned long next, now = ACCESS_ONCE(jiffies);
 
-	next = rq->last_sched_tick + HZ;
+	if (sched_tick_max_deferment == -1)
+		return KTIME_MAX;
+
+	next = rq->last_sched_tick + sched_tick_max_deferment;
 
 	if (time_before_eq(next, now))
 		return 0;
 
 	return jiffies_to_usecs(next - now) * NSEC_PER_USEC;
 }
+
+static __init int sched_nohz_full_init_debug(void)
+{
+	debugfs_create_u32("sched_tick_max_deferment", 0644, NULL,
+			   &sched_tick_max_deferment);
+
+	return 0;
+}
+late_initcall(sched_nohz_full_init_debug);
 #endif
 
 notrace unsigned long get_parent_ip(unsigned long addr)
-- 
1.8.3


Thread overview: 11+ messages
2013-12-17 21:23 [PATCH 1/2] sched/nohz: add debugfs control over sched_tick_max_deferment Kevin Hilman
2013-12-17 21:23 ` [PATCH 2/2] sched/nohz: fix overflow error in scheduler_tick_max_deferment() Kevin Hilman
2014-01-05 13:06   ` Frederic Weisbecker
2014-01-06 18:27     ` Kevin Hilman
2014-01-25 14:22   ` [tip:timers/urgent] sched/nohz: Fix " tip-bot for Kevin Hilman
2014-01-05 13:21 ` [PATCH 1/2] sched/nohz: add debugfs control over sched_tick_max_deferment Frederic Weisbecker
2014-01-06 18:37   ` Kevin Hilman
2014-01-10 15:17     ` Frederic Weisbecker
2014-01-14 20:46       ` Kevin Hilman
  -- strict thread matches above, loose matches on Subject: below --
2013-09-16 22:43 Kevin Hilman
2013-11-18 21:42 ` Kevin Hilman
