* [PATCH v2] sched: let __sched_period() use rq's nr_running
From: byungchul.park @ 2015-07-10  8:11 UTC
  To: mingo, peterz; +Cc: linux-kernel, pjt, Byungchul Park

From: Byungchul Park <byungchul.park@lge.com>

__sched_period() returns the period that a rq can have. The period has to be
stretched by the number of tasks *the rq has* when nr_running > nr_latency.
Otherwise, a task's slice can be much smaller than sysctl_sched_min_granularity,
depending on its position in the tg hierarchy when CONFIG_FAIR_GROUP_SCHED is
enabled.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 kernel/sched/fair.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 09456fc..8ae7aeb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -635,7 +635,7 @@ static u64 __sched_period(unsigned long nr_running)
  */
 static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
-	u64 slice = __sched_period(cfs_rq->nr_running + !se->on_rq);
+	u64 slice = __sched_period(rq_of(cfs_rq)->nr_running + !se->on_rq);
 
 	for_each_sched_entity(se) {
 		struct load_weight *load;
-- 
1.7.9.5
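For context, the two functions under discussion looked roughly like this in
fair.c of that era (condensed, with comments added). The for_each_sched_entity()
loop below the changed line scales the slice down by the entity's share of its
cfs_rq's load at every level of the hierarchy, which is why a deeply nested
task can end up with a tiny slice:

    static u64 __sched_period(unsigned long nr_running)
    {
            u64 period = sysctl_sched_latency;
            unsigned long nr_latency = sched_nr_latency;

            /* stretch the period so each task can still get min granularity */
            if (unlikely(nr_running > nr_latency)) {
                    period = sysctl_sched_min_granularity;
                    period *= nr_running;
            }

            return period;
    }

    static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
    {
            u64 slice = __sched_period(cfs_rq->nr_running + !se->on_rq);

            /* walk up the hierarchy, scaling the slice by the entity's
             * share of each cfs_rq's total load */
            for_each_sched_entity(se) {
                    struct load_weight *load;
                    struct load_weight lw;

                    cfs_rq = cfs_rq_of(se);
                    load = &cfs_rq->load;

                    if (unlikely(!se->on_rq)) {
                            lw = cfs_rq->load;
                            update_load_add(&lw, se->load.weight);
                            load = &lw;
                    }
                    slice = __calc_delta(slice, se->load.weight, load);
            }
            return slice;
    }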



* Re: [PATCH v2] sched: let __sched_period() use rq's nr_running
From: Morten Rasmussen @ 2015-07-10 13:31 UTC
  To: byungchul.park; +Cc: mingo, peterz, linux-kernel, pjt

On Fri, Jul 10, 2015 at 05:11:30PM +0900, byungchul.park@lge.com wrote:
> From: Byungchul Park <byungchul.park@lge.com>
> 
> __sched_period() returns the period that a rq can have. The period has to be
> stretched by the number of tasks *the rq has* when nr_running > nr_latency.
> Otherwise, a task's slice can be much smaller than sysctl_sched_min_granularity,
> depending on its position in the tg hierarchy when CONFIG_FAIR_GROUP_SCHED is
> enabled.
> 
> Signed-off-by: Byungchul Park <byungchul.park@lge.com>
> ---
>  kernel/sched/fair.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 09456fc..8ae7aeb 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -635,7 +635,7 @@ static u64 __sched_period(unsigned long nr_running)
>   */
>  static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  {
> -	u64 slice = __sched_period(cfs_rq->nr_running + !se->on_rq);
> +	u64 slice = __sched_period(rq_of(cfs_rq)->nr_running + !se->on_rq);

This would stretch the period to fit rq->cfs.h_nr_running (which is
equal to rq.nr_running), but I still think that the slice may be smaller
than sched_min_granularity for low priority tasks since the slice is
scaled by priority.

Also, I'm not sure if we want to enforce sched_slice >=
sched_min_granularity, it would mean that tasks inside task groups can
stretch the overall period and increase latency for non-grouped tasks.
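To make the scaling concrete, here is a standalone model (not kernel code;
weights taken from the kernel's nice-to-weight table, 1024 for nice 0 and 15
for nice +19, with the default 0.75ms minimum granularity and nr_latency of 8
assumed):

    #include <stdio.h>

    int main(void)
    {
            double min_gran_ms = 0.75;                    /* default min granularity */
            unsigned long nr_running = 10;                /* rq-wide, > nr_latency (8) */
            double period_ms = min_gran_ms * nr_running;  /* stretched period: 7.5 ms */

            /* one nice +19 task sharing the rq with nine nice 0 tasks */
            double w_lo = 15.0, w_hi = 1024.0;
            double total = w_lo + 9.0 * w_hi;
            double slice_lo = period_ms * w_lo / total;   /* ~0.012 ms */

            printf("nice +19 slice: %.4f ms (min granularity: %.2f ms)\n",
                   slice_lo, min_gran_ms);
            return 0;
    }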


* Re: [PATCH v2] sched: let __sched_period() use rq's nr_running
From: Byungchul Park @ 2015-07-13  0:56 UTC
  To: Morten Rasmussen; +Cc: mingo, peterz, linux-kernel

On Fri, Jul 10, 2015 at 02:31:10PM +0100, Morten Rasmussen wrote:
> On Fri, Jul 10, 2015 at 05:11:30PM +0900, byungchul.park@lge.com wrote:
> > From: Byungchul Park <byungchul.park@lge.com>
> > 
> > __sched_period() returns the period that a rq can have. The period has to be
> > stretched by the number of tasks *the rq has* when nr_running > nr_latency.
> > Otherwise, a task's slice can be much smaller than sysctl_sched_min_granularity,
> > depending on its position in the tg hierarchy when CONFIG_FAIR_GROUP_SCHED is
> > enabled.
> > 
> > Signed-off-by: Byungchul Park <byungchul.park@lge.com>
> > ---
> >  kernel/sched/fair.c |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 09456fc..8ae7aeb 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -635,7 +635,7 @@ static u64 __sched_period(unsigned long nr_running)
> >   */
> >  static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
> >  {
> > -	u64 slice = __sched_period(cfs_rq->nr_running + !se->on_rq);
> > +	u64 slice = __sched_period(rq_of(cfs_rq)->nr_running + !se->on_rq);

Hello,

> 
> This would stretch the period to fit rq->cfs.h_nr_running (which is
> equal to rq.nr_running), but I still think that the slice may be smaller
> than sched_min_granularity for low priority tasks since the slice is
> scaled by priority.

Yes, I also think the slice may be smaller than sched_min_granularity for
low priority tasks, while it may be larger than sched_min_granularity
for high priority tasks. And as you may know, the slice is already scaled
by priority in sched_slice().

In order to scale the slice properly in sched_slice(), __sched_period()
should return an rq-wide period. Or I think we should change the other code
that assumes variables like sysctl_sched_min_granularity are comparable to a
task's execution time independent of its position in the cgroup hierarchy.
For example, see check_preempt_tick(), sketched below.
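Condensed from fair.c of that era (comments abridged), the relevant part of
check_preempt_tick() compares the current entity's raw runtime against
sysctl_sched_min_granularity, regardless of where the entity sits in the
hierarchy:

    static void check_preempt_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
    {
            unsigned long ideal_runtime, delta_exec;
            struct sched_entity *se;
            s64 delta;

            ideal_runtime = sched_slice(cfs_rq, curr);
            delta_exec = curr->sum_exec_runtime - curr->prev_sum_exec_runtime;
            if (delta_exec > ideal_runtime) {
                    resched_curr(rq_of(cfs_rq));
                    clear_buddies(cfs_rq, curr);
                    return;
            }

            /* don't preempt before min granularity has elapsed: this is the
             * comparison that assumes a slice is at least that long */
            if (delta_exec < sysctl_sched_min_granularity)
                    return;

            se = __pick_first_entity(cfs_rq);
            delta = curr->vruntime - se->vruntime;
            if (delta < 0)
                    return;

            if (delta > ideal_runtime)
                    resched_curr(rq_of(cfs_rq));
    }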

> 
> Also, I'm not sure if we want to enforce sched_slice >=
> sched_min_granularity, it would mean that tasks inside task groups can
> stretch the overall period and increase latency for non-grouped tasks.

We don't need to enforce sched_slice >= sched_min_granularity. I am just
saying that the rq-wide period should be stretched by the rq-wide nr_running,
from which sched_slice() calculates the actual task's slice later.

And I agree that it increases latency for non-grouped tasks. To prevent that,
IMHO, we need to fix how it is calculated. However, when getting an *rq-wide*
period, stretching by the local cfs_rq's nr_running looks weird.

What do you think?

thank you,
byungchul



* Re: [PATCH v2] sched: let __sched_period() use rq's nr_running
From: Mike Galbraith @ 2015-07-13  7:07 UTC
  To: Byungchul Park; +Cc: Morten Rasmussen, mingo, peterz, linux-kernel

On Mon, 2015-07-13 at 09:56 +0900, Byungchul Park wrote:

> And I agree that it increases latency for non-grouped tasks.

It's not only a latency hit for the root group; it's across the board.

I suspect an overloaded group foo/bar/baz would prefer small slices over
a large wait as well. I certainly wouldn't want my root group taking the
potentially huge hits that come with stretching the period to accommodate
an arbitrarily overloaded /foo/bar/baz.

	-Mike



* Re: [PATCH v2] sched: let __sched_period() use rq's nr_running
From: Peter Zijlstra @ 2015-07-13  8:26 UTC
  To: byungchul.park; +Cc: mingo, linux-kernel, pjt

On Fri, Jul 10, 2015 at 05:11:30PM +0900, byungchul.park@lge.com wrote:
> From: Byungchul Park <byungchul.park@lge.com>
> 
> __sched_period() returns the period that a rq can have. The period has to be
> stretched by the number of tasks *the rq has* when nr_running > nr_latency.
> Otherwise, a task's slice can be much smaller than sysctl_sched_min_granularity,
> depending on its position in the tg hierarchy when CONFIG_FAIR_GROUP_SCHED is
> enabled.
> 
> Signed-off-by: Byungchul Park <byungchul.park@lge.com>
> ---
>  kernel/sched/fair.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 09456fc..8ae7aeb 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -635,7 +635,7 @@ static u64 __sched_period(unsigned long nr_running)
>   */
>  static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  {
> -	u64 slice = __sched_period(cfs_rq->nr_running + !se->on_rq);
> +	u64 slice = __sched_period(rq_of(cfs_rq)->nr_running + !se->on_rq);
>  
>  	for_each_sched_entity(se) {
>  		struct load_weight *load;

This really doesn't make sense; look at what that
for_each_sched_entity() loop does below this.

I agree that sched_slice() is a difficult proposition in the face of
cgroups, but everything is; cgroups suck arse, they make everything hard.


* Re: [PATCH v2] sched: let __sched_period() use rq's nr_running
From: Byungchul Park @ 2015-07-13  8:29 UTC
  To: Mike Galbraith; +Cc: Morten Rasmussen, mingo, peterz, linux-kernel

On Mon, Jul 13, 2015 at 09:07:01AM +0200, Mike Galbraith wrote:
> On Mon, 2015-07-13 at 09:56 +0900, Byungchul Park wrote:
> 
> > And I agree that it increases latency for non-grouped tasks.
> 
> It's not only a latency hit for the root group; it's across the board.
> 
> I suspect an overloaded group foo/bar/baz would prefer small slices over
> a large wait as well. I certainly wouldn't want my root group taking the
> potentially huge hits that come with stretching the period to accommodate
> an arbitrarily overloaded /foo/bar/baz.

Hello, Mike :)

OK, then, do you think the period has to be stretched by the number of the
rq's top-level sched entities (i.e. rq->cfs.nr_running)? If it is done with
rq->cfs.nr_running, as you can guess, leaf sched entities (i.e. tasks) can
have a much smaller slice than sysctl_sched_min_granularity, and some code
using sysctl_sched_min_granularity needs to be fixed in addition.

Anyway, the current code looks broken since it stretches the period with the
local cfs_rq's nr_running. IMHO, it should be stretched with rq->*cfs.nr_running*
even though leaf tasks can then have a very small slice, or it should be
stretched with rq->*nr_running* to ensure that any task gets a slice
comparable to sysctl_sched_min_granularity.
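To make the candidate counts concrete, consider a hypothetical rq with two
tasks in the root group plus eight tasks in one child group /foo:

    /*
     * cfs_rq->nr_running   (current code) - entities on the local cfs_rq only:
     *                      8 inside /foo's cfs_rq; 3 at the root (2 tasks plus
     *                      the /foo group entity)
     * rq->cfs.nr_running   (first option)  - root-level entities only: 3
     * rq->nr_running       (this patch)    - all runnable tasks: 10
     */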

What do you think about this concern?

thank you,
byungchul

> 
> 	-Mike
> 


* Re: [PATCH v2] sched: let __sched_period() use rq's nr_running
From: Byungchul Park @ 2015-07-13  9:25 UTC
  To: Peter Zijlstra; +Cc: mingo, linux-kernel

On Mon, Jul 13, 2015 at 10:26:09AM +0200, Peter Zijlstra wrote:
> On Fri, Jul 10, 2015 at 05:11:30PM +0900, byungchul.park@lge.com wrote:
> > From: Byungchul Park <byungchul.park@lge.com>
> > 
> > __sched_period() returns the period that a rq can have. The period has to be
> > stretched by the number of tasks *the rq has* when nr_running > nr_latency.
> > Otherwise, a task's slice can be much smaller than sysctl_sched_min_granularity,
> > depending on its position in the tg hierarchy when CONFIG_FAIR_GROUP_SCHED is
> > enabled.
> > 
> > Signed-off-by: Byungchul Park <byungchul.park@lge.com>
> > ---
> >  kernel/sched/fair.c |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 09456fc..8ae7aeb 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -635,7 +635,7 @@ static u64 __sched_period(unsigned long nr_running)
> >   */
> >  static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
> >  {
> > -	u64 slice = __sched_period(cfs_rq->nr_running + !se->on_rq);
> > +	u64 slice = __sched_period(rq_of(cfs_rq)->nr_running + !se->on_rq);
> >  
> >  	for_each_sched_entity(se) {
> >  		struct load_weight *load;
> 
> This really doesn't make sense; look at what that
> for_each_sched_entity() loop does below this.

Hello,

The for_each_sched_entity() loop distributes the slice to the se, taking both
the hierarchy and its weight into account, walking from the passed se up to
the top rq.

I am just talking about how to get the whole period value. My question is:
why does it use the local cfs_rq's nr_running to get the whole period value?

> 
> I agree that sched_slice() is a difficult proposition in the face of
> cgroups, but everything is; cgroups suck arse, they make everything hard.

I am not taking issue with the way cgroups work, though they already have
many problems, as you said.

thank you for commenting,
byungchul



* Re: [PATCH v2] sched: let __sched_period() use rq's nr_running
From: Mike Galbraith @ 2015-07-13 10:22 UTC
  To: Byungchul Park; +Cc: Morten Rasmussen, mingo, peterz, linux-kernel

On Mon, 2015-07-13 at 17:29 +0900, Byungchul Park wrote:
> On Mon, Jul 13, 2015 at 09:07:01AM +0200, Mike Galbraith wrote:
> > On Mon, 2015-07-13 at 09:56 +0900, Byungchul Park wrote:
> > 
> > > And I agree that it increases latency for non-grouped tasks.
> > 
> > It's not only a latency hit for the root group; it's across the board.
> > 
> > I suspect an overloaded group foo/bar/baz would prefer small slices over
> > a large wait as well. I certainly wouldn't want my root group taking the
> > potentially huge hits that come with stretching the period to accommodate
> > an arbitrarily overloaded /foo/bar/baz.
> 
> Hello, Mike :)
> 
> OK, then, do you think the period has to be stretched by the number of the
> rq's top-level sched entities (i.e. rq->cfs.nr_running)? If it is done with
> rq->cfs.nr_running, as you can guess, leaf sched entities (i.e. tasks) can
> have a much smaller slice than sysctl_sched_min_granularity, and some code
> using sysctl_sched_min_granularity needs to be fixed in addition.

The only choice is to give a small slice frequently or a large slice
infrequently.  Increasing spread for the entire world to accommodate a
massive overload of a small share group in some hierarchy is just not a
viable option.

> Anyway, the current code looks broken since it stretches the period with the
> local cfs_rq's nr_running. IMHO, it should be stretched with rq->*cfs.nr_running*
> even though leaf tasks can then have a very small slice, or it should be
> stretched with rq->*nr_running* to ensure that any task gets a slice
> comparable to sysctl_sched_min_granularity.
> 
> What do you think about this concern?

It seems to work fine.  Just say "oh hell no" to hierarchies, and if you
think slices are too small, widen latency_ns a bit to get what you want
to see on your box.  Computing the latency target bottom up and cumulatively
is a very bad idea; it lets one nutty group dictate latency for all.

Something else to keep in mind when fiddling is that FAIR_SLEEPERS by
definition widens spread, effectively doubling our latency target, as
the thing it is defined by is that latency target.  We need short term
fairness so sleepers can perform when facing a world full of hogs, but
the last thing we need is short term being redefined to a week or so ;-)
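Roughly, the sleeper credit in question is applied in place_entity(),
condensed here from fair.c of that era (fork-time placement omitted):

    static void place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
    {
            u64 vruntime = cfs_rq->min_vruntime;

            if (!initial) {
                    /* a waking sleeper is placed up to one latency target
                     * (half of it with GENTLE_FAIR_SLEEPERS) behind
                     * min_vruntime: the spread-widening credit above */
                    unsigned long thresh = sysctl_sched_latency;

                    if (sched_feat(GENTLE_FAIR_SLEEPERS))
                            thresh >>= 1;

                    vruntime -= thresh;
            }

            /* ensure we never gain time by being placed backwards */
            se->vruntime = max_vruntime(se->vruntime, vruntime);
    }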

	-Mike



* Re: [PATCH v2] sched: let __sched_period() use rq's nr_running
From: Byungchul Park @ 2015-07-13 11:07 UTC
  To: Mike Galbraith; +Cc: Morten Rasmussen, mingo, peterz, linux-kernel

On Mon, Jul 13, 2015 at 12:22:17PM +0200, Mike Galbraith wrote:
> On Mon, 2015-07-13 at 17:29 +0900, Byungchul Park wrote:
> > On Mon, Jul 13, 2015 at 09:07:01AM +0200, Mike Galbraith wrote:
> > > On Mon, 2015-07-13 at 09:56 +0900, Byungchul Park wrote:
> > > 
> > > > And I agree that it increases latency for non-grouped tasks.
> > > 
> > > It's not only a latency hit for the root group; it's across the board.
> > > 
> > > I suspect an overloaded group foo/bar/baz would prefer small slices over
> > > a large wait as well. I certainly wouldn't want my root group taking the
> > > potentially huge hits that come with stretching the period to accommodate
> > > an arbitrarily overloaded /foo/bar/baz.
> > 
> > Hello, Mike :)
> > 
> > OK, then, do you think the period has to be stretched by the number of the
> > rq's top-level sched entities (i.e. rq->cfs.nr_running)? If it is done with
> > rq->cfs.nr_running, as you can guess, leaf sched entities (i.e. tasks) can
> > have a much smaller slice than sysctl_sched_min_granularity, and some code
> > using sysctl_sched_min_granularity needs to be fixed in addition.
> 
> The only choice is to give a small slice frequently or a large slice
> infrequently.  Increasing spread for the entire world to accommodate a
> massive overload of a small share group in some hierarchy is just not a
> viable option.
> 
> > Anyway, the current code looks broken since it stretches the period with the
> > local cfs_rq's nr_running. IMHO, it should be stretched with rq->*cfs.nr_running*
> > even though leaf tasks can then have a very small slice, or it should be
> > stretched with rq->*nr_running* to ensure that any task gets a slice
> > comparable to sysctl_sched_min_granularity.
> > 
> > What do you think about this concern?
> 
> It seems to work fine.  Just say "oh hell no" to hierarchies, and if you
> think slices are too small, widen latency_ns a bit to get what you want
> to see on your box.  Computing the latency target bottom up and cumulatively
> is a very bad idea; it lets one nutty group dictate latency for all.

Hello,

This is what I missed! I see why computing the latency target bottom up is
bad. Then my first option, stretching with the number of the rq's top-level
sched entities, i.e. rq->cfs.nr_running, should be chosen to compute the
latency target, together with a fix to the code that assumes a task's
execution time is comparable to sysctl_sched_min_granularity, which is not
true now.

I still think stretching with the local cfs_rq's nr_running should be replaced
with stretching with the top (= root) level one.

thank you,
byungchul

> 
> Something else to keep in mind when fiddling is that FAIR_SLEEPERS by
> definition widens spread, effectively doubling our latency target, as
> the thing it is defined by is that latency target.  We need short term
> fairness so sleepers can perform when facing a world full of hogs, but
> the last thing we need is short term being redefined to a week or so ;-)
> 
> 	-Mike
> 


* Re: [PATCH v2] sched: let __sched_period() use rq's nr_running
From: Mike Galbraith @ 2015-07-13 12:30 UTC
  To: Byungchul Park; +Cc: Morten Rasmussen, mingo, peterz, linux-kernel

On Mon, 2015-07-13 at 20:07 +0900, Byungchul Park wrote:

> I still think stretching with the local cfs_rq's nr_running should be replaced
> with stretching with the top (= root) level one.

I think we just can't take 'slice' _too_ seriously.  Not only is it
annoying with cgroups, the scheduler simply doesn't deliver 'slices' in
the traditional sense; it equalizes vruntimes, planning to do that at
slice granularity.  FAIR_SLEEPERS doesn't make that planning any easier.
With a pure compute load and no HR_TICK, what you get is tick
granularity preemption checkpoints, but having just chewed up a 'slice'
means nothing if you're still leftmost.  It's all about vruntime, so
leftmost can have back to back 'slices'.  FAIR_SLEEPERS just increases
the odds that leftmost WILL take more than one 'slice'.
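As a toy model of that point (not kernel code): picking compares weighted
vruntime, not slice consumption, so whoever is leftmost keeps running.

    #include <stdio.h>

    struct toy_entity {
            unsigned long weight;            /* load weight, 1024 = nice 0 */
            unsigned long long vruntime;     /* weighted runtime in ns */
    };

    /* vruntime advances inversely to weight, as in update_curr() */
    static void toy_account(struct toy_entity *se, unsigned long long delta_ns)
    {
            se->vruntime += delta_ns * 1024 / se->weight;
    }

    /* the "leftmost" entity is simply the one with the smallest vruntime */
    static struct toy_entity *toy_pick(struct toy_entity *a, struct toy_entity *b)
    {
            return a->vruntime <= b->vruntime ? a : b;
    }

    int main(void)
    {
            struct toy_entity hog = { 1024, 0 }, other = { 1024, 3000000 };

            /* hog just consumed a full 3 ms 'slice'... */
            toy_account(&hog, 3000000);
            /* ...but it is still co-leftmost, so it can run back to back */
            printf("%s runs next\n", toy_pick(&hog, &other) == &hog ? "hog" : "other");
            return 0;
    }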

(we could perhaps decay deficit after a full slice or such to decrease
the spread growth that sleepers induce. annoying problem, especially so
with a gaggle of identical sleepers, as sleep time becomes meaningless,
there is no differential to equalize.. other than the ones we create..
but I'm digressing, a lot, time to stop thinking/typing, go do work;)

	-Mike




* Re: [PATCH v2] sched: let __sched_period() use rq's nr_running
From: Byungchul Park @ 2015-07-14  2:07 UTC
  To: Mike Galbraith; +Cc: Morten Rasmussen, mingo, peterz, linux-kernel

On Mon, Jul 13, 2015 at 02:30:38PM +0200, Mike Galbraith wrote:
> On Mon, 2015-07-13 at 20:07 +0900, Byungchul Park wrote:
> 
> > I still think stretching with the local cfs_rq's nr_running should be replaced
> > with stretching with the top (= root) level one.
> 
> I think we just can't take 'slice' _too_ seriously.  Not only is it

Hello Mike, :)

As you said, it is not something that has to be taken too seriously, since
it gets adjusted via vruntime in CFS anyway.

But is there any reason meaningless code should be kept in the source? :(
It also harms readability. Of course, I need to modify my patch a little
bit to prevent non-group sched entities from getting a large slice.

thank you,
byungchul

> annoying with cgroups, the scheduler simply doesn't deliver 'slices' in
> the traditional sense; it equalizes vruntimes, planning to do that at
> slice granularity.  FAIR_SLEEPERS doesn't make that planning any easier.
> With a pure compute load and no HR_TICK, what you get is tick
> granularity preemption checkpoints, but having just chewed up a 'slice'
> means nothing if you're still leftmost.  It's all about vruntime, so
> leftmost can have back to back 'slices'.  FAIR_SLEEPERS just increases
> the odds that leftmost WILL take more than one 'slice'.
> 
> (we could perhaps decay deficit after a full slice or such to decrease
> the spread growth that sleepers induce. annoying problem, especially so
> with a gaggle of identical sleepers, as sleep time becomes meaningless,
> there is no differential to equalize.. other than the ones we create..
> but I'm digressing, a lot, time to stop thinking/typing, go do work;)
> 
> 	-Mike
> 
> 


* Re: [PATCH v2] sched: let __sched_period() use rq's nr_running
From: Byungchul Park @ 2015-07-14  2:26 UTC
  To: Peter Zijlstra; +Cc: mingo, linux-kernel

On Mon, Jul 13, 2015 at 06:25:35PM +0900, Byungchul Park wrote:
> On Mon, Jul 13, 2015 at 10:26:09AM +0200, Peter Zijlstra wrote:
> > On Fri, Jul 10, 2015 at 05:11:30PM +0900, byungchul.park@lge.com wrote:
> > > From: Byungchul Park <byungchul.park@lge.com>
> > > 
> > > __sched_period() returns the period that a rq can have. The period has to be
> > > stretched by the number of tasks *the rq has* when nr_running > nr_latency.
> > > Otherwise, a task's slice can be much smaller than sysctl_sched_min_granularity,
> > > depending on its position in the tg hierarchy when CONFIG_FAIR_GROUP_SCHED is
> > > enabled.
> > > 
> > > Signed-off-by: Byungchul Park <byungchul.park@lge.com>
> > > ---
> > >  kernel/sched/fair.c |    2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 09456fc..8ae7aeb 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -635,7 +635,7 @@ static u64 __sched_period(unsigned long nr_running)
> > >   */
> > >  static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > >  {
> > > -	u64 slice = __sched_period(cfs_rq->nr_running + !se->on_rq);
> > > +	u64 slice = __sched_period(rq_of(cfs_rq)->nr_running + !se->on_rq);
> > >  
> > >  	for_each_sched_entity(se) {
> > >  		struct load_weight *load;
> > 
> > This really doesn't make sense; look at what that
> > for_each_sched_entity() loop does below this.
> 
> Hello,
> 
> The for_each_sched_entity() loop distributes the slice to the se, taking both
> the hierarchy and its weight into account, walking from the passed se up to
> the top rq.
> 
> I am just talking about how to get the whole period value. My question is:
> why does it use the local cfs_rq's nr_running to get the whole period value?

I need to modify my patch more, I admit.

But I have a question: do you think it is right to use the local cfs_rq's
nr_running to get the whole period value?

> 
> > 
> > I agree that sched_slice() is a difficult proposition in the face of
> > cgroups, but everything is; cgroups suck arse, they make everything hard.
> 
> I am not taking issue with the way cgroups work, though they already have
> many problems, as you said.
> 
> thank you for commenting,
> byungchul
> 


* Re: [PATCH v2] sched: let __sched_period() use rq's nr_running
From: Mike Galbraith @ 2015-07-14  2:30 UTC
  To: Byungchul Park; +Cc: Morten Rasmussen, mingo, peterz, linux-kernel

On Tue, 2015-07-14 at 11:07 +0900, Byungchul Park wrote:

> But is there any reason meaningless code should be kept in the source? :(
> It also harms readability. Of course, I need to modify my patch a little
> bit to prevent non-group sched entities from getting a large slice.

By all means proceed, I'm not trying to discourage you.

	-Mike



* Re: [PATCH v2] sched: let __sched_period() use rq's nr_running
From: Byungchul Park @ 2015-07-14  9:26 UTC
  To: mingo, peterz; +Cc: linux-kernel

On Fri, Jul 10, 2015 at 05:11:30PM +0900, byungchul.park@lge.com wrote:
> From: Byungchul Park <byungchul.park@lge.com>
> 
> __sched_period() returns the period that a rq can have. The period has to be
> stretched by the number of tasks *the rq has* when nr_running > nr_latency.
> Otherwise, a task's slice can be much smaller than sysctl_sched_min_granularity,
> depending on its position in the tg hierarchy when CONFIG_FAIR_GROUP_SCHED is
> enabled.

Hello all,

sysctl_sched_min_granularity must be defined clearly first. After defining
it clearly, the way this should work can be settled. The definition can be
either case 1 or case 2 below.

Case 1. Any task must get a slice of at least sysctl_sched_min_granularity,
which is currently 0.75ms. In this case, increasing the number of tasks in a
rq stretches the whole latency, which most of you don't like because it can
stretch the whole latency too much. But it looks normal to me, since it
already happens in the !CONFIG_FAIR_GROUP_SCHED world with a large number of
tasks. I wonder why the CONFIG_FAIR_GROUP_SCHED world must be different from
the !CONFIG_FAIR_GROUP_SCHED world? Anyway...

Case 2. Tasks can get a slice much smaller than sysctl_sched_min_granularity,
depending on their position in the hierarchy. If a rq has 8 equally weighted
sched entities, each of which has 8 equally weighted sched entities, and so
on for one more level, then a task can get a very small slice, e.g.
0.75ms / 64 ~ 0.01ms, as the sketch below works out. If you add more levels
to the cgroup hierarchy, it gets worse. In this situation, context switching
overhead becomes very large. What does sysctl_sched_min_granularity mean
here? Anyway...
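A standalone model of case 2 (not kernel code; the unscaled 6ms default
sysctl_sched_latency assumed), compounding the per-level division done by
the loop in sched_slice():

    #include <stdio.h>

    int main(void)
    {
            double period_ms = 6.0;   /* sysctl_sched_latency default */
            double slice_ms = period_ms;

            /* three levels of 8 equally weighted entities: each level
             * multiplies the slice by w / (8 * w) = 1/8 */
            for (int level = 0; level < 3; level++)
                    slice_ms /= 8.0;

            printf("leaf slice: %.4f ms\n", slice_ms);   /* ~0.0117 ms */
            return 0;
    }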

I am not sure which of case 1 and case 2 is the right definition of
sysctl_sched_min_granularity. What do you think?

thank you,
byungchul

> 
> Signed-off-by: Byungchul Park <byungchul.park@lge.com>
> ---
>  kernel/sched/fair.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 09456fc..8ae7aeb 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -635,7 +635,7 @@ static u64 __sched_period(unsigned long nr_running)
>   */
>  static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  {
> -	u64 slice = __sched_period(cfs_rq->nr_running + !se->on_rq);
> +	u64 slice = __sched_period(rq_of(cfs_rq)->nr_running + !se->on_rq);
>  
>  	for_each_sched_entity(se) {
>  		struct load_weight *load;
> -- 
> 1.7.9.5
> 

