* [PATCH] sched/debug: Show intergroup and hierarchy sum wait time of a task group
@ 2019-01-23  9:46 ufo19890607
  2019-01-25  3:11 ` 王贇
  2019-02-06 17:19 ` Peter Zijlstra
  0 siblings, 2 replies; 10+ messages in thread

From: ufo19890607 @ 2019-01-23  9:46 UTC (permalink / raw)
  To: mingo, peterz, yun.wang, yuzhoujian; +Cc: linux-kernel

From: yuzhoujian <yuzhoujian@didichuxing.com>

We have been able to monitor the sum wait time of a task group since
commit 3d6c50c27bd6 ("sched/debug: Show the sum wait time of a task
group"). However, this wait_sum only represents the conflict between
different task groups, since it simply sums the wait time of the
task_group's cfs_rq. We still cannot evaluate the conflict between all
the tasks within the hierarchy of this group, so a hierarchy wait time
is needed.

Thus we introduce hierarchy wait_sum, which sums the wait time of all
the tasks in the hierarchy of a group.

'cpu.stat' is modified to show the new statistic, like:

  nr_periods 0
  nr_throttled 0
  throttled_time 0
  intergroup wait_sum 2842251984
  hierarchy wait_sum 6389509389332798

From now on we can monitor both the intergroup and the hierarchy
wait_sum, which helps a system administrator gauge how intense the CPU
competition is within a task group and between different task groups.
We can calculate the wait rate of a task group based on hierarchy
wait_sum and cpuacct.usage.

For example:

  X% = (current_wait_sum - last_wait_sum) / ((current_usage -
       last_usage) + (current_wait_sum - last_wait_sum))

That means the task group spent X percent of its time waiting on the
runqueue for the CPU.
Signed-off-by: yuzhoujian <yuzhoujian@didichuxing.com>
---
 kernel/sched/core.c  | 11 +++++++----
 kernel/sched/fair.c  | 17 +++++++++++++++++
 kernel/sched/sched.h |  3 +++
 3 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ee77636..172e6fb 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6760,13 +6760,16 @@ static int cpu_cfs_stat_show(struct seq_file *sf, void *v)
 		seq_printf(sf, "throttled_time %llu\n", cfs_b->throttled_time);
 
 	if (schedstat_enabled() && tg != &root_task_group) {
-		u64 ws = 0;
+		u64 inter_ws = 0, hierarchy_ws = 0;
 		int i;
 
-		for_each_possible_cpu(i)
-			ws += schedstat_val(tg->se[i]->statistics.wait_sum);
+		for_each_possible_cpu(i) {
+			inter_ws += schedstat_val(tg->se[i]->statistics.wait_sum);
+			hierarchy_ws += tg->cfs_rq[i]->hierarchy_wait_sum;
+		}
 
-		seq_printf(sf, "wait_sum %llu\n", ws);
+		seq_printf(sf, "intergroup wait_sum %llu\n", inter_ws);
+		seq_printf(sf, "hierarchy wait_sum %llu\n", hierarchy_ws);
 	}
 
 	return 0;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e2ff4b6..35e89ca 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -858,6 +858,19 @@ static void update_curr_fair(struct rq *rq)
 }
 
 static inline void
+update_hierarchy_wait_sum(struct sched_entity *se,
+			u64 delta_wait)
+{
+	for_each_sched_entity(se) {
+		struct cfs_rq *cfs_rq = cfs_rq_of(se);
+
+		if (cfs_rq->tg != &root_task_group)
+			__schedstat_add(cfs_rq->hierarchy_wait_sum,
+					delta_wait);
+	}
+}
+
+static inline void
 update_stats_wait_end(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 	struct task_struct *p;
@@ -880,6 +893,7 @@ static void update_curr_fair(struct rq *rq)
 			return;
 		}
 		trace_sched_stat_wait(p, delta);
+		update_hierarchy_wait_sum(se, delta);
 	}
 
 	__schedstat_set(se->statistics.wait_max,
@@ -10273,6 +10287,9 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
 #ifndef CONFIG_64BIT
 	cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;
 #endif
+#ifdef CONFIG_SCHEDSTATS
+	cfs_rq->hierarchy_wait_sum = 0;
+#endif
 #ifdef CONFIG_SMP
 	raw_spin_lock_init(&cfs_rq->removed.lock);
 #endif
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index d27c1a5..c01ab99 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -496,6 +496,9 @@ struct cfs_rq {
 #ifndef CONFIG_64BIT
 	u64 min_vruntime_copy;
 #endif
+#ifdef CONFIG_SCHEDSTATS
+	u64 hierarchy_wait_sum;
+#endif
 
 	struct rb_root_cached tasks_timeline;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread
* Re: [PATCH] sched/debug: Show intergroup and hierarchy sum wait time of a task group
  2019-01-25  3:11 ` 王贇
  [not found]   ` <CAHCio2iYoKOSgxJWFX6KcRcn7GK=aEJP7Sz0vU6SwXK2C56mew@mail.gmail.com>

From: 王贇 @ 2019-01-25  3:11 UTC (permalink / raw)
  To: ufo19890607, mingo, peterz, yuzhoujian; +Cc: linux-kernel

On 2019/1/23 5:46 PM, ufo19890607@gmail.com wrote:
> From: yuzhoujian <yuzhoujian@didichuxing.com>
>
> We can monitor the sum wait time of a task group since commit 3d6c50c27bd6
> ("sched/debug: Show the sum wait time of a task group"). However this
> wait_sum just represents the conflict between different task groups, since
> it simply sums the wait time of the task_group's cfs_rq. And we still
> cannot evaluate the conflict between all the tasks within the hierarchy of
> this group, so the hierarchy wait time is still needed.

Could you please give us a scene where we really need this hierarchy
wait_sum, despite the extra overhead?

Regards,
Michael Wang

> [... rest of the quoted patch snipped ...]

^ permalink raw reply	[flat|nested] 10+ messages in thread
[parent not found: <CAHCio2iYoKOSgxJWFX6KcRcn7GK=aEJP7Sz0vU6SwXK2C56mew@mail.gmail.com>]
* Re: [PATCH] sched/debug: Show intergroup and hierarchy sum wait time of a task group
  [not found] ` <CAHCio2iYoKOSgxJWFX6KcRcn7GK=aEJP7Sz0vU6SwXK2C56mew@mail.gmail.com>
  2019-01-28  2:53 ` 王贇
  2019-01-28  7:21   ` 禹舟键
  0 siblings, 1 reply; 10+ messages in thread

From: 王贇 @ 2019-01-28  2:53 UTC (permalink / raw)
  To: 禹舟键; +Cc: mingo, Peter Zijlstra, Wind Yu, linux-kernel

On 2019/1/25 3:55 PM, 禹舟键 wrote:
> Hi, Michael
> Actually, I just created a container which had just 1 CPU, and I ran two
> tasks (while 1 loop) in this container. Then I read cpu.stat at 1 second
> intervals, but the wait_sum difference is almost 0.

Task competition inside a cgroup won't be counted as the cgroup's
competition; please try creating another cgroup with a dead loop on each
CPU.

> I checked your patch and found that you just sum the task group's se
> wait_sum rather than the tasks' se wait_sum. So we can see that the
> increment of 'wait_sum' will always be 0 as long as there are running
> tasks in this task group.

Running tasks doesn't mean no competition, only that the cgroup occupied
the CPU exclusively at that moment.

> Unfortunately, this is not what we want. Our expectation is to evaluate
> the CPU contention in a task group, just as I said. We need to ensure
> that containers with high priority have sufficient CPU resources.

No offense, but I'm afraid you misunderstand the problem we try to solve
with wait_sum. If your purpose is to have a way to tell whether there is
sufficient CPU inside a container, please try lxcfs + top: if there is
almost no idle time and load is high, then the CPU resource is not
sufficient.

> I implemented 'hierarchy wait_sum' referring to cpuacct.usage. The
> hierarchy wait_sum will sum all the tasks' se wait_sum within the
> hierarchy of this task group. We can calculate the wait rate of a task
> group based on hierarchy wait_sum and cpuacct.usage.

Frankly speaking, this sounds like a supplement rather than a missing
piece. Although we don't rely on lxcfs and modify the kernel ourselves to
support our container environment, I still don't think this kind of
solution should be in the kernel.

Regards,
Michael Wang

> The extra overhead for calculating the hierarchy wait_sum is traversing
> the cfs_rq's se from the target task's se to the root_task_group
> children's se.
> [... earlier quoted thread snipped ...]

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH] sched/debug: Show intergroup and hierarchy sum wait time of a task group
  2019-01-28  2:53 ` 王贇
  2019-01-28  7:21   ` 禹舟键
  2019-01-30  1:53     ` 王贇
  0 siblings, 1 reply; 10+ messages in thread

From: 禹舟键 @ 2019-01-28  7:21 UTC (permalink / raw)
  To: 王贇; +Cc: mingo, Peter Zijlstra, Wind Yu, linux-kernel

Hi Michael,

> Task competition inside a cgroup won't be considered as cgroup's
> competition, please try create another cgroup with dead loop on
> each CPU

Yes, you are right, but I don't think we only need to account for
inter-cgroup competition, because that factor does not reflect the
conditions inside a cgroup. We still need a proper method to evaluate CPU
competition inside a cgroup.

> Running tasks doesn't means no competition, only if that cgroup occupied
> the CPU exclusively at that moment.

I care a lot about CPU competition inside a cgroup. Without the cgroup
hierarchy wait_sum, I can only read '/proc/$pid/schedstat' thousands of
times to get every task's wait_sum, and that definitely takes a long time
(40ms for 8000 tasks in a container).

> No offense but I'm afraid you misunderstand the problem we try to solve
> by wait_sum, if your purpose is to have a way to tell whether there are
> sufficient CPU inside a container, please try lxcfs + top, if there are
> almost no idle and load is high, then the CPU resource is not sufficient.

Maybe I didn't make it clear. We need to dynamically adjust the number of
CPUs for a container based on the running state of the tasks inside it.
If we find the tasks' wait_sum is really high, we will add more CPU cores
to this container; otherwise we will take some CPUs away from it. In a
word, we want to ensure 'co-scheduling' for high-priority containers.

> Frankly speaking this sounds like a supplement rather than a missing piece,
> although we don't rely on lxcfs and modify the kernel ourselves to support
> container environment, I still don't think such kind of solutions should be
> in kernel.

I don't care whether this value is considered a supplement or a missing
piece. I only care about how I can assess the running state inside a
container. I think lxcfs is really a good solution for improving the
visibility of container resources, but it is not good enough at the
moment. We can read these procfs files inside a container:

  /proc/cpuinfo
  /proc/diskstats
  /proc/meminfo
  /proc/stat
  /proc/swaps
  /proc/uptime

but they still cannot reflect real-time information. Please think about
the following scenario: a 'rabbit' process generates 2000 tasks every
30ms, and these child tasks just run 1-5ms and then exit. How can we
detect this thrashing workload without hierarchy wait_sum?

Thanks,
Yuzhoujian

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH] sched/debug: Show intergroup and hierarchy sum wait time of a task group
  2019-01-28  7:21 ` 禹舟键
  2019-01-30  1:53   ` 王贇
  0 siblings, 0 replies; 10+ messages in thread

From: 王贇 @ 2019-01-30  1:53 UTC (permalink / raw)
  To: 禹舟键; +Cc: mingo, Peter Zijlstra, Wind Yu, linux-kernel

On 2019/1/28 3:21 PM, 禹舟键 wrote:
[snip]
>> No offense but I'm afraid you misunderstand the problem we try to solve
>> by wait_sum, if your purpose is to have a way to tell whether there are
>> sufficient CPU inside a container, please try lxcfs + top, if there are
>> almost no idle and load is high, then the CPU resource is not sufficient.
>
> Maybe I didn't make it clear. We need to dynamically adjust the number
> of CPUs for a container based on the running state of tasks inside the
> container. If we find tasks' wait_sum are really high, we will add more
> CPU cores to this container, or else we will decline some CPU to this
> container. In a word, we want to ensure 'co-scheduling' for high
> priority containers.

I understand that you want to use task wait time, which is a raw metric,
but IMHO when tasks wait more, idle will be lower and load will be
higher. Those are more general metrics for telling whether CPUs are
starved than task wait time on the rq, and we rely on them too.

The only issue we had previously is that we didn't know what caused the
low idle and high load: it could be a wrong resource assignment or
inter-cgroup competition. Now, with wait_sum, we can first make sure the
competition is low; then, if idle is still low and load is still high
inside the container, it is time to assign more CPUs.

>> Frankly speaking this sounds like a supplement rather than a missing piece,
>> although we don't rely on lxcfs and modify the kernel ourselves to support
>> container environment, I still don't think such kind of solutions should be
>> in kernel.
>
> I don't care if this value is considered as a supplement or a missing
> piece. I only care about how can I assess the running state inside a
> container. I think lxcfs is really a good solution to improve the
> visibility of container resources, but it is not good enough at the
> moment.
[snip]
> Please think about the following scenario: a 'rabbit' process will
> generate 2000 tasks in every 30ms, and these children tasks just run
> 1~5ms and then exit. How can we detect this thrashing workload without
> hierarchy wait_sum?

As mentioned, we implement the isolation ourselves, so we see isolated
idle and load information inside the container rather than the host
data. We don't rely on lxcfs, but we know it does similar work. So, for
what you get by reading /proc/stat: does it report isolated idle data?
You would need an isolated /proc/loadavg too.

Anyway, IMHO this is a special requirement only for container
environments, not a general solution to a kernel problem, so I would
suggest either helping improve lxcfs to make it useful for your
production, or making the modification in your own kernel.

Regards,
Michael Wang

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH] sched/debug: Show intergroup and hierarchy sum wait time of a task group
  2019-01-23  9:46 [PATCH] sched/debug: Show intergroup and hierarchy sum wait time of a task group ufo19890607
  2019-01-25  3:11 ` 王贇
  2019-02-06 17:19 ` Peter Zijlstra
  2019-02-11  2:43   ` 禹舟键
  2019-02-12  3:14   ` 禹舟键
  1 sibling, 2 replies; 10+ messages in thread

From: Peter Zijlstra @ 2019-02-06 17:19 UTC (permalink / raw)
  To: ufo19890607; +Cc: mingo, yun.wang, yuzhoujian, linux-kernel

On Wed, Jan 23, 2019 at 05:46:56PM +0800, ufo19890607@gmail.com wrote:
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index e2ff4b6..35e89ca 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -858,6 +858,19 @@ static void update_curr_fair(struct rq *rq)
>  }
>  
>  static inline void
> +update_hierarchy_wait_sum(struct sched_entity *se,
> +			u64 delta_wait)
> +{
> +	for_each_sched_entity(se) {
> +		struct cfs_rq *cfs_rq = cfs_rq_of(se);
> +
> +		if (cfs_rq->tg != &root_task_group)
> +			__schedstat_add(cfs_rq->hierarchy_wait_sum,
> +					delta_wait);
> +	}
> +}
> +
> +static inline void
>  update_stats_wait_end(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  {
>  	struct task_struct *p;
> @@ -880,6 +893,7 @@ static void update_curr_fair(struct rq *rq)
>  			return;
>  		}
>  		trace_sched_stat_wait(p, delta);
> +		update_hierarchy_wait_sum(se, delta);
>  	}
>  
>  	__schedstat_set(se->statistics.wait_max,

The problem I have with this is that it will make schedstats even more
expensive :/

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH] sched/debug: Show intergroup and hierarchy sum wait time of a task group
  2019-02-06 17:19 ` Peter Zijlstra
  2019-02-11  2:43   ` 禹舟键
  0 siblings, 0 replies; 10+ messages in thread

From: 禹舟键 @ 2019-02-11  2:43 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, 王贇, Wind Yu, linux-kernel

Hi Peter,

> The problem I have with this is that it will make schedstats even more
> expensive :/

I think the overhead of accounting the hierarchy wait time is about the
same as that of cpuacct.usage. If the performance overhead is low enough
(< 1%), is it acceptable?

Thanks,
Yuzhoujian

[... quoted thread snipped ...]

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH] sched/debug: Show intergroup and hierarchy sum wait time of a task group
  2019-02-06 17:19 ` Peter Zijlstra
  2019-02-12  3:14   ` 禹舟键
  2019-02-19  2:15     ` 禹舟键
  1 sibling, 1 reply; 10+ messages in thread

From: 禹舟键 @ 2019-02-12  3:14 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, 王贇, Wind Yu, linux-kernel

Hi Peter,

I think the hierarchy wait time for task groups is worth accounting
despite the small extra overhead, because it lets us evaluate a task
group's condition with a more direct metric. We cannot get the real
situation just from general metrics like idle or loadavg, since their
values decay as time elapses. So I think general metrics cannot satisfy
the requirement for task groups.

Thanks,
Yuzhoujian

[... quoted thread snipped ...]

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH] sched/debug: Show intergroup and hierarchy sum wait time of a task group
  2019-02-12  3:14 ` 禹舟键
  2019-02-19  2:15   ` 禹舟键
  2019-02-25  3:29     ` 禹舟键
  0 siblings, 1 reply; 10+ messages in thread

From: 禹舟键 @ 2019-02-19  2:15 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, 王贇, Wind Yu, linux-kernel

PING

[... quoted thread snipped ...]

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH] sched/debug: Show intergroup and hierarchy sum wait time of a task group
  2019-02-19  2:15 ` 禹舟键
  2019-02-25  3:29   ` 禹舟键
  0 siblings, 0 replies; 10+ messages in thread

From: 禹舟键 @ 2019-02-25  3:29 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, 王贇, Wind Yu, linux-kernel

PING

[... quoted thread snipped ...]

^ permalink raw reply	[flat|nested] 10+ messages in thread
end of thread, other threads:[~2019-02-25  3:30 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-23  9:46 [PATCH] sched/debug: Show intergroup and hierarchy sum wait time of a task group ufo19890607
2019-01-25  3:11 ` 王贇
     [not found]   ` <CAHCio2iYoKOSgxJWFX6KcRcn7GK=aEJP7Sz0vU6SwXK2C56mew@mail.gmail.com>
2019-01-28  2:53     ` 王贇
2019-01-28  7:21       ` 禹舟键
2019-01-30  1:53         ` 王贇
2019-02-06 17:19 ` Peter Zijlstra
2019-02-11  2:43   ` 禹舟键
2019-02-12  3:14   ` 禹舟键
2019-02-19  2:15     ` 禹舟键
2019-02-25  3:29       ` 禹舟键