* [RFC PATCH 0/5] CFS load tracking trace events @ 2017-03-28 6:35 Dietmar Eggemann 2017-03-28 6:35 ` [RFC PATCH 1/5] sched/autogroup: Define autogroup_path() for !CONFIG_SCHED_DEBUG Dietmar Eggemann ` (5 more replies) 0 siblings, 6 replies; 30+ messages in thread
From: Dietmar Eggemann @ 2017-03-28 6:35 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar
Cc: LKML, Matt Fleming, Vincent Guittot, Steven Rostedt, Morten Rasmussen, Juri Lelli, Patrick Bellasi

This patch-set introduces trace events for load (and utilization)
tracking for the following three cfs scheduler bricks: cfs_rq,
sched_entity and task_group. I've decided to send it out because people
are discussing problems with PELT-related functionality and as a base
for the related discussion next week at the OSPM summit in Pisa.

The requirements for the design are:

 (1) Support for all combinations of the following kernel configuration
     options: CONFIG_SMP, CONFIG_FAIR_GROUP_SCHED,
     CONFIG_SCHED_AUTOGROUP, CONFIG_DEBUG_KERNEL.

 (2) All key=value pairs have to appear in all combinations of the
     kernel configuration options mentioned under (1).

 (3) Stable interface, i.e. only use keys for key=value pairs which are
     independent of the underlying (load tracking) implementation.

 (4) Minimally invasive to the actual cfs scheduler code, i.e. a trace
     event should not have to be guarded by an if condition (e.g.
     entity_is_task(se) to distinguish between task and task_group) or
     by a kernel configuration switch.

The trace events only expose one load (and one utilization) key=value
pair besides those used to identify the cfs scheduler brick. They do
not provide any internal implementation details of the actual (PELT)
load-tracking mechanism. This limitation might be too restrictive,
since for a cfs_rq we can only trace runnable load
(cfs_rq->runnable_load_avg) or runnable/blocked load
(cfs_rq->avg.load_avg). Other load metrics like instantaneous load
({cfs_rq|se}->load.weight) are not traced but could easily be added.
The following keys are used to identify the cfs scheduler brick:

 (1) Cpu number the cfs scheduler brick is attached to.

 (2) Task_group path and (css) id.

 (3) Task name and pid.

In case a key does not apply, either because of an unset kernel
configuration option or because a sched_entity can represent either a
task or a task_group, its value is set to an invalid default:

 (1) Task_group path: "(null)"

 (2) Task group id: -1

 (3) Task name: "(null)"

 (4) Task pid: -1

Load tracking trace events are placed into kernel/sched/fair.c:

 (1) for cfs_rq:

     - In the PELT core function __update_load_avg().
     - During sched_entity attach/detach.
     - During sched_entity load/util propagation down the task_group
       hierarchy.

 (2) for sched_entity:

     - In the PELT core function __update_load_avg().
     - During sched_entity load/util propagation down the task_group
       hierarchy.

 (3) for task_group:

     - In its PELT update function.

An alternative to using __update_load_avg() would be to put trace
events into update_cfs_rq_load_avg() for cfs_rq's and into
set_task_rq_fair(), update_load_avg(), sync_entity_load_avg() for
sched_entities. This would also get rid of the if(cfs_rq)/else
condition in __update_load_avg().

These trace events still look a bit fragile. First of all, this
patch-set has to use cfs specific data structures in the global task
scheduler trace file include/trace/events/sched.h. And second, the
accessor functions (rq_of(), task_of(), etc.) are private to the cfs
scheduler class. If they were public, these trace events would be
easier to code. That said, group_cfs_rq() is already made public by
this patch-stack.

This version is based on tip/sched/core as of yesterday
(bc4278987e38). It has been compile-tested on ~160 configurations via
0day's kbuild test robot.
Dietmar Eggemann (5):
  sched/autogroup: Define autogroup_path() for !CONFIG_SCHED_DEBUG
  sched/events: Introduce cfs_rq load tracking trace event
  sched/fair: Export group_cfs_rq()
  sched/events: Introduce sched_entity load tracking trace event
  sched/events: Introduce task_group load tracking trace event

 include/linux/sched.h        |  10 +++
 include/trace/events/sched.h | 143 +++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/autogroup.c     |   2 -
 kernel/sched/autogroup.h     |   2 -
 kernel/sched/fair.c          |  26 ++++----
 5 files changed, 167 insertions(+), 16 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 30+ messages in thread
* [RFC PATCH 1/5] sched/autogroup: Define autogroup_path() for !CONFIG_SCHED_DEBUG
  2017-03-28 6:35 [RFC PATCH 0/5] CFS load tracking trace events Dietmar Eggemann
@ 2017-03-28 6:35 ` Dietmar Eggemann
  2017-03-28 6:35 ` [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event Dietmar Eggemann
  ` (4 subsequent siblings)
  5 siblings, 0 replies; 30+ messages in thread
From: Dietmar Eggemann @ 2017-03-28 6:35 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar
Cc: LKML, Matt Fleming, Vincent Guittot, Steven Rostedt, Morten Rasmussen, Juri Lelli, Patrick Bellasi

Define autogroup_path() even in the !CONFIG_SCHED_DEBUG case. If
CONFIG_SCHED_AUTOGROUP is enabled, the path of an autogroup has to be
available to be printed in the load tracking trace events provided by
this patch-stack, regardless of whether CONFIG_SCHED_DEBUG is set.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/autogroup.c | 2 --
 kernel/sched/autogroup.h | 2 --
 2 files changed, 4 deletions(-)

diff --git a/kernel/sched/autogroup.c b/kernel/sched/autogroup.c
index da39489d2d80..22be4eaf6ae2 100644
--- a/kernel/sched/autogroup.c
+++ b/kernel/sched/autogroup.c
@@ -260,7 +260,6 @@ void proc_sched_autogroup_show_task(struct task_struct *p, struct seq_file *m)
 }
 #endif /* CONFIG_PROC_FS */
 
-#ifdef CONFIG_SCHED_DEBUG
 int autogroup_path(struct task_group *tg, char *buf, int buflen)
 {
 	if (!task_group_is_autogroup(tg))
@@ -268,4 +267,3 @@ int autogroup_path(struct task_group *tg, char *buf, int buflen)
 
 	return snprintf(buf, buflen, "%s-%ld", "/autogroup", tg->autogroup->id);
 }
-#endif /* CONFIG_SCHED_DEBUG */
diff --git a/kernel/sched/autogroup.h b/kernel/sched/autogroup.h
index ce40c810cd5c..6a661cfa9584 100644
--- a/kernel/sched/autogroup.h
+++ b/kernel/sched/autogroup.h
@@ -55,11 +55,9 @@ autogroup_task_group(struct task_struct *p, struct task_group *tg)
 	return tg;
 }
 
-#ifdef CONFIG_SCHED_DEBUG
 static inline int autogroup_path(struct task_group *tg, char *buf, int buflen)
 {
 	return 0;
 }
-#endif
 
 #endif /* CONFIG_SCHED_AUTOGROUP */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 30+ messages in thread
* [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event
  2017-03-28 6:35 [RFC PATCH 0/5] CFS load tracking trace events Dietmar Eggemann
  2017-03-28 6:35 ` [RFC PATCH 1/5] sched/autogroup: Define autogroup_path() for !CONFIG_SCHED_DEBUG Dietmar Eggemann
@ 2017-03-28 6:35 ` Dietmar Eggemann
  2017-03-28 7:56 ` Peter Zijlstra
  ` (2 more replies)
  2017-03-28 6:35 ` [RFC PATCH 3/5] sched/fair: Export group_cfs_rq() Dietmar Eggemann
  ` (3 subsequent siblings)
  5 siblings, 3 replies; 30+ messages in thread
From: Dietmar Eggemann @ 2017-03-28 6:35 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar
Cc: LKML, Matt Fleming, Vincent Guittot, Steven Rostedt, Morten Rasmussen, Juri Lelli, Patrick Bellasi

The trace event keys load and util (utilization) are mapped to:

 (1) load : cfs_rq->runnable_load_avg

 (2) util : cfs_rq->avg.util_avg

To let this trace event work for configurations w/ and w/o group
scheduling support for cfs (CONFIG_FAIR_GROUP_SCHED), the following
special handling is necessary for non-existent key=value pairs:

 path = "(null)" : In case of !CONFIG_FAIR_GROUP_SCHED.

 id = -1         : In case of !CONFIG_FAIR_GROUP_SCHED.

The following list shows examples of the key=value pairs in different
configurations for:

 (1) a root task_group:

     cpu=4 path=/ id=1 load=6 util=331

 (2) a task_group:

     cpu=1 path=/tg1/tg11/tg111 id=4 load=538 util=522

 (3) an autogroup:

     cpu=3 path=/autogroup-18 id=0 load=997 util=517

 (4) w/o CONFIG_FAIR_GROUP_SCHED:

     cpu=0 path=(null) id=-1 load=314 util=289

The trace event is only defined for CONFIG_SMP.

The helper function __trace_sched_path() can be used to get the length
parameter of the dynamic array (path == NULL) and to copy the path into
it (path != NULL).
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> --- include/trace/events/sched.h | 73 ++++++++++++++++++++++++++++++++++++++++++++ kernel/sched/fair.c | 9 ++++++ 2 files changed, 82 insertions(+) diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h index 9e3ef6c99e4b..51db8a90e45f 100644 --- a/include/trace/events/sched.h +++ b/include/trace/events/sched.h @@ -562,6 +562,79 @@ TRACE_EVENT(sched_wake_idle_without_ipi, TP_printk("cpu=%d", __entry->cpu) ); + +#ifdef CONFIG_SMP +#ifdef CREATE_TRACE_POINTS +static inline +int __trace_sched_cpu(struct cfs_rq *cfs_rq) +{ +#ifdef CONFIG_FAIR_GROUP_SCHED + struct rq *rq = cfs_rq->rq; +#else + struct rq *rq = container_of(cfs_rq, struct rq, cfs); +#endif + return cpu_of(rq); +} + +static inline +int __trace_sched_path(struct cfs_rq *cfs_rq, char *path, int len) +{ +#ifdef CONFIG_FAIR_GROUP_SCHED + int l = path ? 
len : 0; + + if (task_group_is_autogroup(cfs_rq->tg)) + return autogroup_path(cfs_rq->tg, path, l) + 1; + else + return cgroup_path(cfs_rq->tg->css.cgroup, path, l) + 1; +#else + if (path) + strcpy(path, "(null)"); + + return strlen("(null)"); +#endif +} + +static inline int __trace_sched_id(struct cfs_rq *cfs_rq) +{ +#ifdef CONFIG_FAIR_GROUP_SCHED + return cfs_rq->tg->css.id; +#else + return -1; +#endif +} +#endif /* CREATE_TRACE_POINTS */ + +/* + * Tracepoint for cfs_rq load tracking: + */ +TRACE_EVENT(sched_load_cfs_rq, + + TP_PROTO(struct cfs_rq *cfs_rq), + + TP_ARGS(cfs_rq), + + TP_STRUCT__entry( + __field( int, cpu ) + __dynamic_array(char, path, + __trace_sched_path(cfs_rq, NULL, 0) ) + __field( int, id ) + __field( unsigned long, load ) + __field( unsigned long, util ) + ), + + TP_fast_assign( + __entry->cpu = __trace_sched_cpu(cfs_rq); + __trace_sched_path(cfs_rq, __get_dynamic_array(path), + __get_dynamic_array_len(path)); + __entry->id = __trace_sched_id(cfs_rq); + __entry->load = cfs_rq->runnable_load_avg; + __entry->util = cfs_rq->avg.util_avg; + ), + + TP_printk("cpu=%d path=%s id=%d load=%lu util=%lu", __entry->cpu, + __get_str(path), __entry->id, __entry->load, __entry->util) +); +#endif /* CONFIG_SMP */ #endif /* _TRACE_SCHED_H */ /* This part must be outside protection */ diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 03adf9fb48b1..ac19ab6ced8f 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -2950,6 +2950,9 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa, sa->util_avg = sa->util_sum / LOAD_AVG_MAX; } + if (cfs_rq) + trace_sched_load_cfs_rq(cfs_rq); + return decayed; } @@ -3170,6 +3173,8 @@ static inline int propagate_entity_load_avg(struct sched_entity *se) update_tg_cfs_util(cfs_rq, se); update_tg_cfs_load(cfs_rq, se); + trace_sched_load_cfs_rq(cfs_rq); + return 1; } @@ -3359,6 +3364,8 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s set_tg_cfs_propagate(cfs_rq); 
cfs_rq_util_change(cfs_rq); + + trace_sched_load_cfs_rq(cfs_rq); } /** @@ -3379,6 +3386,8 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s set_tg_cfs_propagate(cfs_rq); cfs_rq_util_change(cfs_rq); + + trace_sched_load_cfs_rq(cfs_rq); } /* Add the load generated by se into cfs_rq's load average */ -- 2.11.0 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event 2017-03-28 6:35 ` [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event Dietmar Eggemann @ 2017-03-28 7:56 ` Peter Zijlstra 2017-03-28 13:30 ` Dietmar Eggemann 2017-03-28 8:00 ` Peter Zijlstra 2017-03-28 14:46 ` Steven Rostedt 2 siblings, 1 reply; 30+ messages in thread From: Peter Zijlstra @ 2017-03-28 7:56 UTC (permalink / raw) To: Dietmar Eggemann Cc: Ingo Molnar, LKML, Matt Fleming, Vincent Guittot, Steven Rostedt, Morten Rasmussen, Juri Lelli, Patrick Bellasi On Tue, Mar 28, 2017 at 07:35:38AM +0100, Dietmar Eggemann wrote: > The trace event keys load and util (utilization) are mapped to: > > (1) load : cfs_rq->runnable_load_avg > > (2) util : cfs_rq->avg.util_avg > > To let this trace event work for configurations w/ and w/o group > scheduling support for cfs (CONFIG_FAIR_GROUP_SCHED) the following > special handling is necessary for non-existent key=value pairs: > > path = "(null)" : In case of !CONFIG_FAIR_GROUP_SCHED. > > id = -1 : In case of !CONFIG_FAIR_GROUP_SCHED. > > The following list shows examples of the key=value pairs in different > configurations for: > > (1) a root task_group: > > cpu=4 path=/ id=1 load=6 util=331 What's @id and why do we care? ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event 2017-03-28 7:56 ` Peter Zijlstra @ 2017-03-28 13:30 ` Dietmar Eggemann 0 siblings, 0 replies; 30+ messages in thread From: Dietmar Eggemann @ 2017-03-28 13:30 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, LKML, Matt Fleming, Vincent Guittot, Steven Rostedt, Morten Rasmussen, Juri Lelli, Patrick Bellasi On 03/28/2017 09:56 AM, Peter Zijlstra wrote: > On Tue, Mar 28, 2017 at 07:35:38AM +0100, Dietmar Eggemann wrote: [...] >> (1) a root task_group: >> >> cpu=4 path=/ id=1 load=6 util=331 > > What's @id and why do we care? It's a per cgroup/subsystem unique id for every task_group (cpu controller): struct task_group { struct cgroup_subsys_state css { ... int id; ... } ... } The root task group path=/ has id=1 and all autogroups have id=0. I agree, this id is redundant in case we have the task_group path. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event 2017-03-28 6:35 ` [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event Dietmar Eggemann 2017-03-28 7:56 ` Peter Zijlstra @ 2017-03-28 8:00 ` Peter Zijlstra 2017-03-28 14:47 ` Steven Rostedt 2017-03-28 14:46 ` Steven Rostedt 2 siblings, 1 reply; 30+ messages in thread From: Peter Zijlstra @ 2017-03-28 8:00 UTC (permalink / raw) To: Dietmar Eggemann Cc: Ingo Molnar, LKML, Matt Fleming, Vincent Guittot, Steven Rostedt, Morten Rasmussen, Juri Lelli, Patrick Bellasi On Tue, Mar 28, 2017 at 07:35:38AM +0100, Dietmar Eggemann wrote: > > + if (cfs_rq) > + trace_sched_load_cfs_rq(cfs_rq); You can do that with DEFINE_EVENT_CONDITION and TP_CONDITION. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event 2017-03-28 8:00 ` Peter Zijlstra @ 2017-03-28 14:47 ` Steven Rostedt 0 siblings, 0 replies; 30+ messages in thread From: Steven Rostedt @ 2017-03-28 14:47 UTC (permalink / raw) To: Peter Zijlstra Cc: Dietmar Eggemann, Ingo Molnar, LKML, Matt Fleming, Vincent Guittot, Morten Rasmussen, Juri Lelli, Patrick Bellasi On Tue, 28 Mar 2017 10:00:50 +0200 Peter Zijlstra <peterz@infradead.org> wrote: > On Tue, Mar 28, 2017 at 07:35:38AM +0100, Dietmar Eggemann wrote: > > > > + if (cfs_rq) > > + trace_sched_load_cfs_rq(cfs_rq); > > You can do that with DEFINE_EVENT_CONDITION and TP_CONDITION. I just read this after I replied about using TRACE_EVENT_CONDITION. As there is not a DEFINE_EVENT(), the TRACE_EVENT_CONDITION() should be used. All the locations expect cfs_rq to not be NULL I assume. -- Steve ^ permalink raw reply [flat|nested] 30+ messages in thread
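[For reference, a sketch of the conditional form being suggested here, reusing the event definition and helpers from the patch. TRACE_EVENT_CONDITION() and TP_CONDITION() are defined in include/linux/tracepoint.h; the condition is only evaluated once the (jump-label-guarded) tracepoint is enabled, so all call sites can then call trace_sched_load_cfs_rq(cfs_rq) unconditionally. Field layout copied from the patch as posted:]

```c
/*
 * Tracepoint for cfs_rq load tracking, firing only for cfs_rq != NULL:
 */
TRACE_EVENT_CONDITION(sched_load_cfs_rq,

	TP_PROTO(struct cfs_rq *cfs_rq),

	TP_ARGS(cfs_rq),

	/* Checked inside the tracepoint, only when it is enabled: */
	TP_CONDITION(cfs_rq != NULL),

	TP_STRUCT__entry(
		__field(	int,		cpu	)
		__dynamic_array(char,		path,
				__trace_sched_path(cfs_rq, NULL, 0)	)
		__field(	int,		id	)
		__field(	unsigned long,	load	)
		__field(	unsigned long,	util	)
	),

	TP_fast_assign(
		__entry->cpu	= __trace_sched_cpu(cfs_rq);
		__trace_sched_path(cfs_rq, __get_dynamic_array(path),
				   __get_dynamic_array_len(path));
		__entry->id	= __trace_sched_id(cfs_rq);
		__entry->load	= cfs_rq->runnable_load_avg;
		__entry->util	= cfs_rq->avg.util_avg;
	),

	TP_printk("cpu=%d path=%s id=%d load=%lu util=%lu", __entry->cpu,
		  __get_str(path), __entry->id, __entry->load, __entry->util)
);
```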
* Re: [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event 2017-03-28 6:35 ` [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event Dietmar Eggemann 2017-03-28 7:56 ` Peter Zijlstra 2017-03-28 8:00 ` Peter Zijlstra @ 2017-03-28 14:46 ` Steven Rostedt 2017-03-28 16:44 ` Peter Zijlstra 2 siblings, 1 reply; 30+ messages in thread From: Steven Rostedt @ 2017-03-28 14:46 UTC (permalink / raw) To: Dietmar Eggemann Cc: Peter Zijlstra, Ingo Molnar, LKML, Matt Fleming, Vincent Guittot, Morten Rasmussen, Juri Lelli, Patrick Bellasi On Tue, 28 Mar 2017 07:35:38 +0100 Dietmar Eggemann <dietmar.eggemann@arm.com> wrote: > /* This part must be outside protection */ > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 03adf9fb48b1..ac19ab6ced8f 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -2950,6 +2950,9 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa, > sa->util_avg = sa->util_sum / LOAD_AVG_MAX; > } > > + if (cfs_rq) > + trace_sched_load_cfs_rq(cfs_rq); > + Please use TRACE_EVENT_CONDITION(), and test for cfs_rq not NULL. That way it moves the if (cfs_rq) out of the scheduler code and into the jump label protected location. That is, the if is only tested when tracing is enabled. 
-- Steve > return decayed; > } > > @@ -3170,6 +3173,8 @@ static inline int propagate_entity_load_avg(struct sched_entity *se) > update_tg_cfs_util(cfs_rq, se); > update_tg_cfs_load(cfs_rq, se); > > + trace_sched_load_cfs_rq(cfs_rq); > + > return 1; > } > > @@ -3359,6 +3364,8 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s > set_tg_cfs_propagate(cfs_rq); > > cfs_rq_util_change(cfs_rq); > + > + trace_sched_load_cfs_rq(cfs_rq); > } > > /** > @@ -3379,6 +3386,8 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s > set_tg_cfs_propagate(cfs_rq); > > cfs_rq_util_change(cfs_rq); > + > + trace_sched_load_cfs_rq(cfs_rq); > } > > /* Add the load generated by se into cfs_rq's load average */ ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event 2017-03-28 14:46 ` Steven Rostedt @ 2017-03-28 16:44 ` Peter Zijlstra 2017-03-28 16:57 ` Peter Zijlstra ` (2 more replies) 0 siblings, 3 replies; 30+ messages in thread From: Peter Zijlstra @ 2017-03-28 16:44 UTC (permalink / raw) To: Steven Rostedt Cc: Dietmar Eggemann, Ingo Molnar, LKML, Matt Fleming, Vincent Guittot, Morten Rasmussen, Juri Lelli, Patrick Bellasi On Tue, Mar 28, 2017 at 10:46:00AM -0400, Steven Rostedt wrote: > On Tue, 28 Mar 2017 07:35:38 +0100 > Dietmar Eggemann <dietmar.eggemann@arm.com> wrote: > > > /* This part must be outside protection */ > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > > index 03adf9fb48b1..ac19ab6ced8f 100644 > > --- a/kernel/sched/fair.c > > +++ b/kernel/sched/fair.c > > @@ -2950,6 +2950,9 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa, > > sa->util_avg = sa->util_sum / LOAD_AVG_MAX; > > } > > > > + if (cfs_rq) > > + trace_sched_load_cfs_rq(cfs_rq); > > + > > Please use TRACE_EVENT_CONDITION(), and test for cfs_rq not NULL. I too suggested that; but then I looked again at that code and we can actually do this. cfs_rq can be constant propagated and the if determined at build time. Its not immediately obvious from the current code; but if we do something like the below, it should be clearer. --- Subject: sched/fair: Explicitly generate __update_load_avg() instances From: Peter Zijlstra <peterz@infradead.org> Date: Tue Mar 28 11:08:20 CEST 2017 The __update_load_avg() function is an __always_inline because its used with constant propagation to generate different variants of the code without having to duplicate it (which would be prone to bugs). Explicitly instantiate the 3 variants. Note that most of this is called from rather hot paths, so reducing branches is good. 
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -2849,7 +2849,7 @@ static u32 __compute_runnable_contrib(u6 * = u_0 + u_1*y + u_2*y^2 + ... [re-labeling u_i --> u_{i+1}] */ static __always_inline int -__update_load_avg(u64 now, int cpu, struct sched_avg *sa, +___update_load_avg(u64 now, int cpu, struct sched_avg *sa, unsigned long weight, int running, struct cfs_rq *cfs_rq) { u64 delta, scaled_delta, periods; @@ -2953,6 +2953,26 @@ __update_load_avg(u64 now, int cpu, stru return decayed; } +static int +__update_load_avg_blocked_se(u64 now, int cpu, struct sched_avg *sa) +{ + return ___update_load_avg(now, cpu, sa, 0, 0, NULL); +} + +static int +__update_load_avg_se(u64 now, int cpu, struct sched_avg *sa, + unsigned long weight, int running) +{ + return ___update_load_avg(now, cpu, sa, weight, running, NULL); +} + +static int +__update_load_avg(u64 now, int cpu, struct sched_avg *sa, + unsigned long weight, int running, struct cfs_rq *cfs_rq) +{ + return ___update_load_avg(now, cpu, sa, weight, running, cfs_rq); +} + /* * Signed add and clamp on underflow. * @@ -3014,6 +3034,9 @@ static inline void update_tg_load_avg(st void set_task_rq_fair(struct sched_entity *se, struct cfs_rq *prev, struct cfs_rq *next) { + u64 p_last_update_time; + u64 n_last_update_time; + if (!sched_feat(ATTACH_AGE_LOAD)) return; @@ -3024,11 +3047,11 @@ void set_task_rq_fair(struct sched_entit * time. This will result in the wakee task is less decayed, but giving * the wakee more load sounds not bad. 
*/ - if (se->avg.last_update_time && prev) { - u64 p_last_update_time; - u64 n_last_update_time; + if (!(se->avg.last_update_time && prev)) + return; #ifndef CONFIG_64BIT + { u64 p_last_update_time_copy; u64 n_last_update_time_copy; @@ -3043,14 +3066,15 @@ void set_task_rq_fair(struct sched_entit } while (p_last_update_time != p_last_update_time_copy || n_last_update_time != n_last_update_time_copy); + } #else - p_last_update_time = prev->avg.last_update_time; - n_last_update_time = next->avg.last_update_time; + p_last_update_time = prev->avg.last_update_time; + n_last_update_time = next->avg.last_update_time; #endif - __update_load_avg(p_last_update_time, cpu_of(rq_of(prev)), - &se->avg, 0, 0, NULL); - se->avg.last_update_time = n_last_update_time; - } + __update_load_avg_blocked_se(p_last_update_time, + cpu_of(rq_of(prev)), + &se->avg); + se->avg.last_update_time = n_last_update_time; } /* Take into account change of utilization of a child task group */ @@ -3329,9 +3353,9 @@ static inline void update_load_avg(struc * track group sched_entity load average for task_h_load calc in migration */ if (se->avg.last_update_time && !(flags & SKIP_AGE_LOAD)) { - __update_load_avg(now, cpu, &se->avg, + __update_load_avg_se(now, cpu, &se->avg, se->on_rq * scale_load_down(se->load.weight), - cfs_rq->curr == se, NULL); + cfs_rq->curr == se); } decayed = update_cfs_rq_load_avg(now, cfs_rq, true); @@ -3437,7 +3461,7 @@ void sync_entity_load_avg(struct sched_e u64 last_update_time; last_update_time = cfs_rq_last_update_time(cfs_rq); - __update_load_avg(last_update_time, cpu_of(rq_of(cfs_rq)), &se->avg, 0, 0, NULL); + __update_load_avg_blocked_se(last_update_time, cpu_of(rq_of(cfs_rq)), &se->avg); } /* ^ permalink raw reply [flat|nested] 30+ messages in thread
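[Peter's constant-propagation argument can be illustrated outside the kernel with a toy version of the pattern (hypothetical names; the real code is ___update_load_avg() and its wrappers): because the core function is always inlined, each wrapper is compiled with the cfs_rq argument known to be NULL or non-NULL, so the NULL check is decided at build time and disappears from the generated code.]

```c
#include <assert.h>
#include <stddef.h>

struct cfs_rq_toy { unsigned long load; };

/*
 * Toy stand-in for ___update_load_avg(): the "if (cfs_rq)" below
 * mirrors the branch under discussion.  Since the function is always
 * inlined, constant propagation folds the branch away per call site.
 */
static inline __attribute__((always_inline)) int
toy_update_core(unsigned long weight, struct cfs_rq_toy *cfs_rq)
{
	int decayed = weight > 0;

	if (cfs_rq)		/* resolved at compile time in each wrapper */
		cfs_rq->load += weight;

	return decayed;
}

/* Variant with constant NULL: the branch is compiled out entirely. */
static int toy_update_se(unsigned long weight)
{
	return toy_update_core(weight, NULL);
}

/* Variant with a real cfs_rq: the branch body is always taken. */
static int toy_update_cfs_rq(unsigned long weight, struct cfs_rq_toy *cfs_rq)
{
	return toy_update_core(weight, cfs_rq);
}
```

[Inspecting the generated assembly of the two wrappers (e.g. with gcc -O2 -S) shows no NULL test in either one, which is the property the kernel patch relies on.]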
* Re: [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event 2017-03-28 16:44 ` Peter Zijlstra @ 2017-03-28 16:57 ` Peter Zijlstra 2017-03-28 17:20 ` Patrick Bellasi 2017-03-28 17:36 ` Steven Rostedt 2017-03-29 21:03 ` Dietmar Eggemann 2 siblings, 1 reply; 30+ messages in thread From: Peter Zijlstra @ 2017-03-28 16:57 UTC (permalink / raw) To: Steven Rostedt Cc: Dietmar Eggemann, Ingo Molnar, LKML, Matt Fleming, Vincent Guittot, Morten Rasmussen, Juri Lelli, Patrick Bellasi On Tue, Mar 28, 2017 at 06:44:59PM +0200, Peter Zijlstra wrote: > --- > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -2849,7 +2849,7 @@ static u32 __compute_runnable_contrib(u6 > * = u_0 + u_1*y + u_2*y^2 + ... [re-labeling u_i --> u_{i+1}] > */ > static __always_inline int > -__update_load_avg(u64 now, int cpu, struct sched_avg *sa, > +___update_load_avg(u64 now, int cpu, struct sched_avg *sa, > unsigned long weight, int running, struct cfs_rq *cfs_rq) > { > u64 delta, scaled_delta, periods; > @@ -2953,6 +2953,26 @@ __update_load_avg(u64 now, int cpu, stru > return decayed; > } > > +static int > +__update_load_avg_blocked_se(u64 now, int cpu, struct sched_avg *sa) > +{ > + return ___update_load_avg(now, cpu, sa, 0, 0, NULL); > +} > + > +static int > +__update_load_avg_se(u64 now, int cpu, struct sched_avg *sa, > + unsigned long weight, int running) > +{ > + return ___update_load_avg(now, cpu, sa, weight, running, NULL); > +} > + > +static int > +__update_load_avg(u64 now, int cpu, struct sched_avg *sa, > + unsigned long weight, int running, struct cfs_rq *cfs_rq) > +{ > + return ___update_load_avg(now, cpu, sa, weight, running, cfs_rq); Although ideally we'd be able to tell the compiler that cfs_rq will not be NULL here. Hurmph.. no __builtin for that I think :/ > +} ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event 2017-03-28 16:57 ` Peter Zijlstra @ 2017-03-28 17:20 ` Patrick Bellasi 2017-03-28 18:18 ` Peter Zijlstra 0 siblings, 1 reply; 30+ messages in thread From: Patrick Bellasi @ 2017-03-28 17:20 UTC (permalink / raw) To: Peter Zijlstra Cc: Steven Rostedt, Dietmar Eggemann, Ingo Molnar, LKML, Matt Fleming, Vincent Guittot, Morten Rasmussen, Juri Lelli On 28-Mar 18:57, Peter Zijlstra wrote: > On Tue, Mar 28, 2017 at 06:44:59PM +0200, Peter Zijlstra wrote: > > --- > > --- a/kernel/sched/fair.c > > +++ b/kernel/sched/fair.c > > @@ -2849,7 +2849,7 @@ static u32 __compute_runnable_contrib(u6 > > * = u_0 + u_1*y + u_2*y^2 + ... [re-labeling u_i --> u_{i+1}] > > */ > > static __always_inline int > > -__update_load_avg(u64 now, int cpu, struct sched_avg *sa, > > +___update_load_avg(u64 now, int cpu, struct sched_avg *sa, > > unsigned long weight, int running, struct cfs_rq *cfs_rq) > > { > > u64 delta, scaled_delta, periods; > > @@ -2953,6 +2953,26 @@ __update_load_avg(u64 now, int cpu, stru > > return decayed; > > } > > > > +static int > > +__update_load_avg_blocked_se(u64 now, int cpu, struct sched_avg *sa) > > +{ > > + return ___update_load_avg(now, cpu, sa, 0, 0, NULL); > > +} > > + > > +static int > > +__update_load_avg_se(u64 now, int cpu, struct sched_avg *sa, > > + unsigned long weight, int running) > > +{ > > + return ___update_load_avg(now, cpu, sa, weight, running, NULL); > > +} > > + > > +static int > > +__update_load_avg(u64 now, int cpu, struct sched_avg *sa, > > + unsigned long weight, int running, struct cfs_rq *cfs_rq) __attribute__((nonnull (6))); > > +{ > > + return ___update_load_avg(now, cpu, sa, weight, running, cfs_rq); > > Although ideally we'd be able to tell the compiler that cfs_rq will not > be NULL here. Hurmph.. no __builtin for that I think :/ What about the above attribute? 
> > > +} -- #include <best/regards.h> Patrick Bellasi ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event 2017-03-28 17:20 ` Patrick Bellasi @ 2017-03-28 18:18 ` Peter Zijlstra 0 siblings, 0 replies; 30+ messages in thread From: Peter Zijlstra @ 2017-03-28 18:18 UTC (permalink / raw) To: Patrick Bellasi Cc: Steven Rostedt, Dietmar Eggemann, Ingo Molnar, LKML, Matt Fleming, Vincent Guittot, Morten Rasmussen, Juri Lelli On Tue, Mar 28, 2017 at 06:20:05PM +0100, Patrick Bellasi wrote: > On 28-Mar 18:57, Peter Zijlstra wrote: > > On Tue, Mar 28, 2017 at 06:44:59PM +0200, Peter Zijlstra wrote: > > > +static int > > > +__update_load_avg(u64 now, int cpu, struct sched_avg *sa, > > > + unsigned long weight, int running, struct cfs_rq *cfs_rq) > > __attribute__((nonnull (6))); > > > > +{ > > > + return ___update_load_avg(now, cpu, sa, weight, running, cfs_rq); > > > > Although ideally we'd be able to tell the compiler that cfs_rq will not > > be NULL here. Hurmph.. no __builtin for that I think :/ > > What about the above attribute? Ooh, shiny, thanks! My bad for failing to check the function attributes. ^ permalink raw reply [flat|nested] 30+ messages in thread
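[For reference, the GCC nonnull function attribute goes on the declaration and names the 1-based argument positions that must not be NULL; a minimal standalone example with toy names (not the kernel code) looks like this. The attribute lets the compiler warn at call sites passing a literal NULL (-Wnonnull) and may let it drop NULL checks inside the function.]

```c
#include <assert.h>
#include <stddef.h>

struct acc { unsigned long sum; };

/* Argument 1 is promised to be non-NULL at every call site. */
static void account(struct acc *a, unsigned long v)
	__attribute__((nonnull(1)));

static void account(struct acc *a, unsigned long v)
{
	/* No NULL check needed: the attribute documents the contract. */
	a->sum += v;
}
```

[Note the flip side discussed in compiler documentation: with the attribute in place, actually passing NULL is undefined behavior, since the compiler is allowed to optimize as if the pointer were valid.]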
* Re: [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event 2017-03-28 16:44 ` Peter Zijlstra 2017-03-28 16:57 ` Peter Zijlstra @ 2017-03-28 17:36 ` Steven Rostedt 2017-03-28 17:37 ` Steven Rostedt 2017-03-29 21:03 ` Dietmar Eggemann 2 siblings, 1 reply; 30+ messages in thread From: Steven Rostedt @ 2017-03-28 17:36 UTC (permalink / raw) To: Peter Zijlstra Cc: Dietmar Eggemann, Ingo Molnar, LKML, Matt Fleming, Vincent Guittot, Morten Rasmussen, Juri Lelli, Patrick Bellasi On Tue, 28 Mar 2017 18:44:59 +0200 Peter Zijlstra <peterz@infradead.org> wrote: > On Tue, Mar 28, 2017 at 10:46:00AM -0400, Steven Rostedt wrote: > > On Tue, 28 Mar 2017 07:35:38 +0100 > > Dietmar Eggemann <dietmar.eggemann@arm.com> wrote: > > > > > /* This part must be outside protection */ > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > > > index 03adf9fb48b1..ac19ab6ced8f 100644 > > > --- a/kernel/sched/fair.c > > > +++ b/kernel/sched/fair.c > > > @@ -2950,6 +2950,9 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa, > > > sa->util_avg = sa->util_sum / LOAD_AVG_MAX; > > > } > > > > > > + if (cfs_rq) > > > + trace_sched_load_cfs_rq(cfs_rq); > > > + > > > > Please use TRACE_EVENT_CONDITION(), and test for cfs_rq not NULL. > > I too suggested that; but then I looked again at that code and we can > actually do this. cfs_rq can be constant propagated and the if > determined at build time. > > Its not immediately obvious from the current code; but if we do > something like the below, it should be clearer. > But why play games, and rely on the design of the code? A TRACE_EVENT_CONDTION() is more robust and documents that this tracepoint should not be called when cfs_rq is NULL. -- Steve ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event 2017-03-28 17:36 ` Steven Rostedt @ 2017-03-28 17:37 ` Steven Rostedt 2017-03-29 20:37 ` Dietmar Eggemann 0 siblings, 1 reply; 30+ messages in thread From: Steven Rostedt @ 2017-03-28 17:37 UTC (permalink / raw) To: Peter Zijlstra Cc: Dietmar Eggemann, Ingo Molnar, LKML, Matt Fleming, Vincent Guittot, Morten Rasmussen, Juri Lelli, Patrick Bellasi On Tue, 28 Mar 2017 13:36:26 -0400 Steven Rostedt <rostedt@goodmis.org> wrote: > But why play games, and rely on the design of the code? A > TRACE_EVENT_CONDTION() is more robust and documents that this > tracepoint should not be called when cfs_rq is NULL. In other words, what are you trying to save for not using the TRACE_EVENT_CONDITION()? -- Steve ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event 2017-03-28 17:37 ` Steven Rostedt @ 2017-03-29 20:37 ` Dietmar Eggemann 0 siblings, 0 replies; 30+ messages in thread From: Dietmar Eggemann @ 2017-03-29 20:37 UTC (permalink / raw) To: Steven Rostedt, Peter Zijlstra Cc: Ingo Molnar, LKML, Matt Fleming, Vincent Guittot, Morten Rasmussen, Juri Lelli, Patrick Bellasi On 03/28/2017 07:37 PM, Steven Rostedt wrote: > On Tue, 28 Mar 2017 13:36:26 -0400 > Steven Rostedt <rostedt@goodmis.org> wrote: > >> But why play games, and rely on the design of the code? A >> TRACE_EVENT_CONDTION() is more robust and documents that this >> tracepoint should not be called when cfs_rq is NULL. > > In other words, what are you trying to save for not using the > TRACE_EVENT_CONDITION()? IMHO, if we could avoid this if(cfs_rq) trace_sched_load_cfs_rq(cfs_rq); else trace_sched_load_se(container_of(sa, struct sched_entity, avg)); in __update_load_avg(), then we can use 'unconditional' TRACE_EVENTs in all call-sites: __update_load_avg{_cfs_rq}(), propagate_entity_load_avg(), attach_entity_load_avg(), detach_entity_load_avg() for cfs_rq and __update_load_avg_blocked_se(), __update_load_avg_se(), propagate_entity_load_avg() for se. [...] ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event 2017-03-28 16:44 ` Peter Zijlstra 2017-03-28 16:57 ` Peter Zijlstra 2017-03-28 17:36 ` Steven Rostedt @ 2017-03-29 21:03 ` Dietmar Eggemann 2017-03-30 7:04 ` Peter Zijlstra 2 siblings, 1 reply; 30+ messages in thread From: Dietmar Eggemann @ 2017-03-29 21:03 UTC (permalink / raw) To: Peter Zijlstra, Steven Rostedt Cc: Ingo Molnar, LKML, Matt Fleming, Vincent Guittot, Morten Rasmussen, Juri Lelli, Patrick Bellasi On 03/28/2017 06:44 PM, Peter Zijlstra wrote: > On Tue, Mar 28, 2017 at 10:46:00AM -0400, Steven Rostedt wrote: >> On Tue, 28 Mar 2017 07:35:38 +0100 >> Dietmar Eggemann <dietmar.eggemann@arm.com> wrote: [...] > I too suggested that; but then I looked again at that code and we can > actually do this. cfs_rq can be constant-propagated and the if > determined at build time. > > It's not immediately obvious from the current code; but if we do > something like the below, it should be clearer. > > --- > Subject: sched/fair: Explicitly generate __update_load_avg() instances > From: Peter Zijlstra <peterz@infradead.org> > Date: Tue Mar 28 11:08:20 CEST 2017 > > The __update_load_avg() function is an __always_inline because it's > used with constant propagation to generate different variants of the > code without having to duplicate it (which would be prone to bugs). Ah, so the if(cfs_rq)/else condition should stay in ___update_load_avg() and I shouldn't move the trace events into the 3 variants? I tried to verify that the if is determined at build time but it's kind of hard with trace_events. > Explicitly instantiate the 3 variants. > > Note that most of this is called from rather hot paths, so reducing > branches is good. > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> > --- > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -2849,7 +2849,7 @@ static u32 __compute_runnable_contrib(u6 > * = u_0 + u_1*y + u_2*y^2 + ... 
[re-labeling u_i --> u_{i+1}] > */ > static __always_inline int > -__update_load_avg(u64 now, int cpu, struct sched_avg *sa, > +___update_load_avg(u64 now, int cpu, struct sched_avg *sa, > unsigned long weight, int running, struct cfs_rq *cfs_rq) > { > u64 delta, scaled_delta, periods; > @@ -2953,6 +2953,26 @@ __update_load_avg(u64 now, int cpu, stru > return decayed; > } > > +static int > +__update_load_avg_blocked_se(u64 now, int cpu, struct sched_avg *sa) > +{ > + return ___update_load_avg(now, cpu, sa, 0, 0, NULL); > +} > + > +static int > +__update_load_avg_se(u64 now, int cpu, struct sched_avg *sa, > + unsigned long weight, int running) > +{ > + return ___update_load_avg(now, cpu, sa, weight, running, NULL); > +} > + > +static int > +__update_load_avg(u64 now, int cpu, struct sched_avg *sa, > + unsigned long weight, int running, struct cfs_rq *cfs_rq) > +{ > + return ___update_load_avg(now, cpu, sa, weight, running, cfs_rq); > +} Why not reduce the parameter list of these 3 incarnations to 'now, cpu, object'? static int __update_load_avg_blocked_se(u64 now, int cpu, struct sched_entity *se) static int __update_load_avg_se(u64 now, int cpu, struct sched_entity *se) static int __update_load_avg_cfs_rq(u64 now, int cpu, struct cfs_rq *cfs_rq) [...] ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event 2017-03-29 21:03 ` Dietmar Eggemann @ 2017-03-30 7:04 ` Peter Zijlstra 2017-03-30 7:46 ` Dietmar Eggemann 0 siblings, 1 reply; 30+ messages in thread From: Peter Zijlstra @ 2017-03-30 7:04 UTC (permalink / raw) To: Dietmar Eggemann Cc: Steven Rostedt, Ingo Molnar, LKML, Matt Fleming, Vincent Guittot, Morten Rasmussen, Juri Lelli, Patrick Bellasi On Wed, Mar 29, 2017 at 11:03:45PM +0200, Dietmar Eggemann wrote: > > +static int > > +__update_load_avg_blocked_se(u64 now, int cpu, struct sched_avg *sa) > > +{ > > + return ___update_load_avg(now, cpu, sa, 0, 0, NULL); > > +} > > + > > +static int > > +__update_load_avg_se(u64 now, int cpu, struct sched_avg *sa, > > + unsigned long weight, int running) > > +{ > > + return ___update_load_avg(now, cpu, sa, weight, running, NULL); > > +} > > + > > +static int > > +__update_load_avg(u64 now, int cpu, struct sched_avg *sa, > > + unsigned long weight, int running, struct cfs_rq *cfs_rq) > > +{ > > + return ___update_load_avg(now, cpu, sa, weight, running, cfs_rq); > > +} > > Why not reduce the parameter list of these 3 incarnations to 'now, cpu, > object'? > > static int > __update_load_avg_blocked_se(u64 now, int cpu, struct sched_entity *se) > > static int > __update_load_avg_se(u64 now, int cpu, struct sched_entity *se) > > static int > __update_load_avg_cfs_rq(u64 now, int cpu, struct cfs_rq *cfs_rq) > > [...] doesn't quite work with se, but yes good idea. And this way we don't need the nonnull attribute either, because it should be clear from having dereferenced it that it cannot be null. --- --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -2849,7 +2849,7 @@ static u32 __compute_runnable_contrib(u6 * = u_0 + u_1*y + u_2*y^2 + ... 
[re-labeling u_i --> u_{i+1}] */ static __always_inline int -__update_load_avg(u64 now, int cpu, struct sched_avg *sa, +___update_load_avg(u64 now, int cpu, struct sched_avg *sa, unsigned long weight, int running, struct cfs_rq *cfs_rq) { u64 delta, scaled_delta, periods; @@ -2953,6 +2953,28 @@ __update_load_avg(u64 now, int cpu, stru return decayed; } +static int +__update_load_avg_blocked_se(u64 now, int cpu, struct sched_entity *se) +{ + return ___update_load_avg(now, cpu, &se->avg, 0, 0, NULL); +} + +static int +__update_load_avg_se(u64 now, int cpu, struct cfs_rq *cfs_rq, struct sched_entity *se) +{ + return ___update_load_avg(now, cpu, &se->avg, + se->on_rq * scale_load_down(se->load.weight), + cfs_rq->curr == se, NULL); +} + +static int +__update_load_avg_cfs_rq(u64 now, int cpu, struct cfs_rq *cfs_rq) +{ + return ___update_load_avg(now, cpu, &cfs_rq->avg, + scale_load_down(cfs_rq->load.weight), + cfs_rq->curr != NULL, cfs_rq); +} + /* * Signed add and clamp on underflow. * @@ -3014,6 +3036,9 @@ static inline void update_tg_load_avg(st void set_task_rq_fair(struct sched_entity *se, struct cfs_rq *prev, struct cfs_rq *next) { + u64 p_last_update_time; + u64 n_last_update_time; + if (!sched_feat(ATTACH_AGE_LOAD)) return; @@ -3024,11 +3049,11 @@ void set_task_rq_fair(struct sched_entit * time. This will result in the wakee task is less decayed, but giving * the wakee more load sounds not bad. 
*/ - if (se->avg.last_update_time && prev) { - u64 p_last_update_time; - u64 n_last_update_time; + if (!(se->avg.last_update_time && prev)) + return; #ifndef CONFIG_64BIT + { u64 p_last_update_time_copy; u64 n_last_update_time_copy; @@ -3043,14 +3068,13 @@ void set_task_rq_fair(struct sched_entit } while (p_last_update_time != p_last_update_time_copy || n_last_update_time != n_last_update_time_copy); + } #else - p_last_update_time = prev->avg.last_update_time; - n_last_update_time = next->avg.last_update_time; + p_last_update_time = prev->avg.last_update_time; + n_last_update_time = next->avg.last_update_time; #endif - __update_load_avg(p_last_update_time, cpu_of(rq_of(prev)), - &se->avg, 0, 0, NULL); - se->avg.last_update_time = n_last_update_time; - } + __update_load_avg_blocked_se(p_last_update_time, cpu_of(rq_of(prev)), se); + se->avg.last_update_time = n_last_update_time; } /* Take into account change of utilization of a child task group */ @@ -3295,8 +3319,7 @@ update_cfs_rq_load_avg(u64 now, struct c set_tg_cfs_propagate(cfs_rq); } - decayed = __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa, - scale_load_down(cfs_rq->load.weight), cfs_rq->curr != NULL, cfs_rq); + decayed = __update_load_avg_cfs_rq(now, cpu_of(rq_of(cfs_rq)), cfs_rq); #ifndef CONFIG_64BIT smp_wmb(); @@ -3328,11 +3351,8 @@ static inline void update_load_avg(struc * Track task load average for carrying it to new CPU after migrated, and * track group sched_entity load average for task_h_load calc in migration */ - if (se->avg.last_update_time && !(flags & SKIP_AGE_LOAD)) { - __update_load_avg(now, cpu, &se->avg, - se->on_rq * scale_load_down(se->load.weight), - cfs_rq->curr == se, NULL); - } + if (se->avg.last_update_time && !(flags & SKIP_AGE_LOAD)) + __update_load_avg_se(now, cpu, cfs_rq, se); decayed = update_cfs_rq_load_avg(now, cfs_rq, true); decayed |= propagate_entity_load_avg(se); @@ -3437,7 +3457,7 @@ void sync_entity_load_avg(struct sched_e u64 last_update_time; last_update_time = 
cfs_rq_last_update_time(cfs_rq); - __update_load_avg(last_update_time, cpu_of(rq_of(cfs_rq)), &se->avg, 0, 0, NULL); + __update_load_avg_blocked_se(last_update_time, cpu_of(rq_of(cfs_rq)), se); } /* ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event 2017-03-30 7:04 ` Peter Zijlstra @ 2017-03-30 7:46 ` Dietmar Eggemann 0 siblings, 0 replies; 30+ messages in thread From: Dietmar Eggemann @ 2017-03-30 7:46 UTC (permalink / raw) To: Peter Zijlstra Cc: Steven Rostedt, Ingo Molnar, LKML, Matt Fleming, Vincent Guittot, Morten Rasmussen, Juri Lelli, Patrick Bellasi On 03/30/2017 09:04 AM, Peter Zijlstra wrote: > On Wed, Mar 29, 2017 at 11:03:45PM +0200, Dietmar Eggemann wrote: [...] >> Why not reduce the parameter list of these 3 incarnations to 'now, cpu, >> object'? >> >> static int >> __update_load_avg_blocked_se(u64 now, int cpu, struct sched_entity *se) >> >> static int >> __update_load_avg_se(u64 now, int cpu, struct sched_entity *se) >> >> static int >> __update_load_avg_cfs_rq(u64 now, int cpu, struct cfs_rq *cfs_rq) >> >> [...] > > doesn't quite work with se, but yes good idea. Ah, OK, you don't like to use 'cfs_rq_of(se)->curr == se' in __update_load_avg_se(). The reason is that it's already fetched in update_load_avg()? > And this way we don't need the nonnull attribute either, because it > should be clear from having dereferenced it that it cannot be null. Yes, this would be clearer now. [...] ^ permalink raw reply [flat|nested] 30+ messages in thread
* [RFC PATCH 3/5] sched/fair: Export group_cfs_rq() 2017-03-28 6:35 [RFC PATCH 0/5] CFS load tracking trace events Dietmar Eggemann 2017-03-28 6:35 ` [RFC PATCH 1/5] sched/autogroup: Define autogroup_path() for !CONFIG_SCHED_DEBUG Dietmar Eggemann 2017-03-28 6:35 ` [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event Dietmar Eggemann @ 2017-03-28 6:35 ` Dietmar Eggemann 2017-03-28 6:35 ` [RFC PATCH 4/5] sched/events: Introduce sched_entity load tracking trace event Dietmar Eggemann ` (2 subsequent siblings) 5 siblings, 0 replies; 30+ messages in thread From: Dietmar Eggemann @ 2017-03-28 6:35 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar Cc: LKML, Matt Fleming, Vincent Guittot, Steven Rostedt, Morten Rasmussen, Juri Lelli, Patrick Bellasi Export struct cfs_rq *group_cfs_rq(struct sched_entity *se) to be able to distinguish sched_entities representing either tasks or task_groups in the sched_entity related load tracking trace event provided by the next patch. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> --- include/linux/sched.h | 10 ++++++++++ kernel/sched/fair.c | 12 ------------ 2 files changed, 10 insertions(+), 12 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index d67eee84fd43..8a35ff99140b 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -392,6 +392,16 @@ struct sched_entity { #endif }; +/* cfs_rq "owned" by this sched_entity */ +static inline struct cfs_rq *group_cfs_rq(struct sched_entity *se) +{ +#ifdef CONFIG_FAIR_GROUP_SCHED + return se->my_q; +#else + return NULL; +#endif +} + struct sched_rt_entity { struct list_head run_list; unsigned long timeout; diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index ac19ab6ced8f..04d4f81b96ae 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -292,12 +292,6 @@ static inline struct cfs_rq *cfs_rq_of(struct sched_entity *se) return 
se->cfs_rq; } -/* runqueue "owned" by this group */ -static inline struct cfs_rq *group_cfs_rq(struct sched_entity *grp) -{ - return grp->my_q; -} - static inline void list_add_leaf_cfs_rq(struct cfs_rq *cfs_rq) { if (!cfs_rq->on_list) { @@ -449,12 +443,6 @@ static inline struct cfs_rq *cfs_rq_of(struct sched_entity *se) return &rq->cfs; } -/* runqueue "owned" by this group */ -static inline struct cfs_rq *group_cfs_rq(struct sched_entity *grp) -{ - return NULL; -} - static inline void list_add_leaf_cfs_rq(struct cfs_rq *cfs_rq) { } -- 2.11.0 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* [RFC PATCH 4/5] sched/events: Introduce sched_entity load tracking trace event 2017-03-28 6:35 [RFC PATCH 0/5] CFS load tracking trace events Dietmar Eggemann ` (2 preceding siblings ...) 2017-03-28 6:35 ` [RFC PATCH 3/5] sched/fair: Export group_cfs_rq() Dietmar Eggemann @ 2017-03-28 6:35 ` Dietmar Eggemann 2017-03-28 8:05 ` Peter Zijlstra 2017-03-28 8:08 ` Peter Zijlstra 2017-03-28 6:35 ` [RFC PATCH 5/5] sched/events: Introduce task_group " Dietmar Eggemann 2017-03-28 10:05 ` [RFC PATCH 0/5] CFS load tracking trace events Vincent Guittot 5 siblings, 2 replies; 30+ messages in thread From: Dietmar Eggemann @ 2017-03-28 6:35 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar Cc: LKML, Matt Fleming, Vincent Guittot, Steven Rostedt, Morten Rasmussen, Juri Lelli, Patrick Bellasi The trace event keys load and util (utilization) are mapped to: (1) load : se->avg.load_avg (2) util : se->avg.util_avg To let this trace event work for configurations w/ and w/o group scheduling support for cfs (CONFIG_FAIR_GROUP_SCHED) the following special handling is necessary for non-existent key=value pairs: path = "(null)" : In case of !CONFIG_FAIR_GROUP_SCHED or the sched_entity represents a task. id = -1 : In case of !CONFIG_FAIR_GROUP_SCHED or the sched_entity represents a task. comm = "(null)" : In case sched_entity represents a task_group. pid = -1 : In case sched_entity represents a task_group. The following list shows examples of the key=value pairs in different configurations for: (1) a task: cpu=0 path=(null) id=-1 comm=sshd pid=2206 load=102 util=102 (2) a taskgroup: cpu=1 path=/tg1/tg11/tg111 id=4 comm=(null) pid=-1 load=882 util=510 (3) an autogroup: cpu=0 path=/autogroup-13 id=0 comm=(null) pid=-1 load=49 util=48 (4) w/o CONFIG_FAIR_GROUP_SCHED: cpu=0 path=(null) id=-1 comm=sshd pid=2211 load=301 util=265 The trace event is only defined for CONFIG_SMP. 
The helper functions __trace_sched_cpu(), __trace_sched_path() and __trace_sched_id() are extended to deal with sched_entities as well. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> --- include/trace/events/sched.h | 63 +++++++++++++++++++++++++++++++++++--------- kernel/sched/fair.c | 3 +++ 2 files changed, 54 insertions(+), 12 deletions(-) diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h index 51db8a90e45f..647cfaf528fd 100644 --- a/include/trace/events/sched.h +++ b/include/trace/events/sched.h @@ -566,14 +566,15 @@ TRACE_EVENT(sched_wake_idle_without_ipi, #ifdef CONFIG_SMP #ifdef CREATE_TRACE_POINTS static inline -int __trace_sched_cpu(struct cfs_rq *cfs_rq) +int __trace_sched_cpu(struct cfs_rq *cfs_rq, struct sched_entity *se) { #ifdef CONFIG_FAIR_GROUP_SCHED - struct rq *rq = cfs_rq->rq; + struct rq *rq = cfs_rq ? cfs_rq->rq : NULL; #else - struct rq *rq = container_of(cfs_rq, struct rq, cfs); + struct rq *rq = cfs_rq ? container_of(cfs_rq, struct rq, cfs) : NULL; #endif - return cpu_of(rq); + return rq ? cpu_of(rq) + : task_cpu((container_of(se, struct task_struct, se))); } static inline @@ -582,25 +583,24 @@ int __trace_sched_path(struct cfs_rq *cfs_rq, char *path, int len) #ifdef CONFIG_FAIR_GROUP_SCHED int l = path ? 
len : 0; - if (task_group_is_autogroup(cfs_rq->tg)) + if (cfs_rq && task_group_is_autogroup(cfs_rq->tg)) return autogroup_path(cfs_rq->tg, path, l) + 1; - else + else if (cfs_rq && cfs_rq->tg->css.cgroup) return cgroup_path(cfs_rq->tg->css.cgroup, path, l) + 1; -#else +#endif if (path) strcpy(path, "(null)"); return strlen("(null)"); -#endif } static inline int __trace_sched_id(struct cfs_rq *cfs_rq) { #ifdef CONFIG_FAIR_GROUP_SCHED - return cfs_rq->tg->css.id; -#else - return -1; + if (cfs_rq) + return cfs_rq->tg->css.id; #endif + return -1; } #endif /* CREATE_TRACE_POINTS */ @@ -623,7 +623,7 @@ TRACE_EVENT(sched_load_cfs_rq, ), TP_fast_assign( - __entry->cpu = __trace_sched_cpu(cfs_rq); + __entry->cpu = __trace_sched_cpu(cfs_rq, NULL); __trace_sched_path(cfs_rq, __get_dynamic_array(path), __get_dynamic_array_len(path)); __entry->id = __trace_sched_id(cfs_rq); @@ -634,6 +634,45 @@ TRACE_EVENT(sched_load_cfs_rq, TP_printk("cpu=%d path=%s id=%d load=%lu util=%lu", __entry->cpu, __get_str(path), __entry->id, __entry->load, __entry->util) ); + +/* + * Tracepoint for sched_entity load tracking: + */ +TRACE_EVENT(sched_load_se, + + TP_PROTO(struct sched_entity *se), + + TP_ARGS(se), + + TP_STRUCT__entry( + __field( int, cpu ) + __dynamic_array(char, path, + __trace_sched_path(group_cfs_rq(se), NULL, 0) ) + __field( int, id ) + __array( char, comm, TASK_COMM_LEN ) + __field( pid_t, pid ) + __field( unsigned long, load ) + __field( unsigned long, util ) + ), + + TP_fast_assign( + struct task_struct *p = group_cfs_rq(se) ? NULL + : container_of(se, struct task_struct, se); + + __entry->cpu = __trace_sched_cpu(group_cfs_rq(se), se); + __trace_sched_path(group_cfs_rq(se), __get_dynamic_array(path), + __get_dynamic_array_len(path)); + __entry->id = __trace_sched_id(group_cfs_rq(se)); + memcpy(__entry->comm, p ? p->comm : "(null)", TASK_COMM_LEN); + __entry->pid = p ? 
p->pid : -1; + __entry->load = se->avg.load_avg; + __entry->util = se->avg.util_avg; + ), + + TP_printk("cpu=%d path=%s id=%d comm=%s pid=%d load=%lu util=%lu", + __entry->cpu, __get_str(path), __entry->id, __entry->comm, + __entry->pid, __entry->load, __entry->util) +); #endif /* CONFIG_SMP */ #endif /* _TRACE_SCHED_H */ diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 04d4f81b96ae..d1dcb19f5b55 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -2940,6 +2940,8 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa, if (cfs_rq) trace_sched_load_cfs_rq(cfs_rq); + else + trace_sched_load_se(container_of(sa, struct sched_entity, avg)); return decayed; } @@ -3162,6 +3164,7 @@ static inline int propagate_entity_load_avg(struct sched_entity *se) update_tg_cfs_load(cfs_rq, se); trace_sched_load_cfs_rq(cfs_rq); + trace_sched_load_se(se); return 1; } -- 2.11.0 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 4/5] sched/events: Introduce sched_entity load tracking trace event 2017-03-28 6:35 ` [RFC PATCH 4/5] sched/events: Introduce sched_entity load tracking trace event Dietmar Eggemann @ 2017-03-28 8:05 ` Peter Zijlstra 2017-03-28 14:01 ` Dietmar Eggemann 2017-03-28 14:03 ` Dietmar Eggemann 2017-03-28 8:08 ` Peter Zijlstra 1 sibling, 2 replies; 30+ messages in thread From: Peter Zijlstra @ 2017-03-28 8:05 UTC (permalink / raw) To: Dietmar Eggemann Cc: Ingo Molnar, LKML, Matt Fleming, Vincent Guittot, Steven Rostedt, Morten Rasmussen, Juri Lelli, Patrick Bellasi On Tue, Mar 28, 2017 at 07:35:40AM +0100, Dietmar Eggemann wrote: > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 04d4f81b96ae..d1dcb19f5b55 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -2940,6 +2940,8 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa, > > if (cfs_rq) > trace_sched_load_cfs_rq(cfs_rq); > + else > + trace_sched_load_se(container_of(sa, struct sched_entity, avg)); > > return decayed; > } > @@ -3162,6 +3164,7 @@ static inline int propagate_entity_load_avg(struct sched_entity *se) > update_tg_cfs_load(cfs_rq, se); > > trace_sched_load_cfs_rq(cfs_rq); > + trace_sched_load_se(se); > > return 1; > } Having back-to-back tracepoints is disgusting. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 4/5] sched/events: Introduce sched_entity load tracking trace event 2017-03-28 8:05 ` Peter Zijlstra @ 2017-03-28 14:01 ` Dietmar Eggemann 2017-03-28 14:03 ` Dietmar Eggemann 1 sibling, 0 replies; 30+ messages in thread From: Dietmar Eggemann @ 2017-03-28 14:01 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, LKML, Matt Fleming, Vincent Guittot, Steven Rostedt, Morten Rasmussen, Juri Lelli, Patrick Bellasi On 03/28/2017 10:05 AM, Peter Zijlstra wrote: > On Tue, Mar 28, 2017 at 07:35:40AM +0100, Dietmar Eggemann wrote: >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> index 04d4f81b96ae..d1dcb19f5b55 100644 >> --- a/kernel/sched/fair.c >> +++ b/kernel/sched/fair.c >> @@ -2940,6 +2940,8 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa, >> >> if (cfs_rq) >> trace_sched_load_cfs_rq(cfs_rq); >> + else >> + trace_sched_load_se(container_of(sa, struct sched_entity, avg)); >> >> return decayed; >> } >> @@ -3162,6 +3164,7 @@ static inline int propagate_entity_load_avg(struct sched_entity *se) >> update_tg_cfs_load(cfs_rq, se); >> >> trace_sched_load_cfs_rq(cfs_rq); >> + trace_sched_load_se(se); >> >> return 1; >> } > > Having back-to-back tracepoints is disgusting. > Yeah, avoiding putting them like this is hard since update_tg_cfs_util()/update_tg_cfs_load() refresh util for the cfs_rq and the se respectively load/runnable_load. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 4/5] sched/events: Introduce sched_entity load tracking trace event 2017-03-28 8:05 ` Peter Zijlstra 2017-03-28 14:01 ` Dietmar Eggemann @ 2017-03-28 14:03 ` Dietmar Eggemann 1 sibling, 0 replies; 30+ messages in thread From: Dietmar Eggemann @ 2017-03-28 14:03 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, LKML, Matt Fleming, Vincent Guittot, Steven Rostedt, Morten Rasmussen, Juri Lelli, Patrick Bellasi On 03/28/2017 10:05 AM, Peter Zijlstra wrote: > On Tue, Mar 28, 2017 at 07:35:40AM +0100, Dietmar Eggemann wrote: >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> index 04d4f81b96ae..d1dcb19f5b55 100644 >> --- a/kernel/sched/fair.c >> +++ b/kernel/sched/fair.c >> @@ -2940,6 +2940,8 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa, >> >> if (cfs_rq) >> trace_sched_load_cfs_rq(cfs_rq); >> + else >> + trace_sched_load_se(container_of(sa, struct sched_entity, avg)); >> >> return decayed; >> } >> @@ -3162,6 +3164,7 @@ static inline int propagate_entity_load_avg(struct sched_entity *se) >> update_tg_cfs_load(cfs_rq, se); >> >> trace_sched_load_cfs_rq(cfs_rq); >> + trace_sched_load_se(se); >> >> return 1; >> } > > Having back-to-back tracepoints is disgusting. Yeah, but avoiding putting them like this is hard since the calls to update_tg_cfs_util() and update_tg_cfs_load() inside propagate_entity_load_avg() refresh util for the cfs_rq and the se respectively load (and runnable_load). ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 4/5] sched/events: Introduce sched_entity load tracking trace event 2017-03-28 6:35 ` [RFC PATCH 4/5] sched/events: Introduce sched_entity load tracking trace event Dietmar Eggemann 2017-03-28 8:05 ` Peter Zijlstra @ 2017-03-28 8:08 ` Peter Zijlstra 2017-03-28 14:13 ` Dietmar Eggemann 1 sibling, 1 reply; 30+ messages in thread From: Peter Zijlstra @ 2017-03-28 8:08 UTC (permalink / raw) To: Dietmar Eggemann Cc: Ingo Molnar, LKML, Matt Fleming, Vincent Guittot, Steven Rostedt, Morten Rasmussen, Juri Lelli, Patrick Bellasi On Tue, Mar 28, 2017 at 07:35:40AM +0100, Dietmar Eggemann wrote: > diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h > index 51db8a90e45f..647cfaf528fd 100644 > --- a/include/trace/events/sched.h > +++ b/include/trace/events/sched.h > @@ -566,14 +566,15 @@ TRACE_EVENT(sched_wake_idle_without_ipi, > #ifdef CONFIG_SMP > #ifdef CREATE_TRACE_POINTS > static inline > -int __trace_sched_cpu(struct cfs_rq *cfs_rq) > +int __trace_sched_cpu(struct cfs_rq *cfs_rq, struct sched_entity *se) > { > #ifdef CONFIG_FAIR_GROUP_SCHED > - struct rq *rq = cfs_rq->rq; > + struct rq *rq = cfs_rq ? cfs_rq->rq : NULL; > #else > - struct rq *rq = container_of(cfs_rq, struct rq, cfs); > + struct rq *rq = cfs_rq ? container_of(cfs_rq, struct rq, cfs) : NULL; > #endif > - return cpu_of(rq); > + return rq ? cpu_of(rq) > + : task_cpu((container_of(se, struct task_struct, se))); > } So here you duplicate lots of FAIR_GROUP internals. So why do you then have to expose group_cfs_rq() in the previous patch? ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 4/5] sched/events: Introduce sched_entity load tracking trace event 2017-03-28 8:08 ` Peter Zijlstra @ 2017-03-28 14:13 ` Dietmar Eggemann 2017-03-28 16:41 ` Peter Zijlstra 0 siblings, 1 reply; 30+ messages in thread From: Dietmar Eggemann @ 2017-03-28 14:13 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, LKML, Matt Fleming, Vincent Guittot, Steven Rostedt, Morten Rasmussen, Juri Lelli, Patrick Bellasi On 03/28/2017 10:08 AM, Peter Zijlstra wrote: > On Tue, Mar 28, 2017 at 07:35:40AM +0100, Dietmar Eggemann wrote: >> diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h >> index 51db8a90e45f..647cfaf528fd 100644 >> --- a/include/trace/events/sched.h >> +++ b/include/trace/events/sched.h >> @@ -566,14 +566,15 @@ TRACE_EVENT(sched_wake_idle_without_ipi, >> #ifdef CONFIG_SMP >> #ifdef CREATE_TRACE_POINTS >> static inline >> -int __trace_sched_cpu(struct cfs_rq *cfs_rq) >> +int __trace_sched_cpu(struct cfs_rq *cfs_rq, struct sched_entity *se) >> { >> #ifdef CONFIG_FAIR_GROUP_SCHED >> - struct rq *rq = cfs_rq->rq; >> + struct rq *rq = cfs_rq ? cfs_rq->rq : NULL; >> #else >> - struct rq *rq = container_of(cfs_rq, struct rq, cfs); >> + struct rq *rq = cfs_rq ? container_of(cfs_rq, struct rq, cfs) : NULL; >> #endif >> - return cpu_of(rq); >> + return rq ? cpu_of(rq) >> + : task_cpu((container_of(se, struct task_struct, se))); >> } > > So here you duplicate lots of FAIR_GROUP internals. So why do you then > have to expose group_cfs_rq() in the previous patch? > Not having group_cfs_rq() available made the trace event code too confusing. But like I mentioned in the second to last paragraph in the cover letter, having all necessary cfs accessor-functions (rq_of(), task_of(), etc.) available would definitely streamline the coding effort of these trace events. Do you think that making them public in include/linux/sched.h is the way to go? What about the namespace issue with other sched classes? 
Should they be exported with the name they have right now (since cfs was there first) or should they be renamed to cfs_task_of() and rq_of_cfs_rq() etc. ? RT and Deadline class already have the own (private) accessor-functions (e.g. dl_task_of() or rq_of_dl_rq()). ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 4/5] sched/events: Introduce sched_entity load tracking trace event 2017-03-28 14:13 ` Dietmar Eggemann @ 2017-03-28 16:41 ` Peter Zijlstra 2017-03-29 20:19 ` Dietmar Eggemann 0 siblings, 1 reply; 30+ messages in thread From: Peter Zijlstra @ 2017-03-28 16:41 UTC (permalink / raw) To: Dietmar Eggemann Cc: Ingo Molnar, LKML, Matt Fleming, Vincent Guittot, Steven Rostedt, Morten Rasmussen, Juri Lelli, Patrick Bellasi On Tue, Mar 28, 2017 at 04:13:45PM +0200, Dietmar Eggemann wrote: > Do you think that making them public in include/linux/sched.h is the way to > go? No; all that stuff should really stay private. tracepoints are a very bad reason to leak this stuff. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 4/5] sched/events: Introduce sched_entity load tracking trace event 2017-03-28 16:41 ` Peter Zijlstra @ 2017-03-29 20:19 ` Dietmar Eggemann 0 siblings, 0 replies; 30+ messages in thread From: Dietmar Eggemann @ 2017-03-29 20:19 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, LKML, Matt Fleming, Vincent Guittot, Steven Rostedt, Morten Rasmussen, Juri Lelli, Patrick Bellasi On 03/28/2017 06:41 PM, Peter Zijlstra wrote: > On Tue, Mar 28, 2017 at 04:13:45PM +0200, Dietmar Eggemann wrote: > >> Do you think that making them public in include/linux/sched.h is the way to >> go? > > No; all that stuff should really stay private. tracepoints are a very > bad reason to leak this stuff. Understood & makes sense to me. In hindsight, it's not too complicated to code group_cfs_rq in include/trace/events/sched.h. ^ permalink raw reply [flat|nested] 30+ messages in thread
* [RFC PATCH 5/5] sched/events: Introduce task_group load tracking trace event 2017-03-28 6:35 [RFC PATCH 0/5] CFS load tracking trace events Dietmar Eggemann ` (3 preceding siblings ...) 2017-03-28 6:35 ` [RFC PATCH 4/5] sched/events: Introduce sched_entity load tracking trace event Dietmar Eggemann @ 2017-03-28 6:35 ` Dietmar Eggemann 2017-03-28 10:05 ` [RFC PATCH 0/5] CFS load tracking trace events Vincent Guittot 5 siblings, 0 replies; 30+ messages in thread From: Dietmar Eggemann @ 2017-03-28 6:35 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar Cc: LKML, Matt Fleming, Vincent Guittot, Steven Rostedt, Morten Rasmussen, Juri Lelli, Patrick Bellasi The trace event key load is mapped to: (1) load : cfs_rq->tg->load_avg The cfs_rq owned by the task_group is used as the only parameter for the trace event because it has a reference to the taskgroup and the cpu. Using the taskgroup as a parameter instead would require the cpu as a second parameter. A task_group is global and not per-cpu data. The cpu key only tells on which cpu the value was gathered. The following list shows examples of the key=value pairs for: (1) a task group: cpu=1 path=/tg1/tg11/tg111 id=4 load=517 (2) an autogroup: cpu=1 path=/autogroup-10 id=0 load=1050 We don't maintain a load signal for a root task group. The trace event is only defined if cfs group scheduling support (CONFIG_FAIR_GROUP_SCHED) is enabled. 
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> --- include/trace/events/sched.h | 31 +++++++++++++++++++++++++++++++ kernel/sched/fair.c | 2 ++ 2 files changed, 33 insertions(+) diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h index 647cfaf528fd..3fe0092176f8 100644 --- a/include/trace/events/sched.h +++ b/include/trace/events/sched.h @@ -673,6 +673,37 @@ TRACE_EVENT(sched_load_se, __entry->cpu, __get_str(path), __entry->id, __entry->comm, __entry->pid, __entry->load, __entry->util) ); + +/* + * Tracepoint for task_group load tracking: + */ +#ifdef CONFIG_FAIR_GROUP_SCHED +TRACE_EVENT(sched_load_tg, + + TP_PROTO(struct cfs_rq *cfs_rq), + + TP_ARGS(cfs_rq), + + TP_STRUCT__entry( + __field( int, cpu ) + __dynamic_array(char, path, + __trace_sched_path(cfs_rq, NULL, 0) ) + __field( int, id ) + __field( long, load ) + ), + + TP_fast_assign( + __entry->cpu = cfs_rq->rq->cpu; + __trace_sched_path(cfs_rq, __get_dynamic_array(path), + __get_dynamic_array_len(path)); + __entry->id = cfs_rq->tg->css.id; + __entry->load = atomic_long_read(&cfs_rq->tg->load_avg); + ), + + TP_printk("cpu=%d path=%s id=%d load=%ld", __entry->cpu, + __get_str(path), __entry->id, __entry->load) +); +#endif /* CONFIG_FAIR_GROUP_SCHED */ #endif /* CONFIG_SMP */ #endif /* _TRACE_SCHED_H */ diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index d1dcb19f5b55..dbe2d5ef8b9e 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -2997,6 +2997,8 @@ static inline void update_tg_load_avg(struct cfs_rq *cfs_rq, int force) atomic_long_add(delta, &cfs_rq->tg->load_avg); cfs_rq->tg_load_avg_contrib = cfs_rq->avg.load_avg; } + + trace_sched_load_tg(cfs_rq); } /* -- 2.11.0 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 0/5] CFS load tracking trace events
  2017-03-28 10:05 ` Vincent Guittot
  2017-03-28 13:45 ` Dietmar Eggemann

From: Vincent Guittot @ 2017-03-28 10:05 UTC (permalink / raw)
To: Dietmar Eggemann
Cc: Peter Zijlstra, Ingo Molnar, LKML, Matt Fleming, Steven Rostedt,
    Morten Rasmussen, Juri Lelli, Patrick Bellasi

On 28 March 2017 at 08:35, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> This patch-set introduces trace events for load (and utilization)
> tracking for the following three cfs scheduler bricks: cfs_rq,
> sched_entity and task_group.

[...]

> The following keys are used to identify the cfs scheduler brick:
>
> (1) Cpu number the cfs scheduler brick is attached to.
>
> (2) Task_group path and (css) id.
>
> (3) Task name and pid.

Do you really need both path/name and id/pid ?

The path/name looks quite intrusive so can't we just use id/pid ?

[...]

> This version bases on tip/sched/core as of yesterday (bc4278987e38). It
> has been compile tested on ~160 configurations via 0day's kbuild test
> robot.
>
> Dietmar Eggemann (5):
>   sched/autogroup: Define autogroup_path() for !CONFIG_SCHED_DEBUG
>   sched/events: Introduce cfs_rq load tracking trace event
>   sched/fair: Export group_cfs_rq()
>   sched/events: Introduce sched_entity load tracking trace event
>   sched/events: Introduce task_group load tracking trace event
>
>  include/linux/sched.h        |  10 +++
>  include/trace/events/sched.h | 143 +++++++++++++++++++++++++++++++++++++++++++
>  kernel/sched/autogroup.c     |   2 -
>  kernel/sched/autogroup.h     |   2 -
>  kernel/sched/fair.c          |  26 ++++----
>  5 files changed, 167 insertions(+), 16 deletions(-)
>
> --
> 2.11.0
* Re: [RFC PATCH 0/5] CFS load tracking trace events
  2017-03-28 13:45 ` Dietmar Eggemann

From: Dietmar Eggemann @ 2017-03-28 13:45 UTC (permalink / raw)
To: Vincent Guittot
Cc: Peter Zijlstra, Ingo Molnar, LKML, Matt Fleming, Steven Rostedt,
    Morten Rasmussen, Juri Lelli, Patrick Bellasi

On 03/28/2017 12:05 PM, Vincent Guittot wrote:
> On 28 March 2017 at 08:35, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:

[...]

>> The following keys are used to identify the cfs scheduler brick:
>>
>> (1) Cpu number the cfs scheduler brick is attached to.
>>
>> (2) Task_group path and (css) id.
>>
>> (3) Task name and pid.
>
> Do you really need both path/name and id/pid ?
>
> The path/name looks quite intrusive so can't we just use id/pid ?

One problem is that all autogroups use id=0. Another thing with
task_groups is that dealing with path="/tg1/tg11" is so much more
intuitive than id="7".

IMHO, we do need task name and pid to be able to clearly identify a
task (same name/different pid or fork phase (forkee still has name of
forker)).

You're right, the implementation with path is more complicated but I
guess that's worth it. We could get rid of 'id' though.

[...]
end of thread, other threads: [~2017-03-30  7:46 UTC | newest]

Thread overview: 30+ messages:
2017-03-28  6:35 [RFC PATCH 0/5] CFS load tracking trace events Dietmar Eggemann
2017-03-28  6:35 ` [RFC PATCH 1/5] sched/autogroup: Define autogroup_path() for !CONFIG_SCHED_DEBUG Dietmar Eggemann
2017-03-28  6:35 ` [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event Dietmar Eggemann
2017-03-28  7:56   ` Peter Zijlstra
2017-03-28 13:30     ` Dietmar Eggemann
2017-03-28  8:00   ` Peter Zijlstra
2017-03-28 14:47     ` Steven Rostedt
2017-03-28 14:46   ` Steven Rostedt
2017-03-28 16:44     ` Peter Zijlstra
2017-03-28 16:57       ` Peter Zijlstra
2017-03-28 17:20         ` Patrick Bellasi
2017-03-28 18:18           ` Peter Zijlstra
2017-03-28 17:36         ` Steven Rostedt
2017-03-28 17:37           ` Steven Rostedt
2017-03-29 20:37             ` Dietmar Eggemann
2017-03-29 21:03               ` Dietmar Eggemann
2017-03-30  7:04                 ` Peter Zijlstra
2017-03-30  7:46                   ` Dietmar Eggemann
2017-03-28  6:35 ` [RFC PATCH 3/5] sched/fair: Export group_cfs_rq() Dietmar Eggemann
2017-03-28  6:35 ` [RFC PATCH 4/5] sched/events: Introduce sched_entity load tracking trace event Dietmar Eggemann
2017-03-28  8:05   ` Peter Zijlstra
2017-03-28 14:01     ` Dietmar Eggemann
2017-03-28 14:03       ` Dietmar Eggemann
2017-03-28  8:08   ` Peter Zijlstra
2017-03-28 14:13     ` Dietmar Eggemann
2017-03-28 16:41       ` Peter Zijlstra
2017-03-29 20:19         ` Dietmar Eggemann
2017-03-28  6:35 ` [RFC PATCH 5/5] sched/events: Introduce task_group load tracking trace event Dietmar Eggemann
2017-03-28 10:05 ` [RFC PATCH 0/5] CFS load tracking trace events Vincent Guittot
2017-03-28 13:45   ` Dietmar Eggemann