* [RFC PATCH 0/5] CFS load tracking trace events
@ 2017-03-28  6:35 Dietmar Eggemann
  2017-03-28  6:35 ` [RFC PATCH 1/5] sched/autogroup: Define autogroup_path() for !CONFIG_SCHED_DEBUG Dietmar Eggemann
                   ` (5 more replies)
  0 siblings, 6 replies; 30+ messages in thread
From: Dietmar Eggemann @ 2017-03-28  6:35 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: LKML, Matt Fleming, Vincent Guittot, Steven Rostedt,
	Morten Rasmussen, Juri Lelli, Patrick Bellasi

This patch-set introduces trace events for load (and utilization)
tracking for the following three cfs scheduler bricks: cfs_rq,
sched_entity and task_group.

I've decided to send it out because people are discussing problems with
PELT-related functionality, and to provide a base for the related
discussion next week at the OSPM summit in Pisa.

The requirements for the design are:

 (1) Support for all combinations related to the following kernel
     configuration options: CONFIG_SMP, CONFIG_FAIR_GROUP_SCHED,
     CONFIG_SCHED_AUTOGROUP, CONFIG_DEBUG_KERNEL.

 (2) All key=value pairs have to appear in all combinations of the
     kernel configuration options mentioned under (1).

 (3) Stable interface, i.e. only use keys for key=value pairs which
     are independent of the underlying (load tracking) implementation.

 (4) Minimally invasive to the actual cfs scheduler code, i.e. the trace
     event should not have to be guarded by an if condition (e.g.
     entity_is_task(se) to distinguish between task and task_group)
     or by a kernel configuration switch.

The trace events only expose one load (and one utilization) key=value
pair besides those used to identify the cfs scheduler brick. They do
not provide any internal implementation details of the actual (PELT)
load-tracking mechanism.
This limitation might be too restrictive since for a cfs_rq we can then
only trace runnable load (cfs_rq->runnable_load_avg) or runnable/blocked
load (cfs_rq->avg.load_avg), but not both.
Other load metrics like instantaneous load ({cfs_rq|se}->load.weight)
are not traced but could be easily added.
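
For illustration, doing that for instantaneous load would only mean one
more key=value pair in the cfs_rq trace event introduced later in this
series (patch 2/5); roughly (a sketch, not part of the series; the key
name "weight" is just an example):

	__field(	unsigned long,	weight	)	/* in TP_STRUCT__entry() */
	__entry->weight	= cfs_rq->load.weight;		/* in TP_fast_assign()   */
	/* plus a matching " weight=%lu"/__entry->weight pair in TP_printk() */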

The following keys are used to identify the cfs scheduler brick:

 (1) Cpu number the cfs scheduler brick is attached to.

 (2) Task_group path and (css) id.

 (3) Task name and pid.

In case a key does not apply, either due to an unset kernel configuration
option or because a sched_entity can represent either a task or a
task_group, its value is set to an invalid default:

 (1) Task_group path: "(null)"

 (2) Task group id:   -1

 (3) Task name:       "(null)"

 (4) Task pid:        -1

Load tracking trace events are placed into kernel/sched/fair.c:

 (1) for cfs_rq:

     - In PELT core function __update_load_avg().
    
     - During sched_entity attach/detach.

     - During sched_entity load/util propagation down the task_group
       hierarchy.

 (2) for sched_entity:

     - In PELT core function __update_load_avg().
    
     - During sched_entity load/util propagation down the task_group
       hierarchy.

 (3) for task_group:

     - In its PELT update function.

An alternative to using __update_load_avg() would be to put the trace
events into update_cfs_rq_load_avg() for cfs_rq's and into
set_task_rq_fair(), update_load_avg(), sync_entity_load_avg() for
sched_entities. This would also get rid of the if (cfs_rq)/else condition
in __update_load_avg().

These trace events still look a bit fragile.
First of all, this patch-set has to use cfs-specific data structures
in the global task scheduler trace file include/trace/events/sched.h.
And second, the accessor functions (rq_of(), task_of(), etc.) are
private to the cfs scheduler class. If they were public, these
trace events would be easier to code. That said, group_cfs_rq() is
already made public by this patch-stack.

This version is based on tip/sched/core as of yesterday (bc4278987e38). It
has been compile-tested on ~160 configurations via 0day's kbuild test
robot.

Dietmar Eggemann (5):
  sched/autogroup: Define autogroup_path() for !CONFIG_SCHED_DEBUG
  sched/events: Introduce cfs_rq load tracking trace event
  sched/fair: Export group_cfs_rq()
  sched/events: Introduce sched_entity load tracking trace event
  sched/events: Introduce task_group load tracking trace event

 include/linux/sched.h        |  10 +++
 include/trace/events/sched.h | 143 +++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/autogroup.c     |   2 -
 kernel/sched/autogroup.h     |   2 -
 kernel/sched/fair.c          |  26 ++++----
 5 files changed, 167 insertions(+), 16 deletions(-)

-- 
2.11.0

* [RFC PATCH 1/5] sched/autogroup: Define autogroup_path() for !CONFIG_SCHED_DEBUG
  2017-03-28  6:35 [RFC PATCH 0/5] CFS load tracking trace events Dietmar Eggemann
@ 2017-03-28  6:35 ` Dietmar Eggemann
  2017-03-28  6:35 ` [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event Dietmar Eggemann
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 30+ messages in thread
From: Dietmar Eggemann @ 2017-03-28  6:35 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: LKML, Matt Fleming, Vincent Guittot, Steven Rostedt,
	Morten Rasmussen, Juri Lelli, Patrick Bellasi

Define autogroup_path() even in the !CONFIG_SCHED_DEBUG case. If
CONFIG_SCHED_AUTOGROUP is enabled, the path of an autogroup has to be
available to be printed in the load tracking trace events provided by
this patch-stack, regardless of whether CONFIG_SCHED_DEBUG is set or not.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/autogroup.c | 2 --
 kernel/sched/autogroup.h | 2 --
 2 files changed, 4 deletions(-)

diff --git a/kernel/sched/autogroup.c b/kernel/sched/autogroup.c
index da39489d2d80..22be4eaf6ae2 100644
--- a/kernel/sched/autogroup.c
+++ b/kernel/sched/autogroup.c
@@ -260,7 +260,6 @@ void proc_sched_autogroup_show_task(struct task_struct *p, struct seq_file *m)
 }
 #endif /* CONFIG_PROC_FS */
 
-#ifdef CONFIG_SCHED_DEBUG
 int autogroup_path(struct task_group *tg, char *buf, int buflen)
 {
 	if (!task_group_is_autogroup(tg))
@@ -268,4 +267,3 @@ int autogroup_path(struct task_group *tg, char *buf, int buflen)
 
 	return snprintf(buf, buflen, "%s-%ld", "/autogroup", tg->autogroup->id);
 }
-#endif /* CONFIG_SCHED_DEBUG */
diff --git a/kernel/sched/autogroup.h b/kernel/sched/autogroup.h
index ce40c810cd5c..6a661cfa9584 100644
--- a/kernel/sched/autogroup.h
+++ b/kernel/sched/autogroup.h
@@ -55,11 +55,9 @@ autogroup_task_group(struct task_struct *p, struct task_group *tg)
 	return tg;
 }
 
-#ifdef CONFIG_SCHED_DEBUG
 static inline int autogroup_path(struct task_group *tg, char *buf, int buflen)
 {
 	return 0;
 }
-#endif
 
 #endif /* CONFIG_SCHED_AUTOGROUP */
-- 
2.11.0

* [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event
  2017-03-28  6:35 [RFC PATCH 0/5] CFS load tracking trace events Dietmar Eggemann
  2017-03-28  6:35 ` [RFC PATCH 1/5] sched/autogroup: Define autogroup_path() for !CONFIG_SCHED_DEBUG Dietmar Eggemann
@ 2017-03-28  6:35 ` Dietmar Eggemann
  2017-03-28  7:56   ` Peter Zijlstra
                     ` (2 more replies)
  2017-03-28  6:35 ` [RFC PATCH 3/5] sched/fair: Export group_cfs_rq() Dietmar Eggemann
                   ` (3 subsequent siblings)
  5 siblings, 3 replies; 30+ messages in thread
From: Dietmar Eggemann @ 2017-03-28  6:35 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: LKML, Matt Fleming, Vincent Guittot, Steven Rostedt,
	Morten Rasmussen, Juri Lelli, Patrick Bellasi

The trace event keys load and util (utilization) are mapped to:

 (1) load : cfs_rq->runnable_load_avg

 (2) util : cfs_rq->avg.util_avg

To let this trace event work for configurations w/ and w/o group
scheduling support for cfs (CONFIG_FAIR_GROUP_SCHED), the following
special handling is necessary for non-existent key=value pairs:

 path = "(null)" : In case of !CONFIG_FAIR_GROUP_SCHED.

 id   = -1       : In case of !CONFIG_FAIR_GROUP_SCHED.

The following list shows examples of the key=value pairs in different
configurations for:

 (1) a root task_group:

     cpu=4 path=/ id=1 load=6 util=331

 (2) a task_group:

     cpu=1 path=/tg1/tg11/tg111 id=4 load=538 util=522

 (3) an autogroup:

     cpu=3 path=/autogroup-18 id=0 load=997 util=517

 (4) w/o CONFIG_FAIR_GROUP_SCHED:

     cpu=0 path=(null) id=-1 load=314 util=289

The trace event is only defined for CONFIG_SMP.

The helper function __trace_sched_path() can be used to get the length
parameter of the dynamic array (path == NULL) and to copy the path into
it (path != NULL).

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
---
 include/trace/events/sched.h | 73 ++++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/fair.c          |  9 ++++++
 2 files changed, 82 insertions(+)

diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 9e3ef6c99e4b..51db8a90e45f 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -562,6 +562,79 @@ TRACE_EVENT(sched_wake_idle_without_ipi,
 
 	TP_printk("cpu=%d", __entry->cpu)
 );
+
+#ifdef CONFIG_SMP
+#ifdef CREATE_TRACE_POINTS
+static inline
+int __trace_sched_cpu(struct cfs_rq *cfs_rq)
+{
+#ifdef CONFIG_FAIR_GROUP_SCHED
+	struct rq *rq = cfs_rq->rq;
+#else
+	struct rq *rq = container_of(cfs_rq, struct rq, cfs);
+#endif
+	return cpu_of(rq);
+}
+
+static inline
+int __trace_sched_path(struct cfs_rq *cfs_rq, char *path, int len)
+{
+#ifdef CONFIG_FAIR_GROUP_SCHED
+	int l = path ? len : 0;
+
+	if (task_group_is_autogroup(cfs_rq->tg))
+		return autogroup_path(cfs_rq->tg, path, l) + 1;
+	else
+		return cgroup_path(cfs_rq->tg->css.cgroup, path, l) + 1;
+#else
+	if (path)
+		strcpy(path, "(null)");
+
+	return strlen("(null)");
+#endif
+}
+
+static inline int __trace_sched_id(struct cfs_rq *cfs_rq)
+{
+#ifdef CONFIG_FAIR_GROUP_SCHED
+	return cfs_rq->tg->css.id;
+#else
+	return -1;
+#endif
+}
+#endif /* CREATE_TRACE_POINTS */
+
+/*
+ * Tracepoint for cfs_rq load tracking:
+ */
+TRACE_EVENT(sched_load_cfs_rq,
+
+	TP_PROTO(struct cfs_rq *cfs_rq),
+
+	TP_ARGS(cfs_rq),
+
+	TP_STRUCT__entry(
+		__field(	int,		cpu			)
+		__dynamic_array(char,		path,
+				__trace_sched_path(cfs_rq, NULL, 0)	)
+		__field(	int,		id			)
+		__field(	unsigned long,	load			)
+		__field(	unsigned long,	util			)
+	),
+
+	TP_fast_assign(
+		__entry->cpu	= __trace_sched_cpu(cfs_rq);
+		__trace_sched_path(cfs_rq, __get_dynamic_array(path),
+				   __get_dynamic_array_len(path));
+		__entry->id	= __trace_sched_id(cfs_rq);
+		__entry->load	= cfs_rq->runnable_load_avg;
+		__entry->util	= cfs_rq->avg.util_avg;
+	),
+
+	TP_printk("cpu=%d path=%s id=%d load=%lu util=%lu",  __entry->cpu,
+		  __get_str(path), __entry->id, __entry->load, __entry->util)
+);
+#endif /* CONFIG_SMP */
 #endif /* _TRACE_SCHED_H */
 
 /* This part must be outside protection */
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 03adf9fb48b1..ac19ab6ced8f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2950,6 +2950,9 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 		sa->util_avg = sa->util_sum / LOAD_AVG_MAX;
 	}
 
+	if (cfs_rq)
+		trace_sched_load_cfs_rq(cfs_rq);
+
 	return decayed;
 }
 
@@ -3170,6 +3173,8 @@ static inline int propagate_entity_load_avg(struct sched_entity *se)
 	update_tg_cfs_util(cfs_rq, se);
 	update_tg_cfs_load(cfs_rq, se);
 
+	trace_sched_load_cfs_rq(cfs_rq);
+
 	return 1;
 }
 
@@ -3359,6 +3364,8 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
 	set_tg_cfs_propagate(cfs_rq);
 
 	cfs_rq_util_change(cfs_rq);
+
+	trace_sched_load_cfs_rq(cfs_rq);
 }
 
 /**
@@ -3379,6 +3386,8 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
 	set_tg_cfs_propagate(cfs_rq);
 
 	cfs_rq_util_change(cfs_rq);
+
+	trace_sched_load_cfs_rq(cfs_rq);
 }
 
 /* Add the load generated by se into cfs_rq's load average */
-- 
2.11.0

* [RFC PATCH 3/5] sched/fair: Export group_cfs_rq()
  2017-03-28  6:35 [RFC PATCH 0/5] CFS load tracking trace events Dietmar Eggemann
  2017-03-28  6:35 ` [RFC PATCH 1/5] sched/autogroup: Define autogroup_path() for !CONFIG_SCHED_DEBUG Dietmar Eggemann
  2017-03-28  6:35 ` [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event Dietmar Eggemann
@ 2017-03-28  6:35 ` Dietmar Eggemann
  2017-03-28  6:35 ` [RFC PATCH 4/5] sched/events: Introduce sched_entity load tracking trace event Dietmar Eggemann
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 30+ messages in thread
From: Dietmar Eggemann @ 2017-03-28  6:35 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: LKML, Matt Fleming, Vincent Guittot, Steven Rostedt,
	Morten Rasmussen, Juri Lelli, Patrick Bellasi

Export struct cfs_rq *group_cfs_rq(struct sched_entity *se) to be able
to distinguish sched_entities representing either tasks or task_groups
in the sched_entity-related load tracking trace event provided by the
next patch.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
---
 include/linux/sched.h | 10 ++++++++++
 kernel/sched/fair.c   | 12 ------------
 2 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index d67eee84fd43..8a35ff99140b 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -392,6 +392,16 @@ struct sched_entity {
 #endif
 };
 
+/* cfs_rq "owned" by this sched_entity */
+static inline struct cfs_rq *group_cfs_rq(struct sched_entity *se)
+{
+#ifdef CONFIG_FAIR_GROUP_SCHED
+	return se->my_q;
+#else
+	return NULL;
+#endif
+}
+
 struct sched_rt_entity {
 	struct list_head		run_list;
 	unsigned long			timeout;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ac19ab6ced8f..04d4f81b96ae 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -292,12 +292,6 @@ static inline struct cfs_rq *cfs_rq_of(struct sched_entity *se)
 	return se->cfs_rq;
 }
 
-/* runqueue "owned" by this group */
-static inline struct cfs_rq *group_cfs_rq(struct sched_entity *grp)
-{
-	return grp->my_q;
-}
-
 static inline void list_add_leaf_cfs_rq(struct cfs_rq *cfs_rq)
 {
 	if (!cfs_rq->on_list) {
@@ -449,12 +443,6 @@ static inline struct cfs_rq *cfs_rq_of(struct sched_entity *se)
 	return &rq->cfs;
 }
 
-/* runqueue "owned" by this group */
-static inline struct cfs_rq *group_cfs_rq(struct sched_entity *grp)
-{
-	return NULL;
-}
-
 static inline void list_add_leaf_cfs_rq(struct cfs_rq *cfs_rq)
 {
 }
-- 
2.11.0

* [RFC PATCH 4/5] sched/events: Introduce sched_entity load tracking trace event
  2017-03-28  6:35 [RFC PATCH 0/5] CFS load tracking trace events Dietmar Eggemann
                   ` (2 preceding siblings ...)
  2017-03-28  6:35 ` [RFC PATCH 3/5] sched/fair: Export group_cfs_rq() Dietmar Eggemann
@ 2017-03-28  6:35 ` Dietmar Eggemann
  2017-03-28  8:05   ` Peter Zijlstra
  2017-03-28  8:08   ` Peter Zijlstra
  2017-03-28  6:35 ` [RFC PATCH 5/5] sched/events: Introduce task_group " Dietmar Eggemann
  2017-03-28 10:05 ` [RFC PATCH 0/5] CFS load tracking trace events Vincent Guittot
  5 siblings, 2 replies; 30+ messages in thread
From: Dietmar Eggemann @ 2017-03-28  6:35 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: LKML, Matt Fleming, Vincent Guittot, Steven Rostedt,
	Morten Rasmussen, Juri Lelli, Patrick Bellasi

The trace event keys load and util (utilization) are mapped to:

 (1) load : se->avg.load_avg

 (2) util : se->avg.util_avg

To let this trace event work for configurations w/ and w/o group
scheduling support for cfs (CONFIG_FAIR_GROUP_SCHED), the following
special handling is necessary for non-existent key=value pairs:

 path = "(null)" : In case of !CONFIG_FAIR_GROUP_SCHED or the
                   sched_entity represents a task.

 id   = -1       : In case of !CONFIG_FAIR_GROUP_SCHED or the
                   sched_entity represents a task.

 comm = "(null)" : In case sched_entity represents a task_group.

 pid = -1        : In case sched_entity represents a task_group.

The following list shows examples of the key=value pairs in different
configurations for:

 (1) a task:

     cpu=0 path=(null) id=-1 comm=sshd pid=2206 load=102 util=102

 (2) a taskgroup:

     cpu=1 path=/tg1/tg11/tg111 id=4 comm=(null) pid=-1 load=882 util=510

 (3) an autogroup:

     cpu=0 path=/autogroup-13 id=0 comm=(null) pid=-1 load=49 util=48

 (4) w/o CONFIG_FAIR_GROUP_SCHED:

     cpu=0 path=(null) id=-1 comm=sshd pid=2211 load=301 util=265

The trace event is only defined for CONFIG_SMP.

The helper functions __trace_sched_cpu(), __trace_sched_path() and
__trace_sched_id() are extended to deal with sched_entities as well.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
---
 include/trace/events/sched.h | 63 +++++++++++++++++++++++++++++++++++---------
 kernel/sched/fair.c          |  3 +++
 2 files changed, 54 insertions(+), 12 deletions(-)

diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 51db8a90e45f..647cfaf528fd 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -566,14 +566,15 @@ TRACE_EVENT(sched_wake_idle_without_ipi,
 #ifdef CONFIG_SMP
 #ifdef CREATE_TRACE_POINTS
 static inline
-int __trace_sched_cpu(struct cfs_rq *cfs_rq)
+int __trace_sched_cpu(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 #ifdef CONFIG_FAIR_GROUP_SCHED
-	struct rq *rq = cfs_rq->rq;
+	struct rq *rq = cfs_rq ? cfs_rq->rq : NULL;
 #else
-	struct rq *rq = container_of(cfs_rq, struct rq, cfs);
+	struct rq *rq = cfs_rq ? container_of(cfs_rq, struct rq, cfs) : NULL;
 #endif
-	return cpu_of(rq);
+	return rq ? cpu_of(rq)
+		  : task_cpu((container_of(se, struct task_struct, se)));
 }
 
 static inline
@@ -582,25 +583,24 @@ int __trace_sched_path(struct cfs_rq *cfs_rq, char *path, int len)
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	int l = path ? len : 0;
 
-	if (task_group_is_autogroup(cfs_rq->tg))
+	if (cfs_rq && task_group_is_autogroup(cfs_rq->tg))
 		return autogroup_path(cfs_rq->tg, path, l) + 1;
-	else
+	else if (cfs_rq && cfs_rq->tg->css.cgroup)
 		return cgroup_path(cfs_rq->tg->css.cgroup, path, l) + 1;
-#else
+#endif
 	if (path)
 		strcpy(path, "(null)");
 
 	return strlen("(null)");
-#endif
 }
 
 static inline int __trace_sched_id(struct cfs_rq *cfs_rq)
 {
 #ifdef CONFIG_FAIR_GROUP_SCHED
-	return cfs_rq->tg->css.id;
-#else
-	return -1;
+	if (cfs_rq)
+		return cfs_rq->tg->css.id;
 #endif
+	return -1;
 }
 #endif /* CREATE_TRACE_POINTS */
 
@@ -623,7 +623,7 @@ TRACE_EVENT(sched_load_cfs_rq,
 	),
 
 	TP_fast_assign(
-		__entry->cpu	= __trace_sched_cpu(cfs_rq);
+		__entry->cpu	= __trace_sched_cpu(cfs_rq, NULL);
 		__trace_sched_path(cfs_rq, __get_dynamic_array(path),
 				   __get_dynamic_array_len(path));
 		__entry->id	= __trace_sched_id(cfs_rq);
@@ -634,6 +634,45 @@ TRACE_EVENT(sched_load_cfs_rq,
 	TP_printk("cpu=%d path=%s id=%d load=%lu util=%lu",  __entry->cpu,
 		  __get_str(path), __entry->id, __entry->load, __entry->util)
 );
+
+/*
+ * Tracepoint for sched_entity load tracking:
+ */
+TRACE_EVENT(sched_load_se,
+
+	TP_PROTO(struct sched_entity *se),
+
+	TP_ARGS(se),
+
+	TP_STRUCT__entry(
+		__field(	int,		cpu			      )
+		__dynamic_array(char,		path,
+				__trace_sched_path(group_cfs_rq(se), NULL, 0) )
+		__field(	int,		id			      )
+		__array(	char,		comm,	TASK_COMM_LEN	      )
+		__field(	pid_t,		pid			      )
+		__field(	unsigned long,	load			      )
+		__field(	unsigned long,	util			      )
+	),
+
+	TP_fast_assign(
+		struct task_struct *p = group_cfs_rq(se) ? NULL
+				    : container_of(se, struct task_struct, se);
+
+		__entry->cpu = __trace_sched_cpu(group_cfs_rq(se), se);
+		__trace_sched_path(group_cfs_rq(se), __get_dynamic_array(path),
+				   __get_dynamic_array_len(path));
+		__entry->id = __trace_sched_id(group_cfs_rq(se));
+		memcpy(__entry->comm, p ? p->comm : "(null)", TASK_COMM_LEN);
+		__entry->pid = p ? p->pid : -1;
+		__entry->load = se->avg.load_avg;
+		__entry->util = se->avg.util_avg;
+	),
+
+	TP_printk("cpu=%d path=%s id=%d comm=%s pid=%d load=%lu util=%lu",
+		  __entry->cpu, __get_str(path), __entry->id, __entry->comm,
+		  __entry->pid, __entry->load, __entry->util)
+);
 #endif /* CONFIG_SMP */
 #endif /* _TRACE_SCHED_H */
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 04d4f81b96ae..d1dcb19f5b55 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2940,6 +2940,8 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 
 	if (cfs_rq)
 		trace_sched_load_cfs_rq(cfs_rq);
+	else
+		trace_sched_load_se(container_of(sa, struct sched_entity, avg));
 
 	return decayed;
 }
@@ -3162,6 +3164,7 @@ static inline int propagate_entity_load_avg(struct sched_entity *se)
 	update_tg_cfs_load(cfs_rq, se);
 
 	trace_sched_load_cfs_rq(cfs_rq);
+	trace_sched_load_se(se);
 
 	return 1;
 }
-- 
2.11.0

* [RFC PATCH 5/5] sched/events: Introduce task_group load tracking trace event
  2017-03-28  6:35 [RFC PATCH 0/5] CFS load tracking trace events Dietmar Eggemann
                   ` (3 preceding siblings ...)
  2017-03-28  6:35 ` [RFC PATCH 4/5] sched/events: Introduce sched_entity load tracking trace event Dietmar Eggemann
@ 2017-03-28  6:35 ` Dietmar Eggemann
  2017-03-28 10:05 ` [RFC PATCH 0/5] CFS load tracking trace events Vincent Guittot
  5 siblings, 0 replies; 30+ messages in thread
From: Dietmar Eggemann @ 2017-03-28  6:35 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: LKML, Matt Fleming, Vincent Guittot, Steven Rostedt,
	Morten Rasmussen, Juri Lelli, Patrick Bellasi

The trace event key load is mapped to:

 (1) load : cfs_rq->tg->load_avg

The cfs_rq owned by the task_group is used as the only parameter for the
trace event because it has references to both the task_group and the cpu.
Using the task_group as a parameter instead would require the cpu as a
second parameter. A task_group is global, not per-cpu, data; the cpu
key only indicates on which cpu the value was gathered.

The following list shows examples of the key=value pairs for:

 (1) a task group:

     cpu=1 path=/tg1/tg11/tg111 id=4 load=517

 (2) an autogroup:

     cpu=1 path=/autogroup-10 id=0 load=1050

We don't maintain a load signal for the root task_group.

The trace event is only defined if cfs group scheduling support
(CONFIG_FAIR_GROUP_SCHED) is enabled.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
---
 include/trace/events/sched.h | 31 +++++++++++++++++++++++++++++++
 kernel/sched/fair.c          |  2 ++
 2 files changed, 33 insertions(+)

diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 647cfaf528fd..3fe0092176f8 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -673,6 +673,37 @@ TRACE_EVENT(sched_load_se,
 		  __entry->cpu, __get_str(path), __entry->id, __entry->comm,
 		  __entry->pid, __entry->load, __entry->util)
 );
+
+/*
+ * Tracepoint for task_group load tracking:
+ */
+#ifdef CONFIG_FAIR_GROUP_SCHED
+TRACE_EVENT(sched_load_tg,
+
+	TP_PROTO(struct cfs_rq *cfs_rq),
+
+	TP_ARGS(cfs_rq),
+
+	TP_STRUCT__entry(
+		__field(	int,	cpu				)
+		__dynamic_array(char,	path,
+				__trace_sched_path(cfs_rq, NULL, 0)	)
+		__field(	int,	id				)
+		__field(	long,	load				)
+	),
+
+	TP_fast_assign(
+		__entry->cpu	= cfs_rq->rq->cpu;
+		__trace_sched_path(cfs_rq, __get_dynamic_array(path),
+				   __get_dynamic_array_len(path));
+		__entry->id	= cfs_rq->tg->css.id;
+		__entry->load	= atomic_long_read(&cfs_rq->tg->load_avg);
+	),
+
+	TP_printk("cpu=%d path=%s id=%d load=%ld", __entry->cpu,
+		  __get_str(path), __entry->id, __entry->load)
+);
+#endif /* CONFIG_FAIR_GROUP_SCHED */
 #endif /* CONFIG_SMP */
 #endif /* _TRACE_SCHED_H */
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d1dcb19f5b55..dbe2d5ef8b9e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2997,6 +2997,8 @@ static inline void update_tg_load_avg(struct cfs_rq *cfs_rq, int force)
 		atomic_long_add(delta, &cfs_rq->tg->load_avg);
 		cfs_rq->tg_load_avg_contrib = cfs_rq->avg.load_avg;
 	}
+
+	trace_sched_load_tg(cfs_rq);
 }
 
 /*
-- 
2.11.0

* Re: [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event
  2017-03-28  6:35 ` [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event Dietmar Eggemann
@ 2017-03-28  7:56   ` Peter Zijlstra
  2017-03-28 13:30     ` Dietmar Eggemann
  2017-03-28  8:00   ` Peter Zijlstra
  2017-03-28 14:46   ` Steven Rostedt
  2 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2017-03-28  7:56 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Ingo Molnar, LKML, Matt Fleming, Vincent Guittot, Steven Rostedt,
	Morten Rasmussen, Juri Lelli, Patrick Bellasi

On Tue, Mar 28, 2017 at 07:35:38AM +0100, Dietmar Eggemann wrote:
> The trace event keys load and util (utilization) are mapped to:
> 
>  (1) load : cfs_rq->runnable_load_avg
> 
>  (2) util : cfs_rq->avg.util_avg
> 
> To let this trace event work for configurations w/ and w/o group
> scheduling support for cfs (CONFIG_FAIR_GROUP_SCHED) the following
> special handling is necessary for non-existent key=value pairs:
> 
>  path = "(null)" : In case of !CONFIG_FAIR_GROUP_SCHED.
> 
>  id   = -1       : In case of !CONFIG_FAIR_GROUP_SCHED.
> 
> The following list shows examples of the key=value pairs in different
> configurations for:
> 
>  (1) a root task_group:
> 
>      cpu=4 path=/ id=1 load=6 util=331

What's @id and why do we care?

* Re: [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event
  2017-03-28  6:35 ` [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event Dietmar Eggemann
  2017-03-28  7:56   ` Peter Zijlstra
@ 2017-03-28  8:00   ` Peter Zijlstra
  2017-03-28 14:47     ` Steven Rostedt
  2017-03-28 14:46   ` Steven Rostedt
  2 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2017-03-28  8:00 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Ingo Molnar, LKML, Matt Fleming, Vincent Guittot, Steven Rostedt,
	Morten Rasmussen, Juri Lelli, Patrick Bellasi

On Tue, Mar 28, 2017 at 07:35:38AM +0100, Dietmar Eggemann wrote:
>  
> +	if (cfs_rq)
> +		trace_sched_load_cfs_rq(cfs_rq);

You can do that with DEFINE_EVENT_CONDITION and TP_CONDITION.

* Re: [RFC PATCH 4/5] sched/events: Introduce sched_entity load tracking trace event
  2017-03-28  6:35 ` [RFC PATCH 4/5] sched/events: Introduce sched_entity load tracking trace event Dietmar Eggemann
@ 2017-03-28  8:05   ` Peter Zijlstra
  2017-03-28 14:01     ` Dietmar Eggemann
  2017-03-28 14:03     ` Dietmar Eggemann
  2017-03-28  8:08   ` Peter Zijlstra
  1 sibling, 2 replies; 30+ messages in thread
From: Peter Zijlstra @ 2017-03-28  8:05 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Ingo Molnar, LKML, Matt Fleming, Vincent Guittot, Steven Rostedt,
	Morten Rasmussen, Juri Lelli, Patrick Bellasi

On Tue, Mar 28, 2017 at 07:35:40AM +0100, Dietmar Eggemann wrote:
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 04d4f81b96ae..d1dcb19f5b55 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2940,6 +2940,8 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>  
>  	if (cfs_rq)
>  		trace_sched_load_cfs_rq(cfs_rq);
> +	else
> +		trace_sched_load_se(container_of(sa, struct sched_entity, avg));
>  
>  	return decayed;
>  }
> @@ -3162,6 +3164,7 @@ static inline int propagate_entity_load_avg(struct sched_entity *se)
>  	update_tg_cfs_load(cfs_rq, se);
>  
>  	trace_sched_load_cfs_rq(cfs_rq);
> +	trace_sched_load_se(se);
>  
>  	return 1;
>  }

Having back-to-back tracepoints is disgusting.

* Re: [RFC PATCH 4/5] sched/events: Introduce sched_entity load tracking trace event
  2017-03-28  6:35 ` [RFC PATCH 4/5] sched/events: Introduce sched_entity load tracking trace event Dietmar Eggemann
  2017-03-28  8:05   ` Peter Zijlstra
@ 2017-03-28  8:08   ` Peter Zijlstra
  2017-03-28 14:13     ` Dietmar Eggemann
  1 sibling, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2017-03-28  8:08 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Ingo Molnar, LKML, Matt Fleming, Vincent Guittot, Steven Rostedt,
	Morten Rasmussen, Juri Lelli, Patrick Bellasi

On Tue, Mar 28, 2017 at 07:35:40AM +0100, Dietmar Eggemann wrote:
> diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
> index 51db8a90e45f..647cfaf528fd 100644
> --- a/include/trace/events/sched.h
> +++ b/include/trace/events/sched.h
> @@ -566,14 +566,15 @@ TRACE_EVENT(sched_wake_idle_without_ipi,
>  #ifdef CONFIG_SMP
>  #ifdef CREATE_TRACE_POINTS
>  static inline
> -int __trace_sched_cpu(struct cfs_rq *cfs_rq)
> +int __trace_sched_cpu(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  {
>  #ifdef CONFIG_FAIR_GROUP_SCHED
> -	struct rq *rq = cfs_rq->rq;
> +	struct rq *rq = cfs_rq ? cfs_rq->rq : NULL;
>  #else
> -	struct rq *rq = container_of(cfs_rq, struct rq, cfs);
> +	struct rq *rq = cfs_rq ? container_of(cfs_rq, struct rq, cfs) : NULL;
>  #endif
> -	return cpu_of(rq);
> +	return rq ? cpu_of(rq)
> +		  : task_cpu((container_of(se, struct task_struct, se)));
>  }

So here you duplicate lots of FAIR_GROUP internals. So why do you then
have to expose group_cfs_rq() in the previous patch?

* Re: [RFC PATCH 0/5] CFS load tracking trace events
  2017-03-28  6:35 [RFC PATCH 0/5] CFS load tracking trace events Dietmar Eggemann
                   ` (4 preceding siblings ...)
  2017-03-28  6:35 ` [RFC PATCH 5/5] sched/events: Introduce task_group " Dietmar Eggemann
@ 2017-03-28 10:05 ` Vincent Guittot
  2017-03-28 13:45   ` Dietmar Eggemann
  5 siblings, 1 reply; 30+ messages in thread
From: Vincent Guittot @ 2017-03-28 10:05 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Peter Zijlstra, Ingo Molnar, LKML, Matt Fleming, Steven Rostedt,
	Morten Rasmussen, Juri Lelli, Patrick Bellasi

On 28 March 2017 at 08:35, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> This patch-set introduces trace events for load (and utilization)
> tracking for the following three cfs scheduler bricks: cfs_rq,
> sched_entity and task_group.
>
> I've decided to sent it out because people are discussing problems with
> PELT related functionality and as a base for the related discussion next
> week at the OSPM-summit in Pisa.
>
> The requirements for the design are:
>
>  (1) Support for all combinations related to the following kernel
>      configuration options: CONFIG_SMP, CONFIG_FAIR_GROUP_SCHED,
>      CONFIG_SCHED_AUTOGROUP, CONFIG_DEBUG_KERNEL.
>
>  (2) All key=value pairs have to appear in all combinations of the
>      kernel configuration options mentioned under (1).
>
>  (3) Stable interface, i.e. only use keys for a key=value pairs which
>      are independent from the underlying (load tracking) implementation.
>
>  (4) Minimal invasive to the actual cfs scheduler code, i.e. the trace
>      event should not have to be guarded by an if condition (e.g.
>      entity_is_task(se) to distinguish between task and task_group)
>      or kernel configuration switch.
>
> The trace events only expose one load (and one utilization) key=value
> pair besides those used to identify the cfs scheduler brick. They do
> not provide any internal implementation details of the actual (PELT)
> load-tracking mechanism.
> This limitation might be too much since for a cfs_rq we can only trace
> runnable load (cfs_rq->runnable_load_avg) or runnable/blocked load
> (cfs_rq->avg.load_avg).
> Other load metrics like instantaneous load ({cfs_rq|se}->load.weight)
> are not traced but could be easily added.
>
> The following keys are used to identify the cfs scheduler brick:
>
>  (1) Cpu number the cfs scheduler brick is attached to.
>
>  (2) Task_group path and (css) id.
>
>  (3) Task name and pid.

Do you really need both path/name and id/pid?

The path/name looks quite intrusive, so can't we just use id/pid?

>
> In case a key does not apply due to an unset Kernel configuration option
> or the fact that a sched_entity can represent either a task or a
> task_group its value is set to an invalid default:
>
>  (1) Task_group path: "(null)"
>
>  (2) Task group id:   -1
>
>  (3) Task name:       "(null)"
>
>  (4) Task pid:        -1
>
> Load tracking trace events are placed into kernel/sched/fair.c:
>
>  (1) for cfs_rq:
>
>      - In PELT core function __update_load_avg().
>
>      - During sched_entity attach/detach.
>
>      - During sched_entity load/util propagation down the task_group
>        hierarchy.
>
>  (2) for sched_entity:
>
>      - In PELT core function __update_load_avg().
>
>      - During sched_entity load/util propagation down the task_group
>        hierarchy.
>
>  (3) for task_group:
>
>      - In its PELT update function.
>
> An alternative for using __update_load_avg() would be to put trace
> events into update_cfs_rq_load_avg() for cfs_rq's and into
> set_task_rq_fair(), update_load_avg(), sync_entity_load_avg() for
> sched_entities. This would also get rid of the if(cfs_rq)/else condition
> in __update_load_avg().
>
> These trace events still look a bit fragile.
> First of all, this patch-set has to use cfs specific data structures
> in the global task scheduler trace file include/trace/events/sched.h.
> And second, the accessor-functions (rq_of(), task_of(), etc.) are
> private to the cfs scheduler class. In case they would be public these
> trace events would be easier to code. That said, group_cfs_rq() is
> already made public by this patch-stack.
>
> This version bases on tip/sched/core as of yesterday (bc4278987e38). It
> has been compile tested on ~160 configurations via 0day's kbuild test
> robot.
>
> Dietmar Eggemann (5):
>   sched/autogroup: Define autogroup_path() for !CONFIG_SCHED_DEBUG
>   sched/events: Introduce cfs_rq load tracking trace event
>   sched/fair: Export group_cfs_rq()
>   sched/events: Introduce sched_entity load tracking trace event
>   sched/events: Introduce task_group load tracking trace event
>
>  include/linux/sched.h        |  10 +++
>  include/trace/events/sched.h | 143 +++++++++++++++++++++++++++++++++++++++++++
>  kernel/sched/autogroup.c     |   2 -
>  kernel/sched/autogroup.h     |   2 -
>  kernel/sched/fair.c          |  26 ++++----
>  5 files changed, 167 insertions(+), 16 deletions(-)
>
> --
> 2.11.0
>

* Re: [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event
  2017-03-28  7:56   ` Peter Zijlstra
@ 2017-03-28 13:30     ` Dietmar Eggemann
  0 siblings, 0 replies; 30+ messages in thread
From: Dietmar Eggemann @ 2017-03-28 13:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, LKML, Matt Fleming, Vincent Guittot, Steven Rostedt,
	Morten Rasmussen, Juri Lelli, Patrick Bellasi

On 03/28/2017 09:56 AM, Peter Zijlstra wrote:
> On Tue, Mar 28, 2017 at 07:35:38AM +0100, Dietmar Eggemann wrote:

[...]

>>  (1) a root task_group:
>>
>>      cpu=4 path=/ id=1 load=6 util=331
>
> What's @id and why do we care?

It's a per cgroup/subsystem unique id for every task_group (cpu controller):

struct task_group {
	struct cgroup_subsys_state css {
		...
		int id;
                 ...
	}
	...
}

The root task group path=/ has id=1 and all autogroups have id=0.

I agree, this id is redundant in case we have the task_group path.

* Re: [RFC PATCH 0/5] CFS load tracking trace events
  2017-03-28 10:05 ` [RFC PATCH 0/5] CFS load tracking trace events Vincent Guittot
@ 2017-03-28 13:45   ` Dietmar Eggemann
  0 siblings, 0 replies; 30+ messages in thread
From: Dietmar Eggemann @ 2017-03-28 13:45 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Peter Zijlstra, Ingo Molnar, LKML, Matt Fleming, Steven Rostedt,
	Morten Rasmussen, Juri Lelli, Patrick Bellasi

On 03/28/2017 12:05 PM, Vincent Guittot wrote:
> On 28 March 2017 at 08:35, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:

[...]

>> The following keys are used to identify the cfs scheduler brick:
>>
>>  (1) Cpu number the cfs scheduler brick is attached to.
>>
>>  (2) Task_group path and (css) id.
>>
>>  (3) Task name and pid.
>
> Do you really need both path/name and id/pid ?
>
> The path/name looks quite intrusive so can't we just use id/pid ?

One problem is that all autogroups use id=0.

Another thing with task_groups is that dealing with path="/tg1/tg11" is 
so much more intuitive than id="7".

IMHO, we do need task name and pid to be able to clearly identify a task
(same name/different pid, or during the fork phase, where the forkee still
has the name of the forker).

You're right, the implementation with path is more complicated, but I
guess it's worth it. We could get rid of 'id', though.

[...]

* Re: [RFC PATCH 4/5] sched/events: Introduce sched_entity load tracking trace event
  2017-03-28  8:05   ` Peter Zijlstra
@ 2017-03-28 14:01     ` Dietmar Eggemann
  2017-03-28 14:03     ` Dietmar Eggemann
  1 sibling, 0 replies; 30+ messages in thread
From: Dietmar Eggemann @ 2017-03-28 14:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, LKML, Matt Fleming, Vincent Guittot, Steven Rostedt,
	Morten Rasmussen, Juri Lelli, Patrick Bellasi

On 03/28/2017 10:05 AM, Peter Zijlstra wrote:
> On Tue, Mar 28, 2017 at 07:35:40AM +0100, Dietmar Eggemann wrote:
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 04d4f81b96ae..d1dcb19f5b55 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -2940,6 +2940,8 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>>
>>  	if (cfs_rq)
>>  		trace_sched_load_cfs_rq(cfs_rq);
>> +	else
>> +		trace_sched_load_se(container_of(sa, struct sched_entity, avg));
>>
>>  	return decayed;
>>  }
>> @@ -3162,6 +3164,7 @@ static inline int propagate_entity_load_avg(struct sched_entity *se)
>>  	update_tg_cfs_load(cfs_rq, se);
>>
>>  	trace_sched_load_cfs_rq(cfs_rq);
>> +	trace_sched_load_se(se);
>>
>>  	return 1;
>>  }
>
> Having back-to-back tracepoints is disgusting.
>

Yeah, avoiding putting them like this is hard since update_tg_cfs_util()
and update_tg_cfs_load() refresh util and load/runnable_load,
respectively, for both the cfs_rq and the se.

* Re: [RFC PATCH 4/5] sched/events: Introduce sched_entity load tracking trace event
  2017-03-28  8:05   ` Peter Zijlstra
  2017-03-28 14:01     ` Dietmar Eggemann
@ 2017-03-28 14:03     ` Dietmar Eggemann
  1 sibling, 0 replies; 30+ messages in thread
From: Dietmar Eggemann @ 2017-03-28 14:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, LKML, Matt Fleming, Vincent Guittot, Steven Rostedt,
	Morten Rasmussen, Juri Lelli, Patrick Bellasi

On 03/28/2017 10:05 AM, Peter Zijlstra wrote:
> On Tue, Mar 28, 2017 at 07:35:40AM +0100, Dietmar Eggemann wrote:
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 04d4f81b96ae..d1dcb19f5b55 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -2940,6 +2940,8 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>>
>>  	if (cfs_rq)
>>  		trace_sched_load_cfs_rq(cfs_rq);
>> +	else
>> +		trace_sched_load_se(container_of(sa, struct sched_entity, avg));
>>
>>  	return decayed;
>>  }
>> @@ -3162,6 +3164,7 @@ static inline int propagate_entity_load_avg(struct sched_entity *se)
>>  	update_tg_cfs_load(cfs_rq, se);
>>
>>  	trace_sched_load_cfs_rq(cfs_rq);
>> +	trace_sched_load_se(se);
>>
>>  	return 1;
>>  }
>
> Having back-to-back tracepoints is disgusting.

Yeah, but avoiding putting them like this is hard since the calls to
update_tg_cfs_util() and update_tg_cfs_load() inside
propagate_entity_load_avg() refresh util and load (and runnable_load),
respectively, for both the cfs_rq and the se.

* Re: [RFC PATCH 4/5] sched/events: Introduce sched_entity load tracking trace event
  2017-03-28  8:08   ` Peter Zijlstra
@ 2017-03-28 14:13     ` Dietmar Eggemann
  2017-03-28 16:41       ` Peter Zijlstra
  0 siblings, 1 reply; 30+ messages in thread
From: Dietmar Eggemann @ 2017-03-28 14:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, LKML, Matt Fleming, Vincent Guittot, Steven Rostedt,
	Morten Rasmussen, Juri Lelli, Patrick Bellasi

On 03/28/2017 10:08 AM, Peter Zijlstra wrote:
> On Tue, Mar 28, 2017 at 07:35:40AM +0100, Dietmar Eggemann wrote:
>> diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
>> index 51db8a90e45f..647cfaf528fd 100644
>> --- a/include/trace/events/sched.h
>> +++ b/include/trace/events/sched.h
>> @@ -566,14 +566,15 @@ TRACE_EVENT(sched_wake_idle_without_ipi,
>>  #ifdef CONFIG_SMP
>>  #ifdef CREATE_TRACE_POINTS
>>  static inline
>> -int __trace_sched_cpu(struct cfs_rq *cfs_rq)
>> +int __trace_sched_cpu(struct cfs_rq *cfs_rq, struct sched_entity *se)
>>  {
>>  #ifdef CONFIG_FAIR_GROUP_SCHED
>> -	struct rq *rq = cfs_rq->rq;
>> +	struct rq *rq = cfs_rq ? cfs_rq->rq : NULL;
>>  #else
>> -	struct rq *rq = container_of(cfs_rq, struct rq, cfs);
>> +	struct rq *rq = cfs_rq ? container_of(cfs_rq, struct rq, cfs) : NULL;
>>  #endif
>> -	return cpu_of(rq);
>> +	return rq ? cpu_of(rq)
>> +		  : task_cpu((container_of(se, struct task_struct, se)));
>>  }
>
> So here you duplicate lots of FAIR_GROUP internals. So why do you then
> have to expose group_cfs_rq() in the previous patch?
>

Not having group_cfs_rq() available made the trace event code too 
confusing.

But like I mentioned in the second to last paragraph in the cover 
letter, having all necessary cfs accessor-functions (rq_of(), task_of(), 
etc.) available would definitely streamline the coding effort of these 
trace events.

Do you think that making them public in include/linux/sched.h is the way
to go? What about the namespace issue with other sched classes? Should
they be exported with the names they have right now (since cfs was there
first) or should they be renamed to cfs_task_of(), rq_of_cfs_rq() etc.?

The RT and Deadline classes already have their own (private)
accessor-functions (e.g. dl_task_of() or rq_of_dl_rq()).
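
(For reference, the private cfs accessors in question look roughly like
this in kernel/sched/fair.c for the CONFIG_FAIR_GROUP_SCHED case; shown
here only to illustrate what would have to become public:)

static inline struct rq *rq_of(struct cfs_rq *cfs_rq)
{
	/* a group cfs_rq carries a back pointer to its rq */
	return cfs_rq->rq;
}

static inline struct task_struct *task_of(struct sched_entity *se)
{
	/* only valid for a sched_entity that represents a task */
	return container_of(se, struct task_struct, se);
}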

* Re: [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event
  2017-03-28  6:35 ` [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event Dietmar Eggemann
  2017-03-28  7:56   ` Peter Zijlstra
  2017-03-28  8:00   ` Peter Zijlstra
@ 2017-03-28 14:46   ` Steven Rostedt
  2017-03-28 16:44     ` Peter Zijlstra
  2 siblings, 1 reply; 30+ messages in thread
From: Steven Rostedt @ 2017-03-28 14:46 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Peter Zijlstra, Ingo Molnar, LKML, Matt Fleming, Vincent Guittot,
	Morten Rasmussen, Juri Lelli, Patrick Bellasi

On Tue, 28 Mar 2017 07:35:38 +0100
Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:

>  /* This part must be outside protection */
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 03adf9fb48b1..ac19ab6ced8f 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2950,6 +2950,9 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>  		sa->util_avg = sa->util_sum / LOAD_AVG_MAX;
>  	}
>  
> +	if (cfs_rq)
> +		trace_sched_load_cfs_rq(cfs_rq);
> +

Please use TRACE_EVENT_CONDITION(), and test for cfs_rq not NULL.

That way it moves the if (cfs_rq) out of the scheduler code and into
the jump label protected location. That is, the if is only tested when
tracing is enabled.
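
For reference, a conditional variant of the patch 2 event could look
roughly like the below; it only adds TP_CONDITION() and keeps the rest of
the event unchanged, so the caller can then drop its explicit NULL check
(just a sketch, not a tested patch):

TRACE_EVENT_CONDITION(sched_load_cfs_rq,

	TP_PROTO(struct cfs_rq *cfs_rq),

	TP_ARGS(cfs_rq),

	/* the probe (and the dynamic array callback) only runs if true */
	TP_CONDITION(cfs_rq),

	TP_STRUCT__entry(
		__field(	int,		cpu			)
		__dynamic_array(char,		path,
				__trace_sched_path(cfs_rq, NULL, 0)	)
		__field(	int,		id			)
		__field(	unsigned long,	load			)
		__field(	unsigned long,	util			)
	),

	TP_fast_assign(
		__entry->cpu	= __trace_sched_cpu(cfs_rq);
		__trace_sched_path(cfs_rq, __get_dynamic_array(path),
				   __get_dynamic_array_len(path));
		__entry->id	= __trace_sched_id(cfs_rq);
		__entry->load	= cfs_rq->runnable_load_avg;
		__entry->util	= cfs_rq->avg.util_avg;
	),

	TP_printk("cpu=%d path=%s id=%d load=%lu util=%lu",  __entry->cpu,
		  __get_str(path), __entry->id, __entry->load, __entry->util)
);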

-- Steve


>  	return decayed;
>  }
>  
> @@ -3170,6 +3173,8 @@ static inline int propagate_entity_load_avg(struct sched_entity *se)
>  	update_tg_cfs_util(cfs_rq, se);
>  	update_tg_cfs_load(cfs_rq, se);
>  
> +	trace_sched_load_cfs_rq(cfs_rq);
> +
>  	return 1;
>  }
>  
> @@ -3359,6 +3364,8 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
>  	set_tg_cfs_propagate(cfs_rq);
>  
>  	cfs_rq_util_change(cfs_rq);
> +
> +	trace_sched_load_cfs_rq(cfs_rq);
>  }
>  
>  /**
> @@ -3379,6 +3386,8 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
>  	set_tg_cfs_propagate(cfs_rq);
>  
>  	cfs_rq_util_change(cfs_rq);
> +
> +	trace_sched_load_cfs_rq(cfs_rq);
>  }
>  
>  /* Add the load generated by se into cfs_rq's load average */

* Re: [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event
  2017-03-28  8:00   ` Peter Zijlstra
@ 2017-03-28 14:47     ` Steven Rostedt
  0 siblings, 0 replies; 30+ messages in thread
From: Steven Rostedt @ 2017-03-28 14:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dietmar Eggemann, Ingo Molnar, LKML, Matt Fleming,
	Vincent Guittot, Morten Rasmussen, Juri Lelli, Patrick Bellasi

On Tue, 28 Mar 2017 10:00:50 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, Mar 28, 2017 at 07:35:38AM +0100, Dietmar Eggemann wrote:
> >  
> > +	if (cfs_rq)
> > +		trace_sched_load_cfs_rq(cfs_rq);  
> 
> You can do that with DEFINE_EVENT_CONDITION and TP_CONDITION.

I just read this after I replied about using TRACE_EVENT_CONDITION. As
there is no DEFINE_EVENT(), TRACE_EVENT_CONDITION() should be
used. All the locations expect cfs_rq not to be NULL, I assume.

-- Steve

* Re: [RFC PATCH 4/5] sched/events: Introduce sched_entity load tracking trace event
  2017-03-28 14:13     ` Dietmar Eggemann
@ 2017-03-28 16:41       ` Peter Zijlstra
  2017-03-29 20:19         ` Dietmar Eggemann
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2017-03-28 16:41 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Ingo Molnar, LKML, Matt Fleming, Vincent Guittot, Steven Rostedt,
	Morten Rasmussen, Juri Lelli, Patrick Bellasi

On Tue, Mar 28, 2017 at 04:13:45PM +0200, Dietmar Eggemann wrote:

> Do you think that making them public in include/linux/sched.h is the way to
> go? 

No; all that stuff should really stay private. tracepoints are a very
bad reason to leak this stuff.

* Re: [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event
  2017-03-28 14:46   ` Steven Rostedt
@ 2017-03-28 16:44     ` Peter Zijlstra
  2017-03-28 16:57       ` Peter Zijlstra
                         ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Peter Zijlstra @ 2017-03-28 16:44 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Dietmar Eggemann, Ingo Molnar, LKML, Matt Fleming,
	Vincent Guittot, Morten Rasmussen, Juri Lelli, Patrick Bellasi

On Tue, Mar 28, 2017 at 10:46:00AM -0400, Steven Rostedt wrote:
> On Tue, 28 Mar 2017 07:35:38 +0100
> Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> 
> >  /* This part must be outside protection */
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 03adf9fb48b1..ac19ab6ced8f 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -2950,6 +2950,9 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> >  		sa->util_avg = sa->util_sum / LOAD_AVG_MAX;
> >  	}
> >  
> > +	if (cfs_rq)
> > +		trace_sched_load_cfs_rq(cfs_rq);
> > +
> 
> Please use TRACE_EVENT_CONDITION(), and test for cfs_rq not NULL.

I too suggested that; but then I looked again at that code and we can
actually do this. cfs_rq can be constant propagated and the if
determined at build time.

It's not immediately obvious from the current code; but if we do
something like the below, it should be clearer.

---
Subject: sched/fair: Explicitly generate __update_load_avg() instances
From: Peter Zijlstra <peterz@infradead.org>
Date: Tue Mar 28 11:08:20 CEST 2017

The __update_load_avg() function is an __always_inline because it's
used with constant propagation to generate different variants of the
code without having to duplicate it (which would be prone to bugs).

Explicitly instantiate the 3 variants.

Note that most of this is called from rather hot paths, so reducing
branches is good.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2849,7 +2849,7 @@ static u32 __compute_runnable_contrib(u6
  *            = u_0 + u_1*y + u_2*y^2 + ... [re-labeling u_i --> u_{i+1}]
  */
 static __always_inline int
-__update_load_avg(u64 now, int cpu, struct sched_avg *sa,
+___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 		  unsigned long weight, int running, struct cfs_rq *cfs_rq)
 {
 	u64 delta, scaled_delta, periods;
@@ -2953,6 +2953,26 @@ __update_load_avg(u64 now, int cpu, stru
 	return decayed;
 }
 
+static int
+__update_load_avg_blocked_se(u64 now, int cpu, struct sched_avg *sa)
+{
+	return ___update_load_avg(now, cpu, sa, 0, 0, NULL);
+}
+
+static int
+__update_load_avg_se(u64 now, int cpu, struct sched_avg *sa,
+		     unsigned long weight, int running)
+{
+	return ___update_load_avg(now, cpu, sa, weight, running, NULL);
+}
+
+static int
+__update_load_avg(u64 now, int cpu, struct sched_avg *sa,
+		  unsigned long weight, int running, struct cfs_rq *cfs_rq)
+{
+	return ___update_load_avg(now, cpu, sa, weight, running, cfs_rq);
+}
+
 /*
  * Signed add and clamp on underflow.
  *
@@ -3014,6 +3034,9 @@ static inline void update_tg_load_avg(st
 void set_task_rq_fair(struct sched_entity *se,
 		      struct cfs_rq *prev, struct cfs_rq *next)
 {
+	u64 p_last_update_time;
+	u64 n_last_update_time;
+
 	if (!sched_feat(ATTACH_AGE_LOAD))
 		return;
 
@@ -3024,11 +3047,11 @@ void set_task_rq_fair(struct sched_entit
 	 * time. This will result in the wakee task is less decayed, but giving
 	 * the wakee more load sounds not bad.
 	 */
-	if (se->avg.last_update_time && prev) {
-		u64 p_last_update_time;
-		u64 n_last_update_time;
+	if (!(se->avg.last_update_time && prev))
+		return;
 
 #ifndef CONFIG_64BIT
+	{
 		u64 p_last_update_time_copy;
 		u64 n_last_update_time_copy;
 
@@ -3043,14 +3066,15 @@ void set_task_rq_fair(struct sched_entit
 
 		} while (p_last_update_time != p_last_update_time_copy ||
 			 n_last_update_time != n_last_update_time_copy);
+	}
 #else
-		p_last_update_time = prev->avg.last_update_time;
-		n_last_update_time = next->avg.last_update_time;
+	p_last_update_time = prev->avg.last_update_time;
+	n_last_update_time = next->avg.last_update_time;
 #endif
-		__update_load_avg(p_last_update_time, cpu_of(rq_of(prev)),
-				  &se->avg, 0, 0, NULL);
-		se->avg.last_update_time = n_last_update_time;
-	}
+	__update_load_avg_blocked_se(p_last_update_time,
+				     cpu_of(rq_of(prev)),
+				     &se->avg);
+	se->avg.last_update_time = n_last_update_time;
 }
 
 /* Take into account change of utilization of a child task group */
@@ -3329,9 +3353,9 @@ static inline void update_load_avg(struc
 	 * track group sched_entity load average for task_h_load calc in migration
 	 */
 	if (se->avg.last_update_time && !(flags & SKIP_AGE_LOAD)) {
-		__update_load_avg(now, cpu, &se->avg,
+		__update_load_avg_se(now, cpu, &se->avg,
 			  se->on_rq * scale_load_down(se->load.weight),
-			  cfs_rq->curr == se, NULL);
+			  cfs_rq->curr == se);
 	}
 
 	decayed  = update_cfs_rq_load_avg(now, cfs_rq, true);
@@ -3437,7 +3461,7 @@ void sync_entity_load_avg(struct sched_e
 	u64 last_update_time;
 
 	last_update_time = cfs_rq_last_update_time(cfs_rq);
-	__update_load_avg(last_update_time, cpu_of(rq_of(cfs_rq)), &se->avg, 0, 0, NULL);
+	__update_load_avg_blocked_se(last_update_time, cpu_of(rq_of(cfs_rq)), &se->avg);
 }
 
 /*

* Re: [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event
  2017-03-28 16:44     ` Peter Zijlstra
@ 2017-03-28 16:57       ` Peter Zijlstra
  2017-03-28 17:20         ` Patrick Bellasi
  2017-03-28 17:36       ` Steven Rostedt
  2017-03-29 21:03       ` Dietmar Eggemann
  2 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2017-03-28 16:57 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Dietmar Eggemann, Ingo Molnar, LKML, Matt Fleming,
	Vincent Guittot, Morten Rasmussen, Juri Lelli, Patrick Bellasi

On Tue, Mar 28, 2017 at 06:44:59PM +0200, Peter Zijlstra wrote:
> ---
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2849,7 +2849,7 @@ static u32 __compute_runnable_contrib(u6
>   *            = u_0 + u_1*y + u_2*y^2 + ... [re-labeling u_i --> u_{i+1}]
>   */
>  static __always_inline int
> -__update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> +___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>  		  unsigned long weight, int running, struct cfs_rq *cfs_rq)
>  {
>  	u64 delta, scaled_delta, periods;
> @@ -2953,6 +2953,26 @@ __update_load_avg(u64 now, int cpu, stru
>  	return decayed;
>  }
>  
> +static int
> +__update_load_avg_blocked_se(u64 now, int cpu, struct sched_avg *sa)
> +{
> +	return ___update_load_avg(now, cpu, sa, 0, 0, NULL);
> +}
> +
> +static int
> +__update_load_avg_se(u64 now, int cpu, struct sched_avg *sa,
> +		     unsigned long weight, int running)
> +{
> +	return ___update_load_avg(now, cpu, sa, weight, running, NULL);
> +}
> +
> +static int
> +__update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> +		  unsigned long weight, int running, struct cfs_rq *cfs_rq)
> +{
> +	return ___update_load_avg(now, cpu, sa, weight, running, cfs_rq);

Although ideally we'd be able to tell the compiler that cfs_rq will not
be NULL here. Hurmph.. no __builtin for that I think :/

> +}

* Re: [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event
  2017-03-28 16:57       ` Peter Zijlstra
@ 2017-03-28 17:20         ` Patrick Bellasi
  2017-03-28 18:18           ` Peter Zijlstra
  0 siblings, 1 reply; 30+ messages in thread
From: Patrick Bellasi @ 2017-03-28 17:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Dietmar Eggemann, Ingo Molnar, LKML,
	Matt Fleming, Vincent Guittot, Morten Rasmussen, Juri Lelli

On 28-Mar 18:57, Peter Zijlstra wrote:
> On Tue, Mar 28, 2017 at 06:44:59PM +0200, Peter Zijlstra wrote:
> > ---
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -2849,7 +2849,7 @@ static u32 __compute_runnable_contrib(u6
> >   *            = u_0 + u_1*y + u_2*y^2 + ... [re-labeling u_i --> u_{i+1}]
> >   */
> >  static __always_inline int
> > -__update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> > +___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> >  		  unsigned long weight, int running, struct cfs_rq *cfs_rq)
> >  {
> >  	u64 delta, scaled_delta, periods;
> > @@ -2953,6 +2953,26 @@ __update_load_avg(u64 now, int cpu, stru
> >  	return decayed;
> >  }
> >  
> > +static int
> > +__update_load_avg_blocked_se(u64 now, int cpu, struct sched_avg *sa)
> > +{
> > +	return ___update_load_avg(now, cpu, sa, 0, 0, NULL);
> > +}
> > +
> > +static int
> > +__update_load_avg_se(u64 now, int cpu, struct sched_avg *sa,
> > +		     unsigned long weight, int running)
> > +{
> > +	return ___update_load_avg(now, cpu, sa, weight, running, NULL);
> > +}
> > +
> > +static int
> > +__update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> > +		  unsigned long weight, int running, struct cfs_rq *cfs_rq)

                  __attribute__((nonnull (6)));

> > +{
> > +	return ___update_load_avg(now, cpu, sa, weight, running, cfs_rq);
> 
> Although ideally we'd be able to tell the compiler that cfs_rq will not
> be NULL here. Hurmph.. no __builtin for that I think :/

What about the above attribute?

> 
> > +}
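
(I.e. something like the following forward declaration, using GCC's
nonnull function attribute on the 6th parameter; just a sketch, whether
the compiler really drops the NULL handling here is untested:)

/* tell the compiler cfs_rq (parameter 6) is never NULL for this variant */
static int
__update_load_avg(u64 now, int cpu, struct sched_avg *sa,
		  unsigned long weight, int running, struct cfs_rq *cfs_rq)
		  __attribute__((nonnull(6)));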

-- 
#include <best/regards.h>

Patrick Bellasi

* Re: [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event
  2017-03-28 16:44     ` Peter Zijlstra
  2017-03-28 16:57       ` Peter Zijlstra
@ 2017-03-28 17:36       ` Steven Rostedt
  2017-03-28 17:37         ` Steven Rostedt
  2017-03-29 21:03       ` Dietmar Eggemann
  2 siblings, 1 reply; 30+ messages in thread
From: Steven Rostedt @ 2017-03-28 17:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dietmar Eggemann, Ingo Molnar, LKML, Matt Fleming,
	Vincent Guittot, Morten Rasmussen, Juri Lelli, Patrick Bellasi

On Tue, 28 Mar 2017 18:44:59 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, Mar 28, 2017 at 10:46:00AM -0400, Steven Rostedt wrote:
> > On Tue, 28 Mar 2017 07:35:38 +0100
> > Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> >   
> > >  /* This part must be outside protection */
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 03adf9fb48b1..ac19ab6ced8f 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -2950,6 +2950,9 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> > >  		sa->util_avg = sa->util_sum / LOAD_AVG_MAX;
> > >  	}
> > >  
> > > +	if (cfs_rq)
> > > +		trace_sched_load_cfs_rq(cfs_rq);
> > > +  
> > 
> > Please use TRACE_EVENT_CONDITION(), and test for cfs_rq not NULL.  
> 
> I too suggested that; but then I looked again at that code and we can
> actually do this. cfs_rq can be constant-propagated and the if
> determined at build time.
> 
> It's not immediately obvious from the current code; but if we do
> something like the below, it should be clearer.
> 

But why play games and rely on the design of the code? A
TRACE_EVENT_CONDITION() is more robust and documents that this
tracepoint must not fire when cfs_rq is NULL.
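
For illustration, a minimal sketch of the conditional form (fields are
reduced here to a single load value; the RFC event carries more keys
such as cpu, path and id, and it assumes, as in the RFC, that the cfs_rq
layout is visible where the header is included):

TRACE_EVENT_CONDITION(sched_load_cfs_rq,

	TP_PROTO(struct cfs_rq *cfs_rq),

	TP_ARGS(cfs_rq),

	/* The event is silently skipped whenever cfs_rq is NULL. */
	TP_CONDITION(cfs_rq != NULL),

	TP_STRUCT__entry(
		__field(unsigned long, load)
	),

	TP_fast_assign(
		__entry->load = cfs_rq->avg.load_avg;
	),

	TP_printk("load=%lu", __entry->load)
);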

-- Steve


* Re: [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event
  2017-03-28 17:36       ` Steven Rostedt
@ 2017-03-28 17:37         ` Steven Rostedt
  2017-03-29 20:37           ` Dietmar Eggemann
  0 siblings, 1 reply; 30+ messages in thread
From: Steven Rostedt @ 2017-03-28 17:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dietmar Eggemann, Ingo Molnar, LKML, Matt Fleming,
	Vincent Guittot, Morten Rasmussen, Juri Lelli, Patrick Bellasi

On Tue, 28 Mar 2017 13:36:26 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> But why play games and rely on the design of the code? A
> TRACE_EVENT_CONDITION() is more robust and documents that this
> tracepoint must not fire when cfs_rq is NULL.

In other words, what are you trying to save by not using
TRACE_EVENT_CONDITION()?

-- Steve


* Re: [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event
  2017-03-28 17:20         ` Patrick Bellasi
@ 2017-03-28 18:18           ` Peter Zijlstra
  0 siblings, 0 replies; 30+ messages in thread
From: Peter Zijlstra @ 2017-03-28 18:18 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: Steven Rostedt, Dietmar Eggemann, Ingo Molnar, LKML,
	Matt Fleming, Vincent Guittot, Morten Rasmussen, Juri Lelli

On Tue, Mar 28, 2017 at 06:20:05PM +0100, Patrick Bellasi wrote:
> On 28-Mar 18:57, Peter Zijlstra wrote:
> > On Tue, Mar 28, 2017 at 06:44:59PM +0200, Peter Zijlstra wrote:

> > > +static int
> > > +__update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> > > +		  unsigned long weight, int running, struct cfs_rq *cfs_rq)
> 
>                   __attribute__((nonnull (6)));
> 
> > > +{
> > > +	return ___update_load_avg(now, cpu, sa, weight, running, cfs_rq);
> > 
> > Although ideally we'd be able to tell the compiler that cfs_rq will not
> > be NULL here. Hurmph.. no __builtin for that I think :/
> 
> What about the above attribute?

Ooh, shiny, thanks! My bad for failing to check the function attributes.


* Re: [RFC PATCH 4/5] sched/events: Introduce sched_entity load tracking trace event
  2017-03-28 16:41       ` Peter Zijlstra
@ 2017-03-29 20:19         ` Dietmar Eggemann
  0 siblings, 0 replies; 30+ messages in thread
From: Dietmar Eggemann @ 2017-03-29 20:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, LKML, Matt Fleming, Vincent Guittot, Steven Rostedt,
	Morten Rasmussen, Juri Lelli, Patrick Bellasi

On 03/28/2017 06:41 PM, Peter Zijlstra wrote:
> On Tue, Mar 28, 2017 at 04:13:45PM +0200, Dietmar Eggemann wrote:
>
>> Do you think that making them public in include/linux/sched.h is the way to
>> go?
>
> No; all that stuff should really stay private. tracepoints are a very
> bad reason to leak this stuff.

Understood & makes sense to me. In hindsight, it's not too complicated 
to code group_cfs_rq() directly in include/trace/events/sched.h.
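
A minimal sketch of that idea, assuming the trace header is only pulled
in from kernel/sched/ after sched.h, and using a made-up helper name so
it does not clash with the one in fair.c:

/* in include/trace/events/sched.h */
#ifdef CONFIG_FAIR_GROUP_SCHED
static inline struct cfs_rq *sched_trace_group_cfs_rq(struct sched_entity *se)
{
	return se->my_q;	/* NULL for a task, the owned cfs_rq for a group */
}
#else
static inline struct cfs_rq *sched_trace_group_cfs_rq(struct sched_entity *se)
{
	return NULL;
}
#endif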


* Re: [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event
  2017-03-28 17:37         ` Steven Rostedt
@ 2017-03-29 20:37           ` Dietmar Eggemann
  0 siblings, 0 replies; 30+ messages in thread
From: Dietmar Eggemann @ 2017-03-29 20:37 UTC (permalink / raw)
  To: Steven Rostedt, Peter Zijlstra
  Cc: Ingo Molnar, LKML, Matt Fleming, Vincent Guittot,
	Morten Rasmussen, Juri Lelli, Patrick Bellasi

On 03/28/2017 07:37 PM, Steven Rostedt wrote:
> On Tue, 28 Mar 2017 13:36:26 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
>> But why play games and rely on the design of the code? A
>> TRACE_EVENT_CONDITION() is more robust and documents that this
>> tracepoint must not fire when cfs_rq is NULL.
>
> In other words, what are you trying to save by not using
> TRACE_EVENT_CONDITION()?

IMHO, if we could avoid this

   if (cfs_rq)
     trace_sched_load_cfs_rq(cfs_rq);
   else
     trace_sched_load_se(container_of(sa, struct sched_entity, avg));

in __update_load_avg(), then we could use 'unconditional' TRACE_EVENTs in 
all call sites:

__update_load_avg{_cfs_rq}(), propagate_entity_load_avg(), 
attach_entity_load_avg(), detach_entity_load_avg() for cfs_rq and

__update_load_avg_blocked_se(), __update_load_avg_se(), 
propagate_entity_load_avg() for se.
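
A sketch of that placement, reusing the wrapper signatures from the
patch quoted above (the trace events are the RFC ones; the exact
placement shown here is only illustrative):

static int
__update_load_avg_se(u64 now, int cpu, struct sched_avg *sa,
		     unsigned long weight, int running)
{
	int decayed = ___update_load_avg(now, cpu, sa, weight, running, NULL);

	/* sa is always embedded in a sched_entity here, so no condition. */
	trace_sched_load_se(container_of(sa, struct sched_entity, avg));

	return decayed;
}

static int
__update_load_avg(u64 now, int cpu, struct sched_avg *sa,
		  unsigned long weight, int running, struct cfs_rq *cfs_rq)
{
	int decayed = ___update_load_avg(now, cpu, sa, weight, running, cfs_rq);

	/* Every caller of this variant passes a valid cfs_rq. */
	trace_sched_load_cfs_rq(cfs_rq);

	return decayed;
}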

[...]


* Re: [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event
  2017-03-28 16:44     ` Peter Zijlstra
  2017-03-28 16:57       ` Peter Zijlstra
  2017-03-28 17:36       ` Steven Rostedt
@ 2017-03-29 21:03       ` Dietmar Eggemann
  2017-03-30  7:04         ` Peter Zijlstra
  2 siblings, 1 reply; 30+ messages in thread
From: Dietmar Eggemann @ 2017-03-29 21:03 UTC (permalink / raw)
  To: Peter Zijlstra, Steven Rostedt
  Cc: Ingo Molnar, LKML, Matt Fleming, Vincent Guittot,
	Morten Rasmussen, Juri Lelli, Patrick Bellasi

On 03/28/2017 06:44 PM, Peter Zijlstra wrote:
> On Tue, Mar 28, 2017 at 10:46:00AM -0400, Steven Rostedt wrote:
>> On Tue, 28 Mar 2017 07:35:38 +0100
>> Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:

[...]

> I too suggested that; but then I looked again at that code and we can
> actually do this. cfs_rq can be constant-propagated and the if
> determined at build time.
>
> It's not immediately obvious from the current code; but if we do
> something like the below, it should be clearer.
>
> ---
> Subject: sched/fair: Explicitly generate __update_load_avg() instances
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Tue Mar 28 11:08:20 CEST 2017
>
> The __update_load_avg() function is an __always_inline because it's
> used with constant propagation to generate different variants of the
> code without having to duplicate it (which would be prone to bugs).

Ah, so the if (cfs_rq)/else condition should stay in ___update_load_avg() 
and I shouldn't move the trace events into the 3 variants?

I tried to verify that the if is determined at build time, but that's 
kind of hard to do with the trace events in place.
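
One way to convince oneself, independent of the trace events, is a
stripped-down toy with the same shape (a sketch, not scheduler code;
compile with gcc -O2 and inspect the assembly):

/* stand-in for the kernel's __always_inline */
#define __always_inline inline __attribute__((always_inline))

struct cfs_rq;			/* opaque; only the pointer test matters */

static __always_inline int pelt(int x, struct cfs_rq *cfs_rq)
{
	if (cfs_rq)		/* constant at every inlined call site */
		return x + 1;	/* "cfs_rq" flavour */
	return x;		/* "se" flavour */
}

int pelt_se(int x)
{
	/* NULL is a compile-time constant, so the branch folds away. */
	return pelt(x, NULL);
}

int pelt_cfs_rq(int x, struct cfs_rq *cfs_rq)
{
	/* The test only remains where cfs_rq is not provably non-NULL. */
	return pelt(x, cfs_rq);
}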

> Explicitly instantiate the 3 variants.
>
> Note that most of this is called from rather hot paths, so reducing
> branches is good.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2849,7 +2849,7 @@ static u32 __compute_runnable_contrib(u6
>   *            = u_0 + u_1*y + u_2*y^2 + ... [re-labeling u_i --> u_{i+1}]
>   */
>  static __always_inline int
> -__update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> +___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>  		  unsigned long weight, int running, struct cfs_rq *cfs_rq)
>  {
>  	u64 delta, scaled_delta, periods;
> @@ -2953,6 +2953,26 @@ __update_load_avg(u64 now, int cpu, stru
>  	return decayed;
>  }
>
> +static int
> +__update_load_avg_blocked_se(u64 now, int cpu, struct sched_avg *sa)
> +{
> +	return ___update_load_avg(now, cpu, sa, 0, 0, NULL);
> +}
> +
> +static int
> +__update_load_avg_se(u64 now, int cpu, struct sched_avg *sa,
> +		     unsigned long weight, int running)
> +{
> +	return ___update_load_avg(now, cpu, sa, weight, running, NULL);
> +}
> +
> +static int
> +__update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> +		  unsigned long weight, int running, struct cfs_rq *cfs_rq)
> +{
> +	return ___update_load_avg(now, cpu, sa, weight, running, cfs_rq);
> +}

Why not reduce the parameter list of these 3 incarnations to 'now, cpu, 
object'?

static int
__update_load_avg_blocked_se(u64 now, int cpu, struct sched_entity *se)

static int
__update_load_avg_se(u64 now, int cpu, struct sched_entity *se)

static int
__update_load_avg_cfs_rq(u64 now, int cpu, struct cfs_rq *cfs_rq)

[...]


* Re: [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event
  2017-03-29 21:03       ` Dietmar Eggemann
@ 2017-03-30  7:04         ` Peter Zijlstra
  2017-03-30  7:46           ` Dietmar Eggemann
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2017-03-30  7:04 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Steven Rostedt, Ingo Molnar, LKML, Matt Fleming, Vincent Guittot,
	Morten Rasmussen, Juri Lelli, Patrick Bellasi

On Wed, Mar 29, 2017 at 11:03:45PM +0200, Dietmar Eggemann wrote:

> > +static int
> > +__update_load_avg_blocked_se(u64 now, int cpu, struct sched_avg *sa)
> > +{
> > +	return ___update_load_avg(now, cpu, sa, 0, 0, NULL);
> > +}
> > +
> > +static int
> > +__update_load_avg_se(u64 now, int cpu, struct sched_avg *sa,
> > +		     unsigned long weight, int running)
> > +{
> > +	return ___update_load_avg(now, cpu, sa, weight, running, NULL);
> > +}
> > +
> > +static int
> > +__update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> > +		  unsigned long weight, int running, struct cfs_rq *cfs_rq)
> > +{
> > +	return ___update_load_avg(now, cpu, sa, weight, running, cfs_rq);
> > +}
> 
> Why not reduce the parameter list of these 3 incarnations to 'now, cpu,
> object'?
> 
> static int
> __update_load_avg_blocked_se(u64 now, int cpu, struct sched_entity *se)
> 
> static int
> __update_load_avg_se(u64 now, int cpu, struct sched_entity *se)
> 
> static int
> __update_load_avg_cfs_rq(u64 now, int cpu, struct cfs_rq *cfs_rq)
> 
> [...]

Doesn't quite work with se, but yes, good idea.

And this way we don't need the nonnull attribute either, because it
should be clear from having dereferenced it that it cannot be null.

---
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2849,7 +2849,7 @@ static u32 __compute_runnable_contrib(u6
  *            = u_0 + u_1*y + u_2*y^2 + ... [re-labeling u_i --> u_{i+1}]
  */
 static __always_inline int
-__update_load_avg(u64 now, int cpu, struct sched_avg *sa,
+___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 		  unsigned long weight, int running, struct cfs_rq *cfs_rq)
 {
 	u64 delta, scaled_delta, periods;
@@ -2953,6 +2953,28 @@ __update_load_avg(u64 now, int cpu, stru
 	return decayed;
 }
 
+static int
+__update_load_avg_blocked_se(u64 now, int cpu, struct sched_entity *se)
+{
+	return ___update_load_avg(now, cpu, &se->avg, 0, 0, NULL);
+}
+
+static int
+__update_load_avg_se(u64 now, int cpu, struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+	return ___update_load_avg(now, cpu, &se->avg,
+				  se->on_rq * scale_load_down(se->load.weight),
+				  cfs_rq->curr == se, NULL);
+}
+
+static int
+__update_load_avg_cfs_rq(u64 now, int cpu, struct cfs_rq *cfs_rq)
+{
+	return ___update_load_avg(now, cpu, &cfs_rq->avg,
			scale_load_down(cfs_rq->load.weight),
+			cfs_rq->curr != NULL, cfs_rq);
+}
+
 /*
  * Signed add and clamp on underflow.
  *
@@ -3014,6 +3036,9 @@ static inline void update_tg_load_avg(st
 void set_task_rq_fair(struct sched_entity *se,
 		      struct cfs_rq *prev, struct cfs_rq *next)
 {
+	u64 p_last_update_time;
+	u64 n_last_update_time;
+
 	if (!sched_feat(ATTACH_AGE_LOAD))
 		return;
 
@@ -3024,11 +3049,11 @@ void set_task_rq_fair(struct sched_entit
 	 * time. This will result in the wakee task is less decayed, but giving
 	 * the wakee more load sounds not bad.
 	 */
-	if (se->avg.last_update_time && prev) {
-		u64 p_last_update_time;
-		u64 n_last_update_time;
+	if (!(se->avg.last_update_time && prev))
+		return;
 
 #ifndef CONFIG_64BIT
+	{
 		u64 p_last_update_time_copy;
 		u64 n_last_update_time_copy;
 
@@ -3043,14 +3068,13 @@ void set_task_rq_fair(struct sched_entit
 
 		} while (p_last_update_time != p_last_update_time_copy ||
 			 n_last_update_time != n_last_update_time_copy);
+	}
 #else
-		p_last_update_time = prev->avg.last_update_time;
-		n_last_update_time = next->avg.last_update_time;
+	p_last_update_time = prev->avg.last_update_time;
+	n_last_update_time = next->avg.last_update_time;
 #endif
-		__update_load_avg(p_last_update_time, cpu_of(rq_of(prev)),
-				  &se->avg, 0, 0, NULL);
-		se->avg.last_update_time = n_last_update_time;
-	}
+	__update_load_avg_blocked_se(p_last_update_time, cpu_of(rq_of(prev)), se);
+	se->avg.last_update_time = n_last_update_time;
 }
 
 /* Take into account change of utilization of a child task group */
@@ -3295,8 +3319,7 @@ update_cfs_rq_load_avg(u64 now, struct c
 		set_tg_cfs_propagate(cfs_rq);
 	}
 
-	decayed = __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
-		scale_load_down(cfs_rq->load.weight), cfs_rq->curr != NULL, cfs_rq);
+	decayed = __update_load_avg_cfs_rq(now, cpu_of(rq_of(cfs_rq)), cfs_rq);
 
 #ifndef CONFIG_64BIT
 	smp_wmb();
@@ -3328,11 +3351,8 @@ static inline void update_load_avg(struc
 	 * Track task load average for carrying it to new CPU after migrated, and
 	 * track group sched_entity load average for task_h_load calc in migration
 	 */
-	if (se->avg.last_update_time && !(flags & SKIP_AGE_LOAD)) {
-		__update_load_avg(now, cpu, &se->avg,
-			  se->on_rq * scale_load_down(se->load.weight),
-			  cfs_rq->curr == se, NULL);
-	}
+	if (se->avg.last_update_time && !(flags & SKIP_AGE_LOAD))
+		__update_load_avg_se(now, cpu, cfs_rq, se);
 
 	decayed  = update_cfs_rq_load_avg(now, cfs_rq, true);
 	decayed |= propagate_entity_load_avg(se);
@@ -3437,7 +3457,7 @@ void sync_entity_load_avg(struct sched_e
 	u64 last_update_time;
 
 	last_update_time = cfs_rq_last_update_time(cfs_rq);
-	__update_load_avg(last_update_time, cpu_of(rq_of(cfs_rq)), &se->avg, 0, 0, NULL);
+	__update_load_avg_blocked_se(last_update_time, cpu_of(rq_of(cfs_rq)), se);
 }
 
 /*


* Re: [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event
  2017-03-30  7:04         ` Peter Zijlstra
@ 2017-03-30  7:46           ` Dietmar Eggemann
  0 siblings, 0 replies; 30+ messages in thread
From: Dietmar Eggemann @ 2017-03-30  7:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Ingo Molnar, LKML, Matt Fleming, Vincent Guittot,
	Morten Rasmussen, Juri Lelli, Patrick Bellasi

On 03/30/2017 09:04 AM, Peter Zijlstra wrote:
> On Wed, Mar 29, 2017 at 11:03:45PM +0200, Dietmar Eggemann wrote:

[...]

>> Why not reduce the parameter list of these 3 incarnations to 'now, cpu,
>> object'?
>>
>> static int
>> __update_load_avg_blocked_se(u64 now, int cpu, struct sched_entity *se)
>>
>> static int
>> __update_load_avg_se(u64 now, int cpu, struct sched_entity *se)
>>
>> static int
>> __update_load_avg_cfs_rq(u64 now, int cpu, struct cfs_rq *cfs_rq)
>>
>> [...]
>
> doesn't quite work with se, but yes good idea.

Ah, OK, you don't want to use 'cfs_rq_of(se)->curr == se' in 
__update_load_avg_se(). Is the reason that the cfs_rq is already fetched 
in update_load_avg()?
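
For comparison, the 'now, cpu, se'-only shape would have to re-derive it
inside the helper (a sketch of the alternative under discussion, not
code from the patch):

static int
__update_load_avg_se(u64 now, int cpu, struct sched_entity *se)
{
	struct cfs_rq *cfs_rq = cfs_rq_of(se);		/* re-fetched here */

	return ___update_load_avg(now, cpu, &se->avg,
				  se->on_rq * scale_load_down(se->load.weight),
				  cfs_rq->curr == se, NULL);
}

whereas update_load_avg() already has the cfs_rq in hand, so passing it
through avoids the second lookup.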

> And this way we don't need the nonnull attribute either, because it
> should be clear from having dereferenced it that it cannot be null.

Yes, this would be clearer now.

[...]


Thread overview: 30+ messages
2017-03-28  6:35 [RFC PATCH 0/5] CFS load tracking trace events Dietmar Eggemann
2017-03-28  6:35 ` [RFC PATCH 1/5] sched/autogroup: Define autogroup_path() for !CONFIG_SCHED_DEBUG Dietmar Eggemann
2017-03-28  6:35 ` [RFC PATCH 2/5] sched/events: Introduce cfs_rq load tracking trace event Dietmar Eggemann
2017-03-28  7:56   ` Peter Zijlstra
2017-03-28 13:30     ` Dietmar Eggemann
2017-03-28  8:00   ` Peter Zijlstra
2017-03-28 14:47     ` Steven Rostedt
2017-03-28 14:46   ` Steven Rostedt
2017-03-28 16:44     ` Peter Zijlstra
2017-03-28 16:57       ` Peter Zijlstra
2017-03-28 17:20         ` Patrick Bellasi
2017-03-28 18:18           ` Peter Zijlstra
2017-03-28 17:36       ` Steven Rostedt
2017-03-28 17:37         ` Steven Rostedt
2017-03-29 20:37           ` Dietmar Eggemann
2017-03-29 21:03       ` Dietmar Eggemann
2017-03-30  7:04         ` Peter Zijlstra
2017-03-30  7:46           ` Dietmar Eggemann
2017-03-28  6:35 ` [RFC PATCH 3/5] sched/fair: Export group_cfs_rq() Dietmar Eggemann
2017-03-28  6:35 ` [RFC PATCH 4/5] sched/events: Introduce sched_entity load tracking trace event Dietmar Eggemann
2017-03-28  8:05   ` Peter Zijlstra
2017-03-28 14:01     ` Dietmar Eggemann
2017-03-28 14:03     ` Dietmar Eggemann
2017-03-28  8:08   ` Peter Zijlstra
2017-03-28 14:13     ` Dietmar Eggemann
2017-03-28 16:41       ` Peter Zijlstra
2017-03-29 20:19         ` Dietmar Eggemann
2017-03-28  6:35 ` [RFC PATCH 5/5] sched/events: Introduce task_group " Dietmar Eggemann
2017-03-28 10:05 ` [RFC PATCH 0/5] CFS load tracking trace events Vincent Guittot
2017-03-28 13:45   ` Dietmar Eggemann
