* [RFC PATCH v2 0/5] sched: support schedstat for RT sched class
From: Yafang Shao @ 2020-11-23 12:58 UTC (permalink / raw)
  To: mgorman, mingo, peterz, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot
  Cc: linux-kernel, linux-rt-users, Yafang Shao

We want to measure the latency of RT tasks in our production
environment with the schedstat facility, but schedstat is currently
only supported for the fair sched class. This patchset enables it for
the RT sched class as well.

The schedstat statistics are defined in struct sched_entity, which is a
member of struct task_struct, so we can reuse them for the RT sched class.

The schedstat usage in the RT sched class is similar to that in the
fair sched class, for example:
                fair                            RT
enqueue         update_stats_enqueue_fair       update_stats_enqueue_rt
dequeue         update_stats_dequeue_fair       update_stats_dequeue_rt
put_prev_task   update_stats_wait_start         update_stats_wait_start
set_next_task   update_stats_wait_end           update_stats_wait_end
show            /proc/[pid]/sched               /proc/[pid]/sched

The sched:sched_stat_* tracepoints can be used to trace RT tasks as
well once this patchset is applied.
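
As an illustration of how we consume these statistics, below is a
minimal userspace sketch (illustrative only, not part of the series)
that dumps the wait statistics of a task from /proc/[pid]/sched. It
assumes schedstats is enabled (schedstats=enable on the kernel command
line or kernel.sched_schedstats=1) and matches the se.statistics.wait_*
field names printed there:

/* dump the schedstat wait fields of a task from /proc/[pid]/sched */
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
	char path[64], line[256];
	FILE *fp;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <pid>\n", argv[0]);
		return 1;
	}

	snprintf(path, sizeof(path), "/proc/%s/sched", argv[1]);
	fp = fopen(path, "r");
	if (!fp) {
		perror("fopen");
		return 1;
	}

	/* e.g. se.statistics.wait_sum, wait_max, wait_count */
	while (fgets(line, sizeof(line), fp)) {
		if (strstr(line, "wait_"))
			fputs(line, stdout);
	}

	fclose(fp);
	return 0;
}

With this patchset applied, these fields are accounted for RT tasks
just as they are for fair tasks.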

PATCH #1 ~ #4 are preparation for PATCH #5.

- v2:
keep the schedstats functions inline, per Mel.

Yafang Shao (5):
  sched: don't include stats.h in sched.h
  sched: define task_of() as a common helper
  sched: make schedstats helper independent of cfs_rq
  sched: define update_stats_curr_start() as a common helper
  sched, rt: support schedstat for RT sched class

 kernel/sched/core.c      |   1 +
 kernel/sched/deadline.c  |   1 +
 kernel/sched/debug.c     |   1 +
 kernel/sched/fair.c      | 174 ++-------------------------------------
 kernel/sched/idle.c      |   1 +
 kernel/sched/rt.c        |  94 ++++++++++++++++++++-
 kernel/sched/sched.h     |  30 ++++++-
 kernel/sched/stats.c     |   1 +
 kernel/sched/stats.h     | 146 ++++++++++++++++++++++++++++++++
 kernel/sched/stop_task.c |   1 +
 10 files changed, 280 insertions(+), 170 deletions(-)

-- 
2.18.4



* [RFC PATCH v2 1/5] sched: don't include stats.h in sched.h
From: Yafang Shao @ 2020-11-23 12:58 UTC (permalink / raw)
  To: mgorman, mingo, peterz, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot
  Cc: linux-kernel, linux-rt-users, Yafang Shao

This patch is a preparation for the following patches, which will
define some common helpers in stats.h. These helpers require
definitions from sched.h, so stop including stats.h from sched.h.

The source files that require stats.h now include it explicitly.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/sched/core.c      | 1 +
 kernel/sched/deadline.c  | 1 +
 kernel/sched/debug.c     | 1 +
 kernel/sched/fair.c      | 1 +
 kernel/sched/idle.c      | 1 +
 kernel/sched/rt.c        | 2 +-
 kernel/sched/sched.h     | 6 +++++-
 kernel/sched/stats.c     | 1 +
 kernel/sched/stats.h     | 2 ++
 kernel/sched/stop_task.c | 1 +
 10 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d2003a7d5ab5..fd76628778f7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -11,6 +11,7 @@
 #undef CREATE_TRACE_POINTS
 
 #include "sched.h"
+#include "stats.h"
 
 #include <linux/nospec.h>
 
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index f232305dcefe..7a0124f81a4f 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -16,6 +16,7 @@
  *                    Fabio Checconi <fchecconi@gmail.com>
  */
 #include "sched.h"
+#include "stats.h"
 #include "pelt.h"
 
 struct dl_bandwidth def_dl_bandwidth;
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 2357921580f9..9758aa1bba1e 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -7,6 +7,7 @@
  * Copyright(C) 2007, Red Hat, Inc., Ingo Molnar
  */
 #include "sched.h"
+#include "stats.h"
 
 static DEFINE_SPINLOCK(sched_debug_lock);
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8917d2d715ef..8ff1daa3d9bb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -21,6 +21,7 @@
  *  Copyright (C) 2007 Red Hat, Inc., Peter Zijlstra
  */
 #include "sched.h"
+#include "stats.h"
 
 /*
  * Targeted preemption latency for CPU-bound tasks:
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 24d0ee26377d..95c02cbca04a 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -7,6 +7,7 @@
  *        tasks which are handled in sched/fair.c )
  */
 #include "sched.h"
+#include "stats.h"
 
 #include <trace/events/power.h>
 
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 49ec096a8aa1..af772ac0f32d 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -4,7 +4,7 @@
  * policies)
  */
 #include "sched.h"
-
+#include "stats.h"
 #include "pelt.h"
 
 int sched_rr_timeslice = RR_TIMESLICE;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index df80bfcea92e..871544bb9a38 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2,6 +2,9 @@
 /*
  * Scheduler internal types and methods:
  */
+#ifndef _KERNEL_SCHED_SCHED_H
+#define _KERNEL_SCHED_SCHED_H
+
 #include <linux/sched.h>
 
 #include <linux/sched/autogroup.h>
@@ -1538,7 +1541,6 @@ extern void flush_smp_call_function_from_idle(void);
 static inline void flush_smp_call_function_from_idle(void) { }
 #endif
 
-#include "stats.h"
 #include "autogroup.h"
 
 #ifdef CONFIG_CGROUP_SCHED
@@ -2633,3 +2635,5 @@ static inline bool is_per_cpu_kthread(struct task_struct *p)
 
 void swake_up_all_locked(struct swait_queue_head *q);
 void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait);
+
+#endif	/* _KERNEL_SCHED_SCHED_H */
diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
index 750fb3c67eed..844bd9dbfbf0 100644
--- a/kernel/sched/stats.c
+++ b/kernel/sched/stats.c
@@ -3,6 +3,7 @@
  * /proc/schedstat implementation
  */
 #include "sched.h"
+#include "stats.h"
 
 /*
  * Current schedstat API version.
diff --git a/kernel/sched/stats.h b/kernel/sched/stats.h
index 33d0daf83842..c23b653ffc53 100644
--- a/kernel/sched/stats.h
+++ b/kernel/sched/stats.h
@@ -2,6 +2,8 @@
 
 #ifdef CONFIG_SCHEDSTATS
 
+#include "sched.h"
+
 /*
  * Expects runqueue lock to be held for atomicity of update
  */
diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
index ceb5b6b12561..a5d289049388 100644
--- a/kernel/sched/stop_task.c
+++ b/kernel/sched/stop_task.c
@@ -8,6 +8,7 @@
  * See kernel/stop_machine.c
  */
 #include "sched.h"
+#include "stats.h"
 
 #ifdef CONFIG_SMP
 static int
-- 
2.18.4



* [RFC PATCH v2 2/5] sched: define task_of() as a common helper
From: Yafang Shao @ 2020-11-23 12:58 UTC (permalink / raw)
  To: mgorman, mingo, peterz, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot
  Cc: linux-kernel, linux-rt-users, Yafang Shao

task_of() is used to get the task_struct from a sched_entity. As the
sched_entity embedded in struct task_struct is shared by all sched
classes, move this helper into sched.h so that every sched class can
use it.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/sched/fair.c  | 11 -----------
 kernel/sched/sched.h |  8 ++++++++
 2 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8ff1daa3d9bb..59e454cae3be 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -259,12 +259,6 @@ const struct sched_class fair_sched_class;
  */
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
-static inline struct task_struct *task_of(struct sched_entity *se)
-{
-	SCHED_WARN_ON(!entity_is_task(se));
-	return container_of(se, struct task_struct, se);
-}
-
 /* Walk up scheduling entities hierarchy */
 #define for_each_sched_entity(se) \
 		for (; se; se = se->parent)
@@ -446,11 +440,6 @@ find_matching_se(struct sched_entity **se, struct sched_entity **pse)
 
 #else	/* !CONFIG_FAIR_GROUP_SCHED */
 
-static inline struct task_struct *task_of(struct sched_entity *se)
-{
-	return container_of(se, struct task_struct, se);
-}
-
 #define for_each_sched_entity(se) \
 		for (; se; se = NULL)
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 871544bb9a38..9a4576ccf3d7 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2636,4 +2636,12 @@ static inline bool is_per_cpu_kthread(struct task_struct *p)
 void swake_up_all_locked(struct swait_queue_head *q);
 void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait);
 
+static inline struct task_struct *task_of(struct sched_entity *se)
+{
+#ifdef CONFIG_FAIR_GROUP_SCHED
+	SCHED_WARN_ON(!entity_is_task(se));
+#endif
+	return container_of(se, struct task_struct, se);
+}
+
 #endif	/* _KERNEL_SCHED_SCHED_H */
-- 
2.18.4



* [RFC PATCH v2 3/5] sched: make schedstats helper independent of cfs_rq
From: Yafang Shao @ 2020-11-23 12:58 UTC (permalink / raw)
  To: mgorman, mingo, peterz, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot
  Cc: linux-kernel, linux-rt-users, Yafang Shao

The 'cfs_rq' argument of the helpers
update_stats_{wait_start, wait_end, enqueue_sleeper} is only used to get
the rq_clock, so we can pass the rq directly. These helpers can then be
used by all sched classes after being moved into stats.h.

This change increases the size of vmlinux by around 824 bytes:
			w/o this patch	with this patch
Size of vmlinux:	78443832	78444656

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/sched/fair.c  | 148 ++-----------------------------------------
 kernel/sched/stats.h | 144 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 149 insertions(+), 143 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 59e454cae3be..946b60f586e4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -869,124 +869,6 @@ static void update_curr_fair(struct rq *rq)
 	update_curr(cfs_rq_of(&rq->curr->se));
 }
 
-static inline void
-update_stats_wait_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
-{
-	u64 wait_start, prev_wait_start;
-
-	if (!schedstat_enabled())
-		return;
-
-	wait_start = rq_clock(rq_of(cfs_rq));
-	prev_wait_start = schedstat_val(se->statistics.wait_start);
-
-	if (entity_is_task(se) && task_on_rq_migrating(task_of(se)) &&
-	    likely(wait_start > prev_wait_start))
-		wait_start -= prev_wait_start;
-
-	__schedstat_set(se->statistics.wait_start, wait_start);
-}
-
-static inline void
-update_stats_wait_end(struct cfs_rq *cfs_rq, struct sched_entity *se)
-{
-	struct task_struct *p;
-	u64 delta;
-
-	if (!schedstat_enabled())
-		return;
-
-	delta = rq_clock(rq_of(cfs_rq)) - schedstat_val(se->statistics.wait_start);
-
-	if (entity_is_task(se)) {
-		p = task_of(se);
-		if (task_on_rq_migrating(p)) {
-			/*
-			 * Preserve migrating task's wait time so wait_start
-			 * time stamp can be adjusted to accumulate wait time
-			 * prior to migration.
-			 */
-			__schedstat_set(se->statistics.wait_start, delta);
-			return;
-		}
-		trace_sched_stat_wait(p, delta);
-	}
-
-	__schedstat_set(se->statistics.wait_max,
-		      max(schedstat_val(se->statistics.wait_max), delta));
-	__schedstat_inc(se->statistics.wait_count);
-	__schedstat_add(se->statistics.wait_sum, delta);
-	__schedstat_set(se->statistics.wait_start, 0);
-}
-
-static inline void
-update_stats_enqueue_sleeper(struct cfs_rq *cfs_rq, struct sched_entity *se)
-{
-	struct task_struct *tsk = NULL;
-	u64 sleep_start, block_start;
-
-	if (!schedstat_enabled())
-		return;
-
-	sleep_start = schedstat_val(se->statistics.sleep_start);
-	block_start = schedstat_val(se->statistics.block_start);
-
-	if (entity_is_task(se))
-		tsk = task_of(se);
-
-	if (sleep_start) {
-		u64 delta = rq_clock(rq_of(cfs_rq)) - sleep_start;
-
-		if ((s64)delta < 0)
-			delta = 0;
-
-		if (unlikely(delta > schedstat_val(se->statistics.sleep_max)))
-			__schedstat_set(se->statistics.sleep_max, delta);
-
-		__schedstat_set(se->statistics.sleep_start, 0);
-		__schedstat_add(se->statistics.sum_sleep_runtime, delta);
-
-		if (tsk) {
-			account_scheduler_latency(tsk, delta >> 10, 1);
-			trace_sched_stat_sleep(tsk, delta);
-		}
-	}
-	if (block_start) {
-		u64 delta = rq_clock(rq_of(cfs_rq)) - block_start;
-
-		if ((s64)delta < 0)
-			delta = 0;
-
-		if (unlikely(delta > schedstat_val(se->statistics.block_max)))
-			__schedstat_set(se->statistics.block_max, delta);
-
-		__schedstat_set(se->statistics.block_start, 0);
-		__schedstat_add(se->statistics.sum_sleep_runtime, delta);
-
-		if (tsk) {
-			if (tsk->in_iowait) {
-				__schedstat_add(se->statistics.iowait_sum, delta);
-				__schedstat_inc(se->statistics.iowait_count);
-				trace_sched_stat_iowait(tsk, delta);
-			}
-
-			trace_sched_stat_blocked(tsk, delta);
-
-			/*
-			 * Blocking time is in units of nanosecs, so shift by
-			 * 20 to get a milliseconds-range estimation of the
-			 * amount of time that the task spent sleeping:
-			 */
-			if (unlikely(prof_on == SLEEP_PROFILING)) {
-				profile_hits(SLEEP_PROFILING,
-						(void *)get_wchan(tsk),
-						delta >> 20);
-			}
-			account_scheduler_latency(tsk, delta >> 10, 0);
-		}
-	}
-}
-
 /*
  * Task is being enqueued - update stats:
  */
@@ -1001,10 +883,10 @@ update_stats_enqueue(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	 * a dequeue/enqueue event is a NOP)
 	 */
 	if (se != cfs_rq->curr)
-		update_stats_wait_start(cfs_rq, se);
+		update_stats_wait_start(rq_of(cfs_rq), se);
 
 	if (flags & ENQUEUE_WAKEUP)
-		update_stats_enqueue_sleeper(cfs_rq, se);
+		update_stats_enqueue_sleeper(rq_of(cfs_rq), se);
 }
 
 static inline void
@@ -1019,7 +901,7 @@ update_stats_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	 * waiting task:
 	 */
 	if (se != cfs_rq->curr)
-		update_stats_wait_end(cfs_rq, se);
+		update_stats_wait_end(rq_of(cfs_rq), se);
 
 	if ((flags & DEQUEUE_SLEEP) && entity_is_task(se)) {
 		struct task_struct *tsk = task_of(se);
@@ -4128,26 +4010,6 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
 
 static void check_enqueue_throttle(struct cfs_rq *cfs_rq);
 
-static inline void check_schedstat_required(void)
-{
-#ifdef CONFIG_SCHEDSTATS
-	if (schedstat_enabled())
-		return;
-
-	/* Force schedstat enabled if a dependent tracepoint is active */
-	if (trace_sched_stat_wait_enabled()    ||
-			trace_sched_stat_sleep_enabled()   ||
-			trace_sched_stat_iowait_enabled()  ||
-			trace_sched_stat_blocked_enabled() ||
-			trace_sched_stat_runtime_enabled())  {
-		printk_deferred_once("Scheduler tracepoints stat_sleep, stat_iowait, "
-			     "stat_blocked and stat_runtime require the "
-			     "kernel parameter schedstats=enable or "
-			     "kernel.sched_schedstats=1\n");
-	}
-#endif
-}
-
 static inline bool cfs_bandwidth_used(void);
 
 /*
@@ -4388,7 +4250,7 @@ set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
 		 * a CPU. So account for the time it spent waiting on the
 		 * runqueue.
 		 */
-		update_stats_wait_end(cfs_rq, se);
+		update_stats_wait_end(rq_of(cfs_rq), se);
 		__dequeue_entity(cfs_rq, se);
 		update_load_avg(cfs_rq, se, UPDATE_TG);
 	}
@@ -4489,7 +4351,7 @@ static void put_prev_entity(struct cfs_rq *cfs_rq, struct sched_entity *prev)
 	check_spread(cfs_rq, prev);
 
 	if (prev->on_rq) {
-		update_stats_wait_start(cfs_rq, prev);
+		update_stats_wait_start(rq_of(cfs_rq), prev);
 		/* Put 'current' back into the tree. */
 		__enqueue_entity(cfs_rq, prev);
 		/* in !on_rq case, update occurred at dequeue */
diff --git a/kernel/sched/stats.h b/kernel/sched/stats.h
index c23b653ffc53..966cc408bd8b 100644
--- a/kernel/sched/stats.h
+++ b/kernel/sched/stats.h
@@ -42,6 +42,144 @@ rq_sched_info_dequeued(struct rq *rq, unsigned long long delta)
 #define   schedstat_val(var)		(var)
 #define   schedstat_val_or_zero(var)	((schedstat_enabled()) ? (var) : 0)
 
+static inline void
+update_stats_wait_start(struct rq *rq, struct sched_entity *se)
+{
+	u64 wait_start, prev_wait_start;
+
+	if (!schedstat_enabled())
+		return;
+
+	wait_start = rq_clock(rq);
+	prev_wait_start = schedstat_val(se->statistics.wait_start);
+
+	if (entity_is_task(se) && task_on_rq_migrating(task_of(se)) &&
+	    likely(wait_start > prev_wait_start))
+		wait_start -= prev_wait_start;
+
+	__schedstat_set(se->statistics.wait_start, wait_start);
+}
+
+static inline void
+update_stats_wait_end(struct rq *rq, struct sched_entity *se)
+{
+	struct task_struct *p;
+	u64 delta;
+
+	if (!schedstat_enabled())
+		return;
+
+	delta = rq_clock(rq) - schedstat_val(se->statistics.wait_start);
+
+	if (entity_is_task(se)) {
+		p = task_of(se);
+		if (task_on_rq_migrating(p)) {
+			/*
+			 * Preserve migrating task's wait time so wait_start
+			 * time stamp can be adjusted to accumulate wait time
+			 * prior to migration.
+			 */
+			__schedstat_set(se->statistics.wait_start, delta);
+			return;
+		}
+		trace_sched_stat_wait(p, delta);
+	}
+
+	__schedstat_set(se->statistics.wait_max,
+		      max(schedstat_val(se->statistics.wait_max), delta));
+	__schedstat_inc(se->statistics.wait_count);
+	__schedstat_add(se->statistics.wait_sum, delta);
+	__schedstat_set(se->statistics.wait_start, 0);
+}
+
+static inline void
+update_stats_enqueue_sleeper(struct rq *rq, struct sched_entity *se)
+{
+	struct task_struct *tsk = NULL;
+	u64 sleep_start, block_start;
+
+	if (!schedstat_enabled())
+		return;
+
+	sleep_start = schedstat_val(se->statistics.sleep_start);
+	block_start = schedstat_val(se->statistics.block_start);
+
+	if (entity_is_task(se))
+		tsk = task_of(se);
+
+	if (sleep_start) {
+		u64 delta = rq_clock(rq) - sleep_start;
+
+		if ((s64)delta < 0)
+			delta = 0;
+
+		if (unlikely(delta > schedstat_val(se->statistics.sleep_max)))
+			__schedstat_set(se->statistics.sleep_max, delta);
+
+		__schedstat_set(se->statistics.sleep_start, 0);
+		__schedstat_add(se->statistics.sum_sleep_runtime, delta);
+
+		if (tsk) {
+			account_scheduler_latency(tsk, delta >> 10, 1);
+			trace_sched_stat_sleep(tsk, delta);
+		}
+	}
+
+	if (block_start) {
+		u64 delta = rq_clock(rq) - block_start;
+
+		if ((s64)delta < 0)
+			delta = 0;
+
+		if (unlikely(delta > schedstat_val(se->statistics.block_max)))
+			__schedstat_set(se->statistics.block_max, delta);
+
+		__schedstat_set(se->statistics.block_start, 0);
+		__schedstat_add(se->statistics.sum_sleep_runtime, delta);
+
+		if (tsk) {
+			if (tsk->in_iowait) {
+				__schedstat_add(se->statistics.iowait_sum, delta);
+				__schedstat_inc(se->statistics.iowait_count);
+				trace_sched_stat_iowait(tsk, delta);
+			}
+
+			trace_sched_stat_blocked(tsk, delta);
+
+			/*
+			 * Blocking time is in units of nanosecs, so shift by
+			 * 20 to get a milliseconds-range estimation of the
+			 * amount of time that the task spent sleeping:
+			 */
+			if (unlikely(prof_on == SLEEP_PROFILING)) {
+				profile_hits(SLEEP_PROFILING,
+						(void *)get_wchan(tsk),
+						delta >> 20);
+			}
+			account_scheduler_latency(tsk, delta >> 10, 0);
+		}
+	}
+}
+
+static inline void
+check_schedstat_required(void)
+{
+	if (schedstat_enabled())
+		return;
+
+	/* Force schedstat enabled if a dependent tracepoint is active */
+	if (trace_sched_stat_wait_enabled()    ||
+			trace_sched_stat_sleep_enabled()   ||
+			trace_sched_stat_iowait_enabled()  ||
+			trace_sched_stat_blocked_enabled() ||
+			trace_sched_stat_runtime_enabled())  {
+		printk_deferred_once("Scheduler tracepoints stat_sleep, stat_iowait, "
+			     "stat_blocked and stat_runtime require the "
+			     "kernel parameter schedstats=enable or "
+			     "kernel.sched_schedstats=1\n");
+	}
+}
+
 #else /* !CONFIG_SCHEDSTATS: */
 static inline void rq_sched_info_arrive  (struct rq *rq, unsigned long long delta) { }
 static inline void rq_sched_info_dequeued(struct rq *rq, unsigned long long delta) { }
@@ -55,6 +193,12 @@ static inline void rq_sched_info_depart  (struct rq *rq, unsigned long long delt
 # define   schedstat_set(var, val)	do { } while (0)
 # define   schedstat_val(var)		0
 # define   schedstat_val_or_zero(var)	0
+
+# define update_stats_wait_start(rq, se)	do { } while (0)
+# define update_stats_wait_end(rq, se)		do { } while (0)
+# define update_stats_enqueue_sleeper(rq, se)	do { } while (0)
+# define check_schedstat_required()		do { } while (0)
+
 #endif /* CONFIG_SCHEDSTATS */
 
 #ifdef CONFIG_PSI
-- 
2.18.4



* [RFC PATCH v2 4/5] sched: define update_stats_curr_start() as a common helper
From: Yafang Shao @ 2020-11-23 12:58 UTC (permalink / raw)
  To: mgorman, mingo, peterz, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot
  Cc: linux-kernel, linux-rt-users, Yafang Shao

update_stats_curr_start() updates exec_start when a task starts a new
run period, which every sched class needs to do. Define it as a common
helper so that all sched classes can use it.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/sched/fair.c  | 14 +-------------
 kernel/sched/rt.c    |  2 +-
 kernel/sched/sched.h | 12 ++++++++++++
 3 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 946b60f586e4..13e803369ced 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -915,18 +915,6 @@ update_stats_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	}
 }
 
-/*
- * We are picking a new current task - update its stats:
- */
-static inline void
-update_stats_curr_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
-{
-	/*
-	 * We are starting a new run period:
-	 */
-	se->exec_start = rq_clock_task(rq_of(cfs_rq));
-}
-
 /**************************************************
  * Scheduling class queueing methods:
  */
@@ -4255,7 +4243,7 @@ set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
 		update_load_avg(cfs_rq, se, UPDATE_TG);
 	}
 
-	update_stats_curr_start(cfs_rq, se);
+	update_stats_curr_start(rq_of(cfs_rq), se);
 	cfs_rq->curr = se;
 
 	/*
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index af772ac0f32d..3422dd85cfb4 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1574,7 +1574,7 @@ static void check_preempt_curr_rt(struct rq *rq, struct task_struct *p, int flag
 
 static inline void set_next_task_rt(struct rq *rq, struct task_struct *p, bool first)
 {
-	p->se.exec_start = rq_clock_task(rq);
+	update_stats_curr_start(rq, &p->se);
 
 	/* The running task is never eligible for pushing */
 	dequeue_pushable_task(rq, p);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 9a4576ccf3d7..3948112dc31c 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2644,4 +2644,16 @@ static inline struct task_struct *task_of(struct sched_entity *se)
 	return container_of(se, struct task_struct, se);
 }
 
+/*
+ * We are picking a new current task - update its stats:
+ */
+static inline void
+update_stats_curr_start(struct rq *rq, struct sched_entity *se)
+{
+	/*
+	 * We are starting a new run period:
+	 */
+	se->exec_start = rq_clock_task(rq);
+}
+
 #endif	/* _KERNEL_SCHED_SCHED_H */
-- 
2.18.4



* [RFC PATCH v2 5/5] sched, rt: support schedstat for RT sched class
From: Yafang Shao @ 2020-11-23 12:58 UTC (permalink / raw)
  To: mgorman, mingo, peterz, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot
  Cc: linux-kernel, linux-rt-users, Yafang Shao

We want to measure the latency of RT tasks in our production
environment with the schedstat facility, but schedstat is currently
only supported for the fair sched class. This patch enables it for the
RT sched class as well.

The schedstat statistics are defined in struct sched_entity, which is a
member of struct task_struct, so we can reuse them for the RT sched class.

The schedstat usage in the RT sched class is similar to that in the
fair sched class, for example:
		fair				RT
enqueue		update_stats_enqueue_fair	update_stats_enqueue_rt
dequeue		update_stats_dequeue_fair	update_stats_dequeue_rt
put_prev_task	update_stats_wait_start		update_stats_wait_start
set_next_task	update_stats_wait_end		update_stats_wait_end
show		/proc/[pid]/sched		/proc/[pid]/sched

The sched:sched_stat_* tracepoints can be used to trace RT tasks as
well.

Note that, unlike cfs_rq, rt_rq does not track its currently running
entity, so this patch adds a 'curr' pointer to rt_rq (under
CONFIG_SCHEDSTATS only) so that waiting entities can be told apart from
the running one.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/sched/rt.c    | 90 ++++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/sched.h |  4 ++
 2 files changed, 94 insertions(+)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 3422dd85cfb4..f2eff92275f0 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1246,6 +1246,75 @@ void dec_rt_tasks(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
 	dec_rt_group(rt_se, rt_rq);
 }
 
+#ifdef CONFIG_SCHEDSTATS
+
+static inline bool
+rt_se_is_waiting(struct rt_rq *rt_rq, struct sched_rt_entity *rt_se)
+{
+	return rt_se != rt_rq->curr;
+}
+
+static inline void
+rt_rq_curr_set(struct rt_rq *rt_rq, struct sched_rt_entity *rt_se)
+{
+	rt_rq->curr = rt_se;
+}
+
+#else
+
+static inline bool
+rt_se_is_waiting(struct rt_rq *rt_rq, struct sched_rt_entity *rt_se)
+{
+	return false;
+}
+
+static inline void
+rt_rq_curr_set(struct rt_rq *rt_rq, struct sched_rt_entity *rt_se)
+{
+}
+
+#endif
+
+static inline void
+update_stats_enqueue_rt(struct rq *rq, struct sched_entity *se,
+			struct sched_rt_entity *rt_se, int flags)
+{
+	struct rt_rq *rt_rq = &rq->rt;
+
+	if (!schedstat_enabled())
+		return;
+
+	if (rt_se_is_waiting(rt_rq, rt_se))
+		update_stats_wait_start(rq, se);
+
+	if (flags & ENQUEUE_WAKEUP)
+		update_stats_enqueue_sleeper(rq, se);
+}
+
+static inline void
+update_stats_dequeue_rt(struct rq *rq, struct sched_entity *se,
+			struct sched_rt_entity *rt_se, int flags)
+{
+	struct rt_rq *rt_rq = &rq->rt;
+
+	if (!schedstat_enabled())
+		return;
+
+	if (rt_se_is_waiting(rt_rq, rt_se))
+		update_stats_wait_end(rq, se);
+
+	if ((flags & DEQUEUE_SLEEP) && rt_entity_is_task(rt_se)) {
+		struct task_struct *tsk = rt_task_of(rt_se);
+
+		if (tsk->state & TASK_INTERRUPTIBLE)
+			__schedstat_set(se->statistics.sleep_start,
+					rq_clock(rq));
+		if (tsk->state & TASK_UNINTERRUPTIBLE)
+			__schedstat_set(se->statistics.block_start,
+					rq_clock(rq));
+	}
+}
+
 /*
  * Change rt_se->run_list location unless SAVE && !MOVE
  *
@@ -1275,6 +1344,7 @@ static void __enqueue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flag
 	struct rt_prio_array *array = &rt_rq->active;
 	struct rt_rq *group_rq = group_rt_rq(rt_se);
 	struct list_head *queue = array->queue + rt_se_prio(rt_se);
+	struct task_struct *task = rt_task_of(rt_se);
 
 	/*
 	 * Don't enqueue the group if its throttled, or when empty.
@@ -1288,6 +1358,8 @@ static void __enqueue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flag
 		return;
 	}
 
+	update_stats_enqueue_rt(rq_of_rt_rq(rt_rq), &task->se, rt_se, flags);
+
 	if (move_entity(flags)) {
 		WARN_ON_ONCE(rt_se->on_list);
 		if (flags & ENQUEUE_HEAD)
@@ -1307,7 +1379,9 @@ static void __dequeue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flag
 {
 	struct rt_rq *rt_rq = rt_rq_of_se(rt_se);
 	struct rt_prio_array *array = &rt_rq->active;
+	struct task_struct *task = rt_task_of(rt_se);
 
+	update_stats_dequeue_rt(rq_of_rt_rq(rt_rq), &task->se, rt_se, flags);
 	if (move_entity(flags)) {
 		WARN_ON_ONCE(!rt_se->on_list);
 		__delist_rt_entity(rt_se, array);
@@ -1374,6 +1448,7 @@ enqueue_task_rt(struct rq *rq, struct task_struct *p, int flags)
 	if (flags & ENQUEUE_WAKEUP)
 		rt_se->timeout = 0;
 
+	check_schedstat_required();
 	enqueue_rt_entity(rt_se, flags);
 
 	if (!task_current(rq, p) && p->nr_cpus_allowed > 1)
@@ -1574,6 +1649,12 @@ static void check_preempt_curr_rt(struct rq *rq, struct task_struct *p, int flag
 
 static inline void set_next_task_rt(struct rq *rq, struct task_struct *p, bool first)
 {
+	struct sched_rt_entity *rt_se = &p->rt;
+	struct rt_rq *rt_rq = &rq->rt;
+
+	if (on_rt_rq(&p->rt))
+		update_stats_wait_end(rq, &p->se);
+
 	update_stats_curr_start(rq, &p->se);
 
 	/* The running task is never eligible for pushing */
@@ -1591,6 +1672,8 @@ static inline void set_next_task_rt(struct rq *rq, struct task_struct *p, bool f
 		update_rt_rq_load_avg(rq_clock_pelt(rq), rq, 0);
 
 	rt_queue_push_tasks(rq);
+
+	rt_rq_curr_set(rt_rq, rt_se);
 }
 
 static struct sched_rt_entity *pick_next_rt_entity(struct rq *rq,
@@ -1638,6 +1721,11 @@ static struct task_struct *pick_next_task_rt(struct rq *rq)
 
 static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
 {
+	struct rt_rq *rt_rq = &rq->rt;
+
+	if (on_rt_rq(&p->rt))
+		update_stats_wait_start(rq, &p->se);
+
 	update_curr_rt(rq);
 
 	update_rt_rq_load_avg(rq_clock_pelt(rq), rq, 1);
@@ -1648,6 +1736,8 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
 	 */
 	if (on_rt_rq(&p->rt) && p->nr_cpus_allowed > 1)
 		enqueue_pushable_task(rq, p);
+
+	rt_rq_curr_set(rt_rq, NULL);
 }
 
 #ifdef CONFIG_SMP
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 3948112dc31c..a9a2f579f50c 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -652,6 +652,10 @@ struct rt_rq {
 	struct rq		*rq;
 	struct task_group	*tg;
 #endif
+
+#ifdef CONFIG_SCHEDSTATS
+	struct sched_rt_entity  *curr;
+#endif
 };
 
 static inline bool rt_rq_is_runnable(struct rt_rq *rt_rq)
-- 
2.18.4



* Re: [RFC PATCH v2 3/5] sched: make schedstats helper independent of cfs_rq
From: Mel Gorman @ 2020-11-24 11:40 UTC (permalink / raw)
  To: Yafang Shao
  Cc: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, bristot, linux-kernel, linux-rt-users

On Mon, Nov 23, 2020 at 08:58:06PM +0800, Yafang Shao wrote:
> The 'cfs_rq' argument of the helpers
> update_stats_{wait_start, wait_end, enqueue_sleeper} is only used to get
> the rq_clock, so we can pass the rq directly. These helpers can then be
> used by all sched classes after being moved into stats.h.
> 
> This change increases the size of vmlinux by around 824 bytes:
> 			w/o this patch	with this patch
> Size of vmlinux:	78443832	78444656
> 
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>

The inline helpers are quite large. When I was suggesting that the overhead
was minimal, what I expected was that the inline functions would be a
schedstat_enabled() check followed by a real function call. It would
introduce a small additional overhead when schedstats are enabled but avoid
vmlinux growing too large.

e.g.
 
static inline void
update_stats_wait_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
	if (!schedstat_enabled())
		return;

	__update_stats_wait_start(cfs_rq, se);
}

where __update_stats_wait_start then lives in kernel/sched/stats.c
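
For illustration, a minimal sketch of that out-of-line body, assuming it
keeps the existing logic and simply drops the schedstat_enabled() check
that the inline wrapper already performs (a matching declaration would
go in stats.h):

void __update_stats_wait_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
	u64 wait_start, prev_wait_start;

	wait_start = rq_clock(rq_of(cfs_rq));
	prev_wait_start = schedstat_val(se->statistics.wait_start);

	/* preserve the wait time accumulated before a migration */
	if (entity_is_task(se) && task_on_rq_migrating(task_of(se)) &&
	    likely(wait_start > prev_wait_start))
		wait_start -= prev_wait_start;

	__schedstat_set(se->statistics.wait_start, wait_start);
}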

-- 
Mel Gorman
SUSE Labs


* Re: [RFC PATCH v2 3/5] sched: make schedstats helper independent of cfs_rq
From: Yafang Shao @ 2020-11-24 13:08 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Benjamin Segall, bristot, LKML,
	linux-rt-users

On Tue, Nov 24, 2020 at 7:40 PM Mel Gorman <mgorman@suse.de> wrote:
>
> On Mon, Nov 23, 2020 at 08:58:06PM +0800, Yafang Shao wrote:
> > The 'cfs_rq' argument of the helpers
> > update_stats_{wait_start, wait_end, enqueue_sleeper} is only used to get
> > the rq_clock, so we can pass the rq directly. These helpers can then be
> > used by all sched classes after being moved into stats.h.
> >
> > This change increases the size of vmlinux by around 824 bytes:
> >                       w/o this patch  with this patch
> > Size of vmlinux:      78443832        78444656
> >
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
>
> The inline helpers are quite large. When I was suggesting that the overhead
> was minimal, what I expected was that the inline functions would be a
> schedstat_enabled() check followed by a real function call. It would
> introduce a small additional overhead when schedstats are enabled but avoid
> vmlinux growing too large.
>
> e.g.
>
> static inline void
> update_stats_wait_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
> {
>         if (!schedstat_enabled())
>                 return;
>
>         __update_stats_wait_start(cfs_rq, se);
> }
>
> where __update_stats_wait_start then lives in kernel/sched/stats.c
>

Good idea!  Now I understand what you mean. Thanks for the detailed explanation.
I will update it in the next version.


-- 
Thanks
Yafang
