linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Yafang Shao <laoar.shao@gmail.com>
To: mgorman@suse.de, mingo@redhat.com, peterz@infradead.org,
	juri.lelli@redhat.com, vincent.guittot@linaro.org,
	dietmar.eggemann@arm.com, rostedt@goodmis.org,
	bsegall@google.com, bristot@redhat.com, qianjun.kernel@gmail.com
Cc: linux-kernel@vger.kernel.org, linux-rt-users@vger.kernel.org,
	Yafang Shao <laoar.shao@gmail.com>
Subject: [PATCH 6/6] sched, rt: support schedstats for RT sched class
Date: Tue,  1 Dec 2020 19:54:16 +0800	[thread overview]
Message-ID: <20201201115416.26515-7-laoar.shao@gmail.com> (raw)
In-Reply-To: <20201201115416.26515-1-laoar.shao@gmail.com>

We want to measure the latency of RT tasks in our production
environment with schedstats facility, but currently schedstats is only
supported for fair sched class. This patch enable it for RT sched class
as well.

After we make the struct sched_statistics and the helpers of it
independent of fair sched class, we can easily use the schedstats
facility for RT sched class.

The schedstat usage in RT sched class is similar with fair sched class,
for example,
                fair                            RT
enqueue         update_stats_enqueue_fair       update_stats_enqueue_rt
dequeue         update_stats_dequeue_fair       update_stats_dequeue_rt
put_prev_task   update_stats_wait_start         update_stats_wait_start
set_next_task   update_stats_wait_end           update_stats_wait_end

The user can get the schedstats information in the same way in fair sched
class. For example,
		fair				RT
task show	/proc/[pid]/sched               /proc/[pid]/sched
group show	cpu.stat in cgroup		cpu.stat in cgroup

The output of a RT task's schedstats as follows,
$ cat /proc/10461/sched
...
stats.sum_sleep_runtime                      :         37966.502936
stats.wait_start                             :             0.000000
stats.sleep_start                            :             0.000000
stats.block_start                            :        279182.986012
stats.sleep_max                              :             9.001121
stats.block_max                              :             9.292692
stats.exec_max                               :             0.090009
stats.slice_max                              :             0.000000
stats.wait_max                               :             0.005305
stats.wait_sum                               :             0.352352
stats.wait_count                             :               236173
stats.iowait_sum                             :         37875.625128
stats.iowait_count                           :               235933
stats.nr_migrations_cold                     :                    0
stats.nr_failed_migrations_affine            :                    0
stats.nr_failed_migrations_running           :                    0
stats.nr_failed_migrations_hot               :                    0
stats.nr_forced_migrations                   :                    0
stats.nr_wakeups                             :               236172
stats.nr_wakeups_sync                        :                    0
stats.nr_wakeups_migrate                     :                    2
stats.nr_wakeups_local                       :               235865
stats.nr_wakeups_remote                      :                  307
stats.nr_wakeups_affine                      :                    0
stats.nr_wakeups_affine_attempts             :                    0
stats.nr_wakeups_passive                     :                    0
stats.nr_wakeups_idle                        :                    0
...

The sched:sched_stat_{wait, sleep, iowait, blocked} tracepoints can
be used to trace RT tasks as well.

The output of these tracepoints for a RT tasks as follows,

- blocked & iowait
    kworker/48:1-442     [048] d...   539.830872: sched_stat_iowait: comm=stress pid=10461 delay=158242 [ns]
    kworker/48:1-442     [048] d...   539.830872: sched_stat_blocked: comm=stress pid=10461 delay=158242 [ns]

- wait
          stress-10460   [001] dN..   813.965304: sched_stat_wait: comm=stress pid=10462 delay=99997536 [ns]
          stress-10462   [001] d.h.   813.966300: sched_stat_runtime: comm=stress pid=10462 runtime=993812 [ns] vruntime=0 [ns]
          [...]
          stress-10462   [001] d.h.   814.065300: sched_stat_runtime: comm=stress pid=10462 runtime=994484 [ns] vruntime=0 [ns]
          [ totally 100 times of sched_stat_runtime for pid 10462]
          [ The delay of pid 10462 is the sum of above runtime ]
          stress-10462   [001] dN..   814.065307: sched_stat_wait: comm=stress pid=10460 delay=100001089 [ns]
          stress-10460   [001] d.h.   814.066299: sched_stat_runtime: comm=stress pid=10460 runtime=991934 [ns] vruntime=0 [ns]

- sleep
           sleep-15582   [041] dN..  1732.814348: sched_stat_sleep: comm=sleep.sh pid=15474 delay=1001223130 [ns]
           sleep-15584   [041] dN..  1733.815908: sched_stat_sleep: comm=sleep.sh pid=15474 delay=1001238954 [ns]
           [ In sleep.sh, it sleeps 1 sec each time. ]

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/sched/rt.c | 134 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 133 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index da989653b0a2..f764c2b9070d 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1271,6 +1271,121 @@ static void __delist_rt_entity(struct sched_rt_entity *rt_se, struct rt_prio_arr
 	rt_se->on_list = 0;
 }
 
+static inline void
+__schedstat_from_sched_rt_entity(struct sched_rt_entity *rt_se,
+				 struct sched_statistics **stats)
+{
+	struct task_struct *p;
+	struct task_group *tg;
+	struct rt_rq *rt_rq;
+	int cpu;
+
+	if (rt_entity_is_task(rt_se)) {
+		p = rt_task_of(rt_se);
+		*stats = &p->stats;
+	} else {
+		rt_rq = group_rt_rq(rt_se);
+		tg = rt_rq->tg;
+		cpu = cpu_of(rq_of_rt_rq(rt_rq));
+		*stats = tg->stats[cpu];
+	}
+}
+
+static inline void
+schedstat_from_sched_rt_entity(struct sched_rt_entity *rt_se,
+			       struct sched_statistics **stats)
+{
+	if (!schedstat_enabled())
+		return;
+
+	__schedstat_from_sched_rt_entity(rt_se, stats);
+}
+
+static inline void
+update_stats_wait_start_rt(struct rt_rq *rt_rq, struct sched_rt_entity *rt_se)
+{
+	struct sched_statistics *stats = NULL;
+	struct task_struct *p = NULL;
+
+	if (!schedstat_enabled())
+		return;
+
+	if (rt_entity_is_task(rt_se))
+		p = rt_task_of(rt_se);
+
+	__schedstat_from_sched_rt_entity(rt_se, &stats);
+
+	__update_stats_wait_start(rq_of_rt_rq(rt_rq), p, stats);
+}
+
+static inline void
+update_stats_enqueue_sleeper_rt(struct rt_rq *rt_rq, struct sched_rt_entity *rt_se)
+{
+	struct sched_statistics *stats = NULL;
+	struct task_struct *p = NULL;
+
+	if (!schedstat_enabled())
+		return;
+
+	if (rt_entity_is_task(rt_se))
+		p = rt_task_of(rt_se);
+
+	__schedstat_from_sched_rt_entity(rt_se, &stats);
+
+	__update_stats_enqueue_sleeper(rq_of_rt_rq(rt_rq), p, stats);
+}
+
+static inline void
+update_stats_enqueue_rt(struct rt_rq *rt_rq, struct sched_rt_entity *rt_se,
+			int flags)
+{
+	if (!schedstat_enabled())
+		return;
+
+	if (flags & ENQUEUE_WAKEUP)
+		update_stats_enqueue_sleeper_rt(rt_rq, rt_se);
+}
+
+static inline void
+update_stats_wait_end_rt(struct rt_rq *rt_rq, struct sched_rt_entity *rt_se)
+{
+	struct sched_statistics *stats = NULL;
+	struct task_struct *p = NULL;
+
+	if (!schedstat_enabled())
+		return;
+
+	if (rt_entity_is_task(rt_se))
+		p = rt_task_of(rt_se);
+
+	__schedstat_from_sched_rt_entity(rt_se, &stats);
+
+	__update_stats_wait_end(rq_of_rt_rq(rt_rq), p, stats);
+}
+
+static inline void
+update_stats_dequeue_rt(struct rt_rq *rt_rq, struct sched_rt_entity *rt_se,
+			int flags)
+{
+	struct task_struct *p = NULL;
+
+	if (!schedstat_enabled())
+		return;
+
+	if (rt_entity_is_task(rt_se))
+		p = rt_task_of(rt_se);
+
+	if ((flags & DEQUEUE_SLEEP) && p) {
+		if (p->state & TASK_INTERRUPTIBLE)
+			__schedstat_set(p->stats.sleep_start,
+					rq_clock(rq_of_rt_rq(rt_rq)));
+
+		if (p->state & TASK_UNINTERRUPTIBLE)
+			__schedstat_set(p->stats.block_start,
+					rq_clock(rq_of_rt_rq(rt_rq)));
+	}
+}
+
 static void __enqueue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flags)
 {
 	struct rt_rq *rt_rq = rt_rq_of_se(rt_se);
@@ -1344,6 +1459,8 @@ static void enqueue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flags)
 {
 	struct rq *rq = rq_of_rt_se(rt_se);
 
+	update_stats_enqueue_rt(rt_rq_of_se(rt_se), rt_se, flags);
+
 	dequeue_rt_stack(rt_se, flags);
 	for_each_sched_rt_entity(rt_se)
 		__enqueue_rt_entity(rt_se, flags);
@@ -1354,6 +1471,7 @@ static void dequeue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flags)
 {
 	struct rq *rq = rq_of_rt_se(rt_se);
 
+	update_stats_dequeue_rt(rt_rq_of_se(rt_se), rt_se, flags);
 	dequeue_rt_stack(rt_se, flags);
 
 	for_each_sched_rt_entity(rt_se) {
@@ -1376,6 +1494,9 @@ enqueue_task_rt(struct rq *rq, struct task_struct *p, int flags)
 	if (flags & ENQUEUE_WAKEUP)
 		rt_se->timeout = 0;
 
+	check_schedstat_required();
+	update_stats_wait_start_rt(rt_rq_of_se(rt_se), rt_se);
+
 	enqueue_rt_entity(rt_se, flags);
 
 	if (!task_current(rq, p) && p->nr_cpus_allowed > 1)
@@ -1574,9 +1695,14 @@ static void check_preempt_curr_rt(struct rq *rq, struct task_struct *p, int flag
 #endif
 }
 
-static inline void set_next_task_rt(struct rq *rq, struct task_struct *p, bool first)
+void set_next_task_rt(struct rq *rq, struct task_struct *p, bool first)
 {
+	struct sched_rt_entity *rt_se = &p->rt;
+	struct rt_rq *rt_rq = &rq->rt;
+
 	p->se.exec_start = rq_clock_task(rq);
+	if (on_rt_rq(&p->rt))
+		update_stats_wait_end_rt(rt_rq, rt_se);
 
 	/* The running task is never eligible for pushing */
 	dequeue_pushable_task(rq, p);
@@ -1640,6 +1766,12 @@ static struct task_struct *pick_next_task_rt(struct rq *rq)
 
 static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
 {
+	struct sched_rt_entity *rt_se = &p->rt;
+	struct rt_rq *rt_rq = &rq->rt;
+
+	if (on_rt_rq(&p->rt))
+		update_stats_wait_start_rt(rt_rq, rt_se);
+
 	update_curr_rt(rq);
 
 	update_rt_rq_load_avg(rq_clock_pelt(rq), rq, 1);
-- 
2.18.4


      parent reply	other threads:[~2020-12-01 11:56 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-01 11:54 [PATCH 0/6] sched: support schedstats for RT sched class Yafang Shao
2020-12-01 11:54 ` [PATCH 1/6] sched: don't include stats.h in sched.h Yafang Shao
2020-12-01 11:54 ` [PATCH 2/6] sched, fair: use __schedstat_set() in set_next_entity() Yafang Shao
2020-12-01 12:28   ` Mel Gorman
2020-12-01 11:54 ` [PATCH 3/6] sched: make struct sched_statistics independent of fair sched class Yafang Shao
2020-12-01 12:41   ` Mel Gorman
2020-12-02  2:06     ` Yafang Shao
2020-12-01 11:54 ` [PATCH 4/6] sched: make schedstats helpers " Yafang Shao
2020-12-01 12:46   ` Mel Gorman
2020-12-02  2:04     ` Yafang Shao
2020-12-01 11:54 ` [PATCH 5/6] sched, rt: support sched_stat_runtime tracepoint for RT " Yafang Shao
2020-12-01 11:54 ` Yafang Shao [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201201115416.26515-7-laoar.shao@gmail.com \
    --to=laoar.shao@gmail.com \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rt-users@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=qianjun.kernel@gmail.com \
    --cc=rostedt@goodmis.org \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).