* [PATCH] sched: Add trace events for Proxy Execution (PE)
@ 2024-02-02 8:33 Metin Kaya
2024-02-21 14:23 ` Steven Rostedt
From: Metin Kaya @ 2024-02-02 8:33 UTC
To: linux-kernel
Cc: John Stultz, Joel Fernandes, Qais Yousef, Ingo Molnar,
Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Valentin Schneider, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, Ben Segall, Zimuzo Ezeozue, Youssef Esmat,
Mel Gorman, Daniel Bristot de Oliveira, Will Deacon, Waiman Long,
Boqun Feng, Paul E. McKenney, Xuewen Yan, K Prateek Nayak,
Thomas Gleixner, kernel-team, linux-trace-kernel
Add sched_[start, finish]_task_selection trace events to measure the
latency of PE patches in task selection.
Moreover, introduce trace events for interesting events in PE:
1. sched_pe_enqueue_sleeping_task: a task gets enqueued on the wait queue
of a sleeping task (mutex owner).
2. sched_pe_cross_remote_cpu: the dependency chain crosses to a remote CPU.
3. sched_pe_task_is_migrating: the mutex owner task is migrating.
New trace events can be tested via this command:
$ perf trace \
-e sched:sched_start_task_selection \
-e sched:sched_finish_task_selection \
-e sched:sched_pe_enqueue_sleeping_task \
-e sched:sched_pe_cross_remote_cpu \
-e sched:sched_pe_task_is_migrating
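The latency measurement mentioned above amounts to pairing each
sched_finish_task_selection event with the preceding
sched_start_task_selection event on the same CPU and subtracting the
timestamps. A minimal userspace sketch in plain C, assuming the events have
already been decoded into timestamp-ordered records; struct sel_record,
struct sel_stats, MAX_CPUS and account_selection_latency() are hypothetical
names for illustration and are not part of the patch or of perf:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CPUS 64	/* assumption: upper bound on CPU ids for this sketch */

enum sel_event { SEL_START, SEL_FINISH };

/* One decoded trace record; the layout is hypothetical, not perf's. */
struct sel_record {
	uint64_t ts_ns;		/* event timestamp in nanoseconds */
	int cpu;		/* the event's cpu= field */
	enum sel_event type;	/* start or finish event */
};

struct sel_stats {
	uint64_t count;		/* number of matched start/finish pairs */
	uint64_t total_ns;	/* sum of all pair latencies */
	uint64_t max_ns;	/* largest single latency */
};

/*
 * Walk the records in timestamp order and pair each finish event with the
 * pending start event of the same CPU. Unmatched finish events and records
 * with an out-of-range CPU are skipped.
 */
static void account_selection_latency(const struct sel_record *recs, size_t n,
				      struct sel_stats *stats)
{
	uint64_t start_ns[MAX_CPUS] = { 0 };
	int pending[MAX_CPUS] = { 0 };
	size_t i;

	assert(stats != NULL && (recs != NULL || n == 0));

	for (i = 0; i < n; i++) {
		const struct sel_record *r = &recs[i];
		uint64_t delta;

		if (r->cpu < 0 || r->cpu >= MAX_CPUS)
			continue;

		if (r->type == SEL_START) {
			/* Remember when selection began on this CPU. */
			start_ns[r->cpu] = r->ts_ns;
			pending[r->cpu] = 1;
			continue;
		}

		/* Finish event without a usable start: nothing to pair. */
		if (!pending[r->cpu] || r->ts_ns < start_ns[r->cpu])
			continue;

		delta = r->ts_ns - start_ns[r->cpu];
		pending[r->cpu] = 0;
		stats->count++;
		stats->total_ns += delta;
		if (delta > stats->max_ns)
			stats->max_ns = delta;
	}
}
```

The caller zero-initializes a struct sel_stats before the call; the mean
latency is then total_ns / count whenever count is non-zero. Pairing per CPU
is sufficient here because both events carry only the cpu field.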
Notes:
1. These trace events are not intended to be merged upstream. Instead, they
only make PE testing easier and will be converted to tracepoints once the
PE patches land upstream.
2. This patch is based on John's Proxy Execution v7 patch series (see
the link below) which is also available at
https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-v7-6.7-rc6/.
Link: https://lore.kernel.org/linux-kernel/CANDhNCrHd+5twWVNqBAhVLfhMhkiO0KjxXBmwVgaCD4kAyFyWw@mail.gmail.com/
Signed-off-by: Metin Kaya <metin.kaya@arm.com>
CC: John Stultz <jstultz@google.com>
CC: Joel Fernandes <joelaf@google.com>
CC: Qais Yousef <qyousef@google.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Juri Lelli <juri.lelli@redhat.com>
CC: Vincent Guittot <vincent.guittot@linaro.org>
CC: Dietmar Eggemann <dietmar.eggemann@arm.com>
CC: Valentin Schneider <vschneid@redhat.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Masami Hiramatsu <mhiramat@kernel.org>
CC: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Ben Segall <bsegall@google.com>
CC: Zimuzo Ezeozue <zezeozue@google.com>
CC: Youssef Esmat <youssefesmat@google.com>
CC: Mel Gorman <mgorman@suse.de>
CC: Daniel Bristot de Oliveira <bristot@redhat.com>
CC: Will Deacon <will@kernel.org>
CC: Waiman Long <longman@redhat.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: "Paul E. McKenney" <paulmck@kernel.org>
CC: Xuewen Yan <xuewen.yan94@gmail.com>
CC: K Prateek Nayak <kprateek.nayak@amd.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: kernel-team@android.com
CC: linux-trace-kernel@vger.kernel.org
---
include/trace/events/sched.h | 138 +++++++++++++++++++++++++++++++++++
kernel/sched/core.c | 11 +++
2 files changed, 149 insertions(+)
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 6188ad0d9e0d..2b08509f3088 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -737,6 +737,144 @@ TRACE_EVENT(sched_wake_idle_without_ipi,
TP_printk("cpu=%d", __entry->cpu)
);
+#ifdef CONFIG_SCHED_PROXY_EXEC
+/**
+ * sched_pe_enqueue_sleeping_task - called when a task is enqueued on wait
+ * queue of a sleeping task (mutex owner).
+ * @mutex_owner: pointer to struct task_struct
+ * @blocked: pointer to struct task_struct
+ */
+TRACE_EVENT(sched_pe_enqueue_sleeping_task,
+
+ TP_PROTO(struct task_struct *mutex_owner, struct task_struct *blocked),
+
+ TP_ARGS(mutex_owner, blocked),
+
+ TP_STRUCT__entry(
+ __array(char, owner_comm, TASK_COMM_LEN )
+ __field(pid_t, owner_pid )
+ __field(int, owner_prio )
+ __field(int, owner_cpu )
+ __array(char, blocked_comm, TASK_COMM_LEN )
+ __field(pid_t, blocked_pid )
+ __field(int, blocked_prio )
+ __field(int, blocked_cpu )
+ ),
+
+ TP_fast_assign(
+ strscpy(__entry->owner_comm, mutex_owner->comm, TASK_COMM_LEN);
+ __entry->owner_pid = mutex_owner->pid;
+ __entry->owner_prio = mutex_owner->prio; /* XXX SCHED_DEADLINE */
+ __entry->owner_cpu = task_cpu(mutex_owner);
+
+ strscpy(__entry->blocked_comm, blocked->comm, TASK_COMM_LEN);
+ __entry->blocked_pid = blocked->pid;
+ __entry->blocked_prio = blocked->prio; /* XXX SCHED_DEADLINE */
+ __entry->blocked_cpu = task_cpu(blocked);
+ ),
+
+ TP_printk("task=%s pid=%d prio=%d cpu=%d blocked_on owner_task=%s owner_pid=%d owner_prio=%d owner_cpu=%d",
+ __entry->blocked_comm, __entry->blocked_pid,
+ __entry->blocked_prio, __entry->blocked_cpu,
+ __entry->owner_comm, __entry->owner_pid,
+ __entry->owner_prio, __entry->owner_cpu)
+);
+
+/**
+ * sched_pe_cross_remote_cpu - called when dependency chain crosses remote CPU
+ * @p: pointer to struct task_struct
+ */
+TRACE_EVENT(sched_pe_cross_remote_cpu,
+
+ TP_PROTO(struct task_struct *p),
+
+ TP_ARGS(p),
+
+ TP_STRUCT__entry(
+ __array(char, comm, TASK_COMM_LEN )
+ __field(pid_t, pid )
+ __field(int, prio )
+ __field(int, cpu )
+ ),
+
+ TP_fast_assign(
+ strscpy(__entry->comm, p->comm, TASK_COMM_LEN);
+ __entry->pid = p->pid;
+ __entry->prio = p->prio; /* XXX SCHED_DEADLINE */
+ __entry->cpu = task_cpu(p);
+ ),
+
+ TP_printk("comm=%s pid=%d prio=%d cpu=%d",
+ __entry->comm, __entry->pid, __entry->prio, __entry->cpu)
+);
+
+/**
+ * sched_pe_task_is_migrating - called when mutex owner is in migrating state
+ * @p: pointer to struct task_struct
+ */
+TRACE_EVENT(sched_pe_task_is_migrating,
+
+ TP_PROTO(struct task_struct *p),
+
+ TP_ARGS(p),
+
+ TP_STRUCT__entry(
+ __array(char, comm, TASK_COMM_LEN )
+ __field(pid_t, pid )
+ __field(int, prio )
+ ),
+
+ TP_fast_assign(
+ strscpy(__entry->comm, p->comm, TASK_COMM_LEN);
+ __entry->pid = p->pid;
+ __entry->prio = p->prio; /* XXX SCHED_DEADLINE */
+ ),
+
+ TP_printk("comm=%s pid=%d prio=%d",
+ __entry->comm, __entry->pid, __entry->prio)
+);
+#endif /* CONFIG_SCHED_PROXY_EXEC */
+
+DECLARE_EVENT_CLASS(sched_task_selection_template,
+
+ TP_PROTO(int cpu),
+
+ TP_ARGS(cpu),
+
+ TP_STRUCT__entry(
+ __field(int, cpu)
+ ),
+
+ TP_fast_assign(
+ __entry->cpu = cpu;
+ ),
+
+ TP_printk("cpu=%d",
+ __entry->cpu)
+);
+
+/**
+ * sched_start_task_selection - called before selecting next task in
+ * __schedule()
+ * @cpu: The CPU which will run task selection operation.
+ */
+DEFINE_EVENT(sched_task_selection_template, sched_start_task_selection,
+
+ TP_PROTO(int cpu),
+
+ TP_ARGS(cpu));
+
+/**
+ * sched_finish_task_selection - called after selecting next task in
+ * __schedule()
+ * @cpu: The CPU which ran task selection operation.
+ */
+DEFINE_EVENT(sched_task_selection_template, sched_finish_task_selection,
+
+ TP_PROTO(int cpu),
+
+ TP_ARGS(cpu));
+
/*
* Following tracepoints are not exported in tracefs and provide hooking
* mechanisms only for testing and debugging purposes.
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 30dfb6f14f2b..866809e52971 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7006,6 +7006,9 @@ static void proxy_enqueue_on_owner(struct rq *rq, struct task_struct *owner,
*/
if (!owner->on_rq) {
BUG_ON(!next->on_rq);
+
+ trace_sched_pe_enqueue_sleeping_task(owner, next);
+
deactivate_task(rq, next, DEQUEUE_SLEEP);
if (task_current_selected(rq, next)) {
put_prev_task(rq, next);
@@ -7100,6 +7103,9 @@ find_proxy_task(struct rq *rq, struct task_struct *next, struct rq_flags *rf)
if (task_cpu(owner) != cur_cpu) {
target_cpu = task_cpu(owner);
+
+ trace_sched_pe_cross_remote_cpu(owner);
+
/*
* @owner can disappear, simply migrate to @target_cpu and leave that CPU
* to sort things out.
@@ -7113,6 +7119,8 @@ find_proxy_task(struct rq *rq, struct task_struct *next, struct rq_flags *rf)
}
if (task_on_rq_migrating(owner)) {
+ trace_sched_pe_task_is_migrating(owner);
+
/*
* One of the chain of mutex owners is currently migrating to this
* CPU, but has not yet been enqueued because we are holding the
@@ -7335,6 +7343,8 @@ static void __sched notrace __schedule(unsigned int sched_mode)
}
prev_not_proxied = !prev->blocked_donor;
+
+ trace_sched_start_task_selection(cpu);
pick_again:
next = pick_next_task(rq, rq_selected(rq), &rf);
rq_set_selected(rq, next);
@@ -7350,6 +7360,7 @@ static void __sched notrace __schedule(unsigned int sched_mode)
if (next == rq->idle && prev == rq->idle)
preserve_need_resched = true;
}
+ trace_sched_finish_task_selection(cpu);
if (!preserve_need_resched)
clear_tsk_need_resched(prev);
--
2.34.1
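For post-processing captured output, the TP_printk() format of
sched_pe_enqueue_sleeping_task in the patch above can be parsed in
userspace. A minimal sketch in plain C, assuming the tracing tool prints the
payload exactly as TP_printk() formats it and that neither comm contains
whitespace; struct pe_task, struct pe_enqueue_event and parse_pe_enqueue()
are hypothetical names, not part of the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define COMM_LEN 16	/* mirrors the kernel's TASK_COMM_LEN */

struct pe_task {
	char comm[COMM_LEN];
	int pid;
	int prio;
	int cpu;
};

struct pe_enqueue_event {
	struct pe_task blocked;	/* the task that got enqueued */
	struct pe_task owner;	/* the sleeping mutex owner */
};

/*
 * Parse one event payload. Returns 0 when all eight fields were read and
 * -1 otherwise. %15s leaves room for the terminating NUL in comm.
 */
static int parse_pe_enqueue(const char *payload, struct pe_enqueue_event *ev)
{
	int n;

	assert(payload != NULL && ev != NULL);

	n = sscanf(payload,
		   "task=%15s pid=%d prio=%d cpu=%d blocked_on "
		   "owner_task=%15s owner_pid=%d owner_prio=%d owner_cpu=%d",
		   ev->blocked.comm, &ev->blocked.pid,
		   &ev->blocked.prio, &ev->blocked.cpu,
		   ev->owner.comm, &ev->owner.pid,
		   &ev->owner.prio, &ev->owner.cpu);

	return n == 8 ? 0 : -1;
}
```

A comm that contains a space makes the parse stop early and return -1, so a
more robust tool would split on the " pid=" and " owner_pid=" markers
instead.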
* Re: [PATCH] sched: Add trace events for Proxy Execution (PE)
2024-02-02 8:33 [PATCH] sched: Add trace events for Proxy Execution (PE) Metin Kaya
@ 2024-02-21 14:23 ` Steven Rostedt
2024-02-21 14:24 ` Metin Kaya
From: Steven Rostedt @ 2024-02-21 14:23 UTC
To: Metin Kaya
Cc: linux-kernel, John Stultz, Joel Fernandes, Qais Yousef,
Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Valentin Schneider, Masami Hiramatsu,
Mathieu Desnoyers, Ben Segall, Zimuzo Ezeozue, Youssef Esmat,
Mel Gorman, Daniel Bristot de Oliveira, Will Deacon, Waiman Long,
Boqun Feng, Paul E. McKenney, Xuewen Yan, K Prateek Nayak,
Thomas Gleixner, kernel-team, linux-trace-kernel
On Fri, 2 Feb 2024 08:33:38 +0000
Metin Kaya <metin.kaya@arm.com> wrote:
> Add sched_[start, finish]_task_selection trace events to measure the
> latency of PE patches in task selection.
>
> Moreover, introduce trace events for interesting events in PE:
> 1. sched_pe_enqueue_sleeping_task: a task gets enqueued on the wait queue
> of a sleeping task (mutex owner).
> 2. sched_pe_cross_remote_cpu: the dependency chain crosses to a remote CPU.
> 3. sched_pe_task_is_migrating: the mutex owner task is migrating.
>
> New trace events can be tested via this command:
> $ perf trace \
> -e sched:sched_start_task_selection \
> -e sched:sched_finish_task_selection \
> -e sched:sched_pe_enqueue_sleeping_task \
> -e sched:sched_pe_cross_remote_cpu \
> -e sched:sched_pe_task_is_migrating
>
> Notes:
> 1. These trace events are not intended to be merged upstream. Instead, they
> only make PE testing easier and will be converted to tracepoints once the
> PE patches land upstream.
I wonder if the tracepoints should be added though? That is, not adding the
trace_events that show up in tracefs, but just the tracepoints so that bpf
or local modules could hook to them?
-- Steve
> 2. This patch is based on John's Proxy Execution v7 patch series (see
> the link below) which is also available at
> https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-v7-6.7-rc6/.
>
> Link: https://lore.kernel.org/linux-kernel/CANDhNCrHd+5twWVNqBAhVLfhMhkiO0KjxXBmwVgaCD4kAyFyWw@mail.gmail.com/
* Re: [PATCH] sched: Add trace events for Proxy Execution (PE)
2024-02-21 14:23 ` Steven Rostedt
@ 2024-02-21 14:24 ` Metin Kaya
From: Metin Kaya @ 2024-02-21 14:24 UTC
To: Steven Rostedt
Cc: linux-kernel, John Stultz, Joel Fernandes, Qais Yousef,
Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Valentin Schneider, Masami Hiramatsu,
Mathieu Desnoyers, Ben Segall, Zimuzo Ezeozue, Youssef Esmat,
Mel Gorman, Daniel Bristot de Oliveira, Will Deacon, Waiman Long,
Boqun Feng, Paul E. McKenney, Xuewen Yan, K Prateek Nayak,
Thomas Gleixner, kernel-team, linux-trace-kernel
On 21/02/2024 2:23 pm, Steven Rostedt wrote:
> On Fri, 2 Feb 2024 08:33:38 +0000
> Metin Kaya <metin.kaya@arm.com> wrote:
>
>> Add sched_[start, finish]_task_selection trace events to measure the
>> latency of PE patches in task selection.
>>
>> Moreover, introduce trace events for interesting events in PE:
>> 1. sched_pe_enqueue_sleeping_task: a task gets enqueued on the wait queue
>> of a sleeping task (mutex owner).
>> 2. sched_pe_cross_remote_cpu: the dependency chain crosses to a remote CPU.
>> 3. sched_pe_task_is_migrating: the mutex owner task is migrating.
>>
>> New trace events can be tested via this command:
>> $ perf trace \
>> -e sched:sched_start_task_selection \
>> -e sched:sched_finish_task_selection \
>> -e sched:sched_pe_enqueue_sleeping_task \
>> -e sched:sched_pe_cross_remote_cpu \
>> -e sched:sched_pe_task_is_migrating
>>
>> Notes:
>> 1. These trace events are not intended to be merged upstream. Instead, they
>> only make PE testing easier and will be converted to tracepoints once the
>> PE patches land upstream.
>
> I wonder if the tracepoints should be added though? That is, not adding the
> trace_events that show up in tracefs, but just the tracepoints so that bpf
> or local modules could hook to them?
Yep, the intention is to provide the necessary support for modules (e.g.,
https://github.com/ARM-software/lisa/blob/main/lisa/_assets/kmodules/lisa/tp.c).
>
> -- Steve
>
>
>> 2. This patch is based on John's Proxy Execution v7 patch series (see
>> the link below) which is also available at
>> https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-v7-6.7-rc6/.
>>
>> Link: https://lore.kernel.org/linux-kernel/CANDhNCrHd+5twWVNqBAhVLfhMhkiO0KjxXBmwVgaCD4kAyFyWw@mail.gmail.com/
>