* [PATCH 0/4] Tracepoints for deadline scheduler @ 2016-02-22 17:08 Daniel Bristot de Oliveira 2016-02-22 17:08 ` [PATCH 1/4] tracing: Add __print_ns_to_secs() and __print_ns_without_secs() helpers Daniel Bristot de Oliveira ` (3 more replies) 0 siblings, 4 replies; 22+ messages in thread From: Daniel Bristot de Oliveira @ 2016-02-22 17:08 UTC (permalink / raw) To: Steven Rostedt, Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Juri Lelli Cc: Arnaldo Carvalho de Melo, LKML, linux-rt-users Deadline tasks behave differently of other tasks because deadline task's also depend on their period, deadline and runtime. Hence, the well known sched:sched_wakeup and sched:sched_switch tracepoints are not always enough to precisely explain the behavior of a deadline task with respect to the task's period, deadline and runtime consumption. This patch series implements tracepoints for the following deadline scheduler points of interest: - sched:sched_deadline_replenish: Informs the periodic runtime replenishment of a deadline task. - sched:sched_deadline_yield: Informs that a deadline task called sched_yield(), and will wait for the next period. - sched:sched_deadline_throttle: Informs that a task consumed all its available runtime and was throttled. - sched:sched_deadline_block: Informs that a deadline task went to sleep waiting to be awakened by another task. Daniel Bristot de Oliveira (3): sched: Move deadline container_of() helper functions into sched.h sched/deadline: Tracepoints for deadline scheduler tools lib traceevent: Implements '%' operation Steven Rostedt (Red Hat) (1): tracing: Add __print_ns_to_secs() and __print_ns_without_secs() helpers include/trace/events/sched.h | 87 ++++++++++++++++++++++++++++++++++++++ include/trace/trace_events.h | 25 +++++++++++ kernel/sched/deadline.c | 34 +++++++-------- kernel/sched/sched.h | 18 ++++++++ tools/lib/traceevent/event-parse.c | 4 ++ 5 files changed, 149 insertions(+), 19 deletions(-) -- 2.5.0 ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 1/4] tracing: Add __print_ns_to_secs() and __print_ns_without_secs() helpers 2016-02-22 17:08 [PATCH 0/4] Tracepoints for deadline scheduler Daniel Bristot de Oliveira @ 2016-02-22 17:08 ` Daniel Bristot de Oliveira 2016-02-22 17:08 ` [PATCH 2/4] sched: Move deadline container_of() helper functions into sched.h Daniel Bristot de Oliveira ` (2 subsequent siblings) 3 siblings, 0 replies; 22+ messages in thread From: Daniel Bristot de Oliveira @ 2016-02-22 17:08 UTC (permalink / raw) To: Steven Rostedt, Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Juri Lelli Cc: Arnaldo Carvalho de Melo, LKML, linux-rt-users From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org> To have nanosecond output displayed in a more human readable format, its nicer to convert it to a seconds format (XXX.YYYYYYYYY). The problem is that to do so, the numbers must be divided by NSEC_PER_SEC, and moded too. But as these numbers are 64 bit, this can not be done simply with '/' and '%' operators, but must use do_div() instead. Instead of performing the expensive do_div() in the hot path of the tracepoint, it is more efficient to perform it during the output phase. But passing in do_div() can confuse the parser, and do_div() doesn't work exactly like a normal C function. It modifies the number in place, and we don't want to modify the actual values in the ring buffer. Two helper functions are now created: __print_ns_to_secs() and __print_ns_without_secs() They both take a value of nanoseconds, and the former will return that number divided by NSEC_PER_SEC, and the latter will mod it with NSEC_PER_SEC giving a way to print a nice human readable format: __print_fmt("time=%llu.%09u", __print_ns_to_secs(REC->nsec_val), __print_ns_without_secs(REC->nsec_val)) Signed-off-by: Steven Rostedt <rostedt@goodmis.org> --- include/trace/trace_events.h | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/include/trace/trace_events.h b/include/trace/trace_events.h index 170c93b..23a5822 100644 --- a/include/trace/trace_events.h +++ b/include/trace/trace_events.h @@ -283,6 +283,21 @@ TRACE_MAKE_SYSTEM_STR(); trace_print_symbols_seq(p, value, symbols); \ }) +#undef __print_ns_to_secs +#define __print_ns_to_secs(value) \ + ({ \ + u64 ____val = (u64)value; \ + do_div(____val, NSEC_PER_SEC); \ + ____val; \ + }) + +#undef __print_ns_without_secs +#define __print_ns_without_secs(value) \ + ({ \ + u64 ____val = (u64)value; \ + do_div(____val, NSEC_PER_SEC); \ + }) + #undef __print_symbolic_u64 #if BITS_PER_LONG == 32 #define __print_symbolic_u64(value, symbol_array...) \ @@ -720,6 +735,16 @@ static inline void ftrace_test_probe_##call(void) \ #undef __get_bitmask #undef __print_array +/* + * The below is not executed in the kernel. It is only what is + * displayed in the print format for userspace to parse. + */ +#undef __print_ns_to_secs +#define __print_ns_to_secs(val) val / 1000000000UL + +#undef __print_ns_without_secs +#define __print_ns_without_secs(val) val % 1000000000UL + #undef TP_printk #define TP_printk(fmt, args...) "\"" fmt "\", " __stringify(args) -- 2.5.0 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 2/4] sched: Move deadline container_of() helper functions into sched.h 2016-02-22 17:08 [PATCH 0/4] Tracepoints for deadline scheduler Daniel Bristot de Oliveira 2016-02-22 17:08 ` [PATCH 1/4] tracing: Add __print_ns_to_secs() and __print_ns_without_secs() helpers Daniel Bristot de Oliveira @ 2016-02-22 17:08 ` Daniel Bristot de Oliveira 2016-02-22 17:08 ` [PATCH 3/4] sched/deadline: Tracepoints for deadline scheduler Daniel Bristot de Oliveira 2016-02-22 17:08 ` [PATCH 4/4] tools lib traceevent: Implements '%' operation Daniel Bristot de Oliveira 3 siblings, 0 replies; 22+ messages in thread From: Daniel Bristot de Oliveira @ 2016-02-22 17:08 UTC (permalink / raw) To: Steven Rostedt, Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Juri Lelli Cc: Arnaldo Carvalho de Melo, LKML, linux-rt-users Move dl_task_of(), dl_rq_of_se() and rq_of_dl_rq() helper functions from kernel/sched/deadline.c to kernel/sched/sched.h, so they can be used on other scheduler files. Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> --- kernel/sched/deadline.c | 18 ------------------ kernel/sched/sched.h | 18 ++++++++++++++++++ 2 files changed, 18 insertions(+), 18 deletions(-) diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index cd64c97..2f6d25a 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -20,24 +20,6 @@ struct dl_bandwidth def_dl_bandwidth; -static inline struct task_struct *dl_task_of(struct sched_dl_entity *dl_se) -{ - return container_of(dl_se, struct task_struct, dl); -} - -static inline struct rq *rq_of_dl_rq(struct dl_rq *dl_rq) -{ - return container_of(dl_rq, struct rq, dl); -} - -static inline struct dl_rq *dl_rq_of_se(struct sched_dl_entity *dl_se) -{ - struct task_struct *p = dl_task_of(dl_se); - struct rq *rq = task_rq(p); - - return &rq->dl; -} - static inline int on_dl_rq(struct sched_dl_entity *dl_se) { return !RB_EMPTY_NODE(&dl_se->rb_node); diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 10f1637..b6e55e1 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -749,6 +749,24 @@ static inline void rq_clock_skip_update(struct rq *rq, bool skip) rq->clock_skip_update &= ~RQCF_REQ_SKIP; } +static inline struct task_struct *dl_task_of(struct sched_dl_entity *dl_se) +{ + return container_of(dl_se, struct task_struct, dl); +} + +static inline struct rq *rq_of_dl_rq(struct dl_rq *dl_rq) +{ + return container_of(dl_rq, struct rq, dl); +} + +static inline struct dl_rq *dl_rq_of_se(struct sched_dl_entity *dl_se) +{ + struct task_struct *p = dl_task_of(dl_se); + struct rq *rq = task_rq(p); + + return &rq->dl; +} + #ifdef CONFIG_NUMA enum numa_topology_type { NUMA_DIRECT, -- 2.5.0 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 3/4] sched/deadline: Tracepoints for deadline scheduler 2016-02-22 17:08 [PATCH 0/4] Tracepoints for deadline scheduler Daniel Bristot de Oliveira 2016-02-22 17:08 ` [PATCH 1/4] tracing: Add __print_ns_to_secs() and __print_ns_without_secs() helpers Daniel Bristot de Oliveira 2016-02-22 17:08 ` [PATCH 2/4] sched: Move deadline container_of() helper functions into sched.h Daniel Bristot de Oliveira @ 2016-02-22 17:08 ` Daniel Bristot de Oliveira 2016-02-22 17:32 ` Peter Zijlstra 2016-02-22 17:48 ` kbuild test robot 2016-02-22 17:08 ` [PATCH 4/4] tools lib traceevent: Implements '%' operation Daniel Bristot de Oliveira 3 siblings, 2 replies; 22+ messages in thread From: Daniel Bristot de Oliveira @ 2016-02-22 17:08 UTC (permalink / raw) To: Steven Rostedt, Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Juri Lelli Cc: Arnaldo Carvalho de Melo, LKML, linux-rt-users Deadline tasks behave differently of other tasks because deadline task's also depend on their period, deadline and runtime. Hence, the well known sched:sched_wakeup and sched:sched_switch tracepoints are not always enough to precisely explain the behavior of a deadline task with respect to the task's period, deadline, and runtime consumption and replenishment. In order to provide more precise information about the scheduling of deadline tasks, this patch implements the following tracepoints: - sched:sched_deadline_replenish: Informs the runtime replenishment of a deadline task. Trace example: <idle>-0 [010] d.h. 268.428878: sched_deadline_replenish: \ comm=y pid=1608 now=268.428876113 deadline=268.458863627 \ runtime=20000000 The task y-1608 was replenished with 20000000 ns at 268.428876113 to be used until the deadline at 268.458863627. - sched:sched_deadline_yield: Informs that a deadline task called sched_yield(), and will wait for the next period. Trace example: y-1608 [010] d... 268.428892: sched_deadline_yield: \ now=268.428891932 deadline=268.458863627 \ remaining_runtime=19995653 The task y-1608 yielded before its deadline, with 19995653 ns of remaining runtime. - sched:sched_deadline_throttle: Informs that a task consumed all its available runtime and was throttled. Trace example: t-1613 [011] d.h. 280.419823: sched_deadline_throttle: \ now=280.419823282 deadline=280.430683206 \ remaining_runtime=-13280 The task t-1613 overrun its runtime by 13280 ns and was throttled. - sched:sched_deadline_block: Informs that a deadline task went to sleep waiting to be awakened by another task. Trace example: b-1611 [011] d... 276.228298: sched_deadline_block: \ now=276.228295889 deadline=276.258262555 remaining_runtime=19966666 The task b-1611 blocked waiting for an external event. Its deadline is at 276.258262555, and it stills have 19966666 ns of remaining runtime on the current period. Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> --- include/trace/events/sched.h | 87 ++++++++++++++++++++++++++++++++++++++++++++ kernel/sched/deadline.c | 16 +++++++- 2 files changed, 102 insertions(+), 1 deletion(-) diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h index 9b90c57..1847c29 100644 --- a/include/trace/events/sched.h +++ b/include/trace/events/sched.h @@ -562,6 +562,93 @@ TRACE_EVENT(sched_wake_idle_without_ipi, TP_printk("cpu=%d", __entry->cpu) ); + +/* + * Tracepoint for runtime replenishment of deadline tasks: + */ +TRACE_EVENT(sched_deadline_replenish, + + TP_PROTO(struct sched_dl_entity *dl_se), + + TP_ARGS(dl_se), + + TP_STRUCT__entry( + __array( char, comm, TASK_COMM_LEN ) + __field( pid_t, pid ) + __field( u64, now ) + __field( u64, deadline ) + __field( s64, runtime ) + ), + + TP_fast_assign( + memcpy(__entry->comm, dl_task_of(dl_se)->comm, TASK_COMM_LEN); + __entry->now = rq_clock(rq_of_dl_rq(dl_rq_of_se(dl_se))); + __entry->pid = dl_task_of(dl_se)->pid; + __entry->deadline = dl_se->deadline; + __entry->runtime = dl_se->runtime; + ), + + TP_printk("comm=%s pid=%d now=%llu.%09u deadline=%llu.%09u runtime=%lld", + __entry->comm, + __entry->pid, + __print_ns_to_secs(__entry->now), + __print_ns_without_secs(__entry->now), + __print_ns_to_secs(__entry->deadline), + __print_ns_without_secs(__entry->deadline), + __entry->runtime) +); + +DECLARE_EVENT_CLASS(sched_deadline_template, + + TP_PROTO(struct sched_dl_entity *dl_se), + + TP_ARGS(dl_se), + + TP_STRUCT__entry( + __field( u64, now ) + __field( u64, deadline ) + __field( s64, runtime ) + ), + + TP_fast_assign( + __entry->now = rq_clock(rq_of_dl_rq(dl_rq_of_se(dl_se))); + __entry->deadline = dl_se->deadline; + __entry->runtime = dl_se->runtime; + ), + + TP_printk("now=%llu.%09u deadline=%llu.%09u remaining_runtime=%lld", + __print_ns_to_secs(__entry->now), + __print_ns_without_secs(__entry->now), + __print_ns_to_secs(__entry->deadline), + __print_ns_without_secs(__entry->deadline), + __entry->runtime) +); + +/* + * Tracepoint for sched_yield() of a deadline task (the task + * went to sleep waiting for the next period) + */ +DEFINE_EVENT(sched_deadline_template, sched_deadline_yield, + TP_PROTO(struct sched_dl_entity *dl_se), + TP_ARGS(dl_se)); + +/* + * Tracepoint for throttling of a deadline task that consumed all its + * runtime + */ +DEFINE_EVENT_CONDITION(sched_deadline_template, sched_deadline_throttle, + TP_PROTO(struct sched_dl_entity *dl_se), + TP_ARGS(dl_se), + TP_CONDITION(!dl_se->dl_yielded)); + +/* + * Tracepoint for blocking of a deadline task. The deadline task was + * dequeued, but neither by being throttled nor yielding. + */ +DEFINE_EVENT_CONDITION(sched_deadline_template, sched_deadline_block, + TP_PROTO(struct sched_dl_entity *dl_se), + TP_ARGS(dl_se), + TP_CONDITION(!dl_se->dl_yielded && !dl_se->dl_throttled)); #endif /* _TRACE_SCHED_H */ /* This part must be outside protection */ diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index 2f6d25a..56d0d09 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -17,6 +17,7 @@ #include "sched.h" #include <linux/slab.h> +#include <trace/events/sched.h> struct dl_bandwidth def_dl_bandwidth; @@ -344,6 +345,7 @@ static inline void setup_new_dl_entity(struct sched_dl_entity *dl_se, dl_se->deadline = rq_clock(rq) + pi_se->dl_deadline; dl_se->runtime = pi_se->dl_runtime; dl_se->dl_new = 0; + trace_sched_deadline_replenish(dl_se); } /* @@ -407,6 +409,8 @@ static void replenish_dl_entity(struct sched_dl_entity *dl_se, dl_se->runtime = pi_se->dl_runtime; } + trace_sched_deadline_replenish(dl_se); + if (dl_se->dl_yielded) dl_se->dl_yielded = 0; if (dl_se->dl_throttled) @@ -495,6 +499,7 @@ static void update_dl_entity(struct sched_dl_entity *dl_se, dl_entity_overflow(dl_se, pi_se, rq_clock(rq))) { dl_se->deadline = rq_clock(rq) + pi_se->dl_deadline; dl_se->runtime = pi_se->dl_runtime; + trace_sched_deadline_replenish(dl_se); } } @@ -734,6 +739,7 @@ static void update_curr_dl(struct rq *rq) dl_se->runtime -= dl_se->dl_yielded ? 0 : delta_exec; if (dl_runtime_exceeded(dl_se)) { dl_se->dl_throttled = 1; + trace_sched_deadline_throttle(dl_se); __dequeue_task_dl(rq, curr, 0); if (unlikely(dl_se->dl_boosted || !start_dl_timer(curr))) enqueue_task_dl(rq, curr, ENQUEUE_REPLENISH); @@ -910,6 +916,7 @@ enqueue_dl_entity(struct sched_dl_entity *dl_se, static void dequeue_dl_entity(struct sched_dl_entity *dl_se) { __dequeue_dl_entity(dl_se); + trace_sched_deadline_block(dl_se); } static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags) @@ -978,6 +985,7 @@ static void yield_task_dl(struct rq *rq) { struct task_struct *p = rq->curr; + update_rq_clock(rq); /* * We make the task go to sleep until its current deadline by * forcing its runtime to zero. This way, update_curr_dl() stops @@ -986,9 +994,15 @@ static void yield_task_dl(struct rq *rq) */ if (p->dl.runtime > 0) { rq->curr->dl.dl_yielded = 1; + if (trace_sched_deadline_yield_enabled()) { + u64 delta_exec = rq_clock_task(rq) - p->se.exec_start; + /* Subtract the last run till now */ + if (likely((s64)delta_exec > 0)) + p->dl.runtime -= delta_exec; + trace_sched_deadline_yield(&p->dl); + } p->dl.runtime = 0; } - update_rq_clock(rq); update_curr_dl(rq); /* * Tell update_rq_clock() that we've just updated, -- 2.5.0 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH 3/4] sched/deadline: Tracepoints for deadline scheduler 2016-02-22 17:08 ` [PATCH 3/4] sched/deadline: Tracepoints for deadline scheduler Daniel Bristot de Oliveira @ 2016-02-22 17:32 ` Peter Zijlstra 2016-02-22 17:48 ` Steven Rostedt 2016-02-22 17:48 ` kbuild test robot 1 sibling, 1 reply; 22+ messages in thread From: Peter Zijlstra @ 2016-02-22 17:32 UTC (permalink / raw) To: Daniel Bristot de Oliveira Cc: Steven Rostedt, Ingo Molnar, Thomas Gleixner, Juri Lelli, Arnaldo Carvalho de Melo, LKML, linux-rt-users On Mon, Feb 22, 2016 at 02:08:21PM -0300, Daniel Bristot de Oliveira wrote: > Deadline tasks behave differently of other tasks because deadline > task's also depend on their period, deadline and runtime. > > Hence, the well known sched:sched_wakeup and sched:sched_switch > tracepoints are not always enough to precisely explain the behavior of a > deadline task with respect to the task's period, deadline, and runtime > consumption and replenishment. > > In order to provide more precise information about the scheduling of > deadline tasks, this patch implements the following tracepoints: > > - sched:sched_deadline_replenish: Informs the runtime replenishment of > a deadline task. Trace example: > > <idle>-0 [010] d.h. 268.428878: sched_deadline_replenish: \ > comm=y pid=1608 now=268.428876113 deadline=268.458863627 \ > runtime=20000000 > > The task y-1608 was replenished with 20000000 ns at 268.428876113 > to be used until the deadline at 268.458863627. > > - sched:sched_deadline_yield: Informs that a deadline task called > sched_yield(), and will wait for the next period. Trace example: > > y-1608 [010] d... 268.428892: sched_deadline_yield: \ > now=268.428891932 deadline=268.458863627 \ > remaining_runtime=19995653 > > The task y-1608 yielded before its deadline, with 19995653 ns of > remaining runtime. > > - sched:sched_deadline_throttle: Informs that a task consumed all its > available runtime and was throttled. Trace example: > > t-1613 [011] d.h. 280.419823: sched_deadline_throttle: \ > now=280.419823282 deadline=280.430683206 \ > remaining_runtime=-13280 > > The task t-1613 overrun its runtime by 13280 ns and was throttled. > > - sched:sched_deadline_block: Informs that a deadline task went to sleep > waiting to be awakened by another task. Trace example: > > b-1611 [011] d... 276.228298: sched_deadline_block: \ > now=276.228295889 deadline=276.258262555 > remaining_runtime=19966666 > > The task b-1611 blocked waiting for an external event. Its deadline is at > 276.258262555, and it stills have 19966666 ns of remaining runtime on the > current period. So I'm a bit allergic to tracepoints and this is very flimsy on reasons why I would want to do this. As it stands, the existing tracepoint have already been an ABI trainwreck, why would I want to add more? ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/4] sched/deadline: Tracepoints for deadline scheduler 2016-02-22 17:32 ` Peter Zijlstra @ 2016-02-22 17:48 ` Steven Rostedt 2016-02-22 20:11 ` Daniel Bristot de Oliveira 2016-02-22 21:30 ` Peter Zijlstra 0 siblings, 2 replies; 22+ messages in thread From: Steven Rostedt @ 2016-02-22 17:48 UTC (permalink / raw) To: Peter Zijlstra Cc: Daniel Bristot de Oliveira, Ingo Molnar, Thomas Gleixner, Juri Lelli, Arnaldo Carvalho de Melo, LKML, linux-rt-users On Mon, 22 Feb 2016 18:32:59 +0100 Peter Zijlstra <peterz@infradead.org> wrote: > So I'm a bit allergic to tracepoints and this is very flimsy on reasons > why I would want to do this. Because there's no way to know if SCHED_DEADLINE tasks are doing what they suppose to without hacking the kernel and adding your own tracepoints. > > As it stands, the existing tracepoint have already been an ABI > trainwreck, why would I want to add more? Yes, this may become a type of ABI, but even the sched switch tracepoints haven't been that bad. Has it really prevented us from changing anything? "trainwreck" is a harsh word, and the information of scheduling tracepoints have been crucial to finding bugs and such. And has been tremendously useful in loads of cases. But let me ask, what would you recommend to finding out if the kernel has really given your tasks the recommended runtime within a given period? We can't expect users of SCHED_DEADLINE to be modifying the kernel themselves. -- Steve ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/4] sched/deadline: Tracepoints for deadline scheduler 2016-02-22 17:48 ` Steven Rostedt @ 2016-02-22 20:11 ` Daniel Bristot de Oliveira 2016-02-22 21:30 ` Peter Zijlstra 1 sibling, 0 replies; 22+ messages in thread From: Daniel Bristot de Oliveira @ 2016-02-22 20:11 UTC (permalink / raw) To: Steven Rostedt, Peter Zijlstra Cc: Ingo Molnar, Thomas Gleixner, Juri Lelli, Arnaldo Carvalho de Melo, LKML, linux-rt-users On 02/22/2016 02:48 PM, Steven Rostedt wrote: > On Mon, 22 Feb 2016 18:32:59 +0100 > Peter Zijlstra <peterz@infradead.org> wrote: > > >> > So I'm a bit allergic to tracepoints and this is very flimsy on reasons >> > why I would want to do this. > Because there's no way to know if SCHED_DEADLINE tasks are doing what > they suppose to without hacking the kernel and adding your own > tracepoints. That is the point. A potential deadline user would have to become a sched deadline developer to be able to debug its deadline tasks. Given that many deadline scheduler users are expected from application field, e.g. automation, and that these users generally do not even know how to compile a kernel, having a "ready to use" way to observe sched deadline specific points of interest is essential for deadline scheduler adoption. For example, the runtime is an estimated value because there is no deterministic way define it, so be able to observe the remaining runtime at the end of an activation is very useful to define a tighter runtime. On the other hand, it is possible to defined that runtime was under estimated just by observing that a task was throttled. Another example: be able to see that a task is blocking in the middle of an activation is useful on the identification of an unexpected behavior of a deadline task. Although it is possible to guess many things by doing user-space measurements, by using a set of other tracepoints, or even by using both together. The debug process will not be as easy and precise as using these four tracepoints. -- Daniel ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/4] sched/deadline: Tracepoints for deadline scheduler 2016-02-22 17:48 ` Steven Rostedt 2016-02-22 20:11 ` Daniel Bristot de Oliveira @ 2016-02-22 21:30 ` Peter Zijlstra 2016-02-22 22:30 ` Steven Rostedt 1 sibling, 1 reply; 22+ messages in thread From: Peter Zijlstra @ 2016-02-22 21:30 UTC (permalink / raw) To: Steven Rostedt Cc: Daniel Bristot de Oliveira, Ingo Molnar, Thomas Gleixner, Juri Lelli, Arnaldo Carvalho de Melo, LKML, linux-rt-users On Mon, Feb 22, 2016 at 12:48:54PM -0500, Steven Rostedt wrote: > > As it stands, the existing tracepoint have already been an ABI > > trainwreck, why would I want to add more? > > Yes, this may become a type of ABI, but even the sched switch > tracepoints haven't been that bad. Has it really prevented us from > changing anything? The whole wakeup thing where we _still_ have a dummy argument, and have been lying about the value for a long time really stinks. > But let me ask, what would you recommend to finding out if the kernel > has really given your tasks the recommended runtime within a given > period? We can't expect users of SCHED_DEADLINE to be modifying the > kernel themselves. So why are these deadline specific tracepoint? Why not extend the ones we already have? Also, will these tracepoints still work if we implement SCHED_DEADLINE using Least-Laxity-First or Pfair or some other exotic algorithm? Or will be forever be bound to EDF just because tracepoint ABI shite? Worse, the proposed tracepoints are atrocious, look at crap like this: > + if (trace_sched_deadline_yield_enabled()) { > + u64 delta_exec = rq_clock_task(rq) - p->se.exec_start; > + /* Subtract the last run till now */ > + if (likely((s64)delta_exec > 0)) > + p->dl.runtime -= delta_exec; > + trace_sched_deadline_yield(&p->dl); > + } tracepoints should _NEVER_ change state, ever. And there's the whole COND tracepoint muck, which also doesn't win any prices. So tell me why these specific tracepoints and why the existing ones could not be extended to include this information. For example, why a trace_sched_dealine_yield, and not a generic trace_sched_yield() that works for all classes. Tell me that the information presented does not pin the implementation. And clean up the crap. Then I might maybe consider this. But do not present me with a bunch of random arse, hacked together tracepoints and tell me they might be useful, maybe. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/4] sched/deadline: Tracepoints for deadline scheduler 2016-02-22 21:30 ` Peter Zijlstra @ 2016-02-22 22:30 ` Steven Rostedt 2016-02-23 10:40 ` Juri Lelli 2016-02-23 10:44 ` Peter Zijlstra 0 siblings, 2 replies; 22+ messages in thread From: Steven Rostedt @ 2016-02-22 22:30 UTC (permalink / raw) To: Peter Zijlstra Cc: Daniel Bristot de Oliveira, Ingo Molnar, Thomas Gleixner, Juri Lelli, Arnaldo Carvalho de Melo, LKML, linux-rt-users On Mon, 22 Feb 2016 22:30:17 +0100 Peter Zijlstra <peterz@infradead.org> wrote: > On Mon, Feb 22, 2016 at 12:48:54PM -0500, Steven Rostedt wrote: > > > > As it stands, the existing tracepoint have already been an ABI > > > trainwreck, why would I want to add more? > > > > Yes, this may become a type of ABI, but even the sched switch > > tracepoints haven't been that bad. Has it really prevented us from > > changing anything? > > The whole wakeup thing where we _still_ have a dummy argument, and have > been lying about the value for a long time really stinks. Having a dummy argument is not that bad. Yes, it's inconvenient, and I'm not sure who even uses it (can we delete it without breaking anything?) But it doesn't prevent us from going forward with development. > > > But let me ask, what would you recommend to finding out if the kernel > > has really given your tasks the recommended runtime within a given > > period? We can't expect users of SCHED_DEADLINE to be modifying the > > kernel themselves. > > So why are these deadline specific tracepoint? Why not extend the ones > we already have? I'm not sure how to do that and be able to report when a period has elapsed, and when the next period is coming. > > Also, will these tracepoints still work if we implement SCHED_DEADLINE > using Least-Laxity-First or Pfair or some other exotic algorithm? Or > will be forever be bound to EDF just because tracepoint ABI shite? Can we come up with generic numbers? I mean, the user that asks for their task to have a specific runtime within a specific period/deadline, these seem to be generic already. I'll have to read up on those that you mention, but do that not have a "replenish" for when the period starts again? And then a yield, showing the task has given up its remaining time, or a block, where a task is scheduled out because it blocked on a lock? > > Worse, the proposed tracepoints are atrocious, look at crap like this: > > > + if (trace_sched_deadline_yield_enabled()) { > > + u64 delta_exec = rq_clock_task(rq) - p->se.exec_start; > > + /* Subtract the last run till now */ > > + if (likely((s64)delta_exec > 0)) > > + p->dl.runtime -= delta_exec; > > + trace_sched_deadline_yield(&p->dl); > > + } > > tracepoints should _NEVER_ change state, ever. Heh, it's not really changing state. The code directly after this is: p->dl.runtime = 0; Without updating dl.runtime, you the tracepoint would report inaccurate remaining time. Without that, you would get reports of yielding with full runtimes, making it look like you never ran at all. > > And there's the whole COND tracepoint muck, which also doesn't win any > prices. It keeps us from adding flags that may end up going away and us maintaining a dummy field forever ;-) > > So tell me why these specific tracepoints and why the existing ones > could not be extended to include this information. For example, why a > trace_sched_dealine_yield, and not a generic trace_sched_yield() that > works for all classes. But what about reporting actual runtime, and when the next period will come. That only matters for deadline. > > Tell me that the information presented does not pin the implementation. > > And clean up the crap. > > Then I might maybe consider this. > > But do not present me with a bunch of random arse, hacked together > tracepoints and tell me they might be useful, maybe. They ARE useful. These are the tracepoints I'm currently using to debug the deadline scheduler with. They have been indispensable for my current work. -- Steve ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/4] sched/deadline: Tracepoints for deadline scheduler 2016-02-22 22:30 ` Steven Rostedt @ 2016-02-23 10:40 ` Juri Lelli 2016-02-23 10:44 ` Peter Zijlstra 1 sibling, 0 replies; 22+ messages in thread From: Juri Lelli @ 2016-02-23 10:40 UTC (permalink / raw) To: Steven Rostedt Cc: Peter Zijlstra, Daniel Bristot de Oliveira, Ingo Molnar, Thomas Gleixner, Juri Lelli, Arnaldo Carvalho de Melo, LKML, linux-rt-users Hi, On 22/02/16 17:30, Steven Rostedt wrote: > On Mon, 22 Feb 2016 22:30:17 +0100 > Peter Zijlstra <peterz@infradead.org> wrote: > > > On Mon, Feb 22, 2016 at 12:48:54PM -0500, Steven Rostedt wrote: > > [...] > > > > > But let me ask, what would you recommend to finding out if the kernel > > > has really given your tasks the recommended runtime within a given > > > period? We can't expect users of SCHED_DEADLINE to be modifying the > > > kernel themselves. > > > > So why are these deadline specific tracepoint? Why not extend the ones > > we already have? > > I'm not sure how to do that and be able to report when a period has > elapsed, and when the next period is coming. > > > > > Also, will these tracepoints still work if we implement SCHED_DEADLINE > > using Least-Laxity-First or Pfair or some other exotic algorithm? Or > > will be forever be bound to EDF just because tracepoint ABI shite? > > Can we come up with generic numbers? I mean, the user that asks for > their task to have a specific runtime within a specific > period/deadline, these seem to be generic already. I'll have to read up > on those that you mention, but do that not have a "replenish" for when > the period starts again? And then a yield, showing the task has given > up its remaining time, or a block, where a task is scheduled out > because it blocked on a lock? > AFAICT throttle, yield and block seem fairly generic. How we make the definition generic w.r.t. the arguments we want to print is a different matter, though. :-/ I should refresh my memory about the above mentioned algorithms, and "replenish" might be particular to the current implementation. However, shouldn't any algo that has a "throttle" event also have a corresponding "un-throttle (replenish)" event? I guess being able to throttle misbehaving tasks is a property that we will always desire. > > > > Worse, the proposed tracepoints are atrocious, look at crap like this: > > > > > + if (trace_sched_deadline_yield_enabled()) { > > > + u64 delta_exec = rq_clock_task(rq) - p->se.exec_start; > > > + /* Subtract the last run till now */ > > > + if (likely((s64)delta_exec > 0)) > > > + p->dl.runtime -= delta_exec; > > > + trace_sched_deadline_yield(&p->dl); > > > + } > > > > tracepoints should _NEVER_ change state, ever. > > Heh, it's not really changing state. The code directly after this is: > > p->dl.runtime = 0; > > Without updating dl.runtime, you the tracepoint would report inaccurate > remaining time. Without that, you would get reports of yielding with > full runtimes, making it look like you never ran at all. > Right, I guess that might be useful to understand if you over- dimensioned the reservation. Can't we make this a macro or do the computation local to the tracepoint itself so that code looks nicer? [...] > > > > So tell me why these specific tracepoints and why the existing ones > > could not be extended to include this information. For example, why a > > trace_sched_dealine_yield, and not a generic trace_sched_yield() that > > works for all classes. > > But what about reporting actual runtime, and when the next period will > come. That only matters for deadline. > As said above, the event looks generic enough to me. Not sure how to make printed arguments generic, though. > > > > Tell me that the information presented does not pin the implementation. > > > > And clean up the crap. > > > > Then I might maybe consider this. > > > > But do not present me with a bunch of random arse, hacked together > > tracepoints and tell me they might be useful, maybe. > > > They ARE useful. These are the tracepoints I'm currently using to > debug the deadline scheduler with. They have been indispensable for my > current work. > I also think they are very useful. I always had some sort of tracepoints that I continue forward porting during the development of SCHED_DEADLINE, and I couldn't really understand what was going on without those. There is value if we can agree on the mainline incarnation of such tracepoints now, IMHO. Best, - Juri ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/4] sched/deadline: Tracepoints for deadline scheduler 2016-02-22 22:30 ` Steven Rostedt 2016-02-23 10:40 ` Juri Lelli @ 2016-02-23 10:44 ` Peter Zijlstra 2016-02-23 13:10 ` Steven Rostedt ` (3 more replies) 1 sibling, 4 replies; 22+ messages in thread From: Peter Zijlstra @ 2016-02-23 10:44 UTC (permalink / raw) To: Steven Rostedt Cc: Daniel Bristot de Oliveira, Ingo Molnar, Thomas Gleixner, Juri Lelli, Arnaldo Carvalho de Melo, LKML, linux-rt-users On Mon, Feb 22, 2016 at 05:30:43PM -0500, Steven Rostedt wrote: > On Mon, 22 Feb 2016 22:30:17 +0100 > Peter Zijlstra <peterz@infradead.org> wrote: > > > On Mon, Feb 22, 2016 at 12:48:54PM -0500, Steven Rostedt wrote: > > > > > > As it stands, the existing tracepoint have already been an ABI > > > > trainwreck, why would I want to add more? > > > > > > Yes, this may become a type of ABI, but even the sched switch > > > tracepoints haven't been that bad. Has it really prevented us from > > > changing anything? > > > > The whole wakeup thing where we _still_ have a dummy argument, and have > > been lying about the value for a long time really stinks. > > Having a dummy argument is not that bad. Yes, it's inconvenient, and > I'm not sure who even uses it (can we delete it without breaking > anything?) But it doesn't prevent us from going forward with > development. No it very much illustrates the problem and is a very clear indication that tracepoints are an ABI. Yes, we more or less got away with it this time, but it does prohibit us from doing sane things, see below. > > > But let me ask, what would you recommend to finding out if the kernel > > > has really given your tasks the recommended runtime within a given > > > period? We can't expect users of SCHED_DEADLINE to be modifying the > > > kernel themselves. > > > > So why are these deadline specific tracepoint? Why not extend the ones > > we already have? > > I'm not sure how to do that and be able to report when a period has > elapsed, and when the next period is coming. Seriously? Sit still for a minute and think perhaps? > > Also, will these tracepoints still work if we implement SCHED_DEADLINE > > using Least-Laxity-First or Pfair or some other exotic algorithm? Or > > will be forever be bound to EDF just because tracepoint ABI shite? > > Can we come up with generic numbers? I mean, the user that asks for > their task to have a specific runtime within a specific > period/deadline, these seem to be generic already. I'll have to read up > on those that you mention, but do that not have a "replenish" for when > the period starts again? And then a yield, showing the task has given > up its remaining time, or a block, where a task is scheduled out > because it blocked on a lock? You tell me; why do you expect me to think for you? These should all have been questions you asked yourself before you even considered posting tracepoints. ABI remember? We're careful with that. > > Worse, the proposed tracepoints are atrocious, look at crap like this: > > > > > + if (trace_sched_deadline_yield_enabled()) { > > > + u64 delta_exec = rq_clock_task(rq) - p->se.exec_start; > > > + /* Subtract the last run till now */ > > > + if (likely((s64)delta_exec > 0)) > > > + p->dl.runtime -= delta_exec; > > > + trace_sched_deadline_yield(&p->dl); > > > + } > > > > tracepoints should _NEVER_ change state, ever. > > Heh, it's not really changing state. The code directly after this is: > > p->dl.runtime = 0; Yes, it more or less 'works', but its still atrocious shite. Its the worst kind of anti pattern possible. Suppose someone comes and removes that line, and ignores the tracepoint stuff, because, hell its a tracepoint, those don't modify stuff. Its just really, utterly bad practice. You've done this tracing code long enough, you really should _KNOW_ this. > > So tell me why these specific tracepoints and why the existing ones > > could not be extended to include this information. For example, why a > > trace_sched_dealine_yield, and not a generic trace_sched_yield() that > > works for all classes. > > But what about reporting actual runtime, and when the next period will > come. That only matters for deadline. How is that an answer to the question? Are you implying a generic trace_sched_yield() call could not do this? > > But do not present me with a bunch of random arse, hacked together > > tracepoints and tell me they might be useful, maybe. > > > They ARE useful. These are the tracepoints I'm currently using to > debug the deadline scheduler with. They have been indispensable for my > current work. They are, most obviously, a hacked together debug session for sure. This is _NOT_ what you commit. Now ideally we'd do something like the below, but because trainwreck, we cannot actually do this I think :-( It gets you about half of what your patch does, but shows how to also do a generic sched_yield(). The replenish might have to remain special, although both CFS and RT also have replenishes, albeit significantly different. I really dislike tracepoints, and I'm >.< close to proposing a patch removing them all from the scheduler. --- include/trace/events/sched.h | 97 +++++++++++++++++++++++++++++++++++++++----- 1 file changed, 86 insertions(+), 11 deletions(-) diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h index 9b90c57517a9..b902eb71830b 100644 --- a/include/trace/events/sched.h +++ b/include/trace/events/sched.h @@ -103,9 +103,15 @@ DEFINE_EVENT(sched_wakeup_template, sched_wakeup_new, TP_PROTO(struct task_struct *p), TP_ARGS(p)); +#define TASK_STATE_PREEMPT (TASK_STATE_MAX << 0) +#define TASK_STATE_THROTTLED (TASK_STATE_MAX << 1) +#define TASK_STATE_YIELDED (TASK_STATE_MAX << 2) + #ifdef CREATE_TRACE_POINTS static inline long __trace_sched_switch_state(bool preempt, struct task_struct *p) { + long state = p->state; + #ifdef CONFIG_SCHED_DEBUG BUG_ON(p != current); #endif /* CONFIG_SCHED_DEBUG */ @@ -114,10 +120,49 @@ static inline long __trace_sched_switch_state(bool preempt, struct task_struct * * Preemption ignores task state, therefore preempted tasks are always * RUNNING (we will not have dequeued if state != RUNNING). */ - return preempt ? TASK_RUNNING | TASK_STATE_MAX : p->state; + if (preempt) { + state = TASK_RUNNING | TASK_STATE_MAX; + } else if (dl_task(p)) { + if (p->dl.dl_throttled) + state |= TASK_STATE_THROTTLED; + else if (p->dl.dl_yielded) + state |= TASK_STATE_YIELDED; + } + + return state; } #endif /* CREATE_TRACE_POINTS */ +#define __trace_sched_switch_fields(name) do { \ + __entry->name##_policy = name->policy; \ + switch (name->policy) { \ + case SCHED_IDLE: \ + case SCHED_BATCH: \ + case SCHED_NORMAL: \ + __entry->name##_f1 = PRIO_TO_NICE(name->static_prio); \ + __entry->name##_f2 = 0; \ + __entry->name##_f3 = 0; \ + break; \ + case SCHED_RR: \ + case SCHED_FIFO: \ + __entry->name##_f1 = USER_PRIO(name->normal_prio); \ + __entry->name##_f2 = 0; \ + __entry->name##_f3 = 0; \ + break; \ + case SCHED_DEADLINE: \ + __entry->name##_f1 = name->dl.runtime; \ + __entry->name##_f2 = name->dl.deadline; \ + __entry->name##_f3 = name->dl.dl_period; \ + break; \ + default: \ + __entry->name##_f1 = 0; \ + __entry->name##_f2 = 0; \ + __entry->name##_f3 = 0; \ + break; \ + } \ +} while (0) + + /* * Tracepoint for task switches, performed by the scheduler: */ @@ -132,33 +177,63 @@ TRACE_EVENT(sched_switch, TP_STRUCT__entry( __array( char, prev_comm, TASK_COMM_LEN ) __field( pid_t, prev_pid ) - __field( int, prev_prio ) + __field( int, prev_policy ) + __field( s64, prev_f1 ) + __field( u64, prev_f2 ) + __field( u64, prev_f3 ) __field( long, prev_state ) + __array( char, next_comm, TASK_COMM_LEN ) __field( pid_t, next_pid ) - __field( int, next_prio ) + __field( int, next_policy ) + __field( s64, next_f1 ) + __field( u64, next_f2 ) + __field( u64, next_f3 ) ), TP_fast_assign( - memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN); + memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN); __entry->prev_pid = prev->pid; - __entry->prev_prio = prev->prio; + __trace_sched_switch_fields(prev); __entry->prev_state = __trace_sched_switch_state(preempt, prev); - memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN); + + memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN); __entry->next_pid = next->pid; - __entry->next_prio = next->prio; + __trace_sched_switch_fields(next); ), - TP_printk("prev_comm=%s prev_pid=%d prev_prio=%d prev_state=%s%s ==> next_comm=%s next_pid=%d next_prio=%d", - __entry->prev_comm, __entry->prev_pid, __entry->prev_prio, + TP_printk("prev_comm=%s prev_pid=%d prev=%s:{%Ld,%Lu,%Lu} prev_state=%s%s ==> next_comm=%s next_pid=%d next=%s:{%Ld,%Lu,%Lu}", + __entry->prev_comm, __entry->prev_pid, + __print_symbolic(__entry->prev_policy, + {SCHED_IDLE, "IDLE"}, + {SCHED_BATCH, "BATCH"}, + {SCHED_NORMAL, "NORMAL"}, + {SCHED_RR, "RR"}, + {SCHED_FIFO, "FIFO"}, + {SCHED_DEADLINE,"DEADLINE"}), + __entry->prev_f1, __entry->prev_f2, __entry->prev_f3, + __entry->prev_state & (TASK_STATE_MAX-1) ? __print_flags(__entry->prev_state & (TASK_STATE_MAX-1), "|", { 1, "S"} , { 2, "D" }, { 4, "T" }, { 8, "t" }, { 16, "Z" }, { 32, "X" }, { 64, "x" }, { 128, "K" }, { 256, "W" }, { 512, "P" }, { 1024, "N" }) : "R", - __entry->prev_state & TASK_STATE_MAX ? "+" : "", - __entry->next_comm, __entry->next_pid, __entry->next_prio) + __print_flags(__entry->prev_state & ~(TASK_STATE_MAX-1), "|", + { TASK_STATE_PREEMPT, "+" }, + { TASK_STATE_THROTTLED, "t" }, + { TASK_STATE_YIELDED, "y" }), + + __entry->next_comm, __entry->next_pid, + __print_symbolic(__entry->next_policy, + {SCHED_IDLE, "IDLE"}, + {SCHED_BATCH, "BATCH"}, + {SCHED_NORMAL, "NORMAL"}, + {SCHED_RR, "RR"}, + {SCHED_FIFO, "FIFO"}, + {SCHED_DEADLINE,"DEADLINE"}), + __entry->next_f1, __entry->next_f2, __entry->next_f3, + ) ); /* ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH 3/4] sched/deadline: Tracepoints for deadline scheduler 2016-02-23 10:44 ` Peter Zijlstra @ 2016-02-23 13:10 ` Steven Rostedt 2016-02-24 8:48 ` Ingo Molnar 2016-02-23 14:27 ` Peter Zijlstra ` (2 subsequent siblings) 3 siblings, 1 reply; 22+ messages in thread From: Steven Rostedt @ 2016-02-23 13:10 UTC (permalink / raw) To: Peter Zijlstra Cc: Daniel Bristot de Oliveira, Ingo Molnar, Thomas Gleixner, Juri Lelli, Arnaldo Carvalho de Melo, LKML, linux-rt-users On Tue, 23 Feb 2016 11:44:08 +0100 Peter Zijlstra <peterz@infradead.org> wrote: > No it very much illustrates the problem and is a very clear indication > that tracepoints are an ABI. Yes they are. But note, they can change if nobody notices ;-) > > > > Heh, it's not really changing state. The code directly after this is: > > > > p->dl.runtime = 0; > > Yes, it more or less 'works', but its still atrocious shite. Its the > worst kind of anti pattern possible. > > Suppose someone comes and removes that line, and ignores the tracepoint > stuff, because, hell its a tracepoint, those don't modify stuff. > > Its just really, utterly bad practice. > > You've done this tracing code long enough, you really should _KNOW_ > this. You're right. I got too caught up in the cleverness of the hack to acknowledge it is a hack. But Daniel has a patch to clean up the yield code which would also help in making this hack unnecessary for this tracepoint. > > > > So tell me why these specific tracepoints and why the existing ones > > > could not be extended to include this information. For example, why a > > > trace_sched_dealine_yield, and not a generic trace_sched_yield() that > > > works for all classes. > > > > But what about reporting actual runtime, and when the next period will > > come. That only matters for deadline. > > How is that an answer to the question? Are you implying a generic > trace_sched_yield() call could not do this? > > > > But do not present me with a bunch of random arse, hacked together > > > tracepoints and tell me they might be useful, maybe. > > > > > > They ARE useful. These are the tracepoints I'm currently using to > > debug the deadline scheduler with. They have been indispensable for my > > current work. > > They are, most obviously, a hacked together debug session for sure. This > is _NOT_ what you commit. > > Now ideally we'd do something like the below, but because trainwreck, we > cannot actually do this I think :-( > > It gets you about half of what your patch does, but shows how to also > do a generic sched_yield(). The replenish might have to remain special, > although both CFS and RT also have replenishes, albeit significantly > different. OK, I admit. I was very single focused on deadline scheduler. I wasn't looking at how this could work with the rest of the scheduler. I'll take a look at the patches you posted. Thanks! -- Steve ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/4] sched/deadline: Tracepoints for deadline scheduler 2016-02-23 13:10 ` Steven Rostedt @ 2016-02-24 8:48 ` Ingo Molnar 0 siblings, 0 replies; 22+ messages in thread From: Ingo Molnar @ 2016-02-24 8:48 UTC (permalink / raw) To: Steven Rostedt Cc: Peter Zijlstra, Daniel Bristot de Oliveira, Ingo Molnar, Thomas Gleixner, Juri Lelli, Arnaldo Carvalho de Melo, LKML, linux-rt-users * Steven Rostedt <rostedt@goodmis.org> wrote: > On Tue, 23 Feb 2016 11:44:08 +0100 > Peter Zijlstra <peterz@infradead.org> wrote: > > > No it very much illustrates the problem and is a very clear indication > > that tracepoints are an ABI. > > Yes they are. But note, they can change if nobody notices ;-) Which is rather seldom for something as high profile as core scheduler tracepoints ... You can be cavalier about tracepoints in subsystems where the tracepoints are at most used by the maintainers with specialized, closely maintained tooling that is easy to change. Not so much about scheduler tracepoints which are being widely utilized by various projects. Thanks, Ingo ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/4] sched/deadline: Tracepoints for deadline scheduler 2016-02-23 10:44 ` Peter Zijlstra 2016-02-23 13:10 ` Steven Rostedt @ 2016-02-23 14:27 ` Peter Zijlstra 2016-02-23 16:19 ` Daniel Bristot de Oliveira 2016-02-24 2:29 ` Daniel Bristot de Oliveira 3 siblings, 0 replies; 22+ messages in thread From: Peter Zijlstra @ 2016-02-23 14:27 UTC (permalink / raw) To: Steven Rostedt Cc: Daniel Bristot de Oliveira, Ingo Molnar, Thomas Gleixner, Juri Lelli, Arnaldo Carvalho de Melo, LKML, linux-rt-users On Tue, Feb 23, 2016 at 11:44:08AM +0100, Peter Zijlstra wrote: > include/trace/events/sched.h | 97 +++++++++++++++++++++++++++++++++++++++----- > 1 file changed, 86 insertions(+), 11 deletions(-) > > diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h > index 9b90c57517a9..b902eb71830b 100644 > --- a/include/trace/events/sched.h > +++ b/include/trace/events/sched.h > @@ -103,9 +103,15 @@ DEFINE_EVENT(sched_wakeup_template, sched_wakeup_new, > TP_PROTO(struct task_struct *p), > TP_ARGS(p)); > > +#define TASK_STATE_PREEMPT (TASK_STATE_MAX << 0) > +#define TASK_STATE_THROTTLED (TASK_STATE_MAX << 1) > +#define TASK_STATE_YIELDED (TASK_STATE_MAX << 2) > + > #ifdef CREATE_TRACE_POINTS > static inline long __trace_sched_switch_state(bool preempt, struct task_struct *p) > { > + long state = p->state; > + > #ifdef CONFIG_SCHED_DEBUG > BUG_ON(p != current); > #endif /* CONFIG_SCHED_DEBUG */ > @@ -114,10 +120,49 @@ static inline long __trace_sched_switch_state(bool preempt, struct task_struct * > * Preemption ignores task state, therefore preempted tasks are always > * RUNNING (we will not have dequeued if state != RUNNING). > */ > - return preempt ? TASK_RUNNING | TASK_STATE_MAX : p->state; > + if (preempt) { > + state = TASK_RUNNING | TASK_STATE_MAX; > + } else if (dl_task(p)) { > + if (p->dl.dl_throttled) > + state |= TASK_STATE_THROTTLED; > + else if (p->dl.dl_yielded) > + state |= TASK_STATE_YIELDED; These might want to be inverted, that is, yielded will have throttled set, so yield should take precedence. > + } > + > + return state; > } > #endif /* CREATE_TRACE_POINTS */ > > +#define __trace_sched_switch_fields(name) do { \ > + __entry->name##_policy = name->policy; \ > + switch (name->policy) { \ Sadly we cannot use policy, for that isn't updated on PI. The best we can do is dl_task(), rt_task() else ... > + case SCHED_IDLE: \ This doesn't have nice, so should go with default; > + case SCHED_BATCH: \ > + case SCHED_NORMAL: \ > + __entry->name##_f1 = PRIO_TO_NICE(name->static_prio); \ > + __entry->name##_f2 = 0; \ > + __entry->name##_f3 = 0; \ > + break; \ > + case SCHED_RR: \ > + case SCHED_FIFO: \ > + __entry->name##_f1 = USER_PRIO(name->normal_prio); \ > + __entry->name##_f2 = 0; \ > + __entry->name##_f3 = 0; \ > + break; \ > + case SCHED_DEADLINE: \ > + __entry->name##_f1 = name->dl.runtime; \ > + __entry->name##_f2 = name->dl.deadline; \ > + __entry->name##_f3 = name->dl.dl_period; \ > + break; \ > + default: \ > + __entry->name##_f1 = 0; \ > + __entry->name##_f2 = 0; \ > + __entry->name##_f3 = 0; \ > + break; \ > + } \ > +} while (0) ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/4] sched/deadline: Tracepoints for deadline scheduler 2016-02-23 10:44 ` Peter Zijlstra 2016-02-23 13:10 ` Steven Rostedt 2016-02-23 14:27 ` Peter Zijlstra @ 2016-02-23 16:19 ` Daniel Bristot de Oliveira 2016-02-24 2:29 ` Daniel Bristot de Oliveira 3 siblings, 0 replies; 22+ messages in thread From: Daniel Bristot de Oliveira @ 2016-02-23 16:19 UTC (permalink / raw) To: Peter Zijlstra, Steven Rostedt Cc: Ingo Molnar, Thomas Gleixner, Juri Lelli, Arnaldo Carvalho de Melo, LKML, linux-rt-users On 02/23/2016 07:44 AM, Peter Zijlstra wrote: >>> Worse, the proposed tracepoints are atrocious, look at crap like this: >>> > > >>>> > > > + if (trace_sched_deadline_yield_enabled()) { >>>> > > > + u64 delta_exec = rq_clock_task(rq) - p->se.exec_start; >>>> > > > + /* Subtract the last run till now */ >>>> > > > + if (likely((s64)delta_exec > 0)) >>>> > > > + p->dl.runtime -= delta_exec; >>>> > > > + trace_sched_deadline_yield(&p->dl); >>>> > > > + } >>> > > >>> > > tracepoints should _NEVER_ change state, ever. >> > >> > Heh, it's not really changing state. The code directly after this is: >> > >> > p->dl.runtime = 0; > Yes, it more or less 'works', but its still atrocious shite. Its the > worst kind of anti pattern possible. > > Suppose someone comes and removes that line, and ignores the tracepoint > stuff, because, hell its a tracepoint, those don't modify stuff. > > Its just really, utterly bad practice. It is possible to clean up this by removing the "p->dl.runtime = 0;" from yield_task_dl(), to carry the dl_runtime until update_curr_dl(). We end up not proposing this change because we thought it could be too much intrusive. But seems that it was the correct approach. Btw, your patch for "Always calculate end of period on sched_yield()" already do the necessary clean up, so in the possible next version of these tracepoint the hack will disappear. >>> > > So tell me why these specific tracepoints and why the existing ones >>> > > could not be extended to include this information. For example, why a >>> > > trace_sched_dealine_yield, and not a generic trace_sched_yield() that >>> > > works for all classes. >> > >> > But what about reporting actual runtime, and when the next period will >> > come. That only matters for deadline. > How is that an answer to the question? Are you implying a generic > trace_sched_yield() call could not do this? I agree that a trace_sched_yield() could partially do this. But each scheduler would have its own specific data to be printed, and it is hard to define how many and the type of these data. >>> > > But do not present me with a bunch of random arse, hacked together >>> > > tracepoints and tell me they might be useful, maybe. >> > >> > >> > They ARE useful. These are the tracepoints I'm currently using to >> > debug the deadline scheduler with. They have been indispensable for my >> > current work. > They are, most obviously, a hacked together debug session for sure. This > is _NOT_ what you commit. > > Now ideally we'd do something like the below, but because trainwreck, we > cannot actually do this I think :-( I do liked this patch, but I agree that bad things can happen on applications that already use sched:sched_switch :-(. > I really dislike tracepoints, and I'm >.< close to proposing a patch > removing them all from the scheduler. Scheduler tracepoints become such a standard not only for debugging the kernel, but also for understanding performance issues and finding optimizations for user-space tasks, e.g. finding the best way to spread task among processors on a large NUMA box. Many people would cry if sched tracepoints disappears :'-(. > @@ -132,33 +177,63 @@ TRACE_EVENT(sched_switch, > TP_STRUCT__entry( > __array( char, prev_comm, TASK_COMM_LEN ) > __field( pid_t, prev_pid ) > - __field( int, prev_prio ) > + __field( int, prev_policy ) > + __field( s64, prev_f1 ) > + __field( u64, prev_f2 ) > + __field( u64, prev_f3 ) This helps me to explain why did I chose to create deadline specific tracepoints. It is not possible to define in advance a common set of parameters to be traced/printed for all future (exotic) scheduler, because each scheduler will use its own set of parameters. Hence, the number and type of the parameter can vary. How many fields do we need? and which are the types of these fields? What if a scheduler needs two s64 fields and no u64? Moreover, it is not possible to define the changes that will be needed on classical schedulers algorithms for them to fit on Linux's abstractions and restrictions. Therefore, it is not safe to mention that LLF would (theoretically) use the same set of parameters of a EDF scheduler, because the theoretical LLF would need to be modified to fit on Linux. For example, there is no throttling concept on the classical deadline scheduler. So, the proposed tracepoints are only valid for Linux's sched deadline scheduler (deadline scheduler + CBS). That is why I believe that it would be a better approach to have scheduler specific tracepoints for the sched_*_entity particularities and sched_* specific points interest. Finally, I was based in the same approach used on the fair scheduler's sched:sched_stat* tracepoints: they report sched_entity data on fair.c, and my patches report sched_deadline_entity data on deadline.c. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/4] sched/deadline: Tracepoints for deadline scheduler 2016-02-23 10:44 ` Peter Zijlstra ` (2 preceding siblings ...) 2016-02-23 16:19 ` Daniel Bristot de Oliveira @ 2016-02-24 2:29 ` Daniel Bristot de Oliveira 3 siblings, 0 replies; 22+ messages in thread From: Daniel Bristot de Oliveira @ 2016-02-24 2:29 UTC (permalink / raw) To: Peter Zijlstra, Steven Rostedt Cc: Ingo Molnar, Thomas Gleixner, Juri Lelli, Arnaldo Carvalho de Melo, LKML, linux-rt-users On 02/23/2016 07:44 AM, Peter Zijlstra wrote: > Now ideally we'd do something like the below, but because trainwreck, we > cannot actually do this I think :-( Some other considerations: 1) The majority of tasks run on NORMAL scheduler with default nice. So, prev=NORMAL:{0,0,0} and next=NORMAL:{0,0,0} does not help too much. The tracepoint size increase from 64 to 112, but this difference will not be useful for the majority of the user/use-cases. 2) The sched:switch can be wider than this 212 characters output: " m-1604 [004] d... 1940.393070: sched_switch: prev_comm=m prev_pid=1604 prev=DEADLINE:{-967616,1940402119572,30000000} prev_state=Rt ==> next_comm=kworker/4:2H next_pid=1507 next=NORMAL:{-20,0,0}" 3) On the other tracepoints, rostedt suggested me to also include the now= field because the clock printed in the trace prefix is not the same clock used by the deadline scheduler. 4) Although neither perf record/script nor trace-cmd record/report broke, both printed only the old fields. So, both would need to be updated and carry both the old and the new version. Well, I bet there will be many other applications that were not well written as perf and trace-cmd :-( 5) A new scheduler will probably require modifications on this tracepoint, hence on all other tools that use this tracepoint too. 6) I liked all the code to compose the output string! But it is not fair to say that the "format" is intuitive. One may need to put the finger in the monitor and use a calculator to see that the filter for a yielded tasks is "prev_state == 8192". Btw, it only works for deadline task yield, and the yield call of all other classes will need to be changed to support this. The question is, is it worth for other classes? 7) This patch is currently showing the runtime zeroed on yield. So, it needs the clean up on yield_task_dl() to properly inform the unused runtime of the task's activation. More than it, to be precise, it must push the zeroing of the runtime of an yielded task to the replenishment code. Your patch for the sched yield already do it :-) but once there, the zeroing can't be moved back without break this tracepoint :-(. 8) we still need to define a tracepoint for replenishment, but as it takes place on the following points: a) sched_setattr() b) periodic replenishment c) on the wakeup after a blocking. There is no a "single tracepoint" already available to "attach" this information. So seems the replenishment tracepoint is still needed. Moreover, although other scheduler may have this concept, I am not sure if they take place under the same conditions. So, AFAICS, this integrate approach is neither adding more information, nor preventing ABI problems now and possibly in the future. On the other hand, although a per-scheduler approach would increase the number of tracepoints, the tracepoints tends to be simpler, confined to the scheduler's .c and more coherent to the scheduler's abstractions. I fear that the mix of the deadline scheduler and other schedulers information, and the adaptation on other tracepoints to make the deadline scheduler points of interest fit on current tracepoints, would add undesired noise/imprecision on the interpretation of the sched deadline actions/abstractions :-(. -- Daniel ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/4] sched/deadline: Tracepoints for deadline scheduler 2016-02-22 17:08 ` [PATCH 3/4] sched/deadline: Tracepoints for deadline scheduler Daniel Bristot de Oliveira 2016-02-22 17:32 ` Peter Zijlstra @ 2016-02-22 17:48 ` kbuild test robot 1 sibling, 0 replies; 22+ messages in thread From: kbuild test robot @ 2016-02-22 17:48 UTC (permalink / raw) To: Daniel Bristot de Oliveira Cc: kbuild-all, Steven Rostedt, Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Juri Lelli, Arnaldo Carvalho de Melo, LKML, linux-rt-users [-- Attachment #1: Type: text/plain, Size: 7068 bytes --] Hi Daniel, [auto build test WARNING on tip/sched/core] [also build test WARNING on v4.5-rc5 next-20160222] [if your patch is applied to the wrong git tree, please drop us a note to help improving the system] url: https://github.com/0day-ci/linux/commits/Daniel-Bristot-de-Oliveira/Tracepoints-for-deadline-scheduler/20160223-011407 config: i386-randconfig-s0-201608 (attached as .config) reproduce: # save the attached .config to linux build tree make ARCH=i386 All warnings (new ones prefixed by >>): In file included from include/trace/define_trace.h:95:0, from include/trace/events/sched.h:655, from kernel/sched/core.c:91: include/trace/events/sched.h: In function 'trace_raw_output_sched_deadline_replenish': >> include/trace/events/sched.h:591:12: warning: format '%u' expects argument of type 'unsigned int', but argument 6 has type 'long unsigned int' [-Wformat=] TP_printk("comm=%s pid=%d now=%llu.%09u deadline=%llu.%09u runtime=%lld", ^ include/trace/trace_events.h:342:22: note: in definition of macro 'DECLARE_EVENT_CLASS' trace_seq_printf(s, print); \ ^ include/trace/trace_events.h:65:9: note: in expansion of macro 'PARAMS' PARAMS(print)); \ ^ include/trace/events/sched.h:569:1: note: in expansion of macro 'TRACE_EVENT' TRACE_EVENT(sched_deadline_replenish, ^ >> include/trace/events/sched.h:591:2: note: in expansion of macro 'TP_printk' TP_printk("comm=%s pid=%d now=%llu.%09u deadline=%llu.%09u runtime=%lld", ^ include/trace/events/sched.h:591:12: warning: format '%u' expects argument of type 'unsigned int', but argument 8 has type 'long unsigned int' [-Wformat=] TP_printk("comm=%s pid=%d now=%llu.%09u deadline=%llu.%09u runtime=%lld", ^ include/trace/trace_events.h:342:22: note: in definition of macro 'DECLARE_EVENT_CLASS' trace_seq_printf(s, print); \ ^ include/trace/trace_events.h:65:9: note: in expansion of macro 'PARAMS' PARAMS(print)); \ ^ include/trace/events/sched.h:569:1: note: in expansion of macro 'TRACE_EVENT' TRACE_EVENT(sched_deadline_replenish, ^ >> include/trace/events/sched.h:591:2: note: in expansion of macro 'TP_printk' TP_printk("comm=%s pid=%d now=%llu.%09u deadline=%llu.%09u runtime=%lld", ^ include/trace/events/sched.h: In function 'trace_raw_output_sched_deadline_template': include/trace/events/sched.h:619:12: warning: format '%u' expects argument of type 'unsigned int', but argument 4 has type 'long unsigned int' [-Wformat=] TP_printk("now=%llu.%09u deadline=%llu.%09u remaining_runtime=%lld", ^ include/trace/trace_events.h:342:22: note: in definition of macro 'DECLARE_EVENT_CLASS' trace_seq_printf(s, print); \ ^ include/trace/events/sched.h:619:2: note: in expansion of macro 'TP_printk' TP_printk("now=%llu.%09u deadline=%llu.%09u remaining_runtime=%lld", ^ include/trace/events/sched.h:619:12: warning: format '%u' expects argument of type 'unsigned int', but argument 6 has type 'long unsigned int' [-Wformat=] TP_printk("now=%llu.%09u deadline=%llu.%09u remaining_runtime=%lld", ^ include/trace/trace_events.h:342:22: note: in definition of macro 'DECLARE_EVENT_CLASS' trace_seq_printf(s, print); \ ^ include/trace/events/sched.h:619:2: note: in expansion of macro 'TP_printk' TP_printk("now=%llu.%09u deadline=%llu.%09u remaining_runtime=%lld", ^ vim +591 include/trace/events/sched.h 563 TP_printk("cpu=%d", __entry->cpu) 564 ); 565 566 /* 567 * Tracepoint for runtime replenishment of deadline tasks: 568 */ > 569 TRACE_EVENT(sched_deadline_replenish, 570 571 TP_PROTO(struct sched_dl_entity *dl_se), 572 573 TP_ARGS(dl_se), 574 575 TP_STRUCT__entry( 576 __array( char, comm, TASK_COMM_LEN ) 577 __field( pid_t, pid ) 578 __field( u64, now ) 579 __field( u64, deadline ) 580 __field( s64, runtime ) 581 ), 582 583 TP_fast_assign( 584 memcpy(__entry->comm, dl_task_of(dl_se)->comm, TASK_COMM_LEN); 585 __entry->now = rq_clock(rq_of_dl_rq(dl_rq_of_se(dl_se))); 586 __entry->pid = dl_task_of(dl_se)->pid; 587 __entry->deadline = dl_se->deadline; 588 __entry->runtime = dl_se->runtime; 589 ), 590 > 591 TP_printk("comm=%s pid=%d now=%llu.%09u deadline=%llu.%09u runtime=%lld", 592 __entry->comm, 593 __entry->pid, 594 __print_ns_to_secs(__entry->now), 595 __print_ns_without_secs(__entry->now), 596 __print_ns_to_secs(__entry->deadline), 597 __print_ns_without_secs(__entry->deadline), 598 __entry->runtime) 599 ); 600 601 DECLARE_EVENT_CLASS(sched_deadline_template, 602 603 TP_PROTO(struct sched_dl_entity *dl_se), 604 605 TP_ARGS(dl_se), 606 607 TP_STRUCT__entry( 608 __field( u64, now ) 609 __field( u64, deadline ) 610 __field( s64, runtime ) 611 ), 612 613 TP_fast_assign( 614 __entry->now = rq_clock(rq_of_dl_rq(dl_rq_of_se(dl_se))); 615 __entry->deadline = dl_se->deadline; 616 __entry->runtime = dl_se->runtime; 617 ), 618 619 TP_printk("now=%llu.%09u deadline=%llu.%09u remaining_runtime=%lld", 620 __print_ns_to_secs(__entry->now), 621 __print_ns_without_secs(__entry->now), 622 __print_ns_to_secs(__entry->deadline), 623 __print_ns_without_secs(__entry->deadline), 624 __entry->runtime) 625 ); 626 627 /* 628 * Tracepoint for sched_yield() of a deadline task (the task 629 * went to sleep waiting for the next period) 630 */ 631 DEFINE_EVENT(sched_deadline_template, sched_deadline_yield, 632 TP_PROTO(struct sched_dl_entity *dl_se), 633 TP_ARGS(dl_se)); 634 635 /* 636 * Tracepoint for throttling of a deadline task that consumed all its 637 * runtime 638 */ 639 DEFINE_EVENT_CONDITION(sched_deadline_template, sched_deadline_throttle, 640 TP_PROTO(struct sched_dl_entity *dl_se), 641 TP_ARGS(dl_se), 642 TP_CONDITION(!dl_se->dl_yielded)); 643 644 /* 645 * Tracepoint for blocking of a deadline task. The deadline task was 646 * dequeued, but neither by being throttled nor yielding. 647 */ 648 DEFINE_EVENT_CONDITION(sched_deadline_template, sched_deadline_block, 649 TP_PROTO(struct sched_dl_entity *dl_se), 650 TP_ARGS(dl_se), 651 TP_CONDITION(!dl_se->dl_yielded && !dl_se->dl_throttled)); 652 #endif /* _TRACE_SCHED_H */ 653 654 /* This part must be outside protection */ > 655 #include <trace/define_trace.h> --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation [-- Attachment #2: .config.gz --] [-- Type: application/octet-stream, Size: 30800 bytes --] ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 4/4] tools lib traceevent: Implements '%' operation 2016-02-22 17:08 [PATCH 0/4] Tracepoints for deadline scheduler Daniel Bristot de Oliveira ` (2 preceding siblings ...) 2016-02-22 17:08 ` [PATCH 3/4] sched/deadline: Tracepoints for deadline scheduler Daniel Bristot de Oliveira @ 2016-02-22 17:08 ` Daniel Bristot de Oliveira 2016-02-22 20:23 ` Steven Rostedt 2016-02-25 5:41 ` [tip:perf/core] tools lib traceevent: Implement " tip-bot for Daniel Bristot de Oliveira 3 siblings, 2 replies; 22+ messages in thread From: Daniel Bristot de Oliveira @ 2016-02-22 17:08 UTC (permalink / raw) To: Steven Rostedt, Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Juri Lelli Cc: Arnaldo Carvalho de Melo, LKML, linux-rt-users The operation '%' is not implemented on event-parse.c, causing an error on the parse of events with '%' operation on its printk format. For example, # perf record -e sched:sched_deadline_yield ~/y Warning: [sched:sched_deadline_yield] unknown op '%' .... # perf script Warning: [sched:sched_deadline_yield] unknown op '%' y 1641 [006] 3364.109319: sched:sched_deadline_yield: \ [FAILED TO PARSE] now=3364109314595 \ deadline=3364139295135 runtime=19975597 This patch implements the '%' operation. With this patch, we see the correct output: # perf record -e sched:sched_deadline_yield ~/y No Warning # perf script y 4005 [001] 4623.650978: sched:sched_deadline_yield: \ now=4623.650974050 \ deadline=4623.680957364 remaining_runtime=19979611 Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> --- tools/lib/traceevent/event-parse.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/tools/lib/traceevent/event-parse.c b/tools/lib/traceevent/event-parse.c index c3bd294..575e751 100644 --- a/tools/lib/traceevent/event-parse.c +++ b/tools/lib/traceevent/event-parse.c @@ -1951,6 +1951,7 @@ process_op(struct event_format *event, struct print_arg *arg, char **tok) strcmp(token, "*") == 0 || strcmp(token, "^") == 0 || strcmp(token, "/") == 0 || + strcmp(token, "%") == 0 || strcmp(token, "<") == 0 || strcmp(token, ">") == 0 || strcmp(token, "<=") == 0 || @@ -3689,6 +3690,9 @@ eval_num_arg(void *data, int size, struct event_format *event, struct print_arg case '/': val = left / right; break; + case '%': + val = left % right; + break; case '*': val = left * right; break; -- 2.5.0 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH 4/4] tools lib traceevent: Implements '%' operation 2016-02-22 17:08 ` [PATCH 4/4] tools lib traceevent: Implements '%' operation Daniel Bristot de Oliveira @ 2016-02-22 20:23 ` Steven Rostedt 2016-02-23 14:38 ` Arnaldo Carvalho de Melo 2016-02-25 5:41 ` [tip:perf/core] tools lib traceevent: Implement " tip-bot for Daniel Bristot de Oliveira 1 sibling, 1 reply; 22+ messages in thread From: Steven Rostedt @ 2016-02-22 20:23 UTC (permalink / raw) To: Daniel Bristot de Oliveira Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Juri Lelli, Arnaldo Carvalho de Melo, LKML, linux-rt-users On Mon, 22 Feb 2016 14:08:22 -0300 Daniel Bristot de Oliveira <bristot@redhat.com> wrote: > The operation '%' is not implemented on event-parse.c, causing > an error on the parse of events with '%' operation on its > printk format. For example, > > # perf record -e sched:sched_deadline_yield ~/y > Warning: [sched:sched_deadline_yield] unknown op '%' > .... > # perf script > Warning: [sched:sched_deadline_yield] unknown op '%' > y 1641 [006] 3364.109319: sched:sched_deadline_yield: \ > [FAILED TO PARSE] now=3364109314595 \ > deadline=3364139295135 runtime=19975597 > > This patch implements the '%' operation. With this patch, we see the > correct output: > > # perf record -e sched:sched_deadline_yield ~/y > No Warning > > # perf script > y 4005 [001] 4623.650978: sched:sched_deadline_yield: \ > now=4623.650974050 \ > deadline=4623.680957364 remaining_runtime=19979611 > > Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Arnaldo, This patch is not dependent on the rest of the series and looks like a good general fix. Can you pull this one into your tree, and please add my: Reviewed-by: Steven Rostedt <rsotedt@goodmis.org> Thanks! -- Steve > --- > tools/lib/traceevent/event-parse.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/tools/lib/traceevent/event-parse.c b/tools/lib/traceevent/event-parse.c > index c3bd294..575e751 100644 > --- a/tools/lib/traceevent/event-parse.c > +++ b/tools/lib/traceevent/event-parse.c > @@ -1951,6 +1951,7 @@ process_op(struct event_format *event, struct print_arg *arg, char **tok) > strcmp(token, "*") == 0 || > strcmp(token, "^") == 0 || > strcmp(token, "/") == 0 || > + strcmp(token, "%") == 0 || > strcmp(token, "<") == 0 || > strcmp(token, ">") == 0 || > strcmp(token, "<=") == 0 || > @@ -3689,6 +3690,9 @@ eval_num_arg(void *data, int size, struct event_format *event, struct print_arg > case '/': > val = left / right; > break; > + case '%': > + val = left % right; > + break; > case '*': > val = left * right; > break; ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 4/4] tools lib traceevent: Implements '%' operation 2016-02-22 20:23 ` Steven Rostedt @ 2016-02-23 14:38 ` Arnaldo Carvalho de Melo 2016-02-23 14:44 ` Arnaldo Carvalho de Melo 0 siblings, 1 reply; 22+ messages in thread From: Arnaldo Carvalho de Melo @ 2016-02-23 14:38 UTC (permalink / raw) To: Steven Rostedt Cc: Daniel Bristot de Oliveira, Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Juri Lelli, LKML, linux-rt-users Em Mon, Feb 22, 2016 at 03:23:03PM -0500, Steven Rostedt escreveu: > On Mon, 22 Feb 2016 14:08:22 -0300 > Daniel Bristot de Oliveira <bristot@redhat.com> wrote: > > > The operation '%' is not implemented on event-parse.c, causing > > an error on the parse of events with '%' operation on its > > printk format. For example, > > > > # perf record -e sched:sched_deadline_yield ~/y > > Warning: [sched:sched_deadline_yield] unknown op '%' > > .... > > # perf script > > Warning: [sched:sched_deadline_yield] unknown op '%' > > y 1641 [006] 3364.109319: sched:sched_deadline_yield: \ > > [FAILED TO PARSE] now=3364109314595 \ > > deadline=3364139295135 runtime=19975597 > > > > This patch implements the '%' operation. With this patch, we see the > > correct output: > > > > # perf record -e sched:sched_deadline_yield ~/y > > No Warning > > > > # perf script > > y 4005 [001] 4623.650978: sched:sched_deadline_yield: \ > > now=4623.650974050 \ > > deadline=4623.680957364 remaining_runtime=19979611 > > > > Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> > > Arnaldo, > > This patch is not dependent on the rest of the series and looks like a > good general fix. > > Can you pull this one into your tree, and please add my: > > Reviewed-by: Steven Rostedt <rsotedt@goodmis.org> Done, thanks. - Arnaldo ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 4/4] tools lib traceevent: Implements '%' operation 2016-02-23 14:38 ` Arnaldo Carvalho de Melo @ 2016-02-23 14:44 ` Arnaldo Carvalho de Melo 0 siblings, 0 replies; 22+ messages in thread From: Arnaldo Carvalho de Melo @ 2016-02-23 14:44 UTC (permalink / raw) To: Daniel Bristot de Oliveira Cc: Steven Rostedt, Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Juri Lelli, LKML, acme, linux-rt-users Em Tue, Feb 23, 2016 at 11:38:19AM -0300, Arnaldo Carvalho de Melo escreveu: > Em Mon, Feb 22, 2016 at 03:23:03PM -0500, Steven Rostedt escreveu: > > On Mon, 22 Feb 2016 14:08:22 -0300 > > Daniel Bristot de Oliveira <bristot@redhat.com> wrote: > > > > > The operation '%' is not implemented on event-parse.c, causing > > > an error on the parse of events with '%' operation on its > > > printk format. For example, > > > > > > # perf record -e sched:sched_deadline_yield ~/y > > > Warning: [sched:sched_deadline_yield] unknown op '%' > > > .... > > > # perf script Daniel, please add two spaces before examples, specially examples with the '#' root prompt. Had I not done that when applying this patch, those lines would have been removed by git-am, as it would interpret those as comments. Thanks, - Arnaldo ^ permalink raw reply [flat|nested] 22+ messages in thread
* [tip:perf/core] tools lib traceevent: Implement '%' operation 2016-02-22 17:08 ` [PATCH 4/4] tools lib traceevent: Implements '%' operation Daniel Bristot de Oliveira 2016-02-22 20:23 ` Steven Rostedt @ 2016-02-25 5:41 ` tip-bot for Daniel Bristot de Oliveira 1 sibling, 0 replies; 22+ messages in thread From: tip-bot for Daniel Bristot de Oliveira @ 2016-02-25 5:41 UTC (permalink / raw) To: linux-tip-commits Cc: mingo, rostedt, linux-rt-users, juri.lelli, acme, tglx, linux-kernel, bristot, hpa, peterz Commit-ID: 0e47b38dcd24c78d0699b42f28d5986154d2aa11 Gitweb: http://git.kernel.org/tip/0e47b38dcd24c78d0699b42f28d5986154d2aa11 Author: Daniel Bristot de Oliveira <bristot@redhat.com> AuthorDate: Mon, 22 Feb 2016 14:08:22 -0300 Committer: Arnaldo Carvalho de Melo <acme@redhat.com> CommitDate: Tue, 23 Feb 2016 11:40:51 -0300 tools lib traceevent: Implement '%' operation The operation '%' is not implemented on event-parse.c, causing an error when parsing events with '%' the operation in its printk format. For example, # perf record -e sched:sched_deadline_yield ~/yield-test Warning: [sched:sched_deadline_yield] unknown op '%' .... # perf script Warning: [sched:sched_deadline_yield] unknown op '%' test 1641 [006] 3364.109319: sched:sched_deadline_yield: \ [FAILED TO PARSE] now=3364109314595 \ deadline=3364139295135 runtime=19975597 This patch implements the '%' operation. With this patch, we see the correct output: # perf record -e sched:sched_deadline_yield ~/yield-test No Warning # perf script yield-test 4005 [001] 4623.650978: sched:sched_deadline_yield: \ now=4623.650974050 \ deadline=4623.680957364 remaining_runtime=19979611 Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Juri Lelli <juri.lelli@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-rt-users <linux-rt-users@vger.kernel.org> Link: http://lkml.kernel.org/r/5c96a395c56cea6d3d13d949051bdece86cc26e0.1456157869.git.bristot@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/lib/traceevent/event-parse.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/tools/lib/traceevent/event-parse.c b/tools/lib/traceevent/event-parse.c index c3bd294..575e751 100644 --- a/tools/lib/traceevent/event-parse.c +++ b/tools/lib/traceevent/event-parse.c @@ -1951,6 +1951,7 @@ process_op(struct event_format *event, struct print_arg *arg, char **tok) strcmp(token, "*") == 0 || strcmp(token, "^") == 0 || strcmp(token, "/") == 0 || + strcmp(token, "%") == 0 || strcmp(token, "<") == 0 || strcmp(token, ">") == 0 || strcmp(token, "<=") == 0 || @@ -3689,6 +3690,9 @@ eval_num_arg(void *data, int size, struct event_format *event, struct print_arg case '/': val = left / right; break; + case '%': + val = left % right; + break; case '*': val = left * right; break; ^ permalink raw reply related [flat|nested] 22+ messages in thread
end of thread, other threads:[~2016-02-25 6:30 UTC | newest] Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-02-22 17:08 [PATCH 0/4] Tracepoints for deadline scheduler Daniel Bristot de Oliveira 2016-02-22 17:08 ` [PATCH 1/4] tracing: Add __print_ns_to_secs() and __print_ns_without_secs() helpers Daniel Bristot de Oliveira 2016-02-22 17:08 ` [PATCH 2/4] sched: Move deadline container_of() helper functions into sched.h Daniel Bristot de Oliveira 2016-02-22 17:08 ` [PATCH 3/4] sched/deadline: Tracepoints for deadline scheduler Daniel Bristot de Oliveira 2016-02-22 17:32 ` Peter Zijlstra 2016-02-22 17:48 ` Steven Rostedt 2016-02-22 20:11 ` Daniel Bristot de Oliveira 2016-02-22 21:30 ` Peter Zijlstra 2016-02-22 22:30 ` Steven Rostedt 2016-02-23 10:40 ` Juri Lelli 2016-02-23 10:44 ` Peter Zijlstra 2016-02-23 13:10 ` Steven Rostedt 2016-02-24 8:48 ` Ingo Molnar 2016-02-23 14:27 ` Peter Zijlstra 2016-02-23 16:19 ` Daniel Bristot de Oliveira 2016-02-24 2:29 ` Daniel Bristot de Oliveira 2016-02-22 17:48 ` kbuild test robot 2016-02-22 17:08 ` [PATCH 4/4] tools lib traceevent: Implements '%' operation Daniel Bristot de Oliveira 2016-02-22 20:23 ` Steven Rostedt 2016-02-23 14:38 ` Arnaldo Carvalho de Melo 2016-02-23 14:44 ` Arnaldo Carvalho de Melo 2016-02-25 5:41 ` [tip:perf/core] tools lib traceevent: Implement " tip-bot for Daniel Bristot de Oliveira
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.