[v4,17/19] sched: Add migrate_disable() tracepoints

Message ID 20201023102347.697960969@infradead.org
State New, archived
Series
  • sched: Migrate disable support

Commit Message

Peter Zijlstra Oct. 23, 2020, 10:12 a.m. UTC
XXX write a tracer:

 - 'migrate_disable() -> migrate_enable()' time in task_sched_runtime()
 - 'migrate_pull -> sched-in' time in task_sched_runtime()

The first will give a worst case for the second, which is the actual
interference experienced by the task due to the migration constraints of
migrate_disable().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/trace/events/sched.h |   12 ++++++++++++
 kernel/sched/core.c          |    4 ++++
 kernel/sched/deadline.c      |    1 +
 kernel/sched/rt.c            |    8 +++++++-
 4 files changed, 24 insertions(+), 1 deletion(-)
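
Since the XXX above leaves the tracer unwritten, here is an editorial
sketch of what one could look like. These are bare tracepoints
(DECLARE_TRACE without a TRACE_EVENT), so they are invisible in tracefs;
a consumer has to attach probes from a module or via BPF. The sketch
assumes the tracepoints get exported with EXPORT_TRACEPOINT_SYMBOL_GPL()
(not shown in this patch, but done for the other bare sched tracepoints)
and simply logs task_sched_runtime() at both edges, leaving the
disable->enable delta to be computed offline:

/* md_trace.c -- hypothetical example module, not part of this series. */
#include <linux/module.h>
#include <linux/sched.h>
#include <linux/sched/cputime.h>	/* task_sched_runtime() */
#include <trace/events/sched.h>

/* Probe signatures: void *data first, then TP_PROTO(struct task_struct *p). */
static void probe_md_disable(void *data, struct task_struct *p)
{
	trace_printk("migrate_disable: comm=%s pid=%d runtime=%llu\n",
		     p->comm, p->pid, task_sched_runtime(p));
}

static void probe_md_enable(void *data, struct task_struct *p)
{
	trace_printk("migrate_enable: comm=%s pid=%d runtime=%llu\n",
		     p->comm, p->pid, task_sched_runtime(p));
}

static int __init md_trace_init(void)
{
	int ret;

	ret = register_trace_sched_migrate_disable_tp(probe_md_disable, NULL);
	if (ret)
		return ret;

	ret = register_trace_sched_migrate_enable_tp(probe_md_enable, NULL);
	if (ret)
		unregister_trace_sched_migrate_disable_tp(probe_md_disable, NULL);

	return ret;
}

static void __exit md_trace_exit(void)
{
	unregister_trace_sched_migrate_enable_tp(probe_md_enable, NULL);
	unregister_trace_sched_migrate_disable_tp(probe_md_disable, NULL);
	tracepoint_synchronize_unregister();
}

module_init(md_trace_init);
module_exit(md_trace_exit);
MODULE_LICENSE("GPL");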

Comments

Valentin Schneider Oct. 29, 2020, 4:27 p.m. UTC | #1
On 23/10/20 11:12, Peter Zijlstra wrote:
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1732,6 +1732,8 @@ void migrate_disable(void)
>               return;
>       }
>
> +	trace_sched_migrate_disable_tp(p);
> +
>       preempt_disable();
>       this_rq()->nr_pinned++;
>       p->migration_disabled = 1;
> @@ -1764,6 +1766,8 @@ void migrate_enable(void)
>       p->migration_disabled = 0;
>       this_rq()->nr_pinned--;
>       preempt_enable();
> +
> +	trace_sched_migrate_enable_tp(p);

Don't you want those directly after the ->migration_disabled write?
esp. for migrate_enable(), if that preempt_enable() leads to a context
switch then the disable->enable deltas won't reflect the kernel view.

That delta may indeed include the time it took to run the stopper and
fix the task's affinity on migrate_enable(), but it could include all
sorts of other higher-priority tasks.

>  }
>  EXPORT_SYMBOL_GPL(migrate_enable);
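
To illustrate the placement suggested above (an editorial reconstruction
around the hunk quoted in this mail, not code from the thread): moving
the call before preempt_enable() keeps it ordered with the
->migration_disabled write, so a context switch at preempt_enable()
cannot land between the state change and the tracepoint:

void migrate_enable(void)
{
	struct task_struct *p = current;

	/* ... early-return paths elided ... */
	p->migration_disabled = 0;
	trace_sched_migrate_enable_tp(p);	/* before preempt_enable() */
	this_rq()->nr_pinned--;
	preempt_enable();
}
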
Peter Zijlstra Oct. 29, 2020, 5:43 p.m. UTC | #2
On Thu, Oct 29, 2020 at 04:27:26PM +0000, Valentin Schneider wrote:
> 
> On 23/10/20 11:12, Peter Zijlstra wrote:
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -1732,6 +1732,8 @@ void migrate_disable(void)
> >               return;
> >       }
> >
> > +	trace_sched_migrate_disable_tp(p);
> > +
> >       preempt_disable();
> >       this_rq()->nr_pinned++;
> >       p->migration_disabled = 1;
> > @@ -1764,6 +1766,8 @@ void migrate_enable(void)
> >       p->migration_disabled = 0;
> >       this_rq()->nr_pinned--;
> >       preempt_enable();
> > +
> > +	trace_sched_migrate_enable_tp(p);
> 
> Don't you want those directly after the ->migration_disabled write?
> esp. for migrate_enable(), if that preempt_enable() leads to a context
> switch then the disable->enable deltas won't reflect the kernel view.
> 
> That delta may indeed include the time it took to run the stopper and
> fix the task's affinity on migrate_enable(), but it could include all
> sorts of other higher-priority tasks.

I can put them in the preempt_disable() section I suppose, but these
tracers should be looking at task_sched_runtime(), not walltime, and
then the preemption doesn't matter.

Also, a distinct lack of actual users atm.. :/
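
To make the task_sched_runtime() point concrete (a hypothetical demo
function, not part of the patch): a delta taken in task CPU time only
accrues while the task actually runs, so being preempted around the
tracepoints stretches a walltime delta but not the CPU-time one:

#include <linux/ktime.h>
#include <linux/preempt.h>
#include <linux/sched/cputime.h>

static void md_delta_demo(void)
{
	u64 cpu0, wall0;

	migrate_disable();
	cpu0  = task_sched_runtime(current);	/* CPU time consumed so far */
	wall0 = ktime_get_ns();

	/* ... work that must stay on this CPU ... */

	migrate_enable();

	/*
	 * If we are preempted right after migrate_enable(), only the
	 * walltime delta keeps growing; the CPU-time delta does not.
	 */
	pr_info("cpu delta: %llu ns, wall delta: %llu ns\n",
		task_sched_runtime(current) - cpu0,
		ktime_get_ns() - wall0);
}
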
Valentin Schneider Oct. 29, 2020, 5:56 p.m. UTC | #3
On 29/10/20 17:43, Peter Zijlstra wrote:
> On Thu, Oct 29, 2020 at 04:27:26PM +0000, Valentin Schneider wrote:
>> Don't you want those directly after the ->migration_disabled write?
>> esp. for migrate_enable(), if that preempt_enable() leads to a context
>> switch then the disable->enable deltas won't reflect the kernel view.
>> 
>> That delta may indeed include the time it took to run the stopper and
>> fix the task's affinity on migrate_enable(), but it could include all
>> sorts of other higher-priority tasks.
>
> I can put them in the preempt_disable() section I suppose, but these
> tracers should be looking at task_sched_runtime(), not walltime, and
> then the preemption doesn't matter.
>

True. I was thinking of how to process it downstream, and the first thing
that came to mind was that rd->overutilized flag which we do monitor
fairly closely; however that is system-wide while migrate_disable() is
task-specific.

> Also, a distinct lack of actual users atm.. :/

If you'd rather ditch this one altogether until someone asks for it, that
also works for me.
Peter Zijlstra Oct. 29, 2020, 5:59 p.m. UTC | #4
On Thu, Oct 29, 2020 at 05:56:12PM +0000, Valentin Schneider wrote:
> 
> On 29/10/20 17:43, Peter Zijlstra wrote:
> > On Thu, Oct 29, 2020 at 04:27:26PM +0000, Valentin Schneider wrote:
> >> Don't you want those directly after the ->migration_disabled write?
> >> esp. for migrate_enable(), if that preempt_enable() leads to a context
> >> switch then the disable->enable deltas won't reflect the kernel view.
> >> 
> >> That delta may indeed include the time it took to run the stopper and
> >> fix the task's affinity on migrate_enable(), but it could include all
> >> sorts of other higher-priority tasks.
> >
> > I can put them in the preempt_disable() section I suppose, but these
> > tracers should be looking at task_sched_runtime(), not walltime, and
> > then the preemption doesn't matter.
> >
> 
> True. I was thinking of how to process it downstream, and the first thing
> that came to mind was that rd->overutilized flag which we do monitor
> fairly closely; however that is system-wide while migrate_disable() is
> task-specific.
> 
> > Also, a distinct lack of actual users atm.. :/
> 
> If you'd rather ditch this one altogether until someone asks for it, that
> also works for me.

Yeah, I can pull this patch until we get someone that actually needs it.

Patch

--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -646,6 +646,18 @@  DECLARE_TRACE(sched_update_nr_running_tp
 	TP_PROTO(struct rq *rq, int change),
 	TP_ARGS(rq, change));
 
+DECLARE_TRACE(sched_migrate_disable_tp,
+	      TP_PROTO(struct task_struct *p),
+	      TP_ARGS(p));
+
+DECLARE_TRACE(sched_migrate_enable_tp,
+	      TP_PROTO(struct task_struct *p),
+	      TP_ARGS(p));
+
+DECLARE_TRACE(sched_migrate_pull_tp,
+	      TP_PROTO(struct task_struct *p),
+	      TP_ARGS(p));
+
 #endif /* _TRACE_SCHED_H */
 
 /* This part must be outside protection */
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1732,6 +1732,8 @@  void migrate_disable(void)
 		return;
 	}
 
+	trace_sched_migrate_disable_tp(p);
+
 	preempt_disable();
 	this_rq()->nr_pinned++;
 	p->migration_disabled = 1;
@@ -1764,6 +1766,8 @@  void migrate_enable(void)
 	p->migration_disabled = 0;
 	this_rq()->nr_pinned--;
 	preempt_enable();
+
+	trace_sched_migrate_enable_tp(p);
 }
 EXPORT_SYMBOL_GPL(migrate_enable);
 
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2245,6 +2245,7 @@  static void pull_dl_task(struct rq *this
 				goto skip;
 
 			if (is_migration_disabled(p)) {
+				trace_sched_migrate_pull_tp(p);
 				push_task = get_push_task(src_rq);
 			} else {
 				deactivate_task(src_rq, p, 0);
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1877,7 +1877,12 @@  static int push_rt_task(struct rq *rq, b
 		struct task_struct *push_task = NULL;
 		int cpu;
 
-		if (!pull || rq->push_busy)
+		if (!pull)
+			return 0;
+
+		trace_sched_migrate_pull_tp(next_task);
+
+		if (rq->push_busy)
 			return 0;
 
 		cpu = find_lowest_rq(rq->curr);
@@ -2223,6 +2228,7 @@  static void pull_rt_task(struct rq *this
 				goto skip;
 
 			if (is_migration_disabled(p)) {
+				trace_sched_migrate_pull_tp(p);
 				push_task = get_push_task(src_rq);
 			} else {
 				deactivate_task(src_rq, p, 0);