* [patch 3/21] x86, bts: wait until traced task has been scheduled out
From: Markus Metzger @ 2009-03-31 12:59 UTC
To: linux-kernel, mingo, tglx, hpa
Cc: markus.t.metzger, markus.t.metzger, roland, eranian, oleg, juan.villacis, ak

In order to stop branch tracing for a running task, we need to first clear
the branch tracing control bits before we may free the tracing buffer. If
the traced task is running, the cpu might still trace that task after the
branch trace control bits have been cleared. Wait until the traced task has
been scheduled out before proceeding.

A similar problem affects the task debug store context. We first remove the
context, then we need to wait until the task has been scheduled out before
we can free the context memory.

Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
---
Index: git-tip/arch/x86/kernel/ds.c
===================================================================
--- git-tip.orig/arch/x86/kernel/ds.c	2009-03-30 17:19:14.000000000 +0200
+++ git-tip/arch/x86/kernel/ds.c	2009-03-30 17:20:11.000000000 +0200
@@ -250,6 +250,42 @@ static DEFINE_PER_CPU(struct ds_context
 #define system_context per_cpu(system_context_array, smp_processor_id())

+/*
+ * Wait for the traced task to unschedule.
+ *
+ * This guarantees that the bts trace configuration has been
+ * synchronized with the cpu executing the task.
+ */
+static void wait_to_unschedule(struct task_struct *task)
+{
+	unsigned long nvcsw;
+	unsigned long nivcsw;
+
+	if (!task)
+		return;
+
+	if (task == current)
+		return;
+
+	nvcsw = task->nvcsw;
+	nivcsw = task->nivcsw;
+	for (;;) {
+		if (!task_is_running(task))
+			break;
+		/*
+		 * The switch count is incremented before the actual
+		 * context switch. We thus wait for two switches to be
+		 * sure at least one completed.
+		 */
+		if ((task->nvcsw - nvcsw) > 1)
+			break;
+		if ((task->nivcsw - nivcsw) > 1)
+			break;
+
+		schedule();
+	}
+}
+
 static inline struct ds_context *ds_get_context(struct task_struct *task)
 {
 	struct ds_context **p_context =
@@ -321,6 +357,9 @@ static inline void ds_put_context(struct

 	spin_unlock_irqrestore(&ds_lock, irq);

+	/* The context might still be in use for context switching. */
+	wait_to_unschedule(context->task);
+
 	kfree(context);
 }

@@ -789,6 +828,9 @@ void ds_release_bts(struct bts_tracer *t
 	WARN_ON_ONCE(tracer->ds.context->bts_master != tracer);
 	tracer->ds.context->bts_master = NULL;

+	/* Make sure tracing stopped and the tracer is not in use. */
+	wait_to_unschedule(tracer->ds.context->task);
+
 	put_tracer(tracer->ds.context->task);
 	ds_put_context(tracer->ds.context);

---------------------------------------------------------------------
Intel GmbH
Dornacher Strasse 1
85622 Feldkirchen/Muenchen Germany
Sitz der Gesellschaft: Feldkirchen bei Muenchen
Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
Registergericht: Muenchen HRB 47456
Ust.-IdNr. VAT Registration No.: DE129385895
Citibank Frankfurt (BLZ 502 109 00) 600119052

This e-mail and any attachments may contain confidential material for the
sole use of the intended recipient(s). Any review or distribution by others
is strictly prohibited. If you are not the intended recipient, please
contact the sender and delete all copies.
* Re: [patch 3/21] x86, bts: wait until traced task has been scheduled out
From: Oleg Nesterov @ 2009-04-01 0:17 UTC
To: Markus Metzger
Cc: linux-kernel, mingo, tglx, hpa, markus.t.metzger, roland, eranian, juan.villacis, ak

On 03/31, Markus Metzger wrote:
>
> +static void wait_to_unschedule(struct task_struct *task)
> +{
> +	unsigned long nvcsw;
> +	unsigned long nivcsw;
> +
> +	if (!task)
> +		return;
> +
> +	if (task == current)
> +		return;
> +
> +	nvcsw = task->nvcsw;
> +	nivcsw = task->nivcsw;
> +	for (;;) {
> +		if (!task_is_running(task))
> +			break;
> +		/*
> +		 * The switch count is incremented before the actual
> +		 * context switch. We thus wait for two switches to be
> +		 * sure at least one completed.
> +		 */
> +		if ((task->nvcsw - nvcsw) > 1)
> +			break;
> +		if ((task->nivcsw - nivcsw) > 1)
> +			break;
> +
> +		schedule();

schedule() is a nop here. We can wait unpredictably long...

Ingo, do you have any ideas to improve this helper?

Not that I really like it, but how about

	int force_unschedule(struct task_struct *p)
	{
		struct rq *rq;
		unsigned long flags;
		int running;

		rq = task_rq_lock(p, &flags);
		running = task_running(rq, p);
		task_rq_unlock(rq, &flags);

		if (running)
			wake_up_process(rq->migration_thread);

		return running;
	}

which should be used instead of task_is_running() ?

We can even do something like

	void wait_to_unschedule(struct task_struct *task)
	{
		struct migration_req req;
		struct rq *rq;
		unsigned long flags;
		int running;

		rq = task_rq_lock(task, &flags);
		running = task_running(rq, task);
		if (running) {
			/* make sure __migrate_task() will do nothing */
			req.dest_cpu = NR_CPUS + 1;
			init_completion(&req.done);
			list_add(&req.list, &rq->migration_queue);
		}
		task_rq_unlock(rq, &flags);

		if (running) {
			wake_up_process(rq->migration_thread);
			wait_for_completion(&req.done);
		}
	}

This way we don't poll, and we need only one helper.

(Can't resist, this patch is not bisect friendly, without the next patches
wait_to_unschedule() is called under write_lock_irq, this is deadlockable).

But anyway, I think we can do this later.

Oleg.
* RE: [patch 3/21] x86, bts: wait until traced task has been scheduled out
From: Metzger, Markus T @ 2009-04-01 8:09 UTC
To: Oleg Nesterov
Cc: linux-kernel, mingo, tglx, hpa, markus.t.metzger, roland, eranian, Villacis, Juan, ak

>-----Original Message-----
>From: Oleg Nesterov [mailto:oleg@redhat.com]
>Sent: Wednesday, April 01, 2009 2:17 AM
>To: Metzger, Markus T

>> +static void wait_to_unschedule(struct task_struct *task)
>> +{
>> +	unsigned long nvcsw;
>> +	unsigned long nivcsw;
>> +
>> +	if (!task)
>> +		return;
>> +
>> +	if (task == current)
>> +		return;
>> +
>> +	nvcsw = task->nvcsw;
>> +	nivcsw = task->nivcsw;
>> +	for (;;) {
>> +		if (!task_is_running(task))
>> +			break;
>> +		/*
>> +		 * The switch count is incremented before the actual
>> +		 * context switch. We thus wait for two switches to be
>> +		 * sure at least one completed.
>> +		 */
>> +		if ((task->nvcsw - nvcsw) > 1)
>> +			break;
>> +		if ((task->nivcsw - nivcsw) > 1)
>> +			break;
>> +
>> +		schedule();
>
>schedule() is a nop here. We can wait unpredictably long...

Hmmm, as far as I understand the code, rt-workqueues use a higher
sched_class and can thus not be preempted by normal threads. Non-rt
workqueues use the fair_sched_class. And schedule_work() uses a non-rt
workqueue.

In practice, task is ptraced. It is either stopped or exiting. I don't
expect to loop very often.

>
>Ingo, do you have any ideas to improve this helper?
>
>Not that I really like it, but how about
>
>	int force_unschedule(struct task_struct *p)
>	{
>		struct rq *rq;
>		unsigned long flags;
>		int running;
>
>		rq = task_rq_lock(p, &flags);
>		running = task_running(rq, p);
>		task_rq_unlock(rq, &flags);
>
>		if (running)
>			wake_up_process(rq->migration_thread);
>
>		return running;
>	}
>
>which should be used instead of task_is_running() ?
>
>We can even do something like
>
>	void wait_to_unschedule(struct task_struct *task)
>	{
>		struct migration_req req;
>
>		rq = task_rq_lock(task, &flags);
>		running = task_running(rq, task);
>		if (running) {
>			/* make sure __migrate_task() will do nothing */
>			req.dest_cpu = NR_CPUS + 1;
>			init_completion(&req.done);
>			list_add(&req.list, &rq->migration_queue);
>		}
>		task_rq_unlock(rq, &flags);
>
>		if (running) {
>			wake_up_process(rq->migration_thread);
>			wait_for_completion(&req.done);
>		}
>	}
>
>This way we don't poll, and we need only one helper.
>
>(Can't resist, this patch is not bisect friendly, without the next patches
> wait_to_unschedule() is called under write_lock_irq, this is deadlockable).

I know. See the reply to patch 0; I tried to keep the patches small and
focused to simplify the review work and attract reviewers.

thanks and regards,
markus.
* Re: [patch 3/21] x86, bts: wait until traced task has been scheduled out
From: Oleg Nesterov @ 2009-04-01 19:04 UTC
To: Metzger, Markus T
Cc: linux-kernel, mingo, tglx, hpa, markus.t.metzger, roland, eranian, Villacis, Juan, ak

On 04/01, Metzger, Markus T wrote:
>
> >-----Original Message-----
> >From: Oleg Nesterov [mailto:oleg@redhat.com]
> >Sent: Wednesday, April 01, 2009 2:17 AM
> >To: Metzger, Markus T
>
> >> +static void wait_to_unschedule(struct task_struct *task)
> >> +{
> >> +	unsigned long nvcsw;
> >> +	unsigned long nivcsw;
> >> +
> >> +	if (!task)
> >> +		return;
> >> +
> >> +	if (task == current)
> >> +		return;
> >> +
> >> +	nvcsw = task->nvcsw;
> >> +	nivcsw = task->nivcsw;
> >> +	for (;;) {
> >> +		if (!task_is_running(task))
> >> +			break;
> >> +		/*
> >> +		 * The switch count is incremented before the actual
> >> +		 * context switch. We thus wait for two switches to be
> >> +		 * sure at least one completed.
> >> +		 */
> >> +		if ((task->nvcsw - nvcsw) > 1)
> >> +			break;
> >> +		if ((task->nivcsw - nivcsw) > 1)
> >> +			break;
> >> +
> >> +		schedule();
> >
> >schedule() is a nop here. We can wait unpredictably long...
>
> Hmmm, as far as I understand the code, rt-workqueues use a higher
> sched_class and can thus not be preempted by normal threads. Non-rt
> workqueues use the fair_sched_class. And schedule_work() uses a non-rt
> workqueue.

I was unclear, sorry.

I meant, in this case

	while (!CONDITION)
		schedule();

is not better compared to

	while (!CONDITION)
		; /* do nothing */

(OK, schedule() is better without CONFIG_PREEMPT, but this doesn't matter).
wait_to_unschedule() just spins waiting for ->nXvcsw, this is not optimal.

And another problem, we can wait unpredictably long, because

> In practice, task is ptraced. It is either stopped or exiting.
> I don't expect to loop very often.

No. The task _was_ ptraced when we called (say) ptrace_detach(). But when
work->func() runs, the tracee is not traced, it is running (not necessarily,
of course, the tracer _can_ leave it in TASK_STOPPED).

Now, again, suppose that this task does "for (;;) ;" in user-space. If the
CPU is "free", it can spin "forever" without re-scheduling. Yes sure, this
case is not likely in practice, but still.

Oleg.
* Re: [patch 3/21] x86, bts: wait until traced task has been scheduled out
From: Markus Metzger @ 2009-04-01 19:52 UTC
To: Oleg Nesterov
Cc: Metzger, Markus T, linux-kernel, mingo, tglx, hpa, roland, eranian, Villacis, Juan, ak

On Wed, 2009-04-01 at 21:04 +0200, Oleg Nesterov wrote:
> On 04/01, Metzger, Markus T wrote:
> >
> > >-----Original Message-----
> > >From: Oleg Nesterov [mailto:oleg@redhat.com]
> > >Sent: Wednesday, April 01, 2009 2:17 AM
> > >To: Metzger, Markus T
> >
> > >> +static void wait_to_unschedule(struct task_struct *task)
> > >> +{
> > >> +	unsigned long nvcsw;
> > >> +	unsigned long nivcsw;
> > >> +
> > >> +	if (!task)
> > >> +		return;
> > >> +
> > >> +	if (task == current)
> > >> +		return;
> > >> +
> > >> +	nvcsw = task->nvcsw;
> > >> +	nivcsw = task->nivcsw;
> > >> +	for (;;) {
> > >> +		if (!task_is_running(task))
> > >> +			break;
> > >> +		/*
> > >> +		 * The switch count is incremented before the actual
> > >> +		 * context switch. We thus wait for two switches to be
> > >> +		 * sure at least one completed.
> > >> +		 */
> > >> +		if ((task->nvcsw - nvcsw) > 1)
> > >> +			break;
> > >> +		if ((task->nivcsw - nivcsw) > 1)
> > >> +			break;
> > >> +
> > >> +		schedule();
> > >
> > >schedule() is a nop here. We can wait unpredictably long...
> >
> > Hmmm, as far as I understand the code, rt-workqueues use a higher
> > sched_class and can thus not be preempted by normal threads. Non-rt
> > workqueues use the fair_sched_class. And schedule_work() uses a non-rt
> > workqueue.
>
> I was unclear, sorry.
>
> I meant, in this case
>
>	while (!CONDITION)
>		schedule();
>
> is not better compared to
>
>	while (!CONDITION)
>		; /* do nothing */
>
> (OK, schedule() is better without CONFIG_PREEMPT, but this doesn't
> matter). wait_to_unschedule() just spins waiting for ->nXvcsw, this is
> not optimal.
>
> And another problem, we can wait unpredictably long, because
>
> > In practice, task is ptraced. It is either stopped or exiting.
> > I don't expect to loop very often.
>
> No. The task _was_ ptraced when we called (say) ptrace_detach(). But when
> work->func() runs, the tracee is not traced, it is running (not
> necessarily, of course, the tracer _can_ leave it in TASK_STOPPED).
>
> Now, again, suppose that this task does "for (;;) ;" in user-space.
> If the CPU is "free", it can spin "forever" without re-scheduling. Yes
> sure, this case is not likely in practice, but still.

So I should rather not call schedule()? I thought it's better to yield the
cpu than to spin.

I will resend a bisect-friendly version of the series (using quilt mail,
this time) tomorrow. I will remove schedule() in the wait_to_unschedule()
loop and also address the minor nitpicks you mentioned in your other
reviews.

thanks,
markus.
* Re: [patch 3/21] x86, bts: wait until traced task has been scheduled out
From: Ingo Molnar @ 2009-04-01 11:41 UTC
To: Oleg Nesterov, Peter Zijlstra
Cc: Markus Metzger, linux-kernel, tglx, hpa, markus.t.metzger, roland, eranian, juan.villacis, ak

* Oleg Nesterov <oleg@redhat.com> wrote:

> On 03/31, Markus Metzger wrote:
> >
> > +static void wait_to_unschedule(struct task_struct *task)
> > +{
> > +	unsigned long nvcsw;
> > +	unsigned long nivcsw;
> > +
> > +	if (!task)
> > +		return;
> > +
> > +	if (task == current)
> > +		return;
> > +
> > +	nvcsw = task->nvcsw;
> > +	nivcsw = task->nivcsw;
> > +	for (;;) {
> > +		if (!task_is_running(task))
> > +			break;
> > +		/*
> > +		 * The switch count is incremented before the actual
> > +		 * context switch. We thus wait for two switches to be
> > +		 * sure at least one completed.
> > +		 */
> > +		if ((task->nvcsw - nvcsw) > 1)
> > +			break;
> > +		if ((task->nivcsw - nivcsw) > 1)
> > +			break;
> > +
> > +		schedule();
>
> schedule() is a nop here. We can wait unpredictably long...
>
> Ingo, do you have any ideas to improve this helper?

hm, there's a similar looking existing facility: wait_task_inactive().
Have I missed some subtle detail that makes it inappropriate for use here?

> Not that I really like it, but how about
>
>	int force_unschedule(struct task_struct *p)
>	{
>		struct rq *rq;
>		unsigned long flags;
>		int running;
>
>		rq = task_rq_lock(p, &flags);
>		running = task_running(rq, p);
>		task_rq_unlock(rq, &flags);
>
>		if (running)
>			wake_up_process(rq->migration_thread);
>
>		return running;
>	}
>
> which should be used instead of task_is_running() ?

Yes - wait_task_inactive() should be switched to a scheme like that - it
would fix bugs like:

	53da1d9: fix ptrace slowness

in a cleaner way.

> We can even do something like
>
>	void wait_to_unschedule(struct task_struct *task)
>	{
>		struct migration_req req;
>
>		rq = task_rq_lock(task, &flags);
>		running = task_running(rq, task);
>		if (running) {
>			/* make sure __migrate_task() will do nothing */
>			req.dest_cpu = NR_CPUS + 1;
>			init_completion(&req.done);
>			list_add(&req.list, &rq->migration_queue);
>		}
>		task_rq_unlock(rq, &flags);
>
>		if (running) {
>			wake_up_process(rq->migration_thread);
>			wait_for_completion(&req.done);
>		}
>	}
>
> This way we don't poll, and we need only one helper.

Looks even better. The migration thread would run complete(), right?

A detail: I suspect this needs to be in a while() loop, for the case that
the victim task raced with us and went to another CPU before we kicked it
off via the migration thread.

This looks very useful to me. It could also be tested easily: revert
53da1d9 and you should see:

	time strace dd if=/dev/zero of=/dev/null bs=1024 count=1000000

performance plummet on an SMP box. Then with your fix it should go up to
near full speed again.

	Ingo
* RE: [patch 3/21] x86, bts: wait until traced task has been scheduled out
From: Metzger, Markus T @ 2009-04-01 12:43 UTC
To: Ingo Molnar, Oleg Nesterov, Peter Zijlstra
Cc: linux-kernel, tglx, hpa, markus.t.metzger, roland, eranian, Villacis, Juan, ak

>-----Original Message-----
>From: Ingo Molnar [mailto:mingo@elte.hu]
>Sent: Wednesday, April 01, 2009 1:42 PM
>To: Oleg Nesterov; Peter Zijlstra
>
>* Oleg Nesterov <oleg@redhat.com> wrote:
>
>> On 03/31, Markus Metzger wrote:
>> >
>> > +static void wait_to_unschedule(struct task_struct *task)
>> > +{
>> > +	unsigned long nvcsw;
>> > +	unsigned long nivcsw;
>> > +
>> > +	if (!task)
>> > +		return;
>> > +
>> > +	if (task == current)
>> > +		return;
>> > +
>> > +	nvcsw = task->nvcsw;
>> > +	nivcsw = task->nivcsw;
>> > +	for (;;) {
>> > +		if (!task_is_running(task))
>> > +			break;
>> > +		/*
>> > +		 * The switch count is incremented before the actual
>> > +		 * context switch. We thus wait for two switches to be
>> > +		 * sure at least one completed.
>> > +		 */
>> > +		if ((task->nvcsw - nvcsw) > 1)
>> > +			break;
>> > +		if ((task->nivcsw - nivcsw) > 1)
>> > +			break;
>> > +
>> > +		schedule();
>>
>> schedule() is a nop here. We can wait unpredictably long...
>>
>> Ingo, do you have any ideas to improve this helper?
>
>hm, there's a similar looking existing facility: wait_task_inactive().
>Have I missed some subtle detail that makes it inappropriate for use here?

wait_task_inactive() waits until the task is no longer TASK_RUNNING. I need
to wait until the task has been scheduled out at least once.

regards,
markus.
* Re: [patch 3/21] x86, bts: wait until traced task has been scheduled out
From: Ingo Molnar @ 2009-04-01 12:53 UTC
To: Metzger, Markus T
Cc: Oleg Nesterov, Peter Zijlstra, linux-kernel, tglx, hpa, markus.t.metzger, roland, eranian, Villacis, Juan, ak

* Metzger, Markus T <markus.t.metzger@intel.com> wrote:

> >-----Original Message-----
> >From: Ingo Molnar [mailto:mingo@elte.hu]
> >Sent: Wednesday, April 01, 2009 1:42 PM
> >To: Oleg Nesterov; Peter Zijlstra
> >
> >* Oleg Nesterov <oleg@redhat.com> wrote:
> >
> >> On 03/31, Markus Metzger wrote:
> >> >
> >> > +static void wait_to_unschedule(struct task_struct *task)
> >> > +{
> >> > +	unsigned long nvcsw;
> >> > +	unsigned long nivcsw;
> >> > +
> >> > +	if (!task)
> >> > +		return;
> >> > +
> >> > +	if (task == current)
> >> > +		return;
> >> > +
> >> > +	nvcsw = task->nvcsw;
> >> > +	nivcsw = task->nivcsw;
> >> > +	for (;;) {
> >> > +		if (!task_is_running(task))
> >> > +			break;
> >> > +		/*
> >> > +		 * The switch count is incremented before the actual
> >> > +		 * context switch. We thus wait for two switches to be
> >> > +		 * sure at least one completed.
> >> > +		 */
> >> > +		if ((task->nvcsw - nvcsw) > 1)
> >> > +			break;
> >> > +		if ((task->nivcsw - nivcsw) > 1)
> >> > +			break;
> >> > +
> >> > +		schedule();
> >>
> >> schedule() is a nop here. We can wait unpredictably long...
> >>
> >> Ingo, do you have any ideas to improve this helper?
> >
> >hm, there's a similar looking existing facility: wait_task_inactive().
> >Have I missed some subtle detail that makes it inappropriate for use
> >here?
>
> wait_task_inactive() waits until the task is no longer TASK_RUNNING.

No, that's wrong, wait_task_inactive() waits until the task deschedules.

	Ingo
* Re: [patch 3/21] x86, bts: wait until traced task has been scheduled out
From: Oleg Nesterov @ 2009-04-01 19:45 UTC
To: Ingo Molnar
Cc: Peter Zijlstra, Markus Metzger, linux-kernel, tglx, hpa, markus.t.metzger, roland, eranian, juan.villacis, ak

On 04/01, Ingo Molnar wrote:
>
> * Oleg Nesterov <oleg@redhat.com> wrote:
>
> > On 03/31, Markus Metzger wrote:
> > >
> > > +static void wait_to_unschedule(struct task_struct *task)
> > > +{
> > > +	unsigned long nvcsw;
> > > +	unsigned long nivcsw;
> > > +
> > > +	if (!task)
> > > +		return;
> > > +
> > > +	if (task == current)
> > > +		return;
> > > +
> > > +	nvcsw = task->nvcsw;
> > > +	nivcsw = task->nivcsw;
> > > +	for (;;) {
> > > +		if (!task_is_running(task))
> > > +			break;
> > > +		/*
> > > +		 * The switch count is incremented before the actual
> > > +		 * context switch. We thus wait for two switches to be
> > > +		 * sure at least one completed.
> > > +		 */
> > > +		if ((task->nvcsw - nvcsw) > 1)
> > > +			break;
> > > +		if ((task->nivcsw - nivcsw) > 1)
> > > +			break;
> > > +
> > > +		schedule();
> >
> > schedule() is a nop here. We can wait unpredictably long...
> >
> > Ingo, do you have any ideas to improve this helper?
>
> hm, there's a similar looking existing facility: wait_task_inactive().
> Have I missed some subtle detail that makes it inappropriate for use here?

Yes, they are similar, but still different.

wait_to_unschedule(task) waits until this task does a context switch at
least once. It is fine if this task runs again when wait_to_unschedule()
returns. (if !task_is_running(task), it already did a context switch).

wait_task_inactive() ensures that this task is deactivated. It can't be
used here, because it can "never" be deactivated.

> > int force_unschedule(struct task_struct *p)
> > {
> >	struct rq *rq;
> >	unsigned long flags;
> >	int running;
> >
> >	rq = task_rq_lock(p, &flags);
> >	running = task_running(rq, p);
> >	task_rq_unlock(rq, &flags);
> >
> >	if (running)
> >		wake_up_process(rq->migration_thread);
> >
> >	return running;
> > }
> >
> > which should be used instead of task_is_running() ?
>
> Yes - wait_task_inactive() should be switched to a scheme like that

Yes, I thought about this, perhaps we can improve wait_task_inactive()
a bit. Unfortunately, this is not enough to kill schedule_timeout(1).

> - it would fix bugs like:
>
>	53da1d9: fix ptrace slowness

I don't think so. Quite the contrary, the problem with "fix ptrace slowness"
is that we do not want the TASK_TRACED task to be preempted before it does
the voluntary schedule() (without PREEMPT_ACTIVE).

> > void wait_to_unschedule(struct task_struct *task)
> > {
> >	struct migration_req req;
> >
> >	rq = task_rq_lock(task, &flags);
> >	running = task_running(rq, task);
> >	if (running) {
> >		/* make sure __migrate_task() will do nothing */
> >		req.dest_cpu = NR_CPUS + 1;
> >		init_completion(&req.done);
> >		list_add(&req.list, &rq->migration_queue);
> >	}
> >	task_rq_unlock(rq, &flags);
> >
> >	if (running) {
> >		wake_up_process(rq->migration_thread);
> >		wait_for_completion(&req.done);
> >	}
> > }
> >
> > This way we don't poll, and we need only one helper.
>
> Looks even better. The migration thread would run complete(), right?

Yes,

> A detail: I suspect this needs to be in a while() loop, for the case
> that the victim task raced with us and went to another CPU before we
> kicked it off via the migration thread.

I think this doesn't matter. If the task is not running - we don't care and
do nothing. If it is running and migrates - it should do a context switch
at least once.

But the code above is not right wrt cpu hotplug. wake_up_process() can hit
the NULL rq->migration_thread if we race with CPU_DEAD. Hmm.

Don't we have this problem in, say, set_cpus_allowed_ptr()? Unless it is
called under get_online_cpus(), ->migration_thread can go away once we drop
rq->lock.

Perhaps, we need something like this

	--- kernel/sched.c
	+++ kernel/sched.c
	@@ -6132,8 +6132,10 @@ int set_cpus_allowed_ptr(struct task_str
	 	if (migrate_task(p, cpumask_any_and(cpu_online_mask, new_mask),
	 			 &req)) {
	 		/* Need help from migration thread: drop lock and wait. */
	+		preempt_disable();
	 		task_rq_unlock(rq, &flags);
	 		wake_up_process(rq->migration_thread);
	+		preempt_enable();
	 		wait_for_completion(&req.done);
	 		tlb_migrate_finish(p->mm);
	 		return 0;

?

Oleg.
* Re: [patch 3/21] x86, bts: wait until traced task has been scheduled out
From: Oleg Nesterov @ 2009-04-01 0:26 UTC
To: Markus Metzger
Cc: linux-kernel, mingo, tglx, hpa, markus.t.metzger, roland, eranian, juan.villacis, ak

Sorry for noise, forgot to mention...

On 03/31, Markus Metzger wrote:
>
> static inline struct ds_context *ds_get_context(struct task_struct *task)

Completely off-topic, but ds_get_context() is rather fat, imho it makes
sense to uninline.

Oleg.