From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754477AbcIFCMG (ORCPT ); Mon, 5 Sep 2016 22:12:06 -0400
Received: from email.kedacom.com ([221.224.36.251]:37054 "EHLO
	test1.kedacom.com" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org
	with ESMTP id S1752652AbcIFCMD (ORCPT );
	Mon, 5 Sep 2016 22:12:03 -0400
Subject: Re: [PATCH] sched/core: simpler function for sched_exec migration
To: Oleg Nesterov
References: <1473056403-7877-1-git-send-email-chengchao@kedacom.com>
	<20160905131147.GA8552@redhat.com>
Cc: mingo@kernel.org, peterz@infradead.org, tj@kernel.org,
	akpm@linux-foundation.org, chris@chris-wilson.co.uk,
	linux-kernel@vger.kernel.org
From: chengchao
Message-ID:
Date: Tue, 6 Sep 2016 10:11:59 +0800
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:45.0) Gecko/20100101
	Thunderbird/45.0
MIME-Version: 1.0
In-Reply-To: <20160905131147.GA8552@redhat.com>
X-MIMETrack: Itemize by SMTP Server on kedacomsmtp/kedacom(Release 8.5.3|September 15, 2011) at
	2016-09-06 10:11:57,
	Serialize by Router on kedacomsmtp/kedacom(Release 8.5.3|September 15, 2011) at
	2016-09-06 10:11:57,
	Serialize complete at 2016-09-06 10:11:57,
	Itemize by SMTP Server on kedacomtest1/kedacom(Release 8.5.3|September 15, 2011) at
	2016/09/06 10:11:55,
	Serialize by Router on kedacomtest1/kedacom(Release 8.5.3|September 15, 2011) at
	2016/09/06 10:12:00,
	Serialize complete at 2016/09/06 10:12:00
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=windows-1252
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Oleg, thank you.

The key point is that with CONFIG_PREEMPT_NONE=y, when sched_exec() needs
to migrate current, migration_cpu_stop() doesn't migrate the task (current)
at all. This means the stopper thread does useless work in this scenario.
Finally, the stopper thread calls cpu_stop_signal_done() to wake up this
task. That wakeup calls select_task_rq() again and may select yet another
CPU. In total, select_task_rq() is called twice on the exec path (the first
time in sched_exec()), plus once more in wake_up_new_task(). That is too
much overhead for one task (fork() + exec()), isn't it?

1. sched_exec()
     -> stop_one_cpu()
       -> wait_for_completion()

   wait_for_completion() makes current TASK_UNINTERRUPTIBLE and calls
   schedule_timeout(timeout), where timeout is MAX_SCHEDULE_TIMEOUT.

         -> schedule()
              deactivate_task(rq, current, DEQUEUE_SLEEP);
              current->on_rq = 0;

2. migration_cpu_stop() checks task_on_rq_queued(p), but p->on_rq is 0.

   #define TASK_ON_RQ_QUEUED	1
   static inline int task_on_rq_queued(struct task_struct *p)
   {
   	return p->on_rq == TASK_ON_RQ_QUEUED;
   }

   migration_cpu_stop()
   	...
   	if (task_rq(p) == rq && task_on_rq_queued(p))
   		rq = __migrate_task(rq, p, arg->dest_cpu);
   	...

Thanks again. Any suggestions and further reviews are welcome.

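To make steps 1 and 2 above easier to follow, here is a rough user-space
model of the check. It is only a sketch, not kernel code: struct model_task,
model_sleep(), model_migration_cpu_stop() and model_demo() are made-up names
for illustration, and the "migration" is reduced to changing a cpu field.
Only TASK_ON_RQ_QUEUED and the shape of task_on_rq_queued() are taken from
the snippet above.

```c
/* User-space model only; these are NOT the kernel's definitions. */
#include <assert.h>
#include <stdbool.h>

#define TASK_ON_RQ_QUEUED	1

/* Hypothetical stand-in for the two task_struct fields that matter here. */
struct model_task {
	int on_rq;	/* TASK_ON_RQ_QUEUED while queued, 0 after dequeue */
	int cpu;	/* CPU whose runqueue the task belongs to */
};

static inline int task_on_rq_queued(const struct model_task *p)
{
	return p->on_rq == TASK_ON_RQ_QUEUED;
}

/* Step 1: schedule() dequeues a task that sleeps in wait_for_completion(). */
static void model_sleep(struct model_task *p)
{
	p->on_rq = 0;
}

/*
 * Step 2: the stopper on rq_cpu moves the task only if it is still queued
 * on that CPU's runqueue. Returns true if the task was moved.
 */
static bool model_migration_cpu_stop(struct model_task *p, int rq_cpu,
				     int dest_cpu)
{
	if (p->cpu == rq_cpu && task_on_rq_queued(p)) {
		p->cpu = dest_cpu;
		return true;
	}
	return false;
}

/* Walks both cases: the task already asleep, and the task still queued. */
static void model_demo(void)
{
	struct model_task sleeping = { .on_rq = TASK_ON_RQ_QUEUED, .cpu = 0 };
	struct model_task queued = { .on_rq = TASK_ON_RQ_QUEUED, .cpu = 0 };

	/* The caller blocked before the stopper ran: nothing is migrated. */
	model_sleep(&sleeping);
	assert(!model_migration_cpu_stop(&sleeping, 0, 1));
	assert(sleeping.cpu == 0);

	/* The caller is still queued when the stopper runs: it is moved. */
	assert(model_migration_cpu_stop(&queued, 0, 1));
	assert(queued.cpu == 1);
}
```

In this model the first case corresponds to what I describe for
stop_one_cpu() with CONFIG_PREEMPT_NONE=y, and the second case is what the
patch tries to achieve by keeping the caller on the runqueue. The model
ignores locking and the later wakeup path entirely.
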
On 09/05/2016 09:11 PM, Oleg Nesterov wrote:
> On 09/05, cheng chao wrote:
>>
>> @@ -2958,7 +2958,7 @@ void sched_exec(void)
>>  		struct migration_arg arg = { p, dest_cpu };
>>
>>  		raw_spin_unlock_irqrestore(&p->pi_lock, flags);
>> -		stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg);
>> +		stop_one_cpu_sync(task_cpu(p), migration_cpu_stop, &arg);
>>  		return;
>>  	}
>>  unlock:
>> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
>> index 4a1ca5f..24f8637 100644
>> --- a/kernel/stop_machine.c
>> +++ b/kernel/stop_machine.c
>> @@ -130,6 +130,27 @@ int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
>>  	return done.ret;
>>  }
>>
>> +/**
>> + * the caller keeps task_on_rq_queued, so it's more suitable for
>> + * sched_exec on the case when needs migration
>> + */
>> +void stop_one_cpu_sync(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
>> +{
>> +	struct cpu_stop_work work = { .fn = fn, .arg = arg, .done = NULL };
>> +
>> +	if (!cpu_stop_queue_work(cpu, &work))
>> +		return;
>> +
>> +#if defined(CONFIG_PREEMPT_NONE) || defined(CONFIG_PREEMPT_VOLUNTARY)
>> +	/*
>> +	 * CONFIG_PREEMPT doesn't need call schedule here, because
>> +	 * preempt_enable already does the similar thing when call
>> +	 * cpu_stop_queue_work
>> +	 */
>> +	schedule();
>> +#endif
>> +}
>
> Honestly, I don't really understand the changelog, but this looks wrong.
>
> stop_one_cpu_sync() assumes that cpu == smp_processor_id/task_cpu(current),
> and thus the stopper thread should preempt us at least after schedule()
> (if CONFIG_PREEMPT_NONE), so we do not need to synchronize.
>

Yes. stop_one_cpu_sync is not a good name; would stop_one_cpu_schedule be
better? The function has nothing to do with synchronization.

> But this is not necessarily true? This task can migrate to another CPU
> before cpu_stop_queue_work() ?
>

Before sched_exec() calls stop_one_cpu()/cpu_stop_queue_work(), this task
(current) cannot migrate to another CPU, because it is running on that CPU.

> Oleg.
>
>