From: Yafang Shao <laoar.shao@gmail.com>
To: Valentin Schneider <valentin.schneider@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Benjamin Segall <bsegall@google.com>,
	Mel Gorman <mgorman@suse.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH 2/4] sched/fair: Introduce cfs_migration
Date: Sat, 6 Nov 2021 15:40:02 +0800	[thread overview]
Message-ID: <CALOAHbAHQ0UBn2GqRNWQwH32UPOuFo0b550oi6WCKr8+wFgdsw@mail.gmail.com> (raw)
In-Reply-To: <87a6iitu3r.mognet@arm.com>

On Sat, Nov 6, 2021 at 1:01 AM Valentin Schneider
<valentin.schneider@arm.com> wrote:
>
> On 04/11/21 14:57, Yafang Shao wrote:
> > A new per-CPU kthread named "cfs_migration/N" is introduced to do
> > CFS-specific balance work. It is a FIFO task with priority FIFO-1,
> > which means it can preempt any CFS task but can't preempt other FIFO
> > tasks. The kthreads are as follows:
> >
> >     PID     COMMAND
> >     13      [cfs_migration/0]
> >     20      [cfs_migration/1]
> >     25      [cfs_migration/2]
> >     32      [cfs_migration/3]
> >     38      [cfs_migration/4]
> >     ...
> >
> >     $ cat /proc/13/sched
> >     ...
> >     policy                                       :                    1
> >     prio                                         :                   98
> >     ...
> >
> >     $ cat /proc/13/status
> >     ...
> >     Cpus_allowed:     0001
> >     Cpus_allowed_list:        0
> >     ...
> >
> > All pending work items are queued onto a singly linked list and
> > serviced in FIFO order: the first item queued is the first serviced.
> >
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > Cc: Valentin Schneider <valentin.schneider@arm.com>
> > Cc: Vincent Guittot <vincent.guittot@linaro.org>
> > Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > ---
> >  kernel/sched/fair.c | 93 +++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 93 insertions(+)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 87db481e8a56..56b3fa91828b 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -20,6 +20,8 @@
> >   *  Adaptive scheduling granularity, math enhancements by Peter Zijlstra
> >   *  Copyright (C) 2007 Red Hat, Inc., Peter Zijlstra
> >   */
> > +#include <linux/smpboot.h>
> > +#include <linux/stop_machine.h>
> >  #include "sched.h"
> >
> >  /*
> > @@ -11915,3 +11917,94 @@ int sched_trace_rq_nr_running(struct rq *rq)
> >          return rq ? rq->nr_running : -1;
> >  }
> >  EXPORT_SYMBOL_GPL(sched_trace_rq_nr_running);
> > +
> > +#ifdef CONFIG_SMP
> > +struct cfs_migrater {
> > +     struct task_struct *thread;
> > +     struct list_head works;
> > +     raw_spinlock_t lock;
>
> Hm so the handler of that work queue will now be a SCHED_FIFO task (which
> can block) rather than a CPU stopper (which can't), but AFAICT the
> callsites that would enqueue an item can't block, so having this as a
> raw_spinlock_t should still make sense.
>
> > +};
> > +
> > +DEFINE_PER_CPU(struct cfs_migrater, cfs_migrater);
> > +
> > +static int cfs_migration_should_run(unsigned int cpu)
> > +{
> > +     struct cfs_migrater *migrater = &per_cpu(cfs_migrater, cpu);
> > +     unsigned long flags;
> > +     int run;
> > +
> > +     raw_spin_lock_irqsave(&migrater->lock, flags);
> > +     run = !list_empty(&migrater->works);
> > +     raw_spin_unlock_irqrestore(&migrater->lock, flags);
> > +
> > +     return run;
> > +}
> > +
> > +static void cfs_migration_setup(unsigned int cpu)
> > +{
> > +     /* cfs_migration should have a higher priority than normal tasks,
> > +      * but a lower priority than other FIFO tasks.
> > +      */
> > +     sched_set_fifo_low(current);
> > +}
> > +
> > +static void cfs_migrater_thread(unsigned int cpu)
> > +{
> > +     struct cfs_migrater *migrater = &per_cpu(cfs_migrater, cpu);
> > +     struct cpu_stop_work *work;
> > +
> > +repeat:
> > +     work = NULL;
> > +     raw_spin_lock_irq(&migrater->lock);
> > +     if (!list_empty(&migrater->works)) {
> > +             work = list_first_entry(&migrater->works,
> > +                                     struct cpu_stop_work, list);
> > +             list_del_init(&work->list);
> > +     }
> > +     raw_spin_unlock_irq(&migrater->lock);
> > +
> > +     if (work) {
> > +             struct cpu_stop_done *done = work->done;
> > +             cpu_stop_fn_t fn = work->fn;
> > +             void *arg = work->arg;
> > +             int ret;
> > +
> > +             preempt_count_inc();
> > +             ret = fn(arg);
> > +             if (done) {
> > +                     if (ret)
> > +                             done->ret = ret;
> > +                     cpu_stop_signal_done(done);
> > +             }
> > +             preempt_count_dec();
> > +             goto repeat;
> > +     }
> > +}
>
> You're pretty much copying the CPU stopper setup, but that seems overkill
> for the functionality we're after: migrate a CFS task from one CPU to
> another. This shouldn't need to be able to run any arbitrary callback
> function.
>
> Unfortunately you are tackling both CFS active balancing and NUMA balancing
> at the same time, and right now they're plumbed a bit differently which
> probably drove you to use something a bit more polymorphic. Ideally we
> should be making them use the same paths, but IMO it would be acceptable as
> a first step to just cater to CFS active balancing - folks that really care
> about their RT tasks can disable CONFIG_NUMA_BALANCING, but there is
> nothing to disable CFS active balancing.
>

Right. The code will be much simpler if we only handle CFS active
balancing in this patchset.
We have already disabled NUMA balancing by writing 0 to
/proc/sys/kernel/numa_balancing, so it is not a critical issue for us
now.

>
> Now, I'm thinking the bare information we need is:
>
> - a task to migrate
> - a CPU to move it to
>
> And then you can do something like...
>
> trigger_migration(task_struct *p, unsigned int dst_cpu)
> {
>         work = { p, dst_cpu };
>         get_task_struct(p);
>         /* queue work + wake migrater + wait for completion */
> }
>
> cfs_migrater_thread()
> {
>         /* ... */
>         p = work->p;
>
>         if (task_rq(p) != this_rq())
>                 goto out;
>
>         /* migrate task to work->dst_cpu */
> out:
>         complete(<some completion struct>);
>         put_task_struct(p);
> }
>

Agreed.
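
Concretely, maybe something like the below? This is just an untested
sketch; the struct cfs_migration_work layout and the completion
handling are my assumptions for illustration:

struct cfs_migration_work {
        struct list_head list;
        struct task_struct *p;
        unsigned int dst_cpu;
        struct completion done;
};

static void trigger_migration(struct task_struct *p, unsigned int dst_cpu)
{
        struct cfs_migration_work work = {
                .p = p,
                .dst_cpu = dst_cpu,
        };

        init_completion(&work.done);
        get_task_struct(p);
        /* queue &work on task_cpu(p)'s migrater and wake its thread */
        wait_for_completion(&work.done);
}

static void cfs_migrater_thread(unsigned int cpu)
{
        struct cfs_migration_work *work;
        struct task_struct *p;

        /* ... dequeue one work item under migrater->lock ... */

        p = work->p;
        if (task_rq(p) != this_rq())
                goto out;

        /* migrate p to work->dst_cpu */
out:
        complete(&work->done);
        put_task_struct(p);
}

Keeping the work item on the caller's stack should be safe because
trigger_migration() waits for the completion before returning, just
like the on-stack cpu_stop_done in stop_one_cpu() today.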

>
> We should also probably add something to prevent the migration from
> happening after it is no longer relevant. Say if we have something like:
>
>   <queue work to migrate p from CPU0 to CPU1>
>   <FIFO-2 task runs for 42 seconds on CPU0>
>   <cfs_migration/0 now gets to run>
>
> p could have moved elsewhere while cfs_migration/0 was waiting to run.
> I'm thinking the source CPU could be useful information here, but that
> alone doesn't tell you whether the task moved around in the meantime...
>
> WDYT?

Agreed.
It seems we'd better take the patch [1] I sent several weeks back.

[1]. https://lore.kernel.org/lkml/20210615121551.31138-1-laoar.shao@gmail.com/
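
Before that lands, a minimal staleness guard in cfs_migrater_thread()
could look like the below (again only a sketch, assuming a src_cpu
field recorded in the work item at queue time):

/* before doing the actual migration */
if (task_cpu(work->p) != work->src_cpu) {
        /* p already moved since the work was queued; nothing to do */
        goto out;
}

Though, as you said, src_cpu alone can't tell whether p bounced away
and came back in the meantime.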

-- 
Thanks
Yafang
