All of lore.kernel.org
 help / color / mirror / Atom feed
From: Phil Auld <pauld@redhat.com>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Mike Galbraith <efault@gmx.de>,
	Mel Gorman <mgorman@techsingularity.net>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Aubrey Li <aubrey.li@linux.intel.com>,
	Barry Song <song.bao.hua@hisilicon.com>,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/2] sched/fair: Scale wakeup granularity relative to nr_running
Date: Thu, 23 Sep 2021 08:24:03 -0400	[thread overview]
Message-ID: <YUxx42W3K2Ur7W84@lorien.usersys.redhat.com> (raw)
In-Reply-To: <CAKfTPtBBqLghrXrayyoBQQyDqdv6+pdCjiZkmzLaGvdNtN=Aug@mail.gmail.com>

On Thu, Sep 23, 2021 at 10:40:48AM +0200 Vincent Guittot wrote:
> On Thu, 23 Sept 2021 at 03:47, Mike Galbraith <efault@gmx.de> wrote:
> >
> > On Wed, 2021-09-22 at 20:22 +0200, Vincent Guittot wrote:
> > > On Wed, 22 Sept 2021 at 19:38, Mel Gorman <mgorman@techsingularity.net> wrote:
> > > >
> > > >
> > > > I'm not seeing an alternative suggestion that could be turned into
> > > > an implementation. The current value for sched_wakeup_granularity
> > > > was set 12 years ago was exposed for tuning which is no longer
> > > > the case. The intent was to allow some dynamic adjustment between
> > > > sysctl_sched_wakeup_granularity and sysctl_sched_latency to reduce
> > > > over-scheduling in the worst case without disabling preemption entirely
> > > > (which the first version did).
> >
> > I don't think those knobs were ever _intended_ for general purpose
> > tuning, but they did get used that way by some folks.
> >
> > > >
> > > > Should we just ignore this problem and hope it goes away or just let
> > > > people keep poking silly values into debugfs via tuned?
> > >
> > > We should certainly not add a bandaid because people will continue to
> > > poke silly value at the end. And increasing
> > > sysctl_sched_wakeup_granularity based on the number of running threads
> > > is not the right solution.
> >
> > Watching my desktop box stack up large piles of very short running
> > threads, I agree, instantaneous load looks like a non-starter.
> >
> > >  According to the description of your
> > > problem that the current task doesn't get enough time to move forward,
> > > sysctl_sched_min_granularity should be part of the solution. Something
> > > like below will ensure that current got a chance to move forward
> >
> > Nah, progress is guaranteed, the issue is a zillion very similar short
> > running threads preempting each other with no win to be had, thus
> > spending cycles in the scheduler that are utterly wasted.  It's a valid
> > issue, trouble is teaching the scheduler to recognize that situation
> > without mucking up other situations where there IS a win for even very
> > short running threads say, doing a synchronous handoff; preemption is
> > cheaper than scheduling off if the waker is going be awakened again in
> > very short order.
> >
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 9bf540f04c2d..39d4e4827d3d 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -7102,6 +7102,7 @@ static void check_preempt_wakeup(struct rq *rq,
> > > struct task_struct *p, int wake_
> > >         int scale = cfs_rq->nr_running >= sched_nr_latency;
> > >         int next_buddy_marked = 0;
> > >         int cse_is_idle, pse_is_idle;
> > > +       unsigned long delta_exec;
> > >
> > >         if (unlikely(se == pse))
> > >                 return;
> > > @@ -7161,6 +7162,13 @@ static void check_preempt_wakeup(struct rq *rq,
> > > struct task_struct *p, int wake_
> > >                 return;
> > >
> > >         update_curr(cfs_rq_of(se));
> > > +       delta_exec = se->sum_exec_runtime - se->prev_sum_exec_runtime;
> > > +       /*
> > > +        * Ensure that current got a chance to move forward
> > > +        */
> > > +       if (delta_exec < sysctl_sched_min_granularity)
> > > +               return;
> > > +
> > >         if (wakeup_preempt_entity(se, pse) == 1) {
> > >                 /*
> > >                  * Bias pick_next to pick the sched entity that is
> >
> > Yikes!  If you do that, you may as well go the extra nanometer and rip
> > wakeup preemption out entirely, same result, impressive diffstat.
> 
> This patch is mainly there to show that there are other ways to ensure
> progress without using some load heuristic.
> sysctl_sched_min_granularity has the problem of scaling with the
> number of cpus and this can generate large values. At least we should
> use the normalized_sysctl_sched_min_granularity or even a smaller
> value but wakeup preemption still happens with this change. It only
> ensures that we don't waste time preempting each other without any
> chance to do actual stuff.
>

It's capped at 8 cpus, which is pretty easy to reach these days, so the
values don't get too large.  That scaling is almost a no-op these days.


Cheers,
Phil


> a 100us value should even be enough to fix Mel's problem without
> impacting common wakeup preemption cases.
> 
> 
> >
> >         -Mike
> 

-- 


  parent reply	other threads:[~2021-09-23 12:24 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-20 14:26 [PATCH 0/2] Scale wakeup granularity relative to nr_running Mel Gorman
2021-09-20 14:26 ` [PATCH 1/2] sched/fair: Remove redundant lookup of rq in check_preempt_wakeup Mel Gorman
2021-09-21  7:21   ` Vincent Guittot
2021-09-21  7:53     ` Mel Gorman
2021-09-21  8:12       ` Vincent Guittot
2021-09-21  8:21       ` Peter Zijlstra
2021-09-21 10:03         ` Mel Gorman
2021-09-20 14:26 ` [PATCH 2/2] sched/fair: Scale wakeup granularity relative to nr_running Mel Gorman
2021-09-21  3:52   ` Mike Galbraith
2021-09-21  5:50     ` Mike Galbraith
2021-09-21  7:04     ` Mike Galbraith
2021-09-21 10:36     ` Mel Gorman
2021-09-21 12:32       ` Mike Galbraith
2021-09-21 14:03         ` Mel Gorman
2021-10-05  9:24         ` Peter Zijlstra
2021-09-22  5:22       ` Mike Galbraith
2021-09-22 13:20         ` Mel Gorman
2021-09-22 14:04           ` Mike Galbraith
2021-09-22 14:15           ` Vincent Guittot
2021-09-22 15:04             ` Mel Gorman
2021-09-22 16:00               ` Vincent Guittot
2021-09-22 17:38                 ` Mel Gorman
2021-09-22 18:22                   ` Vincent Guittot
2021-09-22 18:57                     ` Mel Gorman
2021-09-23  1:47                     ` Mike Galbraith
2021-09-23  8:40                       ` Vincent Guittot
2021-09-23  9:21                         ` Mike Galbraith
2021-09-23 12:41                           ` Vincent Guittot
2021-09-23 13:14                             ` Mike Galbraith
2021-09-27 11:17                             ` Mel Gorman
2021-09-27 14:17                               ` Mike Galbraith
2021-10-04  8:05                                 ` Mel Gorman
2021-10-04 16:37                                   ` Vincent Guittot
2021-10-05  7:41                                     ` Mel Gorman
2021-09-27 14:19                               ` Vincent Guittot
2021-09-27 15:02                                 ` Mel Gorman
2021-09-23 12:24                         ` Phil Auld [this message]
2021-10-05 10:36                           ` Peter Zijlstra
2021-10-05 14:12                             ` Phil Auld
2021-10-05 14:32                               ` Peter Zijlstra
2021-10-05 10:28                     ` Peter Zijlstra
2021-10-05 10:23                   ` Peter Zijlstra
2021-10-05  9:41               ` Peter Zijlstra
2021-09-22 15:05             ` Vincent Guittot
2021-10-05  9:32           ` Peter Zijlstra
2021-10-03  3:07         ` wakeup_affine_weight() is b0rked - was " Mike Galbraith
2021-10-03  7:34           ` Barry Song
2021-10-03 14:52             ` Mike Galbraith
2021-10-03 21:06               ` Barry Song
2021-10-04  1:49                 ` Mike Galbraith
2021-10-04  4:34             ` Mike Galbraith
2021-10-04  9:06               ` Mike Galbraith
2021-10-05  7:47                 ` Mel Gorman
2021-10-05  8:42                   ` Mike Galbraith
2021-10-05  9:31                     ` Mel Gorman
2021-10-06  6:46                       ` Mike Galbraith
2021-10-08  5:06                       ` Mike Galbraith
2021-09-21  8:03   ` Vincent Guittot
2021-09-21 10:45     ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YUxx42W3K2Ur7W84@lorien.usersys.redhat.com \
    --to=pauld@redhat.com \
    --cc=aubrey.li@linux.intel.com \
    --cc=efault@gmx.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=song.bao.hua@hisilicon.com \
    --cc=srikar@linux.vnet.ibm.com \
    --cc=valentin.schneider@arm.com \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.