linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] sched,fair: skip newidle_balance if a wakeup is pending
@ 2021-04-19  2:17 Rik van Riel
  2021-04-19  6:28 ` kernel test robot
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Rik van Riel @ 2021-04-19  2:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, Peter Zijlstra, Ingo Molnar, Vincent Guittot,
	Dietmar Eggemann, Mel Gorman

The try_to_wake_up function has an optimization where it can queue
a task for wakeup on its previous CPU, if the task is still in the
middle of going to sleep inside schedule().

Once schedule() re-enables IRQs, the task will be woken up with an
IPI, and placed back on the runqueue.

If we have such a wakeup pending, there is no need to search other
CPUs for runnable tasks. Just skip (or bail out early from) newidle
balancing, and run the just woken up task.

For a memcache like workload test, this reduces total CPU use by
about 2%, proportionally split between user and system time,
and p99 and p95 application response time by 2-3% on average.
The schedstats run_delay number shows a similar improvement.

Signed-off-by: Rik van Riel <riel@surriel.com>
---
 kernel/sched/fair.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 69680158963f..19a92c48939f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7163,6 +7163,14 @@ done: __maybe_unused;
 	if (!rf)
 		return NULL;
 
+	/*
+	 * We have a woken up task pending here. No need to search for ones
+	 * elsewhere. This task will be enqueued the moment we unblock irqs
+	 * upon exiting the scheduler.
+	 */
+	if (rq->ttwu_pending)
+		return NULL;
+
 	new_tasks = newidle_balance(rq, rf);
 
 	/*
@@ -10661,7 +10669,8 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
 		 * Stop searching for tasks to pull if there are
 		 * now runnable tasks on this rq.
 		 */
-		if (pulled_task || this_rq->nr_running > 0)
+		if (pulled_task || this_rq->nr_running > 0 ||
+						this_rq->ttwu_pending)
 			break;
 	}
 	rcu_read_unlock();
-- 
2.25.4



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] sched,fair: skip newidle_balance if a wakeup is pending
  2021-04-19  2:17 [PATCH] sched,fair: skip newidle_balance if a wakeup is pending Rik van Riel
@ 2021-04-19  6:28 ` kernel test robot
  2021-04-19 10:02 ` Peter Zijlstra
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: kernel test robot @ 2021-04-19  6:28 UTC (permalink / raw)
  To: Rik van Riel, linux-kernel
  Cc: kbuild-all, clang-built-linux, kernel-team, Peter Zijlstra,
	Ingo Molnar, Vincent Guittot, Dietmar Eggemann, Mel Gorman

[-- Attachment #1: Type: text/plain, Size: 9860 bytes --]

Hi Rik,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/sched/core]
[also build test ERROR on linux/master linus/master v5.12-rc8 next-20210416]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Rik-van-Riel/sched-fair-skip-newidle_balance-if-a-wakeup-is-pending/20210419-101843
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 9406415f46f6127fd31bb66f0260f7a61a8d2786
config: riscv-randconfig-r025-20210419 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 2b50f5a4343f8fb06acaa5c36355bcf58092c9cd)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install riscv cross compiling tool for clang build
        # apt-get install binutils-riscv64-linux-gnu
        # https://github.com/0day-ci/linux/commit/3f6b55f5258c8d8d217e8b8408de056a20745824
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Rik-van-Riel/sched-fair-skip-newidle_balance-if-a-wakeup-is-pending/20210419-101843
        git checkout 3f6b55f5258c8d8d217e8b8408de056a20745824
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 ARCH=riscv 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from include/linux/tick.h:8:
   In file included from include/linux/clockchips.h:14:
   In file included from include/linux/clocksource.h:21:
   In file included from arch/riscv/include/asm/io.h:149:
   include/asm-generic/io.h:564:9: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           return inw(addr);
                  ^~~~~~~~~
   arch/riscv/include/asm/io.h:56:76: note: expanded from macro 'inw'
   #define inw(c)          ({ u16 __v; __io_pbr(); __v = readw_cpu((void*)(PCI_IOBASE + (c))); __io_par(__v); __v; })
                                                                           ~~~~~~~~~~ ^
   arch/riscv/include/asm/mmio.h:88:76: note: expanded from macro 'readw_cpu'
   #define readw_cpu(c)            ({ u16 __r = le16_to_cpu((__force __le16)__raw_readw(c)); __r; })
                                                                                        ^
   include/uapi/linux/byteorder/little_endian.h:36:51: note: expanded from macro '__le16_to_cpu'
   #define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
                                                     ^
   In file included from kernel/sched/fair.c:23:
   In file included from kernel/sched/sched.h:17:
   In file included from include/linux/sched/isolation.h:6:
   In file included from include/linux/tick.h:8:
   In file included from include/linux/clockchips.h:14:
   In file included from include/linux/clocksource.h:21:
   In file included from arch/riscv/include/asm/io.h:149:
   include/asm-generic/io.h:572:9: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           return inl(addr);
                  ^~~~~~~~~
   arch/riscv/include/asm/io.h:57:76: note: expanded from macro 'inl'
   #define inl(c)          ({ u32 __v; __io_pbr(); __v = readl_cpu((void*)(PCI_IOBASE + (c))); __io_par(__v); __v; })
                                                                           ~~~~~~~~~~ ^
   arch/riscv/include/asm/mmio.h:89:76: note: expanded from macro 'readl_cpu'
   #define readl_cpu(c)            ({ u32 __r = le32_to_cpu((__force __le32)__raw_readl(c)); __r; })
                                                                                        ^
   include/uapi/linux/byteorder/little_endian.h:34:51: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
                                                     ^
   In file included from kernel/sched/fair.c:23:
   In file included from kernel/sched/sched.h:17:
   In file included from include/linux/sched/isolation.h:6:
   In file included from include/linux/tick.h:8:
   In file included from include/linux/clockchips.h:14:
   In file included from include/linux/clocksource.h:21:
   In file included from arch/riscv/include/asm/io.h:149:
   include/asm-generic/io.h:580:2: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           outb(value, addr);
           ^~~~~~~~~~~~~~~~~
   arch/riscv/include/asm/io.h:59:68: note: expanded from macro 'outb'
   #define outb(v,c)       ({ __io_pbw(); writeb_cpu((v),(void*)(PCI_IOBASE + (c))); __io_paw(); })
                                                                 ~~~~~~~~~~ ^
   arch/riscv/include/asm/mmio.h:91:52: note: expanded from macro 'writeb_cpu'
   #define writeb_cpu(v, c)        ((void)__raw_writeb((v), (c)))
                                                             ^
   In file included from kernel/sched/fair.c:23:
   In file included from kernel/sched/sched.h:17:
   In file included from include/linux/sched/isolation.h:6:
   In file included from include/linux/tick.h:8:
   In file included from include/linux/clockchips.h:14:
   In file included from include/linux/clocksource.h:21:
   In file included from arch/riscv/include/asm/io.h:149:
   include/asm-generic/io.h:588:2: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           outw(value, addr);
           ^~~~~~~~~~~~~~~~~
   arch/riscv/include/asm/io.h:60:68: note: expanded from macro 'outw'
   #define outw(v,c)       ({ __io_pbw(); writew_cpu((v),(void*)(PCI_IOBASE + (c))); __io_paw(); })
                                                                 ~~~~~~~~~~ ^
   arch/riscv/include/asm/mmio.h:92:76: note: expanded from macro 'writew_cpu'
   #define writew_cpu(v, c)        ((void)__raw_writew((__force u16)cpu_to_le16(v), (c)))
                                                                                     ^
   In file included from kernel/sched/fair.c:23:
   In file included from kernel/sched/sched.h:17:
   In file included from include/linux/sched/isolation.h:6:
   In file included from include/linux/tick.h:8:
   In file included from include/linux/clockchips.h:14:
   In file included from include/linux/clocksource.h:21:
   In file included from arch/riscv/include/asm/io.h:149:
   include/asm-generic/io.h:596:2: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           outl(value, addr);
           ^~~~~~~~~~~~~~~~~
   arch/riscv/include/asm/io.h:61:68: note: expanded from macro 'outl'
   #define outl(v,c)       ({ __io_pbw(); writel_cpu((v),(void*)(PCI_IOBASE + (c))); __io_paw(); })
                                                                 ~~~~~~~~~~ ^
   arch/riscv/include/asm/mmio.h:93:76: note: expanded from macro 'writel_cpu'
   #define writel_cpu(v, c)        ((void)__raw_writel((__force u32)cpu_to_le32(v), (c)))
                                                                                     ^
   In file included from kernel/sched/fair.c:23:
   In file included from kernel/sched/sched.h:17:
   In file included from include/linux/sched/isolation.h:6:
   In file included from include/linux/tick.h:8:
   In file included from include/linux/clockchips.h:14:
   In file included from include/linux/clocksource.h:21:
   In file included from arch/riscv/include/asm/io.h:149:
   include/asm-generic/io.h:1005:55: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           return (port > MMIO_UPPER_LIMIT) ? NULL : PCI_IOBASE + port;
                                                     ~~~~~~~~~~ ^
   kernel/sched/fair.c:637:5: warning: no previous prototype for function 'sched_update_scaling' [-Wmissing-prototypes]
   int sched_update_scaling(void)
       ^
   kernel/sched/fair.c:637:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int sched_update_scaling(void)
   ^
   static 
>> kernel/sched/fair.c:7208:10: error: no member named 'ttwu_pending' in 'struct rq'
           if (rq->ttwu_pending)
               ~~  ^
   8 warnings and 1 error generated.


vim +7208 kernel/sched/fair.c

  7191	
  7192		if (hrtick_enabled_fair(rq))
  7193			hrtick_start_fair(rq, p);
  7194	
  7195		update_misfit_status(p, rq);
  7196	
  7197		return p;
  7198	
  7199	idle:
  7200		if (!rf)
  7201			return NULL;
  7202	
  7203		/*
  7204		 * We have a woken up task pending here. No need to search for ones
  7205		 * elsewhere. This task will be enqueued the moment we unblock irqs
  7206		 * upon exiting the scheduler.
  7207		 */
> 7208		if (rq->ttwu_pending)
  7209			return NULL;
  7210	
  7211		new_tasks = newidle_balance(rq, rf);
  7212	
  7213		/*
  7214		 * Because newidle_balance() releases (and re-acquires) rq->lock, it is
  7215		 * possible for any higher priority task to appear. In that case we
  7216		 * must re-start the pick_next_entity() loop.
  7217		 */
  7218		if (new_tasks < 0)
  7219			return RETRY_TASK;
  7220	
  7221		if (new_tasks > 0)
  7222			goto again;
  7223	
  7224		/*
  7225		 * rq is about to be idle, check if we need to update the
  7226		 * lost_idle_time of clock_pelt
  7227		 */
  7228		update_idle_rq_clock_pelt(rq);
  7229	
  7230		return NULL;
  7231	}
  7232	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 27518 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] sched,fair: skip newidle_balance if a wakeup is pending
  2021-04-19  2:17 [PATCH] sched,fair: skip newidle_balance if a wakeup is pending Rik van Riel
  2021-04-19  6:28 ` kernel test robot
@ 2021-04-19 10:02 ` Peter Zijlstra
  2021-04-19 11:22 ` Valentin Schneider
  2021-04-19 12:12 ` Vincent Guittot
  3 siblings, 0 replies; 6+ messages in thread
From: Peter Zijlstra @ 2021-04-19 10:02 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, kernel-team, Ingo Molnar, Vincent Guittot,
	Dietmar Eggemann, Mel Gorman

On Sun, Apr 18, 2021 at 10:17:51PM -0400, Rik van Riel wrote:
> The try_to_wake_up function has an optimization where it can queue
> a task for wakeup on its previous CPU, if the task is still in the
> middle of going to sleep inside schedule().
> 
> Once schedule() re-enables IRQs, the task will be woken up with an
> IPI, and placed back on the runqueue.
> 
> If we have such a wakeup pending, there is no need to search other
> CPUs for runnable tasks. Just skip (or bail out early from) newidle
> balancing, and run the just woken up task.
> 
> For a memcache like workload test, this reduces total CPU use by
> about 2%, proportionally split between user and system time,
> and p99 and p95 application response time by 2-3% on average.
> The schedstats run_delay number shows a similar improvement.

Nice.

> Signed-off-by: Rik van Riel <riel@surriel.com>
> ---
>  kernel/sched/fair.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 69680158963f..19a92c48939f 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7163,6 +7163,14 @@ done: __maybe_unused;
>  	if (!rf)
>  		return NULL;
>  
> +	/*
> +	 * We have a woken up task pending here. No need to search for ones
> +	 * elsewhere. This task will be enqueued the moment we unblock irqs
> +	 * upon exiting the scheduler.
> +	 */
> +	if (rq->ttwu_pending)
> +		return NULL;

As reported by the robot, that needs an CONFIG_SMP guard of sorts,
#ifdef might work I suppose.

>  	new_tasks = newidle_balance(rq, rf);
>  
>  	/*
> @@ -10661,7 +10669,8 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
>  		 * Stop searching for tasks to pull if there are
>  		 * now runnable tasks on this rq.
>  		 */
> -		if (pulled_task || this_rq->nr_running > 0)
> +		if (pulled_task || this_rq->nr_running > 0 ||
> +						this_rq->ttwu_pending)

Either cino=(0:0 or just bust the limit and make it 84 chars.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] sched,fair: skip newidle_balance if a wakeup is pending
  2021-04-19  2:17 [PATCH] sched,fair: skip newidle_balance if a wakeup is pending Rik van Riel
  2021-04-19  6:28 ` kernel test robot
  2021-04-19 10:02 ` Peter Zijlstra
@ 2021-04-19 11:22 ` Valentin Schneider
  2021-04-19 16:46   ` Rik van Riel
  2021-04-19 12:12 ` Vincent Guittot
  3 siblings, 1 reply; 6+ messages in thread
From: Valentin Schneider @ 2021-04-19 11:22 UTC (permalink / raw)
  To: Rik van Riel, linux-kernel
  Cc: kernel-team, Peter Zijlstra, Ingo Molnar, Vincent Guittot,
	Dietmar Eggemann, Mel Gorman

On 18/04/21 22:17, Rik van Riel wrote:
> @@ -10661,7 +10669,8 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
>                * Stop searching for tasks to pull if there are
>                * now runnable tasks on this rq.
>                */
> -		if (pulled_task || this_rq->nr_running > 0)
> +		if (pulled_task || this_rq->nr_running > 0 ||
> +						this_rq->ttwu_pending)
>                       break;

I thought newidle_balance() would already handle this by checking
idle_cpu(), but that can't work due to rq->curr never being rq->idle here
(we're trying very hard to prevent this!).

Would there be any point in bunching up these two checks from idle_cpu()
into an inline helper and reusing it here?

_nohz_idle_balance() "accidentally" already checks for that, but at the
head of the domain loop. What's the reason for doing it at the tail here?
It means we try at least one loop, but is there a point in that?

>       }
>       rcu_read_unlock();
> --
> 2.25.4

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] sched,fair: skip newidle_balance if a wakeup is pending
  2021-04-19  2:17 [PATCH] sched,fair: skip newidle_balance if a wakeup is pending Rik van Riel
                   ` (2 preceding siblings ...)
  2021-04-19 11:22 ` Valentin Schneider
@ 2021-04-19 12:12 ` Vincent Guittot
  3 siblings, 0 replies; 6+ messages in thread
From: Vincent Guittot @ 2021-04-19 12:12 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, Kernel Team, Peter Zijlstra, Ingo Molnar,
	Dietmar Eggemann, Mel Gorman

On Mon, 19 Apr 2021 at 04:18, Rik van Riel <riel@surriel.com> wrote:
>
> The try_to_wake_up function has an optimization where it can queue
> a task for wakeup on its previous CPU, if the task is still in the
> middle of going to sleep inside schedule().
>
> Once schedule() re-enables IRQs, the task will be woken up with an
> IPI, and placed back on the runqueue.
>
> If we have such a wakeup pending, there is no need to search other
> CPUs for runnable tasks. Just skip (or bail out early from) newidle
> balancing, and run the just woken up task.
>
> For a memcache like workload test, this reduces total CPU use by
> about 2%, proportionally split between user and system time,
> and p99 and p95 application response time by 2-3% on average.
> The schedstats run_delay number shows a similar improvement.
>
> Signed-off-by: Rik van Riel <riel@surriel.com>
> ---
>  kernel/sched/fair.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 69680158963f..19a92c48939f 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7163,6 +7163,14 @@ done: __maybe_unused;
>         if (!rf)
>                 return NULL;
>
> +       /*
> +        * We have a woken up task pending here. No need to search for ones
> +        * elsewhere. This task will be enqueued the moment we unblock irqs
> +        * upon exiting the scheduler.
> +        */
> +       if (rq->ttwu_pending)
> +               return NULL;

Would it be better to put this check at the beg of newidle_balance() ?
If prev is not a cfs task, we never reach this point but instead use the path:
class->balance => balance_fair => newidle_balance

and we will not check for rq->ttwu_pending

> +
>         new_tasks = newidle_balance(rq, rf);
>
>         /*
> @@ -10661,7 +10669,8 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
>                  * Stop searching for tasks to pull if there are
>                  * now runnable tasks on this rq.
>                  */
> -               if (pulled_task || this_rq->nr_running > 0)
> +               if (pulled_task || this_rq->nr_running > 0 ||
> +                                               this_rq->ttwu_pending)
>                         break;
>         }
>         rcu_read_unlock();
> --
> 2.25.4
>
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] sched,fair: skip newidle_balance if a wakeup is pending
  2021-04-19 11:22 ` Valentin Schneider
@ 2021-04-19 16:46   ` Rik van Riel
  0 siblings, 0 replies; 6+ messages in thread
From: Rik van Riel @ 2021-04-19 16:46 UTC (permalink / raw)
  To: Valentin Schneider, linux-kernel
  Cc: kernel-team, Peter Zijlstra, Ingo Molnar, Vincent Guittot,
	Dietmar Eggemann, Mel Gorman

[-- Attachment #1: Type: text/plain, Size: 1371 bytes --]

On Mon, 2021-04-19 at 12:22 +0100, Valentin Schneider wrote:
> On 18/04/21 22:17, Rik van Riel wrote:
> > @@ -10661,7 +10669,8 @@ static int newidle_balance(struct rq
> > *this_rq, struct rq_flags *rf)
> >                * Stop searching for tasks to pull if there are
> >                * now runnable tasks on this rq.
> >                */
> > -		if (pulled_task || this_rq->nr_running > 0)
> > +		if (pulled_task || this_rq->nr_running > 0 ||
> > +						this_rq->ttwu_pending)
> >                       break;
> 
> I thought newidle_balance() would already handle this by checking
> idle_cpu(), but that can't work due to rq->curr never being rq->idle
> here
> (we're trying very hard to prevent this!).
> 
> Would there be any point in bunching up these two checks from
> idle_cpu()
> into an inline helper and reusing it here?

I'm not sure, all the sched classes seem to have their own
magic in the balance functions, and there are enough special
situations out there that we might only be able to consolidate
a few callsites, not the rest.

Also, it turns out newidle_balance needs a different return
value depending on whether we have ->ttwu_pending, a pulled
task, or a queued realtime task...

I'll send v2 without any considation here, since I cannot
figure out a way to make things better here :)

-- 
All Rights Reversed.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-04-19 16:46 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-19  2:17 [PATCH] sched,fair: skip newidle_balance if a wakeup is pending Rik van Riel
2021-04-19  6:28 ` kernel test robot
2021-04-19 10:02 ` Peter Zijlstra
2021-04-19 11:22 ` Valentin Schneider
2021-04-19 16:46   ` Rik van Riel
2021-04-19 12:12 ` Vincent Guittot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).