linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] sched/fair: sanitize vruntime of entity being placed
@ 2023-01-27 16:32 Roman Kagan
  2023-01-28  6:42 ` kernel test robot
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Roman Kagan @ 2023-01-27 16:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: Daniel Bristot de Oliveira, Ben Segall, Zhang Qiao, Ingo Molnar,
	Waiman Long, Dietmar Eggemann, Peter Zijlstra,
	Valentin Schneider, Steven Rostedt, Vincent Guittot, Mel Gorman,
	Juri Lelli

From: Zhang Qiao <zhangqiao22@huawei.com>

When a scheduling entity is placed onto cfs_rq, its vruntime is pulled
to the base level (around cfs_rq->min_vruntime), so that the entity
doesn't gain extra boost when placed backwards.

However, if the entity being placed wasn't executed for a long time, its
vruntime may get too far behind (e.g. while cfs_rq was executing a
low-weight hog), which can inverse the vruntime comparison due to s64
overflow.  This results in the entity being placed with its original
vruntime way forwards, so that it will effectively never get to the cpu.

To prevent that, ignore the vruntime of the entity being placed if it
didn't execute for much longer than the characteristic sheduler time
scale.

[rkagan: formatted, adjusted commit log, comments, cutoff value]
Co-developed-by: Roman Kagan <rkagan@amazon.de>
Signed-off-by: Roman Kagan <rkagan@amazon.de>
---
@zhangqiao22, I took the liberty to put you as the author of the patch,
as this is essentially what you posted for discussion, with minor
tweaks.  Please stamp with your s-o-b if you're ok with it.

 kernel/sched/fair.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0f8736991427..d6cf131ebb0b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4656,6 +4656,7 @@ static void
 place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
 {
 	u64 vruntime = cfs_rq->min_vruntime;
+	u64 sleep_time;
 
 	/*
 	 * The 'current' period is already promised to the current tasks,
@@ -4685,8 +4686,18 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
 		vruntime -= thresh;
 	}
 
-	/* ensure we never gain time by being placed backwards. */
-	se->vruntime = max_vruntime(se->vruntime, vruntime);
+	/*
+	 * Pull vruntime of the entity being placed to the base level of
+	 * cfs_rq, to prevent boosting it if placed backwards.  If the entity
+	 * slept for a long time, don't even try to compare its vruntime with
+	 * the base as it may be too far off and the comparison may get
+	 * inversed due to s64 overflow.
+	 */
+	sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
+	if ((s64)sleep_time > 60 * NSEC_PER_SEC)
+		se->vruntime = vruntime;
+	else
+		se->vruntime = max_vruntime(se->vruntime, vruntime);
 }
 
 static void check_enqueue_throttle(struct cfs_rq *cfs_rq);
-- 
2.34.1




Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879




^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] sched/fair: sanitize vruntime of entity being placed
  2023-01-27 16:32 [PATCH] sched/fair: sanitize vruntime of entity being placed Roman Kagan
@ 2023-01-28  6:42 ` kernel test robot
  2023-01-28  7:45 ` kernel test robot
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: kernel test robot @ 2023-01-28  6:42 UTC (permalink / raw)
  To: Roman Kagan, linux-kernel
  Cc: oe-kbuild-all, Daniel Bristot de Oliveira, Ben Segall,
	Zhang Qiao, Ingo Molnar, Waiman Long, Dietmar Eggemann,
	Peter Zijlstra, Valentin Schneider, Steven Rostedt,
	Vincent Guittot, Mel Gorman, Juri Lelli

Hi Roman,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on tip/sched/core]
[also build test WARNING on linus/master v6.2-rc5 next-20230127]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Roman-Kagan/sched-fair-sanitize-vruntime-of-entity-being-placed/20230128-130846
patch link:    https://lore.kernel.org/r/20230127163230.3339408-1-rkagan%40amazon.de
patch subject: [PATCH] sched/fair: sanitize vruntime of entity being placed
config: m68k-allyesconfig (https://download.01.org/0day-ci/archive/20230128/202301281410.4cvSCADA-lkp@intel.com/config)
compiler: m68k-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/2db31a18bcb88c280481672d7721f7e003d8df5a
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Roman-Kagan/sched-fair-sanitize-vruntime-of-entity-being-placed/20230128-130846
        git checkout 2db31a18bcb88c280481672d7721f7e003d8df5a
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=m68k olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=m68k SHELL=/bin/bash kernel/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   kernel/sched/fair.c:688:5: warning: no previous prototype for 'sched_update_scaling' [-Wmissing-prototypes]
     688 | int sched_update_scaling(void)
         |     ^~~~~~~~~~~~~~~~~~~~
   kernel/sched/fair.c: In function 'place_entity':
>> kernel/sched/fair.c:4697:34: warning: integer overflow in expression of type 'long int' results in '-129542144' [-Woverflow]
    4697 |         if ((s64)sleep_time > 60 * NSEC_PER_SEC)
         |                                  ^


vim +4697 kernel/sched/fair.c

  4654	
  4655	static void
  4656	place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
  4657	{
  4658		u64 vruntime = cfs_rq->min_vruntime;
  4659		u64 sleep_time;
  4660	
  4661		/*
  4662		 * The 'current' period is already promised to the current tasks,
  4663		 * however the extra weight of the new task will slow them down a
  4664		 * little, place the new task so that it fits in the slot that
  4665		 * stays open at the end.
  4666		 */
  4667		if (initial && sched_feat(START_DEBIT))
  4668			vruntime += sched_vslice(cfs_rq, se);
  4669	
  4670		/* sleeps up to a single latency don't count. */
  4671		if (!initial) {
  4672			unsigned long thresh;
  4673	
  4674			if (se_is_idle(se))
  4675				thresh = sysctl_sched_min_granularity;
  4676			else
  4677				thresh = sysctl_sched_latency;
  4678	
  4679			/*
  4680			 * Halve their sleep time's effect, to allow
  4681			 * for a gentler effect of sleepers:
  4682			 */
  4683			if (sched_feat(GENTLE_FAIR_SLEEPERS))
  4684				thresh >>= 1;
  4685	
  4686			vruntime -= thresh;
  4687		}
  4688	
  4689		/*
  4690		 * Pull vruntime of the entity being placed to the base level of
  4691		 * cfs_rq, to prevent boosting it if placed backwards.  If the entity
  4692		 * slept for a long time, don't even try to compare its vruntime with
  4693		 * the base as it may be too far off and the comparison may get
  4694		 * inversed due to s64 overflow.
  4695		 */
  4696		sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
> 4697		if ((s64)sleep_time > 60 * NSEC_PER_SEC)
  4698			se->vruntime = vruntime;
  4699		else
  4700			se->vruntime = max_vruntime(se->vruntime, vruntime);
  4701	}
  4702	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] sched/fair: sanitize vruntime of entity being placed
  2023-01-27 16:32 [PATCH] sched/fair: sanitize vruntime of entity being placed Roman Kagan
  2023-01-28  6:42 ` kernel test robot
@ 2023-01-28  7:45 ` kernel test robot
  2023-01-28 10:41 ` Zhang Qiao
  2023-01-28 17:27 ` Chen Yu
  3 siblings, 0 replies; 7+ messages in thread
From: kernel test robot @ 2023-01-28  7:45 UTC (permalink / raw)
  To: Roman Kagan, linux-kernel
  Cc: llvm, oe-kbuild-all, Daniel Bristot de Oliveira, Ben Segall,
	Zhang Qiao, Ingo Molnar, Waiman Long, Dietmar Eggemann,
	Peter Zijlstra, Valentin Schneider, Steven Rostedt,
	Vincent Guittot, Mel Gorman, Juri Lelli

Hi Roman,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on tip/sched/core]
[also build test WARNING on linus/master v6.2-rc5]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Roman-Kagan/sched-fair-sanitize-vruntime-of-entity-being-placed/20230128-130846
patch link:    https://lore.kernel.org/r/20230127163230.3339408-1-rkagan%40amazon.de
patch subject: [PATCH] sched/fair: sanitize vruntime of entity being placed
config: hexagon-randconfig-r041-20230123 (https://download.01.org/0day-ci/archive/20230128/202301281536.CiaoBlBO-lkp@intel.com/config)
compiler: clang version 16.0.0 (https://github.com/llvm/llvm-project 4196ca3278f78c6e19246e54ab0ecb364e37d66a)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/2db31a18bcb88c280481672d7721f7e003d8df5a
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Roman-Kagan/sched-fair-sanitize-vruntime-of-entity-being-placed/20230128-130846
        git checkout 2db31a18bcb88c280481672d7721f7e003d8df5a
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash kernel/sched/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   In file included from kernel/sched/fair.c:28:
   In file included from include/linux/highmem.h:12:
   In file included from include/linux/hardirq.h:11:
   In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/hexagon/include/asm/io.h:334:
   include/asm-generic/io.h:547:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __raw_readb(PCI_IOBASE + addr);
                             ~~~~~~~~~~ ^
   include/asm-generic/io.h:560:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/little_endian.h:37:51: note: expanded from macro '__le16_to_cpu'
   #define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
                                                     ^
   In file included from kernel/sched/fair.c:28:
   In file included from include/linux/highmem.h:12:
   In file included from include/linux/hardirq.h:11:
   In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/hexagon/include/asm/io.h:334:
   include/asm-generic/io.h:573:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/little_endian.h:35:51: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
                                                     ^
   In file included from kernel/sched/fair.c:28:
   In file included from include/linux/highmem.h:12:
   In file included from include/linux/hardirq.h:11:
   In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/hexagon/include/asm/io.h:334:
   include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writeb(value, PCI_IOBASE + addr);
                               ~~~~~~~~~~ ^
   include/asm-generic/io.h:594:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:604:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
>> kernel/sched/fair.c:4697:27: warning: overflow in expression; result is -129542144 with type 'long' [-Winteger-overflow]
           if ((s64)sleep_time > 60 * NSEC_PER_SEC)
                                    ^
   kernel/sched/fair.c:6078:6: warning: no previous prototype for function 'init_cfs_bandwidth' [-Wmissing-prototypes]
   void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b) {}
        ^
   kernel/sched/fair.c:6078:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b) {}
   ^
   static 
   kernel/sched/fair.c:12498:6: warning: no previous prototype for function 'free_fair_sched_group' [-Wmissing-prototypes]
   void free_fair_sched_group(struct task_group *tg) { }
        ^
   kernel/sched/fair.c:12498:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void free_fair_sched_group(struct task_group *tg) { }
   ^
   static 
   kernel/sched/fair.c:12500:5: warning: no previous prototype for function 'alloc_fair_sched_group' [-Wmissing-prototypes]
   int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent)
       ^
   kernel/sched/fair.c:12500:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent)
   ^
   static 
   kernel/sched/fair.c:12505:6: warning: no previous prototype for function 'online_fair_sched_group' [-Wmissing-prototypes]
   void online_fair_sched_group(struct task_group *tg) { }
        ^
   kernel/sched/fair.c:12505:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void online_fair_sched_group(struct task_group *tg) { }
   ^
   static 
   kernel/sched/fair.c:12507:6: warning: no previous prototype for function 'unregister_fair_sched_group' [-Wmissing-prototypes]
   void unregister_fair_sched_group(struct task_group *tg) { }
        ^
   kernel/sched/fair.c:12507:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void unregister_fair_sched_group(struct task_group *tg) { }
   ^
   static 
   kernel/sched/fair.c:535:20: warning: unused function 'list_del_leaf_cfs_rq' [-Wunused-function]
   static inline void list_del_leaf_cfs_rq(struct cfs_rq *cfs_rq)
                      ^
   kernel/sched/fair.c:556:19: warning: unused function 'tg_is_idle' [-Wunused-function]
   static inline int tg_is_idle(struct task_group *tg)
                     ^
   kernel/sched/fair.c:6051:20: warning: unused function 'cfs_bandwidth_used' [-Wunused-function]
   static inline bool cfs_bandwidth_used(void)
                      ^
   kernel/sched/fair.c:6059:20: warning: unused function 'sync_throttle' [-Wunused-function]
   static inline void sync_throttle(struct task_group *tg, int cpu) {}
                      ^
   kernel/sched/fair.c:6084:37: warning: unused function 'tg_cfs_bandwidth' [-Wunused-function]
   static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg)
                                       ^
   kernel/sched/fair.c:6088:20: warning: unused function 'destroy_cfs_bandwidth' [-Wunused-function]
   static inline void destroy_cfs_bandwidth(struct cfs_bandwidth *cfs_b) {}
                      ^
   18 warnings generated.


vim +/long +4697 kernel/sched/fair.c

  4654	
  4655	static void
  4656	place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
  4657	{
  4658		u64 vruntime = cfs_rq->min_vruntime;
  4659		u64 sleep_time;
  4660	
  4661		/*
  4662		 * The 'current' period is already promised to the current tasks,
  4663		 * however the extra weight of the new task will slow them down a
  4664		 * little, place the new task so that it fits in the slot that
  4665		 * stays open at the end.
  4666		 */
  4667		if (initial && sched_feat(START_DEBIT))
  4668			vruntime += sched_vslice(cfs_rq, se);
  4669	
  4670		/* sleeps up to a single latency don't count. */
  4671		if (!initial) {
  4672			unsigned long thresh;
  4673	
  4674			if (se_is_idle(se))
  4675				thresh = sysctl_sched_min_granularity;
  4676			else
  4677				thresh = sysctl_sched_latency;
  4678	
  4679			/*
  4680			 * Halve their sleep time's effect, to allow
  4681			 * for a gentler effect of sleepers:
  4682			 */
  4683			if (sched_feat(GENTLE_FAIR_SLEEPERS))
  4684				thresh >>= 1;
  4685	
  4686			vruntime -= thresh;
  4687		}
  4688	
  4689		/*
  4690		 * Pull vruntime of the entity being placed to the base level of
  4691		 * cfs_rq, to prevent boosting it if placed backwards.  If the entity
  4692		 * slept for a long time, don't even try to compare its vruntime with
  4693		 * the base as it may be too far off and the comparison may get
  4694		 * inversed due to s64 overflow.
  4695		 */
  4696		sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
> 4697		if ((s64)sleep_time > 60 * NSEC_PER_SEC)
  4698			se->vruntime = vruntime;
  4699		else
  4700			se->vruntime = max_vruntime(se->vruntime, vruntime);
  4701	}
  4702	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] sched/fair: sanitize vruntime of entity being placed
  2023-01-27 16:32 [PATCH] sched/fair: sanitize vruntime of entity being placed Roman Kagan
  2023-01-28  6:42 ` kernel test robot
  2023-01-28  7:45 ` kernel test robot
@ 2023-01-28 10:41 ` Zhang Qiao
  2023-01-28 17:27 ` Chen Yu
  3 siblings, 0 replies; 7+ messages in thread
From: Zhang Qiao @ 2023-01-28 10:41 UTC (permalink / raw)
  To: Roman Kagan, linux-kernel
  Cc: Daniel Bristot de Oliveira, Ben Segall, Ingo Molnar, Waiman Long,
	Dietmar Eggemann, Peter Zijlstra, Valentin Schneider,
	Steven Rostedt, Vincent Guittot, Mel Gorman, Juri Lelli



在 2023/1/28 0:32, Roman Kagan 写道:
> From: Zhang Qiao <zhangqiao22@huawei.com>
> 
> When a scheduling entity is placed onto cfs_rq, its vruntime is pulled
> to the base level (around cfs_rq->min_vruntime), so that the entity
> doesn't gain extra boost when placed backwards.
> 
> However, if the entity being placed wasn't executed for a long time, its
> vruntime may get too far behind (e.g. while cfs_rq was executing a
> low-weight hog), which can inverse the vruntime comparison due to s64
> overflow.  This results in the entity being placed with its original
> vruntime way forwards, so that it will effectively never get to the cpu.
> 
> To prevent that, ignore the vruntime of the entity being placed if it
> didn't execute for much longer than the characteristic sheduler time
> scale.
> 


Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>


> [rkagan: formatted, adjusted commit log, comments, cutoff value]
> Co-developed-by: Roman Kagan <rkagan@amazon.de>
> Signed-off-by: Roman Kagan <rkagan@amazon.de>
> ---
> @zhangqiao22, I took the liberty to put you as the author of the patch,
> as this is essentially what you posted for discussion, with minor
> tweaks.  Please stamp with your s-o-b if you're ok with it.
> 
>  kernel/sched/fair.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 0f8736991427..d6cf131ebb0b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4656,6 +4656,7 @@ static void
>  place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
>  {
>  	u64 vruntime = cfs_rq->min_vruntime;
> +	u64 sleep_time;
>  
>  	/*
>  	 * The 'current' period is already promised to the current tasks,
> @@ -4685,8 +4686,18 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
>  		vruntime -= thresh;
>  	}
>  
> -	/* ensure we never gain time by being placed backwards. */
> -	se->vruntime = max_vruntime(se->vruntime, vruntime);
> +	/*
> +	 * Pull vruntime of the entity being placed to the base level of
> +	 * cfs_rq, to prevent boosting it if placed backwards.  If the entity
> +	 * slept for a long time, don't even try to compare its vruntime with
> +	 * the base as it may be too far off and the comparison may get
> +	 * inversed due to s64 overflow.
> +	 */
> +	sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
> +	if ((s64)sleep_time > 60 * NSEC_PER_SEC)

In order to avoid overflowing, it'd better be "60LL * NSEC_PER_SEC"

Thanks,
Qiao.


> +		se->vruntime = vruntime;
> +	else
> +		se->vruntime = max_vruntime(se->vruntime, vruntime);
>  }
>  
>  static void check_enqueue_throttle(struct cfs_rq *cfs_rq);
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] sched/fair: sanitize vruntime of entity being placed
  2023-01-27 16:32 [PATCH] sched/fair: sanitize vruntime of entity being placed Roman Kagan
                   ` (2 preceding siblings ...)
  2023-01-28 10:41 ` Zhang Qiao
@ 2023-01-28 17:27 ` Chen Yu
  2023-01-29  1:27   ` Zhang Qiao
  3 siblings, 1 reply; 7+ messages in thread
From: Chen Yu @ 2023-01-28 17:27 UTC (permalink / raw)
  To: Roman Kagan
  Cc: linux-kernel, Daniel Bristot de Oliveira, Ben Segall, Zhang Qiao,
	Ingo Molnar, Waiman Long, Dietmar Eggemann, Peter Zijlstra,
	Valentin Schneider, Steven Rostedt, Vincent Guittot, Mel Gorman,
	Juri Lelli

Hi Roman, Qiao,
On 2023-01-27 at 17:32:30 +0100, Roman Kagan wrote:
> From: Zhang Qiao <zhangqiao22@huawei.com>
> 
> When a scheduling entity is placed onto cfs_rq, its vruntime is pulled
> to the base level (around cfs_rq->min_vruntime), so that the entity
> doesn't gain extra boost when placed backwards.
> 
> However, if the entity being placed wasn't executed for a long time, its
> vruntime may get too far behind (e.g. while cfs_rq was executing a
> low-weight hog), which can inverse the vruntime comparison due to s64
> overflow.  This results in the entity being placed with its original
> vruntime way forwards, so that it will effectively never get to the cpu.
>
Looks interesting,
case 1:
  se->vruntime = 1, cfs_rq->min_vruntime = ULONG_MAX
  ==> max = 1
case 2:
  se->vruntime = 1, cfs_rq->min_vruntime = LONG_MAX
  ==> max = LONG_MAX

May I know if the issue you described above is in case 1? We want
the max to be ULONG_MAX but it returns 1 because of s64
comparison? Then max = 1 is incorrectly used as se's vruntime?
Could you please elaborate a little more about this issue?
> To prevent that, ignore the vruntime of the entity being placed if it
> didn't execute for much longer than the characteristic sheduler time
> scale.
> 
> [rkagan: formatted, adjusted commit log, comments, cutoff value]
> Co-developed-by: Roman Kagan <rkagan@amazon.de>
> Signed-off-by: Roman Kagan <rkagan@amazon.de>
> ---
> @zhangqiao22, I took the liberty to put you as the author of the patch,
> as this is essentially what you posted for discussion, with minor
> tweaks.  Please stamp with your s-o-b if you're ok with it.
> 
>  kernel/sched/fair.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 0f8736991427..d6cf131ebb0b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4656,6 +4656,7 @@ static void
>  place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
>  {
>  	u64 vruntime = cfs_rq->min_vruntime;
> +	u64 sleep_time;
>  
>  	/*
>  	 * The 'current' period is already promised to the current tasks,
> @@ -4685,8 +4686,18 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
>  		vruntime -= thresh;
>  	}
>  
> -	/* ensure we never gain time by being placed backwards. */
> -	se->vruntime = max_vruntime(se->vruntime, vruntime);
> +	/*
> +	 * Pull vruntime of the entity being placed to the base level of
> +	 * cfs_rq, to prevent boosting it if placed backwards.  If the entity
> +	 * slept for a long time, don't even try to compare its vruntime with
> +	 * the base as it may be too far off and the comparison may get
> +	 * inversed due to s64 overflow.
> +	 */
> +	sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
If I understand correctly, se->exec_start is just updated by enqueue_entity()->update_curr(cfs_rq),
then place_entity() in invoked here, I'm not sure if sleep_time above
could reflect the real sleep time. Maybe something like:
rq_clock_task(rq_of(cfs_rq)) - se->time_stamp_dequeued ?

thanks,
Chenyu

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] sched/fair: sanitize vruntime of entity being placed
  2023-01-28 17:27 ` Chen Yu
@ 2023-01-29  1:27   ` Zhang Qiao
  2023-01-30 13:15     ` Chen Yu
  0 siblings, 1 reply; 7+ messages in thread
From: Zhang Qiao @ 2023-01-29  1:27 UTC (permalink / raw)
  To: Chen Yu
  Cc: linux-kernel, Roman Kagan, Daniel Bristot de Oliveira,
	Ben Segall, Ingo Molnar, Waiman Long, Dietmar Eggemann,
	Peter Zijlstra, Valentin Schneider, Steven Rostedt,
	Vincent Guittot, Mel Gorman, Juri Lelli

Hi, Chenyu,

在 2023/1/29 1:27, Chen Yu 写道:
> Hi Roman, Qiao,
> On 2023-01-27 at 17:32:30 +0100, Roman Kagan wrote:
>> From: Zhang Qiao <zhangqiao22@huawei.com>
>>
>> When a scheduling entity is placed onto cfs_rq, its vruntime is pulled
>> to the base level (around cfs_rq->min_vruntime), so that the entity
>> doesn't gain extra boost when placed backwards.
>>
>> However, if the entity being placed wasn't executed for a long time, its
>> vruntime may get too far behind (e.g. while cfs_rq was executing a
>> low-weight hog), which can inverse the vruntime comparison due to s64
>> overflow.  This results in the entity being placed with its original
>> vruntime way forwards, so that it will effectively never get to the cpu.
>>
> Looks interesting,
> case 1:
>   se->vruntime = 1, cfs_rq->min_vruntime = ULONG_MAX
>   ==> max = 1
> case 2:
>   se->vruntime = 1, cfs_rq->min_vruntime = LONG_MAX
>   ==> max = LONG_MAX
> 
> May I know if the issue you described above is in case 1? We want
> the max to be ULONG_MAX but it returns 1 because of s64
> comparison? Then max = 1 is incorrectly used as se's vruntime?
> Could you please elaborate a little more about this issue?

Yes, the issue is in case 1.

For more detailed discussion, can see https://lkml.org/lkml/2022/12/21/435.

>> To prevent that, ignore the vruntime of the entity being placed if it
>> didn't execute for much longer than the characteristic sheduler time
>> scale.
>>
>> [rkagan: formatted, adjusted commit log, comments, cutoff value]
>> Co-developed-by: Roman Kagan <rkagan@amazon.de>
>> Signed-off-by: Roman Kagan <rkagan@amazon.de>
>> ---
>> @zhangqiao22, I took the liberty to put you as the author of the patch,
>> as this is essentially what you posted for discussion, with minor
>> tweaks.  Please stamp with your s-o-b if you're ok with it.
>>
>>  kernel/sched/fair.c | 15 +++++++++++++--
>>  1 file changed, 13 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 0f8736991427..d6cf131ebb0b 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -4656,6 +4656,7 @@ static void
>>  place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
>>  {
>>  	u64 vruntime = cfs_rq->min_vruntime;
>> +	u64 sleep_time;
>>  
>>  	/*
>>  	 * The 'current' period is already promised to the current tasks,
>> @@ -4685,8 +4686,18 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
>>  		vruntime -= thresh;
>>  	}
>>  
>> -	/* ensure we never gain time by being placed backwards. */
>> -	se->vruntime = max_vruntime(se->vruntime, vruntime);
>> +	/*
>> +	 * Pull vruntime of the entity being placed to the base level of
>> +	 * cfs_rq, to prevent boosting it if placed backwards.  If the entity
>> +	 * slept for a long time, don't even try to compare its vruntime with
>> +	 * the base as it may be too far off and the comparison may get
>> +	 * inversed due to s64 overflow.
>> +	 */
>> +	sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
> If I understand correctly, se->exec_start is just updated by enqueue_entity()->update_curr(cfs_rq),

When a task go to sleep, se->exec_start will update at dequeue_entity()->update_curr(cfs_rq).
And enqueue_entity()->update_curr(cfs_rq) just update current se.

Thank,
Qiao.

> then place_entity() in invoked here, I'm not sure if sleep_time above
> could reflect the real sleep time. Maybe something like:
> rq_clock_task(rq_of(cfs_rq)) - se->time_stamp_dequeued ?
> 
> thanks,
> Chenyu
> .
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] sched/fair: sanitize vruntime of entity being placed
  2023-01-29  1:27   ` Zhang Qiao
@ 2023-01-30 13:15     ` Chen Yu
  0 siblings, 0 replies; 7+ messages in thread
From: Chen Yu @ 2023-01-30 13:15 UTC (permalink / raw)
  To: Zhang Qiao
  Cc: linux-kernel, Roman Kagan, Daniel Bristot de Oliveira,
	Ben Segall, Ingo Molnar, Waiman Long, Dietmar Eggemann,
	Peter Zijlstra, Valentin Schneider, Steven Rostedt,
	Vincent Guittot, Mel Gorman, Juri Lelli

On 2023-01-29 at 09:27:07 +0800, Zhang Qiao wrote:
> Hi, Chenyu,
> 
> On 2023/1/29 1:27, Chen Yu wrote:
> > Hi Roman, Qiao,
> > On 2023-01-27 at 17:32:30 +0100, Roman Kagan wrote:
> >> From: Zhang Qiao <zhangqiao22@huawei.com>
[snip]
> > If I understand correctly, se->exec_start is just updated by enqueue_entity()->update_curr(cfs_rq),
> 
> When a task go to sleep, se->exec_start will update at dequeue_entity()->update_curr(cfs_rq).
> And enqueue_entity()->update_curr(cfs_rq) just update current se.
>
You are right. update_curr(cfs_rq) only updates the current running one and the se
has not been queued yet.

thanks,
Chenyu 

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-01-30 13:16 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-27 16:32 [PATCH] sched/fair: sanitize vruntime of entity being placed Roman Kagan
2023-01-28  6:42 ` kernel test robot
2023-01-28  7:45 ` kernel test robot
2023-01-28 10:41 ` Zhang Qiao
2023-01-28 17:27 ` Chen Yu
2023-01-29  1:27   ` Zhang Qiao
2023-01-30 13:15     ` Chen Yu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).