All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] sched/rt: Fix RT (group) throttling with nohz_full
@ 2021-02-01  9:34 Jonathan Schwender
  2021-02-01 14:30 ` kernel test robot
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Jonathan Schwender @ 2021-02-01  9:34 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, vincent.guittot
  Cc: dietmar.eggemann, rostedt, bsegall, mgorman, bristot,
	linux-kernel, linux-rt-users

If nohz_full is enabled (more precisely HK_FLAG_TIMER is set), then
do_sched_rt_period_timer may be called on a housekeeping CPU,
which would not service the isolated CPU for a non-root cgroup
(requires a kernel with RT_GROUP_SCHEDULING).
This causes RT tasks in a non-root cgroup to get throttled 
indefinitely (unless throttling is disabled) once the timer has 
been moved to a housekeeping CPU.
To fix this, housekeeping CPUs now service all online CPUs 
if HK_FLAG_TIMER (nohz_full) is set.

I'm not really sure how this relates to  Mike Galbraith previous
commit e221d028bb08 ("sched,rt: fix isolated CPUs leaving root_task_group
indefinitely throttled"), (which is dated before the housekeeping changes,)
so I'm posting this as an RFC.


Signed-off-by: Jonathan Schwender <schwenderjonathan@gmail.com>
---
 kernel/sched/rt.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 49ec096a8aa1..3185e00b828a 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -865,9 +865,16 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
 	 * isolation is really required, the user will turn the throttle
 	 * off to kill the perturbations it causes anyway.  Meanwhile,
 	 * this maintains functionality for boot and/or troubleshooting.
+	 * If nohz_full is active and the timer was offloaded to a
+	 * housekeeping CPU, sched_rt_period_mask() will not contain
+	 * the isolated CPU. To prevent indefinite throttling of tasks
+	 * on isolated CPUs, housekeeping CPUs service all online CPUs.
 	 */
-	if (rt_b == &root_task_group.rt_bandwidth)
+	if (rt_b == &root_task_group.rt_bandwidth
+		|| (housekeeping_enabled(HK_FLAG_TIMER)
+			&& housekeeping_cpu(this_rq()->cpu, HK_FLAG_TIMER))) {
 		span = cpu_online_mask;
+	}
 #endif
 	for_each_cpu(i, span) {
 		int enqueue = 0;
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC] sched/rt: Fix RT (group) throttling with nohz_full
  2021-02-01  9:34 [RFC] sched/rt: Fix RT (group) throttling with nohz_full Jonathan Schwender
@ 2021-02-01 14:30 ` kernel test robot
  2021-02-01 22:54 ` kernel test robot
  2021-02-02  9:00 ` [RFC v2] " Jonathan Schwender
  2 siblings, 0 replies; 5+ messages in thread
From: kernel test robot @ 2021-02-01 14:30 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 5693 bytes --]

Hi Jonathan,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on tip/sched/core]
[also build test ERROR on next-20210125]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Jonathan-Schwender/sched-rt-Fix-RT-group-throttling-with-nohz_full/20210201-173818
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 7a976f77bb962ce9486e09eb839aa135619b54f3
config: h8300-randconfig-r022-20210201 (attached as .config)
compiler: h8300-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/f2f23b24036429e0d47deb121f16367a2444247e
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Jonathan-Schwender/sched-rt-Fix-RT-group-throttling-with-nohz_full/20210201-173818
        git checkout f2f23b24036429e0d47deb121f16367a2444247e
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=h8300 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   kernel/sched/rt.c:669:6: warning: no previous prototype for 'sched_rt_bandwidth_account' [-Wmissing-prototypes]
     669 | bool sched_rt_bandwidth_account(struct rt_rq *rt_rq)
         |      ^~~~~~~~~~~~~~~~~~~~~~~~~~
   kernel/sched/rt.c: In function 'do_sched_rt_period_timer':
>> kernel/sched/rt.c:876:33: error: 'struct rq' has no member named 'cpu'
     876 |    && housekeeping_cpu(this_rq()->cpu, HK_FLAG_TIMER))) {
         |                                 ^~


vim +876 kernel/sched/rt.c

   853	
   854	static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
   855	{
   856		int i, idle = 1, throttled = 0;
   857		const struct cpumask *span;
   858	
   859		span = sched_rt_period_mask();
   860	#ifdef CONFIG_RT_GROUP_SCHED
   861		/*
   862		 * FIXME: isolated CPUs should really leave the root task group,
   863		 * whether they are isolcpus or were isolated via cpusets, lest
   864		 * the timer run on a CPU which does not service all runqueues,
   865		 * potentially leaving other CPUs indefinitely throttled.  If
   866		 * isolation is really required, the user will turn the throttle
   867		 * off to kill the perturbations it causes anyway.  Meanwhile,
   868		 * this maintains functionality for boot and/or troubleshooting.
   869		 * If nohz_full is active and the timer was offloaded to a
   870		 * housekeeping CPU, sched_rt_period_mask() will not contain
   871		 * the isolated CPU. To prevent indefinite throttling of tasks
   872		 * on isolated CPUs, housekeeping CPUs service all online CPUs.
   873		 */
   874		if (rt_b == &root_task_group.rt_bandwidth
   875			|| (housekeeping_enabled(HK_FLAG_TIMER)
 > 876				&& housekeeping_cpu(this_rq()->cpu, HK_FLAG_TIMER))) {
   877			span = cpu_online_mask;
   878		}
   879	#endif
   880		for_each_cpu(i, span) {
   881			int enqueue = 0;
   882			struct rt_rq *rt_rq = sched_rt_period_rt_rq(rt_b, i);
   883			struct rq *rq = rq_of_rt_rq(rt_rq);
   884			int skip;
   885	
   886			/*
   887			 * When span == cpu_online_mask, taking each rq->lock
   888			 * can be time-consuming. Try to avoid it when possible.
   889			 */
   890			raw_spin_lock(&rt_rq->rt_runtime_lock);
   891			if (!sched_feat(RT_RUNTIME_SHARE) && rt_rq->rt_runtime != RUNTIME_INF)
   892				rt_rq->rt_runtime = rt_b->rt_runtime;
   893			skip = !rt_rq->rt_time && !rt_rq->rt_nr_running;
   894			raw_spin_unlock(&rt_rq->rt_runtime_lock);
   895			if (skip)
   896				continue;
   897	
   898			raw_spin_lock(&rq->lock);
   899			update_rq_clock(rq);
   900	
   901			if (rt_rq->rt_time) {
   902				u64 runtime;
   903	
   904				raw_spin_lock(&rt_rq->rt_runtime_lock);
   905				if (rt_rq->rt_throttled)
   906					balance_runtime(rt_rq);
   907				runtime = rt_rq->rt_runtime;
   908				rt_rq->rt_time -= min(rt_rq->rt_time, overrun*runtime);
   909				if (rt_rq->rt_throttled && rt_rq->rt_time < runtime) {
   910					rt_rq->rt_throttled = 0;
   911					enqueue = 1;
   912	
   913					/*
   914					 * When we're idle and a woken (rt) task is
   915					 * throttled check_preempt_curr() will set
   916					 * skip_update and the time between the wakeup
   917					 * and this unthrottle will get accounted as
   918					 * 'runtime'.
   919					 */
   920					if (rt_rq->rt_nr_running && rq->curr == rq->idle)
   921						rq_clock_cancel_skipupdate(rq);
   922				}
   923				if (rt_rq->rt_time || rt_rq->rt_nr_running)
   924					idle = 0;
   925				raw_spin_unlock(&rt_rq->rt_runtime_lock);
   926			} else if (rt_rq->rt_nr_running) {
   927				idle = 0;
   928				if (!rt_rq_throttled(rt_rq))
   929					enqueue = 1;
   930			}
   931			if (rt_rq->rt_throttled)
   932				throttled = 1;
   933	
   934			if (enqueue)
   935				sched_rt_rq_enqueue(rt_rq);
   936			raw_spin_unlock(&rq->lock);
   937		}
   938	
   939		if (!throttled && (!rt_bandwidth_enabled() || rt_b->rt_runtime == RUNTIME_INF))
   940			return 1;
   941	
   942		return idle;
   943	}
   944	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 27325 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC] sched/rt: Fix RT (group) throttling with nohz_full
  2021-02-01  9:34 [RFC] sched/rt: Fix RT (group) throttling with nohz_full Jonathan Schwender
  2021-02-01 14:30 ` kernel test robot
@ 2021-02-01 22:54 ` kernel test robot
  2021-02-02  9:00 ` [RFC v2] " Jonathan Schwender
  2 siblings, 0 replies; 5+ messages in thread
From: kernel test robot @ 2021-02-01 22:54 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 6051 bytes --]

Hi Jonathan,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on tip/sched/core]
[also build test ERROR on next-20210125]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Jonathan-Schwender/sched-rt-Fix-RT-group-throttling-with-nohz_full/20210201-173818
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 7a976f77bb962ce9486e09eb839aa135619b54f3
config: x86_64-randconfig-a013-20210201 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 275c6af7d7f1ed63a03d05b4484413e447133269)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # https://github.com/0day-ci/linux/commit/f2f23b24036429e0d47deb121f16367a2444247e
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Jonathan-Schwender/sched-rt-Fix-RT-group-throttling-with-nohz_full/20210201-173818
        git checkout f2f23b24036429e0d47deb121f16367a2444247e
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   kernel/sched/rt.c:669:6: warning: no previous prototype for function 'sched_rt_bandwidth_account' [-Wmissing-prototypes]
   bool sched_rt_bandwidth_account(struct rt_rq *rt_rq)
        ^
   kernel/sched/rt.c:669:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   bool sched_rt_bandwidth_account(struct rt_rq *rt_rq)
   ^
   static 
>> kernel/sched/rt.c:876:35: error: no member named 'cpu' in 'struct rq'
                           && housekeeping_cpu(this_rq()->cpu, HK_FLAG_TIMER))) {
                                               ~~~~~~~~~  ^
   1 warning and 1 error generated.


vim +876 kernel/sched/rt.c

   853	
   854	static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
   855	{
   856		int i, idle = 1, throttled = 0;
   857		const struct cpumask *span;
   858	
   859		span = sched_rt_period_mask();
   860	#ifdef CONFIG_RT_GROUP_SCHED
   861		/*
   862		 * FIXME: isolated CPUs should really leave the root task group,
   863		 * whether they are isolcpus or were isolated via cpusets, lest
   864		 * the timer run on a CPU which does not service all runqueues,
   865		 * potentially leaving other CPUs indefinitely throttled.  If
   866		 * isolation is really required, the user will turn the throttle
   867		 * off to kill the perturbations it causes anyway.  Meanwhile,
   868		 * this maintains functionality for boot and/or troubleshooting.
   869		 * If nohz_full is active and the timer was offloaded to a
   870		 * housekeeping CPU, sched_rt_period_mask() will not contain
   871		 * the isolated CPU. To prevent indefinite throttling of tasks
   872		 * on isolated CPUs, housekeeping CPUs service all online CPUs.
   873		 */
   874		if (rt_b == &root_task_group.rt_bandwidth
   875			|| (housekeeping_enabled(HK_FLAG_TIMER)
 > 876				&& housekeeping_cpu(this_rq()->cpu, HK_FLAG_TIMER))) {
   877			span = cpu_online_mask;
   878		}
   879	#endif
   880		for_each_cpu(i, span) {
   881			int enqueue = 0;
   882			struct rt_rq *rt_rq = sched_rt_period_rt_rq(rt_b, i);
   883			struct rq *rq = rq_of_rt_rq(rt_rq);
   884			int skip;
   885	
   886			/*
   887			 * When span == cpu_online_mask, taking each rq->lock
   888			 * can be time-consuming. Try to avoid it when possible.
   889			 */
   890			raw_spin_lock(&rt_rq->rt_runtime_lock);
   891			if (!sched_feat(RT_RUNTIME_SHARE) && rt_rq->rt_runtime != RUNTIME_INF)
   892				rt_rq->rt_runtime = rt_b->rt_runtime;
   893			skip = !rt_rq->rt_time && !rt_rq->rt_nr_running;
   894			raw_spin_unlock(&rt_rq->rt_runtime_lock);
   895			if (skip)
   896				continue;
   897	
   898			raw_spin_lock(&rq->lock);
   899			update_rq_clock(rq);
   900	
   901			if (rt_rq->rt_time) {
   902				u64 runtime;
   903	
   904				raw_spin_lock(&rt_rq->rt_runtime_lock);
   905				if (rt_rq->rt_throttled)
   906					balance_runtime(rt_rq);
   907				runtime = rt_rq->rt_runtime;
   908				rt_rq->rt_time -= min(rt_rq->rt_time, overrun*runtime);
   909				if (rt_rq->rt_throttled && rt_rq->rt_time < runtime) {
   910					rt_rq->rt_throttled = 0;
   911					enqueue = 1;
   912	
   913					/*
   914					 * When we're idle and a woken (rt) task is
   915					 * throttled check_preempt_curr() will set
   916					 * skip_update and the time between the wakeup
   917					 * and this unthrottle will get accounted as
   918					 * 'runtime'.
   919					 */
   920					if (rt_rq->rt_nr_running && rq->curr == rq->idle)
   921						rq_clock_cancel_skipupdate(rq);
   922				}
   923				if (rt_rq->rt_time || rt_rq->rt_nr_running)
   924					idle = 0;
   925				raw_spin_unlock(&rt_rq->rt_runtime_lock);
   926			} else if (rt_rq->rt_nr_running) {
   927				idle = 0;
   928				if (!rt_rq_throttled(rt_rq))
   929					enqueue = 1;
   930			}
   931			if (rt_rq->rt_throttled)
   932				throttled = 1;
   933	
   934			if (enqueue)
   935				sched_rt_rq_enqueue(rt_rq);
   936			raw_spin_unlock(&rq->lock);
   937		}
   938	
   939		if (!throttled && (!rt_bandwidth_enabled() || rt_b->rt_runtime == RUNTIME_INF))
   940			return 1;
   941	
   942		return idle;
   943	}
   944	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 40385 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [RFC v2] sched/rt: Fix RT (group) throttling with nohz_full
  2021-02-01  9:34 [RFC] sched/rt: Fix RT (group) throttling with nohz_full Jonathan Schwender
  2021-02-01 14:30 ` kernel test robot
  2021-02-01 22:54 ` kernel test robot
@ 2021-02-02  9:00 ` Jonathan Schwender
  2021-02-22 13:44   ` Sebastian Andrzej Siewior
  2 siblings, 1 reply; 5+ messages in thread
From: Jonathan Schwender @ 2021-02-02  9:00 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, vincent.guittot
  Cc: dietmar.eggemann, rostedt, bsegall, mgorman, bristot,
	linux-kernel, linux-rt-users, kernel test robot


RFC v2 changes: Fix compile error if CONFIG_SMP is not set, which was
Reported-by: kernel test robot <lkp@intel.com>
I'm now using smp_processor_id() to get the ID of the CPU the timer 
is running on.


If nohz_full is enabled (more precisely HK_FLAG_TIMER is set), then
do_sched_rt_period_timer may be called on a housekeeping CPU,
which would not service the isolated CPU for a non-root cgroup
(requires a kernel with RT_GROUP_SCHEDULING).
This causes RT tasks in a non-root cgroup to get throttled
indefinitely (unless throttling is disabled) once the timer has
been moved to a housekeeping CPU.
To fix this, housekeeping CPUs now service all online CPUs
if HK_FLAG_TIMER (nohz_full) is set.

I'm not really sure how this relates to  Mike Galbraith previous
commit e221d028bb08 ("sched,rt: fix isolated CPUs leaving root_task_group
indefinitely throttled"), (which is dated before the housekeeping changes,)
so I'm posting this as an RFC.

Signed-off-by: Jonathan Schwender <schwenderjonathan@gmail.com>
---
 kernel/sched/rt.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 8f720b71d13d..879abb5de023 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -866,9 +866,16 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
 	 * isolation is really required, the user will turn the throttle
 	 * off to kill the perturbations it causes anyway.  Meanwhile,
 	 * this maintains functionality for boot and/or troubleshooting.
+	 * If nohz_full is active and the timer was offloaded to a
+	 * housekeeping CPU, sched_rt_period_mask() will not contain
+	 * the isolated CPU. To prevent indefinite throttling of tasks
+	 * on isolated CPUs, housekeeping CPUs service all online CPUs.
 	 */
-	if (rt_b == &root_task_group.rt_bandwidth)
+	if (rt_b == &root_task_group.rt_bandwidth
+		|| (housekeeping_enabled(HK_FLAG_TIMER)
+			&& housekeeping_cpu(smp_processor_id(), HK_FLAG_TIMER))) {
 		span = cpu_online_mask;
+	}
 #endif
 	for_each_cpu(i, span) {
 		int enqueue = 0;
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC v2] sched/rt: Fix RT (group) throttling with nohz_full
  2021-02-02  9:00 ` [RFC v2] " Jonathan Schwender
@ 2021-02-22 13:44   ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 5+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-02-22 13:44 UTC (permalink / raw)
  To: Jonathan Schwender
  Cc: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, linux-kernel, linux-rt-users

On 2021-02-02 10:00:10 [+0100], Jonathan Schwender wrote:
> If nohz_full is enabled (more precisely HK_FLAG_TIMER is set), then
> do_sched_rt_period_timer may be called on a housekeeping CPU,
> which would not service the isolated CPU for a non-root cgroup
> (requires a kernel with RT_GROUP_SCHEDULING).
> This causes RT tasks in a non-root cgroup to get throttled
> indefinitely (unless throttling is disabled) once the timer has
> been moved to a housekeeping CPU.
> To fix this, housekeeping CPUs now service all online CPUs
> if HK_FLAG_TIMER (nohz_full) is set.

This originates from
   https://lore.kernel.org/linux-rt-users/b07b6fc7-1a5f-0dcc-ca65-821de96cd8b4@gmail.com/

Could someone please take a look?

Sebastian

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-02-22 14:19 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-01  9:34 [RFC] sched/rt: Fix RT (group) throttling with nohz_full Jonathan Schwender
2021-02-01 14:30 ` kernel test robot
2021-02-01 22:54 ` kernel test robot
2021-02-02  9:00 ` [RFC v2] " Jonathan Schwender
2021-02-22 13:44   ` Sebastian Andrzej Siewior

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.