* [RFC] sched/rt: Fix RT (group) throttling with nohz_full
@ 2021-02-01 9:34 Jonathan Schwender
2021-02-01 14:30 ` kernel test robot
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Jonathan Schwender @ 2021-02-01 9:34 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, vincent.guittot
Cc: dietmar.eggemann, rostedt, bsegall, mgorman, bristot,
linux-kernel, linux-rt-users
If nohz_full is enabled (more precisely HK_FLAG_TIMER is set), then
do_sched_rt_period_timer may be called on a housekeeping CPU,
which would not service the isolated CPU for a non-root cgroup
(requires a kernel with RT_GROUP_SCHEDULING).
This causes RT tasks in a non-root cgroup to get throttled
indefinitely (unless throttling is disabled) once the timer has
been moved to a housekeeping CPU.
To fix this, housekeeping CPUs now service all online CPUs
if HK_FLAG_TIMER (nohz_full) is set.
I'm not really sure how this relates to Mike Galbraith previous
commit e221d028bb08 ("sched,rt: fix isolated CPUs leaving root_task_group
indefinitely throttled"), (which is dated before the housekeeping changes,)
so I'm posting this as an RFC.
Signed-off-by: Jonathan Schwender <schwenderjonathan@gmail.com>
---
kernel/sched/rt.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 49ec096a8aa1..3185e00b828a 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -865,9 +865,16 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
* isolation is really required, the user will turn the throttle
* off to kill the perturbations it causes anyway. Meanwhile,
* this maintains functionality for boot and/or troubleshooting.
+ * If nohz_full is active and the timer was offloaded to a
+ * housekeeping CPU, sched_rt_period_mask() will not contain
+ * the isolated CPU. To prevent indefinite throttling of tasks
+ * on isolated CPUs, housekeeping CPUs service all online CPUs.
*/
- if (rt_b == &root_task_group.rt_bandwidth)
+ if (rt_b == &root_task_group.rt_bandwidth
+ || (housekeeping_enabled(HK_FLAG_TIMER)
+ && housekeeping_cpu(this_rq()->cpu, HK_FLAG_TIMER))) {
span = cpu_online_mask;
+ }
#endif
for_each_cpu(i, span) {
int enqueue = 0;
--
2.29.2
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [RFC] sched/rt: Fix RT (group) throttling with nohz_full
2021-02-01 9:34 [RFC] sched/rt: Fix RT (group) throttling with nohz_full Jonathan Schwender
@ 2021-02-01 14:30 ` kernel test robot
2021-02-01 22:54 ` kernel test robot
2021-02-02 9:00 ` [RFC v2] " Jonathan Schwender
2 siblings, 0 replies; 5+ messages in thread
From: kernel test robot @ 2021-02-01 14:30 UTC (permalink / raw)
To: kbuild-all
[-- Attachment #1: Type: text/plain, Size: 5693 bytes --]
Hi Jonathan,
[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on tip/sched/core]
[also build test ERROR on next-20210125]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Jonathan-Schwender/sched-rt-Fix-RT-group-throttling-with-nohz_full/20210201-173818
base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 7a976f77bb962ce9486e09eb839aa135619b54f3
config: h8300-randconfig-r022-20210201 (attached as .config)
compiler: h8300-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/f2f23b24036429e0d47deb121f16367a2444247e
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Jonathan-Schwender/sched-rt-Fix-RT-group-throttling-with-nohz_full/20210201-173818
git checkout f2f23b24036429e0d47deb121f16367a2444247e
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=h8300
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
kernel/sched/rt.c:669:6: warning: no previous prototype for 'sched_rt_bandwidth_account' [-Wmissing-prototypes]
669 | bool sched_rt_bandwidth_account(struct rt_rq *rt_rq)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
kernel/sched/rt.c: In function 'do_sched_rt_period_timer':
>> kernel/sched/rt.c:876:33: error: 'struct rq' has no member named 'cpu'
876 | && housekeeping_cpu(this_rq()->cpu, HK_FLAG_TIMER))) {
| ^~
vim +876 kernel/sched/rt.c
853
854 static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
855 {
856 int i, idle = 1, throttled = 0;
857 const struct cpumask *span;
858
859 span = sched_rt_period_mask();
860 #ifdef CONFIG_RT_GROUP_SCHED
861 /*
862 * FIXME: isolated CPUs should really leave the root task group,
863 * whether they are isolcpus or were isolated via cpusets, lest
864 * the timer run on a CPU which does not service all runqueues,
865 * potentially leaving other CPUs indefinitely throttled. If
866 * isolation is really required, the user will turn the throttle
867 * off to kill the perturbations it causes anyway. Meanwhile,
868 * this maintains functionality for boot and/or troubleshooting.
869 * If nohz_full is active and the timer was offloaded to a
870 * housekeeping CPU, sched_rt_period_mask() will not contain
871 * the isolated CPU. To prevent indefinite throttling of tasks
872 * on isolated CPUs, housekeeping CPUs service all online CPUs.
873 */
874 if (rt_b == &root_task_group.rt_bandwidth
875 || (housekeeping_enabled(HK_FLAG_TIMER)
> 876 && housekeeping_cpu(this_rq()->cpu, HK_FLAG_TIMER))) {
877 span = cpu_online_mask;
878 }
879 #endif
880 for_each_cpu(i, span) {
881 int enqueue = 0;
882 struct rt_rq *rt_rq = sched_rt_period_rt_rq(rt_b, i);
883 struct rq *rq = rq_of_rt_rq(rt_rq);
884 int skip;
885
886 /*
887 * When span == cpu_online_mask, taking each rq->lock
888 * can be time-consuming. Try to avoid it when possible.
889 */
890 raw_spin_lock(&rt_rq->rt_runtime_lock);
891 if (!sched_feat(RT_RUNTIME_SHARE) && rt_rq->rt_runtime != RUNTIME_INF)
892 rt_rq->rt_runtime = rt_b->rt_runtime;
893 skip = !rt_rq->rt_time && !rt_rq->rt_nr_running;
894 raw_spin_unlock(&rt_rq->rt_runtime_lock);
895 if (skip)
896 continue;
897
898 raw_spin_lock(&rq->lock);
899 update_rq_clock(rq);
900
901 if (rt_rq->rt_time) {
902 u64 runtime;
903
904 raw_spin_lock(&rt_rq->rt_runtime_lock);
905 if (rt_rq->rt_throttled)
906 balance_runtime(rt_rq);
907 runtime = rt_rq->rt_runtime;
908 rt_rq->rt_time -= min(rt_rq->rt_time, overrun*runtime);
909 if (rt_rq->rt_throttled && rt_rq->rt_time < runtime) {
910 rt_rq->rt_throttled = 0;
911 enqueue = 1;
912
913 /*
914 * When we're idle and a woken (rt) task is
915 * throttled check_preempt_curr() will set
916 * skip_update and the time between the wakeup
917 * and this unthrottle will get accounted as
918 * 'runtime'.
919 */
920 if (rt_rq->rt_nr_running && rq->curr == rq->idle)
921 rq_clock_cancel_skipupdate(rq);
922 }
923 if (rt_rq->rt_time || rt_rq->rt_nr_running)
924 idle = 0;
925 raw_spin_unlock(&rt_rq->rt_runtime_lock);
926 } else if (rt_rq->rt_nr_running) {
927 idle = 0;
928 if (!rt_rq_throttled(rt_rq))
929 enqueue = 1;
930 }
931 if (rt_rq->rt_throttled)
932 throttled = 1;
933
934 if (enqueue)
935 sched_rt_rq_enqueue(rt_rq);
936 raw_spin_unlock(&rq->lock);
937 }
938
939 if (!throttled && (!rt_bandwidth_enabled() || rt_b->rt_runtime == RUNTIME_INF))
940 return 1;
941
942 return idle;
943 }
944
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 27325 bytes --]
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC] sched/rt: Fix RT (group) throttling with nohz_full
2021-02-01 9:34 [RFC] sched/rt: Fix RT (group) throttling with nohz_full Jonathan Schwender
2021-02-01 14:30 ` kernel test robot
@ 2021-02-01 22:54 ` kernel test robot
2021-02-02 9:00 ` [RFC v2] " Jonathan Schwender
2 siblings, 0 replies; 5+ messages in thread
From: kernel test robot @ 2021-02-01 22:54 UTC (permalink / raw)
To: kbuild-all
[-- Attachment #1: Type: text/plain, Size: 6051 bytes --]
Hi Jonathan,
[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on tip/sched/core]
[also build test ERROR on next-20210125]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Jonathan-Schwender/sched-rt-Fix-RT-group-throttling-with-nohz_full/20210201-173818
base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 7a976f77bb962ce9486e09eb839aa135619b54f3
config: x86_64-randconfig-a013-20210201 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 275c6af7d7f1ed63a03d05b4484413e447133269)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install x86_64 cross compiling tool for clang build
# apt-get install binutils-x86-64-linux-gnu
# https://github.com/0day-ci/linux/commit/f2f23b24036429e0d47deb121f16367a2444247e
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Jonathan-Schwender/sched-rt-Fix-RT-group-throttling-with-nohz_full/20210201-173818
git checkout f2f23b24036429e0d47deb121f16367a2444247e
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
kernel/sched/rt.c:669:6: warning: no previous prototype for function 'sched_rt_bandwidth_account' [-Wmissing-prototypes]
bool sched_rt_bandwidth_account(struct rt_rq *rt_rq)
^
kernel/sched/rt.c:669:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
bool sched_rt_bandwidth_account(struct rt_rq *rt_rq)
^
static
>> kernel/sched/rt.c:876:35: error: no member named 'cpu' in 'struct rq'
&& housekeeping_cpu(this_rq()->cpu, HK_FLAG_TIMER))) {
~~~~~~~~~ ^
1 warning and 1 error generated.
vim +876 kernel/sched/rt.c
853
854 static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
855 {
856 int i, idle = 1, throttled = 0;
857 const struct cpumask *span;
858
859 span = sched_rt_period_mask();
860 #ifdef CONFIG_RT_GROUP_SCHED
861 /*
862 * FIXME: isolated CPUs should really leave the root task group,
863 * whether they are isolcpus or were isolated via cpusets, lest
864 * the timer run on a CPU which does not service all runqueues,
865 * potentially leaving other CPUs indefinitely throttled. If
866 * isolation is really required, the user will turn the throttle
867 * off to kill the perturbations it causes anyway. Meanwhile,
868 * this maintains functionality for boot and/or troubleshooting.
869 * If nohz_full is active and the timer was offloaded to a
870 * housekeeping CPU, sched_rt_period_mask() will not contain
871 * the isolated CPU. To prevent indefinite throttling of tasks
872 * on isolated CPUs, housekeeping CPUs service all online CPUs.
873 */
874 if (rt_b == &root_task_group.rt_bandwidth
875 || (housekeeping_enabled(HK_FLAG_TIMER)
> 876 && housekeeping_cpu(this_rq()->cpu, HK_FLAG_TIMER))) {
877 span = cpu_online_mask;
878 }
879 #endif
880 for_each_cpu(i, span) {
881 int enqueue = 0;
882 struct rt_rq *rt_rq = sched_rt_period_rt_rq(rt_b, i);
883 struct rq *rq = rq_of_rt_rq(rt_rq);
884 int skip;
885
886 /*
887 * When span == cpu_online_mask, taking each rq->lock
888 * can be time-consuming. Try to avoid it when possible.
889 */
890 raw_spin_lock(&rt_rq->rt_runtime_lock);
891 if (!sched_feat(RT_RUNTIME_SHARE) && rt_rq->rt_runtime != RUNTIME_INF)
892 rt_rq->rt_runtime = rt_b->rt_runtime;
893 skip = !rt_rq->rt_time && !rt_rq->rt_nr_running;
894 raw_spin_unlock(&rt_rq->rt_runtime_lock);
895 if (skip)
896 continue;
897
898 raw_spin_lock(&rq->lock);
899 update_rq_clock(rq);
900
901 if (rt_rq->rt_time) {
902 u64 runtime;
903
904 raw_spin_lock(&rt_rq->rt_runtime_lock);
905 if (rt_rq->rt_throttled)
906 balance_runtime(rt_rq);
907 runtime = rt_rq->rt_runtime;
908 rt_rq->rt_time -= min(rt_rq->rt_time, overrun*runtime);
909 if (rt_rq->rt_throttled && rt_rq->rt_time < runtime) {
910 rt_rq->rt_throttled = 0;
911 enqueue = 1;
912
913 /*
914 * When we're idle and a woken (rt) task is
915 * throttled check_preempt_curr() will set
916 * skip_update and the time between the wakeup
917 * and this unthrottle will get accounted as
918 * 'runtime'.
919 */
920 if (rt_rq->rt_nr_running && rq->curr == rq->idle)
921 rq_clock_cancel_skipupdate(rq);
922 }
923 if (rt_rq->rt_time || rt_rq->rt_nr_running)
924 idle = 0;
925 raw_spin_unlock(&rt_rq->rt_runtime_lock);
926 } else if (rt_rq->rt_nr_running) {
927 idle = 0;
928 if (!rt_rq_throttled(rt_rq))
929 enqueue = 1;
930 }
931 if (rt_rq->rt_throttled)
932 throttled = 1;
933
934 if (enqueue)
935 sched_rt_rq_enqueue(rt_rq);
936 raw_spin_unlock(&rq->lock);
937 }
938
939 if (!throttled && (!rt_bandwidth_enabled() || rt_b->rt_runtime == RUNTIME_INF))
940 return 1;
941
942 return idle;
943 }
944
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 40385 bytes --]
^ permalink raw reply [flat|nested] 5+ messages in thread
* [RFC v2] sched/rt: Fix RT (group) throttling with nohz_full
2021-02-01 9:34 [RFC] sched/rt: Fix RT (group) throttling with nohz_full Jonathan Schwender
2021-02-01 14:30 ` kernel test robot
2021-02-01 22:54 ` kernel test robot
@ 2021-02-02 9:00 ` Jonathan Schwender
2021-02-22 13:44 ` Sebastian Andrzej Siewior
2 siblings, 1 reply; 5+ messages in thread
From: Jonathan Schwender @ 2021-02-02 9:00 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, vincent.guittot
Cc: dietmar.eggemann, rostedt, bsegall, mgorman, bristot,
linux-kernel, linux-rt-users, kernel test robot
RFC v2 changes: Fix compile error if CONFIG_SMP is not set, which was
Reported-by: kernel test robot <lkp@intel.com>
I'm now using smp_processor_id() to get the ID of the CPU the timer
is running on.
If nohz_full is enabled (more precisely HK_FLAG_TIMER is set), then
do_sched_rt_period_timer may be called on a housekeeping CPU,
which would not service the isolated CPU for a non-root cgroup
(requires a kernel with RT_GROUP_SCHEDULING).
This causes RT tasks in a non-root cgroup to get throttled
indefinitely (unless throttling is disabled) once the timer has
been moved to a housekeeping CPU.
To fix this, housekeeping CPUs now service all online CPUs
if HK_FLAG_TIMER (nohz_full) is set.
I'm not really sure how this relates to Mike Galbraith previous
commit e221d028bb08 ("sched,rt: fix isolated CPUs leaving root_task_group
indefinitely throttled"), (which is dated before the housekeeping changes,)
so I'm posting this as an RFC.
Signed-off-by: Jonathan Schwender <schwenderjonathan@gmail.com>
---
kernel/sched/rt.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 8f720b71d13d..879abb5de023 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -866,9 +866,16 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
* isolation is really required, the user will turn the throttle
* off to kill the perturbations it causes anyway. Meanwhile,
* this maintains functionality for boot and/or troubleshooting.
+ * If nohz_full is active and the timer was offloaded to a
+ * housekeeping CPU, sched_rt_period_mask() will not contain
+ * the isolated CPU. To prevent indefinite throttling of tasks
+ * on isolated CPUs, housekeeping CPUs service all online CPUs.
*/
- if (rt_b == &root_task_group.rt_bandwidth)
+ if (rt_b == &root_task_group.rt_bandwidth
+ || (housekeeping_enabled(HK_FLAG_TIMER)
+ && housekeeping_cpu(smp_processor_id(), HK_FLAG_TIMER))) {
span = cpu_online_mask;
+ }
#endif
for_each_cpu(i, span) {
int enqueue = 0;
--
2.29.2
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [RFC v2] sched/rt: Fix RT (group) throttling with nohz_full
2021-02-02 9:00 ` [RFC v2] " Jonathan Schwender
@ 2021-02-22 13:44 ` Sebastian Andrzej Siewior
0 siblings, 0 replies; 5+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-02-22 13:44 UTC (permalink / raw)
To: Jonathan Schwender
Cc: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
rostedt, bsegall, mgorman, bristot, linux-kernel, linux-rt-users
On 2021-02-02 10:00:10 [+0100], Jonathan Schwender wrote:
> If nohz_full is enabled (more precisely HK_FLAG_TIMER is set), then
> do_sched_rt_period_timer may be called on a housekeeping CPU,
> which would not service the isolated CPU for a non-root cgroup
> (requires a kernel with RT_GROUP_SCHEDULING).
> This causes RT tasks in a non-root cgroup to get throttled
> indefinitely (unless throttling is disabled) once the timer has
> been moved to a housekeeping CPU.
> To fix this, housekeeping CPUs now service all online CPUs
> if HK_FLAG_TIMER (nohz_full) is set.
This originates from
https://lore.kernel.org/linux-rt-users/b07b6fc7-1a5f-0dcc-ca65-821de96cd8b4@gmail.com/
Could someone please take a look?
Sebastian
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2021-02-22 14:19 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-01 9:34 [RFC] sched/rt: Fix RT (group) throttling with nohz_full Jonathan Schwender
2021-02-01 14:30 ` kernel test robot
2021-02-01 22:54 ` kernel test robot
2021-02-02 9:00 ` [RFC v2] " Jonathan Schwender
2021-02-22 13:44 ` Sebastian Andrzej Siewior
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.