linux-kernel.vger.kernel.org archive mirror
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Xuewei Zhang <xueweiz@google.com>, Phil Auld <pauld@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Anton Blanchard <anton@ozlabs.org>,
	Ben Segall <bsegall@google.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Mel Gorman <mgorman@suse.de>,
	Steven Rostedt <rostedt@goodmis.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Ingo Molnar <mingo@kernel.org>, Sasha Levin <sashal@kernel.org>
Subject: [PATCH AUTOSEL 5.3 63/89] sched/fair: Scale bandwidth quota and period without losing quota/period ratio precision
Date: Fri, 18 Oct 2019 18:02:58 -0400
Message-ID: <20191018220324.8165-63-sashal@kernel.org>
In-Reply-To: <20191018220324.8165-1-sashal@kernel.org>

From: Xuewei Zhang <xueweiz@google.com>

[ Upstream commit 4929a4e6faa0f13289a67cae98139e727f0d4a97 ]

The quota/period ratio is used to ensure a child task group won't get
more bandwidth than the parent task group, and is calculated as:

  normalize_cfs_quota() = [(quota_us << 20) / period_us]

When sched_cfs_period_timer() scales up a group's period and quota, any
change to the quota/period ratio caused by precision loss makes the
parent and child task groups inconsistent.

For example:

A userspace container manager (kubelet) performs three operations:

 1) Create a parent cgroup, set quota to 1,000us and period to 10,000us.
 2) Create a few child cgroups.
 3) Set quota to 1,000us and period to 10,000us on a child cgroup.

These operations are expected to succeed. However, if the period timer
scales the parent cgroup by 147/128 before step 3, the parent's quota
and period change to:

  new_quota: 1148437ns,   1148us
 new_period: 11484375ns, 11484us

When step 3 then runs, the child cgroup's ratio is 104857, which is
larger than the parent cgroup's ratio (104821), so setting the child's
quota fails.

Scaling both quota and period by a factor of 2 instead avoids this
precision loss and fixes the problem.

Tested-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Xuewei Zhang <xueweiz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Phil Auld <pauld@redhat.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Fixes: 2e8e19226398 ("sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup")
Link: https://lkml.kernel.org/r/20191004001243.140897-1-xueweiz@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 kernel/sched/fair.c | 36 ++++++++++++++++++++++--------------
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 86cfc5d5129ce..16b5d29bd7300 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4995,20 +4995,28 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 		if (++count > 3) {
 			u64 new, old = ktime_to_ns(cfs_b->period);
 
-			new = (old * 147) / 128; /* ~115% */
-			new = min(new, max_cfs_quota_period);
-
-			cfs_b->period = ns_to_ktime(new);
-
-			/* since max is 1s, this is limited to 1e9^2, which fits in u64 */
-			cfs_b->quota *= new;
-			cfs_b->quota = div64_u64(cfs_b->quota, old);
-
-			pr_warn_ratelimited(
-	"cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us %lld, cfs_quota_us = %lld)\n",
-				smp_processor_id(),
-				div_u64(new, NSEC_PER_USEC),
-				div_u64(cfs_b->quota, NSEC_PER_USEC));
+			/*
+			 * Grow period by a factor of 2 to avoid losing precision.
+			 * Precision loss in the quota/period ratio can cause __cfs_schedulable
+			 * to fail.
+			 */
+			new = old * 2;
+			if (new < max_cfs_quota_period) {
+				cfs_b->period = ns_to_ktime(new);
+				cfs_b->quota *= 2;
+
+				pr_warn_ratelimited(
+	"cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us = %lld, cfs_quota_us = %lld)\n",
+					smp_processor_id(),
+					div_u64(new, NSEC_PER_USEC),
+					div_u64(cfs_b->quota, NSEC_PER_USEC));
+			} else {
+				pr_warn_ratelimited(
+	"cfs_period_timer[cpu%d]: period too short, but cannot scale up without losing precision (cfs_period_us = %lld, cfs_quota_us = %lld)\n",
+					smp_processor_id(),
+					div_u64(old, NSEC_PER_USEC),
+					div_u64(cfs_b->quota, NSEC_PER_USEC));
+			}
 
 			/* reset count so we don't come right back in here */
 			count = 0;
-- 
2.20.1


Thread overview: 91+ messages
2019-10-18 22:01 [PATCH AUTOSEL 5.3 01/89] iio: adc: meson_saradc: Fix memory allocation order Sasha Levin
2019-10-18 22:01 ` [PATCH AUTOSEL 5.3 02/89] iio: fix center temperature of bmc150-accel-core Sasha Levin
2019-10-18 22:01 ` [PATCH AUTOSEL 5.3 03/89] libsubcmd: Make _FORTIFY_SOURCE defines dependent on the feature Sasha Levin
2019-10-18 22:01 ` [PATCH AUTOSEL 5.3 04/89] perf tests: Avoid raising SEGV using an obvious NULL dereference Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 05/89] perf map: Fix overlapped map handling Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 06/89] perf script brstackinsn: Fix recovery from LBR/binary mismatch Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 07/89] perf jevents: Fix period for Intel fixed counters Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 08/89] perf tools: Propagate get_cpuid() error Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 09/89] perf annotate: Propagate perf_env__arch() error Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 10/89] perf annotate: Fix the signedness of failure returns Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 11/89] perf annotate: Propagate the symbol__annotate() error return Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 12/89] perf annotate: Fix arch specific ->init() failure errors Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 13/89] perf annotate: Return appropriate error code for allocation failures Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 14/89] perf annotate: Don't return -1 for error when doing BPF disassembly Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 15/89] staging: rtl8188eu: fix null dereference when kzalloc fails Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 16/89] crypto: arm/aes-ce - add dependency on AES library Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 17/89] RDMA/siw: Fix serialization issue in write_space() Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 18/89] RDMA/hfi1: Prevent memory leak in sdma_init Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 19/89] RDMA/iw_cxgb4: fix SRQ access from dump_qp() Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 20/89] RDMA/iwcm: Fix a lock inversion issue Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 21/89] HID: hyperv: Use in-place iterator API in the channel callback Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 22/89] kselftest: exclude failed TARGETS from runlist Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 23/89] selftests/kselftest/runner.sh: Add 45 second timeout per test Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 24/89] nfs: Fix nfsi->nrequests count error on nfs_inode_remove_request Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 25/89] arm64: cpufeature: Effectively expose FRINT capability to userspace Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 26/89] arm64: Fix incorrect irqflag restore for priority masking for compat Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 27/89] arm64: ftrace: Ensure synchronisation in PLT setup for Neoverse-N1 #1542419 Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 28/89] tty: serial: owl: Fix the link time qualifier of 'owl_uart_exit()' Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 29/89] tty: serial: rda: Fix the link time qualifier of 'rda_uart_exit()' Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 30/89] serial/sifive: select SERIAL_EARLYCON Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 31/89] tty: n_hdlc: fix build on SPARC Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 32/89] misc: fastrpc: prevent memory leak in fastrpc_dma_buf_attach Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 33/89] RDMA/core: Fix an error handling path in 'res_get_common_doit()' Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 34/89] RDMA/cm: Fix memory leak in cm_add/remove_one Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 35/89] RDMA/cxgb4: Do not dma memory off of the stack Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 36/89] RDMA/nldev: Reshuffle the code to avoid need to rebind QP in error path Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 37/89] RDMA/mlx5: Do not allow rereg of a ODP MR Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 38/89] RDMA/mlx5: Order num_pending_prefetch properly with synchronize_srcu Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 39/89] RDMA/mlx5: Add missing synchronize_srcu() for MW cases Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 40/89] gpio: max77620: Use correct unit for debounce times Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 41/89] fs: cifs: mute -Wunused-const-variable message Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 42/89] arm64: vdso32: Fix broken compat vDSO build warnings Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 43/89] arm64: vdso32: Detect binutils support for dmb ishld Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 44/89] serial: mctrl_gpio: Check for NULL pointer Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 45/89] serial: 8250_omap: Fix gpio check for auto RTS/CTS Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 46/89] arm64: Default to building compat vDSO with clang when CONFIG_CC_IS_CLANG Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 47/89] arm64: vdso32: Don't use KBUILD_CPPFLAGS unconditionally Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 48/89] efi/cper: Fix endianness of PCIe class code Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 49/89] efi/x86: Do not clean dummy variable in kexec path Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 50/89] kbuild: fix build error of 'make nsdeps' in clean tree Sasha Levin
2019-10-19  0:14   ` Masahiro Yamada
2019-10-29  9:09     ` Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 51/89] MIPS: include: Mark __cmpxchg as __always_inline Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 52/89] riscv: avoid kernel hangs when trapped in BUG() Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 53/89] riscv: avoid sending a SIGTRAP to a user thread trapped in WARN() Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 54/89] riscv: Correct the handling of unexpected ebreak in do_trap_break() Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 55/89] x86/xen: Return from panic notifier Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 56/89] ocfs2: clear zero in unaligned direct IO Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 57/89] fs: ocfs2: fix possible null-pointer dereferences in ocfs2_xa_prepare_entry() Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 58/89] fs: ocfs2: fix a possible null-pointer dereference in ocfs2_write_end_nolock() Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 59/89] fs: ocfs2: fix a possible null-pointer dereference in ocfs2_info_scan_inode_alloc() Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 60/89] btrfs: silence maybe-uninitialized warning in clone_range Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 61/89] arm64: armv8_deprecated: Checking return value for memory allocation Sasha Levin
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 62/89] x86/cpu: Add Comet Lake to the Intel CPU models header Sasha Levin
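2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 63/89] sched/fair: Scale bandwidth quota and period without losing quota/period ratio precision Sasha Levin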
2019-10-18 22:02 ` [PATCH AUTOSEL 5.3 64/89] sched/vtime: Fix guest/system mis-accounting on task switch Sasha Levin
2019-10-18 22:03 ` [PATCH AUTOSEL 5.3 65/89] perf/core: Rework memory accounting in perf_mmap() Sasha Levin
2019-10-18 22:03 ` [PATCH AUTOSEL 5.3 66/89] perf/core: Fix corner case in perf_rotate_context() Sasha Levin
2019-10-18 22:03 ` [PATCH AUTOSEL 5.3 67/89] perf/x86/amd: Change/fix NMI latency mitigation to use a timestamp Sasha Levin
2019-10-18 22:03 ` [PATCH AUTOSEL 5.3 68/89] drm/amdgpu: fix memory leak Sasha Levin
2019-10-18 22:03 ` [PATCH AUTOSEL 5.3 69/89] iio: adc: hx711: fix bug in sampling of data Sasha Levin
2019-10-18 22:03 ` [PATCH AUTOSEL 5.3 70/89] iio: accel: adxl372: Fix/remove limitation for FIFO samples Sasha Levin
2019-10-18 22:03 ` [PATCH AUTOSEL 5.3 71/89] iio: accel: adxl372: Fix push to buffers lost samples Sasha Levin
2019-10-18 22:03 ` [PATCH AUTOSEL 5.3 72/89] iio: accel: adxl372: Perform a reset at start up Sasha Levin
2019-10-18 22:03 ` [PATCH AUTOSEL 5.3 73/89] iio: imu: adis16400: release allocated memory on failure Sasha Levin
2019-10-18 22:03 ` [PATCH AUTOSEL 5.3 74/89] iio: imu: adis16400: fix memory leak Sasha Levin
2019-10-18 22:03 ` [PATCH AUTOSEL 5.3 75/89] iio: imu: st_lsm6dsx: fix waitime for st_lsm6dsx i2c controller Sasha Levin
2019-10-18 22:03 ` [PATCH AUTOSEL 5.3 76/89] iio: light: fix vcnl4000 devicetree hooks Sasha Levin
2019-10-18 22:03 ` [PATCH AUTOSEL 5.3 77/89] iio: light: add missing vcnl4040 of_compatible Sasha Levin
2019-10-18 22:03 ` [PATCH AUTOSEL 5.3 78/89] iio: adc: ad799x: fix probe error handling Sasha Levin
2019-10-18 22:03 ` [PATCH AUTOSEL 5.3 79/89] iio: light: opt3001: fix mutex unlock race Sasha Levin
2019-10-18 22:03 ` [PATCH AUTOSEL 5.3 80/89] MIPS: include: Mark __xchg as __always_inline Sasha Levin
2019-10-18 22:03 ` [PATCH AUTOSEL 5.3 81/89] MIPS: fw: sni: Fix out of bounds init of o32 stack Sasha Levin
2019-10-18 22:03 ` [PATCH AUTOSEL 5.3 82/89] s390/cio: fix virtio-ccw DMA without PV Sasha Levin
2019-10-18 22:03 ` [PATCH AUTOSEL 5.3 83/89] USB: usb-skeleton: fix use-after-free after driver unbind Sasha Levin
2019-10-18 22:03 ` [PATCH AUTOSEL 5.3 84/89] virt: vbox: fix memory leak in hgcm_call_preprocess_linaddr Sasha Levin
2019-10-18 22:03 ` [PATCH AUTOSEL 5.3 85/89] nbd: fix possible sysfs duplicate warning Sasha Levin
2019-10-18 22:03 ` [PATCH AUTOSEL 5.3 86/89] NFSv4: Fix leak of clp->cl_acceptor string Sasha Levin
2019-10-18 22:03 ` [PATCH AUTOSEL 5.3 87/89] SUNRPC: fix race to sk_err after xs_error_report Sasha Levin
2019-10-18 22:03 ` [PATCH AUTOSEL 5.3 88/89] s390/uaccess: avoid (false positive) compiler warnings Sasha Levin
2019-10-18 22:03 ` [PATCH AUTOSEL 5.3 89/89] tracing: Initialize iter->seq after zeroing in tracing_read_pipe() Sasha Levin
