linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Michael Wang <yun.wang@linux.alibaba.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Sasha Levin <sashal@kernel.org>
Subject: [PATCH AUTOSEL 5.4 19/46] sched: Avoid scale real weight down to zero
Date: Thu,  9 Apr 2020 23:48:42 -0400	[thread overview]
Message-ID: <20200410034909.8922-19-sashal@kernel.org> (raw)
In-Reply-To: <20200410034909.8922-1-sashal@kernel.org>

From: Michael Wang <yun.wang@linux.alibaba.com>

[ Upstream commit 26cf52229efc87e2effa9d788f9b33c40fb3358a ]

During our testing, we found a case that shares no longer
working correctly, the cgroup topology is like:

  /sys/fs/cgroup/cpu/A		(shares=102400)
  /sys/fs/cgroup/cpu/A/B	(shares=2)
  /sys/fs/cgroup/cpu/A/B/C	(shares=1024)

  /sys/fs/cgroup/cpu/D		(shares=1024)
  /sys/fs/cgroup/cpu/D/E	(shares=1024)
  /sys/fs/cgroup/cpu/D/E/F	(shares=1024)

The same benchmark is running in group C & F, no other tasks are
running, the benchmark is capable to consumed all the CPUs.

We suppose the group C will win more CPU resources since it could
enjoy all the shares of group A, but it's F who wins much more.

The reason is because we have group B with shares as 2, since
A->cfs_rq.load.weight == B->se.load.weight == B->shares/nr_cpus,
so A->cfs_rq.load.weight become very small.

And in calc_group_shares() we calculate shares as:

  load = max(scale_load_down(cfs_rq->load.weight), cfs_rq->avg.load_avg);
  shares = (tg_shares * load) / tg_weight;

Since the 'cfs_rq->load.weight' is too small, the load become 0
after scale down, although 'tg_shares' is 102400, shares of the se
which stand for group A on root cfs_rq become 2.

While the se of D on root cfs_rq is far more bigger than 2, so it
wins the battle.

Thus when scale_load_down() scale real weight down to 0, it's no
longer telling the real story, the caller will have the wrong
information and the calculation will be buggy.

This patch add check in scale_load_down(), so the real weight will
be >= MIN_SHARES after scale, after applied the group C wins as
expected.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/38e8e212-59a1-64b2-b247-b6d0b52d8dc1@linux.alibaba.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 kernel/sched/sched.h | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index e5e2605778c97..c7e7481968bfa 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -118,7 +118,13 @@ extern long calc_load_fold_active(struct rq *this_rq, long adjust);
 #ifdef CONFIG_64BIT
 # define NICE_0_LOAD_SHIFT	(SCHED_FIXEDPOINT_SHIFT + SCHED_FIXEDPOINT_SHIFT)
 # define scale_load(w)		((w) << SCHED_FIXEDPOINT_SHIFT)
-# define scale_load_down(w)	((w) >> SCHED_FIXEDPOINT_SHIFT)
+# define scale_load_down(w) \
+({ \
+	unsigned long __w = (w); \
+	if (__w) \
+		__w = max(2UL, __w >> SCHED_FIXEDPOINT_SHIFT); \
+	__w; \
+})
 #else
 # define NICE_0_LOAD_SHIFT	(SCHED_FIXEDPOINT_SHIFT)
 # define scale_load(w)		(w)
-- 
2.20.1


  parent reply	other threads:[~2020-04-10  3:56 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-10  3:48 [PATCH AUTOSEL 5.4 01/46] cpufreq: imx6q: Fixes unwanted cpu overclocking on i.MX6ULL Sasha Levin
2020-04-10  3:48 ` [PATCH AUTOSEL 5.4 02/46] staging: wilc1000: avoid double unlocking of 'wilc->hif_cs' mutex Sasha Levin
2020-04-10  3:48 ` [PATCH AUTOSEL 5.4 03/46] media: venus: hfi_parser: Ignore HEVC encoding for V1 Sasha Levin
2020-04-10  3:48 ` [PATCH AUTOSEL 5.4 04/46] firmware: arm_sdei: fix double-lock on hibernate with shared events Sasha Levin
2020-04-10  3:48 ` [PATCH AUTOSEL 5.4 05/46] null_blk: Fix the null_add_dev() error path Sasha Levin
2020-04-10  3:48 ` [PATCH AUTOSEL 5.4 06/46] null_blk: Handle null_add_dev() failures properly Sasha Levin
2020-04-10  3:48 ` [PATCH AUTOSEL 5.4 07/46] null_blk: fix spurious IO errors after failed past-wp access Sasha Levin
2020-04-10  3:48 ` [PATCH AUTOSEL 5.4 08/46] media: imx: imx7_mipi_csis: Power off the source when stopping streaming Sasha Levin
2020-04-10  3:48 ` [PATCH AUTOSEL 5.4 09/46] media: imx: imx7-media-csi: Fix video field handling Sasha Levin
2020-04-10  3:48 ` [PATCH AUTOSEL 5.4 10/46] xhci: bail out early if driver can't accress host in resume Sasha Levin
2020-04-10  3:48 ` [PATCH AUTOSEL 5.4 11/46] ACPI: EC: Do not clear boot_ec_is_ecdt in acpi_ec_add() Sasha Levin
2020-04-10  3:48 ` [PATCH AUTOSEL 5.4 12/46] x86: Don't let pgprot_modify() change the page encryption bit Sasha Levin
2020-04-10  3:48 ` [PATCH AUTOSEL 5.4 13/46] dma-mapping: Fix dma_pgprot() for unencrypted coherent pages Sasha Levin
2020-04-10  3:48 ` [PATCH AUTOSEL 5.4 14/46] block: keep bdi->io_pages in sync with max_sectors_kb for stacked devices Sasha Levin
2020-04-10  3:48 ` [PATCH AUTOSEL 5.4 15/46] debugfs: Check module state before warning in {full/open}_proxy_open() Sasha Levin
2020-04-10  3:48 ` [PATCH AUTOSEL 5.4 16/46] irqchip/versatile-fpga: Handle chained IRQs properly Sasha Levin
2020-04-10  3:48 ` [PATCH AUTOSEL 5.4 17/46] time/sched_clock: Expire timer in hardirq context Sasha Levin
2020-04-10  3:48 ` [PATCH AUTOSEL 5.4 18/46] media: allegro: fix type of gop_length in channel_create message Sasha Levin
2020-04-10  3:48 ` Sasha Levin [this message]
2020-04-10  3:48 ` [PATCH AUTOSEL 5.4 20/46] selftests/x86/ptrace_syscall_32: Fix no-vDSO segfault Sasha Levin
2020-04-10  3:48 ` [PATCH AUTOSEL 5.4 21/46] PCI/switchtec: Fix init_completion race condition with poll_wait() Sasha Levin
2020-04-10  3:48 ` [PATCH AUTOSEL 5.4 22/46] block, bfq: move forward the getting of an extra ref in bfq_bfqq_move Sasha Levin
2020-04-10  3:48 ` [PATCH AUTOSEL 5.4 23/46] media: i2c: video-i2c: fix build errors due to 'imply hwmon' Sasha Levin
2020-04-10  3:48 ` [PATCH AUTOSEL 5.4 24/46] libata: Remove extra scsi_host_put() in ata_scsi_add_hosts() Sasha Levin
2020-04-10  3:48 ` [PATCH AUTOSEL 5.4 25/46] pstore/platform: fix potential mem leak if pstore_init_fs failed Sasha Levin
2020-04-10  3:48 ` [PATCH AUTOSEL 5.4 26/46] gfs2: Do log_flush in gfs2_ail_empty_gl even if ail list is empty Sasha Levin
2020-04-10  3:48 ` [PATCH AUTOSEL 5.4 27/46] gfs2: Don't demote a glock until its revokes are written Sasha Levin
2020-04-10  3:48 ` [PATCH AUTOSEL 5.4 28/46] cpufreq: imx6q: fix error handling Sasha Levin
2020-04-10  3:48 ` [PATCH AUTOSEL 5.4 29/46] x86/boot: Use unsigned comparison for addresses Sasha Levin
2020-04-10  3:48 ` [PATCH AUTOSEL 5.4 30/46] efi/x86: Ignore the memory attributes table on i386 Sasha Levin
2020-04-10  3:48 ` [PATCH AUTOSEL 5.4 31/46] genirq/irqdomain: Check pointer in irq_domain_alloc_irqs_hierarchy() Sasha Levin
2020-04-10  3:48 ` [PATCH AUTOSEL 5.4 32/46] blk-mq: Keep set->nr_hw_queues and set->map[].nr_queues in sync Sasha Levin
2020-04-10  3:48 ` [PATCH AUTOSEL 5.4 33/46] block: Fix use-after-free issue accessing struct io_cq Sasha Levin
2020-04-10  3:48 ` [PATCH AUTOSEL 5.4 34/46] media: i2c: ov5695: Fix power on and off sequences Sasha Levin
2020-04-10  3:48 ` [PATCH AUTOSEL 5.4 35/46] usb: dwc3: core: add support for disabling SS instances in park mode Sasha Levin
2020-04-10  3:48 ` [PATCH AUTOSEL 5.4 36/46] irqchip/gic-v4: Provide irq_retrigger to avoid circular locking dependency Sasha Levin
2020-04-10  3:49 ` [PATCH AUTOSEL 5.4 37/46] md: check arrays is suspended in mddev_detach before call quiesce operations Sasha Levin
2020-04-10  3:49 ` [PATCH AUTOSEL 5.4 38/46] firmware: fix a double abort case with fw_load_sysfs_fallback Sasha Levin
2020-04-10  3:49 ` [PATCH AUTOSEL 5.4 39/46] spi: spi-fsl-dspi: Replace interruptible wait queue with a simple completion Sasha Levin
2020-04-10  3:49 ` [PATCH AUTOSEL 5.4 40/46] locking/lockdep: Avoid recursion in lockdep_count_{for,back}ward_deps() Sasha Levin
2020-04-10  3:49 ` [PATCH AUTOSEL 5.4 41/46] block, bfq: fix use-after-free in bfq_idle_slice_timer_body Sasha Levin
2020-04-10  3:49 ` [PATCH AUTOSEL 5.4 42/46] btrfs: hold a ref on the root in btrfs_recover_relocation Sasha Levin
2020-04-10  3:49 ` [PATCH AUTOSEL 5.4 43/46] btrfs: qgroup: ensure qgroup_rescan_running is only set when the worker is at least queued Sasha Levin
2020-04-10  3:49 ` [PATCH AUTOSEL 5.4 44/46] btrfs: remove a BUG_ON() from merge_reloc_roots() Sasha Levin
2020-04-10  3:49 ` [PATCH AUTOSEL 5.4 45/46] btrfs: restart relocate_tree_blocks properly Sasha Levin
2020-04-10  3:49 ` [PATCH AUTOSEL 5.4 46/46] btrfs: track reloc roots based on their commit root bytenr Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200410034909.8922-19-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=stable@vger.kernel.org \
    --cc=vincent.guittot@linaro.org \
    --cc=yun.wang@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).