All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Odin Ugedal <odin@uged.al>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Sasha Levin <sashal@kernel.org>
Subject: [PATCH 5.4 49/90] sched/fair: Correctly insert cfs_rqs to list on unthrottle
Date: Mon, 21 Jun 2021 18:15:24 +0200	[thread overview]
Message-ID: <20210621154905.799543802@linuxfoundation.org> (raw)
In-Reply-To: <20210621154904.159672728@linuxfoundation.org>

From: Odin Ugedal <odin@uged.al>

[ Upstream commit a7b359fc6a37faaf472125867c8dc5a068c90982 ]

Fix an issue where fairness is decreased since cfs_rq's can end up not
being decayed properly. For two sibling control groups with the same
priority, this can often lead to a load ratio of 99/1 (!!).

This happens because when a cfs_rq is throttled, all the descendant
cfs_rq's will be removed from the leaf list. When they initial cfs_rq
is unthrottled, it will currently only re add descendant cfs_rq's if
they have one or more entities enqueued. This is not a perfect
heuristic.

Instead, we insert all cfs_rq's that contain one or more enqueued
entities, or it its load is not completely decayed.

Can often lead to situations like this for equally weighted control
groups:

  $ ps u -C stress
  USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
  root       10009 88.8  0.0   3676   100 pts/1    R+   11:04   0:13 stress --cpu 1
  root       10023  3.0  0.0   3676   104 pts/1    R+   11:04   0:00 stress --cpu 1

Fixes: 31bc6aeaab1d ("sched/fair: Optimize update_blocked_averages()")
[vingo: !SMP build fix]
Signed-off-by: Odin Ugedal <odin@uged.al>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20210612112815.61678-1-odin@uged.al
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 kernel/sched/fair.c | 44 +++++++++++++++++++++++++-------------------
 1 file changed, 25 insertions(+), 19 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d3f4113e87de..877672df822f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3131,6 +3131,24 @@ static inline void cfs_rq_util_change(struct cfs_rq *cfs_rq, int flags)
 
 #ifdef CONFIG_SMP
 #ifdef CONFIG_FAIR_GROUP_SCHED
+
+static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
+{
+	if (cfs_rq->load.weight)
+		return false;
+
+	if (cfs_rq->avg.load_sum)
+		return false;
+
+	if (cfs_rq->avg.util_sum)
+		return false;
+
+	if (cfs_rq->avg.runnable_load_sum)
+		return false;
+
+	return true;
+}
+
 /**
  * update_tg_load_avg - update the tg's load avg
  * @cfs_rq: the cfs_rq whose avg changed
@@ -3833,6 +3851,11 @@ static inline void update_misfit_status(struct task_struct *p, struct rq *rq)
 
 #else /* CONFIG_SMP */
 
+static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
+{
+	return true;
+}
+
 #define UPDATE_TG	0x0
 #define SKIP_AGE_LOAD	0x0
 #define DO_ATTACH	0x0
@@ -4488,8 +4511,8 @@ static int tg_unthrottle_up(struct task_group *tg, void *data)
 		cfs_rq->throttled_clock_task_time += rq_clock_task(rq) -
 					     cfs_rq->throttled_clock_task;
 
-		/* Add cfs_rq with already running entity in the list */
-		if (cfs_rq->nr_running >= 1)
+		/* Add cfs_rq with load or one or more already running entities to the list */
+		if (!cfs_rq_is_decayed(cfs_rq) || cfs_rq->nr_running)
 			list_add_leaf_cfs_rq(cfs_rq);
 	}
 
@@ -7620,23 +7643,6 @@ static bool __update_blocked_others(struct rq *rq, bool *done)
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
 
-static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
-{
-	if (cfs_rq->load.weight)
-		return false;
-
-	if (cfs_rq->avg.load_sum)
-		return false;
-
-	if (cfs_rq->avg.util_sum)
-		return false;
-
-	if (cfs_rq->avg.runnable_load_sum)
-		return false;
-
-	return true;
-}
-
 static bool __update_blocked_fair(struct rq *rq, bool *done)
 {
 	struct cfs_rq *cfs_rq, *pos;
-- 
2.30.2




  parent reply	other threads:[~2021-06-21 16:19 UTC|newest]

Thread overview: 98+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-21 16:14 [PATCH 5.4 00/90] 5.4.128-rc1 review Greg Kroah-Hartman
2021-06-21 16:14 ` [PATCH 5.4 01/90] dmaengine: ALTERA_MSGDMA depends on HAS_IOMEM Greg Kroah-Hartman
2021-06-21 16:14 ` [PATCH 5.4 02/90] dmaengine: QCOM_HIDMA_MGMT " Greg Kroah-Hartman
2021-06-21 16:14 ` [PATCH 5.4 03/90] dmaengine: stedma40: add missing iounmap() on error in d40_probe() Greg Kroah-Hartman
2021-06-21 16:14 ` [PATCH 5.4 04/90] afs: Fix an IS_ERR() vs NULL check Greg Kroah-Hartman
2021-06-21 16:14 ` [PATCH 5.4 05/90] mm/memory-failure: make sure wait for page writeback in memory_failure Greg Kroah-Hartman
2021-06-21 16:14 ` [PATCH 5.4 06/90] kvm: LAPIC: Restore guard to prevent illegal APIC register access Greg Kroah-Hartman
2021-06-21 16:14 ` [PATCH 5.4 07/90] batman-adv: Avoid WARN_ON timing related checks Greg Kroah-Hartman
2021-06-21 16:14 ` [PATCH 5.4 08/90] net: ipv4: fix memory leak in netlbl_cipsov4_add_std Greg Kroah-Hartman
2021-06-21 16:14 ` [PATCH 5.4 09/90] vrf: fix maximum MTU Greg Kroah-Hartman
2021-06-21 16:14 ` [PATCH 5.4 10/90] net: rds: fix memory leak in rds_recvmsg Greg Kroah-Hartman
2021-06-21 16:14 ` [PATCH 5.4 11/90] net: lantiq: disable interrupt before sheduling NAPI Greg Kroah-Hartman
2021-06-21 16:14 ` [PATCH 5.4 12/90] udp: fix race between close() and udp_abort() Greg Kroah-Hartman
2021-06-21 16:14 ` [PATCH 5.4 13/90] rtnetlink: Fix regression in bridge VLAN configuration Greg Kroah-Hartman
2021-06-21 16:14 ` [PATCH 5.4 14/90] net/sched: act_ct: handle DNAT tuple collision Greg Kroah-Hartman
2021-06-21 16:14 ` [PATCH 5.4 15/90] net/mlx5e: Remove dependency in IPsec initialization flows Greg Kroah-Hartman
2021-06-21 16:14 ` [PATCH 5.4 16/90] net/mlx5e: Fix page reclaim for dead peer hairpin Greg Kroah-Hartman
2021-06-21 16:14 ` [PATCH 5.4 17/90] net/mlx5: Consider RoCE cap before init RDMA resources Greg Kroah-Hartman
2021-06-21 16:14 ` [PATCH 5.4 18/90] net/mlx5e: allow TSO on VXLAN over VLAN topologies Greg Kroah-Hartman
2021-06-21 16:14 ` [PATCH 5.4 19/90] net/mlx5e: Block offload of outer header csum for UDP tunnels Greg Kroah-Hartman
2021-06-21 16:14 ` [PATCH 5.4 20/90] netfilter: synproxy: Fix out of bounds when parsing TCP options Greg Kroah-Hartman
2021-06-21 16:14 ` [PATCH 5.4 21/90] sch_cake: Fix out of bounds when parsing TCP options and header Greg Kroah-Hartman
2021-06-21 16:14 ` [PATCH 5.4 22/90] alx: Fix an error handling path in alx_probe() Greg Kroah-Hartman
2021-06-21 16:14 ` [PATCH 5.4 23/90] net: stmmac: dwmac1000: Fix extended MAC address registers definition Greg Kroah-Hartman
2021-06-21 16:14 ` [PATCH 5.4 24/90] net: make get_net_ns return error if NET_NS is disabled Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 25/90] qlcnic: Fix an error handling path in qlcnic_probe() Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 26/90] netxen_nic: Fix an error handling path in netxen_nic_probe() Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 27/90] net: qrtr: fix OOB Read in qrtr_endpoint_post Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 28/90] ptp: improve max_adj check against unreasonable values Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 29/90] net: cdc_ncm: switch to eth%d interface naming Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 30/90] lantiq: net: fix duplicated skb in rx descriptor ring Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 31/90] net: usb: fix possible use-after-free in smsc75xx_bind Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 32/90] net: fec_ptp: fix issue caused by refactor the fec_devtype Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 33/90] net: ipv4: fix memory leak in ip_mc_add1_src Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 34/90] net/af_unix: fix a data-race in unix_dgram_sendmsg / unix_release_sock Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 35/90] be2net: Fix an error handling path in be_probe() Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 36/90] net: hamradio: fix memory leak in mkiss_close Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 37/90] net: cdc_eem: fix tx fixup skb leak Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 38/90] cxgb4: fix wrong shift Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 39/90] bnxt_en: Rediscover PHY capabilities after firmware reset Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 40/90] bnxt_en: Call bnxt_ethtool_free() in bnxt_init_one() error path Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 41/90] icmp: dont send out ICMP messages with a source address of 0.0.0.0 Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 42/90] net: ethernet: fix potential use-after-free in ec_bhf_remove Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 43/90] regulator: bd70528: Fix off-by-one for buck123 .n_voltages setting Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 44/90] ASoC: rt5659: Fix the lost powers for the HDA header Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 45/90] spi: stm32-qspi: Always wait BUSY bit to be cleared in stm32_qspi_wait_cmd() Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 46/90] pinctrl: ralink: rt2880: avoid to error in calls is pin is already enabled Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 47/90] radeon: use memcpy_to/fromio for UVD fw upload Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 48/90] hwmon: (scpi-hwmon) shows the negative temperature properly Greg Kroah-Hartman
2021-06-21 16:15 ` Greg Kroah-Hartman [this message]
2021-06-21 16:15 ` [PATCH 5.4 50/90] can: bcm: fix infoleak in struct bcm_msg_head Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 51/90] can: bcm/raw/isotp: use per module netdevice notifier Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 52/90] can: j1939: fix Use-after-Free, hold skb ref while in use Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 53/90] can: mcba_usb: fix memory leak in mcba_usb Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 54/90] usb: core: hub: Disable autosuspend for Cypress CY7C65632 Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 55/90] tracing: Do not stop recording cmdlines when tracing is off Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 56/90] tracing: Do not stop recording comms if the trace file is being read Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 57/90] tracing: Do no increment trace_clock_global() by one Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 58/90] PCI: Mark TI C667X to avoid bus reset Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 59/90] PCI: Mark some NVIDIA GPUs " Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 60/90] PCI: aardvark: Dont rely on jiffies while holding spinlock Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 61/90] PCI: aardvark: Fix kernel panic during PIO transfer Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 62/90] PCI: Add ACS quirk for Broadcom BCM57414 NIC Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 63/90] PCI: Work around Huawei Intelligent NIC VF FLR erratum Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 64/90] KVM: x86: Immediately reset the MMU context when the SMM flag is cleared Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 65/90] ARCv2: save ABI registers across signal handling Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 66/90] x86/process: Check PF_KTHREAD and not current->mm for kernel threads Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 67/90] x86/pkru: Write hardware init value to PKRU when xstate is init Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 68/90] x86/fpu: Reset state for all signal restore failures Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 69/90] dmaengine: pl330: fix wrong usage of spinlock flags in dma_cyclc Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 70/90] cfg80211: make certificate generation more robust Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 71/90] cfg80211: avoid double free of PMSR request Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 72/90] drm/amdgpu/gfx10: enlarge CP_MEC_DOORBELL_RANGE_UPPER to cover full doorbell Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 73/90] drm/amdgpu/gfx9: fix the doorbell missing when in CGPG issue Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 74/90] net: ll_temac: Make sure to free skb when it is completely used Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 75/90] net: ll_temac: Fix TX BD buffer overwrite Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 76/90] net: bridge: fix vlan tunnel dst null pointer dereference Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 77/90] net: bridge: fix vlan tunnel dst refcnt when egressing Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 78/90] mm/slub: clarify verification reporting Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 79/90] mm/slub: fix redzoning for small allocations Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 80/90] mm/slub.c: include swab.h Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 81/90] net: stmmac: disable clocks in stmmac_remove_config_dt() Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 82/90] net: fec_ptp: add clock rate zero check Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 83/90] tools headers UAPI: Sync linux/in.h copy with the kernel sources Greg Kroah-Hartman
2021-06-21 16:15 ` [PATCH 5.4 84/90] KVM: arm/arm64: Fix KVM_VGIC_V3_ADDR_TYPE_REDIST read Greg Kroah-Hartman
2021-06-21 16:16 ` [PATCH 5.4 85/90] ARM: OMAP: replace setup_irq() by request_irq() Greg Kroah-Hartman
2021-06-21 16:16 ` [PATCH 5.4 86/90] clocksource/drivers/timer-ti-dm: Add clockevent and clocksource support Greg Kroah-Hartman
2021-06-21 16:16 ` [PATCH 5.4 87/90] clocksource/drivers/timer-ti-dm: Prepare to handle dra7 timer wrap issue Greg Kroah-Hartman
2021-06-21 16:16 ` [PATCH 5.4 88/90] clocksource/drivers/timer-ti-dm: Handle dra7 timer wrap errata i940 Greg Kroah-Hartman
2021-06-21 16:16 ` [PATCH 5.4 89/90] usb: dwc3: debugfs: Add and remove endpoint dirs dynamically Greg Kroah-Hartman
2021-06-21 16:16 ` [PATCH 5.4 90/90] usb: dwc3: core: fix kernel panic when do reboot Greg Kroah-Hartman
2021-06-21 19:16 ` [PATCH 5.4 00/90] 5.4.128-rc1 review Florian Fainelli
2021-06-22  7:10 ` Naresh Kamboju
2021-06-22  7:31 ` Samuel Zou
2021-06-22  7:58 ` Jon Hunter
2021-06-22 10:45 ` Sudip Mukherjee
2021-06-22 21:34 ` Guenter Roeck
2021-06-22 23:59 ` Shuah Khan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210621154905.799543802@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=odin@uged.al \
    --cc=peterz@infradead.org \
    --cc=sashal@kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.