linux-kernel.vger.kernel.org archive mirror
* Fwd: WARNING: CPU: 13 PID: 3837105 at kernel/sched/sched.h:1561 __cfsb_csd_unthrottle+0x149/0x160
@ 2023-08-30  0:37 Bagas Sanjaya
  2023-08-30  0:42 ` Bagas Sanjaya
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Bagas Sanjaya @ 2023-08-30  0:37 UTC (permalink / raw)
  To: Hao Jia, Ben Segall, Vincent Guittot, Peter Zijlstra (Intel), Igor Raits
  Cc: Linux Kernel Mailing List, Linux Regressions, Linux Stable

Hi,

I noticed a regression report on Bugzilla [1]. Quoting from it:

> Hello, we recently got a few kernel crashes with the following backtrace. It happened on 6.4.12 (and 6.4.11, I think) but did not happen (I think) on 6.4.4.
> 
> [293790.928007] ------------[ cut here ]------------
> [293790.929905] rq->clock_update_flags & RQCF_ACT_SKIP
> [293790.929919] WARNING: CPU: 13 PID: 3837105 at kernel/sched/sched.h:1561 __cfsb_csd_unthrottle+0x149/0x160
> [293790.933694] Modules linked in: xt_owner(E) xt_REDIRECT(E) mptcp_diag(E) xsk_diag(E) raw_diag(E) unix_diag(E) af_packet_diag(E) netlink_diag(E) tcp_diag(E) udp_diag(E) inet_diag(E) rpcsec_gss_krb5(E) auth_rpcgss(E) nfsv4(E) nfs(E) lockd(E) grace(E) fscache(E) netfs(E) nbd(E) rbd(E) libceph(E) dns_resolver(E) xt_set(E) ipt_rpfilter(E) ip_set_hash_ip(E) ip_set_hash_net(E) bpf_preload(E) xt_multiport(E) veth(E) wireguard(E) libchacha20poly1305(E) chacha_x86_64(E) poly1305_x86_64(E) ip6_udp_tunnel(E) udp_tunnel(E) curve25519_x86_64(E) libcurve25519_generic(E) libchacha(E) nf_conntrack_netlink(E) xt_nat(E) xt_statistic(E) xt_addrtype(E) ipt_REJECT(E) nf_reject_ipv4(E) ip_set(E) ip_vs_sh(E) ip_vs_wrr(E) ip_vs_rr(E) ip_vs(E) xt_MASQUERADE(E) nft_chain_nat(E) xt_mark(E) xt_conntrack(E) xt_comment(E) nft_compat(E) nf_tables(E) nfnetlink(E) br_netfilter(E) bridge(E) stp(E) llc(E) iptable_nat(E) nf_nat(E) iptable_filter(E) ip_tables(E) overlay(E) dummy(E) sunrpc(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E)
> [293790.933738]  binfmt_misc(E) tls(E) isofs(E) intel_rapl_msr(E) intel_rapl_common(E) kvm_amd(E) virtio_gpu(E) ccp(E) virtio_dma_buf(E) drm_shmem_helper(E) virtio_net(E) vfat(E) kvm(E) i2c_i801(E) drm_kms_helper(E) net_failover(E) irqbypass(E) syscopyarea(E) fat(E) i2c_smbus(E) failover(E) sysfillrect(E) virtio_balloon(E) sysimgblt(E) drm(E) fuse(E) ext4(E) mbcache(E) jbd2(E) sr_mod(E) cdrom(E) sg(E) ahci(E) libahci(E) crct10dif_pclmul(E) crc32_pclmul(E) polyval_clmulni(E) libata(E) polyval_generic(E) ghash_clmulni_intel(E) sha512_ssse3(E) virtio_blk(E) serio_raw(E) btrfs(E) xor(E) zstd_compress(E) raid6_pq(E) libcrc32c(E) crc32c_intel(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
> [293790.946262] Unloaded tainted modules: edac_mce_amd(E):1
> [293790.956625] CPU: 13 PID: 3837105 Comm: QueryWorker-30f Tainted: G        W   E      6.4.12-1.gdc.el9.x86_64 #1
> [293790.957963] Hardware name: RDO OpenStack Compute/RHEL, BIOS edk2-20230301gitf80f052277c8-2.el9 03/01/2023
> [293790.959681] RIP: 0010:__cfsb_csd_unthrottle+0x149/0x160
> [293790.960933] Code: 37 fa ff 0f 0b e9 17 ff ff ff 80 3d 41 59 fc 01 00 0f 85 21 ff ff ff 48 c7 c7 98 03 95 9d c6 05 2d 59 fc 01 01 e8 77 37 fa ff <0f> 0b 41 8b 85 88 09 00 00 e9 00 ff ff ff 66 0f 1f 84 00 00 00 00
> [293790.964077] RSP: 0000:ffffb708e7217db8 EFLAGS: 00010086
> [293790.965160] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027
> [293790.966340] RDX: ffff905482c5f708 RSI: 0000000000000001 RDI: ffff905482c5f700
> [293790.967839] RBP: ffff9029bb0e9e00 R08: 0000000000000000 R09: 00000000ffff7fff
> [293790.969496] R10: ffffb708e7217c58 R11: ffffffff9e3e2c88 R12: 00000000000317c0
> [293790.970859] R13: ffff903602c317c0 R14: 0000000000000000 R15: ffff905482c726b8
> [293790.972085] FS:  00007ff3b66fe640(0000) GS:ffff905482c40000(0000) knlGS:0000000000000000
> [293790.973678] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [293790.974663] CR2: 00007f16889036c0 CR3: 0000002072e34004 CR4: 0000000000770ee0
> [293790.976108] PKRU: 55555554
> [293790.977048] Call Trace:
> [293790.978013]  <TASK>
> [293790.978678]  ? __warn+0x80/0x130
> [293790.979727]  ? __cfsb_csd_unthrottle+0x149/0x160
> [293790.980824]  ? report_bug+0x195/0x1a0
> [293790.981806]  ? handle_bug+0x3c/0x70
> [293790.982884]  ? exc_invalid_op+0x14/0x70
> [293790.983837]  ? asm_exc_invalid_op+0x16/0x20
> [293790.984626]  ? __cfsb_csd_unthrottle+0x149/0x160
> [293790.985599]  ? __cfsb_csd_unthrottle+0x149/0x160
> [293790.986583]  unregister_fair_sched_group+0x73/0x1d0
> [293790.987682]  sched_unregister_group_rcu+0x1a/0x40
> [293790.988752]  rcu_do_batch+0x199/0x4d0
> [293790.989643]  rcu_core+0x267/0x420
> [293790.990418]  __do_softirq+0xc8/0x2ab
> [293790.991285]  __irq_exit_rcu+0xb9/0xf0
> [293790.992555]  sysvec_apic_timer_interrupt+0x3c/0x90
> [293790.993477]  asm_sysvec_apic_timer_interrupt+0x16/0x20
> [293790.994171] RIP: 0033:0x7ff4dca91f60
> [293790.994801] Code: 75 15 49 8b f7 c5 f8 77 49 ba 80 6c bf f3 f4 7f 00 00 41 ff d2 eb 0d 4b 89 7c 13 f8 49 83 c2 f8 4d 89 57 70 48 8b c3 c5 f8 77 <48> 83 c4 50 5d 4d 8b 97 08 01 00 00 41 85 02 c3 49 8d 14 fc 8b 7a
> [293790.997256] RSP: 002b:00007ff3b66fd190 EFLAGS: 00000246
> [293790.998138] RAX: 0000000655fd6ed0 RBX: 0000000655fd6ed0 RCX: 0000000000000004
> [293790.999184] RDX: 0000000000000000 RSI: 000000066cf4939c RDI: 00007ff4f1180eb7
> [293791.000220] RBP: 0000000000000004 R08: 000000066cf48530 R09: 000000066cf493a8
> [293791.001274] R10: 00000000000007f0 R11: 00007ff3bc00ca80 R12: 0000000000000000
> [293791.002222] R13: 000000066cf49390 R14: 00000000cd9e9272 R15: 00007ff39c033800
> [293791.002966]  </TASK>
> [293791.003489] ---[ end trace 0000000000000000 ]---
> [293791.004440] ------------[ cut here ]------------
> [293791.005479] rq->clock_update_flags < RQCF_ACT_SKIP
> [293791.005493] WARNING: CPU: 0 PID: 3920513 at kernel/sched/sched.h:1496 update_curr+0x162/0x1d0
> 
> Sadly I don't have more info but hopefully this stacktrace will be enough.
> 

See Bugzilla for the full thread.

Anyway, I'm adding this regression to regzbot:

#regzbot introduced: ebb83d84e49b54 https://bugzilla.kernel.org/show_bug.cgi?id=217843

Thanks.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217843

-- 
An old man doll... just what I always wanted! - Clara

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Fwd: WARNING: CPU: 13 PID: 3837105 at kernel/sched/sched.h:1561 __cfsb_csd_unthrottle+0x149/0x160
  2023-08-30  0:37 Fwd: WARNING: CPU: 13 PID: 3837105 at kernel/sched/sched.h:1561 __cfsb_csd_unthrottle+0x149/0x160 Bagas Sanjaya
@ 2023-08-30  0:42 ` Bagas Sanjaya
  2023-08-30 19:16 ` Benjamin Segall
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Bagas Sanjaya @ 2023-08-30  0:42 UTC (permalink / raw)
  To: Hao Jia, Ben Segall, Vincent Guittot, Peter Zijlstra (Intel), Igor Raits
  Cc: Linux Kernel Mailing List, Linux Regressions, Linux Stable

On 30/08/2023 07:37, Bagas Sanjaya wrote:
>> [293790.956625] CPU: 13 PID: 3837105 Comm: QueryWorker-30f Tainted: G        W   E      6.4.12-1.gdc.el9.x86_64 #1
>> [293790.957963] Hardware name: RDO OpenStack Compute/RHEL, BIOS edk2-20230301gitf80f052277c8-2.el9 03/01/2023

To Igor: Does vanilla v6.4.12 also exhibit this regression?

-- 
An old man doll... just what I always wanted! - Clara



* Re: Fwd: WARNING: CPU: 13 PID: 3837105 at kernel/sched/sched.h:1561 __cfsb_csd_unthrottle+0x149/0x160
  2023-08-30  0:37 Fwd: WARNING: CPU: 13 PID: 3837105 at kernel/sched/sched.h:1561 __cfsb_csd_unthrottle+0x149/0x160 Bagas Sanjaya
  2023-08-30  0:42 ` Bagas Sanjaya
@ 2023-08-30 19:16 ` Benjamin Segall
  2023-08-31  8:48   ` [External] " Hao Jia
  2023-10-24  8:52 ` [tip: sched/core] sched/core: Fix RQCF_ACT_SKIP leak tip-bot2 for Hao Jia
  2023-11-01 10:53 ` Fwd: WARNING: CPU: 13 PID: 3837105 at kernel/sched/sched.h:1561 __cfsb_csd_unthrottle+0x149/0x160 Linux regression tracking #update (Thorsten Leemhuis)
  3 siblings, 1 reply; 10+ messages in thread
From: Benjamin Segall @ 2023-08-30 19:16 UTC (permalink / raw)
  To: Bagas Sanjaya
  Cc: Hao Jia, Vincent Guittot, Peter Zijlstra (Intel),
	Igor Raits, Linux Kernel Mailing List, Linux Regressions,
	Linux Stable

Bagas Sanjaya <bagasdotme@gmail.com> writes:

> Hi,
>
> I noticed a regression report on Bugzilla [1]. Quoting from it:
>
>> Hello, we recently got a few kernel crashes with the following backtrace. It happened on 6.4.12 (and 6.4.11, I think) but did not happen (I think) on 6.4.4.
>> 
>> [293790.928007] ------------[ cut here ]------------
>> [293790.929905] rq->clock_update_flags & RQCF_ACT_SKIP
>> [293790.929919] WARNING: CPU: 13 PID: 3837105 at kernel/sched/sched.h:1561 __cfsb_csd_unthrottle+0x149/0x160
>> [293790.933694] Modules linked in: [...]
>> [293790.946262] Unloaded tainted modules: edac_mce_amd(E):1
>> [293790.956625] CPU: 13 PID: 3837105 Comm: QueryWorker-30f Tainted: G        W   E      6.4.12-1.gdc.el9.x86_64 #1
>> [293790.957963] Hardware name: RDO OpenStack Compute/RHEL, BIOS edk2-20230301gitf80f052277c8-2.el9 03/01/2023
>> [293790.959681] RIP: 0010:__cfsb_csd_unthrottle+0x149/0x160
>
> See Bugzilla for the full thread.
>
> Anyway, I'm adding this regression to regzbot:
>
> #regzbot introduced: ebb83d84e49b54 https://bugzilla.kernel.org/show_bug.cgi?id=217843
>
> Thanks.
>
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217843

The code in question is literally "rq_lock; update_rq_clock;
rq_clock_start_loop_update (the warning)", which suggests to me that
RQCF_ACT_SKIP is somehow leaking from somewhere else?


* Re: [External] Re: Fwd: WARNING: CPU: 13 PID: 3837105 at kernel/sched/sched.h:1561 __cfsb_csd_unthrottle+0x149/0x160
  2023-08-30 19:16 ` Benjamin Segall
@ 2023-08-31  8:48   ` Hao Jia
  2023-09-04 22:23     ` Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: Hao Jia @ 2023-08-31  8:48 UTC (permalink / raw)
  To: Benjamin Segall, Bagas Sanjaya
  Cc: Vincent Guittot, Peter Zijlstra (Intel),
	Igor Raits, Linux Kernel Mailing List, Linux Regressions,
	Linux Stable



On 2023/8/31 Benjamin Segall wrote:
Hi,

> Bagas Sanjaya <bagasdotme@gmail.com> writes:
> 
>> Hi,
>>
>> I noticed a regression report on Bugzilla [1]. Quoting from it:
>>
>>> Hello, we recently got a few kernel crashes with the following backtrace. It happened on 6.4.12 (and 6.4.11, I think) but did not happen (I think) on 6.4.4.
>>>
>>> [293790.928007] ------------[ cut here ]------------
>>> [293790.929905] rq->clock_update_flags & RQCF_ACT_SKIP
>>> [293790.929919] WARNING: CPU: 13 PID: 3837105 at kernel/sched/sched.h:1561 __cfsb_csd_unthrottle+0x149/0x160
>>> [293790.933694] Modules linked in: [...]
>>> [293790.946262] Unloaded tainted modules: edac_mce_amd(E):1
>>> [293790.956625] CPU: 13 PID: 3837105 Comm: QueryWorker-30f Tainted: G        W   E      6.4.12-1.gdc.el9.x86_64 #1
>>> [293790.957963] Hardware name: RDO OpenStack Compute/RHEL, BIOS edk2-20230301gitf80f052277c8-2.el9 03/01/2023
>>> [293790.959681] RIP: 0010:__cfsb_csd_unthrottle+0x149/0x160
>>
>> See Bugzilla for the full thread.
>>
>> Anyway, I'm adding this regression to regzbot:
>>
>> #regzbot introduced: ebb83d84e49b54 https://bugzilla.kernel.org/show_bug.cgi?id=217843
>>
>> Thanks.
>>
>> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217843
> 
> The code in question is literally "rq_lock; update_rq_clock;
> rq_clock_start_loop_update (the warning)", which suggests to me that
> RQCF_ACT_SKIP is somehow leaking from somewhere else?

If I understand correctly, rq->clock_update_flags may be set to
RQCF_ACT_SKIP after __schedule() holds the rq lock, and the rq lock may
sometimes be released briefly in __schedule(), for example in
newidle_balance(). At that point another CPU can take this rq lock, and
calling rq_clock_start_loop_update() may then trigger this warning.

This warning check might be wrong. We need to add assert_clock_updated()
to check that the rq clock has been updated before calling
rq_clock_start_loop_update().

Maybe something like this?

From: Hao Jia <jiahao.os@bytedance.com>
Date: Thu, 31 Aug 2023 11:38:54 +0800
Subject: [PATCH] sched/core: Fix wrong warning check in rq_clock_start_loop_update()

Commit ebb83d84e49b54 ("sched/core: Avoid multiple
calling update_rq_clock() in __cfsb_csd_unthrottle()")
added a "rq->clock_update_flags & RQCF_ACT_SKIP" warning in
rq_clock_start_loop_update().
But this warning is inaccurate and may be triggered
incorrectly in the following situation:

     CPU0                                      CPU1

__schedule()
   *rq->clock_update_flags <<= 1;*   unregister_fair_sched_group()
   pick_next_task_fair+0x4a/0x410      destroy_cfs_bandwidth()
     newidle_balance+0x115/0x3e0       for_each_possible_cpu(i) *i=0*
       rq_unpin_lock(this_rq, rf)      __cfsb_csd_unthrottle()
       raw_spin_rq_unlock(this_rq)
                                       rq_lock(*CPU0_rq*, &rf)
                                       rq_clock_start_loop_update()
                                       rq->clock_update_flags & RQCF_ACT_SKIP <--

       raw_spin_rq_lock(this_rq)

So remove this wrong check. Add assert_clock_updated() to
check that the rq clock has been updated before calling
rq_clock_start_loop_update(). Also use the variable rq_clock_flags
in rq_clock_start_loop_update() to record the previous state of
rq->clock_update_flags, and correspondingly restore
rq->clock_update_flags from rq_clock_flags in
rq_clock_stop_loop_update() so that its previous state is not lost.

Fixes: ebb83d84e49b ("sched/core: Avoid multiple calling update_rq_clock() in __cfsb_csd_unthrottle()")
Cc: stable@vger.kernel.org
Reported-by: Igor Raits <igor.raits@gmail.com>
Reported-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Hao Jia <jiahao.os@bytedance.com>
---
  kernel/sched/fair.c  | 10 ++++++----
  kernel/sched/sched.h | 12 +++++++-----
  2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 911d0063763c..0f6557c82a4c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5679,6 +5679,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
  #ifdef CONFIG_SMP
  static void __cfsb_csd_unthrottle(void *arg)
  {
+	unsigned int rq_clock_flags;
  	struct cfs_rq *cursor, *tmp;
  	struct rq *rq = arg;
  	struct rq_flags rf;
@@ -5691,7 +5692,7 @@ static void __cfsb_csd_unthrottle(void *arg)
  	 * Do it once and skip the potential next ones.
  	 */
  	update_rq_clock(rq);
-	rq_clock_start_loop_update(rq);
+	rq_clock_start_loop_update(rq, &rq_clock_flags);

  	/*
  	 * Since we hold rq lock we're safe from concurrent manipulation of
@@ -5712,7 +5713,7 @@ static void __cfsb_csd_unthrottle(void *arg)

  	rcu_read_unlock();

-	rq_clock_stop_loop_update(rq);
+	rq_clock_stop_loop_update(rq, &rq_clock_flags);
  	rq_unlock(rq, &rf);
  }

@@ -6230,6 +6231,7 @@ static void __maybe_unused update_runtime_enabled(struct rq *rq)
  /* cpu offline callback */
  static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
  {
+	unsigned int rq_clock_flags;
  	struct task_group *tg;

  	lockdep_assert_rq_held(rq);
@@ -6239,7 +6241,7 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
  	 * set_rq_offline(), so we should skip updating
  	 * the rq clock again in unthrottle_cfs_rq().
  	 */
-	rq_clock_start_loop_update(rq);
+	rq_clock_start_loop_update(rq, &rq_clock_flags);

  	rcu_read_lock();
  	list_for_each_entry_rcu(tg, &task_groups, list) {
@@ -6264,7 +6266,7 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
  	}
  	rcu_read_unlock();

-	rq_clock_stop_loop_update(rq);
+	rq_clock_stop_loop_update(rq, &rq_clock_flags);
  }

  bool cfs_task_bw_constrained(struct task_struct *p)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 04846272409c..ff2864f202f5 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1558,20 +1558,22 @@ static inline void rq_clock_cancel_skipupdate(struct rq *rq)
   * when using list_for_each_entry_*)
   * rq_clock_start_loop_update() can be called after updating the clock
   * once and before iterating over the list to prevent multiple update.
+ * And use @rq_clock_flags to record the previous state of rq->clock_update_flags.
   * After the iterative traversal, we need to call rq_clock_stop_loop_update()
- * to clear RQCF_ACT_SKIP of rq->clock_update_flags.
+ * to restore rq->clock_update_flags through @rq_clock_flags.
   */
-static inline void rq_clock_start_loop_update(struct rq *rq)
+static inline void rq_clock_start_loop_update(struct rq *rq, unsigned int *rq_clock_flags)
  {
  	lockdep_assert_rq_held(rq);
-	SCHED_WARN_ON(rq->clock_update_flags & RQCF_ACT_SKIP);
+	assert_clock_updated(rq);
+	*rq_clock_flags = rq->clock_update_flags;
  	rq->clock_update_flags |= RQCF_ACT_SKIP;
  }

-static inline void rq_clock_stop_loop_update(struct rq *rq)
+static inline void rq_clock_stop_loop_update(struct rq *rq, unsigned int *rq_clock_flags)
  {
  	lockdep_assert_rq_held(rq);
-	rq->clock_update_flags &= ~RQCF_ACT_SKIP;
+	rq->clock_update_flags = *rq_clock_flags;
  }

  struct rq_flags {
--
2.20.1


* Re: [External] Re: Fwd: WARNING: CPU: 13 PID: 3837105 at kernel/sched/sched.h:1561 __cfsb_csd_unthrottle+0x149/0x160
  2023-08-31  8:48   ` [External] " Hao Jia
@ 2023-09-04 22:23     ` Peter Zijlstra
  2023-09-07  8:59       ` Hao Jia
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2023-09-04 22:23 UTC (permalink / raw)
  To: Hao Jia
  Cc: Benjamin Segall, Bagas Sanjaya, Vincent Guittot, Igor Raits,
	Linux Kernel Mailing List, Linux Regressions, Linux Stable

On Thu, Aug 31, 2023 at 04:48:29PM +0800, Hao Jia wrote:

> If I understand correctly, rq->clock_update_flags may be set to
> RQCF_ACT_SKIP after __schedule() holds the rq lock, and the rq lock may
> sometimes be released briefly in __schedule(), for example in
> newidle_balance(). At that point another CPU can take this rq lock, and
> calling rq_clock_start_loop_update() may then trigger this warning.
> 
> This warning check might be wrong. We need to add assert_clock_updated()
> to check that the rq clock has been updated before calling
> rq_clock_start_loop_update().
> 
> Maybe something like this?

Urgh, aside from it being white space mangled, I think this is entirely
going in the wrong direction.

Leaking ACT_SKIP is dodgy as heck.. it's entirely too late to think
clearly though, I'll have to try again tomorrow.


* Re: [External] Re: Fwd: WARNING: CPU: 13 PID: 3837105 at kernel/sched/sched.h:1561 __cfsb_csd_unthrottle+0x149/0x160
  2023-09-04 22:23     ` Peter Zijlstra
@ 2023-09-07  8:59       ` Hao Jia
  2023-09-07 21:01         ` Tim Chen
  0 siblings, 1 reply; 10+ messages in thread
From: Hao Jia @ 2023-09-07  8:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Benjamin Segall, Bagas Sanjaya, Vincent Guittot, Igor Raits,
	Linux Kernel Mailing List, Linux Regressions, Linux Stable



On 2023/9/5 Peter Zijlstra wrote:
> On Thu, Aug 31, 2023 at 04:48:29PM +0800, Hao Jia wrote:
> 
>> If I understand correctly, rq->clock_update_flags may be set to
>> RQCF_ACT_SKIP after __schedule() holds the rq lock, and the rq lock may
>> sometimes be released briefly in __schedule(), for example in
>> newidle_balance(). At that point another CPU can take this rq lock, and
>> calling rq_clock_start_loop_update() may then trigger this warning.
>>
>> This warning check might be wrong. We need to add assert_clock_updated()
>> to check that the rq clock has been updated before calling
>> rq_clock_start_loop_update().
>>
>> Maybe something like this?
> 
> Urgh, aside from it being white space mangled, I think this is entirely
> going in the wrong direction.
> 
> Leaking ACT_SKIP is dodgy as heck.. it's entirely too late to think
> clearly though, I'll have to try again tomorrow.

Hi Peter,

Do you think this fix is correct? Or should we go back to the
beginning and move update_rq_clock() out of unthrottle_cfs_rq()?

Thanks,
Hao


* Re: [External] Re: Fwd: WARNING: CPU: 13 PID: 3837105 at kernel/sched/sched.h:1561 __cfsb_csd_unthrottle+0x149/0x160
  2023-09-07  8:59       ` Hao Jia
@ 2023-09-07 21:01         ` Tim Chen
  2023-09-08  3:28           ` Hao Jia
  0 siblings, 1 reply; 10+ messages in thread
From: Tim Chen @ 2023-09-07 21:01 UTC (permalink / raw)
  To: Hao Jia, Peter Zijlstra
  Cc: Benjamin Segall, Bagas Sanjaya, Vincent Guittot, Igor Raits,
	Linux Kernel Mailing List, Linux Regressions, Linux Stable

On Thu, 2023-09-07 at 16:59 +0800, Hao Jia wrote:
> 
> On 2023/9/5 Peter Zijlstra wrote:
> > On Thu, Aug 31, 2023 at 04:48:29PM +0800, Hao Jia wrote:
> > 
> > > If I understand correctly, rq->clock_update_flags may be set to
> > > RQCF_ACT_SKIP after __schedule() holds the rq lock, and the rq lock may
> > > sometimes be released briefly in __schedule(), for example in
> > > newidle_balance(). At that point another CPU can take this rq lock, and
> > > calling rq_clock_start_loop_update() may then trigger this warning.
> > > 
> > > This warning check might be wrong. We need to add assert_clock_updated()
> > > to check that the rq clock has been updated before calling
> > > rq_clock_start_loop_update().
> > > 
> > > Maybe something like this?
> > 
> > Urgh, aside from it being white space mangled, I think this is entirely
> > going in the wrong direction.
> > 
> > Leaking ACT_SKIP is dodgy as heck.. it's entirely too late to think
> > clearly though, I'll have to try again tomorrow.

I am trying to understand why this is an ACT_SKIP leak.
Before the call to __cfsb_csd_unthrottle(), is it possible that someone
else locked the runqueue, set ACT_SKIP, and released the rq lock?
And that this someone never updated the rq clock?

> 
> Hi Peter,
> 
> Do you think this fix is correct? Or should we go back to the
> beginning and move update_rq_clock() out of unthrottle_cfs_rq()?
> 
If anyone who locks the runqueue and sets ACT_SKIP also updates the rq
clock, I think your change is okay.  Otherwise the rq clock could miss an update.

Thanks.

Tim


* Re: [External] Re: Fwd: WARNING: CPU: 13 PID: 3837105 at kernel/sched/sched.h:1561 __cfsb_csd_unthrottle+0x149/0x160
  2023-09-07 21:01         ` Tim Chen
@ 2023-09-08  3:28           ` Hao Jia
  0 siblings, 0 replies; 10+ messages in thread
From: Hao Jia @ 2023-09-08  3:28 UTC (permalink / raw)
  To: Tim Chen, Peter Zijlstra
  Cc: Benjamin Segall, Bagas Sanjaya, Vincent Guittot, Igor Raits,
	Linux Kernel Mailing List, Linux Regressions, Linux Stable



On 2023/9/8 Tim Chen wrote:
> On Thu, 2023-09-07 at 16:59 +0800, Hao Jia wrote:
>>
>> On 2023/9/5 Peter Zijlstra wrote:
>>> On Thu, Aug 31, 2023 at 04:48:29PM +0800, Hao Jia wrote:
>>>
>>>> If I understand correctly, rq->clock_update_flags may be set to
>>>> RQCF_ACT_SKIP after __schedule() holds the rq lock, and the rq lock may
>>>> sometimes be released briefly in __schedule(), for example in
>>>> newidle_balance(). At that point another CPU can take this rq lock, and
>>>> calling rq_clock_start_loop_update() may then trigger this warning.
>>>>
>>>> This warning check might be wrong. We need to add assert_clock_updated()
>>>> to check that the rq clock has been updated before calling
>>>> rq_clock_start_loop_update().
>>>>
>>>> Maybe something like this?
>>>
>>> Urgh, aside from it being white space mangled, I think this is entirely
>>> going in the wrong direction.
>>>
>>> Leaking ACT_SKIP is dodgy as heck.. it's entirely too late to think
>>> clearly though, I'll have to try again tomorrow.
> 
> I am trying to understand why this is an ACT_SKIP leak.
> Before the call to __cfsb_csd_unthrottle(), is it possible that someone
> else locked the runqueue, set ACT_SKIP, and released the rq lock?
> And that this someone never updated the rq clock?
> 

Yes, we want to set RQCF_ACT_SKIP in rq->clock_update_flags to avoid
updating the rq clock multiple times in __cfsb_csd_unthrottle().

But now that we have found the ACT_SKIP leak, we cannot unconditionally
set RQCF_ACT_SKIP in rq_clock_start_loop_update().


>>
>> Hi Peter,
>>
>> Do you think this fix is correct? Or should we go back to the
>> beginning and move update_rq_clock() out of unthrottle_cfs_rq()?
>>
> If anyone who locks the runqueue and sets ACT_SKIP also updates the rq
> clock, I think your change is okay.  Otherwise the rq clock could miss an update.
> 
> Thanks.
> 
> Tim


* [tip: sched/core] sched/core: Fix RQCF_ACT_SKIP leak
  2023-08-30  0:37 Fwd: WARNING: CPU: 13 PID: 3837105 at kernel/sched/sched.h:1561 __cfsb_csd_unthrottle+0x149/0x160 Bagas Sanjaya
  2023-08-30  0:42 ` Bagas Sanjaya
  2023-08-30 19:16 ` Benjamin Segall
@ 2023-10-24  8:52 ` tip-bot2 for Hao Jia
  2023-11-01 10:53 ` Fwd: WARNING: CPU: 13 PID: 3837105 at kernel/sched/sched.h:1561 __cfsb_csd_unthrottle+0x149/0x160 Linux regression tracking #update (Thorsten Leemhuis)
  3 siblings, 0 replies; 10+ messages in thread
From: tip-bot2 for Hao Jia @ 2023-10-24  8:52 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Igor Raits, Bagas Sanjaya, Peter Zijlstra (Intel),
	Hao Jia, stable, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     5ebde09d91707a4a9bec1e3d213e3c12ffde348f
Gitweb:        https://git.kernel.org/tip/5ebde09d91707a4a9bec1e3d213e3c12ffde348f
Author:        Hao Jia <jiahao.os@bytedance.com>
AuthorDate:    Thu, 12 Oct 2023 17:00:03 +08:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 24 Oct 2023 10:38:42 +02:00

sched/core: Fix RQCF_ACT_SKIP leak

Igor Raits and Bagas Sanjaya report a RQCF_ACT_SKIP leak warning.

This warning may be triggered in the following situations:

    CPU0                                      CPU1

__schedule()
  *rq->clock_update_flags <<= 1;*   unregister_fair_sched_group()
  pick_next_task_fair+0x4a/0x410      destroy_cfs_bandwidth()
    newidle_balance+0x115/0x3e0       for_each_possible_cpu(i) *i=0*
      rq_unpin_lock(this_rq, rf)      __cfsb_csd_unthrottle()
      raw_spin_rq_unlock(this_rq)
                                      rq_lock(*CPU0_rq*, &rf)
                                      rq_clock_start_loop_update()
                                      rq->clock_update_flags & RQCF_ACT_SKIP <--
      raw_spin_rq_lock(this_rq)

The purpose of RQCF_ACT_SKIP is to skip the rq clock update, but that
update happens very early in __schedule(), while we clear
RQCF_*_SKIP very late, so the flag spans the gap above
and triggers this warning.

In __schedule() we can clear the RQCF_*_SKIP flags immediately
after update_rq_clock() to avoid this RQCF_ACT_SKIP leak warning,
and set rq->clock_update_flags to RQCF_UPDATED to avoid the
"rq->clock_update_flags < RQCF_ACT_SKIP" warning that may be triggered later.

Fixes: ebb83d84e49b ("sched/core: Avoid multiple calling update_rq_clock() in __cfsb_csd_unthrottle()")
Closes: https://lore.kernel.org/all/20230913082424.73252-1-jiahao.os@bytedance.com
Reported-by: Igor Raits <igor.raits@gmail.com>
Reported-by: Bagas Sanjaya <bagasdotme@gmail.com>
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Hao Jia <jiahao.os@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/a5dd536d-041a-2ce9-f4b7-64d8d85c86dc@gmail.com
---
 kernel/sched/core.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 264c2eb..dc724f5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5361,8 +5361,6 @@ context_switch(struct rq *rq, struct task_struct *prev,
 	/* switch_mm_cid() requires the memory barriers above. */
 	switch_mm_cid(rq, prev, next);
 
-	rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP);
-
 	prepare_lock_switch(rq, next, rf);
 
 	/* Here we just switch the register state and the stack. */
@@ -6600,6 +6598,7 @@ static void __sched notrace __schedule(unsigned int sched_mode)
 	/* Promote REQ to ACT */
 	rq->clock_update_flags <<= 1;
 	update_rq_clock(rq);
+	rq->clock_update_flags = RQCF_UPDATED;
 
 	switch_count = &prev->nivcsw;
 
@@ -6679,8 +6678,6 @@ static void __sched notrace __schedule(unsigned int sched_mode)
 		/* Also unlocks the rq: */
 		rq = context_switch(rq, prev, next, &rf);
 	} else {
-		rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP);
-
 		rq_unpin_lock(rq, &rf);
 		__balance_callbacks(rq);
 		raw_spin_rq_unlock_irq(rq);


* Re: Fwd: WARNING: CPU: 13 PID: 3837105 at kernel/sched/sched.h:1561 __cfsb_csd_unthrottle+0x149/0x160
  2023-08-30  0:37 Fwd: WARNING: CPU: 13 PID: 3837105 at kernel/sched/sched.h:1561 __cfsb_csd_unthrottle+0x149/0x160 Bagas Sanjaya
                   ` (2 preceding siblings ...)
  2023-10-24  8:52 ` [tip: sched/core] sched/core: Fix RQCF_ACT_SKIP leak tip-bot2 for Hao Jia
@ 2023-11-01 10:53 ` Linux regression tracking #update (Thorsten Leemhuis)
  3 siblings, 0 replies; 10+ messages in thread
From: Linux regression tracking #update (Thorsten Leemhuis) @ 2023-11-01 10:53 UTC (permalink / raw)
  To: Linux Regressions; +Cc: Linux Kernel Mailing List, Linux Stable

[TLDR: This mail is primarily relevant for Linux regression tracking. A
change or fix related to the regression discussed in this thread was
posted or applied, but it did not use a Closes: tag to point to the
report, as Linus and the documentation call for. Things happen, no
worries -- but now the regression tracking bot needs to be told manually
about the fix. See link in footer if these mails annoy you.]

On 30.08.23 02:37, Bagas Sanjaya wrote:
> 
> I noticed a regression report on Bugzilla [1]. Quoting from it:
> [...]
> #regzbot introduced: ebb83d84e49b54 https://bugzilla.kernel.org/show_bug.cgi?id=217843

#regzbot fix: 5ebde09d91707a4a9bec
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.


end of thread, other threads:[~2023-11-01 10:53 UTC | newest]

Thread overview: 10+ messages
2023-08-30  0:37 Fwd: WARNING: CPU: 13 PID: 3837105 at kernel/sched/sched.h:1561 __cfsb_csd_unthrottle+0x149/0x160 Bagas Sanjaya
2023-08-30  0:42 ` Bagas Sanjaya
2023-08-30 19:16 ` Benjamin Segall
2023-08-31  8:48   ` [External] " Hao Jia
2023-09-04 22:23     ` Peter Zijlstra
2023-09-07  8:59       ` Hao Jia
2023-09-07 21:01         ` Tim Chen
2023-09-08  3:28           ` Hao Jia
2023-10-24  8:52 ` [tip: sched/core] sched/core: Fix RQCF_ACT_SKIP leak tip-bot2 for Hao Jia
2023-11-01 10:53 ` Fwd: WARNING: CPU: 13 PID: 3837105 at kernel/sched/sched.h:1561 __cfsb_csd_unthrottle+0x149/0x160 Linux regression tracking #update (Thorsten Leemhuis)
