* Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
@ 2023-10-05  7:31 Biju Das
  2023-10-05  7:58 ` Biju Das
  2023-10-05 15:02 ` Peter Zijlstra
  0 siblings, 2 replies; 30+ messages in thread
From: Biju Das @ 2023-10-05  7:31 UTC (permalink / raw)
  To: m.szyprowski, bsegall, peterz
  Cc: bristot, bsegall, chris.hyser, corbet, dietmar.eggemann, efault,
	gregkh, joel, joshdon, juri.lelli, kprateek.nayak, linux-kernel,
	mingo, patrick.bellasi, Pavel Machek, peterz, pjt, qperret,
	qyousef, rostedt, tglx, tim.c.chen, timj, vincent.guittot,
	youssefesmat, yu.c.chen, mgorman, linux-renesas-soc

Subject: Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
Date: Wed, 4 Oct 2023 22:39:39 +0200	[thread overview]
Message-ID: <c92bc8a6-225d-4fd2-88b5-8994090fb2de@samsung.com> (raw)
In-Reply-To: <xm261qego72d.fsf_-_@google.com>

Hi,

On 30.09.2023 02:09, Benjamin Segall wrote:
> The old pick_eevdf could fail to find the actual earliest eligible
> deadline when it descended to the right looking for min_deadline, but it
> turned out that that min_deadline wasn't actually eligible. In that case
> we need to go back and search through any left branches we skipped
> looking for the actual best _eligible_ min_deadline.
>
> This is more expensive, but still O(log n), and at worst should only
> involve descending two branches of the rbtree.
>
> I've run this through a userspace stress test (thank you
> tools/lib/rbtree.c), so hopefully this implementation doesn't miss any
> corner cases.
>
> Fixes: 147f3efaa241 ("sched/fair: Implement an EEVDF-like scheduling policy")
> Signed-off-by: Ben Segall <bsegall@google.com>
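
[ In code terms, the recovery pass described above reduces to the loop below,
  a simplified sketch assembled from the patch fragments quoted later in this
  thread (see __pick_eevdf() for the real thing); best_left is the best
  min_deadline among the eligible left branches skipped on the way down:

	node = &best_left->run_node;
	while (node) {
		struct sched_entity *se = __node_2_se(node);

		/* min_deadline is the current node */
		if (se->deadline == se->min_deadline)
			return se;

		/* min_deadline is in the left branch */
		if (node->rb_left &&
		    __node_2_se(node->rb_left)->min_deadline == se->min_deadline) {
			node = node->rb_left;
			continue;
		}

		/* else min_deadline is in the right branch */
		node = node->rb_right;
	}
]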

This patch is causing issues [1] on the Renesas RZ/G2L SMARC EVK platform. Reverting the patch fixes the warning messages.

[1]
[   25.550898] EEVDF scheduling fail, picking leftmost

[   15.109634] ======================================================
[   15.109636] WARNING: possible circular locking dependency detected
[   15.109641] 6.6.0-rc4-next-20231005-arm64-renesas-ga03f9ebbbb4c #1165 Not tainted
[   15.109648] ------------------------------------------------------
[   15.109649] migration/0/16 is trying to acquire lock:
[   15.109654] ffff800081713460 (console_owner){..-.}-{0:0}, at: console_flush_all.constprop.0+0x1a0/0x438
[   15.109694]
[   15.109694] but task is already holding lock:
[   15.109697] ffff00007fbd2298 (&rq->__lock){-.-.}-{2:2}, at: __schedule+0xd0/0xbe0
[   15.109718]
[   15.109718] which lock already depends on the new lock.
[   15.109718]
[   15.109720]
[   15.109720] the existing dependency chain (in reverse order) is:

[   25.551560]  __down_trylock_console_sem+0x34/0xb8
[   25.551567]  console_trylock+0x24/0x74
[   25.551574]  vprintk_emit+0x114/0x388
[   25.551581]  vprintk_default+0x34/0x3c
[   25.551588]  vprintk+0x9c/0xb4
[   25.551594]  _printk+0x58/0x7c
[   25.551600]  pick_next_task_fair+0x274/0x480
[   25.551608]  __schedule+0x154/0xbe0
[   25.551616]  schedule+0x48/0x110
[   25.551623]  worker_thread+0x1b8/0x3f8
[   25.551630]  kthread+0x114/0x118
[   25.551635]  ret_from_fork+0x10/0x20
[  OK  ] Started System Logging Service.
[   26.099203] EEVDF scheduling fail, picking leftmost

Cheers,
Biju



* RE: Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-05  7:31 [PATCH] sched/fair: fix pick_eevdf to always find the correct se Biju Das
@ 2023-10-05  7:58 ` Biju Das
       [not found]   ` <ZR6A/lsQMP+ymD1f@chenyu5-mobl2.ccr.corp.intel.com>
  2023-10-05 15:02 ` Peter Zijlstra
  1 sibling, 1 reply; 30+ messages in thread
From: Biju Das @ 2023-10-05  7:58 UTC (permalink / raw)
  To: m.szyprowski, bsegall, peterz
  Cc: bristot, bsegall, chris.hyser, corbet, dietmar.eggemann, efault,
	gregkh, joel, joshdon, juri.lelli, kprateek.nayak, linux-kernel,
	mingo, patrick.bellasi, Pavel Machek, peterz, pjt, qperret,
	qyousef, rostedt, tglx, tim.c.chen, timj, vincent.guittot,
	youssefesmat, yu.c.chen, mgorman, linux-renesas-soc,
	Geert Uytterhoeven

Hi all,

> -----Original Message-----
> From: Biju Das
> Sent: Thursday, October 5, 2023 8:32 AM
> Subject: Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct
> se
> 
> Subject: Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct
> se
> Date: Wed, 4 Oct 2023 22:39:39 +0200	[thread overview]
> Message-ID: <c92bc8a6-225d-4fd2-88b5-8994090fb2de@samsung.com> (raw)
> In-Reply-To: <xm261qego72d.fsf_-_@google.com>
> 
> Hi,
> 
> On 30.09.2023 02:09, Benjamin Segall wrote:
> > The old pick_eevdf could fail to find the actual earliest eligible
> > deadline when it descended to the right looking for min_deadline, but
> > it turned out that that min_deadline wasn't actually eligible. In that
> > case we need to go back and search through any left branches we
> > skipped looking for the actual best _eligible_ min_deadline.
> >
> > This is more expensive, but still O(log n), and at worst should only
> > involve descending two branches of the rbtree.
> >
> > I've run this through a userspace stress test (thank you
> > tools/lib/rbtree.c), so hopefully this implementation doesn't miss any
> > corner cases.
> >
> > Fixes: 147f3efaa241 ("sched/fair: Implement an EEVDF-like scheduling
> > policy")
> > Signed-off-by: Ben Segall <bsegall@google.com>
> 
> This patch is causing issues [1] on the Renesas RZ/G2L SMARC EVK platform.
> Reverting the patch fixes the warning messages.
> 
> [1]
> [   25.550898] EEVDF scheduling fail, picking leftmost
> 
> [   15.109634] ======================================================
> [   15.109636] WARNING: possible circular locking dependency detected
> [   15.109641] 6.6.0-rc4-next-20231005-arm64-renesas-ga03f9ebbbb4c #1165
> Not tainted
> [   15.109648] ------------------------------------------------------
> [   15.109649] migration/0/16 is trying to acquire lock:
> [   15.109654] ffff800081713460 (console_owner){..-.}-{0:0}, at:
> console_flush_all.constprop.0+0x1a0/0x438
> [   15.109694]
> [   15.109694] but task is already holding lock:
> [   15.109697] ffff00007fbd2298 (&rq->__lock){-.-.}-{2:2}, at:
> __schedule+0xd0/0xbe0
> [   15.109718]
> [   15.109718] which lock already depends on the new lock.
> [   15.109718]
> [   15.109720]
> [   15.109720] the existing dependency chain (in reverse order) is:
> 
> [   25.551560]  __down_trylock_console_sem+0x34/0xb8
> [   25.551567]  console_trylock+0x24/0x74
> [   25.551574]  vprintk_emit+0x114/0x388
> [   25.551581]  vprintk_default+0x34/0x3c
> [   25.551588]  vprintk+0x9c/0xb4
> [   25.551594]  _printk+0x58/0x7c
> [   25.551600]  pick_next_task_fair+0x274/0x480
> [   25.551608]  __schedule+0x154/0xbe0
> [   25.551616]  schedule+0x48/0x110
> [   25.551623]  worker_thread+0x1b8/0x3f8
> [   25.551630]  kthread+0x114/0x118
> [   25.551635]  ret_from_fork+0x10/0x20
> [  OK  ] Started System Logging Service.
> [   26.099203] EEVDF scheduling fail, picking leftmost

The complete log can be found here:

https://pastebin.com/zZkRFiSf

Cheers,
Biju


* RE: Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
       [not found]   ` <ZR6A/lsQMP+ymD1f@chenyu5-mobl2.ccr.corp.intel.com>
@ 2023-10-05  9:31     ` Biju Das
  0 siblings, 0 replies; 30+ messages in thread
From: Biju Das @ 2023-10-05  9:31 UTC (permalink / raw)
  To: Chen Yu; +Cc: Geert Uytterhoeven, linux-renesas-soc

> Subject: Re: Re: [PATCH] sched/fair: fix pick_eevdf to always find the
> correct se
> 
> Hi Biju,
> 
> On 2023-10-05 at 07:58:04 +0000, Biju Das wrote:
> > Hi all,
> >
> > > -----Original Message-----
> > > From: Biju Das
> > > Sent: Thursday, October 5, 2023 8:32 AM
> > > Subject: Re: [PATCH] sched/fair: fix pick_eevdf to always find the
> > > correct se
> > >
> > > Subject: Re: [PATCH] sched/fair: fix pick_eevdf to always find the
> > > correct se
> > > Date: Wed, 4 Oct 2023 22:39:39 +0200	[thread overview]
> > > Message-ID: <c92bc8a6-225d-4fd2-88b5-8994090fb2de@samsung.com> (raw)
> > > In-Reply-To: <xm261qego72d.fsf_-_@google.com>
> > >
> > > Hi,
> > >
> > > On 30.09.2023 02:09, Benjamin Segall wrote:
> > > > The old pick_eevdf could fail to find the actual earliest eligible
> > > > deadline when it descended to the right looking for min_deadline,
> > > > but it turned out that that min_deadline wasn't actually eligible.
> > > > In that case we need to go back and search through any left
> > > > branches we skipped looking for the actual best _eligible_
> min_deadline.
> > > >
> > > > This is more expensive, but still O(log n), and at worst should
> > > > only involve descending two branches of the rbtree.
> > > >
> > > > I've run this through a userspace stress test (thank you
> > > > tools/lib/rbtree.c), so hopefully this implementation doesn't miss
> > > > any corner cases.
> > > >
> > > > Fixes: 147f3efaa241 ("sched/fair: Implement an EEVDF-like
> > > > scheduling
> > > > policy")
> > > > Signed-off-by: Ben Segall <bsegall@google.com>
> > >
> > > This patch is causing issues [1] on the Renesas RZ/G2L SMARC EVK platform.
> > > Reverting the patch fixes the warning messages.
> > >
> 
> Thank you for reporting this. It seems to be directly triggered by the
> pr_err in __pick_eevdf(). May I have the kernel config file you are using?
> I'm trying to reproduce this issue on a server.

It is reproducible with both [1] and [2]. The logs I shared are from [2].

[1] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/arch/arm64/configs/defconfig?h=next-20231005

[2] https://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel.git/tree/arch/arm64/configs/renesas_defconfig

Cheers,
Biju


* Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-05  7:31 [PATCH] sched/fair: fix pick_eevdf to always find the correct se Biju Das
  2023-10-05  7:58 ` Biju Das
@ 2023-10-05 15:02 ` Peter Zijlstra
  2023-10-05 15:08   ` Biju Das
  2023-10-05 15:11   ` [PATCH] sched/fair: fix pick_eevdf to always find the correct se Peter Zijlstra
  1 sibling, 2 replies; 30+ messages in thread
From: Peter Zijlstra @ 2023-10-05 15:02 UTC (permalink / raw)
  To: Biju Das
  Cc: m.szyprowski, bsegall, bristot, chris.hyser, corbet,
	dietmar.eggemann, efault, gregkh, joel, joshdon, juri.lelli,
	kprateek.nayak, linux-kernel, mingo, patrick.bellasi,
	Pavel Machek, pjt, qperret, qyousef, rostedt, tglx, tim.c.chen,
	timj, vincent.guittot, youssefesmat, yu.c.chen, mgorman,
	linux-renesas-soc

On Thu, Oct 05, 2023 at 07:31:34AM +0000, Biju Das wrote:

> [   26.099203] EEVDF scheduling fail, picking leftmost

This, that's the problem.. the rest is just noise because printk stinks.

Weirdly, I have not seen that trigger, and I've been running with this
patch on for a few days now :/


* RE: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-05 15:02 ` Peter Zijlstra
@ 2023-10-05 15:08   ` Biju Das
  2023-10-06  8:36     ` Marek Szyprowski
       [not found]     ` <553e2ee4-ab3a-4635-a74f-0ba4cc03f3f9@samsung.com>
  2023-10-05 15:11   ` [PATCH] sched/fair: fix pick_eevdf to always find the correct se Peter Zijlstra
  1 sibling, 2 replies; 30+ messages in thread
From: Biju Das @ 2023-10-05 15:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: m.szyprowski, bsegall, bristot, chris.hyser, corbet,
	dietmar.eggemann, efault, gregkh, joel, joshdon, juri.lelli,
	kprateek.nayak, linux-kernel, mingo, patrick.bellasi,
	Pavel Machek, pjt, qperret, qyousef, rostedt, tglx, tim.c.chen,
	timj, vincent.guittot, youssefesmat, yu.c.chen, mgorman,
	linux-renesas-soc

Hi Peter Zijlstra,

> Subject: Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct
> se
> 
> On Thu, Oct 05, 2023 at 07:31:34AM +0000, Biju Das wrote:
> 
> > [   26.099203] EEVDF scheduling fail, picking leftmost
> 
> This, that's the problem.. the rest is just noise because printk stinks.
> 
> Weirdly, I have not seen that trigger, and I've been running with this patch
> on for a few days now :/

I agree, the original issue is "EEVDF scheduling fail, picking leftmost",
which is triggering noisy lock warning messages during boot.

Two platforms are affected, both ARM (Renesas and Samsung).
Maybe other platforms are affected too.

I am happy to test if a fix becomes available for this issue.

Cheers,
Biju


* Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-05 15:02 ` Peter Zijlstra
  2023-10-05 15:08   ` Biju Das
@ 2023-10-05 15:11   ` Peter Zijlstra
  1 sibling, 0 replies; 30+ messages in thread
From: Peter Zijlstra @ 2023-10-05 15:11 UTC (permalink / raw)
  To: Biju Das
  Cc: m.szyprowski, bsegall, bristot, chris.hyser, corbet,
	dietmar.eggemann, efault, gregkh, joel, joshdon, juri.lelli,
	kprateek.nayak, linux-kernel, mingo, patrick.bellasi,
	Pavel Machek, pjt, qperret, qyousef, rostedt, tglx, tim.c.chen,
	timj, vincent.guittot, youssefesmat, yu.c.chen, mgorman,
	linux-renesas-soc

On Thu, Oct 05, 2023 at 05:02:58PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 05, 2023 at 07:31:34AM +0000, Biju Das wrote:
> 
> > [   26.099203] EEVDF scheduling fail, picking leftmost
> 
> This, that's the problem.. the rest is just noise because printk stinks.
> 
> Weirdly, I have not seen that trigger, and I've been running with this
> patch on for a few days now :/

I've killed the patch for now, will try again once we figure out what
goes sideways.


Thanks!


* Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-05 15:08   ` Biju Das
@ 2023-10-06  8:36     ` Marek Szyprowski
  2023-10-06  9:38       ` Biju Das
       [not found]     ` <553e2ee4-ab3a-4635-a74f-0ba4cc03f3f9@samsung.com>
  1 sibling, 1 reply; 30+ messages in thread
From: Marek Szyprowski @ 2023-10-06  8:36 UTC (permalink / raw)
  To: Biju Das, Peter Zijlstra
  Cc: bsegall, bristot, chris.hyser, corbet, dietmar.eggemann, efault,
	gregkh, joel, joshdon, juri.lelli, kprateek.nayak, linux-kernel,
	mingo, patrick.bellasi, Pavel Machek, pjt, qperret, qyousef,
	rostedt, tglx, tim.c.chen, timj, vincent.guittot, youssefesmat,
	yu.c.chen, mgorman, linux-renesas-soc

On 05.10.2023 17:08, Biju Das wrote:
> Hi Peter Zijlstra,
>
>> Subject: Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct
>> se
>>
>> On Thu, Oct 05, 2023 at 07:31:34AM +0000, Biju Das wrote:
>>
>>> [   26.099203] EEVDF scheduling fail, picking leftmost
>> This, that's the problem.. the rest is just noise because printk stinks.
>>
>> Weirdly, I have not seen that trigger, and I've been running with this patch
>> on for a few days now :/
> I agree, the original issue is "EEVDF scheduling fail, picking leftmost",
> which is triggering noisy lock warning messages during boot.
>
> Two platforms are affected, both ARM (Renesas and Samsung).
> Maybe other platforms are affected too.


Just to note, I've run into this issue on QEMU's 'arm64/virt' platform,
not on Samsung-specific hardware.


You can easily reproduce it with the following steps:

# make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- defconfig

# ./scripts/config -e PROVE_LOCKING -e DEBUG_ATOMIC_SLEEP -e PM_DEBUG -e PM_ADVANCED_DEBUG

# make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- olddefconfig Image.gz

# wget https://cloud.debian.org/images/cloud/buster/20230802-1460/debian-10-nocloud-arm64-20230802-1460.tar.xz

# tar xJfv debian-10-nocloud-arm64-20230802-1460.tar.xz


Then run QEMU a few times until you see the lockdep splat:

# qemu-system-aarch64 -serial stdio -kernel arch/arm64/boot/Image \
    -append "console=ttyAMA0 root=/dev/vda1 rootwait" -M virt -cpu cortex-a57 \
    -smp 2 -m 1024 -netdev user,id=user -device virtio-net-device,netdev=user \
    -display none -device virtio-blk-device,drive=virtio-blk0 \
    -drive file=disk.raw,id=virtio-blk0,if=none,format=raw


I have no idea whether this is ARM64-specific, but this is how I
reproduced it. I hope this guide helps fix the bug!

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



* RE: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-06  8:36     ` Marek Szyprowski
@ 2023-10-06  9:38       ` Biju Das
  0 siblings, 0 replies; 30+ messages in thread
From: Biju Das @ 2023-10-06  9:38 UTC (permalink / raw)
  To: Marek Szyprowski, Peter Zijlstra
  Cc: bsegall, bristot, chris.hyser, corbet, dietmar.eggemann, efault,
	gregkh, joel, joshdon, juri.lelli, kprateek.nayak, linux-kernel,
	mingo, patrick.bellasi, Pavel Machek, pjt, qperret, qyousef,
	rostedt, tglx, tim.c.chen, timj, vincent.guittot, youssefesmat,
	yu.c.chen, mgorman, linux-renesas-soc

> Subject: Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct
> se
> 
> On 05.10.2023 17:08, Biju Das wrote:
> > Hi Peter Zijlstra,
> >
> >> Subject: Re: [PATCH] sched/fair: fix pick_eevdf to always find the
> >> correct se
> >>
> >> On Thu, Oct 05, 2023 at 07:31:34AM +0000, Biju Das wrote:
> >>
> >>> [   26.099203] EEVDF scheduling fail, picking leftmost
> >> This, that's the problem.. the rest is just noise because printk stinks.
> >>
> >> Weirdly, I have not seen that trigger, and I've been running with this
> >> patch on for a few days now :/
> > I agree, the original issue is "EEVDF scheduling fail, picking leftmost",
> > which is triggering noisy lock warning messages during boot.
> >
> > Two platforms are affected, both ARM (Renesas and Samsung).
> > Maybe other platforms are affected too.
> 
> 
> Just to note, I've run into this issue on QEMU's 'arm64/virt' platform,
> not on Samsung-specific hardware.

OK, if needed I am happy to test on the real hardware once a fix is found for this issue. It is 100% reproducible on the Renesas RZ/G2L SMARC EVK platform.

Cheers,
Biju


* Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
       [not found]     ` <553e2ee4-ab3a-4635-a74f-0ba4cc03f3f9@samsung.com>
@ 2023-10-06 10:31       ` Mike Galbraith
  2023-10-06 14:00         ` Peter Zijlstra
  0 siblings, 1 reply; 30+ messages in thread
From: Mike Galbraith @ 2023-10-06 10:31 UTC (permalink / raw)
  To: Marek Szyprowski, Biju Das, Peter Zijlstra
  Cc: bsegall, bristot, chris.hyser, corbet, dietmar.eggemann, gregkh,
	joel, joshdon, juri.lelli, kprateek.nayak, linux-kernel, mingo,
	patrick.bellasi, Pavel Machek, pjt, qperret, qyousef, rostedt,
	tglx, tim.c.chen, timj, vincent.guittot, youssefesmat, yu.c.chen,
	mgorman, linux-renesas-soc

On Fri, 2023-10-06 at 10:35 +0200, Marek Szyprowski wrote:
>
>
> Just to note, I've run into this issue on QEMU's 'arm64/virt' platform,
> not on Samsung-specific hardware.

It doesn't appear to be arch-specific; all I have to do is enable
autogroup, on a raspberry or x86_64 desktop box, and the occasional failure
trips over task groups, leaving both best and best_left with identical
->min_deadline values that are not what we're looking for.
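
[ For anyone wanting to reproduce this: autogrouping can be toggled at runtime
  through the standard sysctl -- a minimal sketch, assuming a kernel built
  with CONFIG_SCHED_AUTOGROUP=y:

	# group tasks by session so that group reweighting comes into play
	echo 1 > /proc/sys/kernel/sched_autogroup_enabled
]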

	-Mike


* Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-06 10:31       ` Mike Galbraith
@ 2023-10-06 14:00         ` Peter Zijlstra
  2023-10-06 14:39           ` Mike Galbraith
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2023-10-06 14:00 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Marek Szyprowski, Biju Das, bsegall, bristot, chris.hyser,
	corbet, dietmar.eggemann, gregkh, joel, joshdon, juri.lelli,
	kprateek.nayak, linux-kernel, mingo, patrick.bellasi,
	Pavel Machek, pjt, qperret, qyousef, rostedt, tglx, tim.c.chen,
	timj, vincent.guittot, youssefesmat, yu.c.chen, mgorman,
	linux-renesas-soc

On Fri, Oct 06, 2023 at 12:31:28PM +0200, Mike Galbraith wrote:
> On Fri, 2023-10-06 at 10:35 +0200, Marek Szyprowski wrote:
> >
> >
> > Just to note, I've run into this issue on QEMU's 'arm64/virt' platform,
> > not on Samsung-specific hardware.
> 
> It doesn't appear to be arch-specific; all I have to do is enable
> autogroup, on a raspberry or x86_64 desktop box, and the occasional failure
> trips over task groups, leaving both best and best_left with identical
> ->min_deadline values that are not what we're looking for.

OK, autogroups enabled and booted, /debug/sched/debug shows autogroups
are indeed in existence.

I've run hackbench and a kernel build, but no whoopsie yet :-(

I suppose I'll kick some benchmarks and go make a cup 'o tea or
something.


* Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-06 14:00         ` Peter Zijlstra
@ 2023-10-06 14:39           ` Mike Galbraith
  2023-10-06 15:55             ` Peter Zijlstra
  0 siblings, 1 reply; 30+ messages in thread
From: Mike Galbraith @ 2023-10-06 14:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Marek Szyprowski, Biju Das, bsegall, bristot, chris.hyser,
	corbet, dietmar.eggemann, gregkh, joel, joshdon, juri.lelli,
	kprateek.nayak, linux-kernel, mingo, patrick.bellasi,
	Pavel Machek, pjt, qperret, qyousef, rostedt, tglx, tim.c.chen,
	timj, vincent.guittot, youssefesmat, yu.c.chen, mgorman,
	linux-renesas-soc

On Fri, 2023-10-06 at 16:00 +0200, Peter Zijlstra wrote:
> On Fri, Oct 06, 2023 at 12:31:28PM +0200, Mike Galbraith wrote:
> > On Fri, 2023-10-06 at 10:35 +0200, Marek Szyprowski wrote:
> > >
> > >
> > > Just to note, I've run into this issue on QEMU's 'arm64/virt' platform,
> > > not on Samsung-specific hardware.
> >
> > It doesn't appear to be arch-specific; all I have to do is enable
> > autogroup, on a raspberry or x86_64 desktop box, and the occasional failure
> > trips over task groups, leaving both best and best_left with identical
> > ->min_deadline values that are not what we're looking for.
>
> OK, autogroups enabled and booted, /debug/sched/debug shows autogroups
> are indeed in existence.
>
> I've run hackbench and a kernel build, but no whoopsie yet :-(
>
> I suppose I'll kick some benchmarks and go make a cup 'o tea or
> something.

Hm, just booting gets me a handful, and generic desktop activity
produces a fairly regular supply.  This is virgin 6.6.0.ga9e6eb3-tip.

[  340.473123] EEVDF scheduling fail, picking leftmost
[  340.473132] EEVDF scheduling fail, picking leftmost
[  343.656549] EEVDF scheduling fail, picking leftmost
[  343.656557] EEVDF scheduling fail, picking leftmost
[  343.690463] EEVDF scheduling fail, picking leftmost
[  343.690472] EEVDF scheduling fail, picking leftmost
[  426.224039] EEVDF scheduling fail, picking leftmost
[  426.224080] EEVDF scheduling fail, picking leftmost
[  426.224084] EEVDF scheduling fail, picking leftmost
[  426.363323] EEVDF scheduling fail, picking leftmost
[  426.759244] EEVDF scheduling fail, picking leftmost
[  426.759256] EEVDF scheduling fail, picking leftmost
[  441.872524] EEVDF scheduling fail, picking leftmost
[  441.872556] EEVDF scheduling fail, picking leftmost



* Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-06 14:39           ` Mike Galbraith
@ 2023-10-06 15:55             ` Peter Zijlstra
  2023-10-06 16:54               ` Mike Galbraith
  2023-10-06 19:24               ` Peter Zijlstra
  0 siblings, 2 replies; 30+ messages in thread
From: Peter Zijlstra @ 2023-10-06 15:55 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Marek Szyprowski, Biju Das, bsegall, bristot, chris.hyser,
	corbet, dietmar.eggemann, gregkh, joel, joshdon, juri.lelli,
	kprateek.nayak, linux-kernel, mingo, patrick.bellasi,
	Pavel Machek, pjt, qperret, qyousef, rostedt, tglx, tim.c.chen,
	timj, vincent.guittot, youssefesmat, yu.c.chen, mgorman,
	linux-renesas-soc

On Fri, Oct 06, 2023 at 04:39:09PM +0200, Mike Galbraith wrote:
> On Fri, 2023-10-06 at 16:00 +0200, Peter Zijlstra wrote:
> > On Fri, Oct 06, 2023 at 12:31:28PM +0200, Mike Galbraith wrote:
> > > On Fri, 2023-10-06 at 10:35 +0200, Marek Szyprowski wrote:
> > > >
> > > >
> > > > Just to note, I've run into this issue on QEMU's 'arm64/virt' platform,
> > > > not on Samsung-specific hardware.
> > >
> > > It doesn't appear to be arch-specific; all I have to do is enable
> > > autogroup, on a raspberry or x86_64 desktop box, and the occasional failure
> > > trips over task groups, leaving both best and best_left with identical
> > > ->min_deadline values that are not what we're looking for.
> >
> > OK, autogroups enabled and booted, /debug/sched/debug shows autogroups
> > are indeed in existence.
> >
> > I've run hackbench and a kernel build, but no whoopsie yet :-(
> >
> > I suppose I'll kick some benchmarks and go make a cup 'o tea or
> > something.
> 
> Hm, just booting gets me a handful, and generic desktop activity
> produces a fairly regular supply.  This is virgin 6.6.0.ga9e6eb3-tip.

You're running that systemd thing, eh?

If I create two groups (or two users with autogroup on) and have them
both build a kernel I do indeed get splat.

And yeah, min_deadline is hosed somehow.

migration/28-185     [028] d..2.    70.264274: validate_cfs_rq: --- /
migration/28-185     [028] d..2.    70.264277: __print_se: ffff88845cf48080 w: 1024 ve: -58857638 lag: 870381 vd: -55861854 vmd: -66302085 E (11372/tr)
migration/28-185     [028] d..2.    70.264280: __print_se:   ffff88810d165800 w: 25 ve: -80323686 lag: 22336429 vd: -41496434 vmd: -66302085 E (-1//autogroup-31)
migration/28-185     [028] d..2.    70.264282: __print_se:   ffff888108379000 w: 25 ve: 0 lag: -57987257 vd: 114632828 vmd: 114632828 N (-1//autogroup-33)
migration/28-185     [028] d..2.    70.264283: validate_cfs_rq: min_deadline: -55861854 avg_vruntime: -62278313462 / 1074 = -57987256
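
[ A legend inferred from the debug output itself, so treat it as an assumption:
  w = weight, ve = vruntime, lag = lag, vd = deadline, vmd = min_deadline, and
  the trailing E/N marks the entity as eligible or not.  Note that the cached
  vmd of -66302085 matches no printed vd, which is the corruption in question. ]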

I need to go make dinner (kids hungry), but I'll see if I can figure out
how this happens...


* Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-06 15:55             ` Peter Zijlstra
@ 2023-10-06 16:54               ` Mike Galbraith
  2023-10-06 19:24               ` Peter Zijlstra
  1 sibling, 0 replies; 30+ messages in thread
From: Mike Galbraith @ 2023-10-06 16:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Marek Szyprowski, Biju Das, bsegall, bristot, chris.hyser,
	corbet, dietmar.eggemann, gregkh, joel, joshdon, juri.lelli,
	kprateek.nayak, linux-kernel, mingo, patrick.bellasi,
	Pavel Machek, pjt, qperret, qyousef, rostedt, tglx, tim.c.chen,
	timj, vincent.guittot, youssefesmat, yu.c.chen, mgorman,
	linux-renesas-soc

On Fri, 2023-10-06 at 17:55 +0200, Peter Zijlstra wrote:


> If I create two groups (or two users with autogroup on) and have them
> both build a kernel I do indeed get splat.

Ah, my desktop setup is two wide konsole instances with 14 tabs each
for old testament /me (dang dinky universe) and internet stuff running
as a mere mortal, so lots of groups.

	-Mike


* Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-06 15:55             ` Peter Zijlstra
  2023-10-06 16:54               ` Mike Galbraith
@ 2023-10-06 19:24               ` Peter Zijlstra
  2023-10-06 20:15                 ` Marek Szyprowski
                                   ` (4 more replies)
  1 sibling, 5 replies; 30+ messages in thread
From: Peter Zijlstra @ 2023-10-06 19:24 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Marek Szyprowski, Biju Das, bsegall, bristot, chris.hyser,
	corbet, dietmar.eggemann, gregkh, joel, joshdon, juri.lelli,
	kprateek.nayak, linux-kernel, mingo, patrick.bellasi,
	Pavel Machek, pjt, qperret, qyousef, rostedt, tglx, tim.c.chen,
	timj, vincent.guittot, youssefesmat, yu.c.chen, mgorman,
	linux-renesas-soc

On Fri, Oct 06, 2023 at 05:55:01PM +0200, Peter Zijlstra wrote:

> And yeah, min_deadline is hosed somehow.
> 
> migration/28-185     [028] d..2.    70.264274: validate_cfs_rq: --- /
> migration/28-185     [028] d..2.    70.264277: __print_se: ffff88845cf48080 w: 1024 ve: -58857638 lag: 870381 vd: -55861854 vmd: -66302085 E (11372/tr)
> migration/28-185     [028] d..2.    70.264280: __print_se:   ffff88810d165800 w: 25 ve: -80323686 lag: 22336429 vd: -41496434 vmd: -66302085 E (-1//autogroup-31)
> migration/28-185     [028] d..2.    70.264282: __print_se:   ffff888108379000 w: 25 ve: 0 lag: -57987257 vd: 114632828 vmd: 114632828 N (-1//autogroup-33)
> migration/28-185     [028] d..2.    70.264283: validate_cfs_rq: min_deadline: -55861854 avg_vruntime: -62278313462 / 1074 = -57987256
> 
> I need to go make dinner (kids hungry), but I'll see if I can figure out
> how this happens...

*sigh*, does the below help?

---
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 04fbcbda97d5..6a670f119efa 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3632,6 +3747,7 @@ static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
 		 */
 		deadline = div_s64(deadline * old_weight, weight);
 		se->deadline = se->vruntime + deadline;
+		min_deadline_cb_propagate(&se->run_node, NULL);
 	}
 
 #ifdef CONFIG_SMP


* Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-06 19:24               ` Peter Zijlstra
@ 2023-10-06 20:15                 ` Marek Szyprowski
  2023-10-06 23:27                 ` Mike Galbraith
                                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 30+ messages in thread
From: Marek Szyprowski @ 2023-10-06 20:15 UTC (permalink / raw)
  To: Peter Zijlstra, Mike Galbraith
  Cc: Biju Das, bsegall, bristot, chris.hyser, corbet,
	dietmar.eggemann, gregkh, joel, joshdon, juri.lelli,
	kprateek.nayak, linux-kernel, mingo, patrick.bellasi,
	Pavel Machek, pjt, qperret, qyousef, rostedt, tglx, tim.c.chen,
	timj, vincent.guittot, youssefesmat, yu.c.chen, mgorman,
	linux-renesas-soc

On 06.10.2023 21:24, Peter Zijlstra wrote:
> On Fri, Oct 06, 2023 at 05:55:01PM +0200, Peter Zijlstra wrote:
>> And yeah, min_deadline is hosed somehow.
>>
>> migration/28-185     [028] d..2.    70.264274: validate_cfs_rq: --- /
>> migration/28-185     [028] d..2.    70.264277: __print_se: ffff88845cf48080 w: 1024 ve: -58857638 lag: 870381 vd: -55861854 vmd: -66302085 E (11372/tr)
>> migration/28-185     [028] d..2.    70.264280: __print_se:   ffff88810d165800 w: 25 ve: -80323686 lag: 22336429 vd: -41496434 vmd: -66302085 E (-1//autogroup-31)
>> migration/28-185     [028] d..2.    70.264282: __print_se:   ffff888108379000 w: 25 ve: 0 lag: -57987257 vd: 114632828 vmd: 114632828 N (-1//autogroup-33)
>> migration/28-185     [028] d..2.    70.264283: validate_cfs_rq: min_deadline: -55861854 avg_vruntime: -62278313462 / 1074 = -57987256
>>
>> I need to go make dinner (kids hungry), but I'll see if I can figure out
>> how this happens...
> *sigh*, does the below help?

It looks like this fixed the issue. At least I was no longer able to
reproduce it.

Feel free to add:

Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>

Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>

> ---
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 04fbcbda97d5..6a670f119efa 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3632,6 +3747,7 @@ static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
>   		 */
>   		deadline = div_s64(deadline * old_weight, weight);
>   		se->deadline = se->vruntime + deadline;
> +		min_deadline_cb_propagate(&se->run_node, NULL);
>   	}
>   
>   #ifdef CONFIG_SMP
>
Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



* Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-06 19:24               ` Peter Zijlstra
  2023-10-06 20:15                 ` Marek Szyprowski
@ 2023-10-06 23:27                 ` Mike Galbraith
  2023-10-07  0:37                 ` Chen Yu
                                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 30+ messages in thread
From: Mike Galbraith @ 2023-10-06 23:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Marek Szyprowski, Biju Das, bsegall, bristot, chris.hyser,
	corbet, dietmar.eggemann, gregkh, joel, joshdon, juri.lelli,
	kprateek.nayak, linux-kernel, mingo, patrick.bellasi,
	Pavel Machek, pjt, qperret, qyousef, rostedt, tglx, tim.c.chen,
	timj, vincent.guittot, youssefesmat, yu.c.chen, mgorman,
	linux-renesas-soc

On Fri, 2023-10-06 at 21:24 +0200, Peter Zijlstra wrote:
> 
> *sigh*, does the below help?

<invisible ink goggles> ah, so there you... weren't.

Yup, all better.

> ---
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 04fbcbda97d5..6a670f119efa 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3632,6 +3747,7 @@ static void reweight_entity(struct cfs_rq
> *cfs_rq, struct sched_entity *se,
>                  */
>                 deadline = div_s64(deadline * old_weight, weight);
>                 se->deadline = se->vruntime + deadline;
> +               min_deadline_cb_propagate(&se->run_node, NULL);
>         }
>  
>  #ifdef CONFIG_SMP



* Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-06 19:24               ` Peter Zijlstra
  2023-10-06 20:15                 ` Marek Szyprowski
  2023-10-06 23:27                 ` Mike Galbraith
@ 2023-10-07  0:37                 ` Chen Yu
  2023-10-07  6:26                   ` Biju Das
  2023-10-07  6:24                 ` Biju Das
  2023-10-09  7:53                 ` [tip: sched/urgent] sched/eevdf: Fix min_deadline heap integrity tip-bot2 for Peter Zijlstra
  4 siblings, 1 reply; 30+ messages in thread
From: Chen Yu @ 2023-10-07  0:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mike Galbraith, Marek Szyprowski, Biju Das, bsegall, bristot,
	chris.hyser, corbet, dietmar.eggemann, gregkh, joel, joshdon,
	juri.lelli, kprateek.nayak, linux-kernel, mingo, patrick.bellasi,
	Pavel Machek, pjt, qperret, qyousef, rostedt, tglx, tim.c.chen,
	timj, vincent.guittot, youssefesmat, mgorman, linux-renesas-soc

On 2023-10-06 at 21:24:45 +0200, Peter Zijlstra wrote:
> On Fri, Oct 06, 2023 at 05:55:01PM +0200, Peter Zijlstra wrote:
> 
> > And yeah, min_deadline is hosed somehow.
> > 
> > migration/28-185     [028] d..2.    70.264274: validate_cfs_rq: --- /
> > migration/28-185     [028] d..2.    70.264277: __print_se: ffff88845cf48080 w: 1024 ve: -58857638 lag: 870381 vd: -55861854 vmd: -66302085 E (11372/tr)
> > migration/28-185     [028] d..2.    70.264280: __print_se:   ffff88810d165800 w: 25 ve: -80323686 lag: 22336429 vd: -41496434 vmd: -66302085 E (-1//autogroup-31)
> > migration/28-185     [028] d..2.    70.264282: __print_se:   ffff888108379000 w: 25 ve: 0 lag: -57987257 vd: 114632828 vmd: 114632828 N (-1//autogroup-33)
> > migration/28-185     [028] d..2.    70.264283: validate_cfs_rq: min_deadline: -55861854 avg_vruntime: -62278313462 / 1074 = -57987256
> > 
> > I need to go make dinner (kids hungry), but I'll see if I can figure out
> > how this happens...
> 
> *sigh*, does the below help?
> 
> ---
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 04fbcbda97d5..6a670f119efa 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3632,6 +3747,7 @@ static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
>  		 */
>  		deadline = div_s64(deadline * old_weight, weight);
>  		se->deadline = se->vruntime + deadline;
> +		min_deadline_cb_propagate(&se->run_node, NULL);
>  	}
>  
>  #ifdef CONFIG_SMP

Is it because without this patch, se->deadline is not always kept in sync with se->min_deadline,
so in pick_eevdf() the following condition can fail to be met, leaving us with a NULL candidate
and having to pick the leftmost one?
	if (se->deadline == se->min_deadline)

Regarding the circular locking warning triggered by printk, does it mean that, in theory, we
should not get a NULL candidate from __pick_eevdf()? And besides, that we should not use printk
with the rq lock held?
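
[ A hypothetical picture of the desync behind the first question, with
  invented numbers:

	/*
	 *   se A: deadline = 120, min_deadline = 90    <- stale cache: nothing
	 *    \                                            in this subtree has
	 *     se B: deadline = 100, min_deadline = 100    deadline 90 anymore
	 *
	 * The second loop of pick_eevdf() chases the cached min_deadline of
	 * 90, expecting to find a node with deadline == 90, never finds one,
	 * and returns NULL -- at which point "EEVDF scheduling fail" is
	 * printed.
	 */
]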

thanks,
Chenyu


* RE: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-06 19:24               ` Peter Zijlstra
                                   ` (2 preceding siblings ...)
  2023-10-07  0:37                 ` Chen Yu
@ 2023-10-07  6:24                 ` Biju Das
  2023-10-09  7:53                 ` [tip: sched/urgent] sched/eevdf: Fix min_deadline heap integrity tip-bot2 for Peter Zijlstra
  4 siblings, 0 replies; 30+ messages in thread
From: Biju Das @ 2023-10-07  6:24 UTC (permalink / raw)
  To: Peter Zijlstra, Mike Galbraith
  Cc: Marek Szyprowski, bsegall, bristot, chris.hyser, corbet,
	dietmar.eggemann, gregkh, joel, joshdon, juri.lelli,
	kprateek.nayak, linux-kernel, mingo, patrick.bellasi,
	Pavel Machek, pjt, qperret, qyousef, rostedt, tglx, tim.c.chen,
	timj, vincent.guittot, youssefesmat, yu.c.chen, mgorman,
	linux-renesas-soc

Hi Peter Zijlstra,


> Subject: Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct
> se
> 
> On Fri, Oct 06, 2023 at 05:55:01PM +0200, Peter Zijlstra wrote:
> 
> > And yeah, min_deadline is hosed somehow.
> >
> > migration/28-185     [028] d..2.    70.264274: validate_cfs_rq: --- /
> > migration/28-185     [028] d..2.    70.264277: __print_se:
> ffff88845cf48080 w: 1024 ve: -58857638 lag: 870381 vd: -55861854 vmd: -
> 66302085 E (11372/tr)
> > migration/28-185     [028] d..2.    70.264280: __print_se:
> ffff88810d165800 w: 25 ve: -80323686 lag: 22336429 vd: -41496434 vmd: -
> 66302085 E (-1//autogroup-31)
> > migration/28-185     [028] d..2.    70.264282: __print_se:
> ffff888108379000 w: 25 ve: 0 lag: -57987257 vd: 114632828 vmd: 114632828 N
> (-1//autogroup-33)
> > migration/28-185     [028] d..2.    70.264283: validate_cfs_rq:
> min_deadline: -55861854 avg_vruntime: -62278313462 / 1074 = -57987256
> >
> > I need to go make dinner (kids hungry), but I'll see if I can figure
> > out how this happens...
> 
> *sigh*, does the below help?

I confirm that the message "EEVDF scheduling fail, picking leftmost" no longer appears with this patch.

Cheers,
Biju
> 
> ---
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index
> 04fbcbda97d5..6a670f119efa 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3632,6 +3747,7 @@ static void reweight_entity(struct cfs_rq *cfs_rq,
> struct sched_entity *se,
>  		 */
>  		deadline = div_s64(deadline * old_weight, weight);
>  		se->deadline = se->vruntime + deadline;
> +		min_deadline_cb_propagate(&se->run_node, NULL);
>  	}
> 
>  #ifdef CONFIG_SMP


* RE: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-07  0:37                 ` Chen Yu
@ 2023-10-07  6:26                   ` Biju Das
  2023-10-07  6:42                     ` Chen Yu
  0 siblings, 1 reply; 30+ messages in thread
From: Biju Das @ 2023-10-07  6:26 UTC (permalink / raw)
  To: Chen Yu, Peter Zijlstra
  Cc: Mike Galbraith, Marek Szyprowski, bsegall, bristot, chris.hyser,
	corbet, dietmar.eggemann, gregkh, joel, joshdon, juri.lelli,
	kprateek.nayak, linux-kernel, mingo, patrick.bellasi,
	Pavel Machek, pjt, qperret, qyousef, rostedt, tglx, tim.c.chen,
	timj, vincent.guittot, youssefesmat, mgorman, linux-renesas-soc

Hi Chen Yu,

> Subject: Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct
> se
> 
> On 2023-10-06 at 21:24:45 +0200, Peter Zijlstra wrote:
> > On Fri, Oct 06, 2023 at 05:55:01PM +0200, Peter Zijlstra wrote:
> >
> > > And yeah, min_deadline is hosed somehow.
> > >
> > > migration/28-185     [028] d..2.    70.264274: validate_cfs_rq: --- /
> > > migration/28-185     [028] d..2.    70.264277: __print_se:
> ffff88845cf48080 w: 1024 ve: -58857638 lag: 870381 vd: -55861854 vmd: -
> 66302085 E (11372/tr)
> > > migration/28-185     [028] d..2.    70.264280: __print_se:
> ffff88810d165800 w: 25 ve: -80323686 lag: 22336429 vd: -41496434 vmd: -
> 66302085 E (-1//autogroup-31)
> > > migration/28-185     [028] d..2.    70.264282: __print_se:
> ffff888108379000 w: 25 ve: 0 lag: -57987257 vd: 114632828 vmd: 114632828 N
> (-1//autogroup-33)
> > > migration/28-185     [028] d..2.    70.264283: validate_cfs_rq:
> min_deadline: -55861854 avg_vruntime: -62278313462 / 1074 = -57987256
> > >
> > > I need to go make dinner (kids hungry), but I'll see if I can figure
> > > out how this happens...
> >
> > *sigh*, does the below help?
> >
> > ---
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index
> > 04fbcbda97d5..6a670f119efa 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -3632,6 +3747,7 @@ static void reweight_entity(struct cfs_rq *cfs_rq,
> struct sched_entity *se,
> >  		 */
> >  		deadline = div_s64(deadline * old_weight, weight);
> >  		se->deadline = se->vruntime + deadline;
> > +		min_deadline_cb_propagate(&se->run_node, NULL);
> >  	}
> >
> >  #ifdef CONFIG_SMP
> 
> Is it because without this patch, se->deadline is not always kept in sync
> with se->min_deadline, so in pick_eevdf() the following condition can fail
> to be met, leaving us with a NULL candidate and having to pick the leftmost one?
> 	if (se->deadline == se->min_deadline)
> 
> Regarding the circular locking warning triggered by printk, does it mean that,
> in theory, we should not get a NULL candidate from __pick_eevdf()? And besides,
> that we should not use printk with the rq lock held?

Is it not a useful error log? At least in the initial report, Marek Szyprowski doesn't see "EEVDF scheduling fail, picking leftmost" but sees only the warning it triggers in the logs.

Cheers,
Biju


* Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-07  6:26                   ` Biju Das
@ 2023-10-07  6:42                     ` Chen Yu
  2023-10-09 14:27                       ` Phil Auld
  0 siblings, 1 reply; 30+ messages in thread
From: Chen Yu @ 2023-10-07  6:42 UTC (permalink / raw)
  To: Biju Das
  Cc: Peter Zijlstra, Mike Galbraith, Marek Szyprowski, bsegall,
	bristot, chris.hyser, corbet, dietmar.eggemann, gregkh, joel,
	joshdon, juri.lelli, kprateek.nayak, linux-kernel, mingo,
	patrick.bellasi, Pavel Machek, pjt, qperret, qyousef, rostedt,
	tglx, tim.c.chen, timj, vincent.guittot, youssefesmat, mgorman,
	linux-renesas-soc

Hi Biju,

On 2023-10-07 at 06:26:05 +0000, Biju Das wrote:
> Hi Chen Yu,
> 
> > Subject: Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct
> > se
> > 
> > On 2023-10-06 at 21:24:45 +0200, Peter Zijlstra wrote:
> > > On Fri, Oct 06, 2023 at 05:55:01PM +0200, Peter Zijlstra wrote:
> > >
> > > > And yeah, min_deadline is hosed somehow.
> > > >
> > > > migration/28-185     [028] d..2.    70.264274: validate_cfs_rq: --- /
> > > > migration/28-185     [028] d..2.    70.264277: __print_se:
> > ffff88845cf48080 w: 1024 ve: -58857638 lag: 870381 vd: -55861854 vmd: -
> > 66302085 E (11372/tr)
> > > > migration/28-185     [028] d..2.    70.264280: __print_se:
> > ffff88810d165800 w: 25 ve: -80323686 lag: 22336429 vd: -41496434 vmd: -
> > 66302085 E (-1//autogroup-31)
> > > > migration/28-185     [028] d..2.    70.264282: __print_se:
> > ffff888108379000 w: 25 ve: 0 lag: -57987257 vd: 114632828 vmd: 114632828 N
> > (-1//autogroup-33)
> > > > migration/28-185     [028] d..2.    70.264283: validate_cfs_rq:
> > min_deadline: -55861854 avg_vruntime: -62278313462 / 1074 = -57987256
> > > >
> > > > I need to go make dinner (kids hungry), but I'll see if I can figure
> > > > out how this happens...
> > >
> > > *sigh*, does the below help?
> > >
> > > ---
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index
> > > 04fbcbda97d5..6a670f119efa 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -3632,6 +3747,7 @@ static void reweight_entity(struct cfs_rq *cfs_rq,
> > struct sched_entity *se,
> > >  		 */
> > >  		deadline = div_s64(deadline * old_weight, weight);
> > >  		se->deadline = se->vruntime + deadline;
> > > +		min_deadline_cb_propagate(&se->run_node, NULL);
> > >  	}
> > >
> > >  #ifdef CONFIG_SMP
> > 
> > Is it because without this patch, se->deadline is not always kept in sync
> > with se->min_deadline, so in pick_eevdf() the following condition can fail
> > to be met, leaving us with a NULL candidate and having to pick the leftmost one?
> > 	if (se->deadline == se->min_deadline)
> > 
> > Regarding the circular locking warning triggered by printk, does it mean that,
> > in theory, we should not get a NULL candidate from __pick_eevdf()? And besides,
> > that we should not use printk with the rq lock held?
> 
> Is it not a useful error log? At least in the initial report, Marek Szyprowski doesn't see "EEVDF scheduling fail, picking leftmost" but sees only the warning it triggers in the logs.
> 

Yes, it is a useful log. I was trying to figure out the safe scenario to use
printk.

thanks,
Chenyu


* [tip: sched/urgent] sched/eevdf: Fix min_deadline heap integrity
  2023-10-06 19:24               ` Peter Zijlstra
                                   ` (3 preceding siblings ...)
  2023-10-07  6:24                 ` Biju Das
@ 2023-10-09  7:53                 ` tip-bot2 for Peter Zijlstra
  4 siblings, 0 replies; 30+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2023-10-09  7:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Marek Szyprowski, Biju Das, Mike Galbraith,
	Peter Zijlstra (Intel),
	x86, linux-kernel

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     8dafa9d0eb1a1550a0f4d462db9354161bc51e0c
Gitweb:        https://git.kernel.org/tip/8dafa9d0eb1a1550a0f4d462db9354161bc51e0c
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Fri, 06 Oct 2023 21:24:45 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 09 Oct 2023 09:48:32 +02:00

sched/eevdf: Fix min_deadline heap integrity

Marek and Biju reported instances of:

  "EEVDF scheduling fail, picking leftmost"

which Mike correlated with cgroup scheduling and the min_deadline heap
getting corrupted; some trace output confirms:

> And yeah, min_deadline is hosed somehow:
>
>    validate_cfs_rq: --- /
>    __print_se: ffff88845cf48080 w: 1024 ve: -58857638 lag: 870381 vd: -55861854 vmd: -66302085 E (11372/tr)
>    __print_se:   ffff88810d165800 w: 25 ve: -80323686 lag: 22336429 vd: -41496434 vmd: -66302085 E (-1//autogroup-31)
>    __print_se:   ffff888108379000 w: 25 ve: 0 lag: -57987257 vd: 114632828 vmd: 114632828 N (-1//autogroup-33)
>    validate_cfs_rq: min_deadline: -55861854 avg_vruntime: -62278313462 / 1074 = -57987256

Turns out that reweight_entity(), which tries really hard to be fast,
does not do the normal dequeue+update+enqueue pattern but *does* scale
the deadline.

However, it then fails to propagate the updated deadline value up the
heap.

Fixes: 147f3efaa241 ("sched/fair: Implement an EEVDF-like scheduling policy")
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reported-by: Biju Das <biju.das.jz@bp.renesas.com>
Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Biju Das <biju.das.jz@bp.renesas.com>
Tested-by: Mike Galbraith <efault@gmx.de>
Link: https://lkml.kernel.org/r/20231006192445.GE743@noisy.programming.kicks-ass.net
---
 kernel/sched/fair.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ef7490c..a4b904a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3613,6 +3613,7 @@ static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
 		 */
 		deadline = div_s64(deadline * old_weight, weight);
 		se->deadline = se->vruntime + deadline;
+		min_deadline_cb_propagate(&se->run_node, NULL);
 	}
 
 #ifdef CONFIG_SMP

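[ For completeness, the invariant the one-liner restores -- a sketch under
  assumed names; the actual update is generated by the augmented rbtree
  callbacks in kernel/sched/fair.c:

	/* Every node caches the minimum deadline over its subtree:
	 *
	 *   se->min_deadline = min(se->deadline,
	 *                          min_deadline(left subtree),
	 *                          min_deadline(right subtree));
	 *
	 * reweight_entity() rewrites se->deadline in place, so the cached
	 * minima on the path up to the root must be recomputed, which is
	 * what min_deadline_cb_propagate(&se->run_node, NULL) does.
	 */
]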

* Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-07  6:42                     ` Chen Yu
@ 2023-10-09 14:27                       ` Phil Auld
  2023-10-10  2:08                         ` Chen Yu
  0 siblings, 1 reply; 30+ messages in thread
From: Phil Auld @ 2023-10-09 14:27 UTC (permalink / raw)
  To: Chen Yu
  Cc: Biju Das, Peter Zijlstra, Mike Galbraith, Marek Szyprowski,
	bsegall, bristot, chris.hyser, corbet, dietmar.eggemann, gregkh,
	joel, joshdon, juri.lelli, kprateek.nayak, linux-kernel, mingo,
	patrick.bellasi, Pavel Machek, pjt, qperret, qyousef, rostedt,
	tglx, tim.c.chen, timj, vincent.guittot, youssefesmat, mgorman,
	linux-renesas-soc

On Sat, Oct 07, 2023 at 02:42:10PM +0800 Chen Yu wrote:
> Hi Biju,
> 
> On 2023-10-07 at 06:26:05 +0000, Biju Das wrote:
> > Hi Chen Yu,
> > 
> > > Subject: Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct
> > > se
> > > 
> > > On 2023-10-06 at 21:24:45 +0200, Peter Zijlstra wrote:
> > > > On Fri, Oct 06, 2023 at 05:55:01PM +0200, Peter Zijlstra wrote:
> > > >
> > > > > And yeah, min_deadline is hosed somehow.
> > > > >
> > > > > migration/28-185     [028] d..2.    70.264274: validate_cfs_rq: --- /
> > > > > migration/28-185     [028] d..2.    70.264277: __print_se:
> > > ffff88845cf48080 w: 1024 ve: -58857638 lag: 870381 vd: -55861854 vmd: -
> > > 66302085 E (11372/tr)
> > > > > migration/28-185     [028] d..2.    70.264280: __print_se:
> > > ffff88810d165800 w: 25 ve: -80323686 lag: 22336429 vd: -41496434 vmd: -
> > > 66302085 E (-1//autogroup-31)
> > > > > migration/28-185     [028] d..2.    70.264282: __print_se:
> > > ffff888108379000 w: 25 ve: 0 lag: -57987257 vd: 114632828 vmd: 114632828 N
> > > (-1//autogroup-33)
> > > > > migration/28-185     [028] d..2.    70.264283: validate_cfs_rq:
> > > min_deadline: -55861854 avg_vruntime: -62278313462 / 1074 = -57987256
> > > > >
> > > > > I need to go make dinner (kids hungry), but I'll see if I can figure
> > > > > out how this happens...
> > > >
> > > > *sigh*, does the below help?
> > > >
> > > > ---
> > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index
> > > > 04fbcbda97d5..6a670f119efa 100644
> > > > --- a/kernel/sched/fair.c
> > > > +++ b/kernel/sched/fair.c
> > > > @@ -3632,6 +3747,7 @@ static void reweight_entity(struct cfs_rq *cfs_rq,
> > > struct sched_entity *se,
> > > >  		 */
> > > >  		deadline = div_s64(deadline * old_weight, weight);
> > > >  		se->deadline = se->vruntime + deadline;
> > > > +		min_deadline_cb_propagate(&se->run_node, NULL);
> > > >  	}
> > > >
> > > >  #ifdef CONFIG_SMP
> > > 
> > > Is it because without this patch, se->deadline is not always kept in sync
> > > with se->min_deadline, so in pick_eevdf() the following condition can fail
> > > to be met, leaving us with a NULL candidate and having to pick the leftmost one?
> > > 	if (se->deadline == se->min_deadline)
> > > 
> > > Regarding the circular locking warning triggered by printk, does it mean that,
> > > in theory, we should not get a NULL candidate from __pick_eevdf()? And besides,
> > > that we should not use printk with the rq lock held?
> > 
> > Is it not a useful error log? At least in the initial report, Marek Szyprowski doesn't see "EEVDF scheduling fail, picking leftmost" but sees only the warning it triggers in the logs.
> > 
> 
> Yes, it is a useful log. I was trying to figure out the safe scenario to use
> printk.
>

You should not use printk with a rq lock held as some console implementations
(framebuffer etc) call schedule_work() which loops back into the scheduler
and bad things happen.

Some places in the scheduler use printk_deferred() when needed. 
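
[ A minimal sketch of that form, had the warning been kept; printk_deferred()
  takes the usual printk arguments and defers the console output, so it is
  safe to call under the rq lock:

	printk_deferred(KERN_ERR "EEVDF scheduling fail, picking leftmost\n");
]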

I think this will be better when the RT printk stuff is fully merged. 

Cheers,
Phil

> thanks,
> Chenyu
> 

-- 



* Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-09 14:27                       ` Phil Auld
@ 2023-10-10  2:08                         ` Chen Yu
  0 siblings, 0 replies; 30+ messages in thread
From: Chen Yu @ 2023-10-10  2:08 UTC (permalink / raw)
  To: Phil Auld
  Cc: Biju Das, Peter Zijlstra, Mike Galbraith, Marek Szyprowski,
	bsegall, bristot, chris.hyser, corbet, dietmar.eggemann, gregkh,
	joel, joshdon, juri.lelli, kprateek.nayak, linux-kernel, mingo,
	patrick.bellasi, Pavel Machek, pjt, qperret, qyousef, rostedt,
	tglx, tim.c.chen, timj, vincent.guittot, youssefesmat, mgorman,
	linux-renesas-soc

Hi Phil,

On 2023-10-09 at 10:27:40 -0400, Phil Auld wrote:
> On Sat, Oct 07, 2023 at 02:42:10PM +0800 Chen Yu wrote:
> > Hi Biju,
> > 
> > On 2023-10-07 at 06:26:05 +0000, Biju Das wrote:
> > > Hi Chen Yu,
> > > 
> > > > Subject: Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct
> > > > se
> > > > 
> > > > On 2023-10-06 at 21:24:45 +0200, Peter Zijlstra wrote:
> > > > > On Fri, Oct 06, 2023 at 05:55:01PM +0200, Peter Zijlstra wrote:
> > > > >
> > > > > > And yeah, min_deadline is hosed somehow.
> > > > > >
> > > > > > migration/28-185     [028] d..2.    70.264274: validate_cfs_rq: --- /
> > > > > > migration/28-185     [028] d..2.    70.264277: __print_se:
> > > > ffff88845cf48080 w: 1024 ve: -58857638 lag: 870381 vd: -55861854 vmd: -
> > > > 66302085 E (11372/tr)
> > > > > > migration/28-185     [028] d..2.    70.264280: __print_se:
> > > > ffff88810d165800 w: 25 ve: -80323686 lag: 22336429 vd: -41496434 vmd: -
> > > > 66302085 E (-1//autogroup-31)
> > > > > > migration/28-185     [028] d..2.    70.264282: __print_se:
> > > > ffff888108379000 w: 25 ve: 0 lag: -57987257 vd: 114632828 vmd: 114632828 N
> > > > (-1//autogroup-33)
> > > > > > migration/28-185     [028] d..2.    70.264283: validate_cfs_rq:
> > > > min_deadline: -55861854 avg_vruntime: -62278313462 / 1074 = -57987256
> > > > > >
> > > > > > I need to go make dinner (kids hungry), but I'll see if I can figure
> > > > > > out how this happens...
> > > > >
> > > > > *sigh*, does the below help?
> > > > >
> > > > > ---
> > > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index
> > > > > 04fbcbda97d5..6a670f119efa 100644
> > > > > --- a/kernel/sched/fair.c
> > > > > +++ b/kernel/sched/fair.c
> > > > > @@ -3632,6 +3747,7 @@ static void reweight_entity(struct cfs_rq *cfs_rq,
> > > > struct sched_entity *se,
> > > > >  		 */
> > > > >  		deadline = div_s64(deadline * old_weight, weight);
> > > > >  		se->deadline = se->vruntime + deadline;
> > > > > +		min_deadline_cb_propagate(&se->run_node, NULL);
> > > > >  	}
> > > > >
> > > > >  #ifdef CONFIG_SMP
> > > > 
> > > > Is it because without this patch, se->deadline is not always kept in sync
> > > > with se->min_deadline, so in pick_eevdf() the following condition can fail
> > > > to be met, leaving us with a NULL candidate and having to pick the leftmost one?
> > > > 	if (se->deadline == se->min_deadline)
> > > > 
> > > > Regarding the circular locking warning triggered by printk, does it mean that,
> > > > in theory, we should not get a NULL candidate from __pick_eevdf()? And besides,
> > > > that we should not use printk with the rq lock held?
> > > 
> > > Is it not a useful error log? At least in the initial report, Marek Szyprowski doesn't see "EEVDF scheduling fail, picking leftmost" but sees only the warning it triggers in the logs.
> > > 
> > 
> > Yes, it is a useful log. I was trying to figure out the safe scenario to use
> > printk.
> >
> 
> You should not use printk with a rq lock held as some console implementations
> (framebuffer etc) call schedule_work() which loops back into the scheduler
> and bad things happen.
> 
> Some places in the scheduler use printk_deferred() when needed. 
> 
> I think this will be better when the RT printk stuff is fully merged. 
>

I see, got it! Thanks for the education.

thanks,
Chenyu 


* Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-13  3:46             ` Abel Wu
@ 2023-10-13 16:51               ` Benjamin Segall
  0 siblings, 0 replies; 30+ messages in thread
From: Benjamin Segall @ 2023-10-13 16:51 UTC (permalink / raw)
  To: Abel Wu
  Cc: Peter Zijlstra, mingo, vincent.guittot, linux-kernel, juri.lelli,
	dietmar.eggemann, rostedt, mgorman, bristot, corbet, qyousef,
	chris.hyser, patrick.bellasi, pjt, pavel, qperret, tim.c.chen,
	joshdon, timj, kprateek.nayak, yu.c.chen, youssefesmat, joel,
	efault, tglx

Abel Wu <wuyun.abel@bytedance.com> writes:

> On 10/13/23 1:51 AM, Benjamin Segall Wrote:
>> Abel Wu <wuyun.abel@bytedance.com> writes:
>> 
>>> On 10/12/23 5:01 AM, Benjamin Segall Wrote:
>>>> Abel Wu <wuyun.abel@bytedance.com> writes:
>>>>
>>>>> On 9/30/23 8:09 AM, Benjamin Segall Wrote:
>>>>>> +	/*
>>>>>> +	 * Now best_left and all of its children are eligible, and we are just
>>>>>> +	 * looking for deadline == min_deadline
>>>>>> +	 */
>>>>>> +	node = &best_left->run_node;
>>>>>> +	while (node) {
>>>>>> +		struct sched_entity *se = __node_2_se(node);
>>>>>> +
>>>>>> +		/* min_deadline is the current node */
>>>>>> +		if (se->deadline == se->min_deadline)
>>>>>> +			return se;
>>>>>
>>>>> IMHO it would be better tiebreak on vruntime by moving this hunk to ..
>>>>>
>>>>>> +
>>>>>> +		/* min_deadline is in the left branch */
>>>>>>     		if (node->rb_left &&
>>>>>>     		    __node_2_se(node->rb_left)->min_deadline == se->min_deadline) {
>>>>>>     			node = node->rb_left;
>>>>>>     			continue;
>>>>>>     		}
>>>>>
>>>>> .. here, thoughts?
>>>> Yeah, that should work and be better on the tiebreak (and my test code
>>>> agrees). There's an argument, though, that the tiebreak will never really
>>>> come up and that it's better to avoid the potential extra cache-line
>>>> access from "__node_2_se(node->rb_left)->min_deadline".
>>>
>>> I see. Then probably do the same thing in the first loop?
>>>
>> We effectively do that already sorta by accident almost always -
>> computing best and best_left via deadline_gt rather than gte prioritizes
>> earlier elements, which always have a better vruntime.
>
> Sorry for not being clear about the 'same thing'. What I meant
> was to avoid touching the left child if the node itself has the min deadline.
>
> @@ -894,6 +894,9 @@ static struct sched_entity *__pick_eevdf(struct cfs_rq *cfs_rq)
>                 if (!best || deadline_gt(deadline, best, se))
>                         best = se;
>
> +               if (se->deadline == se->min_deadline)
> +                       break;
> +
>                 /*
>                  * Every se in a left branch is eligible, keep track of the
>                  * branch with the best min_deadline
> @@ -913,10 +916,6 @@ static struct sched_entity *__pick_eevdf(struct cfs_rq *cfs_rq)
>                                 break;
>                 }
>
> -               /* min_deadline is at this node, no need to look right */
> -               if (se->deadline == se->min_deadline)
> -                       break;
> -
>                 /* else min_deadline is in the right branch. */
>                 node = node->rb_right;
>         }
>
> (But still thanks for the convincing explanation on fairness.)
>

Ah, yes, in terms of optimizing performance rather than marginal
fairness, that would help.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-12 10:25         ` Abel Wu
@ 2023-10-12 17:51           ` Benjamin Segall
  2023-10-13  3:46             ` Abel Wu
  0 siblings, 1 reply; 30+ messages in thread
From: Benjamin Segall @ 2023-10-12 17:51 UTC (permalink / raw)
  To: Abel Wu
  Cc: Peter Zijlstra, mingo, vincent.guittot, linux-kernel, juri.lelli,
	dietmar.eggemann, rostedt, mgorman, bristot, corbet, qyousef,
	chris.hyser, patrick.bellasi, pjt, pavel, qperret, tim.c.chen,
	joshdon, timj, kprateek.nayak, yu.c.chen, youssefesmat, joel,
	efault, tglx

Abel Wu <wuyun.abel@bytedance.com> writes:

> On 10/12/23 5:01 AM, Benjamin Segall Wrote:
>> Abel Wu <wuyun.abel@bytedance.com> writes:
>> 
>>> On 9/30/23 8:09 AM, Benjamin Segall Wrote:
>>>> +	/*
>>>> +	 * Now best_left and all of its children are eligible, and we are just
>>>> +	 * looking for deadline == min_deadline
>>>> +	 */
>>>> +	node = &best_left->run_node;
>>>> +	while (node) {
>>>> +		struct sched_entity *se = __node_2_se(node);
>>>> +
>>>> +		/* min_deadline is the current node */
>>>> +		if (se->deadline == se->min_deadline)
>>>> +			return se;
>>>
>>> IMHO it would be better tiebreak on vruntime by moving this hunk to ..
>>>
>>>> +
>>>> +		/* min_deadline is in the left branch */
>>>>    		if (node->rb_left &&
>>>>    		    __node_2_se(node->rb_left)->min_deadline == se->min_deadline) {
>>>>    			node = node->rb_left;
>>>>    			continue;
>>>>    		}
>>>
>>> .. here, thoughts?
>> Yeah, that should work and be better on the tiebreak (and my test code
>> agrees). There's an argument, though, that the tiebreak will never really
>> come up and that it's better to avoid the potential extra cache-line
>> access from "__node_2_se(node->rb_left)->min_deadline".
>
> I see. Then probably do the same thing in the first loop?
>

We effectively do that already sorta by accident almost always -
computing best and best_left via deadline_gt rather than gte prioritizes
earlier elements, which always have a better vruntime.

Then when we do the best_left->min_deadline vs best->deadline
comparison, we prioritize best_left, which is the one case where this
can be wrong; we'd need an additional
"if (se->min_deadline == best->deadline &&
(s64)(se->vruntime - best->vruntime) > 0) return best;" check at the end
of the second loop.
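
Formatted as code, that extra check would be roughly (untested, just
restating the condition above):

	if (se->min_deadline == best->deadline &&
	    (s64)(se->vruntime - best->vruntime) > 0)
		return best;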

(Though again I don't know how much this sort of never-going-to-happen
slight fairness improvement is worth compared to the extra bit of
overhead)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-11 12:12     ` Abel Wu
  2023-10-11 13:14       ` Peter Zijlstra
@ 2023-10-11 21:01       ` Benjamin Segall
  2023-10-12 10:25         ` Abel Wu
  1 sibling, 1 reply; 30+ messages in thread
From: Benjamin Segall @ 2023-10-11 21:01 UTC (permalink / raw)
  To: Abel Wu
  Cc: Peter Zijlstra, mingo, vincent.guittot, linux-kernel, juri.lelli,
	dietmar.eggemann, rostedt, mgorman, bristot, corbet, qyousef,
	chris.hyser, patrick.bellasi, pjt, pavel, qperret, tim.c.chen,
	joshdon, timj, kprateek.nayak, yu.c.chen, youssefesmat, joel,
	efault, tglx


Abel Wu <wuyun.abel@bytedance.com> writes:

> On 9/30/23 8:09 AM, Benjamin Segall Wrote:
>> +	/*
>> +	 * Now best_left and all of its children are eligible, and we are just
>> +	 * looking for deadline == min_deadline
>> +	 */
>> +	node = &best_left->run_node;
>> +	while (node) {
>> +		struct sched_entity *se = __node_2_se(node);
>> +
>> +		/* min_deadline is the current node */
>> +		if (se->deadline == se->min_deadline)
>> +			return se;
>
> IMHO it would be better tiebreak on vruntime by moving this hunk to ..
>
>> +
>> +		/* min_deadline is in the left branch */
>>   		if (node->rb_left &&
>>   		    __node_2_se(node->rb_left)->min_deadline == se->min_deadline) {
>>   			node = node->rb_left;
>>   			continue;
>>   		}
>
> .. here, thoughts?

Yeah, that should work and be better on the tiebreak (and my test code
agrees). There's an argument, though, that the tiebreak will never really
come up and that it's better to avoid the potential extra cache-line
access from "__node_2_se(node->rb_left)->min_deadline".

>
>>   +		/* else min_deadline is in the right branch */
>>   		node = node->rb_right;
>>   	}
>> +	return NULL;
>
> Why not 'best'? Since ..

The only time this can happen is if the tree is corrupt. We only reach
this case if best_left is set at all (and best_left's min_deadline is
better than "best"'s, which includes curr). In that case getting an
error message is good, and in general I wasn't worrying about it much.

>
>> +}
>>   -	if (!best || (curr && deadline_gt(deadline, best, curr)))
>> -		best = curr;
>> +static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
>> +{
>> +	struct sched_entity *se = __pick_eevdf(cfs_rq);
>>   -	if (unlikely(!best)) {
>> +	if (!se) {
>>   		struct sched_entity *left = __pick_first_entity(cfs_rq);
>
> .. cfs_rq->curr isn't considered here.

That said, we should probably consider curr here in the error-case
fallback, even if just as a "if (!left) left = cfs_rq->curr;"
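
Concretely, the fallback would then look something like this (illustrative
only, not a tested change):

	if (!se) {
		struct sched_entity *left = __pick_first_entity(cfs_rq);

		if (!left)
			left = cfs_rq->curr;	/* last resort: curr */
		if (left) {
			pr_err("EEVDF scheduling fail, picking leftmost\n");
			return left;
		}
	}
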

>
>>   		if (left) {
>>   			pr_err("EEVDF scheduling fail, picking leftmost\n");
>>   			return left;
>>   		}
>>   	}
>>   -	return best;
>> +	return se;
>>   }
>>     #ifdef CONFIG_SCHED_DEBUG
>>   struct sched_entity *__pick_last_entity(struct cfs_rq *cfs_rq)
>>   {
>
> The implementation of __pick_eevdf() is now quite complicated, which
> makes it really hard to maintain. I'm trying my best to refactor this
> part; hopefully it can be of some help. Below is only for illustration,
> I need to test more.
>

A passing version of that single-loop code minus the cfs_rq->curr
handling is here (just pasting the curr handling back on the start/end
should do):

static struct sched_entity *pick_eevdf_abel(struct cfs_rq *cfs_rq)
{
	struct rb_node *node = cfs_rq->tasks_timeline.rb_root.rb_node;
	struct sched_entity *best = NULL;
	struct sched_entity *cand = NULL;
	bool all_eligible = false;

	while (node || cand) {
		struct sched_entity *se = __node_2_se(node);
		if (!node) {
			BUG_ON(!cand);
			node = &cand->run_node;
			se = __node_2_se(node);
			all_eligible = true;
			cand = NULL;

			/*
			 * Our initial pass ran into an eligible node which is
			 * itself the best
			 */
			if (best && (s64)(se->min_deadline - best->deadline) > 0)
				break;
		}

		/*
		 * If this entity is not eligible, try the left subtree.
		 */
		if (!all_eligible && !entity_eligible(cfs_rq, se)) {
			node = node->rb_left;
			continue;
		}

		if (node->rb_left) {
			struct sched_entity *left = __node_2_se(node->rb_left);

			BUG_ON(left->min_deadline < se->min_deadline);

			/* Tiebreak on vruntime */
			if (left->min_deadline == se->min_deadline) {
				node = node->rb_left;
				all_eligible = true;
				continue;
			}

			if (!all_eligible) {
				/*
				 * We're going to search right subtree and the one
				 * with min_deadline can be non-eligible, so record
				 * the left subtree as a candidate.
				 */
				if (!cand || deadline_gt(min_deadline, cand, left))
					cand = left;
			}
		}

		if (!all_eligible && (!best || deadline_gt(deadline, best, se)))
			best = se;

		/* min_deadline is at this node, no need to look right */
		if (se->deadline == se->min_deadline) {
			if (all_eligible && (!best || deadline_gte(deadline, best, se)))
				best = se;
			if (!all_eligible && (!best || deadline_gt(deadline, best, se)))
				best = se;
			node = NULL;
			continue;
		}

		node = node->rb_right;
	}

	return best;
}

This does exactly as well on the tiebreak condition of vruntime as the
improved two-loop version you suggested. If we don't care about the
tiebreak we can replace the ugly if(all_eligible) if(!all_eligible) in
the last section with a single version that just uses deadline_gt (or
gte) throughout.

Personally with all the extra bits I added for correctness this winds up
definitely uglier than the two-loop version, but it might be possible to
clean it up further. (And a bunch of it is just personal style
preference in the first place)

I've also attached my ugly userspace EEVDF tester; hopefully I attached
it in a mode that will make it through lkml intact.


[-- Attachment #2: EEVDF tester --]
[-- Type: text/x-diff, Size: 18135 bytes --]

diff --git a/tools/testing/selftests/sched/Makefile b/tools/testing/selftests/sched/Makefile
index 099ee9213557..10e38c7afe5e 100644
--- a/tools/testing/selftests/sched/Makefile
+++ b/tools/testing/selftests/sched/Makefile
@@ -6,9 +6,11 @@ endif
 
 CFLAGS += -O2 -Wall -g -I./ $(KHDR_INCLUDES) -Wl,-rpath=./ \
 	  $(CLANG_FLAGS)
 LDLIBS += -lpthread
 
-TEST_GEN_FILES := cs_prctl_test
-TEST_PROGS := cs_prctl_test
+TEST_GEN_FILES := cs_prctl_test sim_rbtree
+TEST_PROGS := cs_prctl_test sim_rbtree
 
 include ../lib.mk
+
+CFLAGS += -I$(top_srcdir)/tools/include
diff --git a/tools/testing/selftests/sched/sim_rbtree.c b/tools/testing/selftests/sched/sim_rbtree.c
new file mode 100644
index 000000000000..dad2544e4d9d
--- /dev/null
+++ b/tools/testing/selftests/sched/sim_rbtree.c
@@ -0,0 +1,681 @@
+#include "linux/rbtree_augmented.h"
+#include <stdio.h>
+#include <string.h>
+#include <time.h>
+#include <stdlib.h>
+#include <assert.h>
+
+#include "../../../lib/rbtree.c"
+
+static inline s64 div_s64_rem(s64 dividend, s32 divisor, s32 *remainder)
+{
+	*remainder = dividend % divisor;
+	return dividend / divisor;
+}
+static inline s64 div_s64(s64 dividend, s32 divisor)
+{
+	s32 remainder;
+	return div_s64_rem(dividend, divisor, &remainder);
+}
+
+static __always_inline struct rb_node *
+rb_add_augmented_cached(struct rb_node *node, struct rb_root_cached *tree,
+			bool (*less)(struct rb_node *, const struct rb_node *),
+			const struct rb_augment_callbacks *augment)
+{
+	struct rb_node **link = &tree->rb_root.rb_node;
+	struct rb_node *parent = NULL;
+	bool leftmost = true;
+
+	while (*link) {
+		parent = *link;
+		if (less(node, parent)) {
+			link = &parent->rb_left;
+		} else {
+			link = &parent->rb_right;
+			leftmost = false;
+		}
+	}
+
+	rb_link_node(node, parent, link);
+	augment->propagate(parent, NULL); /* suboptimal */
+	rb_insert_augmented_cached(node, tree, leftmost, augment);
+
+	return leftmost ? node : NULL;
+}
+
+
+# define SCHED_FIXEDPOINT_SHIFT		10
+# define NICE_0_LOAD_SHIFT	(SCHED_FIXEDPOINT_SHIFT + SCHED_FIXEDPOINT_SHIFT)
+# define scale_load(w)		((w) << SCHED_FIXEDPOINT_SHIFT)
+# define scale_load_down(w) \
+({ \
+	unsigned long __w = (w); \
+	if (__w) \
+		__w = max(2UL, __w >> SCHED_FIXEDPOINT_SHIFT); \
+	__w; \
+})
+
+struct load_weight {
+	unsigned long			weight;
+	u32				inv_weight;
+};
+
+struct sched_entity {
+	char name;
+	struct load_weight		load;
+	struct rb_node			run_node;
+	u64				deadline;
+	u64				min_deadline;
+
+	unsigned int			on_rq;
+
+	u64				vruntime;
+	s64				vlag;
+	u64				slice;
+};
+
+struct cfs_rq {
+	struct load_weight	load;
+	s64			avg_vruntime;
+	u64			avg_load;
+	u64			min_vruntime;
+	struct rb_root_cached	tasks_timeline;
+	struct sched_entity	*curr;
+};
+
+void print_se(char *label, struct sched_entity *se);
+
+static inline bool entity_before(const struct sched_entity *a,
+				 const struct sched_entity *b)
+{
+	return (s64)(a->vruntime - b->vruntime) < 0;
+}
+
+static inline s64 entity_key(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+	return (s64)(se->vruntime - cfs_rq->min_vruntime);
+}
+
+#define __node_2_se(node) \
+	rb_entry((node), struct sched_entity, run_node)
+
+int entity_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+	struct sched_entity *curr = cfs_rq->curr;
+	s64 avg = cfs_rq->avg_vruntime;
+	long load = cfs_rq->avg_load;
+
+	if (curr && curr->on_rq) {
+		unsigned long weight = scale_load_down(curr->load.weight);
+
+		avg += entity_key(cfs_rq, curr) * weight;
+		load += weight;
+	}
+
+	return avg >= entity_key(cfs_rq, se) * load;
+}
+
+static inline bool __entity_less(struct rb_node *a, const struct rb_node *b)
+{
+	return entity_before(__node_2_se(a), __node_2_se(b));
+}
+
+#define deadline_gt(field, lse, rse) ({ (s64)((lse)->field - (rse)->field) > 0; })
+
+static inline void __update_min_deadline(struct sched_entity *se, struct rb_node *node)
+{
+	if (node) {
+		struct sched_entity *rse = __node_2_se(node);
+		if (deadline_gt(min_deadline, se, rse))
+			se->min_deadline = rse->min_deadline;
+	}
+}
+
+/*
+ * se->min_deadline = min(se->deadline, left->min_deadline, right->min_deadline)
+ */
+static inline bool min_deadline_update(struct sched_entity *se, bool exit)
+{
+	u64 old_min_deadline = se->min_deadline;
+	struct rb_node *node = &se->run_node;
+
+	se->min_deadline = se->deadline;
+	__update_min_deadline(se, node->rb_right);
+	__update_min_deadline(se, node->rb_left);
+
+	return se->min_deadline == old_min_deadline;
+}
+
+static void
+avg_vruntime_add(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+	unsigned long weight = scale_load_down(se->load.weight);
+	s64 key = entity_key(cfs_rq, se);
+
+	cfs_rq->avg_vruntime += key * weight;
+	cfs_rq->avg_load += weight;
+}
+
+static void
+avg_vruntime_sub(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+	unsigned long weight = scale_load_down(se->load.weight);
+	s64 key = entity_key(cfs_rq, se);
+
+	cfs_rq->avg_vruntime -= key * weight;
+	cfs_rq->avg_load -= weight;
+}
+
+static inline
+void avg_vruntime_update(struct cfs_rq *cfs_rq, s64 delta)
+{
+	/*
+	 * v' = v + d ==> avg_vruntime' = avg_runtime - d*avg_load
+	 */
+	cfs_rq->avg_vruntime -= cfs_rq->avg_load * delta;
+}
+
+u64 avg_vruntime(struct cfs_rq *cfs_rq)
+{
+	struct sched_entity *curr = cfs_rq->curr;
+	s64 avg = cfs_rq->avg_vruntime;
+	long load = cfs_rq->avg_load;
+
+	if (curr && curr->on_rq) {
+		unsigned long weight = scale_load_down(curr->load.weight);
+
+		avg += entity_key(cfs_rq, curr) * weight;
+		load += weight;
+	}
+
+	if (load)
+		avg = div_s64(avg, load);
+
+	return cfs_rq->min_vruntime + avg;
+}
+
+
+RB_DECLARE_CALLBACKS(static, min_deadline_cb, struct sched_entity,
+		     run_node, min_deadline, min_deadline_update);
+
+/*
+ * Enqueue an entity into the rb-tree:
+ */
+static void __enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+	avg_vruntime_add(cfs_rq, se);
+	se->min_deadline = se->deadline;
+	rb_add_augmented_cached(&se->run_node, &cfs_rq->tasks_timeline,
+				__entity_less, &min_deadline_cb);
+}
+
+void __dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+	rb_erase_augmented_cached(&se->run_node, &cfs_rq->tasks_timeline,
+				  &min_deadline_cb);
+	avg_vruntime_sub(cfs_rq, se);
+}
+
+struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
+{
+	struct rb_node *left = rb_first_cached(&cfs_rq->tasks_timeline);
+
+	if (!left)
+		return NULL;
+
+	return __node_2_se(left);
+}
+
+
+static struct sched_entity *pick_eevdf_orig(struct cfs_rq *cfs_rq)
+{
+	struct rb_node *node = cfs_rq->tasks_timeline.rb_root.rb_node;
+	struct sched_entity *curr = cfs_rq->curr;
+	struct sched_entity *best = NULL;
+
+	if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
+		curr = NULL;
+
+	/*
+	 * Once selected, run a task until it either becomes non-eligible or
+	 * until it gets a new slice. See the HACK in set_next_entity().
+	 */
+	//if (sched_feat(RUN_TO_PARITY) && curr && curr->vlag == curr->deadline)
+	//return curr;
+
+	while (node) {
+		struct sched_entity *se = __node_2_se(node);
+
+		/*
+		 * If this entity is not eligible, try the left subtree.
+		 */
+		if (!entity_eligible(cfs_rq, se)) {
+			node = node->rb_left;
+			continue;
+		}
+
+		/*
+		 * If this entity has an earlier deadline than the previous
+		 * best, take this one. If it also has the earliest deadline
+		 * of its subtree, we're done.
+		 */
+		if (!best || deadline_gt(deadline, best, se)) {
+			best = se;
+			if (best->deadline == best->min_deadline)
+				break;
+		}
+
+		/*
+		 * If the earliest deadline in this subtree is in the fully
+		 * eligible left half of our space, go there.
+		 */
+		if (node->rb_left &&
+		    __node_2_se(node->rb_left)->min_deadline == se->min_deadline) {
+			node = node->rb_left;
+			continue;
+		}
+
+		node = node->rb_right;
+	}
+
+	return best;
+}
+
+#if 1
+#define print_se(...)
+#define printf(...)
+#endif
+
+static struct sched_entity *pick_eevdf_improved(struct cfs_rq *cfs_rq)
+{
+	struct rb_node *node = cfs_rq->tasks_timeline.rb_root.rb_node;
+	struct sched_entity *best_left = NULL;
+	struct sched_entity *best = NULL;
+
+	if (!node)
+		return NULL;
+
+	while (node) {
+		struct sched_entity *se = __node_2_se(node);
+		print_se("search1", se);
+
+		/*
+		 * If this entity is not eligible, try the left subtree.
+		 */
+		if (!entity_eligible(cfs_rq, se)) {
+			node = node->rb_left;
+			printf("not eligible\n");
+			continue;
+		}
+
+		/*
+		 * Now we heap search eligible trees for the best (min_)deadline
+		 */
+		if (!best || deadline_gt(deadline, best, se)) {
+			print_se("best found", se);
+			best = se;
+		}
+
+		/*
+		 * Every se in a left branch is eligible, keep track of the one
+		 * with the best min_deadline
+		 */
+		if (node->rb_left) {
+			struct sched_entity *left = __node_2_se(node->rb_left);
+			print_se("going right, has left", left);
+			if (!best_left || deadline_gt(min_deadline, best_left, left)) {
+				printf("new best left\n");
+				best_left = left;
+			}
+
+			/*
+			 * min_deadline is in the left branch. rb_left and all
+			 * descendants are eligible, so immediately switch to the second
+			 * loop.
+			 */
+			if (left->min_deadline == se->min_deadline) {
+				printf("left");
+				break;
+			}
+		}
+
+		/* min_deadline is at node, no need to look right */
+		if (se->deadline == se->min_deadline) {
+			printf("found\n");
+			break;
+		}
+
+		/* else min_deadline is in the right branch. */
+		BUG_ON(__node_2_se(node->rb_right)->min_deadline != se->min_deadline);
+		node = node->rb_right;
+	}
+	BUG_ON(!best);
+	print_se("best_left", best_left);
+	print_se("best", best);
+
+	/* We ran into an eligible node which is itself the best */
+	if (!best_left || (s64)(best_left->min_deadline - best->deadline) > 0)
+		return best;
+	
+	/*
+	 * Now best_left and all of its children are eligible, and we are just
+	 * looking for deadline == min_deadline
+	 */
+	node = &best_left->run_node;
+	while (node) {
+		struct sched_entity *se = __node_2_se(node);
+
+		/* min_deadline is the current node */
+		if (se->deadline == se->min_deadline)
+			return se;
+
+		/* min_deadline is in the left branch */
+		if (node->rb_left &&
+		    __node_2_se(node->rb_left)->min_deadline == se->min_deadline) {
+			node = node->rb_left;
+			continue;
+		}
+
+		/* else min_deadline is in the right branch */
+		BUG_ON(__node_2_se(node->rb_right)->min_deadline != se->min_deadline);
+		node = node->rb_right;
+	}
+	BUG();
+	return NULL;
+}
+
+#undef printf
+#undef print_se
+
+static struct sched_entity *pick_eevdf_improved_ndebug(struct cfs_rq *cfs_rq)
+{
+	struct rb_node *node = cfs_rq->tasks_timeline.rb_root.rb_node;
+	struct sched_entity *best_left = NULL;
+	struct sched_entity *best = NULL;
+
+	while (node) {
+		struct sched_entity *se = __node_2_se(node);
+
+		/*
+		 * If this entity is not eligible, try the left subtree.
+		 */
+		if (!entity_eligible(cfs_rq, se)) {
+			node = node->rb_left;
+			continue;
+		}
+
+		/*
+		 * Now we heap search eligible trees for the best (min_)deadline
+		 */
+		if (!best || deadline_gt(deadline, best, se))
+			best = se;
+
+		/*
+		 * Every se in a left branch is eligible, keep track of the one
+		 * with the best min_deadline
+		 */
+		if (node->rb_left) {
+			struct sched_entity *left = __node_2_se(node->rb_left);
+			if (!best_left || deadline_gt(min_deadline, best_left, left))
+				best_left = left;
+
+			/*
+			 * min_deadline is in the left branch. rb_left and all
+			 * descendants are eligible, so immediately switch to the second
+			 * loop.
+			 */
+			if (left->min_deadline == se->min_deadline)
+				break;
+		}
+
+		/* min_deadline is at node, no need to look right */
+		if (se->deadline == se->min_deadline)
+			break;
+
+		/* else min_deadline is in the right branch. */
+		BUG_ON(__node_2_se(node->rb_right)->min_deadline != se->min_deadline);
+		node = node->rb_right;
+	}
+	BUG_ON(!best && best_left);
+
+	/* We ran into an eligible node which is itself the best */
+	if (!best_left || (s64)(best_left->min_deadline - best->deadline) > 0)
+		return best;
+	
+	/*
+	 * Now best_left and all of its children are eligible, and we are just
+	 * looking for deadline == min_deadline
+	 */
+	node = &best_left->run_node;
+	while (node) {
+		struct sched_entity *se = __node_2_se(node);
+
+		/* min_deadline is in the left branch */
+		if (node->rb_left &&
+		    __node_2_se(node->rb_left)->min_deadline == se->min_deadline) {
+			node = node->rb_left;
+			continue;
+		}
+
+		/* min_deadline is the current node */
+		if (se->deadline == se->min_deadline)
+			return se;
+
+		/* else min_deadline is in the right branch */
+		BUG_ON(__node_2_se(node->rb_right)->min_deadline != se->min_deadline);
+		node = node->rb_right;
+	}
+	BUG();
+	return NULL;
+}
+
+static struct sched_entity *pick_eevdf_abel(struct cfs_rq *cfs_rq)
+{
+	struct rb_node *node = cfs_rq->tasks_timeline.rb_root.rb_node;
+	struct sched_entity *best = NULL;
+	struct sched_entity *cand = NULL;
+	bool all_eligible = false;
+
+	while (node || cand) {
+		struct sched_entity *se = __node_2_se(node);
+		if (!node) {
+			BUG_ON(!cand);
+			node = &cand->run_node;
+			se = __node_2_se(node);
+			all_eligible = true;
+			cand = NULL;
+
+			/*
+			 * Our initial pass ran into an eligible node which is
+			 * itself the best
+			 */
+			if (best && (s64)(se->min_deadline - best->deadline) > 0)
+				break;
+		}
+
+		/*
+		 * If this entity is not eligible, try the left subtree.
+		 */
+		if (!all_eligible && !entity_eligible(cfs_rq, se)) {
+			node = node->rb_left;
+			continue;
+		}
+		if (!all_eligible && (!best || deadline_gt(deadline, best, se)))
+			best = se;
+
+		if (node->rb_left) {
+			struct sched_entity *left = __node_2_se(node->rb_left);
+
+			BUG_ON(left->min_deadline < se->min_deadline);
+
+			/* Tiebreak on vruntime */
+			if (left->min_deadline == se->min_deadline) {
+				node = node->rb_left;
+				all_eligible = true;
+				continue;
+			}
+
+			if (!all_eligible) {
+				/*
+				 * We're going to search right subtree and the one
+				 * with min_deadline can be non-eligible, so record
+				 * the left subtree as a candidate.
+				 */
+				if (!cand || deadline_gt(min_deadline, cand, left))
+					cand = left;
+			}
+		}
+
+		/* min_deadline is at this node, no need to look right */
+		if (se->deadline == se->min_deadline) {
+			if (!best || deadline_gt(deadline, best, se))
+				best = se;
+			node = NULL;
+			continue;
+		}
+
+		node = node->rb_right;
+	}
+
+	return best;
+}
+
+
+
+void init_se(struct cfs_rq *cfs_rq, struct sched_entity *se, char name, u64 vruntime, u64 deadline) {
+	memset(se, 0, sizeof(*se));
+	se->name = name;
+	se->slice = 1;
+	se->load.weight = 1024;
+	se->vruntime = vruntime;
+	se->deadline = deadline;
+	__enqueue_entity(cfs_rq, se);
+}
+
+void print_se(char *label, struct sched_entity *se) {
+	if (!se) {
+		printf("%s is null\n", label);
+	} else {
+		struct rb_node *parent = rb_parent(&se->run_node);
+		printf("%s(%c) vrun %ld dl %ld, min_dl %ld, parent: %c\n", label, se->name,
+		       se->vruntime, se->deadline, se->min_deadline, parent ? __node_2_se(parent)->name : ' ');
+	}
+}
+
+struct sched_entity *correct_pick_eevdf(struct cfs_rq *cfs_rq) {
+	struct rb_node *node = rb_first_cached(&cfs_rq->tasks_timeline);
+	struct sched_entity *best = NULL;
+
+	while (node) {
+		struct sched_entity *se = __node_2_se(node);
+
+		/*
+		 * If this entity is not eligible, try the left subtree.
+		 */
+		if (!entity_eligible(cfs_rq, se)) {
+			return best;
+		}
+
+		/*
+		 * If this entity has an earlier deadline than the previous
+		 * best, take this one. If it also has the earliest deadline
+		 * of its subtree, we're done.
+		 */
+		if (!best || deadline_gt(deadline, best, se)) {
+			best = se;
+		}
+
+		node = rb_next(node);
+	}
+	return best;
+}
+
+
+void test_pick_function(struct sched_entity *(*pick_fn)(struct cfs_rq *), unsigned long skip_to) {
+#define MAX_SIZE 26
+	int size;
+	unsigned long n = 0;
+	struct sched_entity se[MAX_SIZE];
+	for (size = 0; size < MAX_SIZE; size++) {
+		int runs = 100000;
+		int count;
+		if (size <= 1)
+			runs = 1;
+		else if (size == 2)
+			runs = 100;
+		for (count = 0; count < runs; count++) {
+			int i;
+			struct cfs_rq cfs_rq = {0};
+			struct sched_entity *pick, *correct_pick;
+			cfs_rq.tasks_timeline = RB_ROOT_CACHED;
+			for (i = 0; i < size; i++) {
+				u64 v = (random() % size) * 10;
+				u64 d = v + (random() % size) * 10 + 1;
+				init_se(&cfs_rq, &se[i], 'A'+i, v, d);
+			}
+			n++;
+			if (n < skip_to)
+				continue;
+			pick = pick_fn(&cfs_rq);
+			correct_pick = correct_pick_eevdf(&cfs_rq);
+
+			if (size == 0) {
+				assert(!pick);
+				assert(!correct_pick);
+				continue;
+			}
+			if (!pick ||
+			    pick->deadline != correct_pick->deadline ||
+			    !entity_eligible(&cfs_rq, pick)) {
+
+				printf("Error (run %lu):\n", n);
+				print_se("correct pick", correct_pick);
+				print_se("actual pick ", pick);
+				printf("All ses:\n");
+				for (i = 0; i < size; i++) {
+					print_se("", &se[i]);
+				}
+				return;
+			}
+			//puts("");
+		}
+	}
+}
+
+void orig_check(void) {
+	struct cfs_rq cfs_rq = {0};
+	struct sched_entity sa, sb, sc;
+	struct sched_entity *root;
+
+
+	cfs_rq.tasks_timeline = RB_ROOT_CACHED;
+
+	init_se(&cfs_rq, &sa, 'a', 5, 9);
+	init_se(&cfs_rq, &sb, 'b', 4, 8);
+	init_se(&cfs_rq, &sc, 'c', 6, 7);
+
+	printf("cfs_rq min %ld avg %ld load %ld\n", cfs_rq.min_vruntime, cfs_rq.avg_vruntime, cfs_rq.avg_load);
+
+	root = __node_2_se(cfs_rq.tasks_timeline.rb_root.rb_node);
+	print_se("root", root);
+	if (root->run_node.rb_left)
+		print_se("left", __node_2_se(root->run_node.rb_left));
+	if (root->run_node.rb_right)
+		print_se("right", __node_2_se(root->run_node.rb_right));
+	print_se("picked", pick_eevdf_orig(&cfs_rq));
+}
+
+int main(int argc, char *argv[]) {
+	unsigned int seed = (unsigned int)time(NULL);
+	unsigned long skip_to = 0;
+	if (argc > 1)
+		seed = (unsigned int)atol(argv[1]);
+	srandom(seed);
+	if (argc > 2)
+		skip_to = (unsigned long)atol(argv[2]);
+	printf("Seed: %d\n", seed);
+	
+	test_pick_function(pick_eevdf_improved_ndebug, skip_to);
+	printf("Seed: %d\n", seed);
+}

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-11 12:12     ` Abel Wu
@ 2023-10-11 13:14       ` Peter Zijlstra
  2023-10-11 21:01       ` Benjamin Segall
  1 sibling, 0 replies; 30+ messages in thread
From: Peter Zijlstra @ 2023-10-11 13:14 UTC (permalink / raw)
  To: Abel Wu
  Cc: Benjamin Segall, mingo, vincent.guittot, linux-kernel,
	juri.lelli, dietmar.eggemann, rostedt, mgorman, bristot, corbet,
	qyousef, chris.hyser, patrick.bellasi, pjt, pavel, qperret,
	tim.c.chen, joshdon, timj, kprateek.nayak, yu.c.chen,
	youssefesmat, joel, efault, tglx

On Wed, Oct 11, 2023 at 08:12:09PM +0800, Abel Wu wrote:

> The implementation of __pick_eevdf() is now quite complicated, which
> makes it really hard to maintain. I'm trying my best to refactor this
> part; hopefully it can be of some help. Below is only for illustration,
> I need to test more.

Well, the form it has now is somewhat close to the one in the paper. I
think Ben added one early break that the paper's version doesn't have, or
something.

As the paper explains, you get two walks, one down the eligibility path,
and then one down the heap. I think the current code structure
represents that fairly well.

Very vague memories seem to suggest I thought I was being clever some 10+
years ago when I wrote that other decent loop that Ben showed to be
broken (and woe on me for not verifying it when I resurrected the
patches; I rewrote pretty much everything else, except this lookup :/).
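
For reference, here are the two walks compressed into one function to
make that shape visible (a sketch with the same logic as the
pick_eevdf_improved_ndebug() helper from the userspace tester earlier in
the thread, minus the BUG_ON sanity checks; not the in-tree code):

static struct sched_entity *pick_eevdf_two_walks(struct cfs_rq *cfs_rq)
{
	struct rb_node *node = cfs_rq->tasks_timeline.rb_root.rb_node;
	struct sched_entity *best = NULL, *best_left = NULL;

	/* Walk 1: down the eligibility path, tracking the best deadline
	 * seen and the best fully-eligible left subtree. */
	while (node) {
		struct sched_entity *se = __node_2_se(node);

		if (!entity_eligible(cfs_rq, se)) {
			node = node->rb_left;
			continue;
		}
		if (!best || deadline_gt(deadline, best, se))
			best = se;
		if (node->rb_left) {
			struct sched_entity *left = __node_2_se(node->rb_left);

			if (!best_left || deadline_gt(min_deadline, best_left, left))
				best_left = left;
			if (left->min_deadline == se->min_deadline)
				break;
		}
		if (se->deadline == se->min_deadline)
			break;
		node = node->rb_right;
	}

	if (!best_left || (s64)(best_left->min_deadline - best->deadline) > 0)
		return best;

	/* Walk 2: plain augmented-heap descent inside best_left, where
	 * everything is eligible. */
	node = &best_left->run_node;
	while (node) {
		struct sched_entity *se = __node_2_se(node);

		if (se->deadline == se->min_deadline)
			return se;
		if (node->rb_left &&
		    __node_2_se(node->rb_left)->min_deadline == se->min_deadline) {
			node = node->rb_left;
			continue;
		}
		node = node->rb_right;
	}
	return NULL;
}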


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-09-30  0:09   ` [PATCH] sched/fair: fix pick_eevdf to always find the correct se Benjamin Segall
       [not found]     ` <CGME20231004203940eucas1p2f73b017497d1f4239a6e236fdb6019e2@eucas1p2.samsung.com>
@ 2023-10-11 12:12     ` Abel Wu
  2023-10-11 13:14       ` Peter Zijlstra
  2023-10-11 21:01       ` Benjamin Segall
  1 sibling, 2 replies; 30+ messages in thread
From: Abel Wu @ 2023-10-11 12:12 UTC (permalink / raw)
  To: Benjamin Segall, Peter Zijlstra
  Cc: mingo, vincent.guittot, linux-kernel, juri.lelli,
	dietmar.eggemann, rostedt, mgorman, bristot, corbet, qyousef,
	chris.hyser, patrick.bellasi, pjt, pavel, qperret, tim.c.chen,
	joshdon, timj, kprateek.nayak, yu.c.chen, youssefesmat, joel,
	efault, tglx

On 9/30/23 8:09 AM, Benjamin Segall Wrote:
> The old pick_eevdf could fail to find the actual earliest eligible
> deadline when it descended to the right looking for min_deadline, but it
> turned out that that min_deadline wasn't actually eligible. In that case
> we need to go back and search through any left branches we skipped
> looking for the actual best _eligible_ min_deadline.
> 
> This is more expensive, but still O(log n), and at worst should only
> involve descending two branches of the rbtree.
> 
> I've run this through a userspace stress test (thank you
> tools/lib/rbtree.c), so hopefully this implementation doesn't miss any
> corner cases.
> 
> Fixes: 147f3efaa241 ("sched/fair: Implement an EEVDF-like scheduling policy")
> Signed-off-by: Ben Segall <bsegall@google.com>
> ---
>   kernel/sched/fair.c | 72 ++++++++++++++++++++++++++++++++++++---------
>   1 file changed, 58 insertions(+), 14 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 0c31cda0712f..77e9440b8ab3 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -864,18 +864,20 @@ struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
>    *
>    *  se->min_deadline = min(se->deadline, se->{left,right}->min_deadline)
>    *
>    * Which allows an EDF like search on (sub)trees.
>    */
> -static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
> +static struct sched_entity *__pick_eevdf(struct cfs_rq *cfs_rq)
>   {
>   	struct rb_node *node = cfs_rq->tasks_timeline.rb_root.rb_node;
>   	struct sched_entity *curr = cfs_rq->curr;
>   	struct sched_entity *best = NULL;
> +	struct sched_entity *best_left = NULL;
>   
>   	if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
>   		curr = NULL;
> +	best = curr;
>   
>   	/*
>   	 * Once selected, run a task until it either becomes non-eligible or
>   	 * until it gets a new slice. See the HACK in set_next_entity().
>   	 */
> @@ -892,45 +894,87 @@ static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
>   			node = node->rb_left;
>   			continue;
>   		}
>   
>   		/*
> -		 * If this entity has an earlier deadline than the previous
> -		 * best, take this one. If it also has the earliest deadline
> -		 * of its subtree, we're done.
> +		 * Now we heap search eligible trees for the best (min_)deadline
>   		 */
> -		if (!best || deadline_gt(deadline, best, se)) {
> +		if (!best || deadline_gt(deadline, best, se))
>   			best = se;
> -			if (best->deadline == best->min_deadline)
> -				break;
> -		}
>   
>   		/*
> -		 * If the earlest deadline in this subtree is in the fully
> -		 * eligible left half of our space, go there.
> +		 * Every se in a left branch is eligible, keep track of the
> +		 * branch with the best min_deadline
>   		 */
> +		if (node->rb_left) {
> +			struct sched_entity *left = __node_2_se(node->rb_left);
> +
> +			if (!best_left || deadline_gt(min_deadline, best_left, left))
> +				best_left = left;
> +
> +			/*
> +			 * min_deadline is in the left branch. rb_left and all
> +			 * descendants are eligible, so immediately switch to the second
> +			 * loop.
> +			 */
> +			if (left->min_deadline == se->min_deadline)
> +				break;
> +		}
> +
> +		/* min_deadline is at this node, no need to look right */
> +		if (se->deadline == se->min_deadline)
> +			break;
> +
> +		/* else min_deadline is in the right branch. */
> +		node = node->rb_right;
> +	}
> +
> +	/*
> +	 * We ran into an eligible node which is itself the best.
> +	 * (Or nr_running == 0 and both are NULL)
> +	 */
> +	if (!best_left || (s64)(best_left->min_deadline - best->deadline) > 0)
> +		return best;
> +
> +	/*
> +	 * Now best_left and all of its children are eligible, and we are just
> +	 * looking for deadline == min_deadline
> +	 */
> +	node = &best_left->run_node;
> +	while (node) {
> +		struct sched_entity *se = __node_2_se(node);
> +
> +		/* min_deadline is the current node */
> +		if (se->deadline == se->min_deadline)
> +			return se;

IMHO it would be better tiebreak on vruntime by moving this hunk to ..

> +
> +		/* min_deadline is in the left branch */
>   		if (node->rb_left &&
>   		    __node_2_se(node->rb_left)->min_deadline == se->min_deadline) {
>   			node = node->rb_left;
>   			continue;
>   		}

.. here, thoughts?

>   
> +		/* else min_deadline is in the right branch */
>   		node = node->rb_right;
>   	}
> +	return NULL;

Why not 'best'? Since ..

> +}
>   
> -	if (!best || (curr && deadline_gt(deadline, best, curr)))
> -		best = curr;
> +static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
> +{
> +	struct sched_entity *se = __pick_eevdf(cfs_rq);
>   
> -	if (unlikely(!best)) {
> +	if (!se) {
>   		struct sched_entity *left = __pick_first_entity(cfs_rq);

.. cfs_rq->curr isn't considered here.

>   		if (left) {
>   			pr_err("EEVDF scheduling fail, picking leftmost\n");
>   			return left;
>   		}
>   	}
>   
> -	return best;
> +	return se;
>   }
>   
>   #ifdef CONFIG_SCHED_DEBUG
>   struct sched_entity *__pick_last_entity(struct cfs_rq *cfs_rq)
>   {

The implementation of __pick_eevdf() is now quite complicated, which
makes it really hard to maintain. I'm trying my best to refactor this
part; hopefully it can be of some help. Below is only for illustration,
I need to test more.

static struct sched_entity *__pick_eevdf(struct cfs_rq *cfs_rq)
{
	struct rb_node *node = cfs_rq->tasks_timeline.rb_root.rb_node;
	struct sched_entity *curr = cfs_rq->curr;
	struct sched_entity *best = NULL;
	struct sched_entity *cand = NULL;
	bool all_eligible = false;

	if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
		curr = NULL;

	/*
	 * Once selected, run a task until it either becomes non-eligible or
	 * until it gets a new slice. See the HACK in set_next_entity().
	 */
	if (sched_feat(RUN_TO_PARITY) && curr && curr->vlag == curr->deadline)
		return curr;

	while (node) {
		struct sched_entity *se = __node_2_se(node);

		/*
		 * If this entity is not eligible, try the left subtree.
		 */
		if (!all_eligible && !entity_eligible(cfs_rq, se)) {
			node = node->rb_left;
			continue;
		}

		if (node->rb_left) {
			struct sched_entity *left = __node_2_se(node->rb_left);

			BUG_ON(left->min_deadline < se->min_deadline);

			/* Tiebreak on vruntime */
			if (left->min_deadline == se->min_deadline) {
				node = node->rb_left;
				all_eligible = true;
				continue;
			}

			if (!all_eligible) {
				/*
				 * We're going to search right subtree and the one
				 * with min_deadline can be non-eligible, so record
				 * the left subtree as a candidate.
				 */
				if (!cand || deadline_gt(min_deadline, cand, left))
					cand = left;
			}
		}

		/* min_deadline is at this node, no need to look right */
		if (se->deadline == se->min_deadline) {
			best = se;
			break;
		}

		node = node->rb_right;

		if (!node && cand) {
			node = cand;
			all_eligible = true;
			cand = NULL;
		}
	}

	if (!best || (curr && deadline_gt(deadline, best, curr)))
		best = curr;

	return best;
}

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
       [not found]     ` <CGME20231004203940eucas1p2f73b017497d1f4239a6e236fdb6019e2@eucas1p2.samsung.com>
@ 2023-10-04 20:39       ` Marek Szyprowski
  0 siblings, 0 replies; 30+ messages in thread
From: Marek Szyprowski @ 2023-10-04 20:39 UTC (permalink / raw)
  To: Benjamin Segall, Peter Zijlstra
  Cc: mingo, vincent.guittot, linux-kernel, juri.lelli,
	dietmar.eggemann, rostedt, mgorman, bristot, corbet, qyousef,
	chris.hyser, patrick.bellasi, pjt, pavel, qperret, tim.c.chen,
	joshdon, timj, kprateek.nayak, yu.c.chen, youssefesmat, joel,
	efault, tglx, Greg Kroah-Hartman

Hi,

On 30.09.2023 02:09, Benjamin Segall wrote:
> The old pick_eevdf could fail to find the actual earliest eligible
> deadline when it descended to the right looking for min_deadline, but it
> turned out that that min_deadline wasn't actually eligible. In that case
> we need to go back and search through any left branches we skipped
> looking for the actual best _eligible_ min_deadline.
>
> This is more expensive, but still O(log n), and at worst should only
> involve descending two branches of the rbtree.
>
> I've run this through a userspace stress test (thank you
> tools/lib/rbtree.c), so hopefully this implementation doesn't miss any
> corner cases.
>
> Fixes: 147f3efaa241 ("sched/fair: Implement an EEVDF-like scheduling policy")
> Signed-off-by: Ben Segall <bsegall@google.com>

This patch landed in today's linux-next as commit 561c58efd239
("sched/fair: Fix pick_eevdf()"). Surprisingly, it introduced a warning
about a circular locking dependency. It can easily be observed from time
to time during boot on a qemu/arm64 'virt' machine:

======================================================
WARNING: possible circular locking dependency detected
6.6.0-rc4+ #7222 Not tainted
------------------------------------------------------
systemd-udevd/1187 is trying to acquire lock:
ffffbcc2be0c4de0 (console_owner){..-.}-{0:0}, at: console_flush_all+0x1b0/0x500

but task is already holding lock:
ffff5535ffdd2b18 (&rq->__lock){-.-.}-{2:2}, at: __schedule+0xe0/0xc40

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #4 (&rq->__lock){-.-.}-{2:2}:
        _raw_spin_lock_nested+0x44/0x5c
        raw_spin_rq_lock_nested+0x24/0x40
        task_fork_fair+0x3c/0xac
        sched_cgroup_fork+0xe8/0x14c
        copy_process+0x11c4/0x1a14
        kernel_clone+0x8c/0x400
        user_mode_thread+0x70/0x98
        rest_init+0x28/0x190
        arch_post_acpi_subsys_init+0x0/0x8
        start_kernel+0x594/0x684
        __primary_switched+0xbc/0xc4

-> #3 (&p->pi_lock){-.-.}-{2:2}:
        _raw_spin_lock_irqsave+0x60/0x88
        try_to_wake_up+0x58/0x468
        default_wake_function+0x14/0x20
        woken_wake_function+0x20/0x2c
        __wake_up_common+0x94/0x170
        __wake_up_common_lock+0x7c/0xcc
        __wake_up+0x18/0x24
        tty_wakeup+0x34/0x70
        tty_port_default_wakeup+0x20/0x38
        tty_port_tty_wakeup+0x18/0x24
        uart_write_wakeup+0x18/0x28
        pl011_tx_chars+0x140/0x2b4
        pl011_start_tx+0xe8/0x190
        serial_port_runtime_resume+0x90/0xc0
        __rpm_callback+0x48/0x1a8
        rpm_callback+0x6c/0x78
        rpm_resume+0x438/0x6d8
        pm_runtime_work+0x84/0xc8
        process_one_work+0x1ec/0x53c
        worker_thread+0x298/0x408
        kthread+0x124/0x128
        ret_from_fork+0x10/0x20

-> #2 (&tty->write_wait){....}-{2:2}:
        _raw_spin_lock_irqsave+0x60/0x88
        __wake_up_common_lock+0x5c/0xcc
        __wake_up+0x18/0x24
        tty_wakeup+0x34/0x70
        tty_port_default_wakeup+0x20/0x38
        tty_port_tty_wakeup+0x18/0x24
        uart_write_wakeup+0x18/0x28
        pl011_tx_chars+0x140/0x2b4
        pl011_start_tx+0xe8/0x190
        serial_port_runtime_resume+0x90/0xc0
        __rpm_callback+0x48/0x1a8
        rpm_callback+0x6c/0x78
        rpm_resume+0x438/0x6d8
        pm_runtime_work+0x84/0xc8
        process_one_work+0x1ec/0x53c
        worker_thread+0x298/0x408
        kthread+0x124/0x128
        ret_from_fork+0x10/0x20

-> #1 (&port_lock_key){..-.}-{2:2}:
        _raw_spin_lock+0x48/0x60
        pl011_console_write+0x13c/0x1b0
        console_flush_all+0x20c/0x500
        console_unlock+0x6c/0x130
        vprintk_emit+0x228/0x3a0
        vprintk_default+0x38/0x44
        vprintk+0xa4/0xc0
        _printk+0x5c/0x84
        register_console+0x1f4/0x420
        serial_core_register_port+0x5a4/0x5d8
        serial_ctrl_register_port+0x10/0x1c
        uart_add_one_port+0x10/0x1c
        pl011_register_port+0x70/0x12c
        pl011_probe+0x1bc/0x1fc
        amba_probe+0x110/0x1c8
        really_probe+0x148/0x2b4
        __driver_probe_device+0x78/0x12c
        driver_probe_device+0xd8/0x160
        __device_attach_driver+0xb8/0x138
        bus_for_each_drv+0x84/0xe0
        __device_attach+0xa8/0x1b0
        device_initial_probe+0x14/0x20
        bus_probe_device+0xb0/0xb4
        device_add+0x574/0x738
        amba_device_add+0x40/0xac
        of_platform_bus_create+0x2b4/0x378
        of_platform_populate+0x50/0xfc
        of_platform_default_populate_init+0xd0/0xf0
        do_one_initcall+0x74/0x2f0
        kernel_init_freeable+0x28c/0x4dc
        kernel_init+0x24/0x1dc
        ret_from_fork+0x10/0x20

-> #0 (console_owner){..-.}-{0:0}:
        __lock_acquire+0x1318/0x20c4
        lock_acquire+0x1e8/0x318
        console_flush_all+0x1f8/0x500
        console_unlock+0x6c/0x130
        vprintk_emit+0x228/0x3a0
        vprintk_default+0x38/0x44
        vprintk+0xa4/0xc0
        _printk+0x5c/0x84
        pick_next_task_fair+0x28c/0x498
        __schedule+0x164/0xc40
        do_task_dead+0x54/0x58
        do_exit+0x61c/0x9e8
        do_group_exit+0x34/0x90
        __wake_up_parent+0x0/0x30
        invoke_syscall+0x48/0x114
        el0_svc_common.constprop.0+0x40/0xe0
        do_el0_svc_compat+0x1c/0x38
        el0_svc_compat+0x48/0xb4
        el0t_32_sync_handler+0x90/0x140
        el0t_32_sync+0x194/0x198

other info that might help us debug this:

Chain exists of:
   console_owner --> &p->pi_lock --> &rq->__lock

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&rq->__lock);
                                lock(&p->pi_lock);
                                lock(&rq->__lock);
   lock(console_owner);

  *** DEADLOCK ***

3 locks held by systemd-udevd/1187:
  #0: ffff5535ffdd2b18 (&rq->__lock){-.-.}-{2:2}, at: __schedule+0xe0/0xc40
  #1: ffffbcc2be0c4c30 (console_lock){+.+.}-{0:0}, at: vprintk_emit+0x11c/0x3a0
  #2: ffffbcc2be0c4c88 (console_srcu){....}-{0:0}, at: console_flush_all+0x7c/0x500

stack backtrace:
CPU: 1 PID: 1187 Comm: systemd-udevd Not tainted 6.6.0-rc4+ #7222
Hardware name: linux,dummy-virt (DT)
Call trace:
  dump_backtrace+0x98/0xf0
  show_stack+0x18/0x24
  dump_stack_lvl+0x60/0xac
  dump_stack+0x18/0x24
  print_circular_bug+0x290/0x370
  check_noncircular+0x15c/0x170
  __lock_acquire+0x1318/0x20c4
  lock_acquire+0x1e8/0x318
  console_flush_all+0x1f8/0x500
  console_unlock+0x6c/0x130
  vprintk_emit+0x228/0x3a0
  vprintk_default+0x38/0x44
  vprintk+0xa4/0xc0
  _printk+0x5c/0x84
  pick_next_task_fair+0x28c/0x498
  __schedule+0x164/0xc40
  do_task_dead+0x54/0x58
  do_exit+0x61c/0x9e8
  do_group_exit+0x34/0x90
  __wake_up_parent+0x0/0x30
  invoke_syscall+0x48/0x114
  el0_svc_common.constprop.0+0x40/0xe0
  do_el0_svc_compat+0x1c/0x38
  el0_svc_compat+0x48/0xb4
  el0t_32_sync_handler+0x90/0x140
  el0t_32_sync+0x194/0x198

The problem is probably elsewhere, but this scheduler change merely
revealed it in a fully reproducible way. Reverting $subject on top of
linux-next hides the problem deep enough that I was not able to
reproduce it. Let me know if there is anything I can do to help fix
this issue.

> ---
>   kernel/sched/fair.c | 72 ++++++++++++++++++++++++++++++++++++---------
>   1 file changed, 58 insertions(+), 14 deletions(-)
>
> ...

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-05-31 11:58 ` [PATCH 05/15] sched/fair: Implement an EEVDF like policy Peter Zijlstra
@ 2023-09-30  0:09   ` Benjamin Segall
       [not found]     ` <CGME20231004203940eucas1p2f73b017497d1f4239a6e236fdb6019e2@eucas1p2.samsung.com>
  2023-10-11 12:12     ` Abel Wu
  0 siblings, 2 replies; 30+ messages in thread
From: Benjamin Segall @ 2023-09-30  0:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, vincent.guittot, linux-kernel, juri.lelli,
	dietmar.eggemann, rostedt, mgorman, bristot, corbet, qyousef,
	chris.hyser, patrick.bellasi, pjt, pavel, qperret, tim.c.chen,
	joshdon, timj, kprateek.nayak, yu.c.chen, youssefesmat, joel,
	efault, tglx

The old pick_eevdf could fail to find the actual earliest eligible
deadline when it descended to the right looking for min_deadline, but it
turned out that that min_deadline wasn't actually eligible. In that case
we need to go back and search through any left branches we skipped
looking for the actual best _eligible_ min_deadline.

This is more expensive, but still O(log n), and at worst should only
involve descending two branches of the rbtree.
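
As a concrete illustration of the failure mode (a constructed example,
not one from the stress test): take entities b (vruntime=4, deadline=8),
a (v=5, d=9) and c (v=6, d=7), and assume the rbtree, keyed on vruntime,
ends up rooted at a with b to its left and c to its right, so the root's
min_deadline is 7 (c's). With an average vruntime of 5, c is not
eligible while a and b are. The old loop takes best = a at the root,
sees that the min_deadline of 7 lies in the right subtree, descends to
c, finds it ineligible, and runs out of nodes, so it returns a (d=9)
even though b (d=8) is eligible with an earlier deadline.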

I've run this through a userspace stress test (thank you
tools/lib/rbtree.c), so hopefully this implementation doesn't miss any
corner cases.

Fixes: 147f3efaa241 ("sched/fair: Implement an EEVDF-like scheduling policy")
Signed-off-by: Ben Segall <bsegall@google.com>
---
 kernel/sched/fair.c | 72 ++++++++++++++++++++++++++++++++++++---------
 1 file changed, 58 insertions(+), 14 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0c31cda0712f..77e9440b8ab3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -864,18 +864,20 @@ struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
  *
  *  se->min_deadline = min(se->deadline, se->{left,right}->min_deadline)
  *
  * Which allows an EDF like search on (sub)trees.
  */
-static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
+static struct sched_entity *__pick_eevdf(struct cfs_rq *cfs_rq)
 {
 	struct rb_node *node = cfs_rq->tasks_timeline.rb_root.rb_node;
 	struct sched_entity *curr = cfs_rq->curr;
 	struct sched_entity *best = NULL;
+	struct sched_entity *best_left = NULL;
 
 	if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
 		curr = NULL;
+	best = curr;
 
 	/*
 	 * Once selected, run a task until it either becomes non-eligible or
 	 * until it gets a new slice. See the HACK in set_next_entity().
 	 */
@@ -892,45 +894,87 @@ static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
 			node = node->rb_left;
 			continue;
 		}
 
 		/*
-		 * If this entity has an earlier deadline than the previous
-		 * best, take this one. If it also has the earliest deadline
-		 * of its subtree, we're done.
+		 * Now we heap search eligible trees for the best (min_)deadline
 		 */
-		if (!best || deadline_gt(deadline, best, se)) {
+		if (!best || deadline_gt(deadline, best, se))
 			best = se;
-			if (best->deadline == best->min_deadline)
-				break;
-		}
 
 		/*
-		 * If the earlest deadline in this subtree is in the fully
-		 * eligible left half of our space, go there.
+		 * Every se in a left branch is eligible, keep track of the
+		 * branch with the best min_deadline
 		 */
+		if (node->rb_left) {
+			struct sched_entity *left = __node_2_se(node->rb_left);
+
+			if (!best_left || deadline_gt(min_deadline, best_left, left))
+				best_left = left;
+
+			/*
+			 * min_deadline is in the left branch. rb_left and all
+			 * descendants are eligible, so immediately switch to the second
+			 * loop.
+			 */
+			if (left->min_deadline == se->min_deadline)
+				break;
+		}
+
+		/* min_deadline is at this node, no need to look right */
+		if (se->deadline == se->min_deadline)
+			break;
+
+		/* else min_deadline is in the right branch. */
+		node = node->rb_right;
+	}
+
+	/*
+	 * We ran into an eligible node which is itself the best.
+	 * (Or nr_running == 0 and both are NULL)
+	 */
+	if (!best_left || (s64)(best_left->min_deadline - best->deadline) > 0)
+		return best;
+
+	/*
+	 * Now best_left and all of its children are eligible, and we are just
+	 * looking for deadline == min_deadline
+	 */
+	node = &best_left->run_node;
+	while (node) {
+		struct sched_entity *se = __node_2_se(node);
+
+		/* min_deadline is the current node */
+		if (se->deadline == se->min_deadline)
+			return se;
+
+		/* min_deadline is in the left branch */
 		if (node->rb_left &&
 		    __node_2_se(node->rb_left)->min_deadline == se->min_deadline) {
 			node = node->rb_left;
 			continue;
 		}
 
+		/* else min_deadline is in the right branch */
 		node = node->rb_right;
 	}
+	return NULL;
+}
 
-	if (!best || (curr && deadline_gt(deadline, best, curr)))
-		best = curr;
+static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
+{
+	struct sched_entity *se = __pick_eevdf(cfs_rq);
 
-	if (unlikely(!best)) {
+	if (!se) {
 		struct sched_entity *left = __pick_first_entity(cfs_rq);
 		if (left) {
 			pr_err("EEVDF scheduling fail, picking leftmost\n");
 			return left;
 		}
 	}
 
-	return best;
+	return se;
 }
 
 #ifdef CONFIG_SCHED_DEBUG
 struct sched_entity *__pick_last_entity(struct cfs_rq *cfs_rq)
 {
-- 
2.42.0.582.g8ccd20d70d-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2023-10-13 16:51 UTC | newest]

Thread overview: 30+ messages
2023-10-05  7:31 [PATCH] sched/fair: fix pick_eevdf to always find the correct se Biju Das
2023-10-05  7:58 ` Biju Das
     [not found]   ` <ZR6A/lsQMP+ymD1f@chenyu5-mobl2.ccr.corp.intel.com>
2023-10-05  9:31     ` Biju Das
2023-10-05 15:02 ` Peter Zijlstra
2023-10-05 15:08   ` Biju Das
2023-10-06  8:36     ` Marek Szyprowski
2023-10-06  9:38       ` Biju Das
     [not found]     ` <553e2ee4-ab3a-4635-a74f-0ba4cc03f3f9@samsung.com>
2023-10-06 10:31       ` Mike Galbraith
2023-10-06 14:00         ` Peter Zijlstra
2023-10-06 14:39           ` Mike Galbraith
2023-10-06 15:55             ` Peter Zijlstra
2023-10-06 16:54               ` Mike Galbraith
2023-10-06 19:24               ` Peter Zijlstra
2023-10-06 20:15                 ` Marek Szyprowski
2023-10-06 23:27                 ` Mike Galbraith
2023-10-07  0:37                 ` Chen Yu
2023-10-07  6:26                   ` Biju Das
2023-10-07  6:42                     ` Chen Yu
2023-10-09 14:27                       ` Phil Auld
2023-10-10  2:08                         ` Chen Yu
2023-10-07  6:24                 ` Biju Das
2023-10-09  7:53                 ` [tip: sched/urgent] sched/eevdf: Fix min_deadline heap integrity tip-bot2 for Peter Zijlstra
2023-10-05 15:11   ` [PATCH] sched/fair: fix pick_eevdf to always find the correct se Peter Zijlstra
  -- strict thread matches above, loose matches on Subject: below --
2023-05-31 11:58 [PATCH 00/15] sched: EEVDF and latency-nice and/or slice-attr Peter Zijlstra
2023-05-31 11:58 ` [PATCH 05/15] sched/fair: Implement an EEVDF like policy Peter Zijlstra
2023-09-30  0:09   ` [PATCH] sched/fair: fix pick_eevdf to always find the correct se Benjamin Segall
     [not found]     ` <CGME20231004203940eucas1p2f73b017497d1f4239a6e236fdb6019e2@eucas1p2.samsung.com>
2023-10-04 20:39       ` Marek Szyprowski
2023-10-11 12:12     ` Abel Wu
2023-10-11 13:14       ` Peter Zijlstra
2023-10-11 21:01       ` Benjamin Segall
2023-10-12 10:25         ` Abel Wu
2023-10-12 17:51           ` Benjamin Segall
2023-10-13  3:46             ` Abel Wu
2023-10-13 16:51               ` Benjamin Segall
