linux-renesas-soc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
@ 2023-10-05  7:31 Biju Das
  2023-10-05  7:58 ` Biju Das
  2023-10-05 15:02 ` Peter Zijlstra
  0 siblings, 2 replies; 22+ messages in thread
From: Biju Das @ 2023-10-05  7:31 UTC (permalink / raw)
  To: m.szyprowski, bsegall, peterz
  Cc: bristot, bsegall, chris.hyser, corbet, dietmar.eggemann, efault,
	gregkh, joel, joshdon, juri.lelli, kprateek.nayak, linux-kernel,
	mingo, patrick.bellasi, Pavel Machek, peterz, pjt, qperret,
	qyousef, rostedt, tglx, tim.c.chen, timj, vincent.guittot,
	youssefesmat, yu.c.chen, mgorman, linux-renesas-soc

Subject: Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
Date: Wed, 4 Oct 2023 22:39:39 +0200	[thread overview]
Message-ID: <c92bc8a6-225d-4fd2-88b5-8994090fb2de@samsung.com> (raw)
In-Reply-To: <xm261qego72d.fsf_-_@google.com>

Hi,

On 30.09.2023 02:09, Benjamin Segall wrote:
> The old pick_eevdf could fail to find the actual earliest eligible
> deadline when it descended to the right looking for min_deadline, but it
> turned out that that min_deadline wasn't actually eligible. In that case
> we need to go back and search through any left branches we skipped
> looking for the actual best _eligible_ min_deadline.
>
> This is more expensive, but still O(log n), and at worst should only
> involve descending two branches of the rbtree.
>
> I've run this through a userspace stress test (thank you
> tools/lib/rbtree.c), so hopefully this implementation doesn't miss any
> corner cases.
>
> Fixes: 147f3efaa241 ("sched/fair: Implement an EEVDF-like scheduling policy")
> Signed-off-by: Ben Segall <bsegall@google.com>

This patch causing issues [1] in Renesas RZ/G2L SMARC EVK platform. Reverting the patch fixes the warning messages

[1]
[   25.550898] EEVDF scheduling fail, picking leftmost

[   15.109634] ======================================================
[   15.109636] WARNING: possible circular locking dependency detected
[   15.109641] 6.6.0-rc4-next-20231005-arm64-renesas-ga03f9ebbbb4c #1165 Not tainted
[   15.109648] ------------------------------------------------------
[   15.109649] migration/0/16 is trying to acquire lock:
[   15.109654] ffff800081713460 (console_owner){..-.}-{0:0}, at: console_flush_all.constprop.0+0x1a0/0x438
[   15.109694]
[   15.109694] but task is already holding lock:
[   15.109697] ffff00007fbd2298 (&rq->__lock){-.-.}-{2:2}, at: __schedule+0xd0/0xbe0
[   15.109718]
[   15.109718] which lock already depends on the new lock.
[   15.109718]
[   15.109720]
[   15.109720] the existing dependency chain (in reverse order) is:

   25.551560]  __down_trylock_console_sem+0x34/0xb8
[   25.551567]  console_trylock+0x24/0x74
[   25.551574]  vprintk_emit+0x114/0x388
[   25.551581]  vprintk_default+0x34/0x3c
[   25.551588]  vprintk+0x9c/0xb4
[   25.551594]  _printk+0x58/0x7c
[   25.551600]  pick_next_task_fair+0x274/0x480
[   25.551608]  __schedule+0x154/0xbe0
[   25.551616]  schedule+0x48/0x110
[   25.551623]  worker_thread+0x1b8/0x3f8
[   25.551630]  kthread+0x114/0x118
[   25.551635]  ret_from_fork+0x10/0x20
[  OK  ] Started System Logging Service.
[   26.099203] EEVDF scheduling fail, picking leftmost

Cheers,
Biju


^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-05  7:31 [PATCH] sched/fair: fix pick_eevdf to always find the correct se Biju Das
@ 2023-10-05  7:58 ` Biju Das
       [not found]   ` <ZR6A/lsQMP+ymD1f@chenyu5-mobl2.ccr.corp.intel.com>
  2023-10-05 15:02 ` Peter Zijlstra
  1 sibling, 1 reply; 22+ messages in thread
From: Biju Das @ 2023-10-05  7:58 UTC (permalink / raw)
  To: m.szyprowski, bsegall, peterz
  Cc: bristot, bsegall, chris.hyser, corbet, dietmar.eggemann, efault,
	gregkh, joel, joshdon, juri.lelli, kprateek.nayak, linux-kernel,
	mingo, patrick.bellasi, Pavel Machek, peterz, pjt, qperret,
	qyousef, rostedt, tglx, tim.c.chen, timj, vincent.guittot,
	youssefesmat, yu.c.chen, mgorman, linux-renesas-soc,
	Geert Uytterhoeven

Hi all,

> -----Original Message-----
> From: Biju Das
> Sent: Thursday, October 5, 2023 8:32 AM
> Subject: Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct
> se
> 
> Subject: Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct
> se
> Date: Wed, 4 Oct 2023 22:39:39 +0200	[thread overview]
> Message-ID: <c92bc8a6-225d-4fd2-88b5-8994090fb2de@samsung.com> (raw)
> In-Reply-To: <xm261qego72d.fsf_-_@google.com>
> 
> Hi,
> 
> On 30.09.2023 02:09, Benjamin Segall wrote:
> > The old pick_eevdf could fail to find the actual earliest eligible
> > deadline when it descended to the right looking for min_deadline, but
> > it turned out that that min_deadline wasn't actually eligible. In that
> > case we need to go back and search through any left branches we
> > skipped looking for the actual best _eligible_ min_deadline.
> >
> > This is more expensive, but still O(log n), and at worst should only
> > involve descending two branches of the rbtree.
> >
> > I've run this through a userspace stress test (thank you
> > tools/lib/rbtree.c), so hopefully this implementation doesn't miss any
> > corner cases.
> >
> > Fixes: 147f3efaa241 ("sched/fair: Implement an EEVDF-like scheduling
> > policy")
> > Signed-off-by: Ben Segall <bsegall@google.com>
> 
> This patch causing issues [1] in Renesas RZ/G2L SMARC EVK platform.
> Reverting the patch fixes the warning messages
> 
> [1]
> [   25.550898] EEVDF scheduling fail, picking leftmost
> 
> [   15.109634] ======================================================
> [   15.109636] WARNING: possible circular locking dependency detected
> [   15.109641] 6.6.0-rc4-next-20231005-arm64-renesas-ga03f9ebbbb4c #1165
> Not tainted
> [   15.109648] ------------------------------------------------------
> [   15.109649] migration/0/16 is trying to acquire lock:
> [   15.109654] ffff800081713460 (console_owner){..-.}-{0:0}, at:
> console_flush_all.constprop.0+0x1a0/0x438
> [   15.109694]
> [   15.109694] but task is already holding lock:
> [   15.109697] ffff00007fbd2298 (&rq->__lock){-.-.}-{2:2}, at:
> __schedule+0xd0/0xbe0
> [   15.109718]
> [   15.109718] which lock already depends on the new lock.
> [   15.109718]
> [   15.109720]
> [   15.109720] the existing dependency chain (in reverse order) is:
> 
>    25.551560]  __down_trylock_console_sem+0x34/0xb8
> [   25.551567]  console_trylock+0x24/0x74
> [   25.551574]  vprintk_emit+0x114/0x388
> [   25.551581]  vprintk_default+0x34/0x3c
> [   25.551588]  vprintk+0x9c/0xb4
> [   25.551594]  _printk+0x58/0x7c
> [   25.551600]  pick_next_task_fair+0x274/0x480
> [   25.551608]  __schedule+0x154/0xbe0
> [   25.551616]  schedule+0x48/0x110
> [   25.551623]  worker_thread+0x1b8/0x3f8
> [   25.551630]  kthread+0x114/0x118
> [   25.551635]  ret_from_fork+0x10/0x20
> [  OK  ] Started System Logging Service.
> [   26.099203] EEVDF scheduling fail, picking leftmost

Complete log can be found here

https://pastebin.com/zZkRFiSf

Cheers,
Biju

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
       [not found]   ` <ZR6A/lsQMP+ymD1f@chenyu5-mobl2.ccr.corp.intel.com>
@ 2023-10-05  9:31     ` Biju Das
  0 siblings, 0 replies; 22+ messages in thread
From: Biju Das @ 2023-10-05  9:31 UTC (permalink / raw)
  To: Chen Yu; +Cc: Geert Uytterhoeven, linux-renesas-soc

> Subject: Re: Re: [PATCH] sched/fair: fix pick_eevdf to always find the
> correct se
> 
> Hi Biju,
> 
> On 2023-10-05 at 07:58:04 +0000, Biju Das wrote:
> > Hi all,
> >
> > > -----Original Message-----
> > > From: Biju Das
> > > Sent: Thursday, October 5, 2023 8:32 AM
> > > Subject: Re: [PATCH] sched/fair: fix pick_eevdf to always find the
> > > correct se
> > >
> > > Subject: Re: [PATCH] sched/fair: fix pick_eevdf to always find the
> > > correct se
> > > Date: Wed, 4 Oct 2023 22:39:39 +0200	[thread overview]
> > > Message-ID: <c92bc8a6-225d-4fd2-88b5-8994090fb2de@samsung.com> (raw)
> > > In-Reply-To: <xm261qego72d.fsf_-_@google.com>
> > >
> > > Hi,
> > >
> > > On 30.09.2023 02:09, Benjamin Segall wrote:
> > > > The old pick_eevdf could fail to find the actual earliest eligible
> > > > deadline when it descended to the right looking for min_deadline,
> > > > but it turned out that that min_deadline wasn't actually eligible.
> > > > In that case we need to go back and search through any left
> > > > branches we skipped looking for the actual best _eligible_
> min_deadline.
> > > >
> > > > This is more expensive, but still O(log n), and at worst should
> > > > only involve descending two branches of the rbtree.
> > > >
> > > > I've run this through a userspace stress test (thank you
> > > > tools/lib/rbtree.c), so hopefully this implementation doesn't miss
> > > > any corner cases.
> > > >
> > > > Fixes: 147f3efaa241 ("sched/fair: Implement an EEVDF-like
> > > > scheduling
> > > > policy")
> > > > Signed-off-by: Ben Segall <bsegall@google.com>
> > >
> > > This patch causing issues [1] in Renesas RZ/G2L SMARC EVK platform.
> > > Reverting the patch fixes the warning messages
> > >
> 
> Thank you for reporting this. It seems to be directly triggered by the
> pr_err in __pick_eevdf(). May I have the kernel config file you are using?
> I'm trying to reproduce this issue on a server.

It is reproducible with [1] and [2]. The logs I shared from [2]

[1] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/arch/arm64/configs/defconfig?h=next-20231005

[2] https://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel.git/tree/arch/arm64/configs/renesas_defconfig

Cheers,
Biju

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-05  7:31 [PATCH] sched/fair: fix pick_eevdf to always find the correct se Biju Das
  2023-10-05  7:58 ` Biju Das
@ 2023-10-05 15:02 ` Peter Zijlstra
  2023-10-05 15:08   ` Biju Das
  2023-10-05 15:11   ` Peter Zijlstra
  1 sibling, 2 replies; 22+ messages in thread
From: Peter Zijlstra @ 2023-10-05 15:02 UTC (permalink / raw)
  To: Biju Das
  Cc: m.szyprowski, bsegall, bristot, chris.hyser, corbet,
	dietmar.eggemann, efault, gregkh, joel, joshdon, juri.lelli,
	kprateek.nayak, linux-kernel, mingo, patrick.bellasi,
	Pavel Machek, pjt, qperret, qyousef, rostedt, tglx, tim.c.chen,
	timj, vincent.guittot, youssefesmat, yu.c.chen, mgorman,
	linux-renesas-soc

On Thu, Oct 05, 2023 at 07:31:34AM +0000, Biju Das wrote:

> [   26.099203] EEVDF scheduling fail, picking leftmost

This, that the problem.. the rest is just noise because printk stinks.

Weirdly have not seen that trigger, and I've been running with this
patch on for a few days now :/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-05 15:02 ` Peter Zijlstra
@ 2023-10-05 15:08   ` Biju Das
  2023-10-06  8:36     ` Marek Szyprowski
       [not found]     ` <553e2ee4-ab3a-4635-a74f-0ba4cc03f3f9@samsung.com>
  2023-10-05 15:11   ` Peter Zijlstra
  1 sibling, 2 replies; 22+ messages in thread
From: Biju Das @ 2023-10-05 15:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: m.szyprowski, bsegall, bristot, chris.hyser, corbet,
	dietmar.eggemann, efault, gregkh, joel, joshdon, juri.lelli,
	kprateek.nayak, linux-kernel, mingo, patrick.bellasi,
	Pavel Machek, pjt, qperret, qyousef, rostedt, tglx, tim.c.chen,
	timj, vincent.guittot, youssefesmat, yu.c.chen, mgorman,
	linux-renesas-soc

Hi Peter Zijlstra,

> Subject: Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct
> se
> 
> On Thu, Oct 05, 2023 at 07:31:34AM +0000, Biju Das wrote:
> 
> > [   26.099203] EEVDF scheduling fail, picking leftmost
> 
> This, that the problem.. the rest is just noise because printk stinks.
> 
> Weirdly have not seen that trigger, and I've been running with this patch
> on for a few days now :/

I agree Original issue is "EEVDF scheduling fail, picking leftmost"
Which is triggering noisy lock warning messages during boot.

2 platforms are affected both ARM platforms(Renesas and Samsung)
Maybe other platforms affected too.

I am happy to test if there is a fix available for this issue.

Cheers,
Biju

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-05 15:02 ` Peter Zijlstra
  2023-10-05 15:08   ` Biju Das
@ 2023-10-05 15:11   ` Peter Zijlstra
  1 sibling, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2023-10-05 15:11 UTC (permalink / raw)
  To: Biju Das
  Cc: m.szyprowski, bsegall, bristot, chris.hyser, corbet,
	dietmar.eggemann, efault, gregkh, joel, joshdon, juri.lelli,
	kprateek.nayak, linux-kernel, mingo, patrick.bellasi,
	Pavel Machek, pjt, qperret, qyousef, rostedt, tglx, tim.c.chen,
	timj, vincent.guittot, youssefesmat, yu.c.chen, mgorman,
	linux-renesas-soc

On Thu, Oct 05, 2023 at 05:02:58PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 05, 2023 at 07:31:34AM +0000, Biju Das wrote:
> 
> > [   26.099203] EEVDF scheduling fail, picking leftmost
> 
> This, that the problem.. the rest is just noise because printk stinks.
> 
> Weirdly have not seen that trigger, and I've been running with this
> patch on for a few days now :/

I've killed the patch for now, will try again once we figure out what
goes side-ways.


Thanks!

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-05 15:08   ` Biju Das
@ 2023-10-06  8:36     ` Marek Szyprowski
  2023-10-06  9:38       ` Biju Das
       [not found]     ` <553e2ee4-ab3a-4635-a74f-0ba4cc03f3f9@samsung.com>
  1 sibling, 1 reply; 22+ messages in thread
From: Marek Szyprowski @ 2023-10-06  8:36 UTC (permalink / raw)
  To: Biju Das, Peter Zijlstra
  Cc: bsegall, bristot, chris.hyser, corbet, dietmar.eggemann, efault,
	gregkh, joel, joshdon, juri.lelli, kprateek.nayak, linux-kernel,
	mingo, patrick.bellasi, Pavel Machek, pjt, qperret, qyousef,
	rostedt, tglx, tim.c.chen, timj, vincent.guittot, youssefesmat,
	yu.c.chen, mgorman, linux-renesas-soc

On 05.10.2023 17:08, Biju Das wrote:
> Hi Peter Zijlstra,
>
>> Subject: Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct
>> se
>>
>> On Thu, Oct 05, 2023 at 07:31:34AM +0000, Biju Das wrote:
>>
>>> [   26.099203] EEVDF scheduling fail, picking leftmost
>> This, that the problem.. the rest is just noise because printk stinks.
>>
>> Weirdly have not seen that trigger, and I've been running with this patch
>> on for a few days now :/
> I agree Original issue is "EEVDF scheduling fail, picking leftmost"
> Which is triggering noisy lock warning messages during boot.
>
> 2 platforms are affected both ARM platforms(Renesas and Samsung)
> Maybe other platforms affected too.


Just to note, I've run into this issue on the QEmu's 'arm64/virt' 
platform, not on the Samsung specific hardware.


You can easily reproduce it with the following steps:

# make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- defconfig

# ./scripts/config -e PROVE_LOCKING -e DEBUG_ATOMIC_SLEEP -e PM_DEBUG -e 
PM_ADVANCED_DEBUG

# make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- olddefconfig Image.gz

# wget 
https://cloud.debian.org/images/cloud/buster/20230802-1460/debian-10-nocloud-arm64-20230802-1460.tar.xz

# tar xJfv debian-10-nocloud-arm64-20230802-1460.tar.xz


Then run QEmu a few times until you see the lockdep splat:

# qemu-system-aarch64 -serial stdio -kernel arch/arm64/boot/Image 
-append "console=ttyAMA0 root=/dev/vda1 rootwait" -M virt -cpu 
cortex-a57 -smp 2 -m 1024 -netdev user,id=user -device 
virtio-net-device,netdev=user -display none -device 
virtio-blk-device,drive=virtio-blk0 -drive 
file=disk.raw,id=virtio-blk0,if=none,format=raw


I have no idea if this is ARM64-specific anyhow, but this is how I 
reproduced it. I hope this guide helps fixing the bug!

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-06  8:36     ` Marek Szyprowski
@ 2023-10-06  9:38       ` Biju Das
  0 siblings, 0 replies; 22+ messages in thread
From: Biju Das @ 2023-10-06  9:38 UTC (permalink / raw)
  To: Marek Szyprowski, Peter Zijlstra
  Cc: bsegall, bristot, chris.hyser, corbet, dietmar.eggemann, efault,
	gregkh, joel, joshdon, juri.lelli, kprateek.nayak, linux-kernel,
	mingo, patrick.bellasi, Pavel Machek, pjt, qperret, qyousef,
	rostedt, tglx, tim.c.chen, timj, vincent.guittot, youssefesmat,
	yu.c.chen, mgorman, linux-renesas-soc

> Subject: Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct
> se
> 
> On 05.10.2023 17:08, Biju Das wrote:
> > Hi Peter Zijlstra,
> >
> >> Subject: Re: [PATCH] sched/fair: fix pick_eevdf to always find the
> >> correct se
> >>
> >> On Thu, Oct 05, 2023 at 07:31:34AM +0000, Biju Das wrote:
> >>
> >>> [   26.099203] EEVDF scheduling fail, picking leftmost
> >> This, that the problem.. the rest is just noise because printk stinks.
> >>
> >> Weirdly have not seen that trigger, and I've been running with this
> >> patch on for a few days now :/
> > I agree Original issue is "EEVDF scheduling fail, picking leftmost"
> > Which is triggering noisy lock warning messages during boot.
> >
> > 2 platforms are affected both ARM platforms(Renesas and Samsung) Maybe
> > other platforms affected too.
> 
> 
> Just to note, I've run into this issue on the QEmu's 'arm64/virt'
> platform, not on the Samsung specific hardware.

OK, if needed I am happy to test on the real hardware, if a fix found for this issue. It is 100% reproducible Renesas RZ/g2L SMARC EVK platform.

Cheers,
Biju

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
       [not found]     ` <553e2ee4-ab3a-4635-a74f-0ba4cc03f3f9@samsung.com>
@ 2023-10-06 10:31       ` Mike Galbraith
  2023-10-06 14:00         ` Peter Zijlstra
  0 siblings, 1 reply; 22+ messages in thread
From: Mike Galbraith @ 2023-10-06 10:31 UTC (permalink / raw)
  To: Marek Szyprowski, Biju Das, Peter Zijlstra
  Cc: bsegall, bristot, chris.hyser, corbet, dietmar.eggemann, gregkh,
	joel, joshdon, juri.lelli, kprateek.nayak, linux-kernel, mingo,
	patrick.bellasi, Pavel Machek, pjt, qperret, qyousef, rostedt,
	tglx, tim.c.chen, timj, vincent.guittot, youssefesmat, yu.c.chen,
	mgorman, linux-renesas-soc

On Fri, 2023-10-06 at 10:35 +0200, Marek Szyprowski wrote:
>
>
> Just to note, I've run into this issue on the QEmu's 'arm64/virt'
> platform, not on the Samsung specific hardware. 

It doesn't appear to be arch specific, all I have to do is enable
autogroup, raspberry or x86_64 desktop box, the occasional failure
tripping over task groups, leaving both best and best_left with
identical but not what we're looking for ->min_deadline.

	-Mike

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-06 10:31       ` Mike Galbraith
@ 2023-10-06 14:00         ` Peter Zijlstra
  2023-10-06 14:39           ` Mike Galbraith
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2023-10-06 14:00 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Marek Szyprowski, Biju Das, bsegall, bristot, chris.hyser,
	corbet, dietmar.eggemann, gregkh, joel, joshdon, juri.lelli,
	kprateek.nayak, linux-kernel, mingo, patrick.bellasi,
	Pavel Machek, pjt, qperret, qyousef, rostedt, tglx, tim.c.chen,
	timj, vincent.guittot, youssefesmat, yu.c.chen, mgorman,
	linux-renesas-soc

On Fri, Oct 06, 2023 at 12:31:28PM +0200, Mike Galbraith wrote:
> On Fri, 2023-10-06 at 10:35 +0200, Marek Szyprowski wrote:
> >
> >
> > Just to note, I've run into this issue on the QEmu's 'arm64/virt'
> > platform, not on the Samsung specific hardware. 
> 
> It doesn't appear to be arch specific, all I have to do is enable
> autogroup, raspberry or x86_64 desktop box, the occasional failure
> tripping over task groups, leaving both best and best_left with
> identical but not what we're looking for ->min_deadline.

OK, autogroups enabled and booted, /debug/sched/debug shows autogroups
are indeed in existence.

I've ran hackbench and a kernel build, but no whoopsie yet :-(

I suppose I'll kick some benchmarks and go make a cup 'o tea or
something.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-06 14:00         ` Peter Zijlstra
@ 2023-10-06 14:39           ` Mike Galbraith
  2023-10-06 15:55             ` Peter Zijlstra
  0 siblings, 1 reply; 22+ messages in thread
From: Mike Galbraith @ 2023-10-06 14:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Marek Szyprowski, Biju Das, bsegall, bristot, chris.hyser,
	corbet, dietmar.eggemann, gregkh, joel, joshdon, juri.lelli,
	kprateek.nayak, linux-kernel, mingo, patrick.bellasi,
	Pavel Machek, pjt, qperret, qyousef, rostedt, tglx, tim.c.chen,
	timj, vincent.guittot, youssefesmat, yu.c.chen, mgorman,
	linux-renesas-soc

On Fri, 2023-10-06 at 16:00 +0200, Peter Zijlstra wrote:
> On Fri, Oct 06, 2023 at 12:31:28PM +0200, Mike Galbraith wrote:
> > On Fri, 2023-10-06 at 10:35 +0200, Marek Szyprowski wrote:
> > >
> > >
> > > Just to note, I've run into this issue on the QEmu's 'arm64/virt'
> > > platform, not on the Samsung specific hardware. 
> >
> > It doesn't appear to be arch specific, all I have to do is enable
> > autogroup, raspberry or x86_64 desktop box, the occasional failure
> > tripping over task groups, leaving both best and best_left with
> > identical but not what we're looking for ->min_deadline.
>
> OK, autogroups enabled and booted, /debug/sched/debug shows autogroups
> are indeed in existence.
>
> I've ran hackbench and a kernel build, but no whoopsie yet :-(
>
> I suppose I'll kick some benchmarks and go make a cup 'o tea or
> something.

Hm, just booting gets me a handful, and generic desktop activity
produces a fairly regular supply.  This is virgin 6.6.0.ga9e6eb3-tip.

[  340.473123] EEVDF scheduling fail, picking leftmost
[  340.473132] EEVDF scheduling fail, picking leftmost
[  343.656549] EEVDF scheduling fail, picking leftmost
[  343.656557] EEVDF scheduling fail, picking leftmost
[  343.690463] EEVDF scheduling fail, picking leftmost
[  343.690472] EEVDF scheduling fail, picking leftmost
[  426.224039] EEVDF scheduling fail, picking leftmost
[  426.224080] EEVDF scheduling fail, picking leftmost
[  426.224084] EEVDF scheduling fail, picking leftmost
[  426.363323] EEVDF scheduling fail, picking leftmost
[  426.759244] EEVDF scheduling fail, picking leftmost
[  426.759256] EEVDF scheduling fail, picking leftmost
[  441.872524] EEVDF scheduling fail, picking leftmost
[  441.872556] EEVDF scheduling fail, picking leftmost


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-06 14:39           ` Mike Galbraith
@ 2023-10-06 15:55             ` Peter Zijlstra
  2023-10-06 16:54               ` Mike Galbraith
  2023-10-06 19:24               ` Peter Zijlstra
  0 siblings, 2 replies; 22+ messages in thread
From: Peter Zijlstra @ 2023-10-06 15:55 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Marek Szyprowski, Biju Das, bsegall, bristot, chris.hyser,
	corbet, dietmar.eggemann, gregkh, joel, joshdon, juri.lelli,
	kprateek.nayak, linux-kernel, mingo, patrick.bellasi,
	Pavel Machek, pjt, qperret, qyousef, rostedt, tglx, tim.c.chen,
	timj, vincent.guittot, youssefesmat, yu.c.chen, mgorman,
	linux-renesas-soc

On Fri, Oct 06, 2023 at 04:39:09PM +0200, Mike Galbraith wrote:
> On Fri, 2023-10-06 at 16:00 +0200, Peter Zijlstra wrote:
> > On Fri, Oct 06, 2023 at 12:31:28PM +0200, Mike Galbraith wrote:
> > > On Fri, 2023-10-06 at 10:35 +0200, Marek Szyprowski wrote:
> > > >
> > > >
> > > > Just to note, I've run into this issue on the QEmu's 'arm64/virt'
> > > > platform, not on the Samsung specific hardware. 
> > >
> > > It doesn't appear to be arch specific, all I have to do is enable
> > > autogroup, raspberry or x86_64 desktop box, the occasional failure
> > > tripping over task groups, leaving both best and best_left with
> > > identical but not what we're looking for ->min_deadline.
> >
> > OK, autogroups enabled and booted, /debug/sched/debug shows autogroups
> > are indeed in existence.
> >
> > I've ran hackbench and a kernel build, but no whoopsie yet :-(
> >
> > I suppose I'll kick some benchmarks and go make a cup 'o tea or
> > something.
> 
> Hm, just booting gets me a handful, and generic desktop activity
> produces a fairly regular supply.  This is virgin 6.6.0.ga9e6eb3-tip.

You're running that systemd thing, eh?

If I create two groups (or two users with autogroup on) and have them
both build a kernel I do indeed get splat.

And yeah, min_deadline is hosed somehow.

migration/28-185     [028] d..2.    70.264274: validate_cfs_rq: --- /
migration/28-185     [028] d..2.    70.264277: __print_se: ffff88845cf48080 w: 1024 ve: -58857638 lag: 870381 vd: -55861854 vmd: -66302085 E (11372/tr)
migration/28-185     [028] d..2.    70.264280: __print_se:   ffff88810d165800 w: 25 ve: -80323686 lag: 22336429 vd: -41496434 vmd: -66302085 E (-1//autogroup-31)
migration/28-185     [028] d..2.    70.264282: __print_se:   ffff888108379000 w: 25 ve: 0 lag: -57987257 vd: 114632828 vmd: 114632828 N (-1//autogroup-33)
migration/28-185     [028] d..2.    70.264283: validate_cfs_rq: min_deadline: -55861854 avg_vruntime: -62278313462 / 1074 = -57987256

I need to go make dinner (kids hungry), but I'll see if I can figure out
how this happens...

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-06 15:55             ` Peter Zijlstra
@ 2023-10-06 16:54               ` Mike Galbraith
  2023-10-06 19:24               ` Peter Zijlstra
  1 sibling, 0 replies; 22+ messages in thread
From: Mike Galbraith @ 2023-10-06 16:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Marek Szyprowski, Biju Das, bsegall, bristot, chris.hyser,
	corbet, dietmar.eggemann, gregkh, joel, joshdon, juri.lelli,
	kprateek.nayak, linux-kernel, mingo, patrick.bellasi,
	Pavel Machek, pjt, qperret, qyousef, rostedt, tglx, tim.c.chen,
	timj, vincent.guittot, youssefesmat, yu.c.chen, mgorman,
	linux-renesas-soc

On Fri, 2023-10-06 at 17:55 +0200, Peter Zijlstra wrote:


> If I create two groups (or two users with autogroup on) and have them
> both build a kernel I do indeed get splat.

Ah, my desktop setup is two wide konsole instances with 14 tabs each
for old testament /me (dang dinky universe) and internet stuff running
as a mere mortal, so lots of groups.

	-Mike

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-06 15:55             ` Peter Zijlstra
  2023-10-06 16:54               ` Mike Galbraith
@ 2023-10-06 19:24               ` Peter Zijlstra
  2023-10-06 20:15                 ` Marek Szyprowski
                                   ` (3 more replies)
  1 sibling, 4 replies; 22+ messages in thread
From: Peter Zijlstra @ 2023-10-06 19:24 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Marek Szyprowski, Biju Das, bsegall, bristot, chris.hyser,
	corbet, dietmar.eggemann, gregkh, joel, joshdon, juri.lelli,
	kprateek.nayak, linux-kernel, mingo, patrick.bellasi,
	Pavel Machek, pjt, qperret, qyousef, rostedt, tglx, tim.c.chen,
	timj, vincent.guittot, youssefesmat, yu.c.chen, mgorman,
	linux-renesas-soc

On Fri, Oct 06, 2023 at 05:55:01PM +0200, Peter Zijlstra wrote:

> And yeah, min_deadline is hosed somehow.
> 
> migration/28-185     [028] d..2.    70.264274: validate_cfs_rq: --- /
> migration/28-185     [028] d..2.    70.264277: __print_se: ffff88845cf48080 w: 1024 ve: -58857638 lag: 870381 vd: -55861854 vmd: -66302085 E (11372/tr)
> migration/28-185     [028] d..2.    70.264280: __print_se:   ffff88810d165800 w: 25 ve: -80323686 lag: 22336429 vd: -41496434 vmd: -66302085 E (-1//autogroup-31)
> migration/28-185     [028] d..2.    70.264282: __print_se:   ffff888108379000 w: 25 ve: 0 lag: -57987257 vd: 114632828 vmd: 114632828 N (-1//autogroup-33)
> migration/28-185     [028] d..2.    70.264283: validate_cfs_rq: min_deadline: -55861854 avg_vruntime: -62278313462 / 1074 = -57987256
> 
> I need to go make dinner (kids hungry), but I'll see if I can figure out
> how this happens...

*sigh*, does the below help?

---
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 04fbcbda97d5..6a670f119efa 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3632,6 +3747,7 @@ static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
 		 */
 		deadline = div_s64(deadline * old_weight, weight);
 		se->deadline = se->vruntime + deadline;
+		min_deadline_cb_propagate(&se->run_node, NULL);
 	}
 
 #ifdef CONFIG_SMP

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-06 19:24               ` Peter Zijlstra
@ 2023-10-06 20:15                 ` Marek Szyprowski
  2023-10-06 23:27                 ` Mike Galbraith
                                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 22+ messages in thread
From: Marek Szyprowski @ 2023-10-06 20:15 UTC (permalink / raw)
  To: Peter Zijlstra, Mike Galbraith
  Cc: Biju Das, bsegall, bristot, chris.hyser, corbet,
	dietmar.eggemann, gregkh, joel, joshdon, juri.lelli,
	kprateek.nayak, linux-kernel, mingo, patrick.bellasi,
	Pavel Machek, pjt, qperret, qyousef, rostedt, tglx, tim.c.chen,
	timj, vincent.guittot, youssefesmat, yu.c.chen, mgorman,
	linux-renesas-soc

On 06.10.2023 21:24, Peter Zijlstra wrote:
> On Fri, Oct 06, 2023 at 05:55:01PM +0200, Peter Zijlstra wrote:
>> And yeah, min_deadline is hosed somehow.
>>
>> migration/28-185     [028] d..2.    70.264274: validate_cfs_rq: --- /
>> migration/28-185     [028] d..2.    70.264277: __print_se: ffff88845cf48080 w: 1024 ve: -58857638 lag: 870381 vd: -55861854 vmd: -66302085 E (11372/tr)
>> migration/28-185     [028] d..2.    70.264280: __print_se:   ffff88810d165800 w: 25 ve: -80323686 lag: 22336429 vd: -41496434 vmd: -66302085 E (-1//autogroup-31)
>> migration/28-185     [028] d..2.    70.264282: __print_se:   ffff888108379000 w: 25 ve: 0 lag: -57987257 vd: 114632828 vmd: 114632828 N (-1//autogroup-33)
>> migration/28-185     [028] d..2.    70.264283: validate_cfs_rq: min_deadline: -55861854 avg_vruntime: -62278313462 / 1074 = -57987256
>>
>> I need to go make dinner (kids hungry), but I'll see if I can figure out
>> how this happens...
> *sigh*, does the below help?

It looks that this fixed the issue. At least I was no longer able to 
reproduce it.

Feel free to add:

Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>

Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>

> ---
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 04fbcbda97d5..6a670f119efa 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3632,6 +3747,7 @@ static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
>   		 */
>   		deadline = div_s64(deadline * old_weight, weight);
>   		se->deadline = se->vruntime + deadline;
> +		min_deadline_cb_propagate(&se->run_node, NULL);
>   	}
>   
>   #ifdef CONFIG_SMP
>
Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-06 19:24               ` Peter Zijlstra
  2023-10-06 20:15                 ` Marek Szyprowski
@ 2023-10-06 23:27                 ` Mike Galbraith
  2023-10-07  0:37                 ` Chen Yu
  2023-10-07  6:24                 ` Biju Das
  3 siblings, 0 replies; 22+ messages in thread
From: Mike Galbraith @ 2023-10-06 23:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Marek Szyprowski, Biju Das, bsegall, bristot, chris.hyser,
	corbet, dietmar.eggemann, gregkh, joel, joshdon, juri.lelli,
	kprateek.nayak, linux-kernel, mingo, patrick.bellasi,
	Pavel Machek, pjt, qperret, qyousef, rostedt, tglx, tim.c.chen,
	timj, vincent.guittot, youssefesmat, yu.c.chen, mgorman,
	linux-renesas-soc

On Fri, 2023-10-06 at 21:24 +0200, Peter Zijlstra wrote:
> 
> *sigh*, does the below help?

<invisible ink goggles> ah, so there you... weren't.

Yup, all better.

> ---
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 04fbcbda97d5..6a670f119efa 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3632,6 +3747,7 @@ static void reweight_entity(struct cfs_rq
> *cfs_rq, struct sched_entity *se,
>                  */
>                 deadline = div_s64(deadline * old_weight, weight);
>                 se->deadline = se->vruntime + deadline;
> +               min_deadline_cb_propagate(&se->run_node, NULL);
>         }
>  
>  #ifdef CONFIG_SMP


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-06 19:24               ` Peter Zijlstra
  2023-10-06 20:15                 ` Marek Szyprowski
  2023-10-06 23:27                 ` Mike Galbraith
@ 2023-10-07  0:37                 ` Chen Yu
  2023-10-07  6:26                   ` Biju Das
  2023-10-07  6:24                 ` Biju Das
  3 siblings, 1 reply; 22+ messages in thread
From: Chen Yu @ 2023-10-07  0:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mike Galbraith, Marek Szyprowski, Biju Das, bsegall, bristot,
	chris.hyser, corbet, dietmar.eggemann, gregkh, joel, joshdon,
	juri.lelli, kprateek.nayak, linux-kernel, mingo, patrick.bellasi,
	Pavel Machek, pjt, qperret, qyousef, rostedt, tglx, tim.c.chen,
	timj, vincent.guittot, youssefesmat, mgorman, linux-renesas-soc

On 2023-10-06 at 21:24:45 +0200, Peter Zijlstra wrote:
> On Fri, Oct 06, 2023 at 05:55:01PM +0200, Peter Zijlstra wrote:
> 
> > And yeah, min_deadline is hosed somehow.
> > 
> > migration/28-185     [028] d..2.    70.264274: validate_cfs_rq: --- /
> > migration/28-185     [028] d..2.    70.264277: __print_se: ffff88845cf48080 w: 1024 ve: -58857638 lag: 870381 vd: -55861854 vmd: -66302085 E (11372/tr)
> > migration/28-185     [028] d..2.    70.264280: __print_se:   ffff88810d165800 w: 25 ve: -80323686 lag: 22336429 vd: -41496434 vmd: -66302085 E (-1//autogroup-31)
> > migration/28-185     [028] d..2.    70.264282: __print_se:   ffff888108379000 w: 25 ve: 0 lag: -57987257 vd: 114632828 vmd: 114632828 N (-1//autogroup-33)
> > migration/28-185     [028] d..2.    70.264283: validate_cfs_rq: min_deadline: -55861854 avg_vruntime: -62278313462 / 1074 = -57987256
> > 
> > I need to go make dinner (kids hungry), but I'll see if I can figure out
> > how this happens...
> 
> *sigh*, does the below help?
> 
> ---
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 04fbcbda97d5..6a670f119efa 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3632,6 +3747,7 @@ static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
>  		 */
>  		deadline = div_s64(deadline * old_weight, weight);
>  		se->deadline = se->vruntime + deadline;
> +		min_deadline_cb_propagate(&se->run_node, NULL);
>  	}
>  
>  #ifdef CONFIG_SMP

Is it because without this patch, the se->deadline is not always synced with se->min_deadline,
then in pick_eevdf() the following condition could not be met, thus we get a NULL candidate
and has to pick the leftmost one?
	if (se->deadline == se->min_deadline)

Regarding the circular locking warning triggered by printk, does it mean we should not get a
NULL candidate from __pick_eevdf() in theory? And besides, we should not use printk with rq lock
hold?

thanks,
Chenyu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-06 19:24               ` Peter Zijlstra
                                   ` (2 preceding siblings ...)
  2023-10-07  0:37                 ` Chen Yu
@ 2023-10-07  6:24                 ` Biju Das
  3 siblings, 0 replies; 22+ messages in thread
From: Biju Das @ 2023-10-07  6:24 UTC (permalink / raw)
  To: Peter Zijlstra, Mike Galbraith
  Cc: Marek Szyprowski, bsegall, bristot, chris.hyser, corbet,
	dietmar.eggemann, gregkh, joel, joshdon, juri.lelli,
	kprateek.nayak, linux-kernel, mingo, patrick.bellasi,
	Pavel Machek, pjt, qperret, qyousef, rostedt, tglx, tim.c.chen,
	timj, vincent.guittot, youssefesmat, yu.c.chen, mgorman,
	linux-renesas-soc

Hi Peter Zijlstra,


> Subject: Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct
> se
> 
> On Fri, Oct 06, 2023 at 05:55:01PM +0200, Peter Zijlstra wrote:
> 
> > And yeah, min_deadline is hosed somehow.
> >
> > migration/28-185     [028] d..2.    70.264274: validate_cfs_rq: --- /
> > migration/28-185     [028] d..2.    70.264277: __print_se:
> ffff88845cf48080 w: 1024 ve: -58857638 lag: 870381 vd: -55861854 vmd: -
> 66302085 E (11372/tr)
> > migration/28-185     [028] d..2.    70.264280: __print_se:
> ffff88810d165800 w: 25 ve: -80323686 lag: 22336429 vd: -41496434 vmd: -
> 66302085 E (-1//autogroup-31)
> > migration/28-185     [028] d..2.    70.264282: __print_se:
> ffff888108379000 w: 25 ve: 0 lag: -57987257 vd: 114632828 vmd: 114632828 N
> (-1//autogroup-33)
> > migration/28-185     [028] d..2.    70.264283: validate_cfs_rq:
> min_deadline: -55861854 avg_vruntime: -62278313462 / 1074 = -57987256
> >
> > I need to go make dinner (kids hungry), but I'll see if I can figure
> > out how this happens...
> 
> *sigh*, does the below help?

I confirm the message "EEVDF scheduling fail, picking leftmost" is not appearing with this patch.

Cheers,
Biju
> 
> ---
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index
> 04fbcbda97d5..6a670f119efa 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3632,6 +3747,7 @@ static void reweight_entity(struct cfs_rq *cfs_rq,
> struct sched_entity *se,
>  		 */
>  		deadline = div_s64(deadline * old_weight, weight);
>  		se->deadline = se->vruntime + deadline;
> +		min_deadline_cb_propagate(&se->run_node, NULL);
>  	}
> 
>  #ifdef CONFIG_SMP

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-07  0:37                 ` Chen Yu
@ 2023-10-07  6:26                   ` Biju Das
  2023-10-07  6:42                     ` Chen Yu
  0 siblings, 1 reply; 22+ messages in thread
From: Biju Das @ 2023-10-07  6:26 UTC (permalink / raw)
  To: Chen Yu, Peter Zijlstra
  Cc: Mike Galbraith, Marek Szyprowski, bsegall, bristot, chris.hyser,
	corbet, dietmar.eggemann, gregkh, joel, joshdon, juri.lelli,
	kprateek.nayak, linux-kernel, mingo, patrick.bellasi,
	Pavel Machek, pjt, qperret, qyousef, rostedt, tglx, tim.c.chen,
	timj, vincent.guittot, youssefesmat, mgorman, linux-renesas-soc

Hi Chen Yu,

> Subject: Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct
> se
> 
> On 2023-10-06 at 21:24:45 +0200, Peter Zijlstra wrote:
> > On Fri, Oct 06, 2023 at 05:55:01PM +0200, Peter Zijlstra wrote:
> >
> > > And yeah, min_deadline is hosed somehow.
> > >
> > > migration/28-185     [028] d..2.    70.264274: validate_cfs_rq: --- /
> > > migration/28-185     [028] d..2.    70.264277: __print_se:
> ffff88845cf48080 w: 1024 ve: -58857638 lag: 870381 vd: -55861854 vmd: -
> 66302085 E (11372/tr)
> > > migration/28-185     [028] d..2.    70.264280: __print_se:
> ffff88810d165800 w: 25 ve: -80323686 lag: 22336429 vd: -41496434 vmd: -
> 66302085 E (-1//autogroup-31)
> > > migration/28-185     [028] d..2.    70.264282: __print_se:
> ffff888108379000 w: 25 ve: 0 lag: -57987257 vd: 114632828 vmd: 114632828 N
> (-1//autogroup-33)
> > > migration/28-185     [028] d..2.    70.264283: validate_cfs_rq:
> min_deadline: -55861854 avg_vruntime: -62278313462 / 1074 = -57987256
> > >
> > > I need to go make dinner (kids hungry), but I'll see if I can figure
> > > out how this happens...
> >
> > *sigh*, does the below help?
> >
> > ---
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index
> > 04fbcbda97d5..6a670f119efa 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -3632,6 +3747,7 @@ static void reweight_entity(struct cfs_rq *cfs_rq,
> struct sched_entity *se,
> >  		 */
> >  		deadline = div_s64(deadline * old_weight, weight);
> >  		se->deadline = se->vruntime + deadline;
> > +		min_deadline_cb_propagate(&se->run_node, NULL);
> >  	}
> >
> >  #ifdef CONFIG_SMP
> 
> Is it because without this patch, the se->deadline is not always synced
> with se->min_deadline, then in pick_eevdf() the following condition could
> not be met, thus we get a NULL candidate and has to pick the leftmost one?
> 	if (se->deadline == se->min_deadline)
> 
> Regarding the circular locking warning triggered by printk, does it mean we
> should not get a NULL candidate from __pick_eevdf() in theory? And besides,
> we should not use printk with rq lock hold?

Is it not a useful error log? At least from the initial report Marek Szyprowski doesn't see "EEVDF scheduling fail, picking leftmost" but seen only warning triggered by this in the logs.

Cheers,
Biju

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-07  6:26                   ` Biju Das
@ 2023-10-07  6:42                     ` Chen Yu
  2023-10-09 14:27                       ` Phil Auld
  0 siblings, 1 reply; 22+ messages in thread
From: Chen Yu @ 2023-10-07  6:42 UTC (permalink / raw)
  To: Biju Das
  Cc: Peter Zijlstra, Mike Galbraith, Marek Szyprowski, bsegall,
	bristot, chris.hyser, corbet, dietmar.eggemann, gregkh, joel,
	joshdon, juri.lelli, kprateek.nayak, linux-kernel, mingo,
	patrick.bellasi, Pavel Machek, pjt, qperret, qyousef, rostedt,
	tglx, tim.c.chen, timj, vincent.guittot, youssefesmat, mgorman,
	linux-renesas-soc

Hi Biju,

On 2023-10-07 at 06:26:05 +0000, Biju Das wrote:
> Hi Chen Yu,
> 
> > Subject: Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct
> > se
> > 
> > On 2023-10-06 at 21:24:45 +0200, Peter Zijlstra wrote:
> > > On Fri, Oct 06, 2023 at 05:55:01PM +0200, Peter Zijlstra wrote:
> > >
> > > > And yeah, min_deadline is hosed somehow.
> > > >
> > > > migration/28-185     [028] d..2.    70.264274: validate_cfs_rq: --- /
> > > > migration/28-185     [028] d..2.    70.264277: __print_se:
> > ffff88845cf48080 w: 1024 ve: -58857638 lag: 870381 vd: -55861854 vmd: -
> > 66302085 E (11372/tr)
> > > > migration/28-185     [028] d..2.    70.264280: __print_se:
> > ffff88810d165800 w: 25 ve: -80323686 lag: 22336429 vd: -41496434 vmd: -
> > 66302085 E (-1//autogroup-31)
> > > > migration/28-185     [028] d..2.    70.264282: __print_se:
> > ffff888108379000 w: 25 ve: 0 lag: -57987257 vd: 114632828 vmd: 114632828 N
> > (-1//autogroup-33)
> > > > migration/28-185     [028] d..2.    70.264283: validate_cfs_rq:
> > min_deadline: -55861854 avg_vruntime: -62278313462 / 1074 = -57987256
> > > >
> > > > I need to go make dinner (kids hungry), but I'll see if I can figure
> > > > out how this happens...
> > >
> > > *sigh*, does the below help?
> > >
> > > ---
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index
> > > 04fbcbda97d5..6a670f119efa 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -3632,6 +3747,7 @@ static void reweight_entity(struct cfs_rq *cfs_rq,
> > struct sched_entity *se,
> > >  		 */
> > >  		deadline = div_s64(deadline * old_weight, weight);
> > >  		se->deadline = se->vruntime + deadline;
> > > +		min_deadline_cb_propagate(&se->run_node, NULL);
> > >  	}
> > >
> > >  #ifdef CONFIG_SMP
> > 
> > Is it because without this patch, the se->deadline is not always synced
> > with se->min_deadline, then in pick_eevdf() the following condition could
> > not be met, thus we get a NULL candidate and has to pick the leftmost one?
> > 	if (se->deadline == se->min_deadline)
> > 
> > Regarding the circular locking warning triggered by printk, does it mean we
> > should not get a NULL candidate from __pick_eevdf() in theory? And besides,
> > we should not use printk with rq lock hold?
> 
> Is it not a useful error log? At least from the initial report Marek Szyprowski doesn't see "EEVDF scheduling fail, picking leftmost" but seen only warning triggered by this in the logs.
> 

Yes, it is a useful log. I was trying to figure out the safe scenario to use
printk.

thanks,
Chenyu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-07  6:42                     ` Chen Yu
@ 2023-10-09 14:27                       ` Phil Auld
  2023-10-10  2:08                         ` Chen Yu
  0 siblings, 1 reply; 22+ messages in thread
From: Phil Auld @ 2023-10-09 14:27 UTC (permalink / raw)
  To: Chen Yu
  Cc: Biju Das, Peter Zijlstra, Mike Galbraith, Marek Szyprowski,
	bsegall, bristot, chris.hyser, corbet, dietmar.eggemann, gregkh,
	joel, joshdon, juri.lelli, kprateek.nayak, linux-kernel, mingo,
	patrick.bellasi, Pavel Machek, pjt, qperret, qyousef, rostedt,
	tglx, tim.c.chen, timj, vincent.guittot, youssefesmat, mgorman,
	linux-renesas-soc

On Sat, Oct 07, 2023 at 02:42:10PM +0800 Chen Yu wrote:
> Hi Biju,
> 
> On 2023-10-07 at 06:26:05 +0000, Biju Das wrote:
> > Hi Chen Yu,
> > 
> > > Subject: Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct
> > > se
> > > 
> > > On 2023-10-06 at 21:24:45 +0200, Peter Zijlstra wrote:
> > > > On Fri, Oct 06, 2023 at 05:55:01PM +0200, Peter Zijlstra wrote:
> > > >
> > > > > And yeah, min_deadline is hosed somehow.
> > > > >
> > > > > migration/28-185     [028] d..2.    70.264274: validate_cfs_rq: --- /
> > > > > migration/28-185     [028] d..2.    70.264277: __print_se:
> > > ffff88845cf48080 w: 1024 ve: -58857638 lag: 870381 vd: -55861854 vmd: -
> > > 66302085 E (11372/tr)
> > > > > migration/28-185     [028] d..2.    70.264280: __print_se:
> > > ffff88810d165800 w: 25 ve: -80323686 lag: 22336429 vd: -41496434 vmd: -
> > > 66302085 E (-1//autogroup-31)
> > > > > migration/28-185     [028] d..2.    70.264282: __print_se:
> > > ffff888108379000 w: 25 ve: 0 lag: -57987257 vd: 114632828 vmd: 114632828 N
> > > (-1//autogroup-33)
> > > > > migration/28-185     [028] d..2.    70.264283: validate_cfs_rq:
> > > min_deadline: -55861854 avg_vruntime: -62278313462 / 1074 = -57987256
> > > > >
> > > > > I need to go make dinner (kids hungry), but I'll see if I can figure
> > > > > out how this happens...
> > > >
> > > > *sigh*, does the below help?
> > > >
> > > > ---
> > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index
> > > > 04fbcbda97d5..6a670f119efa 100644
> > > > --- a/kernel/sched/fair.c
> > > > +++ b/kernel/sched/fair.c
> > > > @@ -3632,6 +3747,7 @@ static void reweight_entity(struct cfs_rq *cfs_rq,
> > > struct sched_entity *se,
> > > >  		 */
> > > >  		deadline = div_s64(deadline * old_weight, weight);
> > > >  		se->deadline = se->vruntime + deadline;
> > > > +		min_deadline_cb_propagate(&se->run_node, NULL);
> > > >  	}
> > > >
> > > >  #ifdef CONFIG_SMP
> > > 
> > > Is it because without this patch, the se->deadline is not always synced
> > > with se->min_deadline, then in pick_eevdf() the following condition could
> > > not be met, thus we get a NULL candidate and has to pick the leftmost one?
> > > 	if (se->deadline == se->min_deadline)
> > > 
> > > Regarding the circular locking warning triggered by printk, does it mean we
> > > should not get a NULL candidate from __pick_eevdf() in theory? And besides,
> > > we should not use printk with rq lock hold?
> > 
> > Is it not a useful error log? At least from the initial report Marek Szyprowski doesn't see "EEVDF scheduling fail, picking leftmost" but seen only warning triggered by this in the logs.
> > 
> 
> Yes, it is a useful log. I was trying to figure out the safe scenario to use
> printk.
>

You should not use printk with a rq lock held as some console implementations
(framebuffer etc) call schedule_work() which loops back into the scheduler
and bad things happen.

Some places in the scheduler use printk_deferred() when needed. 

I think this will be better when the RT printk stuff is fully merged. 

Cheers,
Phil

> thanks,
> Chenyu
> 

-- 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct se
  2023-10-09 14:27                       ` Phil Auld
@ 2023-10-10  2:08                         ` Chen Yu
  0 siblings, 0 replies; 22+ messages in thread
From: Chen Yu @ 2023-10-10  2:08 UTC (permalink / raw)
  To: Phil Auld
  Cc: Biju Das, Peter Zijlstra, Mike Galbraith, Marek Szyprowski,
	bsegall, bristot, chris.hyser, corbet, dietmar.eggemann, gregkh,
	joel, joshdon, juri.lelli, kprateek.nayak, linux-kernel, mingo,
	patrick.bellasi, Pavel Machek, pjt, qperret, qyousef, rostedt,
	tglx, tim.c.chen, timj, vincent.guittot, youssefesmat, mgorman,
	linux-renesas-soc

Hi Phil,

On 2023-10-09 at 10:27:40 -0400, Phil Auld wrote:
> On Sat, Oct 07, 2023 at 02:42:10PM +0800 Chen Yu wrote:
> > Hi Biju,
> > 
> > On 2023-10-07 at 06:26:05 +0000, Biju Das wrote:
> > > Hi Chen Yu,
> > > 
> > > > Subject: Re: [PATCH] sched/fair: fix pick_eevdf to always find the correct
> > > > se
> > > > 
> > > > On 2023-10-06 at 21:24:45 +0200, Peter Zijlstra wrote:
> > > > > On Fri, Oct 06, 2023 at 05:55:01PM +0200, Peter Zijlstra wrote:
> > > > >
> > > > > > And yeah, min_deadline is hosed somehow.
> > > > > >
> > > > > > migration/28-185     [028] d..2.    70.264274: validate_cfs_rq: --- /
> > > > > > migration/28-185     [028] d..2.    70.264277: __print_se:
> > > > ffff88845cf48080 w: 1024 ve: -58857638 lag: 870381 vd: -55861854 vmd: -
> > > > 66302085 E (11372/tr)
> > > > > > migration/28-185     [028] d..2.    70.264280: __print_se:
> > > > ffff88810d165800 w: 25 ve: -80323686 lag: 22336429 vd: -41496434 vmd: -
> > > > 66302085 E (-1//autogroup-31)
> > > > > > migration/28-185     [028] d..2.    70.264282: __print_se:
> > > > ffff888108379000 w: 25 ve: 0 lag: -57987257 vd: 114632828 vmd: 114632828 N
> > > > (-1//autogroup-33)
> > > > > > migration/28-185     [028] d..2.    70.264283: validate_cfs_rq:
> > > > min_deadline: -55861854 avg_vruntime: -62278313462 / 1074 = -57987256
> > > > > >
> > > > > > I need to go make dinner (kids hungry), but I'll see if I can figure
> > > > > > out how this happens...
> > > > >
> > > > > *sigh*, does the below help?
> > > > >
> > > > > ---
> > > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index
> > > > > 04fbcbda97d5..6a670f119efa 100644
> > > > > --- a/kernel/sched/fair.c
> > > > > +++ b/kernel/sched/fair.c
> > > > > @@ -3632,6 +3747,7 @@ static void reweight_entity(struct cfs_rq *cfs_rq,
> > > > struct sched_entity *se,
> > > > >  		 */
> > > > >  		deadline = div_s64(deadline * old_weight, weight);
> > > > >  		se->deadline = se->vruntime + deadline;
> > > > > +		min_deadline_cb_propagate(&se->run_node, NULL);
> > > > >  	}
> > > > >
> > > > >  #ifdef CONFIG_SMP
> > > > 
> > > > Is it because without this patch, the se->deadline is not always synced
> > > > with se->min_deadline, then in pick_eevdf() the following condition could
> > > > not be met, thus we get a NULL candidate and has to pick the leftmost one?
> > > > 	if (se->deadline == se->min_deadline)
> > > > 
> > > > Regarding the circular locking warning triggered by printk, does it mean we
> > > > should not get a NULL candidate from __pick_eevdf() in theory? And besides,
> > > > we should not use printk with rq lock hold?
> > > 
> > > Is it not a useful error log? At least from the initial report Marek Szyprowski doesn't see "EEVDF scheduling fail, picking leftmost" but seen only warning triggered by this in the logs.
> > > 
> > 
> > Yes, it is a useful log. I was trying to figure out the safe scenario to use
> > printk.
> >
> 
> You should not use printk with a rq lock held as some console implementations
> (framebuffer etc) call schedule_work() which loops back into the scheduler
> and bad things happen.
> 
> Some places in the scheduler use printk_deferred() when needed. 
> 
> I think this will be better when the RT printk stuff is fully merged. 
>

I see, got it! Thanks for the education.

thanks,
Chenyu 

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2023-10-10  2:09 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-05  7:31 [PATCH] sched/fair: fix pick_eevdf to always find the correct se Biju Das
2023-10-05  7:58 ` Biju Das
     [not found]   ` <ZR6A/lsQMP+ymD1f@chenyu5-mobl2.ccr.corp.intel.com>
2023-10-05  9:31     ` Biju Das
2023-10-05 15:02 ` Peter Zijlstra
2023-10-05 15:08   ` Biju Das
2023-10-06  8:36     ` Marek Szyprowski
2023-10-06  9:38       ` Biju Das
     [not found]     ` <553e2ee4-ab3a-4635-a74f-0ba4cc03f3f9@samsung.com>
2023-10-06 10:31       ` Mike Galbraith
2023-10-06 14:00         ` Peter Zijlstra
2023-10-06 14:39           ` Mike Galbraith
2023-10-06 15:55             ` Peter Zijlstra
2023-10-06 16:54               ` Mike Galbraith
2023-10-06 19:24               ` Peter Zijlstra
2023-10-06 20:15                 ` Marek Szyprowski
2023-10-06 23:27                 ` Mike Galbraith
2023-10-07  0:37                 ` Chen Yu
2023-10-07  6:26                   ` Biju Das
2023-10-07  6:42                     ` Chen Yu
2023-10-09 14:27                       ` Phil Auld
2023-10-10  2:08                         ` Chen Yu
2023-10-07  6:24                 ` Biju Das
2023-10-05 15:11   ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).