On Tue, 2022-12-13 at 08:12 -1000, Tejun Heo wrote:
> Hello,
>
> On Tue, Dec 13, 2022 at 11:55:10AM +0100, Peter Zijlstra wrote:
> > On Mon, Dec 12, 2022 at 11:33:12AM -1000, Tejun Heo wrote:
> >
> > > Here, the way it's handled is a bit different, SCX has a watchdog
> > > mechanism implemented in "[PATCH 18/31] sched_ext: Implement
> > > runnable task stall watchdog", so if SCX tasks hang for whatever
> > > reason including being starved by CFS, it will get aborted and all
> > > tasks will be handed back to CFS. IOW, it's treated like any other
> > > BPF scheduler errors that can lead to stalls and recovered the
> > > same way.
> >
> > That all sounds quite terrible.. :/
>
> The main source of difference is that we can't implicitly trust the
> BPF scheduler and if it malfunctions or on user request, the system
> should always be recoverable, so there are some extra things which
> are inherently necessary to support that.

That makes me wonder whether loading an SCX policy
should just have that policy take over all of the
SCHED_OTHER tasks by default, and have a failure
of the policy just return those tasks to CFS?

Having the two be operative at the same time seems
to be a cause of hard to resolve issues, while simply
running all non-RT tasks under the loadable policy
could simplify both internal kernel interfaces, as
well as externally visible effects?

-- 
All Rights Reversed.