From: Andy Lutomirski
Date: Fri, 16 Sep 2016 11:19:38 -0700
Subject: Re: [Documentation] State of CPU controller in cgroup v2
To: Peter Zijlstra
Cc: Ingo Molnar, Mike Galbraith, kernel-team@fb.com, Andrew Morton,
 "open list:CONTROL GROUP (CGROUP)", Paul Turner, Li Zefan, Linux API,
 linux-kernel@vger.kernel.org, Tejun Heo, Johannes Weiner,
 Linus Torvalds
In-Reply-To: <20160916165045.GJ5016@twins.programming.kicks-ass.net>
References: <20160903220526.GA20784@mtj.duckdns.org>
 <20160909225747.GA30105@mtj.duckdns.org>
 <20160914200041.GB6832@htj.duckdns.org>
 <20160916075137.GK5012@twins.programming.kicks-ass.net>
 <20160916161951.GH5016@twins.programming.kicks-ass.net>
 <20160916165045.GJ5016@twins.programming.kicks-ass.net>

On Fri, Sep 16, 2016 at 9:50 AM, Peter Zijlstra wrote:
> On Fri, Sep 16, 2016 at 09:29:06AM -0700, Andy Lutomirski wrote:
>
>> > SCHED_DEADLINE, it's a 'Global'-EDF-like scheduler that doesn't
>> > support CPU affinities (because that doesn't make sense). The only
>> > way to restrict it is to partition.
>> >
>> > 'Global' in quotes because you can partition it. If you reduce your
>> > system to single-CPU partitions you'll reduce to P-EDF.
>> >
>> > (The same is true of SCHED_FIFO; that's a 'Global'-FIFO on the same
>> > partition scheme. It does, however, support sched_setaffinity(),
>> > but using it gives 'interesting' schedulability results -- call it
>> > a historical accident.)
>>
>> Hmm, I didn't realize that the deadline scheduler was global. But
>> ISTM requiring the use of "exclusive" to get this working is
>> unfortunate. What if a user wants two separate partitions, one using
>> CPUs 1 and 2 and the other using CPUs 3 and 4 (with 5 reserved for
>> non-RT stuff)?
>
> {1,2} {3,4} {5} seem exclusive, did I miss something? (other than that
> 5-CPU parts are 'rare')

There's no overlap, so they're logically exclusive, but laying them
out that way avoids needing the "cpu_exclusive" parameter. It always
seemed confusing to me that a setting on a child cgroup would strictly
remove a resource from the parent.

(To be clear: I don't have any particularly strong objection to
cpu_exclusive. It just always seemed like a bit of a hack that mostly
duplicated what you could get by setting the cpusets appropriately
throughout the hierarchy.)

>> > Note that, relatedly but differently, we have the isolcpus boot
>> > parameter, which creates single-CPU partitions for all listed CPUs
>> > and gives the rest to the root cpuset. Ideally we'd kill this
>> > option, given it's a boot-time setting (for something which is
>> > trivial to do at runtime).
>> >
>> > But this cannot be done, because that would mean we'd have to start
>> > with a !0 cpuset layout:
>> >
>> >                  '/'
>> >             load_balance=0
>> >            /            \
>> >      'system'         'isolated'
>> >  cpus=~isolcpus      cpus=isolcpus
>> >                     load_balance=0
>> >
>> > And start with _everything_ in the /system group (including default
>> > IRQ affinities).
>> >
>> > Of course, that will break everything cgroup :-(
>>
>> I would actually *much* prefer this over the status quo. I'm tired of
>> my crappy, partially-working script that sits there and creates
>> exactly this configuration (minus the isolcpus part, because I
>> actually want migration to work) on boot. (Actually, it could have
>> two automatic cgroups: /kernel and /init -- init and UMH would go in
>> /init, and kernel threads and such would go in /kernel. Userspace
>> would be able to request that a different cgroup be used for
>> newly-created kernel threads.)
>
> So there's a problem with sticking kernel threads (and esp. kthreadd)
> into !root groups. For example, if you place it in a cpuset that
> doesn't have all CPUs, then binding your shiny new kthread to a CPU
> will fail.
>
> You can fix that, of course, and we used to do exactly that, but we
> kept running into 'fun' cases like that.

Blech. But maybe this *should* have that effect. I'm sick of random
kernel crap being scheduled on my RT CPUs and on the CPUs that I
intend to keep forcibly idle.

> The unbound workqueue stuff is totally arbitrary borkage, though;
> that can be made to work just fine. TJ didn't like it for some
> reason which I really cannot remember.
>
> Also, UMH?

User mode helper. Fortunately, most users are gone now, but it still
exists.
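
(For anyone following along at home, here's roughly what entering
SCHED_DEADLINE looks like from userspace. A sketch only: the
10ms/100ms runtime/deadline/period numbers are made up, it needs
privilege, and since glibc has no sched_setattr() wrapper the attr
struct and the raw syscall are spelled out by hand.)

#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef SCHED_DEADLINE
#define SCHED_DEADLINE	6
#endif

/* Userspace mirror of the kernel's struct sched_attr. */
struct sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;
	uint64_t sched_deadline;
	uint64_t sched_period;
};

int main(void)
{
	struct sched_attr attr = {
		.size           = sizeof(attr),
		.sched_policy   = SCHED_DEADLINE,
		.sched_runtime  = 10ULL * 1000 * 1000,	/* 10 ms  */
		.sched_deadline = 100ULL * 1000 * 1000,	/* 100 ms */
		.sched_period   = 100ULL * 1000 * 1000,	/* 100 ms */
	};

	/* Needs CAP_SYS_NICE (in practice: root). */
	if (syscall(SYS_sched_setattr, 0, &attr, 0)) {
		perror("sched_setattr");
		return 1;
	}

	/* Note there's no sched_setaffinity() call here: per the above,
	 * a DEADLINE task is confined by partitioning (cpusets), not by
	 * per-task affinities. */
	for (;;)
		pause();
}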
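
And here's roughly what my boot script does for the {1,2} {3,4} layout
(with CPU 5 left over for non-RT stuff), translated into C as a
sketch. The /sys/fs/cgroup/cpuset mount point, the rt-a/rt-b names,
and the single memory node are all assumptions:

#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Write one value to one cpuset control file, bailing out on error. */
static void put(const char *dir, const char *file, const char *val)
{
	char path[256];
	FILE *f;

	snprintf(path, sizeof(path), "%s/%s", dir, file);
	f = fopen(path, "w");
	if (!f || fputs(val, f) == EOF) {
		perror(path);
		exit(1);
	}
	fclose(f);
}

static void make_partition(const char *dir, const char *cpus)
{
	mkdir(dir, 0755);		/* sketch: EEXIST not handled */
	put(dir, "cpuset.cpus", cpus);
	put(dir, "cpuset.mems", "0");	/* single memory node assumed */
	/* Deliberately no cpuset.cpu_exclusive write -- the sets are
	 * simply disjoint, which is my whole point above. */
}

int main(void)
{
	const char *base = "/sys/fs/cgroup/cpuset";	/* assumed mount */
	char dir[256];

	/* Turn off balancing across the root set so the children become
	 * separate sched domains. */
	put(base, "cpuset.sched_load_balance", "0");

	snprintf(dir, sizeof(dir), "%s/rt-a", base);
	make_partition(dir, "1-2");
	snprintf(dir, sizeof(dir), "%s/rt-b", base);
	make_partition(dir, "3-4");
	/* CPU 5 stays in the root cpuset for everything else. */
	return 0;
}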
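
For reference, the kernel-side pattern that gets surprised when
kthreadd lives in a restricted cpuset looks something like this (a
module sketch; the "demo-worker" name and the CPU number are made up):

#include <linux/err.h>
#include <linux/kthread.h>
#include <linux/module.h>
#include <linux/sched.h>

static struct task_struct *worker;

static int worker_fn(void *unused)
{
	while (!kthread_should_stop())
		schedule_timeout_interruptible(HZ);
	return 0;
}

static int __init demo_init(void)
{
	/* The new thread starts life with kthreadd's cpus_allowed... */
	worker = kthread_create(worker_fn, NULL, "demo-worker");
	if (IS_ERR(worker))
		return PTR_ERR(worker);

	/* ...and this is the step that assumes any CPU is fair game.
	 * With kthreadd confined to a cpuset that lacks CPU 2, this
	 * sort of binding is what used to blow up. */
	kthread_bind(worker, 2);
	wake_up_process(worker);
	return 0;
}

static void __exit demo_exit(void)
{
	kthread_stop(worker);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");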
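
(UMH being the call_usermodehelper() machinery, i.e. roughly this,
kernel-side, with a made-up helper path:)

#include <linux/kmod.h>

static int run_helper(void)
{
	static char *argv[] = { "/sbin/example-helper", NULL };
	static char *envp[] = {
		"HOME=/",
		"PATH=/sbin:/bin:/usr/sbin:/usr/bin",
		NULL,
	};

	/* UMH_WAIT_EXEC: wait for the exec to happen, not for the
	 * helper to exit. */
	return call_usermodehelper(argv[0], argv, envp, UMH_WAIT_EXEC);
}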