From: Michal Hocko <mhocko@suse.com>
To: CGEL <cgel.zte@gmail.com>
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, willy@infradead.org,
	shy828301@gmail.com, roman.gushchin@linux.dev, shakeelb@google.com,
	linmiaohe@huawei.com, william.kucharski@oracle.com, peterx@redhat.com,
	hughd@google.com, vbabka@suse.cz, songmuchun@bytedance.com,
	surenb@google.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	cgroups@vger.kernel.org, Yang Yang <yang.yang29@zte.com.cn>
Subject: Re: [PATCH] mm/memcg: support control THP behaviour in cgroup
Date: Wed, 11 May 2022 09:21:53 +0200
Message-ID: <YntkEUKPquTbBjMu@dhcp22.suse.cz>
In-Reply-To: <627b1899.1c69fb81.cd831.12d9@mx.google.com>

On Wed 11-05-22 01:59:52, CGEL wrote:
> On Tue, May 10, 2022 at 03:36:34PM +0200, Michal Hocko wrote:
[...]
> > Can you come up with a sane hierarchical behavior?
> >
>
> I think this new interface better be independent not hierarchical anyway. Especially
> when we treat container as lightweight virtual machine.

I suspect you are focusing too much on your use case and do not realize
the wider consequences of this being a user interface that still has to
be sensible for other use cases. Take delegation of the control to
subgroups as an example. If this is a per-memcg knob (like swappiness)
then children can override the parent's THP policy. This might be less
of a deal for swappiness because the anon/file reclaim balancing should
be mostly an internal thing. But THP policy is different because it has
other effects on workloads running outside of the said cgroup - higher
memory demand, higher contention for high-order memory etc.

I do not really see how this could be a sensible per-memcg policy
without being fully hierarchical.

> > > [...]
> > > > > For micro-service architecture, the application in one container is not a
> > > > > set of loosely tight processes, it's aim at provide one certain service,
> > > > > so different containers means different service, and different service
> > > > > has different QoS demand.
> > > >
> > > > OK, if they are tightly coupled you could apply the same THP policy by
> > > > an existing prctl interface. Why is that not feasible. As you are noting
> > > > below...
> > > >
> > > > > 5.containers usually managed by compose software, which treats container as
> > > > > base management unit;
> > > >
> > > > ..so the compose software can easily start up the workload by using prctl
> > > > to disable THP for whatever workloads it is not suitable for.
> > >
> > > prctl(PR_SET_THP_DISABLE..) can not be elegance to support the semantic we
> > > need. If only some containers needs THP, other containers and host do not need
> > > THP. We must set host THP to always first, and call prctl() to close THP for
> > > host tasks and other containers one by one,
> >
> > It might not be the most elegant solution but it should work.
>
> So you agree it's reasonable to set THP policy for process in container, right?

Yes, like for any other process.

> If so, IMHO, when there are thousands of processes launch and die on the machine,
> it will be horrible to do so by calling prctl(), I don't see the reasonability.

Could you be more specific? The usual prctl use would normally be
handled by the launcher, relying on the per-process policy being
inherited down the road.
-- 
Michal Hocko
SUSE Labs
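
The launcher pattern mentioned in the last reply can be sketched roughly as follows. This is a minimal illustration, not code from the patch or the thread: the helper name run_without_thp() is hypothetical, and error handling is kept to the bare minimum. It relies on the documented behaviour of prctl(PR_SET_THP_DISABLE) (Linux 3.15 and later): the flag is inherited across fork(2) and preserved across execve(2), so setting it once in the launched child covers everything that child spawns later.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): run argv[0] with the
 * per-process "THP disable" flag set. Returns the child's exit status,
 * or -1 if fork/waitpid fails or the child did not exit normally.
 */
int run_without_thp(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: set the flag before exec; arg3..arg5 must be zero. */
        if (prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0) != 0) {
            perror("prctl(PR_SET_THP_DISABLE)");
            _exit(127);
        }
        /* The flag survives execve() and is inherited by later forks. */
        execvp(argv[0], argv);
        perror("execvp");
        _exit(127);
    }

    /* Parent (the launcher): its own THP setting is left untouched. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A container runtime would call something like this once per container init process rather than once per task, which is the point being made about inheritance. It does not address the other half of the disagreement, namely that the host-wide setting still has to allow THP for the containers that want it.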