* [PATCH] mm: memcontrol: reclaim when shrinking memory.high below usage @ 2016-03-10 20:50 Johannes Weiner 2016-03-11 7:55 ` Michal Hocko 2016-03-11 8:34 ` Vladimir Davydov 0 siblings, 2 replies; 13+ messages in thread From: Johannes Weiner @ 2016-03-10 20:50 UTC (permalink / raw) To: Andrew Morton Cc: Michal Hocko, Vladimir Davydov, linux-mm, cgroups, linux-kernel, kernel-team When setting memory.high below usage, nothing happens until the next charge comes along, and then it will only reclaim its own charge and not the now potentially huge excess of the new memory.high. This can cause groups to stay in excess of their memory.high indefinitely. To fix that, when shrinking memory.high, kick off a reclaim cycle that goes after the delta. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> --- mm/memcontrol.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 8615b066b642..f7c9b4cbdf01 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -4992,6 +4992,7 @@ static ssize_t memory_high_write(struct kernfs_open_file *of, char *buf, size_t nbytes, loff_t off) { struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of)); + unsigned long nr_pages; unsigned long high; int err; @@ -5002,6 +5003,11 @@ static ssize_t memory_high_write(struct kernfs_open_file *of, memcg->high = high; + nr_pages = page_counter_read(&memcg->memory); + if (nr_pages > high) + try_to_free_mem_cgroup_pages(memcg, nr_pages - high, + GFP_KERNEL, true); + memcg_wb_domain_size_changed(memcg); return nbytes; } -- 2.7.2 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] mm: memcontrol: reclaim when shrinking memory.high below usage 2016-03-10 20:50 [PATCH] mm: memcontrol: reclaim when shrinking memory.high below usage Johannes Weiner @ 2016-03-11 7:55 ` Michal Hocko 2016-03-11 8:34 ` Vladimir Davydov 1 sibling, 0 replies; 13+ messages in thread From: Michal Hocko @ 2016-03-11 7:55 UTC (permalink / raw) To: Johannes Weiner Cc: Andrew Morton, Vladimir Davydov, linux-mm, cgroups, linux-kernel, kernel-team On Thu 10-03-16 15:50:13, Johannes Weiner wrote: > When setting memory.high below usage, nothing happens until the next > charge comes along, and then it will only reclaim its own charge and > not the now potentially huge excess of the new memory.high. This can > cause groups to stay in excess of their memory.high indefinitely. > > To fix that, when shrinking memory.high, kick off a reclaim cycle that > goes after the delta. This has been the case since the knob was introduced, but I wouldn't bother with the CC: stable # 4.0+ as this was still in experimental mode. I guess we want to have it in 4.5 or put it into 4.5 stable. > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Thanks!
> --- > mm/memcontrol.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 8615b066b642..f7c9b4cbdf01 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -4992,6 +4992,7 @@ static ssize_t memory_high_write(struct kernfs_open_file *of, > char *buf, size_t nbytes, loff_t off) > { > struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of)); > + unsigned long nr_pages; > unsigned long high; > int err; > > @@ -5002,6 +5003,11 @@ static ssize_t memory_high_write(struct kernfs_open_file *of, > > memcg->high = high; > > + nr_pages = page_counter_read(&memcg->memory); > + if (nr_pages > high) > + try_to_free_mem_cgroup_pages(memcg, nr_pages - high, > + GFP_KERNEL, true); > + > memcg_wb_domain_size_changed(memcg); > return nbytes; > } > -- > 2.7.2 -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm: memcontrol: reclaim when shrinking memory.high below usage 2016-03-10 20:50 [PATCH] mm: memcontrol: reclaim when shrinking memory.high below usage Johannes Weiner 2016-03-11 7:55 ` Michal Hocko @ 2016-03-11 8:34 ` Vladimir Davydov 2016-03-11 8:42 ` Michal Hocko 2016-03-16 5:41 ` Johannes Weiner 1 sibling, 2 replies; 13+ messages in thread From: Vladimir Davydov @ 2016-03-11 8:34 UTC (permalink / raw) To: Johannes Weiner Cc: Andrew Morton, Michal Hocko, linux-mm, cgroups, linux-kernel, kernel-team On Thu, Mar 10, 2016 at 03:50:13PM -0500, Johannes Weiner wrote: > When setting memory.high below usage, nothing happens until the next > charge comes along, and then it will only reclaim its own charge and > not the now potentially huge excess of the new memory.high. This can > cause groups to stay in excess of their memory.high indefinitely. > > To fix that, when shrinking memory.high, kick off a reclaim cycle that > goes after the delta. I agree that we should reclaim the high excess, but I don't think it's a good idea to do it synchronously. Currently, memory.low and memory.high knobs can be easily used by a single-threaded load manager implemented in userspace, because it doesn't need to care about potential stalls caused by writes to these files. After this change it might happen that a write to memory.high would take long, seconds perhaps, so in order to react quickly to changes in other cgroups, a load manager would have to spawn a thread per each write to memory.high, which would complicate its implementation significantly. Since, in contrast to memory.max, memory.high definition allows cgroup to breach it, I believe it would be better if we spawned an asynchronous reclaim work from the kernel on write to memory.high instead of doing this synchronously. I guess we could reuse mem_cgroup->high_work for that. Thanks, Vladimir ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm: memcontrol: reclaim when shrinking memory.high below usage 2016-03-11 8:34 ` Vladimir Davydov @ 2016-03-11 8:42 ` Michal Hocko 2016-03-11 9:13 ` Vladimir Davydov 2016-03-16 5:41 ` Johannes Weiner 1 sibling, 1 reply; 13+ messages in thread From: Michal Hocko @ 2016-03-11 8:42 UTC (permalink / raw) To: Vladimir Davydov Cc: Johannes Weiner, Andrew Morton, linux-mm, cgroups, linux-kernel, kernel-team On Fri 11-03-16 11:34:40, Vladimir Davydov wrote: > On Thu, Mar 10, 2016 at 03:50:13PM -0500, Johannes Weiner wrote: > > When setting memory.high below usage, nothing happens until the next > > charge comes along, and then it will only reclaim its own charge and > > not the now potentially huge excess of the new memory.high. This can > > cause groups to stay in excess of their memory.high indefinitely. > > > > To fix that, when shrinking memory.high, kick off a reclaim cycle that > > goes after the delta. > > I agree that we should reclaim the high excess, but I don't think it's a > good idea to do it synchronously. Currently, memory.low and memory.high > knobs can be easily used by a single-threaded load manager implemented > in userspace, because it doesn't need to care about potential stalls > caused by writes to these files. After this change it might happen that > a write to memory.high would take long, seconds perhaps, so in order to > react quickly to changes in other cgroups, a load manager would have to > spawn a thread per each write to memory.high, which would complicate its > implementation significantly. Is the complication on the managing part really an issue though. Such a manager would have to spawn a process/thread to change the .max already. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm: memcontrol: reclaim when shrinking memory.high below usage 2016-03-11 8:42 ` Michal Hocko @ 2016-03-11 9:13 ` Vladimir Davydov 2016-03-11 9:53 ` Michal Hocko 0 siblings, 1 reply; 13+ messages in thread From: Vladimir Davydov @ 2016-03-11 9:13 UTC (permalink / raw) To: Michal Hocko Cc: Johannes Weiner, Andrew Morton, linux-mm, cgroups, linux-kernel, kernel-team On Fri, Mar 11, 2016 at 09:42:39AM +0100, Michal Hocko wrote: > On Fri 11-03-16 11:34:40, Vladimir Davydov wrote: > > On Thu, Mar 10, 2016 at 03:50:13PM -0500, Johannes Weiner wrote: > > > When setting memory.high below usage, nothing happens until the next > > > charge comes along, and then it will only reclaim its own charge and > > > not the now potentially huge excess of the new memory.high. This can > > > cause groups to stay in excess of their memory.high indefinitely. > > > > > > To fix that, when shrinking memory.high, kick off a reclaim cycle that > > > goes after the delta. > > > > I agree that we should reclaim the high excess, but I don't think it's a > > good idea to do it synchronously. Currently, memory.low and memory.high > > knobs can be easily used by a single-threaded load manager implemented > > in userspace, because it doesn't need to care about potential stalls > > caused by writes to these files. After this change it might happen that > > a write to memory.high would take long, seconds perhaps, so in order to > > react quickly to changes in other cgroups, a load manager would have to > > spawn a thread per each write to memory.high, which would complicate its > > implementation significantly. > > Is the complication on the managing part really an issue though. Such a > manager would have to spawn a process/thread to change the .max already. IMO memory.max is not something that has to be changed often. In most cases it will be set on container start and stay put throughout container lifetime. 
I can also imagine a case when memory.max will be changed for all containers when a container starts or stops, so as to guarantee that if <= N containers of M go mad, the system will survive. In any case, memory.max is reconfigured rarely; it rather belongs to the static configuration. OTOH memory.low and memory.high are perfect to be changed dynamically, based on containers' memory demand/pressure. A load manager might want to reconfigure these knobs say every 5 seconds. Spawning a thread per each container that often would look unnecessarily overcomplicated IMO. Thanks, Vladimir
* Re: [PATCH] mm: memcontrol: reclaim when shrinking memory.high below usage 2016-03-11 9:13 ` Vladimir Davydov @ 2016-03-11 9:53 ` Michal Hocko 2016-03-11 11:49 ` Vladimir Davydov 0 siblings, 1 reply; 13+ messages in thread From: Michal Hocko @ 2016-03-11 9:53 UTC (permalink / raw) To: Vladimir Davydov Cc: Johannes Weiner, Andrew Morton, linux-mm, cgroups, linux-kernel, kernel-team On Fri 11-03-16 12:13:04, Vladimir Davydov wrote: > On Fri, Mar 11, 2016 at 09:42:39AM +0100, Michal Hocko wrote: > > On Fri 11-03-16 11:34:40, Vladimir Davydov wrote: > > > On Thu, Mar 10, 2016 at 03:50:13PM -0500, Johannes Weiner wrote: > > > > When setting memory.high below usage, nothing happens until the next > > > > charge comes along, and then it will only reclaim its own charge and > > > > not the now potentially huge excess of the new memory.high. This can > > > > cause groups to stay in excess of their memory.high indefinitely. > > > > > > > > To fix that, when shrinking memory.high, kick off a reclaim cycle that > > > > goes after the delta. > > > > > > I agree that we should reclaim the high excess, but I don't think it's a > > > good idea to do it synchronously. Currently, memory.low and memory.high > > > knobs can be easily used by a single-threaded load manager implemented > > > in userspace, because it doesn't need to care about potential stalls > > > caused by writes to these files. After this change it might happen that > > > a write to memory.high would take long, seconds perhaps, so in order to > > > react quickly to changes in other cgroups, a load manager would have to > > > spawn a thread per each write to memory.high, which would complicate its > > > implementation significantly. > > > > Is the complication on the managing part really an issue though. Such a > > manager would have to spawn a process/thread to change the .max already. > > IMO memory.max is not something that has to be changed often. 
In most > cases it will be set on container start and stay put throughout > container lifetime. I can also imagine a case when memory.max will be > changed for all containers when a container starts or stops, so as to > guarantee that if <= N containers of M go mad, the system will survive. > In any case, memory.max is reconfigured rarely, it rather belongs to the > static configuration. I see > OTOH memory.low and memory.high are perfect to be changed dynamically, > basing on containers' memory demand/pressure. A load manager might want > to reconfigure these knobs say every 5 seconds. Spawning a thread per > each container that often would look unnecessarily overcomplicated IMO. The question however is whether we want to hide a potentially costly operation and have it unaccounted and hidden in the kworker context. I mean fork() + write() doesn't sound terribly complicated to me to have a rather subtle behavior in the kernel. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm: memcontrol: reclaim when shrinking memory.high below usage 2016-03-11 9:53 ` Michal Hocko @ 2016-03-11 11:49 ` Vladimir Davydov 2016-03-11 13:39 ` Michal Hocko 0 siblings, 1 reply; 13+ messages in thread From: Vladimir Davydov @ 2016-03-11 11:49 UTC (permalink / raw) To: Michal Hocko Cc: Johannes Weiner, Andrew Morton, linux-mm, cgroups, linux-kernel, kernel-team On Fri, Mar 11, 2016 at 10:53:09AM +0100, Michal Hocko wrote: > > OTOH memory.low and memory.high are perfect to be changed dynamically, > > basing on containers' memory demand/pressure. A load manager might want > > to reconfigure these knobs say every 5 seconds. Spawning a thread per > > each container that often would look unnecessarily overcomplicated IMO. > > The question however is whether we want to hide a potentially costly > operation and have it unaccounted and hidden in the kworker context. There's already mem_cgroup->high_work doing reclaim in an unaccounted context quite often if tcp accounting is enabled. And there's kswapd. memory.high knob is for the root only so it can't be abused by an unprivileged user. Regarding a privileged user, e.g. load manager, it can screw things up anyway, e.g. by configuring sum of memory.low to be greater than total RAM on the host and hence driving kswapd mad. > I mean fork() + write() doesn't sound terribly complicated to me to have > a rather subtle behavior in the kernel. It'd be just a dubious API IMHO. With memory.max everything's clear: it tries to reclaim memory hard, may stall for several seconds, may invoke OOM, but if it finishes successfully we have memory.current less than memory.max. With this patch memory.high knob behaves rather strangely: it might stall, but there's no guarantee you'll have memory.current less than memory.high; moreover, according to the documentation it's OK to have memory.current greater than memory.high, so what's the point in calling synchronous reclaim blocking the caller? 
Thanks, Vladimir
* Re: [PATCH] mm: memcontrol: reclaim when shrinking memory.high below usage 2016-03-11 11:49 ` Vladimir Davydov @ 2016-03-11 13:39 ` Michal Hocko 2016-03-11 14:01 ` Vladimir Davydov 0 siblings, 1 reply; 13+ messages in thread From: Michal Hocko @ 2016-03-11 13:39 UTC (permalink / raw) To: Vladimir Davydov Cc: Johannes Weiner, Andrew Morton, linux-mm, cgroups, linux-kernel, kernel-team On Fri 11-03-16 14:49:34, Vladimir Davydov wrote: > On Fri, Mar 11, 2016 at 10:53:09AM +0100, Michal Hocko wrote: > > > OTOH memory.low and memory.high are perfect to be changed dynamically, > > > basing on containers' memory demand/pressure. A load manager might want > > > to reconfigure these knobs say every 5 seconds. Spawning a thread per > > > each container that often would look unnecessarily overcomplicated IMO. > > > > The question however is whether we want to hide a potentially costly > > operation and have it unaccounted and hidden in the kworker context. > > There's already mem_cgroup->high_work doing reclaim in an unaccounted > context quite often if tcp accounting is enabled. I suspect this is done because the charging context cannot do much better. > And there's kswapd. > memory.high knob is for the root only so it can't be abused by an > unprivileged user. Regarding a privileged user, e.g. load manager, it > can screw things up anyway, e.g. by configuring sum of memory.low to be > greater than total RAM on the host and hence driving kswapd mad. I am not worried about abuse. It is just weird to move something which can be perfectly sync to an async mode. > > I mean fork() + write() doesn't sound terribly complicated to me to have > > a rather subtle behavior in the kernel. > > It'd be just a dubious API IMHO. With memory.max everything's clear: it > tries to reclaim memory hard, may stall for several seconds, may invoke > OOM, but if it finishes successfully we have memory.current less than > memory.max. 
With this patch memory.high knob behaves rather strangely: > it might stall, but there's no guarantee you'll have memory.current less > than memory.high; moreover, according to the documentation it's OK to > have memory.current greater than memory.high, so what's the point in > calling synchronous reclaim blocking the caller? Even if the reclaim is best effort it doesn't mean we should hide it into an async context. There is simply no reason to do so. We do the same for other knobs which are performing a potentially expensive operation and do not guarantee the result. -- Michal Hocko SUSE Labs
* Re: [PATCH] mm: memcontrol: reclaim when shrinking memory.high below usage 2016-03-11 13:39 ` Michal Hocko @ 2016-03-11 14:01 ` Vladimir Davydov 2016-03-11 14:22 ` Michal Hocko 0 siblings, 1 reply; 13+ messages in thread From: Vladimir Davydov @ 2016-03-11 14:01 UTC (permalink / raw) To: Michal Hocko Cc: Johannes Weiner, Andrew Morton, linux-mm, cgroups, linux-kernel, kernel-team On Fri, Mar 11, 2016 at 02:39:36PM +0100, Michal Hocko wrote: > On Fri 11-03-16 14:49:34, Vladimir Davydov wrote: > > On Fri, Mar 11, 2016 at 10:53:09AM +0100, Michal Hocko wrote: > > > > OTOH memory.low and memory.high are perfect to be changed dynamically, > > > > basing on containers' memory demand/pressure. A load manager might want > > > > to reconfigure these knobs say every 5 seconds. Spawning a thread per > > > > each container that often would look unnecessarily overcomplicated IMO. > > > > > > The question however is whether we want to hide a potentially costly > > > operation and have it unaccounted and hidden in the kworker context. > > > > There's already mem_cgroup->high_work doing reclaim in an unaccounted > > context quite often if tcp accounting is enabled. > > I suspect this is done because the charging context cannot do much > better. > > > And there's kswapd. > > memory.high knob is for the root only so it can't be abused by an > > unprivileged user. Regarding a privileged user, e.g. load manager, it > > can screw things up anyway, e.g. by configuring sum of memory.low to be > > greater than total RAM on the host and hence driving kswapd mad. > > I am not worried about abuse. It is just weird to move something which > can be perfectly sync to an async mode. > > > > I mean fork() + write() doesn't sound terribly complicated to me to have > > > a rather subtle behavior in the kernel. > > > > It'd be just a dubious API IMHO. 
With memory.max everything's clear: it > > tries to reclaim memory hard, may stall for several seconds, may invoke > > OOM, but if it finishes successfully we have memory.current less than > > memory.max. With this patch memory.high knob behaves rather strangely: > > it might stall, but there's no guarantee you'll have memory.current less > > than memory.high; moreover, according to the documentation it's OK to > > have memory.current greater than memory.high, so what's the point in > > calling synchronous reclaim blocking the caller? > > Even if the reclaim is best effort it doesn't mean we should hide it > into an async context. There is simply no reason to do so. We do the > some for other knobs which are performing a potentially expensive > operation and do not guarantee the result. IMO it depends on what a knob is used for. If it's for testing or debugging or recovering the system (e.g. manual oom, compact, drop_caches), this must be synchronous, but memory.high is going to be tweaked at runtime during normal system operation every several seconds or so, at least in my understanding. I understand your concern, and may be you're right in the end, but think about userspace that will probably have to spawn thousands threads every 5 seconds or so just to write to a file. It's painful IMO. Are there any hidden non-obvious implications of handing over reclaim to a kernel worker on adjusting memory.high? May be, I'm just missing something obvious, and it can be really dangerous or sub-optimal. Thanks, Vladimir ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm: memcontrol: reclaim when shrinking memory.high below usage 2016-03-11 14:01 ` Vladimir Davydov @ 2016-03-11 14:22 ` Michal Hocko 2016-03-11 14:46 ` Vladimir Davydov 0 siblings, 1 reply; 13+ messages in thread From: Michal Hocko @ 2016-03-11 14:22 UTC (permalink / raw) To: Vladimir Davydov Cc: Johannes Weiner, Andrew Morton, linux-mm, cgroups, linux-kernel, kernel-team On Fri 11-03-16 17:01:46, Vladimir Davydov wrote: > On Fri, Mar 11, 2016 at 02:39:36PM +0100, Michal Hocko wrote: > > On Fri 11-03-16 14:49:34, Vladimir Davydov wrote: > > > On Fri, Mar 11, 2016 at 10:53:09AM +0100, Michal Hocko wrote: > > > > > OTOH memory.low and memory.high are perfect to be changed dynamically, > > > > > basing on containers' memory demand/pressure. A load manager might want > > > > > to reconfigure these knobs say every 5 seconds. Spawning a thread per > > > > > each container that often would look unnecessarily overcomplicated IMO. > > > > > > > > The question however is whether we want to hide a potentially costly > > > > operation and have it unaccounted and hidden in the kworker context. > > > > > > There's already mem_cgroup->high_work doing reclaim in an unaccounted > > > context quite often if tcp accounting is enabled. > > > > I suspect this is done because the charging context cannot do much > > better. > > > > > And there's kswapd. > > > memory.high knob is for the root only so it can't be abused by an > > > unprivileged user. Regarding a privileged user, e.g. load manager, it > > > can screw things up anyway, e.g. by configuring sum of memory.low to be > > > greater than total RAM on the host and hence driving kswapd mad. > > > > I am not worried about abuse. It is just weird to move something which > > can be perfectly sync to an async mode. > > > > > > I mean fork() + write() doesn't sound terribly complicated to me to have > > > > a rather subtle behavior in the kernel. > > > > > > It'd be just a dubious API IMHO. 
With memory.max everything's clear: it > > > tries to reclaim memory hard, may stall for several seconds, may invoke > > > OOM, but if it finishes successfully we have memory.current less than > > > memory.max. With this patch memory.high knob behaves rather strangely: > > > it might stall, but there's no guarantee you'll have memory.current less > > > than memory.high; moreover, according to the documentation it's OK to > > > have memory.current greater than memory.high, so what's the point in > > > calling synchronous reclaim blocking the caller? > > > > Even if the reclaim is best effort it doesn't mean we should hide it > > into an async context. There is simply no reason to do so. We do the > > some for other knobs which are performing a potentially expensive > > operation and do not guarantee the result. > > IMO it depends on what a knob is used for. If it's for testing or > debugging or recovering the system (e.g. manual oom, compact, > drop_caches), this must be synchronous, but memory.high is going to be > tweaked at runtime during normal system operation every several seconds > or so, Is this really going to happen in the real life? And if yes is it really probable that such an adjustment would cause such a large disruption? > at least in my understanding. I understand your concern, and may > be you're right in the end, but think about userspace that will probably > have to spawn thousands threads every 5 seconds or so just to write to a > file. It's painful IMO. > > Are there any hidden non-obvious implications of handing over reclaim to > a kernel worker on adjusting memory.high? May be, I'm just missing > something obvious, and it can be really dangerous or sub-optimal. I am just thinking about what would happen if workers just start stacking up because they couldn't be processed and then would race with each other. I mean this all would be fixable but I really fail to see how that makes sense from the very beginning. 
-- Michal Hocko SUSE Labs
* Re: [PATCH] mm: memcontrol: reclaim when shrinking memory.high below usage 2016-03-11 14:22 ` Michal Hocko @ 2016-03-11 14:46 ` Vladimir Davydov 0 siblings, 0 replies; 13+ messages in thread From: Vladimir Davydov @ 2016-03-11 14:46 UTC (permalink / raw) To: Michal Hocko Cc: Johannes Weiner, Andrew Morton, linux-mm, cgroups, linux-kernel, kernel-team On Fri, Mar 11, 2016 at 03:22:27PM +0100, Michal Hocko wrote: > On Fri 11-03-16 17:01:46, Vladimir Davydov wrote: > > On Fri, Mar 11, 2016 at 02:39:36PM +0100, Michal Hocko wrote: > > > On Fri 11-03-16 14:49:34, Vladimir Davydov wrote: > > > > On Fri, Mar 11, 2016 at 10:53:09AM +0100, Michal Hocko wrote: > > > > > > OTOH memory.low and memory.high are perfect to be changed dynamically, > > > > > > basing on containers' memory demand/pressure. A load manager might want > > > > > > to reconfigure these knobs say every 5 seconds. Spawning a thread per > > > > > > each container that often would look unnecessarily overcomplicated IMO. > > > > > > > > > > The question however is whether we want to hide a potentially costly > > > > > operation and have it unaccounted and hidden in the kworker context. > > > > > > > > There's already mem_cgroup->high_work doing reclaim in an unaccounted > > > > context quite often if tcp accounting is enabled. > > > > > > I suspect this is done because the charging context cannot do much > > > better. > > > > > > > And there's kswapd. > > > > memory.high knob is for the root only so it can't be abused by an > > > > unprivileged user. Regarding a privileged user, e.g. load manager, it > > > > can screw things up anyway, e.g. by configuring sum of memory.low to be > > > > greater than total RAM on the host and hence driving kswapd mad. > > > > > > I am not worried about abuse. It is just weird to move something which > > > can be perfectly sync to an async mode. 
> > > > > > > > I mean fork() + write() doesn't sound terribly complicated to me to have > > > > > a rather subtle behavior in the kernel. > > > > > > > > It'd be just a dubious API IMHO. With memory.max everything's clear: it > > > > tries to reclaim memory hard, may stall for several seconds, may invoke > > > > OOM, but if it finishes successfully we have memory.current less than > > > > memory.max. With this patch memory.high knob behaves rather strangely: > > > > it might stall, but there's no guarantee you'll have memory.current less > > > > than memory.high; moreover, according to the documentation it's OK to > > > > have memory.current greater than memory.high, so what's the point in > > > > calling synchronous reclaim blocking the caller? > > > > > > Even if the reclaim is best effort it doesn't mean we should hide it > > > into an async context. There is simply no reason to do so. We do the > > > some for other knobs which are performing a potentially expensive > > > operation and do not guarantee the result. > > > > IMO it depends on what a knob is used for. If it's for testing or > > debugging or recovering the system (e.g. manual oom, compact, > > drop_caches), this must be synchronous, but memory.high is going to be > > tweaked at runtime during normal system operation every several seconds > > or so, > > Is this really going to happen in the real life? I don't think anyone can say for sure now, but I think it's possible. > And if yes is it really probable that such an adjustment would cause > such a large disruption? Dunno. > > > at least in my understanding. I understand your concern, and may > > be you're right in the end, but think about userspace that will probably > > have to spawn thousands threads every 5 seconds or so just to write to a > > file. It's painful IMO. > > > > Are there any hidden non-obvious implications of handing over reclaim to > > a kernel worker on adjusting memory.high? 
May be, I'm just missing > > something obvious, and it can be really dangerous or sub-optimal. > > I am just thinking about what would happen if workers just start > stacking up because they couldn't be processed and then would race with > each other. We can use one work per memcg (high_work?) and make it requeue itself if it finds that memory usage is above memory.high and there are reclaimable pages, so no stacking should happen. > I mean this all would be fixable but I really fail to see > how that makes sense from the very beginning. This is a user API we're defining right now. Changing it in future will be troublesome. Thanks, Vladimir ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm: memcontrol: reclaim when shrinking memory.high below usage 2016-03-11 8:34 ` Vladimir Davydov 2016-03-11 8:42 ` Michal Hocko @ 2016-03-16 5:41 ` Johannes Weiner 2016-03-16 14:47 ` Vladimir Davydov 1 sibling, 1 reply; 13+ messages in thread From: Johannes Weiner @ 2016-03-16 5:41 UTC (permalink / raw) To: Vladimir Davydov Cc: Andrew Morton, Michal Hocko, linux-mm, cgroups, linux-kernel, kernel-team On Fri, Mar 11, 2016 at 11:34:40AM +0300, Vladimir Davydov wrote: > On Thu, Mar 10, 2016 at 03:50:13PM -0500, Johannes Weiner wrote: > > When setting memory.high below usage, nothing happens until the next > > charge comes along, and then it will only reclaim its own charge and > > not the now potentially huge excess of the new memory.high. This can > > cause groups to stay in excess of their memory.high indefinitely. > > > > To fix that, when shrinking memory.high, kick off a reclaim cycle that > > goes after the delta. > > I agree that we should reclaim the high excess, but I don't think it's a > good idea to do it synchronously. Currently, memory.low and memory.high > knobs can be easily used by a single-threaded load manager implemented > in userspace, because it doesn't need to care about potential stalls > caused by writes to these files. After this change it might happen that > a write to memory.high would take long, seconds perhaps, so in order to > react quickly to changes in other cgroups, a load manager would have to > spawn a thread per each write to memory.high, which would complicate its > implementation significantly. While I do expect memory.high to be adjusted every once in a while, I can't see anybody doing it by a significant fraction of the cgroup every couple of seconds - or tighter than the workingset; and dropping use-once cache is cheap. What kind of usecase would that be? 
But even if we're wrong about it and this becomes a scalability issue, the knob - even when reclaiming synchronously - makes no guarantees about the target being met once the write finishes. It's a best effort mechanism. What would break if we made it async later on?
* Re: [PATCH] mm: memcontrol: reclaim when shrinking memory.high below usage 2016-03-16 5:41 ` Johannes Weiner @ 2016-03-16 14:47 ` Vladimir Davydov 0 siblings, 0 replies; 13+ messages in thread From: Vladimir Davydov @ 2016-03-16 14:47 UTC (permalink / raw) To: Johannes Weiner Cc: Andrew Morton, Michal Hocko, linux-mm, cgroups, linux-kernel, kernel-team On Tue, Mar 15, 2016 at 10:41:57PM -0700, Johannes Weiner wrote: > On Fri, Mar 11, 2016 at 11:34:40AM +0300, Vladimir Davydov wrote: > > On Thu, Mar 10, 2016 at 03:50:13PM -0500, Johannes Weiner wrote: > > > When setting memory.high below usage, nothing happens until the next > > > charge comes along, and then it will only reclaim its own charge and > > > not the now potentially huge excess of the new memory.high. This can > > > cause groups to stay in excess of their memory.high indefinitely. > > > > > > To fix that, when shrinking memory.high, kick off a reclaim cycle that > > > goes after the delta. > > > > I agree that we should reclaim the high excess, but I don't think it's a > > good idea to do it synchronously. Currently, memory.low and memory.high > > knobs can be easily used by a single-threaded load manager implemented > > in userspace, because it doesn't need to care about potential stalls > > caused by writes to these files. After this change it might happen that > > a write to memory.high would take long, seconds perhaps, so in order to > > react quickly to changes in other cgroups, a load manager would have to > > spawn a thread per each write to memory.high, which would complicate its > > implementation significantly. > > While I do expect memory.high to be adjusted every once in a while, I > can't see anybody doing it by a significant fraction of the cgroup > every couple of seconds - or tighter than the workingset; and dropping > use-once cache is cheap. What kind of usecase would that be? 
I agree that a load manager won't need to adjust memory.high by a significant amount often, but there can be a lot of containers running, so even if it takes 10 ms to adjust memory.high for one container, it will take up to a second for 100 containers. I expect that a load manager implementation will just blindly spawn a thread per each memory.high update to be sure it won't be stalled for too long. > > But even if we're wrong about it and this becomes a scalability issue, > the knob - even when reclaiming synchronously - makes no guarantees > about the target being met once the write finishes. It's a best effort > mechanism. What would break if we made it async later on? You're right of course - we could change sync to async later, but not the other way round. However, I'm afraid that by making it sync from the very beginning we effectively force userspace applications that need to update memory.{low,high} often to use multi-threading. Not sure if it's that bad though. Thanks, Vladimir