From: Andrey Ryabinin <aryabinin@virtuozzo.com>
To: Shakeel Butt <shakeelb@google.com>, Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Cgroups <cgroups@vger.kernel.org>, Linux MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/2] mm/memcg: try harder to decrease [memory,memsw].limit_in_bytes
Date: Thu, 21 Dec 2017 13:00:46 +0300	[thread overview]
Message-ID: <5db8aef5-2d5e-1e3b-d121-778fc4bd6875@virtuozzo.com> (raw)
In-Reply-To: <CALvZod7ED3qaqekGTd-2PHmbTjY+D_NcFP1bE5_AgP8OF=jXJw@mail.gmail.com>



On 12/20/2017 09:15 PM, Shakeel Butt wrote:
> On Wed, Dec 20, 2017 at 3:34 AM, Michal Hocko <mhocko@kernel.org> wrote:
>> On Wed 20-12-17 14:32:19, Andrey Ryabinin wrote:
>>> On 12/20/2017 01:33 PM, Michal Hocko wrote:
>>>> On Wed 20-12-17 13:24:28, Andrey Ryabinin wrote:
>>>>> mem_cgroup_resize_[memsw]_limit() tries to free only 32 (SWAP_CLUSTER_MAX)
>>>>> pages on each iteration. This makes it practically impossible to decrease
>>>>> the limit of a memory cgroup. Tasks can easily allocate those 32 pages
>>>>> back, so we can't reduce memory usage, and once retry_count reaches zero
>>>>> we return -EBUSY.
>>>>>
>>>>> It's easy to reproduce the problem by running the following commands:
>>>>>
>>>>>   mkdir /sys/fs/cgroup/memory/test
>>>>>   echo $$ >> /sys/fs/cgroup/memory/test/tasks
>>>>>   cat big_file > /dev/null &
>>>>>   sleep 1 && echo $((100*1024*1024)) > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
>>>>>   -bash: echo: write error: Device or resource busy
>>>>>
>>>>> Instead of trying to free a small number of pages, it's much more
>>>>> reasonable to free 'usage - limit' pages.
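
For context, the loop under discussion looks roughly like this -- a
simplified sketch of mem_cgroup_resize_limit() from around this kernel
version, not the verbatim mm/memcontrol.c code. The patch effectively
replaces the single-page reclaim target (which reclaim rounds up to
SWAP_CLUSTER_MAX) with 'usage - limit':

	/* Simplified sketch; details differ from the real kernel code. */
	do {
		if (signal_pending(current)) {
			ret = -EINTR;
			break;
		}

		mutex_lock(&memcg_limit_mutex);
		/* Fails with -EBUSY while usage is still above the new limit. */
		ret = page_counter_limit(&memcg->memory, limit);
		mutex_unlock(&memcg_limit_mutex);
		if (!ret)
			break;		/* usage fits under the new limit */

		/* Asks for 1 page; reclaim rounds this up to no more than
		 * SWAP_CLUSTER_MAX (32) pages per iteration. */
		try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL, true);

		curusage = page_counter_read(&memcg->memory);
		if (curusage >= oldusage)
			retry_count--;	/* no forward progress */
		else
			oldusage = curusage;
	} while (retry_count);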
>>>>
>>>> But that only makes the issue less probable. It doesn't fix it, because
>>>>             if (curusage >= oldusage)
>>>>                     retry_count--;
>>>> can still be true because the allocator might be faster than the reclaimer.
>>>> Wouldn't it be more reasonable to simply remove the retry count and keep
>>>> trying until we are interrupted or manage to update the limit?
>>>
>>> But does it make sense to continue reclaiming even if the reclaimer can't
>>> make any progress? I'd say no. "Allocator is faster than reclaimer"
>>> may not be the only reason for failed reclaim. E.g. we could try to
>>> set the limit lower than the amount of mlock()ed memory in the cgroup;
>>> retrying reclaim would just waste the machine's resources. Or we simply
>>> don't have any swap, and anon > new_limit. Should we burn the CPU in
>>> that case?
>>
>> We can check the number of reclaimed pages and return -EBUSY if it is 0.
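
Combined with dropping the fixed retry budget suggested above, that could
look something like the following untested sketch, relying on the fact that
try_to_free_mem_cgroup_pages() returns the number of pages it reclaimed:

	/* Untested sketch: retry until the new limit fits, a signal
	 * arrives, or reclaim stops making any progress at all. */
	while ((ret = page_counter_limit(&memcg->memory, limit))) {
		if (signal_pending(current)) {
			ret = -EINTR;
			break;
		}
		if (!try_to_free_mem_cgroup_pages(memcg, 1,
						  GFP_KERNEL, true)) {
			ret = -EBUSY;	/* zero pages reclaimed */
			break;
		}
	}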
>>
>>>> Another option would be to commit the new limit and allow temporary
>>>> overcommit of the hard limit. New allocations and the limit-update path
>>>> would reclaim down to the hard limit.
>>>>
>>>
>>> It sounds a bit fragile and tricky to me. I wouldn't go that way
>>> unless we have a very good reason to.
>>
>> I haven't explored this, to be honest, so there may be dragons that way.
>> I've just mentioned that option for completeness.
>>
> 
> We already do this for cgroup-v2's memory.max. So, I don't think it is
> fragile or tricky.
> 
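
For reference, v2's memory_max_write() commits the new limit first and then
reclaims toward it, falling back to an OOM kill inside the cgroup if reclaim
cannot bring usage below the limit. Roughly -- a simplified sketch, not the
verbatim code (drain/event bookkeeping omitted):

	/* Simplified sketch of the cgroup-v2 memory.max write path;
	 * nr_reclaims starts at MEM_CGROUP_RECLAIM_RETRIES. */
	xchg(&memcg->memory.limit, max);	/* commit the limit up front */

	for (;;) {
		unsigned long nr_pages = page_counter_read(&memcg->memory);

		if (nr_pages <= max)
			break;			/* usage now fits */
		if (signal_pending(current))
			return -EINTR;
		if (nr_reclaims) {
			/* Target the actual excess, not a fixed batch. */
			if (!try_to_free_mem_cgroup_pages(memcg,
					nr_pages - max, GFP_KERNEL, true))
				nr_reclaims--;
			continue;
		}
		/* Reclaim kept failing: resolve the overage via OOM kill. */
		if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0))
			break;
	}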

It has the potential to break userspace expectations. Userspace might expect
that lowering limit_in_bytes too much fails with -EBUSY and doesn't trigger
the OOM killer.

Thread overview: 125+ messages
2017-12-20 10:24 [PATCH 1/2] mm/memcg: try harder to decrease [memory,memsw].limit_in_bytes Andrey Ryabinin
2017-12-20 10:24 ` [PATCH 2/2] mm/memcg: Consolidate mem_cgroup_resize_[memsw]_limit() functions Andrey Ryabinin
2017-12-20 10:33 ` [PATCH 1/2] mm/memcg: try harder to decrease [memory,memsw].limit_in_bytes Michal Hocko
2017-12-20 11:32   ` Andrey Ryabinin
2017-12-20 11:34     ` Michal Hocko
2017-12-20 18:15       ` Shakeel Butt
2017-12-21 10:00         ` Andrey Ryabinin [this message]
2017-12-20 13:21 ` [PATCH v2 " Andrey Ryabinin
2017-12-20 13:21   ` [PATCH v2 2/2] mm/memcg: Consolidate mem_cgroup_resize_[memsw]_limit() functions Andrey Ryabinin
2017-12-20 13:53   ` [PATCH v2 1/2] mm/memcg: try harder to decrease [memory,memsw].limit_in_bytes Michal Hocko
2018-01-09 16:58     ` [PATCH v3 " Andrey Ryabinin
2018-01-09 16:58       ` [PATCH v3 2/2] mm/memcg: Consolidate mem_cgroup_resize_[memsw]_limit() functions Andrey Ryabinin
2018-01-09 17:10         ` Shakeel Butt
2018-01-09 17:26           ` Andrey Ryabinin
2018-01-09 23:26             ` Andrew Morton
2018-01-10 12:43               ` [PATCH v4] mm/memcg: try harder to decrease [memory,memsw].limit_in_bytes Andrey Ryabinin
2018-01-10 22:31                 ` Andrew Morton
2018-01-11 11:59                   ` Andrey Ryabinin
2018-01-12  0:21                     ` Andrew Morton
2018-01-12  9:08                       ` Andrey Ryabinin
2018-01-11 10:42                 ` Michal Hocko
2018-01-11 12:21                   ` Andrey Ryabinin
2018-01-11 12:46                     ` Michal Hocko
2018-01-11 15:23                       ` Andrey Ryabinin
2018-01-11 16:29                         ` Michal Hocko
2018-01-11 21:59                           ` Andrey Ryabinin
2018-01-12 12:24                             ` Michal Hocko
2018-01-12 22:57                               ` Shakeel Butt
2018-01-15 12:29                                 ` Andrey Ryabinin
2018-01-15 17:04                                   ` Shakeel Butt
2018-01-15 12:30                               ` Andrey Ryabinin
2018-01-15 12:46                                 ` Michal Hocko
2018-01-15 12:53                                   ` Andrey Ryabinin
2018-01-15 12:58                                     ` Michal Hocko
2018-01-09 17:08       ` [PATCH v3 1/2] " Andrey Ryabinin
2018-01-09 17:22       ` Shakeel Butt
2018-01-19 13:25 ` [PATCH v5 1/2] mm/memcontrol.c: " Andrey Ryabinin
2018-01-19 13:25   ` [PATCH v5 2/2] mm/memcontrol.c: Reduce reclaim retries in mem_cgroup_resize_limit() Andrey Ryabinin
2018-01-19 13:35     ` Michal Hocko
2018-01-19 14:49       ` Shakeel Butt
2018-01-19 15:11         ` Michal Hocko
2018-01-19 15:24           ` Shakeel Butt
2018-01-19 15:31             ` Michal Hocko
2018-02-21 20:17           ` Andrew Morton
2018-02-22 13:50             ` Andrey Ryabinin
2018-02-22 14:09               ` Michal Hocko
2018-02-22 15:13                 ` Andrey Ryabinin
2018-02-22 15:33                   ` Michal Hocko
2018-02-22 15:38                     ` Andrey Ryabinin
2018-02-22 15:44                       ` Michal Hocko
2018-02-22 16:01                         ` Andrey Ryabinin
2018-02-22 16:30                           ` Michal Hocko
2018-01-19 13:32   ` [PATCH v5 1/2] mm/memcontrol.c: try harder to decrease [memory,memsw].limit_in_bytes Michal Hocko
2018-01-25 19:44   ` Andrey Ryabinin
