* [PATCH] memcg: schedule high reclaim for remote memcgs on high_work
@ 2019-01-03 1:56 Shakeel Butt
2019-01-08 14:59 ` Michal Hocko
0 siblings, 1 reply; 3+ messages in thread
From: Shakeel Butt @ 2019-01-03 1:56 UTC (permalink / raw)
To: Johannes Weiner, Vladimir Davydov, Michal Hocko, Andrew Morton
Cc: linux-mm, cgroups, linux-kernel, Shakeel Butt
If a memcg is over its high limit, memory reclaim is scheduled to run on
return-to-userland. However, it is assumed that the memcg is the current
process's memcg. With remote memcg charging for kmem, or when swapping in
a page charged to a remote memcg, the current process can trigger reclaim
on a remote memcg. Scheduling reclaim on return-to-userland for remote
memcgs would skip the high reclaim altogether. So, punt the high reclaim
of remote memcgs to high_work.
Signed-off-by: Shakeel Butt <shakeelb@google.com>
---
mm/memcontrol.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e9db1160ccbc..47439c84667a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2302,19 +2302,23 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
* reclaim on returning to userland. We can perform reclaim here
* if __GFP_RECLAIM but let's always punt for simplicity and so that
* GFP_KERNEL can consistently be used during reclaim. @memcg is
- * not recorded as it most likely matches current's and won't
- * change in the meantime. As high limit is checked again before
- * reclaim, the cost of mismatch is negligible.
+ * not recorded as the return-to-userland high reclaim will only reclaim
+ * from current's memcg (or its ancestor). For other memcgs we punt them
+ * to work queue.
*/
do {
if (page_counter_read(&memcg->memory) > memcg->high) {
- /* Don't bother a random interrupted task */
- if (in_interrupt()) {
+ /*
+ * Don't bother a random interrupted task or if the
+ * memcg is not current's memcg's ancestor.
+ */
+ if (in_interrupt() ||
+ !mm_match_cgroup(current->mm, memcg)) {
schedule_work(&memcg->high_work);
- break;
+ } else {
+ current->memcg_nr_pages_over_high += batch;
+ set_notify_resume(current);
}
- current->memcg_nr_pages_over_high += batch;
- set_notify_resume(current);
break;
}
} while ((memcg = parent_mem_cgroup(memcg)));
--
2.20.1.415.g653613c723-goog
* Re: [PATCH] memcg: schedule high reclaim for remote memcgs on high_work
2019-01-03 1:56 [PATCH] memcg: schedule high reclaim for remote memcgs on high_work Shakeel Butt
@ 2019-01-08 14:59 ` Michal Hocko
2019-01-08 17:24 ` Shakeel Butt
0 siblings, 1 reply; 3+ messages in thread
From: Michal Hocko @ 2019-01-08 14:59 UTC (permalink / raw)
To: Shakeel Butt
Cc: Johannes Weiner, Vladimir Davydov, Andrew Morton, linux-mm,
cgroups, linux-kernel
On Wed 02-01-19 17:56:38, Shakeel Butt wrote:
> If a memcg is over its high limit, memory reclaim is scheduled to run on
> return-to-userland. However, it is assumed that the memcg is the current
> process's memcg. With remote memcg charging for kmem, or when swapping in
> a page charged to a remote memcg, the current process can trigger reclaim
> on a remote memcg. Scheduling reclaim on return-to-userland for remote
> memcgs would skip the high reclaim altogether. So, punt the high reclaim
> of remote memcgs to high_work.
Have you seen this happening in real-life workloads? And is this
offloading what we really want to do? I mean, it is clearly the current
task that has triggered the remote charge, so why should we offload that
work to the system? Is there any reason we cannot reclaim on the remote
memcg from the return-to-userland path?
> Signed-off-by: Shakeel Butt <shakeelb@google.com>
> ---
> mm/memcontrol.c | 20 ++++++++++++--------
> 1 file changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index e9db1160ccbc..47439c84667a 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2302,19 +2302,23 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
> * reclaim on returning to userland. We can perform reclaim here
> * if __GFP_RECLAIM but let's always punt for simplicity and so that
> * GFP_KERNEL can consistently be used during reclaim. @memcg is
> - * not recorded as it most likely matches current's and won't
> - * change in the meantime. As high limit is checked again before
> - * reclaim, the cost of mismatch is negligible.
> + * not recorded as the return-to-userland high reclaim will only reclaim
> + * from current's memcg (or its ancestor). For other memcgs we punt them
> + * to work queue.
> */
> do {
> if (page_counter_read(&memcg->memory) > memcg->high) {
> - /* Don't bother a random interrupted task */
> - if (in_interrupt()) {
> + /*
> + * Don't bother a random interrupted task or if the
> + * memcg is not current's memcg's ancestor.
> + */
> + if (in_interrupt() ||
> + !mm_match_cgroup(current->mm, memcg)) {
> schedule_work(&memcg->high_work);
> - break;
> + } else {
> + current->memcg_nr_pages_over_high += batch;
> + set_notify_resume(current);
> }
> - current->memcg_nr_pages_over_high += batch;
> - set_notify_resume(current);
> break;
> }
> } while ((memcg = parent_mem_cgroup(memcg)));
> --
> 2.20.1.415.g653613c723-goog
>
--
Michal Hocko
SUSE Labs
* Re: [PATCH] memcg: schedule high reclaim for remote memcgs on high_work
2019-01-08 14:59 ` Michal Hocko
@ 2019-01-08 17:24 ` Shakeel Butt
0 siblings, 0 replies; 3+ messages in thread
From: Shakeel Butt @ 2019-01-08 17:24 UTC (permalink / raw)
To: Michal Hocko
Cc: Johannes Weiner, Vladimir Davydov, Andrew Morton, Linux MM,
Cgroups, LKML
On Tue, Jan 8, 2019 at 6:59 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Wed 02-01-19 17:56:38, Shakeel Butt wrote:
> > If a memcg is over its high limit, memory reclaim is scheduled to run on
> > return-to-userland. However, it is assumed that the memcg is the current
> > process's memcg. With remote memcg charging for kmem, or when swapping in
> > a page charged to a remote memcg, the current process can trigger reclaim
> > on a remote memcg. Scheduling reclaim on return-to-userland for remote
> > memcgs would skip the high reclaim altogether. So, punt the high reclaim
> > of remote memcgs to high_work.
>
> Have you seen this happening in real life workloads?
No, just during code review.
> And is this offloading what we really want to do?
That's the question I am brainstorming nowadays. More generally, how
memcg OOM-kill should work in the remote memcg charging case.
> I mean it is clearly the current
> task that has triggered the remote charge so why should we offload that
> work to a system? Is there any reason we cannot reclaim on the remote
> memcg from the return-to-userland path?
>
The only reason I did this was that the code was much simpler, but I see
that current is already charging the given memcg and maybe even
reclaiming, so why not do the high reclaim as well. I will update the
patch.
thanks,
Shakeel