* [PATCH 1/4] mm: memcontrol: remove memcg check from memcg_oom_recover
@ 2021-02-12 17:01 ` Muchun Song
  0 siblings, 0 replies; 50+ messages in thread

From: Muchun Song @ 2021-02-12 17:01 UTC
To: hannes, mhocko, vdavydov.dev, akpm
Cc: cgroups, linux-mm, linux-kernel, Muchun Song

memcg_oom_recover() almost never does anything, yet its test wastes
cycles in some hot paths (e.g. kmem uncharge) because oom_disabled is
rarely used. The function is also very small, so it is better to make
it inline. In addition, the memcg parameter can never be NULL, so the
NULL check is redundant and can be removed.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/memcontrol.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8c035846c7a4..7afca9677693 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1925,7 +1925,7 @@ static int memcg_oom_wake_function(wait_queue_entry_t *wait,
 	return autoremove_wake_function(wait, mode, sync, arg);
 }
 
-static void memcg_oom_recover(struct mem_cgroup *memcg)
+static inline void memcg_oom_recover(struct mem_cgroup *memcg)
 {
 	/*
 	 * For the following lockless ->under_oom test, the only required
@@ -1935,7 +1935,7 @@ static void memcg_oom_recover(struct mem_cgroup *memcg)
 	 * achieved by invoking mem_cgroup_mark_under_oom() before
 	 * triggering notification.
 	 */
-	if (memcg && memcg->under_oom)
+	if (memcg->under_oom)
 		__wake_up(&memcg_oom_waitq, TASK_NORMAL, 0, memcg);
 }
-- 
2.11.0
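[Editorial note] The control flow of the patch above can be exercised outside the kernel. The following is a minimal userspace sketch, not the kernel implementation: struct mem_cgroup and memcg_oom_recover() are simplified stand-ins, and a `wakeups` counter replaces the __wake_up() call on memcg_oom_waitq.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the patched function. The real
 * struct mem_cgroup and __wake_up() are replaced with stand-ins. */

struct mem_cgroup {
    bool under_oom;
};

static int wakeups; /* stands in for the __wake_up() side effect */

/* After the patch: inline, and every caller guarantees memcg != NULL,
 * so only the lockless ->under_oom test remains on the hot path. */
static inline void memcg_oom_recover(struct mem_cgroup *memcg)
{
    if (memcg->under_oom)
        wakeups++; /* __wake_up(&memcg_oom_waitq, TASK_NORMAL, 0, memcg) */
}
```

The point of the patch is visible here: in the common case (under_oom false) the whole call reduces to one flag load, with no function-call overhead and no NULL test.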
* [PATCH 2/4] mm: memcontrol: add missing memcg_oom_recover() when uncharge slab page
@ 2021-02-12 17:01 ` Muchun Song

From: Muchun Song @ 2021-02-12 17:01 UTC
To: hannes, mhocko, vdavydov.dev, akpm
Cc: cgroups, linux-mm, linux-kernel, Muchun Song

When we uncharge a page, we wake up OOM victims if the memcg OOM
handling has been outsourced to userspace. uncharge_batch() does this
for normal and kmem pages but not for slab pages, which is likely an
omission. Add the missing memcg_oom_recover() call to
__memcg_kmem_uncharge(). Also, memory.oom_control is only applicable
to cgroup v1, so guard the memcg->under_oom test with
cgroup_subsys_on_dfl(memory_cgrp_subsys).

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/memcontrol.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7afca9677693..a3f26522765a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3126,8 +3126,10 @@ static int __memcg_kmem_charge(struct mem_cgroup *memcg, gfp_t gfp,
  */
 static void __memcg_kmem_uncharge(struct mem_cgroup *memcg, unsigned int nr_pages)
 {
-	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
+	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
 		page_counter_uncharge(&memcg->kmem, nr_pages);
+		memcg_oom_recover(memcg);
+	}
 
 	refill_stock(memcg, nr_pages);
 }
@@ -6806,11 +6808,15 @@ static void uncharge_batch(const struct uncharge_gather *ug)
 
 	if (!mem_cgroup_is_root(ug->memcg)) {
 		page_counter_uncharge(&ug->memcg->memory, ug->nr_pages);
-		if (do_memsw_account())
-			page_counter_uncharge(&ug->memcg->memsw, ug->nr_pages);
-		if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && ug->nr_kmem)
-			page_counter_uncharge(&ug->memcg->kmem, ug->nr_kmem);
-		memcg_oom_recover(ug->memcg);
+		if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
+			if (!cgroup_memory_noswap)
+				page_counter_uncharge(&ug->memcg->memsw,
+						      ug->nr_pages);
+			if (ug->nr_kmem)
+				page_counter_uncharge(&ug->memcg->kmem,
+						      ug->nr_kmem);
+			memcg_oom_recover(ug->memcg);
+		}
 	}
 
 	local_irq_save(flags);
-- 
2.11.0
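[Editorial note] The restructured uncharge_batch() branch can be sketched as a standalone model. This is a hypothetical simplification, not kernel code: cgroup_subsys_on_dfl() and cgroup_memory_noswap become plain booleans and the page counters become integers, so only the control flow introduced by the patch is shown.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the restructured uncharge_batch(): on the
 * default hierarchy (cgroup v2) the memsw/kmem uncharges and the
 * oom_control wakeup are skipped entirely; on v1 all of them run. */

static bool on_dfl;   /* cgroup_subsys_on_dfl(memory_cgrp_subsys) */
static bool noswap;   /* cgroup_memory_noswap */
static int memsw_uncharged, kmem_uncharged, oom_recovers;

static void uncharge_batch_model(unsigned int nr_pages, unsigned int nr_kmem)
{
    if (!on_dfl) {                      /* cgroup v1 only */
        if (!noswap)
            memsw_uncharged += nr_pages;
        if (nr_kmem)
            kmem_uncharged += nr_kmem;
        oom_recovers++;                 /* memcg_oom_recover(ug->memcg) */
    }
}
```

The model makes the behavioral change concrete: on v2 the function no longer pays for the under_oom wakeup at all, while on v1 the wakeup now also covers the kmem/slab counters.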
* Re: [PATCH 2/4] mm: memcontrol: add missing memcg_oom_recover() when uncharge slab page
@ 2021-02-15  9:37 ` Michal Hocko

From: Michal Hocko @ 2021-02-15 9:37 UTC
To: Muchun Song
Cc: hannes, vdavydov.dev, akpm, cgroups, linux-mm, linux-kernel

On Sat 13-02-21 01:01:57, Muchun Song wrote:
> When we uncharge a page, we wake up OOM victims if the memcg OOM
> handling has been outsourced to userspace. uncharge_batch() does this
> for normal and kmem pages but not for slab pages, which is likely an
> omission. Add the missing memcg_oom_recover() call to
> __memcg_kmem_uncharge(). Also, memory.oom_control is only applicable
> to cgroup v1, so guard the memcg->under_oom test with
> cgroup_subsys_on_dfl(memory_cgrp_subsys).

User visible effects please? I believe there are unlikely any. I do not
have any data at hand, but I would expect that freeing slab pages would
not really contribute much to helping a memcg out of OOM without an
external intervention in the oom_disabled case. If that is the case,
then make it explicit in the changelog. If you have workloads which do
see suboptimal behavior, then please mention that as well. This is
important for future readers to understand the code and the motivation
behind it.

Also, now I can see why you decided not to do the cgroup v2 check
directly in memcg_oom_recover: you do not want to repeat the check in
paths which already do it for you. That is fine. Apart from the
uncharging path, none of the others is really a hot path, so this is
likely a reasonable decision.

I have a minor comment below.

> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  mm/memcontrol.c | 18 ++++++++++++------
>  1 file changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 7afca9677693..a3f26522765a 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3126,8 +3126,10 @@ static int __memcg_kmem_charge(struct mem_cgroup *memcg, gfp_t gfp,
>   */
>  static void __memcg_kmem_uncharge(struct mem_cgroup *memcg, unsigned int nr_pages)
>  {
> -	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
> +	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
>  		page_counter_uncharge(&memcg->kmem, nr_pages);
> +		memcg_oom_recover(memcg);
> +	}
>
>  	refill_stock(memcg, nr_pages);
>  }
> @@ -6806,11 +6808,15 @@ static void uncharge_batch(const struct uncharge_gather *ug)
>
>  	if (!mem_cgroup_is_root(ug->memcg)) {
>  		page_counter_uncharge(&ug->memcg->memory, ug->nr_pages);
> -		if (do_memsw_account())
> -			page_counter_uncharge(&ug->memcg->memsw, ug->nr_pages);
> -		if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && ug->nr_kmem)
> -			page_counter_uncharge(&ug->memcg->kmem, ug->nr_kmem);
> -		memcg_oom_recover(ug->memcg);
> +		if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
> +			if (!cgroup_memory_noswap)
> +				page_counter_uncharge(&ug->memcg->memsw,
> +						      ug->nr_pages);

This is functionally equivalent, but I am not sure it makes further
maintainability easier. The do_memsw_account() check is used in many
other places; it is a general helper which you have split into its
current implementation here, and that makes any future changes more
tricky. Is this micro-optimization worth it?

> +			if (ug->nr_kmem)
> +				page_counter_uncharge(&ug->memcg->kmem,
> +						      ug->nr_kmem);
> +			memcg_oom_recover(ug->memcg);
> +		}
>  	}
>
>  	local_irq_save(flags);
> --
> 2.11.0

-- 
Michal Hocko
SUSE Labs
* [PATCH 3/4] mm: memcontrol: bail out early when id is zero
@ 2021-02-12 17:01 ` Muchun Song

From: Muchun Song @ 2021-02-12 17:01 UTC
To: hannes, mhocko, vdavydov.dev, akpm
Cc: cgroups, linux-mm, linux-kernel, Muchun Song

The memcg ID cannot be zero, but we can pass zero to mem_cgroup_from_id,
so idr_find() is pointless and wastes CPU cycles.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/memcontrol.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a3f26522765a..68ed4b297c13 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5173,6 +5173,9 @@ static inline void mem_cgroup_id_put(struct mem_cgroup *memcg)
 struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
 {
 	WARN_ON_ONCE(!rcu_read_lock_held());
+	/* The memcg ID cannot be zero. */
+	if (id == 0)
+		return NULL;
 	return idr_find(&mem_cgroup_idr, id);
 }
-- 
2.11.0
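[Editorial note] The early bail-out pattern in this patch can be modeled in userspace. The sketch below is hypothetical and simplified: the kernel's idr is replaced by a plain array, and a `lookups` counter shows that the id == 0 case now returns without touching the lookup structure at all.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of mem_cgroup_from_id() with the early bail-out:
 * slot 0 of the id space is never allocated, so id == 0 can return
 * NULL without invoking the lookup (idr_find() in the kernel, a plain
 * array here). */

#define MAX_ID 64
static void *idr_slots[MAX_ID]; /* stand-in for mem_cgroup_idr */
static int lookups;             /* counts the idr_find() stand-in */

static void *from_id_model(unsigned short id)
{
    if (id == 0)                /* the memcg ID cannot be zero */
        return NULL;
    lookups++;                  /* idr_find(&mem_cgroup_idr, id) */
    return id < MAX_ID ? idr_slots[id] : NULL;
}
```

The result is unchanged either way (looking up an unallocated id yields NULL); the patch only avoids the cost of the lookup call for the known-invalid id.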
* Re: [PATCH 3/4] mm: memcontrol: bail out early when id is zero
@ 2021-02-15  9:39 ` Michal Hocko

From: Michal Hocko @ 2021-02-15 9:39 UTC
To: Muchun Song
Cc: hannes, vdavydov.dev, akpm, cgroups, linux-mm, linux-kernel

On Sat 13-02-21 01:01:58, Muchun Song wrote:
> The memcg ID cannot be zero, but we can pass zero to mem_cgroup_from_id,
> so idr_find() is pointless and wastes CPU cycles.

Is this possible at all to happen? If not, why should we add a test for
_all_ invocations?

> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  mm/memcontrol.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index a3f26522765a..68ed4b297c13 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5173,6 +5173,9 @@ static inline void mem_cgroup_id_put(struct mem_cgroup *memcg)
>  struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
>  {
>  	WARN_ON_ONCE(!rcu_read_lock_held());
> +	/* The memcg ID cannot be zero. */
> +	if (id == 0)
> +		return NULL;
>  	return idr_find(&mem_cgroup_idr, id);
>  }
>
> --
> 2.11.0

-- 
Michal Hocko
SUSE Labs
* Re: [External] Re: [PATCH 3/4] mm: memcontrol: bail out early when id is zero
@ 2021-02-15 10:09 ` Muchun Song

From: Muchun Song @ 2021-02-15 10:09 UTC
To: Michal Hocko
Cc: Johannes Weiner, Vladimir Davydov, Andrew Morton, Cgroups,
	Linux Memory Management List, LKML

On Mon, Feb 15, 2021 at 5:39 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Sat 13-02-21 01:01:58, Muchun Song wrote:
> > The memcg ID cannot be zero, but we can pass zero to mem_cgroup_from_id,
> > so idr_find() is pointless and wastes CPU cycles.
>
> Is this possible at all to happen? If not why should we add a test for
> _all_ invocations?

Yeah, this can indeed happen. If we allocate a new swap cache page and
charge it via mem_cgroup_charge, the page will uncharge the swap
counter via mem_cgroup_uncharge_swap. When the swap entry is actually
freed, we will call mem_cgroup_uncharge_swap again, and in this routine
we can pass zero to mem_cgroup_from_id. Right?

> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> >  mm/memcontrol.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index a3f26522765a..68ed4b297c13 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -5173,6 +5173,9 @@ static inline void mem_cgroup_id_put(struct mem_cgroup *memcg)
> >  struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
> >  {
> >  	WARN_ON_ONCE(!rcu_read_lock_held());
> > +	/* The memcg ID cannot be zero. */
> > +	if (id == 0)
> > +		return NULL;
> >  	return idr_find(&mem_cgroup_idr, id);
> >  }
> >
> > --
> > 2.11.0
>
> --
> Michal Hocko
> SUSE Labs
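[Editorial note] The double-uncharge scenario described above can be sketched as a toy model. This is a hypothetical simplification, not the kernel's swap accounting: the per-entry swap cgroup record becomes a plain array, and the only point shown is that the record is read-and-cleared on the first uncharge, so a second uncharge of the same entry reads id 0 and would hand 0 to mem_cgroup_from_id().

```c
#include <assert.h>

/* Hypothetical sketch: a swap-entry -> memcg-id record that is
 * cleared when it is read, modeling why mem_cgroup_from_id() can
 * legitimately be called with id == 0. All names are stand-ins. */

#define NR_ENTRIES 16
static unsigned short swap_cgroup_ids[NR_ENTRIES];

/* Models the "read old id and clear the slot" step of a swap
 * uncharge; the caller would then pass the id to mem_cgroup_from_id(). */
static unsigned short uncharge_swap_model(unsigned int entry)
{
    unsigned short id = swap_cgroup_ids[entry];
    swap_cgroup_ids[entry] = 0;
    return id;
}
```

Under these assumptions, the first uncharge of an entry returns the recorded id and the second returns 0, which is exactly the case the early bail-out in patch 3/4 short-circuits.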
* Re: [External] Re: [PATCH 3/4] mm: memcontrol: bail out early when id is zero
@ 2021-02-15 10:27 ` Michal Hocko

From: Michal Hocko @ 2021-02-15 10:27 UTC
To: Muchun Song
Cc: Johannes Weiner, Vladimir Davydov, Andrew Morton, Cgroups,
	Linux Memory Management List, LKML

On Mon 15-02-21 18:09:44, Muchun Song wrote:
> On Mon, Feb 15, 2021 at 5:39 PM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Sat 13-02-21 01:01:58, Muchun Song wrote:
> > > The memcg ID cannot be zero, but we can pass zero to mem_cgroup_from_id,
> > > so idr_find() is pointless and wastes CPU cycles.
> >
> > Is this possible at all to happen? If not why should we add a test for
> > _all_ invocations?
>
> Yeah, this can indeed happen. If we allocate a new swap cache page and
> charge it via mem_cgroup_charge, the page will uncharge the swap
> counter via mem_cgroup_uncharge_swap. When the swap entry is actually
> freed, we will call mem_cgroup_uncharge_swap again, and in this routine
> we can pass zero to mem_cgroup_from_id. Right?

If the above claim is correct, which I would need to double check, then
it should have been part of the changelog! Please think of your poor
reviewers and the time they have to invest in the review.

I would also like to see your waste-of-CPU-cycles argument backed by
something. Are we talking about cycles due to an additional function
call? Is this really something we should even care about?

> > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > ---
> > >  mm/memcontrol.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > >
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index a3f26522765a..68ed4b297c13 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -5173,6 +5173,9 @@ static inline void mem_cgroup_id_put(struct mem_cgroup *memcg)
> > >  struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
> > >  {
> > >  	WARN_ON_ONCE(!rcu_read_lock_held());
> > > +	/* The memcg ID cannot be zero. */
> > > +	if (id == 0)
> > > +		return NULL;
> > >  	return idr_find(&mem_cgroup_idr, id);
> > >  }
> > >
> > > --
> > > 2.11.0

-- 
Michal Hocko
SUSE Labs
* Re: [External] Re: [PATCH 3/4] mm: memcontrol: bail out early when id is zero
@ 2021-02-15 11:34 ` Muchun Song

From: Muchun Song @ 2021-02-15 11:34 UTC
To: Michal Hocko
Cc: Johannes Weiner, Vladimir Davydov, Andrew Morton, Cgroups,
	Linux Memory Management List, LKML

On Mon, Feb 15, 2021 at 6:27 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 15-02-21 18:09:44, Muchun Song wrote:
> > On Mon, Feb 15, 2021 at 5:39 PM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Sat 13-02-21 01:01:58, Muchun Song wrote:
> > > > The memcg ID cannot be zero, but we can pass zero to mem_cgroup_from_id,
> > > > so idr_find() is pointless and wastes CPU cycles.
> > >
> > > Is this possible at all to happen? If not why should we add a test for
> > > _all_ invocations?
> >
> > Yeah, this can indeed happen. If we allocate a new swap cache page and
> > charge it via mem_cgroup_charge, the page will uncharge the swap
> > counter via mem_cgroup_uncharge_swap. When the swap entry is actually
> > freed, we will call mem_cgroup_uncharge_swap again, and in this routine
> > we can pass zero to mem_cgroup_from_id. Right?
>
> If the above claim is correct, which I would need to double check, then
> it should have been part of the changelog! Please think of your poor
> reviewers and the time they have to invest in the review.

An easy way to verify it may be to add a printk to mem_cgroup_from_id
for the case where the parameter is zero.

> I would also like to see your waste-of-CPU-cycles argument backed by
> something. Are we talking about cycles due to an additional function

Yes. When the parameter is already zero, idr_find() must return NULL,
so the additional function call is unnecessary. I added a printk to
mem_cgroup_from_id and found that the parameter can indeed be zero
several times.

> call? Is this really something we should even care about?

Maybe not. Just my thoughts. Thanks.

> > > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > > ---
> > > >  mm/memcontrol.c | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > > index a3f26522765a..68ed4b297c13 100644
> > > > --- a/mm/memcontrol.c
> > > > +++ b/mm/memcontrol.c
> > > > @@ -5173,6 +5173,9 @@ static inline void mem_cgroup_id_put(struct mem_cgroup *memcg)
> > > >  struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
> > > >  {
> > > >  	WARN_ON_ONCE(!rcu_read_lock_held());
> > > > +	/* The memcg ID cannot be zero. */
> > > > +	if (id == 0)
> > > > +		return NULL;
> > > >  	return idr_find(&mem_cgroup_idr, id);
> > > >  }
> > > >
> > > > --
> > > > 2.11.0
> > >
> > > --
> > > Michal Hocko
> > > SUSE Labs
>
> --
> Michal Hocko
> SUSE Labs
* [PATCH 4/4] mm: memcontrol: fix swap uncharge on cgroup v2 @ 2021-02-12 17:01 ` Muchun Song 0 siblings, 0 replies; 50+ messages in thread From: Muchun Song @ 2021-02-12 17:01 UTC (permalink / raw) To: hannes, mhocko, vdavydov.dev, akpm Cc: cgroups, linux-mm, linux-kernel, Muchun Song On cgroup v2, the swap counter charges the actual number of swap entries. If a swap cache page is charged successfully, we then uncharge the swap counter even though the swap entry has not been freed. That is wrong on cgroup v2. Fixes: 2d1c498072de ("mm: memcontrol: make swap tracking an integral part of memory control") Signed-off-by: Muchun Song <songmuchun@bytedance.com> --- mm/memcontrol.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index c737c8f05992..be6bc5044150 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -6753,7 +6753,7 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask) memcg_check_events(memcg, page); local_irq_enable(); - if (PageSwapCache(page)) { + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && PageSwapCache(page)) { swp_entry_t entry = { .val = page_private(page) }; /* * The swap entry might not get freed for a long time, -- 2.11.0 ^ permalink raw reply related [flat|nested] 50+ messages in thread
* Re: [PATCH 4/4] mm: memcontrol: fix swap uncharge on cgroup v2 2021-02-12 17:01 ` Muchun Song @ 2021-02-12 18:56 ` Shakeel Butt -1 siblings, 0 replies; 50+ messages in thread From: Shakeel Butt @ 2021-02-12 18:56 UTC (permalink / raw) To: Muchun Song, Huang Ying, Tim Chen Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton, Cgroups, Linux MM, LKML CCing more folks. On Fri, Feb 12, 2021 at 9:14 AM Muchun Song <songmuchun@bytedance.com> wrote: > > The swap charges the actual number of swap entries on cgroup v2. > If a swap cache page is charged successful, and then we uncharge > the swap counter. It is wrong on cgroup v2. Because the swap > entry is not freed. > > Fixes: 2d1c498072de ("mm: memcontrol: make swap tracking an integral part of memory control") > Signed-off-by: Muchun Song <songmuchun@bytedance.com> What's the user visible impact of this change? One impact I can see is that without this patch meminfo's (SwapTotal - SwapFree) is larger than the sum of top level memory.swap.current. This change will reduce that gap. BTW what about per-cpu slots_ret cache? Should we call mem_cgroup_uncharge_swap() before putting in the cache after this change? > --- > mm/memcontrol.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index c737c8f05992..be6bc5044150 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -6753,7 +6753,7 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask) > memcg_check_events(memcg, page); > local_irq_enable(); > > - if (PageSwapCache(page)) { > + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && PageSwapCache(page)) { > swp_entry_t entry = { .val = page_private(page) }; > /* > * The swap entry might not get freed for a long time, > -- > 2.11.0 > ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [External] Re: [PATCH 4/4] mm: memcontrol: fix swap uncharge on cgroup v2 2021-02-12 18:56 ` Shakeel Butt @ 2021-02-13 6:48 ` Muchun Song -1 siblings, 0 replies; 50+ messages in thread From: Muchun Song @ 2021-02-13 6:48 UTC (permalink / raw) To: Shakeel Butt Cc: Huang Ying, Tim Chen, Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton, Cgroups, Linux MM, LKML On Sat, Feb 13, 2021 at 2:57 AM Shakeel Butt <shakeelb@google.com> wrote: > > CCing more folks. > > On Fri, Feb 12, 2021 at 9:14 AM Muchun Song <songmuchun@bytedance.com> wrote: > > > > The swap charges the actual number of swap entries on cgroup v2. > > If a swap cache page is charged successful, and then we uncharge > > the swap counter. It is wrong on cgroup v2. Because the swap > > entry is not freed. > > > > Fixes: 2d1c498072de ("mm: memcontrol: make swap tracking an integral part of memory control") > > Signed-off-by: Muchun Song <songmuchun@bytedance.com> > > What's the user visible impact of this change? IIUC, I think that we cannot limit the swap to memory.swap.max on cgroup v2. cd /sys/fs/cgroup/ mkdir test cd test echo 8192 > memory.max echo 4096 > memory.swap.max OK. Now we limit swap to 1 page and memory to 2 pages. Firstly, we allocate 1 page from this memory cgroup and swap this page to swap disk. We can see: memory.current: 0 memory.swap.current: 1 Then we touch this page, we will swap in and charge the swap cache page to the memory counter and uncharge the swap counter. memory.current: 1 memory.swap.current: 0 (but actually we use a swap entry) Then we allocate another 1 page from this memory cgroup. memory.current: 2 memory.swap.current: 0 (but actually we use a swap entry) If we swap those 2 pages to swap disk. We can charge and swap those 2 pages successfully. Right? Maybe I am wrong. > > One impact I can see is that without this patch meminfo's (SwapTotal - > SwapFree) is larger than the sum of top level memory.swap.current. > This change will reduce that gap. 
> > BTW what about per-cpu slots_ret cache? Should we call > mem_cgroup_uncharge_swap() before putting in the cache after this > change? > > > --- > > mm/memcontrol.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > index c737c8f05992..be6bc5044150 100644 > > --- a/mm/memcontrol.c > > +++ b/mm/memcontrol.c > > @@ -6753,7 +6753,7 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask) > > memcg_check_events(memcg, page); > > local_irq_enable(); > > > > - if (PageSwapCache(page)) { > > + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && PageSwapCache(page)) { > > swp_entry_t entry = { .val = page_private(page) }; > > /* > > * The swap entry might not get freed for a long time, > > -- > > 2.11.0 > > ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [External] Re: [PATCH 4/4] mm: memcontrol: fix swap uncharge on cgroup v2 2021-02-13 6:48 ` Muchun Song (?) @ 2021-02-16 17:16 ` Shakeel Butt -1 siblings, 0 replies; 50+ messages in thread From: Shakeel Butt @ 2021-02-16 17:16 UTC (permalink / raw) To: Muchun Song Cc: Huang Ying, Tim Chen, Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton, Cgroups, Linux MM, LKML On Fri, Feb 12, 2021 at 10:48 PM Muchun Song <songmuchun@bytedance.com> wrote: > > On Sat, Feb 13, 2021 at 2:57 AM Shakeel Butt <shakeelb@google.com> wrote: > > > > CCing more folks. > > > > On Fri, Feb 12, 2021 at 9:14 AM Muchun Song <songmuchun@bytedance.com> wrote: > > > > > > The swap charges the actual number of swap entries on cgroup v2. > > > If a swap cache page is charged successful, and then we uncharge > > > the swap counter. It is wrong on cgroup v2. Because the swap > > > entry is not freed. > > > > > > Fixes: 2d1c498072de ("mm: memcontrol: make swap tracking an integral part of memory control") > > > Signed-off-by: Muchun Song <songmuchun@bytedance.com> > > > > What's the user visible impact of this change? > > IIUC, I think that we cannot limit the swap to memory.swap.max > on cgroup v2. > > cd /sys/fs/cgroup/ > mkdir test > cd test > echo 8192 > memory.max > echo 4096 > memory.swap.max > > OK. Now we limit swap to 1 page and memory to 2 pages. > Firstly, we allocate 1 page from this memory cgroup and > swap this page to swap disk. We can see: > > memory.current: 0 > memory.swap.current: 1 > > Then we touch this page, we will swap in and charge > the swap cache page to the memory counter and uncharge > the swap counter. > > memory.current: 1 > memory.swap.current: 0 (but actually we use a swap entry) > > Then we allocate another 1 page from this memory cgroup. > > memory.current: 2 > memory.swap.current: 0 (but actually we use a swap entry) > > If we swap those 2 pages to swap disk. We can charge and swap > those 2 pages successfully. Right? Maybe I am wrong. 
> I was trying to repro this but couldn't and later remembered that swap on zram skips the swapcache and thus is not impacted by this issue. This is reproducible with swap on disk, and I see Johannes has already described it in good detail. ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 4/4] mm: memcontrol: fix swap uncharge on cgroup v2 @ 2021-02-15 9:47 ` Michal Hocko 0 siblings, 0 replies; 50+ messages in thread From: Michal Hocko @ 2021-02-15 9:47 UTC (permalink / raw) To: Muchun Song; +Cc: hannes, vdavydov.dev, akpm, cgroups, linux-mm, linux-kernel On Sat 13-02-21 01:01:59, Muchun Song wrote: > The swap charges the actual number of swap entries on cgroup v2. > If a swap cache page is charged successful, and then we uncharge > the swap counter. It is wrong on cgroup v2. Because the swap > entry is not freed. Is there any actual problem though? Can you describe the specific scenario please? Swap cache charge life time is a bit tricky and I have to confess I have to relearn it every time I need to understand it. The patch would be much more easier to review if the changelog was much more specific. > Fixes: 2d1c498072de ("mm: memcontrol: make swap tracking an integral part of memory control") > Signed-off-by: Muchun Song <songmuchun@bytedance.com> > --- > mm/memcontrol.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index c737c8f05992..be6bc5044150 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -6753,7 +6753,7 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask) > memcg_check_events(memcg, page); > local_irq_enable(); > > - if (PageSwapCache(page)) { > + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && PageSwapCache(page)) { > swp_entry_t entry = { .val = page_private(page) }; > /* > * The swap entry might not get freed for a long time, > -- > 2.11.0 -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [External] Re: [PATCH 4/4] mm: memcontrol: fix swap uncharge on cgroup v2 2021-02-15 9:47 ` Michal Hocko (?) @ 2021-02-15 10:15 ` Muchun Song -1 siblings, 0 replies; 50+ messages in thread From: Muchun Song @ 2021-02-15 10:15 UTC (permalink / raw) To: Michal Hocko Cc: Johannes Weiner, Vladimir Davydov, Andrew Morton, Cgroups, Linux Memory Management List, LKML On Mon, Feb 15, 2021 at 5:47 PM Michal Hocko <mhocko@suse.com> wrote: > > On Sat 13-02-21 01:01:59, Muchun Song wrote: > > The swap charges the actual number of swap entries on cgroup v2. > > If a swap cache page is charged successful, and then we uncharge > > the swap counter. It is wrong on cgroup v2. Because the swap > > entry is not freed. > > Is there any actual problem though? Can you describe the specific > scenario please? Swap cache charge life time is a bit tricky and I have > to confess I have to relearn it every time I need to understand it. The > patch would be much more easier to review if the changelog was much more > specific. I copied the reply to Shakeel here. :-) IIUC, I think that we cannot limit the swap to memory.swap.max on cgroup v2. cd /sys/fs/cgroup/ mkdir test cd test echo 8192 > memory.max echo 4096 > memory.swap.max OK. Now we limit swap to 1 page and memory to 2 pages. Firstly, we allocate 1 page from this memory cgroup and swap this page to swap disk. We can see: memory.current: 0 memory.swap.current: 1 Then we touch this page, we will swap in and charge the swap cache page to the memory counter and uncharge the swap counter. memory.current: 1 memory.swap.current: 0 (but actually we use a swap entry) Then we allocate another 1 page from this memory cgroup. memory.current: 2 memory.swap.current: 0 (but actually we use a swap entry) If we swap those 2 pages to swap disk. We can charge and swap those 2 pages successfully. Right? Maybe I am wrong. 
> > > Fixes: 2d1c498072de ("mm: memcontrol: make swap tracking an integral part of memory control") > > Signed-off-by: Muchun Song <songmuchun@bytedance.com> > > --- > > mm/memcontrol.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > index c737c8f05992..be6bc5044150 100644 > > --- a/mm/memcontrol.c > > +++ b/mm/memcontrol.c > > @@ -6753,7 +6753,7 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask) > > memcg_check_events(memcg, page); > > local_irq_enable(); > > > > - if (PageSwapCache(page)) { > > + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && PageSwapCache(page)) { > > swp_entry_t entry = { .val = page_private(page) }; > > /* > > * The swap entry might not get freed for a long time, > > -- > > 2.11.0 > > -- > Michal Hocko > SUSE Labs ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [External] Re: [PATCH 4/4] mm: memcontrol: fix swap uncharge on cgroup v2 @ 2021-02-15 10:15 ` Muchun Song 0 siblings, 0 replies; 50+ messages in thread From: Muchun Song @ 2021-02-15 10:15 UTC (permalink / raw) To: Michal Hocko Cc: Johannes Weiner, Vladimir Davydov, Andrew Morton, Cgroups, Linux Memory Management List, LKML

On Mon, Feb 15, 2021 at 5:47 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Sat 13-02-21 01:01:59, Muchun Song wrote:
> > The swap charges the actual number of swap entries on cgroup v2.
> > If a swap cache page is charged successful, and then we uncharge
> > the swap counter. It is wrong on cgroup v2. Because the swap
> > entry is not freed.
>
> Is there any actual problem though? Can you describe the specific
> scenario please? Swap cache charge life time is a bit tricky and I have
> to confess I have to relearn it every time I need to understand it. The
> patch would be much more easier to review if the changelog was much more
> specific.

I copied the reply to Shakeel here. :-)

IIUC, I think that we cannot limit the swap to memory.swap.max on
cgroup v2.

  cd /sys/fs/cgroup/
  mkdir test
  cd test
  echo 8192 > memory.max
  echo 4096 > memory.swap.max

OK. Now we limit swap to 1 page and memory to 2 pages. Firstly, we
allocate 1 page from this memory cgroup and swap this page to swap
disk. We can see:

  memory.current: 0
  memory.swap.current: 1

Then we touch this page, we will swap in and charge the swap cache
page to the memory counter and uncharge the swap counter.

  memory.current: 1
  memory.swap.current: 0 (but actually we use a swap entry)

Then we allocate another 1 page from this memory cgroup.

  memory.current: 2
  memory.swap.current: 0 (but actually we use a swap entry)

If we swap those 2 pages to swap disk. We can charge and swap those 2
pages successfully. Right? Maybe I am wrong. 
> > > Fixes: 2d1c498072de ("mm: memcontrol: make swap tracking an integral part of memory control") > > Signed-off-by: Muchun Song <songmuchun@bytedance.com> > > --- > > mm/memcontrol.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > index c737c8f05992..be6bc5044150 100644 > > --- a/mm/memcontrol.c > > +++ b/mm/memcontrol.c > > @@ -6753,7 +6753,7 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask) > > memcg_check_events(memcg, page); > > local_irq_enable(); > > > > - if (PageSwapCache(page)) { > > + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && PageSwapCache(page)) { > > swp_entry_t entry = { .val = page_private(page) }; > > /* > > * The swap entry might not get freed for a long time, > > -- > > 2.11.0 > > -- > Michal Hocko > SUSE Labs ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [External] Re: [PATCH 4/4] mm: memcontrol: fix swap uncharge on cgroup v2 @ 2021-02-15 10:24 ` Michal Hocko 0 siblings, 0 replies; 50+ messages in thread From: Michal Hocko @ 2021-02-15 10:24 UTC (permalink / raw) To: Muchun Song Cc: Johannes Weiner, Vladimir Davydov, Andrew Morton, Cgroups, Linux Memory Management List, LKML On Mon 15-02-21 18:15:36, Muchun Song wrote: > On Mon, Feb 15, 2021 at 5:47 PM Michal Hocko <mhocko@suse.com> wrote: > > > > On Sat 13-02-21 01:01:59, Muchun Song wrote: > > > The swap charges the actual number of swap entries on cgroup v2. > > > If a swap cache page is charged successful, and then we uncharge > > > the swap counter. It is wrong on cgroup v2. Because the swap > > > entry is not freed. > > > > Is there any actual problem though? Can you describe the specific > > scenario please? Swap cache charge life time is a bit tricky and I have > > to confess I have to relearn it every time I need to understand it. The > > patch would be much more easier to review if the changelog was much more > > specific. > > I copied the reply to Shakeel here. :-) Not helpful much. I have seen that reply before answering your patch. I really have to say I find it rather annoying to constantly get unclear and cryptic responses from you. This is a subtle area! Make your claims clear and specific so they can be verified. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 4/4] mm: memcontrol: fix swap uncharge on cgroup v2 2021-02-12 17:01 ` Muchun Song ` (2 preceding siblings ...) (?) @ 2021-02-16 16:59 ` Johannes Weiner 2021-02-16 17:17 ` Shakeel Butt ` (3 more replies) -1 siblings, 4 replies; 50+ messages in thread From: Johannes Weiner @ 2021-02-16 16:59 UTC (permalink / raw) To: Muchun Song; +Cc: mhocko, vdavydov.dev, akpm, cgroups, linux-mm, linux-kernel Hello Muchun, On Sat, Feb 13, 2021 at 01:01:59AM +0800, Muchun Song wrote: > The swap charges the actual number of swap entries on cgroup v2. > If a swap cache page is charged successful, and then we uncharge > the swap counter. It is wrong on cgroup v2. Because the swap > entry is not freed. The patch makes sense to me. But this code is a bit tricky, we should add more documentation to how it works and what the problem is. How about this for the changelog? --- mm: memcontrol: fix swap undercounting for shared pages in cgroup2 When shared pages are swapped in partially, we can have some page tables referencing the in-memory page and some referencing the swap slot. Cgroup1 and cgroup2 handle these overlapping lifetimes slightly differently due to the nature of how they account memory and swap: Cgroup1 has a unified memory+swap counter that tracks a data page regardless whether it's in-core or swapped out. On swapin, we transfer the charge from the swap entry to the newly allocated swapcache page, even though the swap entry might stick around for a while. That's why we have a mem_cgroup_uncharge_swap() call inside mem_cgroup_charge(). Cgroup2 tracks memory and swap as separate, independent resources and thus has split memory and swap counters. On swapin, we charge the newly allocated swapcache page as memory, while the swap slot in turn must remain charged to the swap counter as long as its allocated too. 
The cgroup2 logic was broken by commit 2d1c498072de ("mm: memcontrol: make swap tracking an integral part of memory control"), because it accidentally removed the do_memsw_account() check in the branch inside mem_cgroup_uncharge() that was supposed to tell the difference between the charge transfer in cgroup1 and the separate counters in cgroup2. As a result, cgroup2 currently undercounts consumed swap when shared pages are partially swapped back in. This in turn allows a cgroup to consume more swap than its configured limit intends. Add the do_memsw_account() check back to fix this problem. --- > Fixes: 2d1c498072de ("mm: memcontrol: make swap tracking an integral part of memory control") > Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> > --- > mm/memcontrol.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index c737c8f05992..be6bc5044150 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -6753,7 +6753,7 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask) > memcg_check_events(memcg, page); > local_irq_enable(); > > - if (PageSwapCache(page)) { > + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && PageSwapCache(page)) { It's more descriptive to use do_memsw_account() here, IMO. We should also add a comment. How about this above the branch? /* * Cgroup1's unified memory+swap counter has been charged with the * new swapcache page, finish the transfer by uncharging the swap * slot. The swap slot would also get uncharged when it dies, but * for shared pages it can stick around indefinitely and we'd count * the page twice the entire time. * * Cgroup2 has separate resource counters for memory and swap, * so this is a non-issue here. Memory and swap charge lifetimes * correspond 1:1 to page and swap slot lifetimes: we charge the * page to memory here, and uncharge swap when the slot is freed. 
*/ ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 4/4] mm: memcontrol: fix swap uncharge on cgroup v2 2021-02-16 16:59 ` Johannes Weiner 2021-02-16 17:17 ` Shakeel Butt @ 2021-02-16 17:17 ` Shakeel Butt 2021-02-16 17:28 ` Johannes Weiner 2021-02-17 5:11 ` Muchun Song 3 siblings, 0 replies; 50+ messages in thread From: Shakeel Butt @ 2021-02-16 17:17 UTC (permalink / raw) To: Johannes Weiner Cc: Muchun Song, Michal Hocko, Vladimir Davydov, Andrew Morton, Cgroups, Linux MM, LKML On Tue, Feb 16, 2021 at 9:05 AM Johannes Weiner <hannes@cmpxchg.org> wrote: > > Hello Muchun, > > On Sat, Feb 13, 2021 at 01:01:59AM +0800, Muchun Song wrote: > > The swap charges the actual number of swap entries on cgroup v2. > > If a swap cache page is charged successful, and then we uncharge > > the swap counter. It is wrong on cgroup v2. Because the swap > > entry is not freed. > > The patch makes sense to me. But this code is a bit tricky, we should > add more documentation to how it works and what the problem is. > > How about this for the changelog? > > --- > mm: memcontrol: fix swap undercounting for shared pages in cgroup2 > > When shared pages are swapped in partially, we can have some page > tables referencing the in-memory page and some referencing the swap > slot. Cgroup1 and cgroup2 handle these overlapping lifetimes slightly > differently due to the nature of how they account memory and swap: > > Cgroup1 has a unified memory+swap counter that tracks a data page > regardless whether it's in-core or swapped out. On swapin, we transfer > the charge from the swap entry to the newly allocated swapcache page, > even though the swap entry might stick around for a while. That's why > we have a mem_cgroup_uncharge_swap() call inside mem_cgroup_charge(). > > Cgroup2 tracks memory and swap as separate, independent resources and > thus has split memory and swap counters. 
On swapin, we charge the > newly allocated swapcache page as memory, while the swap slot in turn > must remain charged to the swap counter as long as its allocated too. > > The cgroup2 logic was broken by commit 2d1c498072de ("mm: memcontrol: > make swap tracking an integral part of memory control"), because it > accidentally removed the do_memsw_account() check in the branch inside > mem_cgroup_uncharge() that was supposed to tell the difference between > the charge transfer in cgroup1 and the separate counters in cgroup2. > > As a result, cgroup2 currently undercounts consumed swap when shared > pages are partially swapped back in. This in turn allows a cgroup to > consume more swap than its configured limit intends. > > Add the do_memsw_account() check back to fix this problem. > --- > > > Fixes: 2d1c498072de ("mm: memcontrol: make swap tracking an integral part of memory control") > > Signed-off-by: Muchun Song <songmuchun@bytedance.com> > > Acked-by: Johannes Weiner <hannes@cmpxchg.org> With Johannes's suggestions: Reviewed-by: Shakeel Butt <shakeelb@google.com> > > > --- > > mm/memcontrol.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > index c737c8f05992..be6bc5044150 100644 > > --- a/mm/memcontrol.c > > +++ b/mm/memcontrol.c > > @@ -6753,7 +6753,7 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask) > > memcg_check_events(memcg, page); > > local_irq_enable(); > > > > - if (PageSwapCache(page)) { > > + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && PageSwapCache(page)) { > > It's more descriptive to use do_memsw_account() here, IMO. > > We should also add a comment. How about this above the branch? > > /* > * Cgroup1's unified memory+swap counter has been charged with the > * new swapcache page, finish the transfer by uncharging the swap > * slot. 
The swap slot would also get uncharged when it dies, but > * for shared pages it can stick around indefinitely and we'd count > * the page twice the entire time. > * > * Cgroup2 has separate resource counters for memory and swap, > * so this is a non-issue here. Memory and swap charge lifetimes > * correspond 1:1 to page and swap slot lifetimes: we charge the > * page to memory here, and uncharge swap when the slot is freed. > */ ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 4/4] mm: memcontrol: fix swap uncharge on cgroup v2 @ 2021-02-16 17:19 ` Michal Hocko 0 siblings, 0 replies; 50+ messages in thread From: Michal Hocko @ 2021-02-16 17:19 UTC (permalink / raw) To: Johannes Weiner Cc: Muchun Song, vdavydov.dev, akpm, cgroups, linux-mm, linux-kernel On Tue 16-02-21 11:59:00, Johannes Weiner wrote: > Hello Muchun, > > On Sat, Feb 13, 2021 at 01:01:59AM +0800, Muchun Song wrote: > > The swap charges the actual number of swap entries on cgroup v2. > > If a swap cache page is charged successful, and then we uncharge > > the swap counter. It is wrong on cgroup v2. Because the swap > > entry is not freed. > > The patch makes sense to me. But this code is a bit tricky, we should > add more documentation to how it works and what the problem is. > > How about this for the changelog? > > --- > mm: memcontrol: fix swap undercounting for shared pages in cgroup2 > > When shared pages are swapped in partially, we can have some page > tables referencing the in-memory page and some referencing the swap > slot. Cgroup1 and cgroup2 handle these overlapping lifetimes slightly > differently due to the nature of how they account memory and swap: > > Cgroup1 has a unified memory+swap counter that tracks a data page > regardless whether it's in-core or swapped out. On swapin, we transfer > the charge from the swap entry to the newly allocated swapcache page, > even though the swap entry might stick around for a while. That's why > we have a mem_cgroup_uncharge_swap() call inside mem_cgroup_charge(). > > Cgroup2 tracks memory and swap as separate, independent resources and > thus has split memory and swap counters. On swapin, we charge the > newly allocated swapcache page as memory, while the swap slot in turn > must remain charged to the swap counter as long as its allocated too. 
> > The cgroup2 logic was broken by commit 2d1c498072de ("mm: memcontrol: > make swap tracking an integral part of memory control"), because it > accidentally removed the do_memsw_account() check in the branch inside > mem_cgroup_uncharge() that was supposed to tell the difference between > the charge transfer in cgroup1 and the separate counters in cgroup2. > > As a result, cgroup2 currently undercounts consumed swap when shared > pages are partially swapped back in. This in turn allows a cgroup to > consume more swap than its configured limit intends. > > Add the do_memsw_account() check back to fix this problem. Yes this clarfies both the issue and the subtlety of the accounting. Thanks a lot Johannes! This is a great example of how changelogs should really look. > --- > > > Fixes: 2d1c498072de ("mm: memcontrol: make swap tracking an integral part of memory control") > > Signed-off-by: Muchun Song <songmuchun@bytedance.com> > > Acked-by: Johannes Weiner <hannes@cmpxchg.org> > > > --- > > mm/memcontrol.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > index c737c8f05992..be6bc5044150 100644 > > --- a/mm/memcontrol.c > > +++ b/mm/memcontrol.c > > @@ -6753,7 +6753,7 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask) > > memcg_check_events(memcg, page); > > local_irq_enable(); > > > > - if (PageSwapCache(page)) { > > + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && PageSwapCache(page)) { > > It's more descriptive to use do_memsw_account() here, IMO. > > We should also add a comment. How about this above the branch? > > /* > * Cgroup1's unified memory+swap counter has been charged with the > * new swapcache page, finish the transfer by uncharging the swap > * slot. The swap slot would also get uncharged when it dies, but > * for shared pages it can stick around indefinitely and we'd count > * the page twice the entire time. 
> * > * Cgroup2 has separate resource counters for memory and swap, > * so this is a non-issue here. Memory and swap charge lifetimes > * correspond 1:1 to page and swap slot lifetimes: we charge the > * page to memory here, and uncharge swap when the slot is freed. > */ Yes very helpful. With the changelog update and the comment Acked-by: Michal Hocko <mhocko@suse.com> Thanks! -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 4/4] mm: memcontrol: fix swap uncharge on cgroup v2 @ 2021-02-16 17:28 ` Johannes Weiner 0 siblings, 0 replies; 50+ messages in thread From: Johannes Weiner @ 2021-02-16 17:28 UTC (permalink / raw) To: Muchun Song; +Cc: mhocko, vdavydov.dev, akpm, cgroups, linux-mm, linux-kernel On Tue, Feb 16, 2021 at 11:59:01AM -0500, Johannes Weiner wrote: > Hello Muchun, > > On Sat, Feb 13, 2021 at 01:01:59AM +0800, Muchun Song wrote: > > The swap charges the actual number of swap entries on cgroup v2. > > If a swap cache page is charged successful, and then we uncharge > > the swap counter. It is wrong on cgroup v2. Because the swap > > entry is not freed. > > The patch makes sense to me. But this code is a bit tricky, we should > add more documentation to how it works and what the problem is. > > How about this for the changelog? > > --- > mm: memcontrol: fix swap undercounting for shared pages in cgroup2 Coming to think of it, this isn't just for shared pages. We also hold on to the swap slot as long as the page is read-only, not mlocked, and swap isn't full. So scratch "for shared pages" here. > When shared pages are swapped in partially, we can have some page > tables referencing the in-memory page and some referencing the swap > slot. Cgroup1 and cgroup2 handle these overlapping lifetimes slightly > differently due to the nature of how they account memory and swap: Correction: When pages are swapped in, the VM may retain the swap copy to avoid repeated writes in the future. It's also retained if shared pages are faulted back in some processes, but not in others. During that time we have an in-memory copy of the page, as well as an on-swap copy. Cgroup1 and cgroup2 handle these overlapping lifetimes slightly differently due to the nature of how they account memory and swap: > Cgroup1 has a unified memory+swap counter that tracks a data page > regardless whether it's in-core or swapped out. 
On swapin, we transfer > the charge from the swap entry to the newly allocated swapcache page, > even though the swap entry might stick around for a while. That's why > we have a mem_cgroup_uncharge_swap() call inside mem_cgroup_charge(). > > Cgroup2 tracks memory and swap as separate, independent resources and > thus has split memory and swap counters. On swapin, we charge the > newly allocated swapcache page as memory, while the swap slot in turn > must remain charged to the swap counter as long as it's allocated too. > > The cgroup2 logic was broken by commit 2d1c498072de ("mm: memcontrol: > make swap tracking an integral part of memory control"), because it > accidentally removed the do_memsw_account() check in the branch inside > mem_cgroup_uncharge() that was supposed to tell the difference between > the charge transfer in cgroup1 and the separate counters in cgroup2. > > As a result, cgroup2 currently undercounts consumed swap when shared > pages are partially swapped back in. This in turn allows a cgroup to > consume more swap than its configured limit intends. Correction: As a result, cgroup2 currently undercounts retained swap to varying degrees: swap slots are cached up to 50% of the configured limit or total available swap space; partially faulted back shared pages are only limited by physical capacity. This in turn allows cgroups to significantly overconsume their allotted swap space. > Add the do_memsw_account() check back to fix this problem. > --- > > > Fixes: 2d1c498072de ("mm: memcontrol: make swap tracking an integral part of memory control") > > Signed-off-by: Muchun Song <songmuchun@bytedance.com> > > Acked-by: Johannes Weiner <hannes@cmpxchg.org> I also think we should tag stable on this one. The potential accounting error is quite large and, even without concrete examples, is likely to cause problems for swap management in the real world. 
Cc: stable@vger.kernel.org # 5.8+ > > mm/memcontrol.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > index c737c8f05992..be6bc5044150 100644 > > --- a/mm/memcontrol.c > > +++ b/mm/memcontrol.c > > @@ -6753,7 +6753,7 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask) > > memcg_check_events(memcg, page); > > local_irq_enable(); > > > > - if (PageSwapCache(page)) { > > + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && PageSwapCache(page)) { > > It's more descriptive to use do_memsw_account() here, IMO. > > We should also add a comment. How about this above the branch? > > /* > * Cgroup1's unified memory+swap counter has been charged with the > * new swapcache page, finish the transfer by uncharging the swap > * slot. The swap slot would also get uncharged when it dies, but > * for shared pages it can stick around indefinitely and we'd count > * the page twice the entire time. -for shared pages > * Cgroup2 has separate resource counters for memory and swap, > * so this is a non-issue here. Memory and swap charge lifetimes > * correspond 1:1 to page and swap slot lifetimes: we charge the > * page to memory here, and uncharge swap when the slot is freed. > */ ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [External] Re: [PATCH 4/4] mm: memcontrol: fix swap uncharge on cgroup v2 2021-02-16 17:28 ` Johannes Weiner (?) @ 2021-02-17 5:15 ` Muchun Song -1 siblings, 0 replies; 50+ messages in thread From: Muchun Song @ 2021-02-17 5:15 UTC (permalink / raw) To: Johannes Weiner Cc: Michal Hocko, Vladimir Davydov, Andrew Morton, Cgroups, Linux Memory Management List, LKML On Wed, Feb 17, 2021 at 1:28 AM Johannes Weiner <hannes@cmpxchg.org> wrote: > > On Tue, Feb 16, 2021 at 11:59:01AM -0500, Johannes Weiner wrote: > > Hello Muchun, > > > > On Sat, Feb 13, 2021 at 01:01:59AM +0800, Muchun Song wrote: > > > The swap charges the actual number of swap entries on cgroup v2. > > > If a swap cache page is charged successful, and then we uncharge > > > the swap counter. It is wrong on cgroup v2. Because the swap > > > entry is not freed. > > > > The patch makes sense to me. But this code is a bit tricky, we should > > add more documentation to how it works and what the problem is. > > > > How about this for the changelog? > > > > --- > > mm: memcontrol: fix swap undercounting for shared pages in cgroup2 > > Coming to think of it, this isn't just for shared pages. We also hold > on to the swap slot as long as the page is read-only, not mlocked, and > swap isn't full. So scratch "for shared pages" here. OK. Will do. > > > When shared pages are swapped in partially, we can have some page > > tables referencing the in-memory page and some referencing the swap > > slot. Cgroup1 and cgroup2 handle these overlapping lifetimes slightly > > differently due to the nature of how they account memory and swap: > > Correction: > > When pages are swapped in, the VM may retain the swap copy to avoid > repeated writes in the future. It's also retained if shared pages are > faulted back in some processes, but not in others. During that time we > have an in-memory copy of the page, as well as an on-swap > copy. 
Cgroup1 and cgroup2 handle these overlapping lifetimes slightly > differently due to the nature of how they account memory and swap: Thanks a lot. > > > Cgroup1 has a unified memory+swap counter that tracks a data page > > regardless whether it's in-core or swapped out. On swapin, we transfer > > the charge from the swap entry to the newly allocated swapcache page, > > even though the swap entry might stick around for a while. That's why > > we have a mem_cgroup_uncharge_swap() call inside mem_cgroup_charge(). > > > > Cgroup2 tracks memory and swap as separate, independent resources and > > thus has split memory and swap counters. On swapin, we charge the > > newly allocated swapcache page as memory, while the swap slot in turn > > must remain charged to the swap counter as long as its allocated too. > > > > The cgroup2 logic was broken by commit 2d1c498072de ("mm: memcontrol: > > make swap tracking an integral part of memory control"), because it > > accidentally removed the do_memsw_account() check in the branch inside > > mem_cgroup_uncharge() that was supposed to tell the difference between > > the charge transfer in cgroup1 and the separate counters in cgroup2. > > > > As a result, cgroup2 currently undercounts consumed swap when shared > > pages are partially swapped back in. This in turn allows a cgroup to > > consume more swap than its configured limit intends. > > Correction: > > As a result, cgroup2 currently undercounts retained swap to varying > degrees: swap slots are cached up to 50% of the configured limit or > total available swap space; partially faulted back shared pages are > only limited by physical capacity. This in turn allows cgroups to > significantly overconsume their alloted swap space. Thanks a lot. > > > Add the do_memsw_account() check back to fix this problem. 
> > --- > > > > > Fixes: 2d1c498072de ("mm: memcontrol: make swap tracking an integral part of memory control") > > > Signed-off-by: Muchun Song <songmuchun@bytedance.com> > > > > Acked-by: Johannes Weiner <hannes@cmpxchg.org> > > I also think we should tag stable on this one. The potential > accounting error is quite large and, even without concrete examples, > is likely to cause problems for swap management in the real world. > > Cc: stable@vger.kernel.org # 5.8+ OK. I will add a Cc stable tag. Thanks. > > > > mm/memcontrol.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > > index c737c8f05992..be6bc5044150 100644 > > > --- a/mm/memcontrol.c > > > +++ b/mm/memcontrol.c > > > @@ -6753,7 +6753,7 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask) > > > memcg_check_events(memcg, page); > > > local_irq_enable(); > > > > > > - if (PageSwapCache(page)) { > > > + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && PageSwapCache(page)) { > > > > It's more descriptive to use do_memsw_account() here, IMO. > > > > We should also add a comment. How about this above the branch? > > > > /* > > * Cgroup1's unified memory+swap counter has been charged with the > > * new swapcache page, finish the transfer by uncharging the swap > > * slot. The swap slot would also get uncharged when it dies, but > > * for shared pages it can stick around indefinitely and we'd count > > * the page twice the entire time. > > -for shared pages > > > * Cgroup2 has separate resource counters for memory and swap, > > * so this is a non-issue here. Memory and swap charge lifetimes > > * correspond 1:1 to page and swap slot lifetimes: we charge the > > * page to memory here, and uncharge swap when the slot is freed. > > */ ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [External] Re: [PATCH 4/4] mm: memcontrol: fix swap uncharge on cgroup v2 2021-02-16 16:59 ` Johannes Weiner 2021-02-16 17:17 ` Shakeel Butt @ 2021-02-17 5:11 ` Muchun Song 2021-02-16 17:28 ` Johannes Weiner 2021-02-17 5:11 ` Muchun Song 3 siblings, 0 replies; 50+ messages in thread From: Muchun Song @ 2021-02-17 5:11 UTC (permalink / raw) To: Johannes Weiner Cc: Michal Hocko, Vladimir Davydov, Andrew Morton, Cgroups, Linux Memory Management List, LKML On Wed, Feb 17, 2021 at 12:59 AM Johannes Weiner <hannes@cmpxchg.org> wrote: > > Hello Muchun, > > On Sat, Feb 13, 2021 at 01:01:59AM +0800, Muchun Song wrote: > > The swap charges the actual number of swap entries on cgroup v2. > > If a swap cache page is charged successful, and then we uncharge > > the swap counter. It is wrong on cgroup v2. Because the swap > > entry is not freed. > > The patch makes sense to me. But this code is a bit tricky, we should > add more documentation to how it works and what the problem is. > > How about this for the changelog? > > --- > mm: memcontrol: fix swap undercounting for shared pages in cgroup2 > > When shared pages are swapped in partially, we can have some page > tables referencing the in-memory page and some referencing the swap > slot. Cgroup1 and cgroup2 handle these overlapping lifetimes slightly > differently due to the nature of how they account memory and swap: > > Cgroup1 has a unified memory+swap counter that tracks a data page > regardless whether it's in-core or swapped out. On swapin, we transfer > the charge from the swap entry to the newly allocated swapcache page, > even though the swap entry might stick around for a while. That's why > we have a mem_cgroup_uncharge_swap() call inside mem_cgroup_charge(). > > Cgroup2 tracks memory and swap as separate, independent resources and > thus has split memory and swap counters. 
On swapin, we charge the > newly allocated swapcache page as memory, while the swap slot in turn > must remain charged to the swap counter as long as its allocated too. > > The cgroup2 logic was broken by commit 2d1c498072de ("mm: memcontrol: > make swap tracking an integral part of memory control"), because it > accidentally removed the do_memsw_account() check in the branch inside > mem_cgroup_uncharge() that was supposed to tell the difference between > the charge transfer in cgroup1 and the separate counters in cgroup2. > > As a result, cgroup2 currently undercounts consumed swap when shared > pages are partially swapped back in. This in turn allows a cgroup to > consume more swap than its configured limit intends. > > Add the do_memsw_account() check back to fix this problem. Thanks. Sorry for my poor English. I should improve my changelog. > --- > > > Fixes: 2d1c498072de ("mm: memcontrol: make swap tracking an integral part of memory control") > > Signed-off-by: Muchun Song <songmuchun@bytedance.com> > > Acked-by: Johannes Weiner <hannes@cmpxchg.org> Thanks. > > > --- > > mm/memcontrol.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > index c737c8f05992..be6bc5044150 100644 > > --- a/mm/memcontrol.c > > +++ b/mm/memcontrol.c > > @@ -6753,7 +6753,7 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask) > > memcg_check_events(memcg, page); > > local_irq_enable(); > > > > - if (PageSwapCache(page)) { > > + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && PageSwapCache(page)) { > > It's more descriptive to use do_memsw_account() here, IMO. > > We should also add a comment. How about this above the branch? I will add a comment here. Thanks. > > /* > * Cgroup1's unified memory+swap counter has been charged with the > * new swapcache page, finish the transfer by uncharging the swap > * slot. 
The swap slot would also get uncharged when it dies, but > * for shared pages it can stick around indefinitely and we'd count > * the page twice the entire time. > * > * Cgroup2 has separate resource counters for memory and swap, > * so this is a non-issue here. Memory and swap charge lifetimes > * correspond 1:1 to page and swap slot lifetimes: we charge the > * page to memory here, and uncharge swap when the slot is freed. > */ ^ permalink raw reply [flat|nested] 50+ messages in thread
> > The cgroup2 logic was broken by commit 2d1c498072de ("mm: memcontrol: > make swap tracking an integral part of memory control"), because it > accidentally removed the do_memsw_account() check in the branch inside > mem_cgroup_uncharge() that was supposed to tell the difference between > the charge transfer in cgroup1 and the separate counters in cgroup2. > > As a result, cgroup2 currently undercounts consumed swap when shared > pages are partially swapped back in. This in turn allows a cgroup to > consume more swap than its configured limit intends. > > Add the do_memsw_account() check back to fix this problem. Thanks. Sorry for my poor English. I should improve my changelog. > --- > > > Fixes: 2d1c498072de ("mm: memcontrol: make swap tracking an integral part of memory control") > > Signed-off-by: Muchun Song <songmuchun@bytedance.com> > > Acked-by: Johannes Weiner <hannes@cmpxchg.org> Thanks. > > > --- > > mm/memcontrol.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > index c737c8f05992..be6bc5044150 100644 > > --- a/mm/memcontrol.c > > +++ b/mm/memcontrol.c > > @@ -6753,7 +6753,7 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask) > > memcg_check_events(memcg, page); > > local_irq_enable(); > > > > - if (PageSwapCache(page)) { > > + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && PageSwapCache(page)) { > > It's more descriptive to use do_memsw_account() here, IMO. > > We should also add a comment. How about this above the branch? I will add a comment here. Thanks. > > /* > * Cgroup1's unified memory+swap counter has been charged with the > * new swapcache page, finish the transfer by uncharging the swap > * slot. The swap slot would also get uncharged when it dies, but > * for shared pages it can stick around indefinitely and we'd count > * the page twice the entire time. 
> * > * Cgroup2 has separate resource counters for memory and swap, > * so this is a non-issue here. Memory and swap charge lifetimes > * correspond 1:1 to page and swap slot lifetimes: we charge the > * page to memory here, and uncharge swap when the slot is freed. > */ ^ permalink raw reply [flat|nested] 50+ messages in thread
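The cgroup1 charge transfer versus cgroup2's independent counters, as described above, can be illustrated with a small userspace model. This is an editorial sketch, not kernel code: the `struct counters` layout and the function names are invented for illustration, and the real accounting lives in `mm/memcontrol.c`.

```c
/*
 * Toy userspace model of the two accounting schemes described above.
 * Not kernel code: struct and function names are invented.
 */
struct counters { long memory, swap, memsw; };

/* cgroup1: swapin transfers the charge from the swap entry to the
 * page; the unified memsw counter sees one data page throughout. */
static void v1_swapin(struct counters *c)
{
	c->memory += 1;	/* charge the new swapcache page */
	c->swap -= 1;	/* transfer: drop the swap entry's charge */
	/* c->memsw unchanged: still one charge for the same data */
}

/* cgroup2, fixed behavior: memory and swap are independent, so the
 * slot stays charged to the swap counter until it is actually freed. */
static void v2_swapin(struct counters *c)
{
	c->memory += 1;
}

static void v2_swap_slot_free(struct counters *c)
{
	c->swap -= 1;	/* only now does swap usage drop */
}

/* cgroup2 with the bug being fixed here: swap was uncharged on swapin
 * even though a shared page's slot can stay allocated indefinitely. */
static void v2_swapin_buggy(struct counters *c)
{
	c->memory += 1;
	c->swap -= 1;	/* undercounts: the slot is still in use */
}
```

Starting from one swapped-out shared page (swap = 1), the buggy path drops swap to 0 while the slot is still allocated, which is how a cgroup could consume more swap than its configured limit intends.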
* Re: [PATCH 1/4] mm: memcontrol: remove memcg check from memcg_oom_recover @ 2021-02-15 9:24 ` Michal Hocko 0 siblings, 0 replies; 50+ messages in thread From: Michal Hocko @ 2021-02-15 9:24 UTC (permalink / raw) To: Muchun Song; +Cc: hannes, vdavydov.dev, akpm, cgroups, linux-mm, linux-kernel On Sat 13-02-21 01:01:56, Muchun Song wrote: > The memcg_oom_recover() almost never do anything but the test (because > oom_disabled is a rarely used) is just waste of cycles in some hot > paths (e.g. kmem uncharge). And it is very small, so it is better to > make it inline. Also, the parameter of memcg cannot be NULL, so removing > the check can reduce useless check. You probably wanted to make this patch follow the second one in the series. As there is no oom recover from the kmem uncharge path now. Also I believe that I've asked you to split the memcg check to its separate patch. Regarding the inlining, I would add it along with a static key check in memcg_oom_recover. > Signed-off-by: Muchun Song <songmuchun@bytedance.com> > --- > mm/memcontrol.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 8c035846c7a4..7afca9677693 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -1925,7 +1925,7 @@ static int memcg_oom_wake_function(wait_queue_entry_t *wait, > return autoremove_wake_function(wait, mode, sync, arg); > } > > -static void memcg_oom_recover(struct mem_cgroup *memcg) > +static inline void memcg_oom_recover(struct mem_cgroup *memcg) > { > /* > * For the following lockless ->under_oom test, the only required > @@ -1935,7 +1935,7 @@ static void memcg_oom_recover(struct mem_cgroup *memcg) > * achieved by invoking mem_cgroup_mark_under_oom() before > * triggering notification. > */ > - if (memcg && memcg->under_oom) > + if (memcg->under_oom) > __wake_up(&memcg_oom_waitq, TASK_NORMAL, 0, memcg); > } > > -- > 2.11.0 -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 50+ messages in thread
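Michal's inlining and static-key suggestion can be sketched in a userspace model. This is an editorial illustration only: the wait-queue machinery is reduced to a counter, and the static key is modeled with a plain flag, where the kernel would use `static_branch_unlikely()` on a `static_key`.

```c
/*
 * Userspace sketch of the discussed fast path. Not kernel code:
 * __wake_up(&memcg_oom_waitq, ...) is reduced to a counter, and the
 * static key is modeled with a plain int flag.
 */
struct mem_cgroup { int under_oom; };

static int oom_disabled_key;	/* set only if some memcg uses oom_disable */
static int wakeups;		/* stands in for __wake_up() on the waitq */

/* Inlined: hot callers such as the uncharge paths pay only a branch on
 * the rarely set key, plus a load of ->under_oom when it is set. The
 * old NULL check is gone because every caller passes a valid memcg. */
static inline void memcg_oom_recover(struct mem_cgroup *memcg)
{
	if (!oom_disabled_key)
		return;		/* common case: nothing to recover */
	if (memcg->under_oom)
		wakeups++;	/* wake waiters blocked on OOM */
}
```

With the key clear, the call compiles down to a test and an early return, which is the point of inlining it into hot paths; the wakeup only fires when oom_disable is in use and the memcg is actually under OOM.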
end of thread, other threads:[~2021-02-17 5:17 UTC | newest]

Thread overview: 50+ messages (duplicate cross-list entries collapsed):
2021-02-12 17:01 [PATCH 1/4] mm: memcontrol: remove memcg check from memcg_oom_recover Muchun Song
2021-02-12 17:01 ` [PATCH 2/4] mm: memcontrol: add missing memcg_oom_recover() when uncharge slab page Muchun Song
2021-02-15  9:37 ` Michal Hocko
2021-02-12 17:01 ` [PATCH 3/4] mm: memcontrol: bail out early when id is zero Muchun Song
2021-02-15  9:39 ` Michal Hocko
2021-02-15 10:09 ` [External] Muchun Song
2021-02-15 10:27 ` Michal Hocko
2021-02-15 11:34 ` Muchun Song
2021-02-12 17:01 ` [PATCH 4/4] mm: memcontrol: fix swap uncharge on cgroup v2 Muchun Song
2021-02-12 18:56 ` Shakeel Butt
2021-02-13  6:48 ` [External] Muchun Song
2021-02-16 17:16 ` Shakeel Butt
2021-02-15  9:47 ` Michal Hocko
2021-02-15 10:15 ` [External] Muchun Song
2021-02-15 10:24 ` Michal Hocko
2021-02-16 16:59 ` Johannes Weiner
2021-02-16 17:17 ` Shakeel Butt
2021-02-16 17:19 ` Michal Hocko
2021-02-16 17:28 ` Johannes Weiner
2021-02-17  5:15 ` [External] Muchun Song
2021-02-17  5:11 ` Muchun Song
2021-02-15  9:24 ` [PATCH 1/4] mm: memcontrol: remove memcg check from memcg_oom_recover Michal Hocko