* [PATCH] mm: memcontrol: fix root_mem_cgroup charging
@ 2021-04-21  6:26 Muchun Song
  2021-04-21  7:34 ` Michal Hocko
  0 siblings, 1 reply; 17+ messages in thread
From: Muchun Song @ 2021-04-21  6:26 UTC
  To: guro, hannes, mhocko, akpm, shakeelb, vdavydov.dev
  Cc: linux-kernel, linux-mm, duanxiongchun, fam.zheng, Muchun Song

The following scenario can leave the page counters of the root_mem_cgroup
out of balance.

CPU0:                                   CPU1:

objcg = get_obj_cgroup_from_current()
obj_cgroup_charge_pages(objcg)
                                        memcg_reparent_objcgs()
                                            // reparent to root_mem_cgroup
                                            WRITE_ONCE(iter->memcg, parent)
    // memcg == root_mem_cgroup
    memcg = get_mem_cgroup_from_objcg(objcg)
    // do not charge to the root_mem_cgroup
    try_charge(memcg)

obj_cgroup_uncharge_pages(objcg)
    memcg = get_mem_cgroup_from_objcg(objcg)
    // uncharge from the root_mem_cgroup
    page_counter_uncharge(&memcg->memory)

This can leave the root page counter lower than its actual value.
We do not display the value (mem_cgroup_usage), so there shouldn't
be any visible problem, but page_counter_cancel() has a WARN_ON_ONCE()
that could trigger, so it is better to fix it.
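To make the imbalance concrete, here is a deliberately simplified userspace model of the interleaving above. It is not kernel code: `struct model_memcg`, `struct model_objcg` and the `model_*` helpers are hypothetical stand-ins for a memcg's page counter, the pre-patch try_charge() and the objcg uncharge path. The root is assumed to carry no other usage, so the drift shows up directly as a negative count, the kind of underflow the WARN_ON_ONCE in page_counter_cancel() is there to catch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model, not kernel code. */
struct model_memcg {
	bool is_root;
	long usage;		/* stands in for memcg->memory's usage */
};

struct model_objcg {
	struct model_memcg *memcg;	/* retargeted on reparenting */
};

/* Pre-patch try_charge(): returns early for the root memcg. */
static int model_try_charge(struct model_memcg *memcg, long nr_pages)
{
	if (memcg->is_root)
		return 0;
	memcg->usage += nr_pages;
	return 0;
}

/* Uncharge path: always uncharges the memcg the objcg points at now. */
static void model_objcg_uncharge(struct model_objcg *objcg, long nr_pages)
{
	objcg->memcg->usage -= nr_pages;
}

/* Replays the CPU0/CPU1 interleaving; returns the root's final usage. */
static long model_race_root_usage(void)
{
	struct model_memcg root = { .is_root = true, .usage = 0 };
	struct model_memcg child = { .is_root = false, .usage = 0 };
	struct model_objcg objcg = { .memcg = &child };

	/* CPU1: memcg_reparent_objcgs() moves the objcg to the root. */
	objcg.memcg = &root;

	/* CPU0: the charge now resolves to the root and is skipped. */
	model_try_charge(objcg.memcg, 1);

	/* Later: the uncharge still hits the root's counter. */
	model_objcg_uncharge(&objcg, 1);

	return root.usage;	/* -1: one more uncharge than charge */
}
```

A single race only moves the count by one page; in a real system the root also holds everyone else's usage, so many such races would be needed before the counter actually went negative.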

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/memcontrol.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 1e68a9992b01..81b54bd9b9e0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2686,8 +2686,8 @@ void mem_cgroup_handle_over_high(void)
 	css_put(&memcg->css);
 }
 
-static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
-		      unsigned int nr_pages)
+static int __try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
+			unsigned int nr_pages)
 {
 	unsigned int batch = max(MEMCG_CHARGE_BATCH, nr_pages);
 	int nr_retries = MAX_RECLAIM_RETRIES;
@@ -2699,8 +2699,6 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	bool drained = false;
 	unsigned long pflags;
 
-	if (mem_cgroup_is_root(memcg))
-		return 0;
 retry:
 	if (consume_stock(memcg, nr_pages))
 		return 0;
@@ -2880,6 +2878,15 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	return 0;
 }
 
+static inline int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
+			     unsigned int nr_pages)
+{
+	if (mem_cgroup_is_root(memcg))
+		return 0;
+
+	return __try_charge(memcg, gfp_mask, nr_pages);
+}
+
 #if defined(CONFIG_MEMCG_KMEM) || defined(CONFIG_MMU)
 static void cancel_charge(struct mem_cgroup *memcg, unsigned int nr_pages)
 {
@@ -3125,7 +3132,7 @@ static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
 
 	memcg = get_mem_cgroup_from_objcg(objcg);
 
-	ret = try_charge(memcg, gfp, nr_pages);
+	ret = __try_charge(memcg, gfp, nr_pages);
 	if (ret)
 		goto out;
 
-- 
2.11.0
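The shape of the fix can be sketched in the same simplified, hypothetical model (not kernel code; all `model_*` names are illustrative). `model_try_charge()` keeps the root early return for ordinary callers, while the objcg charge path calls an unconditional variant standing in for the patch's __try_charge(), so a charge that races with reparenting is still recorded on the root and the later uncharge balances it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model, not kernel code. */
struct model_memcg {
	bool is_root;
	long usage;
};

struct model_objcg {
	struct model_memcg *memcg;
};

/* Like the patch's __try_charge(): charges any memcg, root included. */
static int model_try_charge_any(struct model_memcg *memcg, long nr_pages)
{
	memcg->usage += nr_pages;
	return 0;
}

/* Like the patch's try_charge() wrapper: still skips the root. */
static int model_try_charge(struct model_memcg *memcg, long nr_pages)
{
	if (memcg->is_root)
		return 0;
	return model_try_charge_any(memcg, nr_pages);
}

/* objcg charge path after the patch: uses the unconditional variant. */
static int model_objcg_charge(struct model_objcg *objcg, long nr_pages)
{
	return model_try_charge_any(objcg->memcg, nr_pages);
}

static void model_objcg_uncharge(struct model_objcg *objcg, long nr_pages)
{
	objcg->memcg->usage -= nr_pages;
}

/* Same interleaving as in the changelog, with the patched charge path. */
static long model_fixed_race_root_usage(void)
{
	struct model_memcg root = { .is_root = true, .usage = 0 };
	struct model_memcg child = { .is_root = false, .usage = 0 };
	struct model_objcg objcg = { .memcg = &child };

	objcg.memcg = &root;		/* reparented before the charge */
	model_objcg_charge(&objcg, 1);	/* now recorded on the root */
	model_objcg_uncharge(&objcg, 1);

	return root.usage;		/* 0: balanced */
}
```

This design keeps the root shortcut for every other try_charge() caller and only changes the objcg path, which is exactly the asymmetry questioned in the review that follows.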


* Re: [PATCH] mm: memcontrol: fix root_mem_cgroup charging
  2021-04-21  6:26 [PATCH] mm: memcontrol: fix root_mem_cgroup charging Muchun Song
@ 2021-04-21  7:34 ` Michal Hocko
  2021-04-21  9:50     ` Muchun Song
  0 siblings, 1 reply; 17+ messages in thread
From: Michal Hocko @ 2021-04-21  7:34 UTC
  To: Muchun Song
  Cc: guro, hannes, akpm, shakeelb, vdavydov.dev, linux-kernel,
	linux-mm, duanxiongchun, fam.zheng

On Wed 21-04-21 14:26:44, Muchun Song wrote:
> The following scenario can leave the page counters of the root_mem_cgroup
> out of balance.
> 
> CPU0:                                   CPU1:
> 
> objcg = get_obj_cgroup_from_current()
> obj_cgroup_charge_pages(objcg)
>                                         memcg_reparent_objcgs()
>                                             // reparent to root_mem_cgroup
>                                             WRITE_ONCE(iter->memcg, parent)
>     // memcg == root_mem_cgroup
>     memcg = get_mem_cgroup_from_objcg(objcg)
>     // do not charge to the root_mem_cgroup
>     try_charge(memcg)
> 
> obj_cgroup_uncharge_pages(objcg)
>     memcg = get_mem_cgroup_from_objcg(objcg)
>     // uncharge from the root_mem_cgroup
>     page_counter_uncharge(&memcg->memory)
> 
> This can leave the root page counter lower than its actual value.
> We do not display the value (mem_cgroup_usage), so there shouldn't
> be any visible problem, but page_counter_cancel() has a WARN_ON_ONCE()
> that could trigger, so it is better to fix it.

The changelog doesn't explain the fix and why you have chosen to charge
kmem objects to root memcg and left all other try_charge users intact.
The reason is likely that those are not reparented now but that just
adds an inconsistency.

Is there any reason you haven't simply matched obj_cgroup_uncharge_pages
to check for the root memcg and bail out early?

> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  mm/memcontrol.c | 17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 1e68a9992b01..81b54bd9b9e0 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2686,8 +2686,8 @@ void mem_cgroup_handle_over_high(void)
>  	css_put(&memcg->css);
>  }
>  
> -static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
> -		      unsigned int nr_pages)
> +static int __try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
> +			unsigned int nr_pages)
>  {
>  	unsigned int batch = max(MEMCG_CHARGE_BATCH, nr_pages);
>  	int nr_retries = MAX_RECLAIM_RETRIES;
> @@ -2699,8 +2699,6 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
>  	bool drained = false;
>  	unsigned long pflags;
>  
> -	if (mem_cgroup_is_root(memcg))
> -		return 0;
>  retry:
>  	if (consume_stock(memcg, nr_pages))
>  		return 0;
> @@ -2880,6 +2878,15 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
>  	return 0;
>  }
>  
> +static inline int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
> +			     unsigned int nr_pages)
> +{
> +	if (mem_cgroup_is_root(memcg))
> +		return 0;
> +
> +	return __try_charge(memcg, gfp_mask, nr_pages);
> +}
> +
>  #if defined(CONFIG_MEMCG_KMEM) || defined(CONFIG_MMU)
>  static void cancel_charge(struct mem_cgroup *memcg, unsigned int nr_pages)
>  {
> @@ -3125,7 +3132,7 @@ static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
>  
>  	memcg = get_mem_cgroup_from_objcg(objcg);
>  
> -	ret = try_charge(memcg, gfp, nr_pages);
> +	ret = __try_charge(memcg, gfp, nr_pages);
>  	if (ret)
>  		goto out;
>  
> -- 
> 2.11.0

-- 
Michal Hocko
SUSE Labs

* Re: [External] Re: [PATCH] mm: memcontrol: fix root_mem_cgroup charging
  2021-04-21  7:34 ` Michal Hocko
@ 2021-04-21  9:50     ` Muchun Song
  0 siblings, 0 replies; 17+ messages in thread
From: Muchun Song @ 2021-04-21  9:50 UTC
  To: Michal Hocko
  Cc: Roman Gushchin, Johannes Weiner, Andrew Morton, Shakeel Butt,
	Vladimir Davydov, LKML, Linux Memory Management List,
	Xiongchun duan, fam.zheng

On Wed, Apr 21, 2021 at 3:34 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Wed 21-04-21 14:26:44, Muchun Song wrote:
> > The following scenario can leave the page counters of the root_mem_cgroup
> > out of balance.
> >
> > CPU0:                                   CPU1:
> >
> > objcg = get_obj_cgroup_from_current()
> > obj_cgroup_charge_pages(objcg)
> >                                         memcg_reparent_objcgs()
> >                                             // reparent to root_mem_cgroup
> >                                             WRITE_ONCE(iter->memcg, parent)
> >     // memcg == root_mem_cgroup
> >     memcg = get_mem_cgroup_from_objcg(objcg)
> >     // do not charge to the root_mem_cgroup
> >     try_charge(memcg)
> >
> > obj_cgroup_uncharge_pages(objcg)
> >     memcg = get_mem_cgroup_from_objcg(objcg)
> >     // uncharge from the root_mem_cgroup
> >     page_counter_uncharge(&memcg->memory)
> >
> > This can leave the root page counter lower than its actual value.
> > We do not display the value (mem_cgroup_usage), so there shouldn't
> > be any visible problem, but page_counter_cancel() has a WARN_ON_ONCE()
> > that could trigger, so it is better to fix it.
>
> The changelog doesn't explain the fix and why you have chosen to charge
> kmem objects to root memcg and left all other try_charge users intact.

The object cgroup is special because its pages can be reparented.
Only the users of the objcg APIs need to be fixed.

> The reason is likely that those are not reparented now but that just
> adds an inconsistency.
>
> Is there any reason you haven't simply matched obj_cgroup_uncharge_pages
> to check for the root memcg and bail out early?

Because obj_cgroup_uncharge_pages() uncharges pages from the root
memcg unconditionally. Why? Because some pages can be reparented to
the root memcg, and to keep the root memcg's page counter correct we
have to uncharge those pages from it. So we do not check whether the
page belongs to the root memcg when uncharging. Given that, we have
to make sure the root memcg's page counter is also increased when the
page is charged. I think the diagram in the commit log illustrates
this problem well.

Thanks.

>
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> >  mm/memcontrol.c | 17 ++++++++++++-----
> >  1 file changed, 12 insertions(+), 5 deletions(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 1e68a9992b01..81b54bd9b9e0 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -2686,8 +2686,8 @@ void mem_cgroup_handle_over_high(void)
> >       css_put(&memcg->css);
> >  }
> >
> > -static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
> > -                   unsigned int nr_pages)
> > +static int __try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
> > +                     unsigned int nr_pages)
> >  {
> >       unsigned int batch = max(MEMCG_CHARGE_BATCH, nr_pages);
> >       int nr_retries = MAX_RECLAIM_RETRIES;
> > @@ -2699,8 +2699,6 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
> >       bool drained = false;
> >       unsigned long pflags;
> >
> > -     if (mem_cgroup_is_root(memcg))
> > -             return 0;
> >  retry:
> >       if (consume_stock(memcg, nr_pages))
> >               return 0;
> > @@ -2880,6 +2878,15 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
> >       return 0;
> >  }
> >
> > +static inline int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
> > +                          unsigned int nr_pages)
> > +{
> > +     if (mem_cgroup_is_root(memcg))
> > +             return 0;
> > +
> > +     return __try_charge(memcg, gfp_mask, nr_pages);
> > +}
> > +
> >  #if defined(CONFIG_MEMCG_KMEM) || defined(CONFIG_MMU)
> >  static void cancel_charge(struct mem_cgroup *memcg, unsigned int nr_pages)
> >  {
> > @@ -3125,7 +3132,7 @@ static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
> >
> >       memcg = get_mem_cgroup_from_objcg(objcg);
> >
> > -     ret = try_charge(memcg, gfp, nr_pages);
> > +     ret = __try_charge(memcg, gfp, nr_pages);
> >       if (ret)
> >               goto out;
> >
> > --
> > 2.11.0
>
> --
> Michal Hocko
> SUSE Labs

* Re: [External] Re: [PATCH] mm: memcontrol: fix root_mem_cgroup charging
  2021-04-21  9:50     ` Muchun Song
@ 2021-04-21 13:03     ` Michal Hocko
  2021-04-21 13:39         ` Muchun Song
  -1 siblings, 1 reply; 17+ messages in thread
From: Michal Hocko @ 2021-04-21 13:03 UTC
  To: Muchun Song
  Cc: Roman Gushchin, Johannes Weiner, Andrew Morton, Shakeel Butt,
	Vladimir Davydov, LKML, Linux Memory Management List,
	Xiongchun duan, fam.zheng

On Wed 21-04-21 17:50:06, Muchun Song wrote:
> On Wed, Apr 21, 2021 at 3:34 PM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Wed 21-04-21 14:26:44, Muchun Song wrote:
> > > The following scenario can leave the page counters of the root_mem_cgroup
> > > out of balance.
> > >
> > > CPU0:                                   CPU1:
> > >
> > > objcg = get_obj_cgroup_from_current()
> > > obj_cgroup_charge_pages(objcg)
> > >                                         memcg_reparent_objcgs()
> > >                                             // reparent to root_mem_cgroup
> > >                                             WRITE_ONCE(iter->memcg, parent)
> > >     // memcg == root_mem_cgroup
> > >     memcg = get_mem_cgroup_from_objcg(objcg)
> > >     // do not charge to the root_mem_cgroup
> > >     try_charge(memcg)
> > >
> > > obj_cgroup_uncharge_pages(objcg)
> > >     memcg = get_mem_cgroup_from_objcg(objcg)
> > >     // uncharge from the root_mem_cgroup
> > >     page_counter_uncharge(&memcg->memory)
> > >
> > > This can leave the root page counter lower than its actual value.
> > > We do not display the value (mem_cgroup_usage), so there shouldn't
> > > be any visible problem, but page_counter_cancel() has a WARN_ON_ONCE()
> > > that could trigger, so it is better to fix it.
> >
> > The changelog doesn't explain the fix and why you have chosen to charge
> > kmem objects to root memcg and left all other try_charge users intact.
> 
> The object cgroup is special because its pages can be reparented.
> Only the users of the objcg APIs need to be fixed.
> 
> > The reason is likely that those are not reparented now but that just
> > adds an inconsistency.
> >
> > Is there any reason you haven't simply matched obj_cgroup_uncharge_pages
> > to check for the root memcg and bail out early?
> 
> Because obj_cgroup_uncharge_pages() uncharges pages from the root
> memcg unconditionally. Why? Because some pages can be reparented to
> the root memcg, and to keep the root memcg's page counter correct we
> have to uncharge those pages from it. So we do not check whether the
> page belongs to the root memcg when uncharging.

I am not sure I follow. Let me ask differently. Wouldn't you
achieve the same if you simply didn't uncharge root memcg in
obj_cgroup_charge_pages?

Btw. which tree is this patch based on? The current linux-next doesn't
uncharge from memcg->memory inside obj_cgroup_uncharge_pages (nor does
the Linus tree).
-- 
Michal Hocko
SUSE Labs

* Re: [External] Re: [PATCH] mm: memcontrol: fix root_mem_cgroup charging
  2021-04-21 13:03     ` Michal Hocko
@ 2021-04-21 13:39         ` Muchun Song
  0 siblings, 0 replies; 17+ messages in thread
From: Muchun Song @ 2021-04-21 13:39 UTC
  To: Michal Hocko
  Cc: Roman Gushchin, Johannes Weiner, Andrew Morton, Shakeel Butt,
	Vladimir Davydov, LKML, Linux Memory Management List,
	Xiongchun duan, fam.zheng

On Wed, Apr 21, 2021 at 9:03 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Wed 21-04-21 17:50:06, Muchun Song wrote:
> > On Wed, Apr 21, 2021 at 3:34 PM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Wed 21-04-21 14:26:44, Muchun Song wrote:
> > > > The following scenario can leave the page counters of the root_mem_cgroup
> > > > out of balance.
> > > >
> > > > CPU0:                                   CPU1:
> > > >
> > > > objcg = get_obj_cgroup_from_current()
> > > > obj_cgroup_charge_pages(objcg)
> > > >                                         memcg_reparent_objcgs()
> > > >                                             // reparent to root_mem_cgroup
> > > >                                             WRITE_ONCE(iter->memcg, parent)
> > > >     // memcg == root_mem_cgroup
> > > >     memcg = get_mem_cgroup_from_objcg(objcg)
> > > >     // do not charge to the root_mem_cgroup
> > > >     try_charge(memcg)
> > > >
> > > > obj_cgroup_uncharge_pages(objcg)
> > > >     memcg = get_mem_cgroup_from_objcg(objcg)
> > > >     // uncharge from the root_mem_cgroup
> > > >     page_counter_uncharge(&memcg->memory)
> > > >
> > > > This can leave the root page counter lower than its actual value.
> > > > We do not display the value (mem_cgroup_usage), so there shouldn't
> > > > be any visible problem, but page_counter_cancel() has a WARN_ON_ONCE()
> > > > that could trigger, so it is better to fix it.
> > >
> > > The changelog doesn't explain the fix and why you have chosen to charge
> > > kmem objects to root memcg and left all other try_charge users intact.
> >
> > The object cgroup is special because its pages can be reparented.
> > Only the users of the objcg APIs need to be fixed.
> >
> > > The reason is likely that those are not reparented now but that just
> > > adds an inconsistency.
> > >
> > > Is there any reason you haven't simply matched obj_cgroup_uncharge_pages
> > > to check for the root memcg and bail out early?
> >
> > Because obj_cgroup_uncharge_pages() uncharges pages from the root
> > memcg unconditionally. Why? Because some pages can be reparented to
> > the root memcg, and to keep the root memcg's page counter correct we
> > have to uncharge those pages from it. So we do not check whether the
> > page belongs to the root memcg when uncharging.
>
> I am not sure I follow. Let me ask differently. Wouldn't you
> achieve the same if you simply didn't uncharge root memcg in
> obj_cgroup_charge_pages?

I'm afraid not. Some pages should be uncharged from the root memcg and
some should not, yet all of them now belong to the root memcg, so we
cannot distinguish between the two.
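Why neither uniform uncharge policy works can be illustrated with another hypothetical userspace model (not kernel code; all names are illustrative), this time with hierarchical counters, where a charge to a child is also added to the root. Page A is charged while the objcg still points at a child; the child is then removed and the objcg reparented; page B races and its charge resolves to the root. At uncharge time both pages resolve to the root. `SKIP_ROOT_UNCHARGE` stands in for the "bail out early for the root" suggestion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model, not kernel code. */
struct model_memcg {
	struct model_memcg *parent;	/* NULL for the root */
	long usage;
};

enum model_uncharge_policy {
	UNCHARGE_ALWAYS,	/* uncharge whatever memcg the objcg points at */
	SKIP_ROOT_UNCHARGE,	/* hypothetical: bail out early for the root */
};

static bool model_is_root(const struct model_memcg *memcg)
{
	return memcg->parent == NULL;
}

/* Hierarchical charge (positive) or uncharge (negative): walk to the root. */
static void model_counter_add(struct model_memcg *memcg, long nr_pages)
{
	for (; memcg; memcg = memcg->parent)
		memcg->usage += nr_pages;
}

static long model_root_usage_after(enum model_uncharge_policy policy,
				   bool charge_root_on_objcg_path)
{
	struct model_memcg root = { .parent = NULL, .usage = 0 };
	struct model_memcg child = { .parent = &root, .usage = 0 };
	struct model_memcg *objcg_memcg = &child;

	/* Page A: charged while the objcg still points at the child. */
	model_counter_add(objcg_memcg, 1);

	/* Child removed: the objcg is reparented to the root. */
	objcg_memcg = &root;

	/* Page B: racy charge resolves to the root; pre-patch it is skipped. */
	if (charge_root_on_objcg_path)
		model_counter_add(objcg_memcg, 1);

	/* Pages A and B are freed; both now resolve to the root. */
	for (int i = 0; i < 2; i++) {
		if (policy == SKIP_ROOT_UNCHARGE && model_is_root(objcg_memcg))
			continue;
		model_counter_add(objcg_memcg, -1);
	}

	return root.usage;	/* 0 means balanced */
}
```

With the pre-patch charge path, unconditional uncharging ends at -1 and skipping the root leaks page A's charge (ends at 1), because A and B look identical at uncharge time. Charging the root on the objcg path, as the patch does, ends at 0. The removed child's own counter is left stale in every case, which does not matter once it is gone.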

I believe Roman is very familiar with this mechanism (objcg APIs).

Hi Roman,

Any thoughts on this?

>
> Btw. which tree is this patch based on? The current linux-next doesn't
> uncharge from memcg->memory inside obj_cgroup_uncharge_pages (nor does
> the Linus tree).

Sorry, I should have given more details. The uncharging happens
deeper in the call chain:

obj_cgroup_uncharge_pages
  refill_stock->drain_stock
    page_counter_uncharge  // uncharging is here
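The call chain above can be sketched with a simplified, hypothetical model of the per-CPU stock; it is not the kernel's refill_stock()/drain_stock(), and `MODEL_CHARGE_BATCH` is an arbitrary illustrative value. Returned pages are first parked in a cache tied to one memcg and are only subtracted from that memcg's counter when the stock is drained (on a memcg switch, when the cache exceeds the batch, or on an explicit drain). Nothing on this path checks for the root, so pages returned on behalf of an objcg that now points at the root come off the root's counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model, not kernel code. */
#define MODEL_CHARGE_BATCH 32L	/* illustrative batch size */

struct model_memcg {
	long usage;
};

struct model_stock {
	struct model_memcg *cached;	/* memcg the cached pages belong to */
	long nr_pages;			/* pages parked in the cache */
};

/* Return cached pages to the owning counter (the page_counter_uncharge() step). */
static void model_drain_stock(struct model_stock *stock)
{
	if (stock->cached && stock->nr_pages)
		stock->cached->usage -= stock->nr_pages;
	stock->nr_pages = 0;
	stock->cached = NULL;
}

/* Park returned pages; drain on a memcg switch or when over the batch. */
static void model_refill_stock(struct model_stock *stock,
			       struct model_memcg *memcg, long nr_pages)
{
	if (stock->cached != memcg) {
		model_drain_stock(stock);
		stock->cached = memcg;
	}
	stock->nr_pages += nr_pages;
	if (stock->nr_pages > MODEL_CHARGE_BATCH)
		model_drain_stock(stock);
}
```

Because the uncharge is deferred, a missing charge on the root may not show up in the counter until some later drain, which is one reason such an imbalance is hard to attribute to the original race.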

Thanks.

> --
> Michal Hocko
> SUSE Labs

* Re: [External] Re: [PATCH] mm: memcontrol: fix root_mem_cgroup charging
  2021-04-21 13:39         ` Muchun Song
@ 2021-04-22  0:57         ` Roman Gushchin
  2021-04-22  3:47             ` Muchun Song
  2021-04-22  8:44           ` Michal Hocko
  -1 siblings, 2 replies; 17+ messages in thread
From: Roman Gushchin @ 2021-04-22  0:57 UTC
  To: Muchun Song
  Cc: Michal Hocko, Johannes Weiner, Andrew Morton, Shakeel Butt,
	Vladimir Davydov, LKML, Linux Memory Management List,
	Xiongchun duan, fam.zheng

On Wed, Apr 21, 2021 at 09:39:03PM +0800, Muchun Song wrote:
> On Wed, Apr 21, 2021 at 9:03 PM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Wed 21-04-21 17:50:06, Muchun Song wrote:
> > > On Wed, Apr 21, 2021 at 3:34 PM Michal Hocko <mhocko@suse.com> wrote:
> > > >
> > > > On Wed 21-04-21 14:26:44, Muchun Song wrote:
> > > > > The following scenario can leave the page counters of the root_mem_cgroup
> > > > > out of balance.
> > > > >
> > > > > CPU0:                                   CPU1:
> > > > >
> > > > > objcg = get_obj_cgroup_from_current()
> > > > > obj_cgroup_charge_pages(objcg)
> > > > >                                         memcg_reparent_objcgs()
> > > > >                                             // reparent to root_mem_cgroup
> > > > >                                             WRITE_ONCE(iter->memcg, parent)
> > > > >     // memcg == root_mem_cgroup
> > > > >     memcg = get_mem_cgroup_from_objcg(objcg)
> > > > >     // do not charge to the root_mem_cgroup
> > > > >     try_charge(memcg)
> > > > >
> > > > > obj_cgroup_uncharge_pages(objcg)
> > > > >     memcg = get_mem_cgroup_from_objcg(objcg)
> > > > >     // uncharge from the root_mem_cgroup
> > > > >     page_counter_uncharge(&memcg->memory)
> > > > >
> > > > > This can leave the root page counter lower than its actual value.
> > > > > We do not display the value (mem_cgroup_usage), so there shouldn't
> > > > > be any visible problem, but page_counter_cancel() has a WARN_ON_ONCE()
> > > > > that could trigger, so it is better to fix it.
> > > >
> > > > The changelog doesn't explain the fix and why you have chosen to charge
> > > > kmem objects to root memcg and left all other try_charge users intact.
> > >
> > > The object cgroup is special because its pages can be reparented.
> > > Only the users of the objcg APIs need to be fixed.
> > >
> > > > The reason is likely that those are not reparented now but that just
> > > > adds an inconsistency.
> > > >
> > > > Is there any reason you haven't simply matched obj_cgroup_uncharge_pages
> > > > to check for the root memcg and bail out early?
> > >
> > > Because obj_cgroup_uncharge_pages() uncharges pages from the root
> > > memcg unconditionally. Why? Because some pages can be reparented to
> > > the root memcg, and to keep the root memcg's page counter correct we
> > > have to uncharge those pages from it. So we do not check whether the
> > > page belongs to the root memcg when uncharging.
> >
> > I am not sure I follow. Let me ask differently. Wouldn't you
> > achieve the same if you simply didn't uncharge root memcg in
> > obj_cgroup_charge_pages?
> 
> I'm afraid not. Some pages should be uncharged from the root memcg and
> some should not, yet all of them now belong to the root memcg, so we
> cannot distinguish between the two.
> 
> I believe Roman is very familiar with this mechanism (objcg APIs).
> 
> Hi Roman,
> 
> Any thoughts on this?

First, unfortunately we do export the root's counter on cgroup v1:
/sys/fs/cgroup/memory/memory.kmem.usage_in_bytes
But we don't ignore these counters for the root mem cgroup, so there
are no bugs here (otherwise, please reproduce it). So it's all about
the potential warning in page_counter_cancel().

The patch looks technically correct to me. I'm not sure about the
__try_charge() naming, though: we never use the "__" prefix to mean
special handling of the root_mem_cgroup.

The commit message should be clearer and mention the following:
get_obj_cgroup_from_current() never returns a root_mem_cgroup's objcg,
so we never explicitly charge the root_mem_cgroup. And that is not
going to change.
It's all about a race where we get an obj_cgroup pointing at some non-root
memcg, but before we are able to charge it, the cgroup is gone, the objcg
is reparented to the root, and so we skip the charging. Then we store the
objcg pointer and later use it to uncharge the root_mem_cgroup.

But honestly I'm not sure the problem is worth the time spent on the fix
and the discussion. It's a small race: it's generally hard to trigger
a kernel allocation racing with a cgroup deletion, you'd need *a lot*
of such races, and even then maybe a single warning would be printed,
without *any* other consequences.

Thanks!

* Re: [External] Re: [PATCH] mm: memcontrol: fix root_mem_cgroup charging
  2021-04-22  0:57         ` Roman Gushchin
@ 2021-04-22  3:47             ` Muchun Song
  2021-04-22  8:44           ` Michal Hocko
  1 sibling, 0 replies; 17+ messages in thread
From: Muchun Song @ 2021-04-22  3:47 UTC
  To: Roman Gushchin
  Cc: Michal Hocko, Johannes Weiner, Andrew Morton, Shakeel Butt,
	Vladimir Davydov, LKML, Linux Memory Management List,
	Xiongchun duan, fam.zheng

On Thu, Apr 22, 2021 at 8:57 AM Roman Gushchin <guro@fb.com> wrote:
>
> On Wed, Apr 21, 2021 at 09:39:03PM +0800, Muchun Song wrote:
> > On Wed, Apr 21, 2021 at 9:03 PM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Wed 21-04-21 17:50:06, Muchun Song wrote:
> > > > On Wed, Apr 21, 2021 at 3:34 PM Michal Hocko <mhocko@suse.com> wrote:
> > > > >
> > > > > On Wed 21-04-21 14:26:44, Muchun Song wrote:
> > > > > > The below scenario can cause the page counters of the root_mem_cgroup
> > > > > > to be out of balance.
> > > > > >
> > > > > > CPU0:                                   CPU1:
> > > > > >
> > > > > > objcg = get_obj_cgroup_from_current()
> > > > > > obj_cgroup_charge_pages(objcg)
> > > > > >                                         memcg_reparent_objcgs()
> > > > > >                                             // reparent to root_mem_cgroup
> > > > > >                                             WRITE_ONCE(iter->memcg, parent)
> > > > > >     // memcg == root_mem_cgroup
> > > > > >     memcg = get_mem_cgroup_from_objcg(objcg)
> > > > > >     // do not charge to the root_mem_cgroup
> > > > > >     try_charge(memcg)
> > > > > >
> > > > > > obj_cgroup_uncharge_pages(objcg)
> > > > > >     memcg = get_mem_cgroup_from_objcg(objcg)
> > > > > >     // uncharge from the root_mem_cgroup
> > > > > >     page_counter_uncharge(&memcg->memory)
> > > > > >
> > > > > > This can cause the page counter to be less than the actual value,
> > > > > > Although we do not display the value (mem_cgroup_usage) so there
> > > > > > shouldn't be any actual problem, but there is a WARN_ON_ONCE in
> > > > > > the page_counter_cancel(). Who knows if it will trigger? So it
> > > > > > is better to fix it.
> > > > >
> > > > > The changelog doesn't explain the fix and why you have chosen to charge
> > > > > kmem objects to root memcg and left all other try_charge users intact.
> > > >
> > > > The object cgroup is special (because the page can reparent). Only the
> > > > user of objcg APIs should be fixed.
> > > >
> > > > > The reason is likely that those are not reparented now but that just
> > > > > adds an inconsistency.
> > > > >
> > > > > Is there any reason you haven't simply matched obj_cgroup_uncharge_pages
> > > > > to check for the root memcg and bail out early?
> > > >
> > > > Because obj_cgroup_uncharge_pages() uncharges pages from the
> > > > root memcg unconditionally. Why? Because some pages can be
> > > > reparented to root memcg, in order to ensure the correctness of
> > > > page counter of root memcg. We have to uncharge pages from
> > > > root memcg. So we do not check whether the page belongs to
> > > > the root memcg when it uncharges.
> > >
> > > I am not sure I follow. Let me ask differently. Wouldn't you
> > > achieve the same if you simply didn't uncharge root memcg in
> > > obj_cgroup_charge_pages?
> >
> > I'm afraid not. Some pages should uncharge root memcg, some
> > pages should not uncharge root memcg. But all those pages belong
> > to the root memcg. We cannot distinguish between the two.
> >
> > I believe Roman is very familiar with this mechanism (objcg APIs).
> >
> > Hi Roman,
> >
> > Any thoughts on this?
>
> First, unfortunately we do export the root's counter on cgroup v1:
> /sys/fs/cgroup/memory/memory.kmem.usage_in_bytes
> But we don't ignore these counters for the root mem cgroup, so there
> are no bugs here. (Otherwise, please, reproduce it). So it's all about
> the potential warning in page_counter_cancel().

Right.

>
> The patch looks technically correct to me. Not sure about __try_charge()
> naming, we never use "__" prefix to do something with the root_mem_cgroup.
>
> The commit message should be more clear and mention the following:
> get_obj_cgroup_from_current() never returns a root_mem_cgroup's objcg,
> so we never explicitly charge the root_mem_cgroup. And it's not
> going to change.
> It's all about a race when we got an obj_cgroup pointing at some non-root
> memcg, but before we were able to charge it, the cgroup was gone, objcg was
> reparented to the root and so we're skipping the charging. Then we store the
> objcg pointer and later use to uncharge the root_mem_cgroup.

Very clear. Thanks.

>
> But honestly I'm not sure the problem is worth the time spent on the fix
> and the discussion. It's a small race and it's generally hard to trigger
> a kernel allocation racing with a cgroup deletion and then you need *a lot*
> of such races and then maybe there will be a single warning printed without
> *any* other consequences.

I agree the race is very small. The fix is easy, but it may be a little
confusing to others, so I'd like to hear other people's opinions on
whether to fix it.

>
> Thanks!

* Re: [External] Re: [PATCH] mm: memcontrol: fix root_mem_cgroup charging
  2021-04-22  0:57         ` Roman Gushchin
  2021-04-22  3:47             ` Muchun Song
@ 2021-04-22  8:44           ` Michal Hocko
  2021-04-22 18:37             ` Roman Gushchin
  1 sibling, 1 reply; 17+ messages in thread
From: Michal Hocko @ 2021-04-22  8:44 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Muchun Song, Johannes Weiner, Andrew Morton, Shakeel Butt,
	Vladimir Davydov, LKML, Linux Memory Management List,
	Xiongchun duan, fam.zheng

On Wed 21-04-21 17:57:49, Roman Gushchin wrote:
> On Wed, Apr 21, 2021 at 09:39:03PM +0800, Muchun Song wrote:
> > On Wed, Apr 21, 2021 at 9:03 PM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Wed 21-04-21 17:50:06, Muchun Song wrote:
> > > > On Wed, Apr 21, 2021 at 3:34 PM Michal Hocko <mhocko@suse.com> wrote:
> > > > >
> > > > > On Wed 21-04-21 14:26:44, Muchun Song wrote:
> > > > > > The below scenario can cause the page counters of the root_mem_cgroup
> > > > > > to be out of balance.
> > > > > >
> > > > > > CPU0:                                   CPU1:
> > > > > >
> > > > > > objcg = get_obj_cgroup_from_current()
> > > > > > obj_cgroup_charge_pages(objcg)
> > > > > >                                         memcg_reparent_objcgs()
> > > > > >                                             // reparent to root_mem_cgroup
> > > > > >                                             WRITE_ONCE(iter->memcg, parent)
> > > > > >     // memcg == root_mem_cgroup
> > > > > >     memcg = get_mem_cgroup_from_objcg(objcg)
> > > > > >     // do not charge to the root_mem_cgroup
> > > > > >     try_charge(memcg)
> > > > > >
> > > > > > obj_cgroup_uncharge_pages(objcg)
> > > > > >     memcg = get_mem_cgroup_from_objcg(objcg)
> > > > > >     // uncharge from the root_mem_cgroup
> > > > > >     page_counter_uncharge(&memcg->memory)
> > > > > >
> > > > > > This can cause the page counter to be less than the actual value,
> > > > > > Although we do not display the value (mem_cgroup_usage) so there
> > > > > > shouldn't be any actual problem, but there is a WARN_ON_ONCE in
> > > > > > the page_counter_cancel(). Who knows if it will trigger? So it
> > > > > > is better to fix it.
> > > > >
> > > > > The changelog doesn't explain the fix and why you have chosen to charge
> > > > > kmem objects to root memcg and left all other try_charge users intact.
> > > >
> > > > The object cgroup is special (because the page can reparent). Only the
> > > > user of objcg APIs should be fixed.
> > > >
> > > > > The reason is likely that those are not reparented now but that just
> > > > > adds an inconsistency.
> > > > >
> > > > > Is there any reason you haven't simply matched obj_cgroup_uncharge_pages
> > > > > to check for the root memcg and bail out early?
> > > >
> > > > Because obj_cgroup_uncharge_pages() uncharges pages from the
> > > > root memcg unconditionally. Why? Because some pages can be
> > > > reparented to root memcg, in order to ensure the correctness of
> > > > page counter of root memcg. We have to uncharge pages from
> > > > root memcg. So we do not check whether the page belongs to
> > > > the root memcg when it uncharges.
> > >
> > > I am not sure I follow. Let me ask differently. Wouldn't you
> > > achieve the same if you simply didn't uncharge root memcg in
> > > obj_cgroup_charge_pages?
> > 
> > I'm afraid not. Some pages should uncharge root memcg, some
> > pages should not uncharge root memcg. But all those pages belong
> > to the root memcg. We cannot distinguish between the two.
> > 
> > I believe Roman is very familiar with this mechanism (objcg APIs).
> > 
> > Hi Roman,
> > 
> > Any thoughts on this?
> 
> First, unfortunately we do export the root's counter on cgroup v1:
> /sys/fs/cgroup/memory/memory.kmem.usage_in_bytes
> But we don't ignore these counters for the root mem cgroup, so there
> are no bugs here. (Otherwise, please, reproduce it). So it's all about
> the potential warning in page_counter_cancel().
> 
> The patch looks technically correct to me. Not sure about __try_charge()
> naming, we never use "__" prefix to do something with the root_mem_cgroup.
> 
> The commit message should be more clear and mention the following:
> get_obj_cgroup_from_current() never returns a root_mem_cgroup's objcg,
> so we never explicitly charge the root_mem_cgroup. And it's not
> going to change.
> It's all about a race when we got an obj_cgroup pointing at some non-root
> memcg, but before we were able to charge it, the cgroup was gone, objcg was
> reparented to the root and so we're skipping the charging. Then we store the
> objcg pointer and later use to uncharge the root_mem_cgroup.
> 
> But honestly I'm not sure the problem is worth the time spent on the fix
> and the discussion. It's a small race and it's generally hard to trigger
> a kernel allocation racing with a cgroup deletion and then you need *a lot*
> of such races and then maybe there will be a single warning printed without
> *any* other consequences.

Thanks for the clarification, Roman! As I've said, I am not an obj-cgroup
accounting insider, but from a clarity point of view it would make some
sense to opt out of accounting in the uncharge path to match the charging
path, rather than doing what the patch proposes: special-casing the
charging path and making it inconsistent with non-obj-cgroup tracking.
What do you think?

-- 
Michal Hocko
SUSE Labs

* Re: [External] Re: [PATCH] mm: memcontrol: fix root_mem_cgroup charging
  2021-04-22  8:44           ` Michal Hocko
@ 2021-04-22 18:37             ` Roman Gushchin
  0 siblings, 0 replies; 17+ messages in thread
From: Roman Gushchin @ 2021-04-22 18:37 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Muchun Song, Johannes Weiner, Andrew Morton, Shakeel Butt,
	Vladimir Davydov, LKML, Linux Memory Management List,
	Xiongchun duan, fam.zheng

On Thu, Apr 22, 2021 at 10:44:43AM +0200, Michal Hocko wrote:
> On Wed 21-04-21 17:57:49, Roman Gushchin wrote:
> > On Wed, Apr 21, 2021 at 09:39:03PM +0800, Muchun Song wrote:
> > > On Wed, Apr 21, 2021 at 9:03 PM Michal Hocko <mhocko@suse.com> wrote:
> > > >
> > > > On Wed 21-04-21 17:50:06, Muchun Song wrote:
> > > > > On Wed, Apr 21, 2021 at 3:34 PM Michal Hocko <mhocko@suse.com> wrote:
> > > > > >
> > > > > > On Wed 21-04-21 14:26:44, Muchun Song wrote:
> > > > > > > The below scenario can cause the page counters of the root_mem_cgroup
> > > > > > > to be out of balance.
> > > > > > >
> > > > > > > CPU0:                                   CPU1:
> > > > > > >
> > > > > > > objcg = get_obj_cgroup_from_current()
> > > > > > > obj_cgroup_charge_pages(objcg)
> > > > > > >                                         memcg_reparent_objcgs()
> > > > > > >                                             // reparent to root_mem_cgroup
> > > > > > >                                             WRITE_ONCE(iter->memcg, parent)
> > > > > > >     // memcg == root_mem_cgroup
> > > > > > >     memcg = get_mem_cgroup_from_objcg(objcg)
> > > > > > >     // do not charge to the root_mem_cgroup
> > > > > > >     try_charge(memcg)
> > > > > > >
> > > > > > > obj_cgroup_uncharge_pages(objcg)
> > > > > > >     memcg = get_mem_cgroup_from_objcg(objcg)
> > > > > > >     // uncharge from the root_mem_cgroup
> > > > > > >     page_counter_uncharge(&memcg->memory)
> > > > > > >
> > > > > > > This can cause the page counter to be less than the actual value,
> > > > > > > Although we do not display the value (mem_cgroup_usage) so there
> > > > > > > shouldn't be any actual problem, but there is a WARN_ON_ONCE in
> > > > > > > the page_counter_cancel(). Who knows if it will trigger? So it
> > > > > > > is better to fix it.
> > > > > >
> > > > > > The changelog doesn't explain the fix and why you have chosen to charge
> > > > > > kmem objects to root memcg and left all other try_charge users intact.
> > > > >
> > > > > The object cgroup is special (because the page can reparent). Only the
> > > > > user of objcg APIs should be fixed.
> > > > >
> > > > > > The reason is likely that those are not reparented now but that just
> > > > > > adds an inconsistency.
> > > > > >
> > > > > > Is there any reason you haven't simply matched obj_cgroup_uncharge_pages
> > > > > > to check for the root memcg and bail out early?
> > > > >
> > > > > Because obj_cgroup_uncharge_pages() uncharges pages from the
> > > > > root memcg unconditionally. Why? Because some pages can be
> > > > > reparented to root memcg, in order to ensure the correctness of
> > > > > page counter of root memcg. We have to uncharge pages from
> > > > > root memcg. So we do not check whether the page belongs to
> > > > > the root memcg when it uncharges.
> > > >
> > > > I am not sure I follow. Let me ask differently. Wouldn't you
> > > > achieve the same if you simply didn't uncharge root memcg in
> > > > obj_cgroup_charge_pages?
> > > 
> > > I'm afraid not. Some pages should uncharge root memcg, some
> > > pages should not uncharge root memcg. But all those pages belong
> > > to the root memcg. We cannot distinguish between the two.
> > > 
> > > I believe Roman is very familiar with this mechanism (objcg APIs).
> > > 
> > > Hi Roman,
> > > 
> > > Any thoughts on this?
> > 
> > First, unfortunately we do export the root's counter on cgroup v1:
> > /sys/fs/cgroup/memory/memory.kmem.usage_in_bytes
> > But we don't ignore these counters for the root mem cgroup, so there
> > are no bugs here. (Otherwise, please, reproduce it). So it's all about
> > the potential warning in page_counter_cancel().
> > 
> > The patch looks technically correct to me. Not sure about __try_charge()
> > naming, we never use "__" prefix to do something with the root_mem_cgroup.
> > 
> > The commit message should be more clear and mention the following:
> > get_obj_cgroup_from_current() never returns a root_mem_cgroup's objcg,
> > so we never explicitly charge the root_mem_cgroup. And it's not
> > going to change.
> > It's all about a race when we got an obj_cgroup pointing at some non-root
> > memcg, but before we were able to charge it, the cgroup was gone, objcg was
> > reparented to the root and so we're skipping the charging. Then we store the
> > objcg pointer and later use to uncharge the root_mem_cgroup.
> > 
> > But honestly I'm not sure the problem is worth the time spent on the fix
> > and the discussion. It's a small race and it's generally hard to trigger
> > a kernel allocation racing with a cgroup deletion and then you need *a lot*
> > of such races and then maybe there will be a single warning printed without
> > *any* other consequences.
> 
> Thanks for the clarification Roman! As I've said I am not a obj-cgroup
> accounting insider but it would make some sense to opt out from
> accounting in the uncharge path just from clarity point of view to match
> the charging path (rather than what the patch is proposing and special
> case the charging path and make it inconsistent with non obj-cgroup
> tracking. What do you think?

I don't see how it's possible to opt out just for these bytes, but what we
can do is stop propagating charges to the root mem cgroup in general: not
only objcg-related charges, but all of them. That would likely even have
some performance benefit.

The only downside is that we'd still need to propagate charges for the
dedicated cgroup v1 kmem and tcpmem counters, because those are exported
to userspace (for the root cgroup). That would make the page counter code
more complicated.

* Re: [External] Re: [PATCH] mm: memcontrol: fix root_mem_cgroup charging
  2021-04-22  3:47             ` Muchun Song
  (?)
@ 2021-04-22 18:53             ` Roman Gushchin
  2021-04-23  8:20                 ` Muchun Song
  -1 siblings, 1 reply; 17+ messages in thread
From: Roman Gushchin @ 2021-04-22 18:53 UTC (permalink / raw)
  To: Muchun Song
  Cc: Michal Hocko, Johannes Weiner, Andrew Morton, Shakeel Butt,
	Vladimir Davydov, LKML, Linux Memory Management List,
	Xiongchun duan, fam.zheng

On Thu, Apr 22, 2021 at 11:47:05AM +0800, Muchun Song wrote:
> On Thu, Apr 22, 2021 at 8:57 AM Roman Gushchin <guro@fb.com> wrote:
> >
> > On Wed, Apr 21, 2021 at 09:39:03PM +0800, Muchun Song wrote:
> > > On Wed, Apr 21, 2021 at 9:03 PM Michal Hocko <mhocko@suse.com> wrote:
> > > >
> > > > On Wed 21-04-21 17:50:06, Muchun Song wrote:
> > > > > On Wed, Apr 21, 2021 at 3:34 PM Michal Hocko <mhocko@suse.com> wrote:
> > > > > >
> > > > > > On Wed 21-04-21 14:26:44, Muchun Song wrote:
> > > > > > > The below scenario can cause the page counters of the root_mem_cgroup
> > > > > > > to be out of balance.
> > > > > > >
> > > > > > > CPU0:                                   CPU1:
> > > > > > >
> > > > > > > objcg = get_obj_cgroup_from_current()
> > > > > > > obj_cgroup_charge_pages(objcg)
> > > > > > >                                         memcg_reparent_objcgs()
> > > > > > >                                             // reparent to root_mem_cgroup
> > > > > > >                                             WRITE_ONCE(iter->memcg, parent)
> > > > > > >     // memcg == root_mem_cgroup
> > > > > > >     memcg = get_mem_cgroup_from_objcg(objcg)
> > > > > > >     // do not charge to the root_mem_cgroup
> > > > > > >     try_charge(memcg)
> > > > > > >
> > > > > > > obj_cgroup_uncharge_pages(objcg)
> > > > > > >     memcg = get_mem_cgroup_from_objcg(objcg)
> > > > > > >     // uncharge from the root_mem_cgroup
> > > > > > >     page_counter_uncharge(&memcg->memory)
> > > > > > >
> > > > > > > This can cause the page counter to be less than the actual value,
> > > > > > > Although we do not display the value (mem_cgroup_usage) so there
> > > > > > > shouldn't be any actual problem, but there is a WARN_ON_ONCE in
> > > > > > > the page_counter_cancel(). Who knows if it will trigger? So it
> > > > > > > is better to fix it.
> > > > > >
> > > > > > The changelog doesn't explain the fix and why you have chosen to charge
> > > > > > kmem objects to root memcg and left all other try_charge users intact.
> > > > >
> > > > > The object cgroup is special (because the page can reparent). Only the
> > > > > user of objcg APIs should be fixed.
> > > > >
> > > > > > The reason is likely that those are not reparented now but that just
> > > > > > adds an inconsistency.
> > > > > >
> > > > > > Is there any reason you haven't simply matched obj_cgroup_uncharge_pages
> > > > > > to check for the root memcg and bail out early?
> > > > >
> > > > > Because obj_cgroup_uncharge_pages() uncharges pages from the
> > > > > root memcg unconditionally. Why? Because some pages can be
> > > > > reparented to root memcg, in order to ensure the correctness of
> > > > > page counter of root memcg. We have to uncharge pages from
> > > > > root memcg. So we do not check whether the page belongs to
> > > > > the root memcg when it uncharges.
> > > >
> > > > I am not sure I follow. Let me ask differently. Wouldn't you
> > > > achieve the same if you simply didn't uncharge root memcg in
> > > > obj_cgroup_charge_pages?
> > >
> > > I'm afraid not. Some pages should uncharge root memcg, some
> > > pages should not uncharge root memcg. But all those pages belong
> > > to the root memcg. We cannot distinguish between the two.
> > >
> > > I believe Roman is very familiar with this mechanism (objcg APIs).
> > >
> > > Hi Roman,
> > >
> > > Any thoughts on this?
> >
> > First, unfortunately we do export the root's counter on cgroup v1:
> > /sys/fs/cgroup/memory/memory.kmem.usage_in_bytes
> > But we don't ignore these counters for the root mem cgroup, so there
> > are no bugs here. (Otherwise, please, reproduce it). So it's all about
> > the potential warning in page_counter_cancel().
> 
> Right.
> 
> >
> > The patch looks technically correct to me. Not sure about __try_charge()
> > naming, we never use "__" prefix to do something with the root_mem_cgroup.
> >
> > The commit message should be more clear and mention the following:
> > get_obj_cgroup_from_current() never returns a root_mem_cgroup's objcg,
> > so we never explicitly charge the root_mem_cgroup. And it's not
> > going to change.
> > It's all about a race when we got an obj_cgroup pointing at some non-root
> > memcg, but before we were able to charge it, the cgroup was gone, objcg was
> > reparented to the root and so we're skipping the charging. Then we store the
> > objcg pointer and later use to uncharge the root_mem_cgroup.
> 
> Very clear. Thanks.
> 
> >
> > But honestly I'm not sure the problem is worth the time spent on the fix
> > and the discussion. It's a small race and it's generally hard to trigger
> > a kernel allocation racing with a cgroup deletion and then you need *a lot*
> > of such races and then maybe there will be a single warning printed without
> > *any* other consequences.
> 
> I agree the race is very small. Since the fix is easy, but a little confusing
> to someone. I want to hear other people's suggestions on whether to fix it.

I'm not opposed to fixing this issue. But, __please__, make sure you
include all the necessary information in the commit log.

Thanks!

* Re: [External] Re: [PATCH] mm: memcontrol: fix root_mem_cgroup charging
  2021-04-22 18:53             ` Roman Gushchin
@ 2021-04-23  8:20                 ` Muchun Song
  0 siblings, 0 replies; 17+ messages in thread
From: Muchun Song @ 2021-04-23  8:20 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Michal Hocko, Johannes Weiner, Andrew Morton, Shakeel Butt,
	Vladimir Davydov, LKML, Linux Memory Management List,
	Xiongchun duan, fam.zheng

On Fri, Apr 23, 2021 at 2:53 AM Roman Gushchin <guro@fb.com> wrote:
>
> On Thu, Apr 22, 2021 at 11:47:05AM +0800, Muchun Song wrote:
> > On Thu, Apr 22, 2021 at 8:57 AM Roman Gushchin <guro@fb.com> wrote:
> > >
> > > On Wed, Apr 21, 2021 at 09:39:03PM +0800, Muchun Song wrote:
> > > > On Wed, Apr 21, 2021 at 9:03 PM Michal Hocko <mhocko@suse.com> wrote:
> > > > >
> > > > > On Wed 21-04-21 17:50:06, Muchun Song wrote:
> > > > > > On Wed, Apr 21, 2021 at 3:34 PM Michal Hocko <mhocko@suse.com> wrote:
> > > > > > >
> > > > > > > On Wed 21-04-21 14:26:44, Muchun Song wrote:
> > > > > > > > The below scenario can cause the page counters of the root_mem_cgroup
> > > > > > > > to be out of balance.
> > > > > > > >
> > > > > > > > CPU0:                                   CPU1:
> > > > > > > >
> > > > > > > > objcg = get_obj_cgroup_from_current()
> > > > > > > > obj_cgroup_charge_pages(objcg)
> > > > > > > >                                         memcg_reparent_objcgs()
> > > > > > > >                                             // reparent to root_mem_cgroup
> > > > > > > >                                             WRITE_ONCE(iter->memcg, parent)
> > > > > > > >     // memcg == root_mem_cgroup
> > > > > > > >     memcg = get_mem_cgroup_from_objcg(objcg)
> > > > > > > >     // do not charge to the root_mem_cgroup
> > > > > > > >     try_charge(memcg)
> > > > > > > >
> > > > > > > > obj_cgroup_uncharge_pages(objcg)
> > > > > > > >     memcg = get_mem_cgroup_from_objcg(objcg)
> > > > > > > >     // uncharge from the root_mem_cgroup
> > > > > > > >     page_counter_uncharge(&memcg->memory)
> > > > > > > >
> > > > > > > > This can cause the page counter to be less than the actual value,
> > > > > > > > Although we do not display the value (mem_cgroup_usage) so there
> > > > > > > > shouldn't be any actual problem, but there is a WARN_ON_ONCE in
> > > > > > > > the page_counter_cancel(). Who knows if it will trigger? So it
> > > > > > > > is better to fix it.
> > > > > > >
> > > > > > > The changelog doesn't explain the fix and why you have chosen to charge
> > > > > > > kmem objects to root memcg and left all other try_charge users intact.
> > > > > >
> > > > > > The object cgroup is special (because the page can reparent). Only the
> > > > > > user of objcg APIs should be fixed.
> > > > > >
> > > > > > > The reason is likely that those are not reparented now but that just
> > > > > > > adds an inconsistency.
> > > > > > >
> > > > > > > Is there any reason you haven't simply matched obj_cgroup_uncharge_pages
> > > > > > > to check for the root memcg and bail out early?
> > > > > >
> > > > > > Because obj_cgroup_uncharge_pages() uncharges pages from the
> > > > > > root memcg unconditionally. Why? Because some pages can be
> > > > > > reparented to root memcg, in order to ensure the correctness of
> > > > > > page counter of root memcg. We have to uncharge pages from
> > > > > > root memcg. So we do not check whether the page belongs to
> > > > > > the root memcg when it uncharges.
> > > > >
> > > > > I am not sure I follow. Let me ask differently. Wouldn't you
> > > > > achieve the same if you simply didn't uncharge root memcg in
> > > > > obj_cgroup_charge_pages?
> > > >
> > > > I'm afraid not. Some pages should uncharge root memcg, some
> > > > pages should not uncharge root memcg. But all those pages belong
> > > > to the root memcg. We cannot distinguish between the two.
> > > >
> > > > I believe Roman is very familiar with this mechanism (objcg APIs).
> > > >
> > > > Hi Roman,
> > > >
> > > > Any thoughts on this?
> > >
> > > First, unfortunately we do export the root's counter on cgroup v1:
> > > /sys/fs/cgroup/memory/memory.kmem.usage_in_bytes
> > > But we don't ignore these counters for the root mem cgroup, so there
> > > are no bugs here. (Otherwise, please, reproduce it). So it's all about
> > > the potential warning in page_counter_cancel().
> >
> > Right.
> >
> > >
> > > The patch looks technically correct to me. Not sure about __try_charge()
> > > naming, we never use "__" prefix to do something with the root_mem_cgroup.
> > >
> > > The commit message should be more clear and mention the following:
> > > get_obj_cgroup_from_current() never returns a root_mem_cgroup's objcg,
> > > so we never explicitly charge the root_mem_cgroup. And it's not
> > > going to change.
> > > It's all about a race when we got an obj_cgroup pointing at some non-root
> > > memcg, but before we were able to charge it, the cgroup was gone, objcg was
> > > reparented to the root and so we're skipping the charging. Then we store the
> > > objcg pointer and later use to uncharge the root_mem_cgroup.
> >
> > Very clear. Thanks.
> >
> > >
> > > But honestly I'm not sure the problem is worth the time spent on the fix
> > > and the discussion. It's a small race and it's generally hard to trigger
> > > a kernel allocation racing with a cgroup deletion and then you need *a lot*
> > > of such races and then maybe there will be a single warning printed without
> > > *any* other consequences.
> >
> > I agree the race is very small. The fix is easy, but it may be a little
> > confusing to some people, so I want to hear other people's suggestions on
> > whether to fix it.
>
> I'm not opposing the idea of fixing this issue. But, __please__, make sure you
> include all necessary information in the commit log.

Got it. Thanks Roman.

>
> Thanks!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [External] Re: [PATCH] mm: memcontrol: fix root_mem_cgroup charging
  2021-03-02 18:58 ` Roman Gushchin
@ 2021-03-03  3:12     ` Muchun Song
  0 siblings, 0 replies; 17+ messages in thread
From: Muchun Song @ 2021-03-03  3:12 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Johannes Weiner, Michal Hocko, Andrew Morton, Shakeel Butt, LKML,
	Linux Memory Management List

On Wed, Mar 3, 2021 at 2:58 AM Roman Gushchin <guro@fb.com> wrote:
>
> On Tue, Mar 02, 2021 at 04:18:23PM +0800, Muchun Song wrote:
> > CPU0:                                   CPU1:
> >
> > objcg = get_obj_cgroup_from_current();
> > obj_cgroup_charge(objcg);
> >                                         memcg_reparent_objcgs();
> >                                             xchg(&objcg->memcg, root_mem_cgroup);
> >     // memcg == root_mem_cgroup
> >     memcg = obj_cgroup_memcg(objcg);
> >     __memcg_kmem_charge(memcg);
> >         // Do not charge to the root memcg
> >         try_charge(memcg);
> >
> > If the objcg->memcg is reparented to the root_mem_cgroup,
> > obj_cgroup_charge() can pass root_mem_cgroup as the first
> > parameter here. The root_mem_cgroup is skipped in
> > try_charge(), so its page counters are not updated.
> >
> > When we uncharge this, we will decrease the page counters
> > (e.g. memory and memsw) of the root_mem_cgroup. This will
> > cause the page counters of the root_mem_cgroup to be out
> > of balance. Fix it by charging the page to the
> > root_mem_cgroup unconditionally.
>
> Is this a problem? It seems that we do not expose root memcg's counters
> except kmem and tcp.

In page_counter_cancel(), there is a WARN_ON_ONCE()
to catch this issue. Yes, it is very hard to trigger this warning for
the root memcg, but it can actually happen, right?

If we do not care about the root memcg counter, we should not warn
for the root memcg.

> It seems that the described problem is not
> applicable to the kmem counter. Please, explain.

The kmem counter of the root memcg is updated unconditionally,
because we do not check whether the memcg is root when we
charge pages to the kmem counter.
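Extending the same kind of model with the kmem counter illustrates this reply. In the sketch below (hypothetical names; cgroup v1 assumed, since the kmem counter is only charged when !cgroup_subsys_on_dfl(); charge batching ignored), the racing charge skips the root's memory counter but still charges its kmem counter, while the uncharge hits both, so only the memory counter ends up unbalanced:

```c
#include <assert.h>

/* Hypothetical root memcg with just the two counters of interest. */
struct root_memcg {
	long memory;	/* skipped by try_charge() for the root */
	long kmem;	/* cgroup v1 kmem counter, charged unconditionally */
};

/* Racing charge path: the objcg already points at the root. */
static void kmem_charge_model(struct root_memcg *r, long nr_pages)
{
	assert(nr_pages > 0);
	/* try_charge(root) returns early, so r->memory is untouched. */
	r->kmem += nr_pages;	/* no root check on this path */
}

/* Uncharge path: the root is not special-cased for either counter. */
static void kmem_uncharge_model(struct root_memcg *r, long nr_pages)
{
	r->memory -= nr_pages;
	r->kmem -= nr_pages;
}
```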

Thanks.

>
> Thanks!
>
> >
> > Fixes: bf4f059954dc ("mm: memcg/slab: obj_cgroup API")
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> >  mm/memcontrol.c | 13 +++++++++++++
> >  1 file changed, 13 insertions(+)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 2db2aeac8a9e..edf604824d63 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -3078,6 +3078,19 @@ static int __memcg_kmem_charge(struct mem_cgroup *memcg, gfp_t gfp,
> >       if (ret)
> >               return ret;
> >
> > +     /*
> > +      * If the objcg->memcg is reparented to the root_mem_cgroup,
> > +      * obj_cgroup_charge() can pass root_mem_cgroup as the first
> > +      * parameter here. We should charge the page to the
> > +      * root_mem_cgroup unconditionally to keep its page counters
> > +      * balanced.
> > +      */
> > +     if (unlikely(mem_cgroup_is_root(memcg))) {
> > +             page_counter_charge(&memcg->memory, nr_pages);
> > +             if (do_memsw_account())
> > +                     page_counter_charge(&memcg->memsw, nr_pages);
> > +     }
> > +
> >       if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) &&
> >           !page_counter_try_charge(&memcg->kmem, nr_pages, &counter)) {
> >
> > --
> > 2.11.0
> >
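The effect of this hunk can be checked against a similar model. The sketch assumes the memcg passed in is always the root, so the patch's unlikely(mem_cgroup_is_root(memcg)) branch is always taken; struct root_memcg, the helper names and the memsw_accounting flag (a stand-in for do_memsw_account()) are hypothetical. With the explicit charges in place, charging and then uncharging the root leaves both counters balanced:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical root memcg with the memory and memsw counters. */
struct root_memcg {
	long memory;
	long memsw;
};

static bool memsw_accounting = true;	/* stands in for do_memsw_account() */

/* try_charge() model for the root: returns early, charges nothing. */
static int try_charge_root(struct root_memcg *r, long nr_pages)
{
	(void)r;
	(void)nr_pages;
	return 0;
}

/* __memcg_kmem_charge() model including the patch's root branch. */
static int kmem_charge_fixed(struct root_memcg *r, long nr_pages)
{
	int ret;

	assert(nr_pages > 0);
	ret = try_charge_root(r, nr_pages);
	if (ret)
		return ret;
	/* The fix: charge the root's counters explicitly. */
	r->memory += nr_pages;
	if (memsw_accounting)
		r->memsw += nr_pages;
	return 0;
}

/* Uncharge path, which never special-cases the root. */
static void kmem_uncharge(struct root_memcg *r, long nr_pages)
{
	r->memory -= nr_pages;
	if (memsw_accounting)
		r->memsw -= nr_pages;
}
```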

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-04-23  8:21 UTC | newest]

Thread overview: 17+ messages
2021-04-21  6:26 [PATCH] mm: memcontrol: fix root_mem_cgroup charging Muchun Song
2021-04-21  7:34 ` Michal Hocko
2021-04-21  9:50   ` [External] " Muchun Song
2021-04-21  9:50     ` Muchun Song
2021-04-21 13:03     ` Michal Hocko
2021-04-21 13:39       ` Muchun Song
2021-04-21 13:39         ` Muchun Song
2021-04-22  0:57         ` Roman Gushchin
2021-04-22  3:47           ` Muchun Song
2021-04-22  3:47             ` Muchun Song
2021-04-22 18:53             ` Roman Gushchin
2021-04-23  8:20               ` Muchun Song
2021-04-23  8:20                 ` Muchun Song
2021-04-22  8:44           ` Michal Hocko
2021-04-22 18:37             ` Roman Gushchin
  -- strict thread matches above, loose matches on Subject: below --
2021-03-02  8:18 Muchun Song
2021-03-02 18:58 ` Roman Gushchin
2021-03-03  3:12   ` [External] " Muchun Song
2021-03-03  3:12     ` Muchun Song
