linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs
       [not found] <CA+G9fYs==eMEmY_OpdhyCHO_1Z5f_M8CAQQTh-AOf5xAvBHKAQ@mail.gmail.com>
@ 2020-03-03 10:52 ` Tetsuo Handa
  2020-03-03 17:47   ` Yang Shi
  0 siblings, 1 reply; 19+ messages in thread
From: Tetsuo Handa @ 2020-03-03 10:52 UTC (permalink / raw)
  To: Naresh Kamboju, linux-mm; +Cc: Andrew Morton, Mel Gorman, Michal Hocko

Hello, Naresh.

> [   98.003346] WARNING: CPU: 2 PID: 340 at
> include/linux/sched/mm.h:323 alloc_page_buffers+0x210/0x288

This is

/**
 * memalloc_use_memcg - Starts the remote memcg charging scope.
 * @memcg: memcg to charge.
 *
 * This function marks the beginning of the remote memcg charging scope. All the
 * __GFP_ACCOUNT allocations till the end of the scope will be charged to the
 * given memcg.
 *
 * NOTE: This function is not nesting safe.
 */
static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
{
        WARN_ON_ONCE(current->active_memcg);
        current->active_memcg = memcg;
}

which is about memcg. Redirecting to linux-mm.

Please include exact backtrace extracted from scripts/faddr2line .



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs
  2020-03-03 10:52 ` fs/buffer.c: WARNING: alloc_page_buffers while mke2fs Tetsuo Handa
@ 2020-03-03 17:47   ` Yang Shi
  2020-03-03 18:14     ` Shakeel Butt
  2020-03-03 18:40     ` Naresh Kamboju
  0 siblings, 2 replies; 19+ messages in thread
From: Yang Shi @ 2020-03-03 17:47 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Naresh Kamboju, linux-mm, Andrew Morton, Mel Gorman,
	Michal Hocko, Shakeel Butt, schatzberg.dan

On Tue, Mar 3, 2020 at 2:53 AM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> Hello, Naresh.
>
> > [   98.003346] WARNING: CPU: 2 PID: 340 at
> > include/linux/sched/mm.h:323 alloc_page_buffers+0x210/0x288
>
> This is
>
> /**
>  * memalloc_use_memcg - Starts the remote memcg charging scope.
>  * @memcg: memcg to charge.
>  *
>  * This function marks the beginning of the remote memcg charging scope. All the
>  * __GFP_ACCOUNT allocations till the end of the scope will be charged to the
>  * given memcg.
>  *
>  * NOTE: This function is not nesting safe.
>  */
> static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
> {
>         WARN_ON_ONCE(current->active_memcg);
>         current->active_memcg = memcg;
> }
>
> which is about memcg. Redirecting to linux-mm.

Isn't this triggered by ("loop: use worker per cgroup instead of
kworker") in linux-next, which converted loop driver to use worker per
cgroup, so it may have multiple workers work at the mean time?

So they may share the same "current", then it may cause kind of nested
call to memalloc_use_memcg().

Could you please try the below debug patch? This is not the proper
fix, but it may help us narrow down the problem.

diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index c49257a..1cc1cdc 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -320,6 +320,10 @@ static inline void
memalloc_nocma_restore(unsigned int flags)
  */
 static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
 {
+       if ((current->flags & PF_KTHREAD) &&
+            current->active_memcg)
+               return;
+
        WARN_ON_ONCE(current->active_memcg);
        current->active_memcg = memcg;
 }

>
> Please include exact backtrace extracted from scripts/faddr2line .
>
>


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs
  2020-03-03 17:47   ` Yang Shi
@ 2020-03-03 18:14     ` Shakeel Butt
  2020-03-03 18:34       ` Yang Shi
                         ` (2 more replies)
  2020-03-03 18:40     ` Naresh Kamboju
  1 sibling, 3 replies; 19+ messages in thread
From: Shakeel Butt @ 2020-03-03 18:14 UTC (permalink / raw)
  To: Yang Shi
  Cc: Tetsuo Handa, Naresh Kamboju, linux-mm, Andrew Morton,
	Mel Gorman, Michal Hocko, Dan Schatzberg, Johannes Weiner

On Tue, Mar 3, 2020 at 9:47 AM Yang Shi <shy828301@gmail.com> wrote:
>
> On Tue, Mar 3, 2020 at 2:53 AM Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
> >
> > Hello, Naresh.
> >
> > > [   98.003346] WARNING: CPU: 2 PID: 340 at
> > > include/linux/sched/mm.h:323 alloc_page_buffers+0x210/0x288
> >
> > This is
> >
> > /**
> >  * memalloc_use_memcg - Starts the remote memcg charging scope.
> >  * @memcg: memcg to charge.
> >  *
> >  * This function marks the beginning of the remote memcg charging scope. All the
> >  * __GFP_ACCOUNT allocations till the end of the scope will be charged to the
> >  * given memcg.
> >  *
> >  * NOTE: This function is not nesting safe.
> >  */
> > static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
> > {
> >         WARN_ON_ONCE(current->active_memcg);
> >         current->active_memcg = memcg;
> > }
> >
> > which is about memcg. Redirecting to linux-mm.
>
> Isn't this triggered by ("loop: use worker per cgroup instead of
> kworker") in linux-next, which converted loop driver to use worker per
> cgroup, so it may have multiple workers work at the mean time?
>
> So they may share the same "current", then it may cause kind of nested
> call to memalloc_use_memcg().
>
> Could you please try the below debug patch? This is not the proper
> fix, but it may help us narrow down the problem.
>
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index c49257a..1cc1cdc 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -320,6 +320,10 @@ static inline void
> memalloc_nocma_restore(unsigned int flags)
>   */
>  static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
>  {
> +       if ((current->flags & PF_KTHREAD) &&
> +            current->active_memcg)
> +               return;
> +
>         WARN_ON_ONCE(current->active_memcg);
>         current->active_memcg = memcg;
>  }
>

Maybe it's time to make memalloc_use_memcg() nesting safe.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs
  2020-03-03 18:14     ` Shakeel Butt
@ 2020-03-03 18:34       ` Yang Shi
  2020-03-03 19:42       ` Yang Shi
  2020-03-03 20:26       ` Johannes Weiner
  2 siblings, 0 replies; 19+ messages in thread
From: Yang Shi @ 2020-03-03 18:34 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Tetsuo Handa, Naresh Kamboju, linux-mm, Andrew Morton,
	Mel Gorman, Michal Hocko, Dan Schatzberg, Johannes Weiner

On Tue, Mar 3, 2020 at 10:15 AM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Tue, Mar 3, 2020 at 9:47 AM Yang Shi <shy828301@gmail.com> wrote:
> >
> > On Tue, Mar 3, 2020 at 2:53 AM Tetsuo Handa
> > <penguin-kernel@i-love.sakura.ne.jp> wrote:
> > >
> > > Hello, Naresh.
> > >
> > > > [   98.003346] WARNING: CPU: 2 PID: 340 at
> > > > include/linux/sched/mm.h:323 alloc_page_buffers+0x210/0x288
> > >
> > > This is
> > >
> > > /**
> > >  * memalloc_use_memcg - Starts the remote memcg charging scope.
> > >  * @memcg: memcg to charge.
> > >  *
> > >  * This function marks the beginning of the remote memcg charging scope. All the
> > >  * __GFP_ACCOUNT allocations till the end of the scope will be charged to the
> > >  * given memcg.
> > >  *
> > >  * NOTE: This function is not nesting safe.
> > >  */
> > > static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
> > > {
> > >         WARN_ON_ONCE(current->active_memcg);
> > >         current->active_memcg = memcg;
> > > }
> > >
> > > which is about memcg. Redirecting to linux-mm.
> >
> > Isn't this triggered by ("loop: use worker per cgroup instead of
> > kworker") in linux-next, which converted loop driver to use worker per
> > cgroup, so it may have multiple workers work at the mean time?
> >
> > So they may share the same "current", then it may cause kind of nested
> > call to memalloc_use_memcg().
> >
> > Could you please try the below debug patch? This is not the proper
> > fix, but it may help us narrow down the problem.
> >
> > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> > index c49257a..1cc1cdc 100644
> > --- a/include/linux/sched/mm.h
> > +++ b/include/linux/sched/mm.h
> > @@ -320,6 +320,10 @@ static inline void
> > memalloc_nocma_restore(unsigned int flags)
> >   */
> >  static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
> >  {
> > +       if ((current->flags & PF_KTHREAD) &&
> > +            current->active_memcg)
> > +               return;
> > +
> >         WARN_ON_ONCE(current->active_memcg);
> >         current->active_memcg = memcg;
> >  }
> >
>
> Maybe it's time to make memalloc_use_memcg() nesting safe.

That would be preferred. Maybe like: static inline void
memalloc_use_memcg(struct mem_cgroup *memcg, struct mem_cgroup *saved)


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs
  2020-03-03 17:47   ` Yang Shi
  2020-03-03 18:14     ` Shakeel Butt
@ 2020-03-03 18:40     ` Naresh Kamboju
  2020-03-03 19:04       ` Yang Shi
  1 sibling, 1 reply; 19+ messages in thread
From: Naresh Kamboju @ 2020-03-03 18:40 UTC (permalink / raw)
  To: Yang Shi
  Cc: Tetsuo Handa, linux-mm, Andrew Morton, Mel Gorman, Michal Hocko,
	Shakeel Butt, schatzberg.dan

On Tue, 3 Mar 2020 at 23:17, Yang Shi <shy828301@gmail.com> wrote:
>
> On Tue, Mar 3, 2020 at 2:53 AM Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
> >
> > Hello, Naresh.
> >
> > > [   98.003346] WARNING: CPU: 2 PID: 340 at
> > > include/linux/sched/mm.h:323 alloc_page_buffers+0x210/0x288
<Trim>
> >
> Could you please try the below debug patch? This is not the proper
> fix, but it may help us narrow down the problem.
>
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index c49257a..1cc1cdc 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -320,6 +320,10 @@ static inline void
> memalloc_nocma_restore(unsigned int flags)
>   */
>  static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
>  {
> +       if ((current->flags & PF_KTHREAD) &&
> +            current->active_memcg)
> +               return;
> +
>         WARN_ON_ONCE(current->active_memcg);
>         current->active_memcg = memcg;
>  }

After applying this patch the reported "warning" did not happen.
Here is the full test log.
https://lkft.validation.linaro.org/scheduler/job/1265325#L1287

- Naresh


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs
  2020-03-03 18:40     ` Naresh Kamboju
@ 2020-03-03 19:04       ` Yang Shi
  0 siblings, 0 replies; 19+ messages in thread
From: Yang Shi @ 2020-03-03 19:04 UTC (permalink / raw)
  To: Naresh Kamboju
  Cc: Tetsuo Handa, linux-mm, Andrew Morton, Mel Gorman, Michal Hocko,
	Shakeel Butt, schatzberg.dan

On Tue, Mar 3, 2020 at 10:40 AM Naresh Kamboju
<naresh.kamboju@linaro.org> wrote:
>
> On Tue, 3 Mar 2020 at 23:17, Yang Shi <shy828301@gmail.com> wrote:
> >
> > On Tue, Mar 3, 2020 at 2:53 AM Tetsuo Handa
> > <penguin-kernel@i-love.sakura.ne.jp> wrote:
> > >
> > > Hello, Naresh.
> > >
> > > > [   98.003346] WARNING: CPU: 2 PID: 340 at
> > > > include/linux/sched/mm.h:323 alloc_page_buffers+0x210/0x288
> <Trim>
> > >
> > Could you please try the below debug patch? This is not the proper
> > fix, but it may help us narrow down the problem.
> >
> > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> > index c49257a..1cc1cdc 100644
> > --- a/include/linux/sched/mm.h
> > +++ b/include/linux/sched/mm.h
> > @@ -320,6 +320,10 @@ static inline void
> > memalloc_nocma_restore(unsigned int flags)
> >   */
> >  static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
> >  {
> > +       if ((current->flags & PF_KTHREAD) &&
> > +            current->active_memcg)
> > +               return;
> > +
> >         WARN_ON_ONCE(current->active_memcg);
> >         current->active_memcg = memcg;
> >  }
>
> After applying this patch the reported "warning" did not happen.
> Here is the full test log.
> https://lkft.validation.linaro.org/scheduler/job/1265325#L1287

Thanks for doing testing. We could confirm this is caused by
paralleled memalloc_use_memcg().

>
> - Naresh


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs
  2020-03-03 18:14     ` Shakeel Butt
  2020-03-03 18:34       ` Yang Shi
@ 2020-03-03 19:42       ` Yang Shi
  2020-03-03 20:26         ` Shakeel Butt
  2020-03-03 20:33         ` Johannes Weiner
  2020-03-03 20:26       ` Johannes Weiner
  2 siblings, 2 replies; 19+ messages in thread
From: Yang Shi @ 2020-03-03 19:42 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Tetsuo Handa, Naresh Kamboju, linux-mm, Andrew Morton,
	Mel Gorman, Michal Hocko, Dan Schatzberg, Johannes Weiner

On Tue, Mar 3, 2020 at 10:15 AM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Tue, Mar 3, 2020 at 9:47 AM Yang Shi <shy828301@gmail.com> wrote:
> >
> > On Tue, Mar 3, 2020 at 2:53 AM Tetsuo Handa
> > <penguin-kernel@i-love.sakura.ne.jp> wrote:
> > >
> > > Hello, Naresh.
> > >
> > > > [   98.003346] WARNING: CPU: 2 PID: 340 at
> > > > include/linux/sched/mm.h:323 alloc_page_buffers+0x210/0x288
> > >
> > > This is
> > >
> > > /**
> > >  * memalloc_use_memcg - Starts the remote memcg charging scope.
> > >  * @memcg: memcg to charge.
> > >  *
> > >  * This function marks the beginning of the remote memcg charging scope. All the
> > >  * __GFP_ACCOUNT allocations till the end of the scope will be charged to the
> > >  * given memcg.
> > >  *
> > >  * NOTE: This function is not nesting safe.
> > >  */
> > > static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
> > > {
> > >         WARN_ON_ONCE(current->active_memcg);
> > >         current->active_memcg = memcg;
> > > }
> > >
> > > which is about memcg. Redirecting to linux-mm.
> >
> > Isn't this triggered by ("loop: use worker per cgroup instead of
> > kworker") in linux-next, which converted loop driver to use worker per
> > cgroup, so it may have multiple workers work at the mean time?
> >
> > So they may share the same "current", then it may cause kind of nested
> > call to memalloc_use_memcg().
> >
> > Could you please try the below debug patch? This is not the proper
> > fix, but it may help us narrow down the problem.
> >
> > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> > index c49257a..1cc1cdc 100644
> > --- a/include/linux/sched/mm.h
> > +++ b/include/linux/sched/mm.h
> > @@ -320,6 +320,10 @@ static inline void
> > memalloc_nocma_restore(unsigned int flags)
> >   */
> >  static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
> >  {
> > +       if ((current->flags & PF_KTHREAD) &&
> > +            current->active_memcg)
> > +               return;
> > +
> >         WARN_ON_ONCE(current->active_memcg);
> >         current->active_memcg = memcg;
> >  }
> >
>
> Maybe it's time to make memalloc_use_memcg() nesting safe.

Need handle the below case:

            CPU A                                          CPU B
memalloc_use_memcg
                                                     memalloc_use_memcg
memalloc_unuse_memcg
                                                     memalloc_unuse_memcg


They may manipulate the same task->active_memcg, so CPU B may still
see wrong memcg, and the last call to memalloc_unuse_memcg() on CPU B
may not restore active_memcg to NULL. And, some code depends on
correct active_memcg.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs
  2020-03-03 19:42       ` Yang Shi
@ 2020-03-03 20:26         ` Shakeel Butt
  2020-03-03 20:33         ` Johannes Weiner
  1 sibling, 0 replies; 19+ messages in thread
From: Shakeel Butt @ 2020-03-03 20:26 UTC (permalink / raw)
  To: Yang Shi
  Cc: Tetsuo Handa, Naresh Kamboju, linux-mm, Andrew Morton,
	Mel Gorman, Michal Hocko, Dan Schatzberg, Johannes Weiner

On Tue, Mar 3, 2020 at 11:42 AM Yang Shi <shy828301@gmail.com> wrote:
>
> On Tue, Mar 3, 2020 at 10:15 AM Shakeel Butt <shakeelb@google.com> wrote:
> >
> > On Tue, Mar 3, 2020 at 9:47 AM Yang Shi <shy828301@gmail.com> wrote:
> > >
> > > On Tue, Mar 3, 2020 at 2:53 AM Tetsuo Handa
> > > <penguin-kernel@i-love.sakura.ne.jp> wrote:
> > > >
> > > > Hello, Naresh.
> > > >
> > > > > [   98.003346] WARNING: CPU: 2 PID: 340 at
> > > > > include/linux/sched/mm.h:323 alloc_page_buffers+0x210/0x288
> > > >
> > > > This is
> > > >
> > > > /**
> > > >  * memalloc_use_memcg - Starts the remote memcg charging scope.
> > > >  * @memcg: memcg to charge.
> > > >  *
> > > >  * This function marks the beginning of the remote memcg charging scope. All the
> > > >  * __GFP_ACCOUNT allocations till the end of the scope will be charged to the
> > > >  * given memcg.
> > > >  *
> > > >  * NOTE: This function is not nesting safe.
> > > >  */
> > > > static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
> > > > {
> > > >         WARN_ON_ONCE(current->active_memcg);
> > > >         current->active_memcg = memcg;
> > > > }
> > > >
> > > > which is about memcg. Redirecting to linux-mm.
> > >
> > > Isn't this triggered by ("loop: use worker per cgroup instead of
> > > kworker") in linux-next, which converted loop driver to use worker per
> > > cgroup, so it may have multiple workers work at the mean time?
> > >
> > > So they may share the same "current", then it may cause kind of nested
> > > call to memalloc_use_memcg().
> > >
> > > Could you please try the below debug patch? This is not the proper
> > > fix, but it may help us narrow down the problem.
> > >
> > > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> > > index c49257a..1cc1cdc 100644
> > > --- a/include/linux/sched/mm.h
> > > +++ b/include/linux/sched/mm.h
> > > @@ -320,6 +320,10 @@ static inline void
> > > memalloc_nocma_restore(unsigned int flags)
> > >   */
> > >  static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
> > >  {
> > > +       if ((current->flags & PF_KTHREAD) &&
> > > +            current->active_memcg)
> > > +               return;
> > > +
> > >         WARN_ON_ONCE(current->active_memcg);
> > >         current->active_memcg = memcg;
> > >  }
> > >
> >
> > Maybe it's time to make memalloc_use_memcg() nesting safe.
>
> Need handle the below case:
>
>             CPU A                                          CPU B
> memalloc_use_memcg
>                                                      memalloc_use_memcg
> memalloc_unuse_memcg
>                                                      memalloc_unuse_memcg
>
>
> They may manipulate the same task->active_memcg, so CPU B may still
> see wrong memcg, and the last call to memalloc_unuse_memcg() on CPU B
> may not restore active_memcg to NULL. And, some code depends on
> correct active_memcg.

Sorry for asking a simple question. What does it mean that more than
one kworkers share the same 'current'?


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs
  2020-03-03 18:14     ` Shakeel Butt
  2020-03-03 18:34       ` Yang Shi
  2020-03-03 19:42       ` Yang Shi
@ 2020-03-03 20:26       ` Johannes Weiner
  2020-03-03 20:40         ` Shakeel Butt
  2020-03-03 20:57         ` Yang Shi
  2 siblings, 2 replies; 19+ messages in thread
From: Johannes Weiner @ 2020-03-03 20:26 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Yang Shi, Tetsuo Handa, Naresh Kamboju, linux-mm, Andrew Morton,
	Mel Gorman, Michal Hocko, Dan Schatzberg

On Tue, Mar 03, 2020 at 10:14:49AM -0800, Shakeel Butt wrote:
> On Tue, Mar 3, 2020 at 9:47 AM Yang Shi <shy828301@gmail.com> wrote:
> >
> > On Tue, Mar 3, 2020 at 2:53 AM Tetsuo Handa
> > <penguin-kernel@i-love.sakura.ne.jp> wrote:
> > >
> > > Hello, Naresh.
> > >
> > > > [   98.003346] WARNING: CPU: 2 PID: 340 at
> > > > include/linux/sched/mm.h:323 alloc_page_buffers+0x210/0x288
> > >
> > > This is
> > >
> > > /**
> > >  * memalloc_use_memcg - Starts the remote memcg charging scope.
> > >  * @memcg: memcg to charge.
> > >  *
> > >  * This function marks the beginning of the remote memcg charging scope. All the
> > >  * __GFP_ACCOUNT allocations till the end of the scope will be charged to the
> > >  * given memcg.
> > >  *
> > >  * NOTE: This function is not nesting safe.
> > >  */
> > > static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
> > > {
> > >         WARN_ON_ONCE(current->active_memcg);
> > >         current->active_memcg = memcg;
> > > }
> > >
> > > which is about memcg. Redirecting to linux-mm.
> >
> > Isn't this triggered by ("loop: use worker per cgroup instead of
> > kworker") in linux-next, which converted loop driver to use worker per
> > cgroup, so it may have multiple workers work at the mean time?
> >
> > So they may share the same "current", then it may cause kind of nested
> > call to memalloc_use_memcg().
> >
> > Could you please try the below debug patch? This is not the proper
> > fix, but it may help us narrow down the problem.
> >
> > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> > index c49257a..1cc1cdc 100644
> > --- a/include/linux/sched/mm.h
> > +++ b/include/linux/sched/mm.h
> > @@ -320,6 +320,10 @@ static inline void
> > memalloc_nocma_restore(unsigned int flags)
> >   */
> >  static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
> >  {
> > +       if ((current->flags & PF_KTHREAD) &&
> > +            current->active_memcg)
> > +               return;
> > +
> >         WARN_ON_ONCE(current->active_memcg);
> >         current->active_memcg = memcg;
> >  }
> >
> 
> Maybe it's time to make memalloc_use_memcg() nesting safe.

Yes, I think so. The stack trace:

[   98.137605]  alloc_page_buffers+0x210/0x288
[   98.141799]  __getblk_gfp+0x1d4/0x400
[   98.145475]  ext4_read_block_bitmap_nowait+0x148/0xbc8
[   98.150628]  ext4_mb_init_cache+0x25c/0x9b0
[   98.154821]  ext4_mb_init_group+0x270/0x390
[   98.159014]  ext4_mb_good_group+0x264/0x270
[   98.163208]  ext4_mb_regular_allocator+0x480/0x798
[   98.168011]  ext4_mb_new_blocks+0x958/0x10f8
[   98.172294]  ext4_ext_map_blocks+0xec8/0x1618
[   98.176660]  ext4_map_blocks+0x1b8/0x8a0
[   98.180592]  ext4_writepages+0x830/0xf10
[   98.184523]  do_writepages+0xb4/0x198
[   98.188195]  __filemap_fdatawrite_range+0x170/0x1c8
[   98.193086]  filemap_write_and_wait_range+0x40/0xb0
[   98.197974]  ext4_punch_hole+0x4a4/0x660
[   98.201907]  ext4_fallocate+0x294/0x1190
[   98.205839]  loop_process_work+0x690/0x1100
[   98.210032]  loop_workfn+0x2c/0x110
[   98.213529]  process_one_work+0x3e0/0x648
[   98.217546]  worker_thread+0x70/0x670
[   98.221217]  kthread+0x1b8/0x1c0
[   98.224452]  ret_from_fork+0x10/0x18

The loop kworker is instantiating cache pages on behalf of who queued
the io request, but if the page already exists, the buffers should be
allocated on behalf of who already owns the page. Nesting makes sense.

Since the only difference between the use and unuse function is the
warn when we nest, we can remove the unuse and do something like:

old = memalloc_use_memcg(memcg);
memalloc_use_memcg(old);

What do you think? Patch below. It should go in before Dan's patches,
and they in turn need a small update to save and restore active_memcg.
(Since loop's use is from a kworker, it's unlikely that there is an
outer scope. But it's probably best to keep this simple and robust.)

---

From e0e5ace069af5a36e41eafe3bf21a67966127c04 Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes@cmpxchg.org>
Date: Tue, 3 Mar 2020 15:15:39 -0500
Subject: [PATCH] mm: support nesting memalloc_use_memcg()

The memalloc_use_memcg() function to override the default memcg
accounting context currently doesn't nest. But the patches to make the
loop driver cgroup-aware will end up nesting:

[   98.137605]  alloc_page_buffers+0x210/0x288
[   98.141799]  __getblk_gfp+0x1d4/0x400
[   98.145475]  ext4_read_block_bitmap_nowait+0x148/0xbc8
[   98.150628]  ext4_mb_init_cache+0x25c/0x9b0
[   98.154821]  ext4_mb_init_group+0x270/0x390
[   98.159014]  ext4_mb_good_group+0x264/0x270
[   98.163208]  ext4_mb_regular_allocator+0x480/0x798
[   98.168011]  ext4_mb_new_blocks+0x958/0x10f8
[   98.172294]  ext4_ext_map_blocks+0xec8/0x1618
[   98.176660]  ext4_map_blocks+0x1b8/0x8a0
[   98.180592]  ext4_writepages+0x830/0xf10
[   98.184523]  do_writepages+0xb4/0x198
[   98.188195]  __filemap_fdatawrite_range+0x170/0x1c8
[   98.193086]  filemap_write_and_wait_range+0x40/0xb0
[   98.197974]  ext4_punch_hole+0x4a4/0x660
[   98.201907]  ext4_fallocate+0x294/0x1190
[   98.205839]  loop_process_work+0x690/0x1100
[   98.210032]  loop_workfn+0x2c/0x110
[   98.213529]  process_one_work+0x3e0/0x648
[   98.217546]  worker_thread+0x70/0x670
[   98.221217]  kthread+0x1b8/0x1c0
[   98.224452]  ret_from_fork+0x10/0x18

where loop_process_work() sets the memcg override to the memcg that
submitted the IO request, and alloc_page_buffers() sets the override
to the memcg that instantiated the cache page, which may differ.

Make memalloc_use_memcg() return the old memcg and convert existing
users to a stacking model. Delete the unused memalloc_unuse_memcg().

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 fs/buffer.c                          |  6 ++---
 fs/notify/fanotify/fanotify.c        |  5 +++--
 fs/notify/inotify/inotify_fsnotify.c |  5 +++--
 include/linux/sched/mm.h             | 33 ++++++++++++----------------
 4 files changed, 23 insertions(+), 26 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index d8c7242426bb..54d5df14bd36 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -857,13 +857,13 @@ struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size,
 	struct buffer_head *bh, *head;
 	gfp_t gfp = GFP_NOFS | __GFP_ACCOUNT;
 	long offset;
-	struct mem_cgroup *memcg;
+	struct mem_cgroup *memcg, *oldmemcg;
 
 	if (retry)
 		gfp |= __GFP_NOFAIL;
 
 	memcg = get_mem_cgroup_from_page(page);
-	memalloc_use_memcg(memcg);
+	oldmemcg = memalloc_use_memcg(memcg);
 
 	head = NULL;
 	offset = PAGE_SIZE;
@@ -882,7 +882,7 @@ struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size,
 		set_bh_page(bh, page, offset);
 	}
 out:
-	memalloc_unuse_memcg();
+	memalloc_use_memcg(oldmemcg);
 	mem_cgroup_put(memcg);
 	return head;
 /*
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 5778d1347b35..cb596ad002d8 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -284,6 +284,7 @@ struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
 	struct fanotify_event *event = NULL;
 	gfp_t gfp = GFP_KERNEL_ACCOUNT;
 	struct inode *id = fanotify_fid_inode(inode, mask, data, data_type);
+	struct mem_cgroup *oldmemcg;
 
 	/*
 	 * For queues with unlimited length lost events are not expected and
@@ -297,7 +298,7 @@ struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
 		gfp |= __GFP_RETRY_MAYFAIL;
 
 	/* Whoever is interested in the event, pays for the allocation. */
-	memalloc_use_memcg(group->memcg);
+	oldmemcg = memalloc_use_memcg(group->memcg);
 
 	if (fanotify_is_perm_event(mask)) {
 		struct fanotify_perm_event *pevent;
@@ -334,7 +335,7 @@ init: __maybe_unused
 		event->path.dentry = NULL;
 	}
 out:
-	memalloc_unuse_memcg();
+	memalloc_use_memcg(oldmemcg);
 	return event;
 }
 
diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c
index d510223d302c..776ce66aaa47 100644
--- a/fs/notify/inotify/inotify_fsnotify.c
+++ b/fs/notify/inotify/inotify_fsnotify.c
@@ -68,6 +68,7 @@ int inotify_handle_event(struct fsnotify_group *group,
 	int ret;
 	int len = 0;
 	int alloc_len = sizeof(struct inotify_event_info);
+	struct mem_cgroup *oldmemcg;
 
 	if (WARN_ON(fsnotify_iter_vfsmount_mark(iter_info)))
 		return 0;
@@ -95,9 +96,9 @@ int inotify_handle_event(struct fsnotify_group *group,
 	 * trigger OOM killer in the target monitoring memcg as it may have
 	 * security repercussion.
 	 */
-	memalloc_use_memcg(group->memcg);
+	oldmemcg = memalloc_use_memcg(group->memcg);
 	event = kmalloc(alloc_len, GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL);
-	memalloc_unuse_memcg();
+	memalloc_use_memcg(oldmemcg);
 
 	if (unlikely(!event)) {
 		/*
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index c49257a3b510..ced06c12daf7 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -316,31 +316,26 @@ static inline void memalloc_nocma_restore(unsigned int flags)
  * __GFP_ACCOUNT allocations till the end of the scope will be charged to the
  * given memcg.
  *
- * NOTE: This function is not nesting safe.
- */
-static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
-{
-	WARN_ON_ONCE(current->active_memcg);
-	current->active_memcg = memcg;
-}
-
-/**
- * memalloc_unuse_memcg - Ends the remote memcg charging scope.
+ * NOTE: This function can nest. Users must save the return value and
+ * reset the previous value after their own charging scope is over:
  *
- * This function marks the end of the remote memcg charging scope started by
- * memalloc_use_memcg().
+ *      old = memalloc_use_memcg(memcg);
+ *      // ... allocations ...
+ *      memalloc_use_memcg(old);
  */
-static inline void memalloc_unuse_memcg(void)
+static inline struct mem_cgroup *__must_check
+memalloc_use_memcg(struct mem_cgroup *memcg)
 {
-	current->active_memcg = NULL;
+	struct mem_cgroup *old = current->active_memcg;
+
+	current->active_memcg = memcg;
+	return old;
 }
 #else
-static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
-{
-}
-
-static inline void memalloc_unuse_memcg(void)
+static inline struct mem_cgroup *__must_check
+memalloc_use_memcg(struct mem_cgroup *memcg)
 {
+	return NULL;
 }
 #endif
 
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs
  2020-03-03 19:42       ` Yang Shi
  2020-03-03 20:26         ` Shakeel Butt
@ 2020-03-03 20:33         ` Johannes Weiner
  2020-03-03 20:59           ` Yang Shi
  1 sibling, 1 reply; 19+ messages in thread
From: Johannes Weiner @ 2020-03-03 20:33 UTC (permalink / raw)
  To: Yang Shi
  Cc: Shakeel Butt, Tetsuo Handa, Naresh Kamboju, linux-mm,
	Andrew Morton, Mel Gorman, Michal Hocko, Dan Schatzberg

On Tue, Mar 03, 2020 at 11:42:31AM -0800, Yang Shi wrote:
> On Tue, Mar 3, 2020 at 10:15 AM Shakeel Butt <shakeelb@google.com> wrote:
> >
> > On Tue, Mar 3, 2020 at 9:47 AM Yang Shi <shy828301@gmail.com> wrote:
> > >
> > > On Tue, Mar 3, 2020 at 2:53 AM Tetsuo Handa
> > > <penguin-kernel@i-love.sakura.ne.jp> wrote:
> > > >
> > > > Hello, Naresh.
> > > >
> > > > > [   98.003346] WARNING: CPU: 2 PID: 340 at
> > > > > include/linux/sched/mm.h:323 alloc_page_buffers+0x210/0x288
> > > >
> > > > This is
> > > >
> > > > /**
> > > >  * memalloc_use_memcg - Starts the remote memcg charging scope.
> > > >  * @memcg: memcg to charge.
> > > >  *
> > > >  * This function marks the beginning of the remote memcg charging scope. All the
> > > >  * __GFP_ACCOUNT allocations till the end of the scope will be charged to the
> > > >  * given memcg.
> > > >  *
> > > >  * NOTE: This function is not nesting safe.
> > > >  */
> > > > static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
> > > > {
> > > >         WARN_ON_ONCE(current->active_memcg);
> > > >         current->active_memcg = memcg;
> > > > }
> > > >
> > > > which is about memcg. Redirecting to linux-mm.
> > >
> > > Isn't this triggered by ("loop: use worker per cgroup instead of
> > > kworker") in linux-next, which converted loop driver to use worker per
> > > cgroup, so it may have multiple workers work at the mean time?
> > >
> > > So they may share the same "current", then it may cause kind of nested
> > > call to memalloc_use_memcg().
> > >
> > > Could you please try the below debug patch? This is not the proper
> > > fix, but it may help us narrow down the problem.
> > >
> > > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> > > index c49257a..1cc1cdc 100644
> > > --- a/include/linux/sched/mm.h
> > > +++ b/include/linux/sched/mm.h
> > > @@ -320,6 +320,10 @@ static inline void
> > > memalloc_nocma_restore(unsigned int flags)
> > >   */
> > >  static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
> > >  {
> > > +       if ((current->flags & PF_KTHREAD) &&
> > > +            current->active_memcg)
> > > +               return;
> > > +
> > >         WARN_ON_ONCE(current->active_memcg);
> > >         current->active_memcg = memcg;
> > >  }
> > >
> >
> > Maybe it's time to make memalloc_use_memcg() nesting safe.
> 
> Need handle the below case:
> 
>             CPU A                                          CPU B
> memalloc_use_memcg
>                                                      memalloc_use_memcg
> memalloc_unuse_memcg
>                                                      memalloc_unuse_memcg
> 
> 
> They may manipulate the same task->active_memcg, so CPU B may still
> see wrong memcg, and the last call to memalloc_unuse_memcg() on CPU B
> may not restore active_memcg to NULL. And, some code depends on
> correct active_memcg.

It's safe because it's only `current` updating a private pointer -
nobody is changing active_memcg of a different task. And a task cannot
run on more than one CPU simultaneously.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs
  2020-03-03 20:26       ` Johannes Weiner
@ 2020-03-03 20:40         ` Shakeel Butt
  2020-03-03 21:06           ` Johannes Weiner
  2020-03-03 20:57         ` Yang Shi
  1 sibling, 1 reply; 19+ messages in thread
From: Shakeel Butt @ 2020-03-03 20:40 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Yang Shi, Tetsuo Handa, Naresh Kamboju, linux-mm, Andrew Morton,
	Mel Gorman, Michal Hocko, Dan Schatzberg

On Tue, Mar 3, 2020 at 12:26 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Tue, Mar 03, 2020 at 10:14:49AM -0800, Shakeel Butt wrote:
> > On Tue, Mar 3, 2020 at 9:47 AM Yang Shi <shy828301@gmail.com> wrote:
> > >
> > > On Tue, Mar 3, 2020 at 2:53 AM Tetsuo Handa
> > > <penguin-kernel@i-love.sakura.ne.jp> wrote:
> > > >
> > > > Hello, Naresh.
> > > >
> > > > > [   98.003346] WARNING: CPU: 2 PID: 340 at
> > > > > include/linux/sched/mm.h:323 alloc_page_buffers+0x210/0x288
> > > >
> > > > This is
> > > >
> > > > /**
> > > >  * memalloc_use_memcg - Starts the remote memcg charging scope.
> > > >  * @memcg: memcg to charge.
> > > >  *
> > > >  * This function marks the beginning of the remote memcg charging scope. All the
> > > >  * __GFP_ACCOUNT allocations till the end of the scope will be charged to the
> > > >  * given memcg.
> > > >  *
> > > >  * NOTE: This function is not nesting safe.
> > > >  */
> > > > static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
> > > > {
> > > >         WARN_ON_ONCE(current->active_memcg);
> > > >         current->active_memcg = memcg;
> > > > }
> > > >
> > > > which is about memcg. Redirecting to linux-mm.
> > >
> > > Isn't this triggered by ("loop: use worker per cgroup instead of
> > > kworker") in linux-next, which converted loop driver to use worker per
> > > cgroup, so it may have multiple workers work at the mean time?
> > >
> > > So they may share the same "current", then it may cause kind of nested
> > > call to memalloc_use_memcg().
> > >
> > > Could you please try the below debug patch? This is not the proper
> > > fix, but it may help us narrow down the problem.
> > >
> > > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> > > index c49257a..1cc1cdc 100644
> > > --- a/include/linux/sched/mm.h
> > > +++ b/include/linux/sched/mm.h
> > > @@ -320,6 +320,10 @@ static inline void
> > > memalloc_nocma_restore(unsigned int flags)
> > >   */
> > >  static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
> > >  {
> > > +       if ((current->flags & PF_KTHREAD) &&
> > > +            current->active_memcg)
> > > +               return;
> > > +
> > >         WARN_ON_ONCE(current->active_memcg);
> > >         current->active_memcg = memcg;
> > >  }
> > >
> >
> > Maybe it's time to make memalloc_use_memcg() nesting safe.
>
> Yes, I think so. The stack trace:
>
> [   98.137605]  alloc_page_buffers+0x210/0x288
> [   98.141799]  __getblk_gfp+0x1d4/0x400
> [   98.145475]  ext4_read_block_bitmap_nowait+0x148/0xbc8
> [   98.150628]  ext4_mb_init_cache+0x25c/0x9b0
> [   98.154821]  ext4_mb_init_group+0x270/0x390
> [   98.159014]  ext4_mb_good_group+0x264/0x270
> [   98.163208]  ext4_mb_regular_allocator+0x480/0x798
> [   98.168011]  ext4_mb_new_blocks+0x958/0x10f8
> [   98.172294]  ext4_ext_map_blocks+0xec8/0x1618
> [   98.176660]  ext4_map_blocks+0x1b8/0x8a0
> [   98.180592]  ext4_writepages+0x830/0xf10
> [   98.184523]  do_writepages+0xb4/0x198
> [   98.188195]  __filemap_fdatawrite_range+0x170/0x1c8
> [   98.193086]  filemap_write_and_wait_range+0x40/0xb0
> [   98.197974]  ext4_punch_hole+0x4a4/0x660
> [   98.201907]  ext4_fallocate+0x294/0x1190
> [   98.205839]  loop_process_work+0x690/0x1100
> [   98.210032]  loop_workfn+0x2c/0x110
> [   98.213529]  process_one_work+0x3e0/0x648
> [   98.217546]  worker_thread+0x70/0x670
> [   98.221217]  kthread+0x1b8/0x1c0
> [   98.224452]  ret_from_fork+0x10/0x18
>
> The loop kworker is instantiating cache pages on behalf of who queued
> the io request, but if the page already exists, the buffers should be
> allocated on behalf of who already owns the page. Nesting makes sense.
>
> Since the only difference between the use and unuse function is the
> warn when we nest, we can remove the unuse and do something like:
>
> old = memalloc_use_memcg(memcg);
> memalloc_use_memcg(old);
>
> What do you think?

SGTM.

> Patch below. It should go in before Dan's patches,
> and they in turn need a small update to save and restore active_memcg.
> (Since loop's use is from a kworker, it's unlikely that there is an
> outer scope. But it's probably best to keep this simple and robust.)
>
> ---
>
> From e0e5ace069af5a36e41eafe3bf21a67966127c04 Mon Sep 17 00:00:00 2001
> From: Johannes Weiner <hannes@cmpxchg.org>
> Date: Tue, 3 Mar 2020 15:15:39 -0500
> Subject: [PATCH] mm: support nesting memalloc_use_memcg()
>
> The memalloc_use_memcg() function to override the default memcg
> accounting context currently doesn't nest. But the patches to make the
> loop driver cgroup-aware will end up nesting:
>
> [   98.137605]  alloc_page_buffers+0x210/0x288
> [   98.141799]  __getblk_gfp+0x1d4/0x400
> [   98.145475]  ext4_read_block_bitmap_nowait+0x148/0xbc8
> [   98.150628]  ext4_mb_init_cache+0x25c/0x9b0
> [   98.154821]  ext4_mb_init_group+0x270/0x390
> [   98.159014]  ext4_mb_good_group+0x264/0x270
> [   98.163208]  ext4_mb_regular_allocator+0x480/0x798
> [   98.168011]  ext4_mb_new_blocks+0x958/0x10f8
> [   98.172294]  ext4_ext_map_blocks+0xec8/0x1618
> [   98.176660]  ext4_map_blocks+0x1b8/0x8a0
> [   98.180592]  ext4_writepages+0x830/0xf10
> [   98.184523]  do_writepages+0xb4/0x198
> [   98.188195]  __filemap_fdatawrite_range+0x170/0x1c8
> [   98.193086]  filemap_write_and_wait_range+0x40/0xb0
> [   98.197974]  ext4_punch_hole+0x4a4/0x660
> [   98.201907]  ext4_fallocate+0x294/0x1190
> [   98.205839]  loop_process_work+0x690/0x1100
> [   98.210032]  loop_workfn+0x2c/0x110
> [   98.213529]  process_one_work+0x3e0/0x648
> [   98.217546]  worker_thread+0x70/0x670
> [   98.221217]  kthread+0x1b8/0x1c0
> [   98.224452]  ret_from_fork+0x10/0x18
>
> where loop_process_work() sets the memcg override to the memcg that
> submitted the IO request, and alloc_page_buffers() sets the override
> to the memcg that instantiated the cache page, which may differ.
>
> Make memalloc_use_memcg() return the old memcg and convert existing
> users to a stacking model. Delete the unused memalloc_unuse_memcg().
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Reviewed-by: Shakeel Butt <shakeelb@google.com>

> ---
>  fs/buffer.c                          |  6 ++---
>  fs/notify/fanotify/fanotify.c        |  5 +++--
>  fs/notify/inotify/inotify_fsnotify.c |  5 +++--
>  include/linux/sched/mm.h             | 33 ++++++++++++----------------
>  4 files changed, 23 insertions(+), 26 deletions(-)
>
> diff --git a/fs/buffer.c b/fs/buffer.c
> index d8c7242426bb..54d5df14bd36 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -857,13 +857,13 @@ struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size,
>         struct buffer_head *bh, *head;
>         gfp_t gfp = GFP_NOFS | __GFP_ACCOUNT;
>         long offset;
> -       struct mem_cgroup *memcg;
> +       struct mem_cgroup *memcg, *oldmemcg;
>
>         if (retry)
>                 gfp |= __GFP_NOFAIL;
>
>         memcg = get_mem_cgroup_from_page(page);
> -       memalloc_use_memcg(memcg);
> +       oldmemcg = memalloc_use_memcg(memcg);
>
>         head = NULL;
>         offset = PAGE_SIZE;
> @@ -882,7 +882,7 @@ struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size,
>                 set_bh_page(bh, page, offset);
>         }
>  out:
> -       memalloc_unuse_memcg();
> +       memalloc_use_memcg(oldmemcg);
>         mem_cgroup_put(memcg);
>         return head;
>  /*
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index 5778d1347b35..cb596ad002d8 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -284,6 +284,7 @@ struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
>         struct fanotify_event *event = NULL;
>         gfp_t gfp = GFP_KERNEL_ACCOUNT;
>         struct inode *id = fanotify_fid_inode(inode, mask, data, data_type);
> +       struct mem_cgroup *oldmemcg;
>
>         /*
>          * For queues with unlimited length lost events are not expected and
> @@ -297,7 +298,7 @@ struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
>                 gfp |= __GFP_RETRY_MAYFAIL;
>
>         /* Whoever is interested in the event, pays for the allocation. */
> -       memalloc_use_memcg(group->memcg);
> +       oldmemcg = memalloc_use_memcg(group->memcg);
>
>         if (fanotify_is_perm_event(mask)) {
>                 struct fanotify_perm_event *pevent;
> @@ -334,7 +335,7 @@ init: __maybe_unused
>                 event->path.dentry = NULL;
>         }
>  out:
> -       memalloc_unuse_memcg();
> +       memalloc_use_memcg(oldmemcg);
>         return event;
>  }
>
> diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c
> index d510223d302c..776ce66aaa47 100644
> --- a/fs/notify/inotify/inotify_fsnotify.c
> +++ b/fs/notify/inotify/inotify_fsnotify.c
> @@ -68,6 +68,7 @@ int inotify_handle_event(struct fsnotify_group *group,
>         int ret;
>         int len = 0;
>         int alloc_len = sizeof(struct inotify_event_info);
> +       struct mem_cgroup *oldmemcg;
>
>         if (WARN_ON(fsnotify_iter_vfsmount_mark(iter_info)))
>                 return 0;
> @@ -95,9 +96,9 @@ int inotify_handle_event(struct fsnotify_group *group,
>          * trigger OOM killer in the target monitoring memcg as it may have
>          * security repercussion.
>          */
> -       memalloc_use_memcg(group->memcg);
> +       oldmemcg = memalloc_use_memcg(group->memcg);
>         event = kmalloc(alloc_len, GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL);
> -       memalloc_unuse_memcg();
> +       memalloc_use_memcg(oldmemcg);
>
>         if (unlikely(!event)) {
>                 /*
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index c49257a3b510..ced06c12daf7 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -316,31 +316,26 @@ static inline void memalloc_nocma_restore(unsigned int flags)
>   * __GFP_ACCOUNT allocations till the end of the scope will be charged to the
>   * given memcg.
>   *
> - * NOTE: This function is not nesting safe.
> - */
> -static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
> -{
> -       WARN_ON_ONCE(current->active_memcg);
> -       current->active_memcg = memcg;
> -}
> -
> -/**
> - * memalloc_unuse_memcg - Ends the remote memcg charging scope.
> + * NOTE: This function can nest. Users must save the return value and
> + * reset the previous value after their own charging scope is over:

Should we mention that this is still not irq safe? At the moment other
than skmem we don't do memcg charging in the interrupt and skmem has a
memcg already associated with it. Maybe in future there might be a
case where we want a specific memcg be charged for kmem in the
interrupt context. Until then we can mark that this function should
not be used in interrupt.

>   *
> - * This function marks the end of the remote memcg charging scope started by
> - * memalloc_use_memcg().
> + *      old = memalloc_use_memcg(memcg);
> + *      // ... allocations ...
> + *      memalloc_use_memcg(old);
>   */
> -static inline void memalloc_unuse_memcg(void)
> +static inline struct mem_cgroup *__must_check
> +memalloc_use_memcg(struct mem_cgroup *memcg)
>  {
> -       current->active_memcg = NULL;
> +       struct mem_cgroup *old = current->active_memcg;
> +
> +       current->active_memcg = memcg;
> +       return old;
>  }
>  #else
> -static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
> -{
> -}
> -
> -static inline void memalloc_unuse_memcg(void)
> +static inline struct mem_cgroup *__must_check
> +memalloc_use_memcg(struct mem_cgroup *memcg)
>  {
> +       return NULL;
>  }
>  #endif
>
> --
> 2.24.1
>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs
  2020-03-03 20:26       ` Johannes Weiner
  2020-03-03 20:40         ` Shakeel Butt
@ 2020-03-03 20:57         ` Yang Shi
  1 sibling, 0 replies; 19+ messages in thread
From: Yang Shi @ 2020-03-03 20:57 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Shakeel Butt, Tetsuo Handa, Naresh Kamboju, linux-mm,
	Andrew Morton, Mel Gorman, Michal Hocko, Dan Schatzberg

On Tue, Mar 3, 2020 at 12:26 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Tue, Mar 03, 2020 at 10:14:49AM -0800, Shakeel Butt wrote:
> > On Tue, Mar 3, 2020 at 9:47 AM Yang Shi <shy828301@gmail.com> wrote:
> > >
> > > On Tue, Mar 3, 2020 at 2:53 AM Tetsuo Handa
> > > <penguin-kernel@i-love.sakura.ne.jp> wrote:
> > > >
> > > > Hello, Naresh.
> > > >
> > > > > [   98.003346] WARNING: CPU: 2 PID: 340 at
> > > > > include/linux/sched/mm.h:323 alloc_page_buffers+0x210/0x288
> > > >
> > > > This is
> > > >
> > > > /**
> > > >  * memalloc_use_memcg - Starts the remote memcg charging scope.
> > > >  * @memcg: memcg to charge.
> > > >  *
> > > >  * This function marks the beginning of the remote memcg charging scope. All the
> > > >  * __GFP_ACCOUNT allocations till the end of the scope will be charged to the
> > > >  * given memcg.
> > > >  *
> > > >  * NOTE: This function is not nesting safe.
> > > >  */
> > > > static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
> > > > {
> > > >         WARN_ON_ONCE(current->active_memcg);
> > > >         current->active_memcg = memcg;
> > > > }
> > > >
> > > > which is about memcg. Redirecting to linux-mm.
> > >
> > > Isn't this triggered by ("loop: use worker per cgroup instead of
> > > kworker") in linux-next, which converted loop driver to use worker per
> > > cgroup, so it may have multiple workers work at the mean time?
> > >
> > > So they may share the same "current", then it may cause kind of nested
> > > call to memalloc_use_memcg().
> > >
> > > Could you please try the below debug patch? This is not the proper
> > > fix, but it may help us narrow down the problem.
> > >
> > > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> > > index c49257a..1cc1cdc 100644
> > > --- a/include/linux/sched/mm.h
> > > +++ b/include/linux/sched/mm.h
> > > @@ -320,6 +320,10 @@ static inline void
> > > memalloc_nocma_restore(unsigned int flags)
> > >   */
> > >  static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
> > >  {
> > > +       if ((current->flags & PF_KTHREAD) &&
> > > +            current->active_memcg)
> > > +               return;
> > > +
> > >         WARN_ON_ONCE(current->active_memcg);
> > >         current->active_memcg = memcg;
> > >  }
> > >
> >
> > Maybe it's time to make memalloc_use_memcg() nesting safe.
>
> Yes, I think so. The stack trace:
>
> [   98.137605]  alloc_page_buffers+0x210/0x288
> [   98.141799]  __getblk_gfp+0x1d4/0x400
> [   98.145475]  ext4_read_block_bitmap_nowait+0x148/0xbc8
> [   98.150628]  ext4_mb_init_cache+0x25c/0x9b0
> [   98.154821]  ext4_mb_init_group+0x270/0x390
> [   98.159014]  ext4_mb_good_group+0x264/0x270
> [   98.163208]  ext4_mb_regular_allocator+0x480/0x798
> [   98.168011]  ext4_mb_new_blocks+0x958/0x10f8
> [   98.172294]  ext4_ext_map_blocks+0xec8/0x1618
> [   98.176660]  ext4_map_blocks+0x1b8/0x8a0
> [   98.180592]  ext4_writepages+0x830/0xf10
> [   98.184523]  do_writepages+0xb4/0x198
> [   98.188195]  __filemap_fdatawrite_range+0x170/0x1c8
> [   98.193086]  filemap_write_and_wait_range+0x40/0xb0
> [   98.197974]  ext4_punch_hole+0x4a4/0x660
> [   98.201907]  ext4_fallocate+0x294/0x1190
> [   98.205839]  loop_process_work+0x690/0x1100
> [   98.210032]  loop_workfn+0x2c/0x110
> [   98.213529]  process_one_work+0x3e0/0x648
> [   98.217546]  worker_thread+0x70/0x670
> [   98.221217]  kthread+0x1b8/0x1c0
> [   98.224452]  ret_from_fork+0x10/0x18
>
> The loop kworker is instantiating cache pages on behalf of who queued
> the io request, but if the page already exists, the buffers should be
> allocated on behalf of who already owns the page. Nesting makes sense.
>
> Since the only difference between the use and unuse function is the
> warn when we nest, we can remove the unuse and do something like:
>
> old = memalloc_use_memcg(memcg);
> memalloc_use_memcg(old);
>
> What do you think? Patch below. It should go in before Dan's patches,
> and they in turn need a small update to save and restore active_memcg.
> (Since loop's use is from a kworker, it's unlikely that there is an
> outer scope. But it's probably best to keep this simple and robust.)
>
> ---
>
> From e0e5ace069af5a36e41eafe3bf21a67966127c04 Mon Sep 17 00:00:00 2001
> From: Johannes Weiner <hannes@cmpxchg.org>
> Date: Tue, 3 Mar 2020 15:15:39 -0500
> Subject: [PATCH] mm: support nesting memalloc_use_memcg()
>
> The memalloc_use_memcg() function to override the default memcg
> accounting context currently doesn't nest. But the patches to make the
> loop driver cgroup-aware will end up nesting:
>
> [   98.137605]  alloc_page_buffers+0x210/0x288
> [   98.141799]  __getblk_gfp+0x1d4/0x400
> [   98.145475]  ext4_read_block_bitmap_nowait+0x148/0xbc8
> [   98.150628]  ext4_mb_init_cache+0x25c/0x9b0
> [   98.154821]  ext4_mb_init_group+0x270/0x390
> [   98.159014]  ext4_mb_good_group+0x264/0x270
> [   98.163208]  ext4_mb_regular_allocator+0x480/0x798
> [   98.168011]  ext4_mb_new_blocks+0x958/0x10f8
> [   98.172294]  ext4_ext_map_blocks+0xec8/0x1618
> [   98.176660]  ext4_map_blocks+0x1b8/0x8a0
> [   98.180592]  ext4_writepages+0x830/0xf10
> [   98.184523]  do_writepages+0xb4/0x198
> [   98.188195]  __filemap_fdatawrite_range+0x170/0x1c8
> [   98.193086]  filemap_write_and_wait_range+0x40/0xb0
> [   98.197974]  ext4_punch_hole+0x4a4/0x660
> [   98.201907]  ext4_fallocate+0x294/0x1190
> [   98.205839]  loop_process_work+0x690/0x1100
> [   98.210032]  loop_workfn+0x2c/0x110
> [   98.213529]  process_one_work+0x3e0/0x648
> [   98.217546]  worker_thread+0x70/0x670
> [   98.221217]  kthread+0x1b8/0x1c0
> [   98.224452]  ret_from_fork+0x10/0x18
>
> where loop_process_work() sets the memcg override to the memcg that
> submitted the IO request, and alloc_page_buffers() sets the override
> to the memcg that instantiated the cache page, which may differ.
>
> Make memalloc_use_memcg() return the old memcg and convert existing
> users to a stacking model. Delete the unused memalloc_unuse_memcg().

Aha, I was thinking about the same appraoch. Looks good to me.
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>

>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  fs/buffer.c                          |  6 ++---
>  fs/notify/fanotify/fanotify.c        |  5 +++--
>  fs/notify/inotify/inotify_fsnotify.c |  5 +++--
>  include/linux/sched/mm.h             | 33 ++++++++++++----------------
>  4 files changed, 23 insertions(+), 26 deletions(-)
>
> diff --git a/fs/buffer.c b/fs/buffer.c
> index d8c7242426bb..54d5df14bd36 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -857,13 +857,13 @@ struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size,
>         struct buffer_head *bh, *head;
>         gfp_t gfp = GFP_NOFS | __GFP_ACCOUNT;
>         long offset;
> -       struct mem_cgroup *memcg;
> +       struct mem_cgroup *memcg, *oldmemcg;
>
>         if (retry)
>                 gfp |= __GFP_NOFAIL;
>
>         memcg = get_mem_cgroup_from_page(page);
> -       memalloc_use_memcg(memcg);
> +       oldmemcg = memalloc_use_memcg(memcg);
>
>         head = NULL;
>         offset = PAGE_SIZE;
> @@ -882,7 +882,7 @@ struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size,
>                 set_bh_page(bh, page, offset);
>         }
>  out:
> -       memalloc_unuse_memcg();
> +       memalloc_use_memcg(oldmemcg);
>         mem_cgroup_put(memcg);
>         return head;
>  /*
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index 5778d1347b35..cb596ad002d8 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -284,6 +284,7 @@ struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
>         struct fanotify_event *event = NULL;
>         gfp_t gfp = GFP_KERNEL_ACCOUNT;
>         struct inode *id = fanotify_fid_inode(inode, mask, data, data_type);
> +       struct mem_cgroup *oldmemcg;
>
>         /*
>          * For queues with unlimited length lost events are not expected and
> @@ -297,7 +298,7 @@ struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
>                 gfp |= __GFP_RETRY_MAYFAIL;
>
>         /* Whoever is interested in the event, pays for the allocation. */
> -       memalloc_use_memcg(group->memcg);
> +       oldmemcg = memalloc_use_memcg(group->memcg);
>
>         if (fanotify_is_perm_event(mask)) {
>                 struct fanotify_perm_event *pevent;
> @@ -334,7 +335,7 @@ init: __maybe_unused
>                 event->path.dentry = NULL;
>         }
>  out:
> -       memalloc_unuse_memcg();
> +       memalloc_use_memcg(oldmemcg);
>         return event;
>  }
>
> diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c
> index d510223d302c..776ce66aaa47 100644
> --- a/fs/notify/inotify/inotify_fsnotify.c
> +++ b/fs/notify/inotify/inotify_fsnotify.c
> @@ -68,6 +68,7 @@ int inotify_handle_event(struct fsnotify_group *group,
>         int ret;
>         int len = 0;
>         int alloc_len = sizeof(struct inotify_event_info);
> +       struct mem_cgroup *oldmemcg;
>
>         if (WARN_ON(fsnotify_iter_vfsmount_mark(iter_info)))
>                 return 0;
> @@ -95,9 +96,9 @@ int inotify_handle_event(struct fsnotify_group *group,
>          * trigger OOM killer in the target monitoring memcg as it may have
>          * security repercussion.
>          */
> -       memalloc_use_memcg(group->memcg);
> +       oldmemcg = memalloc_use_memcg(group->memcg);
>         event = kmalloc(alloc_len, GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL);
> -       memalloc_unuse_memcg();
> +       memalloc_use_memcg(oldmemcg);
>
>         if (unlikely(!event)) {
>                 /*
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index c49257a3b510..ced06c12daf7 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -316,31 +316,26 @@ static inline void memalloc_nocma_restore(unsigned int flags)
>   * __GFP_ACCOUNT allocations till the end of the scope will be charged to the
>   * given memcg.
>   *
> - * NOTE: This function is not nesting safe.
> - */
> -static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
> -{
> -       WARN_ON_ONCE(current->active_memcg);
> -       current->active_memcg = memcg;
> -}
> -
> -/**
> - * memalloc_unuse_memcg - Ends the remote memcg charging scope.
> + * NOTE: This function can nest. Users must save the return value and
> + * reset the previous value after their own charging scope is over:
>   *
> - * This function marks the end of the remote memcg charging scope started by
> - * memalloc_use_memcg().
> + *      old = memalloc_use_memcg(memcg);
> + *      // ... allocations ...
> + *      memalloc_use_memcg(old);
>   */
> -static inline void memalloc_unuse_memcg(void)
> +static inline struct mem_cgroup *__must_check
> +memalloc_use_memcg(struct mem_cgroup *memcg)
>  {
> -       current->active_memcg = NULL;
> +       struct mem_cgroup *old = current->active_memcg;
> +
> +       current->active_memcg = memcg;
> +       return old;
>  }
>  #else
> -static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
> -{
> -}
> -
> -static inline void memalloc_unuse_memcg(void)
> +static inline struct mem_cgroup *__must_check
> +memalloc_use_memcg(struct mem_cgroup *memcg)
>  {
> +       return NULL;
>  }
>  #endif
>
> --
> 2.24.1
>
>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs
  2020-03-03 20:33         ` Johannes Weiner
@ 2020-03-03 20:59           ` Yang Shi
  0 siblings, 0 replies; 19+ messages in thread
From: Yang Shi @ 2020-03-03 20:59 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Shakeel Butt, Tetsuo Handa, Naresh Kamboju, linux-mm,
	Andrew Morton, Mel Gorman, Michal Hocko, Dan Schatzberg

On Tue, Mar 3, 2020 at 12:33 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Tue, Mar 03, 2020 at 11:42:31AM -0800, Yang Shi wrote:
> > On Tue, Mar 3, 2020 at 10:15 AM Shakeel Butt <shakeelb@google.com> wrote:
> > >
> > > On Tue, Mar 3, 2020 at 9:47 AM Yang Shi <shy828301@gmail.com> wrote:
> > > >
> > > > On Tue, Mar 3, 2020 at 2:53 AM Tetsuo Handa
> > > > <penguin-kernel@i-love.sakura.ne.jp> wrote:
> > > > >
> > > > > Hello, Naresh.
> > > > >
> > > > > > [   98.003346] WARNING: CPU: 2 PID: 340 at
> > > > > > include/linux/sched/mm.h:323 alloc_page_buffers+0x210/0x288
> > > > >
> > > > > This is
> > > > >
> > > > > /**
> > > > >  * memalloc_use_memcg - Starts the remote memcg charging scope.
> > > > >  * @memcg: memcg to charge.
> > > > >  *
> > > > >  * This function marks the beginning of the remote memcg charging scope. All the
> > > > >  * __GFP_ACCOUNT allocations till the end of the scope will be charged to the
> > > > >  * given memcg.
> > > > >  *
> > > > >  * NOTE: This function is not nesting safe.
> > > > >  */
> > > > > static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
> > > > > {
> > > > >         WARN_ON_ONCE(current->active_memcg);
> > > > >         current->active_memcg = memcg;
> > > > > }
> > > > >
> > > > > which is about memcg. Redirecting to linux-mm.
> > > >
> > > > Isn't this triggered by ("loop: use worker per cgroup instead of
> > > > kworker") in linux-next, which converted loop driver to use worker per
> > > > cgroup, so it may have multiple workers work at the mean time?
> > > >
> > > > So they may share the same "current", then it may cause kind of nested
> > > > call to memalloc_use_memcg().
> > > >
> > > > Could you please try the below debug patch? This is not the proper
> > > > fix, but it may help us narrow down the problem.
> > > >
> > > > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> > > > index c49257a..1cc1cdc 100644
> > > > --- a/include/linux/sched/mm.h
> > > > +++ b/include/linux/sched/mm.h
> > > > @@ -320,6 +320,10 @@ static inline void
> > > > memalloc_nocma_restore(unsigned int flags)
> > > >   */
> > > >  static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
> > > >  {
> > > > +       if ((current->flags & PF_KTHREAD) &&
> > > > +            current->active_memcg)
> > > > +               return;
> > > > +
> > > >         WARN_ON_ONCE(current->active_memcg);
> > > >         current->active_memcg = memcg;
> > > >  }
> > > >
> > >
> > > Maybe it's time to make memalloc_use_memcg() nesting safe.
> >
> > Need handle the below case:
> >
> >             CPU A                                          CPU B
> > memalloc_use_memcg
> >                                                      memalloc_use_memcg
> > memalloc_unuse_memcg
> >                                                      memalloc_unuse_memcg
> >
> >
> > They may manipulate the same task->active_memcg, so CPU B may still
> > see wrong memcg, and the last call to memalloc_unuse_memcg() on CPU B
> > may not restore active_memcg to NULL. And, some code depends on
> > correct active_memcg.
>
> It's safe because it's only `current` updating a private pointer -
> nobody is changing active_memcg of a different task. And a task cannot
> run on more than one CPU simultaneously.

Yes, you are correct.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs
  2020-03-03 20:40         ` Shakeel Butt
@ 2020-03-03 21:06           ` Johannes Weiner
  2020-03-03 23:22             ` Shakeel Butt
  0 siblings, 1 reply; 19+ messages in thread
From: Johannes Weiner @ 2020-03-03 21:06 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Yang Shi, Tetsuo Handa, Naresh Kamboju, linux-mm, Andrew Morton,
	Mel Gorman, Michal Hocko, Dan Schatzberg

On Tue, Mar 03, 2020 at 12:40:33PM -0800, Shakeel Butt wrote:
> On Tue, Mar 3, 2020 at 12:26 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > From e0e5ace069af5a36e41eafe3bf21a67966127c04 Mon Sep 17 00:00:00 2001
> > From: Johannes Weiner <hannes@cmpxchg.org>
> > Date: Tue, 3 Mar 2020 15:15:39 -0500
> > Subject: [PATCH] mm: support nesting memalloc_use_memcg()
> >
> > The memalloc_use_memcg() function to override the default memcg
> > accounting context currently doesn't nest. But the patches to make the
> > loop driver cgroup-aware will end up nesting:
> >
> > [   98.137605]  alloc_page_buffers+0x210/0x288
> > [   98.141799]  __getblk_gfp+0x1d4/0x400
> > [   98.145475]  ext4_read_block_bitmap_nowait+0x148/0xbc8
> > [   98.150628]  ext4_mb_init_cache+0x25c/0x9b0
> > [   98.154821]  ext4_mb_init_group+0x270/0x390
> > [   98.159014]  ext4_mb_good_group+0x264/0x270
> > [   98.163208]  ext4_mb_regular_allocator+0x480/0x798
> > [   98.168011]  ext4_mb_new_blocks+0x958/0x10f8
> > [   98.172294]  ext4_ext_map_blocks+0xec8/0x1618
> > [   98.176660]  ext4_map_blocks+0x1b8/0x8a0
> > [   98.180592]  ext4_writepages+0x830/0xf10
> > [   98.184523]  do_writepages+0xb4/0x198
> > [   98.188195]  __filemap_fdatawrite_range+0x170/0x1c8
> > [   98.193086]  filemap_write_and_wait_range+0x40/0xb0
> > [   98.197974]  ext4_punch_hole+0x4a4/0x660
> > [   98.201907]  ext4_fallocate+0x294/0x1190
> > [   98.205839]  loop_process_work+0x690/0x1100
> > [   98.210032]  loop_workfn+0x2c/0x110
> > [   98.213529]  process_one_work+0x3e0/0x648
> > [   98.217546]  worker_thread+0x70/0x670
> > [   98.221217]  kthread+0x1b8/0x1c0
> > [   98.224452]  ret_from_fork+0x10/0x18
> >
> > where loop_process_work() sets the memcg override to the memcg that
> > submitted the IO request, and alloc_page_buffers() sets the override
> > to the memcg that instantiated the cache page, which may differ.
> >
> > Make memalloc_use_memcg() return the old memcg and convert existing
> > users to a stacking model. Delete the unused memalloc_unuse_memcg().
> >
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> 
> Reviewed-by: Shakeel Butt <shakeelb@google.com>

Thanks Shakeel

> > @@ -316,31 +316,26 @@ static inline void memalloc_nocma_restore(unsigned int flags)
> >   * __GFP_ACCOUNT allocations till the end of the scope will be charged to the
> >   * given memcg.
> >   *
> > - * NOTE: This function is not nesting safe.
> > - */
> > -static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
> > -{
> > -       WARN_ON_ONCE(current->active_memcg);
> > -       current->active_memcg = memcg;
> > -}
> > -
> > -/**
> > - * memalloc_unuse_memcg - Ends the remote memcg charging scope.
> > + * NOTE: This function can nest. Users must save the return value and
> > + * reset the previous value after their own charging scope is over:
> 
> Should we mention that this is still not irq safe? At the moment other
> than skmem we don't do memcg charging in the interrupt and skmem has a
> memcg already associated with it. Maybe in future there might be a
> case where we want a specific memcg be charged for kmem in the
> interrupt context. Until then we can mark that this function should
> not be used in interrupt.

Is it actually unsafe? It's not an RMW operation, being interrupted
doesn't corrupt its state.

I.e. this is safe:

  process:                            interrupt:
  old = current->active_memcg
                                      old = current->active_memcg
                                      current->active_memcg = new
                                      allocate
                                      current->active_memcg = old
  current->active_memcg = new
  return old

This is safe as well:

  process:                            interrupt:
  old = current->active_memcg
  current->active_memcg = new
                                      old = current->active_memcg
                                      current->active_memcg = new
                                      allocate
                                      current->active_memcg = old
  return old


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs
  2020-03-03 21:06           ` Johannes Weiner
@ 2020-03-03 23:22             ` Shakeel Butt
  2020-03-04  0:29               ` Andrew Morton
  0 siblings, 1 reply; 19+ messages in thread
From: Shakeel Butt @ 2020-03-03 23:22 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Yang Shi, Tetsuo Handa, Naresh Kamboju, linux-mm, Andrew Morton,
	Mel Gorman, Michal Hocko, Dan Schatzberg

On Tue, Mar 3, 2020 at 1:06 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Tue, Mar 03, 2020 at 12:40:33PM -0800, Shakeel Butt wrote:
> > On Tue, Mar 3, 2020 at 12:26 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > From e0e5ace069af5a36e41eafe3bf21a67966127c04 Mon Sep 17 00:00:00 2001
> > > From: Johannes Weiner <hannes@cmpxchg.org>
> > > Date: Tue, 3 Mar 2020 15:15:39 -0500
> > > Subject: [PATCH] mm: support nesting memalloc_use_memcg()
> > >
> > > The memalloc_use_memcg() function to override the default memcg
> > > accounting context currently doesn't nest. But the patches to make the
> > > loop driver cgroup-aware will end up nesting:
> > >
> > > [   98.137605]  alloc_page_buffers+0x210/0x288
> > > [   98.141799]  __getblk_gfp+0x1d4/0x400
> > > [   98.145475]  ext4_read_block_bitmap_nowait+0x148/0xbc8
> > > [   98.150628]  ext4_mb_init_cache+0x25c/0x9b0
> > > [   98.154821]  ext4_mb_init_group+0x270/0x390
> > > [   98.159014]  ext4_mb_good_group+0x264/0x270
> > > [   98.163208]  ext4_mb_regular_allocator+0x480/0x798
> > > [   98.168011]  ext4_mb_new_blocks+0x958/0x10f8
> > > [   98.172294]  ext4_ext_map_blocks+0xec8/0x1618
> > > [   98.176660]  ext4_map_blocks+0x1b8/0x8a0
> > > [   98.180592]  ext4_writepages+0x830/0xf10
> > > [   98.184523]  do_writepages+0xb4/0x198
> > > [   98.188195]  __filemap_fdatawrite_range+0x170/0x1c8
> > > [   98.193086]  filemap_write_and_wait_range+0x40/0xb0
> > > [   98.197974]  ext4_punch_hole+0x4a4/0x660
> > > [   98.201907]  ext4_fallocate+0x294/0x1190
> > > [   98.205839]  loop_process_work+0x690/0x1100
> > > [   98.210032]  loop_workfn+0x2c/0x110
> > > [   98.213529]  process_one_work+0x3e0/0x648
> > > [   98.217546]  worker_thread+0x70/0x670
> > > [   98.221217]  kthread+0x1b8/0x1c0
> > > [   98.224452]  ret_from_fork+0x10/0x18
> > >
> > > where loop_process_work() sets the memcg override to the memcg that
> > > submitted the IO request, and alloc_page_buffers() sets the override
> > > to the memcg that instantiated the cache page, which may differ.
> > >
> > > Make memalloc_use_memcg() return the old memcg and convert existing
> > > users to a stacking model. Delete the unused memalloc_unuse_memcg().
> > >
> > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> >
> > Reviewed-by: Shakeel Butt <shakeelb@google.com>
>
> Thanks Shakeel
>
> > > @@ -316,31 +316,26 @@ static inline void memalloc_nocma_restore(unsigned int flags)
> > >   * __GFP_ACCOUNT allocations till the end of the scope will be charged to the
> > >   * given memcg.
> > >   *
> > > - * NOTE: This function is not nesting safe.
> > > - */
> > > -static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
> > > -{
> > > -       WARN_ON_ONCE(current->active_memcg);
> > > -       current->active_memcg = memcg;
> > > -}
> > > -
> > > -/**
> > > - * memalloc_unuse_memcg - Ends the remote memcg charging scope.
> > > + * NOTE: This function can nest. Users must save the return value and
> > > + * reset the previous value after their own charging scope is over:
> >
> > Should we mention that this is still not irq safe? At the moment other
> > than skmem we don't do memcg charging in the interrupt and skmem has a
> > memcg already associated with it. Maybe in future there might be a
> > case where we want a specific memcg be charged for kmem in the
> > interrupt context. Until then we can mark that this function should
> > not be used in interrupt.
>
> Is it actually unsafe? It's not an RMW operation, being interrupted
> doesn't corrupt its state.
>
> I.e. this is safe:
>
>   process:                            interrupt:
>   old = current->active_memcg
>                                       old = current->active_memcg
>                                       current->active_memcg = new
>                                       allocate
>                                       current->active_memcg = old
>   current->active_memcg = new
>   return old
>
> This is safe as well:
>
>   process:                            interrupt:
>   old = current->active_memcg
>   current->active_memcg = new
>                                       old = current->active_memcg
>                                       current->active_memcg = new
>                                       allocate
>                                       current->active_memcg = old
>   return old

Yes, you are right. Thanks for the explanation.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs
  2020-03-03 23:22             ` Shakeel Butt
@ 2020-03-04  0:29               ` Andrew Morton
  2020-04-20 16:41                 ` Shakeel Butt
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2020-03-04  0:29 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Johannes Weiner, Yang Shi, Tetsuo Handa, Naresh Kamboju,
	linux-mm, Mel Gorman, Michal Hocko, Dan Schatzberg

On Tue, 3 Mar 2020 15:22:00 -0800 Shakeel Butt <shakeelb@google.com> wrote:

> > doesn't corrupt its state.
> >
> > I.e. this is safe:
> >
> >   process:                            interrupt:
> >   old = current->active_memcg
> >                                       old = current->active_memcg
> >                                       current->active_memcg = new
> >                                       allocate
> >                                       current->active_memcg = old
> >   current->active_memcg = new
> >   return old
> >
> > This is safe as well:
> >
> >   process:                            interrupt:
> >   old = current->active_memcg
> >   current->active_memcg = new
> >                                       old = current->active_memcg
> >                                       current->active_memcg = new
> >                                       allocate
> >                                       current->active_memcg = old
> >   return old
> 
> Yes, you are right. Thanks for the explanation.

Thanks, all.

Dan, I dropped the three-patch series

loop-use-worker-per-cgroup-instead-of-kworker.patch
mm-charge-active-memcg-when-no-mm-is-set.patch
loop-charge-i-o-to-mem-and-blk-cg.patch

When convenient, could you please prepend Johannes' patch to the
series, retest and resend?  While doing so, please figure out why you
didn't also see this warning in your testing and make the appropriate
changes!

Thanks.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs
  2020-03-04  0:29               ` Andrew Morton
@ 2020-04-20 16:41                 ` Shakeel Butt
  2020-04-20 22:45                   ` Dan Schatzberg
  0 siblings, 1 reply; 19+ messages in thread
From: Shakeel Butt @ 2020-04-20 16:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Yang Shi, Tetsuo Handa, Naresh Kamboju,
	linux-mm, Mel Gorman, Michal Hocko, Dan Schatzberg

On Tue, Mar 3, 2020 at 4:29 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Tue, 3 Mar 2020 15:22:00 -0800 Shakeel Butt <shakeelb@google.com> wrote:
>
> > > doesn't corrupt its state.
> > >
> > > I.e. this is safe:
> > >
> > >   process:                            interrupt:
> > >   old = current->active_memcg
> > >                                       old = current->active_memcg
> > >                                       current->active_memcg = new
> > >                                       allocate
> > >                                       current->active_memcg = old
> > >   current->active_memcg = new
> > >   return old
> > >
> > > This is safe as well:
> > >
> > >   process:                            interrupt:
> > >   old = current->active_memcg
> > >   current->active_memcg = new
> > >                                       old = current->active_memcg
> > >                                       current->active_memcg = new
> > >                                       allocate
> > >                                       current->active_memcg = old
> > >   return old
> >
> > Yes, you are right. Thanks for the explanation.
>
> Thanks, all.
>
> Dan, I dropped the three-patch series
>
> loop-use-worker-per-cgroup-instead-of-kworker.patch
> mm-charge-active-memcg-when-no-mm-is-set.patch
> loop-charge-i-o-to-mem-and-blk-cg.patch
>
> When convenient, could you please prepend Johannes' patch to the
> series, retest and resend?  While doing so, please figure out why you
> didn't also see this warning in your testing and make the appropriate
> changes!
>

Hi Dan, did you get the chance to follow up on Andrew's request?

Shakeel


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs
  2020-04-20 16:41                 ` Shakeel Butt
@ 2020-04-20 22:45                   ` Dan Schatzberg
  2020-04-21  5:02                     ` Naresh Kamboju
  0 siblings, 1 reply; 19+ messages in thread
From: Dan Schatzberg @ 2020-04-20 22:45 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Andrew Morton, Johannes Weiner, Yang Shi, Tetsuo Handa,
	Naresh Kamboju, linux-mm, Mel Gorman, Michal Hocko

On Mon, Apr 20, 2020 at 09:41:33AM -0700, Shakeel Butt wrote:
> On Tue, Mar 3, 2020 at 4:29 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > On Tue, 3 Mar 2020 15:22:00 -0800 Shakeel Butt <shakeelb@google.com> wrote:
> >
> > > > doesn't corrupt its state.
> > > >
> > > > I.e. this is safe:
> > > >
> > > >   process:                            interrupt:
> > > >   old = current->active_memcg
> > > >                                       old = current->active_memcg
> > > >                                       current->active_memcg = new
> > > >                                       allocate
> > > >                                       current->active_memcg = old
> > > >   current->active_memcg = new
> > > >   return old
> > > >
> > > > This is safe as well:
> > > >
> > > >   process:                            interrupt:
> > > >   old = current->active_memcg
> > > >   current->active_memcg = new
> > > >                                       old = current->active_memcg
> > > >                                       current->active_memcg = new
> > > >                                       allocate
> > > >                                       current->active_memcg = old
> > > >   return old
> > >
> > > Yes, you are right. Thanks for the explanation.
> >
> > Thanks, all.
> >
> > Dan, I dropped the three-patch series
> >
> > loop-use-worker-per-cgroup-instead-of-kworker.patch
> > mm-charge-active-memcg-when-no-mm-is-set.patch
> > loop-charge-i-o-to-mem-and-blk-cg.patch
> >
> > When convenient, could you please prepend Johannes' patch to the
> > series, retest and resend?  While doing so, please figure out why you
> > didn't also see this warning in your testing and make the appropriate
> > changes!
> >
> 
> Hi Dan, did you get the chance to follow up on Andrew's request?
> 
> Shakeel

Just sent out a new patch series


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs
  2020-04-20 22:45                   ` Dan Schatzberg
@ 2020-04-21  5:02                     ` Naresh Kamboju
  0 siblings, 0 replies; 19+ messages in thread
From: Naresh Kamboju @ 2020-04-21  5:02 UTC (permalink / raw)
  To: Dan Schatzberg
  Cc: Shakeel Butt, Andrew Morton, Johannes Weiner, Yang Shi,
	Tetsuo Handa, linux-mm, Mel Gorman, Michal Hocko

Dan,

>
> Just sent out a new patch series

Please add reported by if you feel it is appropriate to your patch series.

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>

- Naresh


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2020-04-21  5:02 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CA+G9fYs==eMEmY_OpdhyCHO_1Z5f_M8CAQQTh-AOf5xAvBHKAQ@mail.gmail.com>
2020-03-03 10:52 ` fs/buffer.c: WARNING: alloc_page_buffers while mke2fs Tetsuo Handa
2020-03-03 17:47   ` Yang Shi
2020-03-03 18:14     ` Shakeel Butt
2020-03-03 18:34       ` Yang Shi
2020-03-03 19:42       ` Yang Shi
2020-03-03 20:26         ` Shakeel Butt
2020-03-03 20:33         ` Johannes Weiner
2020-03-03 20:59           ` Yang Shi
2020-03-03 20:26       ` Johannes Weiner
2020-03-03 20:40         ` Shakeel Butt
2020-03-03 21:06           ` Johannes Weiner
2020-03-03 23:22             ` Shakeel Butt
2020-03-04  0:29               ` Andrew Morton
2020-04-20 16:41                 ` Shakeel Butt
2020-04-20 22:45                   ` Dan Schatzberg
2020-04-21  5:02                     ` Naresh Kamboju
2020-03-03 20:57         ` Yang Shi
2020-03-03 18:40     ` Naresh Kamboju
2020-03-03 19:04       ` Yang Shi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).