* Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs [not found] <CA+G9fYs==eMEmY_OpdhyCHO_1Z5f_M8CAQQTh-AOf5xAvBHKAQ@mail.gmail.com> @ 2020-03-03 10:52 ` Tetsuo Handa 2020-03-03 17:47 ` Yang Shi 0 siblings, 1 reply; 19+ messages in thread From: Tetsuo Handa @ 2020-03-03 10:52 UTC (permalink / raw) To: Naresh Kamboju, linux-mm; +Cc: Andrew Morton, Mel Gorman, Michal Hocko Hello, Naresh. > [ 98.003346] WARNING: CPU: 2 PID: 340 at > include/linux/sched/mm.h:323 alloc_page_buffers+0x210/0x288 This is /** * memalloc_use_memcg - Starts the remote memcg charging scope. * @memcg: memcg to charge. * * This function marks the beginning of the remote memcg charging scope. All the * __GFP_ACCOUNT allocations till the end of the scope will be charged to the * given memcg. * * NOTE: This function is not nesting safe. */ static inline void memalloc_use_memcg(struct mem_cgroup *memcg) { WARN_ON_ONCE(current->active_memcg); current->active_memcg = memcg; } which is about memcg. Redirecting to linux-mm. Please include exact backtrace extracted from scripts/faddr2line . ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs 2020-03-03 10:52 ` fs/buffer.c: WARNING: alloc_page_buffers while mke2fs Tetsuo Handa @ 2020-03-03 17:47 ` Yang Shi 2020-03-03 18:14 ` Shakeel Butt 2020-03-03 18:40 ` Naresh Kamboju 0 siblings, 2 replies; 19+ messages in thread From: Yang Shi @ 2020-03-03 17:47 UTC (permalink / raw) To: Tetsuo Handa Cc: Naresh Kamboju, linux-mm, Andrew Morton, Mel Gorman, Michal Hocko, Shakeel Butt, schatzberg.dan On Tue, Mar 3, 2020 at 2:53 AM Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: > > Hello, Naresh. > > > [ 98.003346] WARNING: CPU: 2 PID: 340 at > > include/linux/sched/mm.h:323 alloc_page_buffers+0x210/0x288 > > This is > > /** > * memalloc_use_memcg - Starts the remote memcg charging scope. > * @memcg: memcg to charge. > * > * This function marks the beginning of the remote memcg charging scope. All the > * __GFP_ACCOUNT allocations till the end of the scope will be charged to the > * given memcg. > * > * NOTE: This function is not nesting safe. > */ > static inline void memalloc_use_memcg(struct mem_cgroup *memcg) > { > WARN_ON_ONCE(current->active_memcg); > current->active_memcg = memcg; > } > > which is about memcg. Redirecting to linux-mm. Isn't this triggered by ("loop: use worker per cgroup instead of kworker") in linux-next, which converted loop driver to use worker per cgroup, so it may have multiple workers work at the mean time? So they may share the same "current", then it may cause kind of nested call to memalloc_use_memcg(). Could you please try the below debug patch? This is not the proper fix, but it may help us narrow down the problem. diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h index c49257a..1cc1cdc 100644 --- a/include/linux/sched/mm.h +++ b/include/linux/sched/mm.h @@ -320,6 +320,10 @@ static inline void memalloc_nocma_restore(unsigned int flags) */ static inline void memalloc_use_memcg(struct mem_cgroup *memcg) { + if ((current->flags & PF_KTHREAD) && + current->active_memcg) + return; + WARN_ON_ONCE(current->active_memcg); current->active_memcg = memcg; } > > Please include exact backtrace extracted from scripts/faddr2line . > > ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs 2020-03-03 17:47 ` Yang Shi @ 2020-03-03 18:14 ` Shakeel Butt 2020-03-03 18:34 ` Yang Shi ` (2 more replies) 2020-03-03 18:40 ` Naresh Kamboju 1 sibling, 3 replies; 19+ messages in thread From: Shakeel Butt @ 2020-03-03 18:14 UTC (permalink / raw) To: Yang Shi Cc: Tetsuo Handa, Naresh Kamboju, linux-mm, Andrew Morton, Mel Gorman, Michal Hocko, Dan Schatzberg, Johannes Weiner On Tue, Mar 3, 2020 at 9:47 AM Yang Shi <shy828301@gmail.com> wrote: > > On Tue, Mar 3, 2020 at 2:53 AM Tetsuo Handa > <penguin-kernel@i-love.sakura.ne.jp> wrote: > > > > Hello, Naresh. > > > > > [ 98.003346] WARNING: CPU: 2 PID: 340 at > > > include/linux/sched/mm.h:323 alloc_page_buffers+0x210/0x288 > > > > This is > > > > /** > > * memalloc_use_memcg - Starts the remote memcg charging scope. > > * @memcg: memcg to charge. > > * > > * This function marks the beginning of the remote memcg charging scope. All the > > * __GFP_ACCOUNT allocations till the end of the scope will be charged to the > > * given memcg. > > * > > * NOTE: This function is not nesting safe. > > */ > > static inline void memalloc_use_memcg(struct mem_cgroup *memcg) > > { > > WARN_ON_ONCE(current->active_memcg); > > current->active_memcg = memcg; > > } > > > > which is about memcg. Redirecting to linux-mm. > > Isn't this triggered by ("loop: use worker per cgroup instead of > kworker") in linux-next, which converted loop driver to use worker per > cgroup, so it may have multiple workers work at the mean time? > > So they may share the same "current", then it may cause kind of nested > call to memalloc_use_memcg(). > > Could you please try the below debug patch? This is not the proper > fix, but it may help us narrow down the problem. > > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h > index c49257a..1cc1cdc 100644 > --- a/include/linux/sched/mm.h > +++ b/include/linux/sched/mm.h > @@ -320,6 +320,10 @@ static inline void > memalloc_nocma_restore(unsigned int flags) > */ > static inline void memalloc_use_memcg(struct mem_cgroup *memcg) > { > + if ((current->flags & PF_KTHREAD) && > + current->active_memcg) > + return; > + > WARN_ON_ONCE(current->active_memcg); > current->active_memcg = memcg; > } > Maybe it's time to make memalloc_use_memcg() nesting safe. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs 2020-03-03 18:14 ` Shakeel Butt @ 2020-03-03 18:34 ` Yang Shi 2020-03-03 19:42 ` Yang Shi 2020-03-03 20:26 ` Johannes Weiner 2 siblings, 0 replies; 19+ messages in thread From: Yang Shi @ 2020-03-03 18:34 UTC (permalink / raw) To: Shakeel Butt Cc: Tetsuo Handa, Naresh Kamboju, linux-mm, Andrew Morton, Mel Gorman, Michal Hocko, Dan Schatzberg, Johannes Weiner On Tue, Mar 3, 2020 at 10:15 AM Shakeel Butt <shakeelb@google.com> wrote: > > On Tue, Mar 3, 2020 at 9:47 AM Yang Shi <shy828301@gmail.com> wrote: > > > > On Tue, Mar 3, 2020 at 2:53 AM Tetsuo Handa > > <penguin-kernel@i-love.sakura.ne.jp> wrote: > > > > > > Hello, Naresh. > > > > > > > [ 98.003346] WARNING: CPU: 2 PID: 340 at > > > > include/linux/sched/mm.h:323 alloc_page_buffers+0x210/0x288 > > > > > > This is > > > > > > /** > > > * memalloc_use_memcg - Starts the remote memcg charging scope. > > > * @memcg: memcg to charge. > > > * > > > * This function marks the beginning of the remote memcg charging scope. All the > > > * __GFP_ACCOUNT allocations till the end of the scope will be charged to the > > > * given memcg. > > > * > > > * NOTE: This function is not nesting safe. > > > */ > > > static inline void memalloc_use_memcg(struct mem_cgroup *memcg) > > > { > > > WARN_ON_ONCE(current->active_memcg); > > > current->active_memcg = memcg; > > > } > > > > > > which is about memcg. Redirecting to linux-mm. > > > > Isn't this triggered by ("loop: use worker per cgroup instead of > > kworker") in linux-next, which converted loop driver to use worker per > > cgroup, so it may have multiple workers work at the mean time? > > > > So they may share the same "current", then it may cause kind of nested > > call to memalloc_use_memcg(). > > > > Could you please try the below debug patch? This is not the proper > > fix, but it may help us narrow down the problem. > > > > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h > > index c49257a..1cc1cdc 100644 > > --- a/include/linux/sched/mm.h > > +++ b/include/linux/sched/mm.h > > @@ -320,6 +320,10 @@ static inline void > > memalloc_nocma_restore(unsigned int flags) > > */ > > static inline void memalloc_use_memcg(struct mem_cgroup *memcg) > > { > > + if ((current->flags & PF_KTHREAD) && > > + current->active_memcg) > > + return; > > + > > WARN_ON_ONCE(current->active_memcg); > > current->active_memcg = memcg; > > } > > > > Maybe it's time to make memalloc_use_memcg() nesting safe. That would be preferred. Maybe like: static inline void memalloc_use_memcg(struct mem_cgroup *memcg, struct mem_cgroup *saved) ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs 2020-03-03 18:14 ` Shakeel Butt 2020-03-03 18:34 ` Yang Shi @ 2020-03-03 19:42 ` Yang Shi 2020-03-03 20:26 ` Shakeel Butt 2020-03-03 20:33 ` Johannes Weiner 2020-03-03 20:26 ` Johannes Weiner 2 siblings, 2 replies; 19+ messages in thread From: Yang Shi @ 2020-03-03 19:42 UTC (permalink / raw) To: Shakeel Butt Cc: Tetsuo Handa, Naresh Kamboju, linux-mm, Andrew Morton, Mel Gorman, Michal Hocko, Dan Schatzberg, Johannes Weiner On Tue, Mar 3, 2020 at 10:15 AM Shakeel Butt <shakeelb@google.com> wrote: > > On Tue, Mar 3, 2020 at 9:47 AM Yang Shi <shy828301@gmail.com> wrote: > > > > On Tue, Mar 3, 2020 at 2:53 AM Tetsuo Handa > > <penguin-kernel@i-love.sakura.ne.jp> wrote: > > > > > > Hello, Naresh. > > > > > > > [ 98.003346] WARNING: CPU: 2 PID: 340 at > > > > include/linux/sched/mm.h:323 alloc_page_buffers+0x210/0x288 > > > > > > This is > > > > > > /** > > > * memalloc_use_memcg - Starts the remote memcg charging scope. > > > * @memcg: memcg to charge. > > > * > > > * This function marks the beginning of the remote memcg charging scope. All the > > > * __GFP_ACCOUNT allocations till the end of the scope will be charged to the > > > * given memcg. > > > * > > > * NOTE: This function is not nesting safe. > > > */ > > > static inline void memalloc_use_memcg(struct mem_cgroup *memcg) > > > { > > > WARN_ON_ONCE(current->active_memcg); > > > current->active_memcg = memcg; > > > } > > > > > > which is about memcg. Redirecting to linux-mm. > > > > Isn't this triggered by ("loop: use worker per cgroup instead of > > kworker") in linux-next, which converted loop driver to use worker per > > cgroup, so it may have multiple workers work at the mean time? > > > > So they may share the same "current", then it may cause kind of nested > > call to memalloc_use_memcg(). > > > > Could you please try the below debug patch? This is not the proper > > fix, but it may help us narrow down the problem. > > > > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h > > index c49257a..1cc1cdc 100644 > > --- a/include/linux/sched/mm.h > > +++ b/include/linux/sched/mm.h > > @@ -320,6 +320,10 @@ static inline void > > memalloc_nocma_restore(unsigned int flags) > > */ > > static inline void memalloc_use_memcg(struct mem_cgroup *memcg) > > { > > + if ((current->flags & PF_KTHREAD) && > > + current->active_memcg) > > + return; > > + > > WARN_ON_ONCE(current->active_memcg); > > current->active_memcg = memcg; > > } > > > > Maybe it's time to make memalloc_use_memcg() nesting safe. Need handle the below case: CPU A CPU B memalloc_use_memcg memalloc_use_memcg memalloc_unuse_memcg memalloc_unuse_memcg They may manipulate the same task->active_memcg, so CPU B may still see wrong memcg, and the last call to memalloc_unuse_memcg() on CPU B may not restore active_memcg to NULL. And, some code depends on correct active_memcg. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs 2020-03-03 19:42 ` Yang Shi @ 2020-03-03 20:26 ` Shakeel Butt 2020-03-03 20:33 ` Johannes Weiner 1 sibling, 0 replies; 19+ messages in thread From: Shakeel Butt @ 2020-03-03 20:26 UTC (permalink / raw) To: Yang Shi Cc: Tetsuo Handa, Naresh Kamboju, linux-mm, Andrew Morton, Mel Gorman, Michal Hocko, Dan Schatzberg, Johannes Weiner On Tue, Mar 3, 2020 at 11:42 AM Yang Shi <shy828301@gmail.com> wrote: > > On Tue, Mar 3, 2020 at 10:15 AM Shakeel Butt <shakeelb@google.com> wrote: > > > > On Tue, Mar 3, 2020 at 9:47 AM Yang Shi <shy828301@gmail.com> wrote: > > > > > > On Tue, Mar 3, 2020 at 2:53 AM Tetsuo Handa > > > <penguin-kernel@i-love.sakura.ne.jp> wrote: > > > > > > > > Hello, Naresh. > > > > > > > > > [ 98.003346] WARNING: CPU: 2 PID: 340 at > > > > > include/linux/sched/mm.h:323 alloc_page_buffers+0x210/0x288 > > > > > > > > This is > > > > > > > > /** > > > > * memalloc_use_memcg - Starts the remote memcg charging scope. > > > > * @memcg: memcg to charge. > > > > * > > > > * This function marks the beginning of the remote memcg charging scope. All the > > > > * __GFP_ACCOUNT allocations till the end of the scope will be charged to the > > > > * given memcg. > > > > * > > > > * NOTE: This function is not nesting safe. > > > > */ > > > > static inline void memalloc_use_memcg(struct mem_cgroup *memcg) > > > > { > > > > WARN_ON_ONCE(current->active_memcg); > > > > current->active_memcg = memcg; > > > > } > > > > > > > > which is about memcg. Redirecting to linux-mm. > > > > > > Isn't this triggered by ("loop: use worker per cgroup instead of > > > kworker") in linux-next, which converted loop driver to use worker per > > > cgroup, so it may have multiple workers work at the mean time? > > > > > > So they may share the same "current", then it may cause kind of nested > > > call to memalloc_use_memcg(). > > > > > > Could you please try the below debug patch? This is not the proper > > > fix, but it may help us narrow down the problem. > > > > > > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h > > > index c49257a..1cc1cdc 100644 > > > --- a/include/linux/sched/mm.h > > > +++ b/include/linux/sched/mm.h > > > @@ -320,6 +320,10 @@ static inline void > > > memalloc_nocma_restore(unsigned int flags) > > > */ > > > static inline void memalloc_use_memcg(struct mem_cgroup *memcg) > > > { > > > + if ((current->flags & PF_KTHREAD) && > > > + current->active_memcg) > > > + return; > > > + > > > WARN_ON_ONCE(current->active_memcg); > > > current->active_memcg = memcg; > > > } > > > > > > > Maybe it's time to make memalloc_use_memcg() nesting safe. > > Need handle the below case: > > CPU A CPU B > memalloc_use_memcg > memalloc_use_memcg > memalloc_unuse_memcg > memalloc_unuse_memcg > > > They may manipulate the same task->active_memcg, so CPU B may still > see wrong memcg, and the last call to memalloc_unuse_memcg() on CPU B > may not restore active_memcg to NULL. And, some code depends on > correct active_memcg. Sorry for asking a simple question. What does it mean that more than one kworkers share the same 'current'? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs 2020-03-03 19:42 ` Yang Shi 2020-03-03 20:26 ` Shakeel Butt @ 2020-03-03 20:33 ` Johannes Weiner 2020-03-03 20:59 ` Yang Shi 1 sibling, 1 reply; 19+ messages in thread From: Johannes Weiner @ 2020-03-03 20:33 UTC (permalink / raw) To: Yang Shi Cc: Shakeel Butt, Tetsuo Handa, Naresh Kamboju, linux-mm, Andrew Morton, Mel Gorman, Michal Hocko, Dan Schatzberg On Tue, Mar 03, 2020 at 11:42:31AM -0800, Yang Shi wrote: > On Tue, Mar 3, 2020 at 10:15 AM Shakeel Butt <shakeelb@google.com> wrote: > > > > On Tue, Mar 3, 2020 at 9:47 AM Yang Shi <shy828301@gmail.com> wrote: > > > > > > On Tue, Mar 3, 2020 at 2:53 AM Tetsuo Handa > > > <penguin-kernel@i-love.sakura.ne.jp> wrote: > > > > > > > > Hello, Naresh. > > > > > > > > > [ 98.003346] WARNING: CPU: 2 PID: 340 at > > > > > include/linux/sched/mm.h:323 alloc_page_buffers+0x210/0x288 > > > > > > > > This is > > > > > > > > /** > > > > * memalloc_use_memcg - Starts the remote memcg charging scope. > > > > * @memcg: memcg to charge. > > > > * > > > > * This function marks the beginning of the remote memcg charging scope. All the > > > > * __GFP_ACCOUNT allocations till the end of the scope will be charged to the > > > > * given memcg. > > > > * > > > > * NOTE: This function is not nesting safe. > > > > */ > > > > static inline void memalloc_use_memcg(struct mem_cgroup *memcg) > > > > { > > > > WARN_ON_ONCE(current->active_memcg); > > > > current->active_memcg = memcg; > > > > } > > > > > > > > which is about memcg. Redirecting to linux-mm. > > > > > > Isn't this triggered by ("loop: use worker per cgroup instead of > > > kworker") in linux-next, which converted loop driver to use worker per > > > cgroup, so it may have multiple workers work at the mean time? > > > > > > So they may share the same "current", then it may cause kind of nested > > > call to memalloc_use_memcg(). > > > > > > Could you please try the below debug patch? This is not the proper > > > fix, but it may help us narrow down the problem. > > > > > > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h > > > index c49257a..1cc1cdc 100644 > > > --- a/include/linux/sched/mm.h > > > +++ b/include/linux/sched/mm.h > > > @@ -320,6 +320,10 @@ static inline void > > > memalloc_nocma_restore(unsigned int flags) > > > */ > > > static inline void memalloc_use_memcg(struct mem_cgroup *memcg) > > > { > > > + if ((current->flags & PF_KTHREAD) && > > > + current->active_memcg) > > > + return; > > > + > > > WARN_ON_ONCE(current->active_memcg); > > > current->active_memcg = memcg; > > > } > > > > > > > Maybe it's time to make memalloc_use_memcg() nesting safe. > > Need handle the below case: > > CPU A CPU B > memalloc_use_memcg > memalloc_use_memcg > memalloc_unuse_memcg > memalloc_unuse_memcg > > > They may manipulate the same task->active_memcg, so CPU B may still > see wrong memcg, and the last call to memalloc_unuse_memcg() on CPU B > may not restore active_memcg to NULL. And, some code depends on > correct active_memcg. It's safe because it's only `current` updating a private pointer - nobody is changing active_memcg of a different task. And a task cannot run on more than one CPU simultaneously. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs 2020-03-03 20:33 ` Johannes Weiner @ 2020-03-03 20:59 ` Yang Shi 0 siblings, 0 replies; 19+ messages in thread From: Yang Shi @ 2020-03-03 20:59 UTC (permalink / raw) To: Johannes Weiner Cc: Shakeel Butt, Tetsuo Handa, Naresh Kamboju, linux-mm, Andrew Morton, Mel Gorman, Michal Hocko, Dan Schatzberg On Tue, Mar 3, 2020 at 12:33 PM Johannes Weiner <hannes@cmpxchg.org> wrote: > > On Tue, Mar 03, 2020 at 11:42:31AM -0800, Yang Shi wrote: > > On Tue, Mar 3, 2020 at 10:15 AM Shakeel Butt <shakeelb@google.com> wrote: > > > > > > On Tue, Mar 3, 2020 at 9:47 AM Yang Shi <shy828301@gmail.com> wrote: > > > > > > > > On Tue, Mar 3, 2020 at 2:53 AM Tetsuo Handa > > > > <penguin-kernel@i-love.sakura.ne.jp> wrote: > > > > > > > > > > Hello, Naresh. > > > > > > > > > > > [ 98.003346] WARNING: CPU: 2 PID: 340 at > > > > > > include/linux/sched/mm.h:323 alloc_page_buffers+0x210/0x288 > > > > > > > > > > This is > > > > > > > > > > /** > > > > > * memalloc_use_memcg - Starts the remote memcg charging scope. > > > > > * @memcg: memcg to charge. > > > > > * > > > > > * This function marks the beginning of the remote memcg charging scope. All the > > > > > * __GFP_ACCOUNT allocations till the end of the scope will be charged to the > > > > > * given memcg. > > > > > * > > > > > * NOTE: This function is not nesting safe. > > > > > */ > > > > > static inline void memalloc_use_memcg(struct mem_cgroup *memcg) > > > > > { > > > > > WARN_ON_ONCE(current->active_memcg); > > > > > current->active_memcg = memcg; > > > > > } > > > > > > > > > > which is about memcg. Redirecting to linux-mm. > > > > > > > > Isn't this triggered by ("loop: use worker per cgroup instead of > > > > kworker") in linux-next, which converted loop driver to use worker per > > > > cgroup, so it may have multiple workers work at the mean time? > > > > > > > > So they may share the same "current", then it may cause kind of nested > > > > call to memalloc_use_memcg(). > > > > > > > > Could you please try the below debug patch? This is not the proper > > > > fix, but it may help us narrow down the problem. > > > > > > > > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h > > > > index c49257a..1cc1cdc 100644 > > > > --- a/include/linux/sched/mm.h > > > > +++ b/include/linux/sched/mm.h > > > > @@ -320,6 +320,10 @@ static inline void > > > > memalloc_nocma_restore(unsigned int flags) > > > > */ > > > > static inline void memalloc_use_memcg(struct mem_cgroup *memcg) > > > > { > > > > + if ((current->flags & PF_KTHREAD) && > > > > + current->active_memcg) > > > > + return; > > > > + > > > > WARN_ON_ONCE(current->active_memcg); > > > > current->active_memcg = memcg; > > > > } > > > > > > > > > > Maybe it's time to make memalloc_use_memcg() nesting safe. > > > > Need handle the below case: > > > > CPU A CPU B > > memalloc_use_memcg > > memalloc_use_memcg > > memalloc_unuse_memcg > > memalloc_unuse_memcg > > > > > > They may manipulate the same task->active_memcg, so CPU B may still > > see wrong memcg, and the last call to memalloc_unuse_memcg() on CPU B > > may not restore active_memcg to NULL. And, some code depends on > > correct active_memcg. > > It's safe because it's only `current` updating a private pointer - > nobody is changing active_memcg of a different task. And a task cannot > run on more than one CPU simultaneously. Yes, you are correct. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs 2020-03-03 18:14 ` Shakeel Butt 2020-03-03 18:34 ` Yang Shi 2020-03-03 19:42 ` Yang Shi @ 2020-03-03 20:26 ` Johannes Weiner 2020-03-03 20:40 ` Shakeel Butt 2020-03-03 20:57 ` Yang Shi 2 siblings, 2 replies; 19+ messages in thread From: Johannes Weiner @ 2020-03-03 20:26 UTC (permalink / raw) To: Shakeel Butt Cc: Yang Shi, Tetsuo Handa, Naresh Kamboju, linux-mm, Andrew Morton, Mel Gorman, Michal Hocko, Dan Schatzberg On Tue, Mar 03, 2020 at 10:14:49AM -0800, Shakeel Butt wrote: > On Tue, Mar 3, 2020 at 9:47 AM Yang Shi <shy828301@gmail.com> wrote: > > > > On Tue, Mar 3, 2020 at 2:53 AM Tetsuo Handa > > <penguin-kernel@i-love.sakura.ne.jp> wrote: > > > > > > Hello, Naresh. > > > > > > > [ 98.003346] WARNING: CPU: 2 PID: 340 at > > > > include/linux/sched/mm.h:323 alloc_page_buffers+0x210/0x288 > > > > > > This is > > > > > > /** > > > * memalloc_use_memcg - Starts the remote memcg charging scope. > > > * @memcg: memcg to charge. > > > * > > > * This function marks the beginning of the remote memcg charging scope. All the > > > * __GFP_ACCOUNT allocations till the end of the scope will be charged to the > > > * given memcg. > > > * > > > * NOTE: This function is not nesting safe. > > > */ > > > static inline void memalloc_use_memcg(struct mem_cgroup *memcg) > > > { > > > WARN_ON_ONCE(current->active_memcg); > > > current->active_memcg = memcg; > > > } > > > > > > which is about memcg. Redirecting to linux-mm. > > > > Isn't this triggered by ("loop: use worker per cgroup instead of > > kworker") in linux-next, which converted loop driver to use worker per > > cgroup, so it may have multiple workers work at the mean time? > > > > So they may share the same "current", then it may cause kind of nested > > call to memalloc_use_memcg(). > > > > Could you please try the below debug patch? This is not the proper > > fix, but it may help us narrow down the problem. > > > > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h > > index c49257a..1cc1cdc 100644 > > --- a/include/linux/sched/mm.h > > +++ b/include/linux/sched/mm.h > > @@ -320,6 +320,10 @@ static inline void > > memalloc_nocma_restore(unsigned int flags) > > */ > > static inline void memalloc_use_memcg(struct mem_cgroup *memcg) > > { > > + if ((current->flags & PF_KTHREAD) && > > + current->active_memcg) > > + return; > > + > > WARN_ON_ONCE(current->active_memcg); > > current->active_memcg = memcg; > > } > > > > Maybe it's time to make memalloc_use_memcg() nesting safe. Yes, I think so. The stack trace: [ 98.137605] alloc_page_buffers+0x210/0x288 [ 98.141799] __getblk_gfp+0x1d4/0x400 [ 98.145475] ext4_read_block_bitmap_nowait+0x148/0xbc8 [ 98.150628] ext4_mb_init_cache+0x25c/0x9b0 [ 98.154821] ext4_mb_init_group+0x270/0x390 [ 98.159014] ext4_mb_good_group+0x264/0x270 [ 98.163208] ext4_mb_regular_allocator+0x480/0x798 [ 98.168011] ext4_mb_new_blocks+0x958/0x10f8 [ 98.172294] ext4_ext_map_blocks+0xec8/0x1618 [ 98.176660] ext4_map_blocks+0x1b8/0x8a0 [ 98.180592] ext4_writepages+0x830/0xf10 [ 98.184523] do_writepages+0xb4/0x198 [ 98.188195] __filemap_fdatawrite_range+0x170/0x1c8 [ 98.193086] filemap_write_and_wait_range+0x40/0xb0 [ 98.197974] ext4_punch_hole+0x4a4/0x660 [ 98.201907] ext4_fallocate+0x294/0x1190 [ 98.205839] loop_process_work+0x690/0x1100 [ 98.210032] loop_workfn+0x2c/0x110 [ 98.213529] process_one_work+0x3e0/0x648 [ 98.217546] worker_thread+0x70/0x670 [ 98.221217] kthread+0x1b8/0x1c0 [ 98.224452] ret_from_fork+0x10/0x18 The loop kworker is instantiating cache pages on behalf of who queued the io request, but if the page already exists, the buffers should be allocated on behalf of who already owns the page. Nesting makes sense. Since the only difference between the use and unuse function is the warn when we nest, we can remove the unuse and do something like: old = memalloc_use_memcg(memcg); memalloc_use_memcg(old); What do you think? Patch below. It should go in before Dan's patches, and they in turn need a small update to save and restore active_memcg. (Since loop's use is from a kworker, it's unlikely that there is an outer scope. But it's probably best to keep this simple and robust.) --- From e0e5ace069af5a36e41eafe3bf21a67966127c04 Mon Sep 17 00:00:00 2001 From: Johannes Weiner <hannes@cmpxchg.org> Date: Tue, 3 Mar 2020 15:15:39 -0500 Subject: [PATCH] mm: support nesting memalloc_use_memcg() The memalloc_use_memcg() function to override the default memcg accounting context currently doesn't nest. But the patches to make the loop driver cgroup-aware will end up nesting: [ 98.137605] alloc_page_buffers+0x210/0x288 [ 98.141799] __getblk_gfp+0x1d4/0x400 [ 98.145475] ext4_read_block_bitmap_nowait+0x148/0xbc8 [ 98.150628] ext4_mb_init_cache+0x25c/0x9b0 [ 98.154821] ext4_mb_init_group+0x270/0x390 [ 98.159014] ext4_mb_good_group+0x264/0x270 [ 98.163208] ext4_mb_regular_allocator+0x480/0x798 [ 98.168011] ext4_mb_new_blocks+0x958/0x10f8 [ 98.172294] ext4_ext_map_blocks+0xec8/0x1618 [ 98.176660] ext4_map_blocks+0x1b8/0x8a0 [ 98.180592] ext4_writepages+0x830/0xf10 [ 98.184523] do_writepages+0xb4/0x198 [ 98.188195] __filemap_fdatawrite_range+0x170/0x1c8 [ 98.193086] filemap_write_and_wait_range+0x40/0xb0 [ 98.197974] ext4_punch_hole+0x4a4/0x660 [ 98.201907] ext4_fallocate+0x294/0x1190 [ 98.205839] loop_process_work+0x690/0x1100 [ 98.210032] loop_workfn+0x2c/0x110 [ 98.213529] process_one_work+0x3e0/0x648 [ 98.217546] worker_thread+0x70/0x670 [ 98.221217] kthread+0x1b8/0x1c0 [ 98.224452] ret_from_fork+0x10/0x18 where loop_process_work() sets the memcg override to the memcg that submitted the IO request, and alloc_page_buffers() sets the override to the memcg that instantiated the cache page, which may differ. Make memalloc_use_memcg() return the old memcg and convert existing users to a stacking model. Delete the unused memalloc_unuse_memcg(). Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> --- fs/buffer.c | 6 ++--- fs/notify/fanotify/fanotify.c | 5 +++-- fs/notify/inotify/inotify_fsnotify.c | 5 +++-- include/linux/sched/mm.h | 33 ++++++++++++---------------- 4 files changed, 23 insertions(+), 26 deletions(-) diff --git a/fs/buffer.c b/fs/buffer.c index d8c7242426bb..54d5df14bd36 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -857,13 +857,13 @@ struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size, struct buffer_head *bh, *head; gfp_t gfp = GFP_NOFS | __GFP_ACCOUNT; long offset; - struct mem_cgroup *memcg; + struct mem_cgroup *memcg, *oldmemcg; if (retry) gfp |= __GFP_NOFAIL; memcg = get_mem_cgroup_from_page(page); - memalloc_use_memcg(memcg); + oldmemcg = memalloc_use_memcg(memcg); head = NULL; offset = PAGE_SIZE; @@ -882,7 +882,7 @@ struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size, set_bh_page(bh, page, offset); } out: - memalloc_unuse_memcg(); + memalloc_use_memcg(oldmemcg); mem_cgroup_put(memcg); return head; /* diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 5778d1347b35..cb596ad002d8 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -284,6 +284,7 @@ struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, struct fanotify_event *event = NULL; gfp_t gfp = GFP_KERNEL_ACCOUNT; struct inode *id = fanotify_fid_inode(inode, mask, data, data_type); + struct mem_cgroup *oldmemcg; /* * For queues with unlimited length lost events are not expected and @@ -297,7 +298,7 @@ struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, gfp |= __GFP_RETRY_MAYFAIL; /* Whoever is interested in the event, pays for the allocation. */ - memalloc_use_memcg(group->memcg); + oldmemcg = memalloc_use_memcg(group->memcg); if (fanotify_is_perm_event(mask)) { struct fanotify_perm_event *pevent; @@ -334,7 +335,7 @@ init: __maybe_unused event->path.dentry = NULL; } out: - memalloc_unuse_memcg(); + memalloc_use_memcg(oldmemcg); return event; } diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c index d510223d302c..776ce66aaa47 100644 --- a/fs/notify/inotify/inotify_fsnotify.c +++ b/fs/notify/inotify/inotify_fsnotify.c @@ -68,6 +68,7 @@ int inotify_handle_event(struct fsnotify_group *group, int ret; int len = 0; int alloc_len = sizeof(struct inotify_event_info); + struct mem_cgroup *oldmemcg; if (WARN_ON(fsnotify_iter_vfsmount_mark(iter_info))) return 0; @@ -95,9 +96,9 @@ int inotify_handle_event(struct fsnotify_group *group, * trigger OOM killer in the target monitoring memcg as it may have * security repercussion. */ - memalloc_use_memcg(group->memcg); + oldmemcg = memalloc_use_memcg(group->memcg); event = kmalloc(alloc_len, GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL); - memalloc_unuse_memcg(); + memalloc_use_memcg(oldmemcg); if (unlikely(!event)) { /* diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h index c49257a3b510..ced06c12daf7 100644 --- a/include/linux/sched/mm.h +++ b/include/linux/sched/mm.h @@ -316,31 +316,26 @@ static inline void memalloc_nocma_restore(unsigned int flags) * __GFP_ACCOUNT allocations till the end of the scope will be charged to the * given memcg. * - * NOTE: This function is not nesting safe. - */ -static inline void memalloc_use_memcg(struct mem_cgroup *memcg) -{ - WARN_ON_ONCE(current->active_memcg); - current->active_memcg = memcg; -} - -/** - * memalloc_unuse_memcg - Ends the remote memcg charging scope. + * NOTE: This function can nest. Users must save the return value and + * reset the previous value after their own charging scope is over: * - * This function marks the end of the remote memcg charging scope started by - * memalloc_use_memcg(). + * old = memalloc_use_memcg(memcg); + * // ... allocations ... + * memalloc_use_memcg(old); */ -static inline void memalloc_unuse_memcg(void) +static inline struct mem_cgroup *__must_check +memalloc_use_memcg(struct mem_cgroup *memcg) { - current->active_memcg = NULL; + struct mem_cgroup *old = current->active_memcg; + + current->active_memcg = memcg; + return old; } #else -static inline void memalloc_use_memcg(struct mem_cgroup *memcg) -{ -} - -static inline void memalloc_unuse_memcg(void) +static inline struct mem_cgroup *__must_check +memalloc_use_memcg(struct mem_cgroup *memcg) { + return NULL; } #endif -- 2.24.1 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs 2020-03-03 20:26 ` Johannes Weiner @ 2020-03-03 20:40 ` Shakeel Butt 2020-03-03 21:06 ` Johannes Weiner 2020-03-03 20:57 ` Yang Shi 1 sibling, 1 reply; 19+ messages in thread From: Shakeel Butt @ 2020-03-03 20:40 UTC (permalink / raw) To: Johannes Weiner Cc: Yang Shi, Tetsuo Handa, Naresh Kamboju, linux-mm, Andrew Morton, Mel Gorman, Michal Hocko, Dan Schatzberg On Tue, Mar 3, 2020 at 12:26 PM Johannes Weiner <hannes@cmpxchg.org> wrote: > > On Tue, Mar 03, 2020 at 10:14:49AM -0800, Shakeel Butt wrote: > > On Tue, Mar 3, 2020 at 9:47 AM Yang Shi <shy828301@gmail.com> wrote: > > > > > > On Tue, Mar 3, 2020 at 2:53 AM Tetsuo Handa > > > <penguin-kernel@i-love.sakura.ne.jp> wrote: > > > > > > > > Hello, Naresh. > > > > > > > > > [ 98.003346] WARNING: CPU: 2 PID: 340 at > > > > > include/linux/sched/mm.h:323 alloc_page_buffers+0x210/0x288 > > > > > > > > This is > > > > > > > > /** > > > > * memalloc_use_memcg - Starts the remote memcg charging scope. > > > > * @memcg: memcg to charge. > > > > * > > > > * This function marks the beginning of the remote memcg charging scope. All the > > > > * __GFP_ACCOUNT allocations till the end of the scope will be charged to the > > > > * given memcg. > > > > * > > > > * NOTE: This function is not nesting safe. > > > > */ > > > > static inline void memalloc_use_memcg(struct mem_cgroup *memcg) > > > > { > > > > WARN_ON_ONCE(current->active_memcg); > > > > current->active_memcg = memcg; > > > > } > > > > > > > > which is about memcg. Redirecting to linux-mm. > > > > > > Isn't this triggered by ("loop: use worker per cgroup instead of > > > kworker") in linux-next, which converted loop driver to use worker per > > > cgroup, so it may have multiple workers work at the mean time? > > > > > > So they may share the same "current", then it may cause kind of nested > > > call to memalloc_use_memcg(). > > > > > > Could you please try the below debug patch? This is not the proper > > > fix, but it may help us narrow down the problem. > > > > > > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h > > > index c49257a..1cc1cdc 100644 > > > --- a/include/linux/sched/mm.h > > > +++ b/include/linux/sched/mm.h > > > @@ -320,6 +320,10 @@ static inline void > > > memalloc_nocma_restore(unsigned int flags) > > > */ > > > static inline void memalloc_use_memcg(struct mem_cgroup *memcg) > > > { > > > + if ((current->flags & PF_KTHREAD) && > > > + current->active_memcg) > > > + return; > > > + > > > WARN_ON_ONCE(current->active_memcg); > > > current->active_memcg = memcg; > > > } > > > > > > > Maybe it's time to make memalloc_use_memcg() nesting safe. > > Yes, I think so. The stack trace: > > [ 98.137605] alloc_page_buffers+0x210/0x288 > [ 98.141799] __getblk_gfp+0x1d4/0x400 > [ 98.145475] ext4_read_block_bitmap_nowait+0x148/0xbc8 > [ 98.150628] ext4_mb_init_cache+0x25c/0x9b0 > [ 98.154821] ext4_mb_init_group+0x270/0x390 > [ 98.159014] ext4_mb_good_group+0x264/0x270 > [ 98.163208] ext4_mb_regular_allocator+0x480/0x798 > [ 98.168011] ext4_mb_new_blocks+0x958/0x10f8 > [ 98.172294] ext4_ext_map_blocks+0xec8/0x1618 > [ 98.176660] ext4_map_blocks+0x1b8/0x8a0 > [ 98.180592] ext4_writepages+0x830/0xf10 > [ 98.184523] do_writepages+0xb4/0x198 > [ 98.188195] __filemap_fdatawrite_range+0x170/0x1c8 > [ 98.193086] filemap_write_and_wait_range+0x40/0xb0 > [ 98.197974] ext4_punch_hole+0x4a4/0x660 > [ 98.201907] ext4_fallocate+0x294/0x1190 > [ 98.205839] loop_process_work+0x690/0x1100 > [ 98.210032] loop_workfn+0x2c/0x110 > [ 98.213529] process_one_work+0x3e0/0x648 > [ 98.217546] worker_thread+0x70/0x670 > [ 98.221217] kthread+0x1b8/0x1c0 > [ 98.224452] ret_from_fork+0x10/0x18 > > The loop kworker is instantiating cache pages on behalf of who queued > the io request, but if the page already exists, the buffers should be > allocated on behalf of who already owns the page. Nesting makes sense. > > Since the only difference between the use and unuse function is the > warn when we nest, we can remove the unuse and do something like: > > old = memalloc_use_memcg(memcg); > memalloc_use_memcg(old); > > What do you think? SGTM. > Patch below. It should go in before Dan's patches, > and they in turn need a small update to save and restore active_memcg. > (Since loop's use is from a kworker, it's unlikely that there is an > outer scope. But it's probably best to keep this simple and robust.) > > --- > > From e0e5ace069af5a36e41eafe3bf21a67966127c04 Mon Sep 17 00:00:00 2001 > From: Johannes Weiner <hannes@cmpxchg.org> > Date: Tue, 3 Mar 2020 15:15:39 -0500 > Subject: [PATCH] mm: support nesting memalloc_use_memcg() > > The memalloc_use_memcg() function to override the default memcg > accounting context currently doesn't nest. But the patches to make the > loop driver cgroup-aware will end up nesting: > > [ 98.137605] alloc_page_buffers+0x210/0x288 > [ 98.141799] __getblk_gfp+0x1d4/0x400 > [ 98.145475] ext4_read_block_bitmap_nowait+0x148/0xbc8 > [ 98.150628] ext4_mb_init_cache+0x25c/0x9b0 > [ 98.154821] ext4_mb_init_group+0x270/0x390 > [ 98.159014] ext4_mb_good_group+0x264/0x270 > [ 98.163208] ext4_mb_regular_allocator+0x480/0x798 > [ 98.168011] ext4_mb_new_blocks+0x958/0x10f8 > [ 98.172294] ext4_ext_map_blocks+0xec8/0x1618 > [ 98.176660] ext4_map_blocks+0x1b8/0x8a0 > [ 98.180592] ext4_writepages+0x830/0xf10 > [ 98.184523] do_writepages+0xb4/0x198 > [ 98.188195] __filemap_fdatawrite_range+0x170/0x1c8 > [ 98.193086] filemap_write_and_wait_range+0x40/0xb0 > [ 98.197974] ext4_punch_hole+0x4a4/0x660 > [ 98.201907] ext4_fallocate+0x294/0x1190 > [ 98.205839] loop_process_work+0x690/0x1100 > [ 98.210032] loop_workfn+0x2c/0x110 > [ 98.213529] process_one_work+0x3e0/0x648 > [ 98.217546] worker_thread+0x70/0x670 > [ 98.221217] kthread+0x1b8/0x1c0 > [ 98.224452] ret_from_fork+0x10/0x18 > > where loop_process_work() sets the memcg override to the memcg that > submitted the IO request, and alloc_page_buffers() sets the override > to the memcg that instantiated the cache page, which may differ. > > Make memalloc_use_memcg() return the old memcg and convert existing > users to a stacking model. Delete the unused memalloc_unuse_memcg(). > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> > --- > fs/buffer.c | 6 ++--- > fs/notify/fanotify/fanotify.c | 5 +++-- > fs/notify/inotify/inotify_fsnotify.c | 5 +++-- > include/linux/sched/mm.h | 33 ++++++++++++---------------- > 4 files changed, 23 insertions(+), 26 deletions(-) > > diff --git a/fs/buffer.c b/fs/buffer.c > index d8c7242426bb..54d5df14bd36 100644 > --- a/fs/buffer.c > +++ b/fs/buffer.c > @@ -857,13 +857,13 @@ struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size, > struct buffer_head *bh, *head; > gfp_t gfp = GFP_NOFS | __GFP_ACCOUNT; > long offset; > - struct mem_cgroup *memcg; > + struct mem_cgroup *memcg, *oldmemcg; > > if (retry) > gfp |= __GFP_NOFAIL; > > memcg = get_mem_cgroup_from_page(page); > - memalloc_use_memcg(memcg); > + oldmemcg = memalloc_use_memcg(memcg); > > head = NULL; > offset = PAGE_SIZE; > @@ -882,7 +882,7 @@ struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size, > set_bh_page(bh, page, offset); > } > out: > - memalloc_unuse_memcg(); > + memalloc_use_memcg(oldmemcg); > mem_cgroup_put(memcg); > return head; > /* > diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c > index 5778d1347b35..cb596ad002d8 100644 > --- a/fs/notify/fanotify/fanotify.c > +++ b/fs/notify/fanotify/fanotify.c > @@ -284,6 +284,7 @@ struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, > struct fanotify_event *event = NULL; > gfp_t gfp = GFP_KERNEL_ACCOUNT; > struct inode *id = fanotify_fid_inode(inode, mask, data, data_type); > + struct mem_cgroup *oldmemcg; > > /* > * For queues with unlimited length lost events are not expected and > @@ -297,7 +298,7 @@ struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, > gfp |= __GFP_RETRY_MAYFAIL; > > /* Whoever is interested in the event, pays for the allocation. */ > - memalloc_use_memcg(group->memcg); > + oldmemcg = memalloc_use_memcg(group->memcg); > > if (fanotify_is_perm_event(mask)) { > struct fanotify_perm_event *pevent; > @@ -334,7 +335,7 @@ init: __maybe_unused > event->path.dentry = NULL; > } > out: > - memalloc_unuse_memcg(); > + memalloc_use_memcg(oldmemcg); > return event; > } > > diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c > index d510223d302c..776ce66aaa47 100644 > --- a/fs/notify/inotify/inotify_fsnotify.c > +++ b/fs/notify/inotify/inotify_fsnotify.c > @@ -68,6 +68,7 @@ int inotify_handle_event(struct fsnotify_group *group, > int ret; > int len = 0; > int alloc_len = sizeof(struct inotify_event_info); > + struct mem_cgroup *oldmemcg; > > if (WARN_ON(fsnotify_iter_vfsmount_mark(iter_info))) > return 0; > @@ -95,9 +96,9 @@ int inotify_handle_event(struct fsnotify_group *group, > * trigger OOM killer in the target monitoring memcg as it may have > * security repercussion. > */ > - memalloc_use_memcg(group->memcg); > + oldmemcg = memalloc_use_memcg(group->memcg); > event = kmalloc(alloc_len, GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL); > - memalloc_unuse_memcg(); > + memalloc_use_memcg(oldmemcg); > > if (unlikely(!event)) { > /* > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h > index c49257a3b510..ced06c12daf7 100644 > --- a/include/linux/sched/mm.h > +++ b/include/linux/sched/mm.h > @@ -316,31 +316,26 @@ static inline void memalloc_nocma_restore(unsigned int flags) > * __GFP_ACCOUNT allocations till the end of the scope will be charged to the > * given memcg. > * > - * NOTE: This function is not nesting safe. > - */ > -static inline void memalloc_use_memcg(struct mem_cgroup *memcg) > -{ > - WARN_ON_ONCE(current->active_memcg); > - current->active_memcg = memcg; > -} > - > -/** > - * memalloc_unuse_memcg - Ends the remote memcg charging scope. > + * NOTE: This function can nest. Users must save the return value and > + * reset the previous value after their own charging scope is over: Should we mention that this is still not irq safe? At the moment other than skmem we don't do memcg charging in the interrupt and skmem has a memcg already associated with it. Maybe in future there might be a case where we want a specific memcg be charged for kmem in the interrupt context. Until then we can mark that this function should not be used in interrupt. > * > - * This function marks the end of the remote memcg charging scope started by > - * memalloc_use_memcg(). > + * old = memalloc_use_memcg(memcg); > + * // ... allocations ... > + * memalloc_use_memcg(old); > */ > -static inline void memalloc_unuse_memcg(void) > +static inline struct mem_cgroup *__must_check > +memalloc_use_memcg(struct mem_cgroup *memcg) > { > - current->active_memcg = NULL; > + struct mem_cgroup *old = current->active_memcg; > + > + current->active_memcg = memcg; > + return old; > } > #else > -static inline void memalloc_use_memcg(struct mem_cgroup *memcg) > -{ > -} > - > -static inline void memalloc_unuse_memcg(void) > +static inline struct mem_cgroup *__must_check > +memalloc_use_memcg(struct mem_cgroup *memcg) > { > + return NULL; > } > #endif > > -- > 2.24.1 > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs 2020-03-03 20:40 ` Shakeel Butt @ 2020-03-03 21:06 ` Johannes Weiner 2020-03-03 23:22 ` Shakeel Butt 0 siblings, 1 reply; 19+ messages in thread From: Johannes Weiner @ 2020-03-03 21:06 UTC (permalink / raw) To: Shakeel Butt Cc: Yang Shi, Tetsuo Handa, Naresh Kamboju, linux-mm, Andrew Morton, Mel Gorman, Michal Hocko, Dan Schatzberg On Tue, Mar 03, 2020 at 12:40:33PM -0800, Shakeel Butt wrote: > On Tue, Mar 3, 2020 at 12:26 PM Johannes Weiner <hannes@cmpxchg.org> wrote: > > From e0e5ace069af5a36e41eafe3bf21a67966127c04 Mon Sep 17 00:00:00 2001 > > From: Johannes Weiner <hannes@cmpxchg.org> > > Date: Tue, 3 Mar 2020 15:15:39 -0500 > > Subject: [PATCH] mm: support nesting memalloc_use_memcg() > > > > The memalloc_use_memcg() function to override the default memcg > > accounting context currently doesn't nest. But the patches to make the > > loop driver cgroup-aware will end up nesting: > > > > [ 98.137605] alloc_page_buffers+0x210/0x288 > > [ 98.141799] __getblk_gfp+0x1d4/0x400 > > [ 98.145475] ext4_read_block_bitmap_nowait+0x148/0xbc8 > > [ 98.150628] ext4_mb_init_cache+0x25c/0x9b0 > > [ 98.154821] ext4_mb_init_group+0x270/0x390 > > [ 98.159014] ext4_mb_good_group+0x264/0x270 > > [ 98.163208] ext4_mb_regular_allocator+0x480/0x798 > > [ 98.168011] ext4_mb_new_blocks+0x958/0x10f8 > > [ 98.172294] ext4_ext_map_blocks+0xec8/0x1618 > > [ 98.176660] ext4_map_blocks+0x1b8/0x8a0 > > [ 98.180592] ext4_writepages+0x830/0xf10 > > [ 98.184523] do_writepages+0xb4/0x198 > > [ 98.188195] __filemap_fdatawrite_range+0x170/0x1c8 > > [ 98.193086] filemap_write_and_wait_range+0x40/0xb0 > > [ 98.197974] ext4_punch_hole+0x4a4/0x660 > > [ 98.201907] ext4_fallocate+0x294/0x1190 > > [ 98.205839] loop_process_work+0x690/0x1100 > > [ 98.210032] loop_workfn+0x2c/0x110 > > [ 98.213529] process_one_work+0x3e0/0x648 > > [ 98.217546] worker_thread+0x70/0x670 > > [ 98.221217] kthread+0x1b8/0x1c0 > > [ 98.224452] ret_from_fork+0x10/0x18 > > > > where loop_process_work() sets the memcg override to the memcg that > > submitted the IO request, and alloc_page_buffers() sets the override > > to the memcg that instantiated the cache page, which may differ. > > > > Make memalloc_use_memcg() return the old memcg and convert existing > > users to a stacking model. Delete the unused memalloc_unuse_memcg(). > > > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> > > Reviewed-by: Shakeel Butt <shakeelb@google.com> Thanks Shakeel > > @@ -316,31 +316,26 @@ static inline void memalloc_nocma_restore(unsigned int flags) > > * __GFP_ACCOUNT allocations till the end of the scope will be charged to the > > * given memcg. > > * > > - * NOTE: This function is not nesting safe. > > - */ > > -static inline void memalloc_use_memcg(struct mem_cgroup *memcg) > > -{ > > - WARN_ON_ONCE(current->active_memcg); > > - current->active_memcg = memcg; > > -} > > - > > -/** > > - * memalloc_unuse_memcg - Ends the remote memcg charging scope. > > + * NOTE: This function can nest. Users must save the return value and > > + * reset the previous value after their own charging scope is over: > > Should we mention that this is still not irq safe? At the moment other > than skmem we don't do memcg charging in the interrupt and skmem has a > memcg already associated with it. Maybe in future there might be a > case where we want a specific memcg be charged for kmem in the > interrupt context. Until then we can mark that this function should > not be used in interrupt. Is it actually unsafe? It's not an RMW operation, being interrupted doesn't corrupt its state. I.e. this is safe: process: interrupt: old = current->active_memcg old = current->active_memcg current->active_memcg = new allocate current->active_memcg = old current->active_memcg = new return old This is safe as well: process: interrupt: old = current->active_memcg current->active_memcg = new old = current->active_memcg current->active_memcg = new allocate current->active_memcg = old return old ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs 2020-03-03 21:06 ` Johannes Weiner @ 2020-03-03 23:22 ` Shakeel Butt 2020-03-04 0:29 ` Andrew Morton 0 siblings, 1 reply; 19+ messages in thread From: Shakeel Butt @ 2020-03-03 23:22 UTC (permalink / raw) To: Johannes Weiner Cc: Yang Shi, Tetsuo Handa, Naresh Kamboju, linux-mm, Andrew Morton, Mel Gorman, Michal Hocko, Dan Schatzberg On Tue, Mar 3, 2020 at 1:06 PM Johannes Weiner <hannes@cmpxchg.org> wrote: > > On Tue, Mar 03, 2020 at 12:40:33PM -0800, Shakeel Butt wrote: > > On Tue, Mar 3, 2020 at 12:26 PM Johannes Weiner <hannes@cmpxchg.org> wrote: > > > From e0e5ace069af5a36e41eafe3bf21a67966127c04 Mon Sep 17 00:00:00 2001 > > > From: Johannes Weiner <hannes@cmpxchg.org> > > > Date: Tue, 3 Mar 2020 15:15:39 -0500 > > > Subject: [PATCH] mm: support nesting memalloc_use_memcg() > > > > > > The memalloc_use_memcg() function to override the default memcg > > > accounting context currently doesn't nest. But the patches to make the > > > loop driver cgroup-aware will end up nesting: > > > > > > [ 98.137605] alloc_page_buffers+0x210/0x288 > > > [ 98.141799] __getblk_gfp+0x1d4/0x400 > > > [ 98.145475] ext4_read_block_bitmap_nowait+0x148/0xbc8 > > > [ 98.150628] ext4_mb_init_cache+0x25c/0x9b0 > > > [ 98.154821] ext4_mb_init_group+0x270/0x390 > > > [ 98.159014] ext4_mb_good_group+0x264/0x270 > > > [ 98.163208] ext4_mb_regular_allocator+0x480/0x798 > > > [ 98.168011] ext4_mb_new_blocks+0x958/0x10f8 > > > [ 98.172294] ext4_ext_map_blocks+0xec8/0x1618 > > > [ 98.176660] ext4_map_blocks+0x1b8/0x8a0 > > > [ 98.180592] ext4_writepages+0x830/0xf10 > > > [ 98.184523] do_writepages+0xb4/0x198 > > > [ 98.188195] __filemap_fdatawrite_range+0x170/0x1c8 > > > [ 98.193086] filemap_write_and_wait_range+0x40/0xb0 > > > [ 98.197974] ext4_punch_hole+0x4a4/0x660 > > > [ 98.201907] ext4_fallocate+0x294/0x1190 > > > [ 98.205839] loop_process_work+0x690/0x1100 > > > [ 98.210032] loop_workfn+0x2c/0x110 > > > [ 98.213529] process_one_work+0x3e0/0x648 > > > [ 98.217546] worker_thread+0x70/0x670 > > > [ 98.221217] kthread+0x1b8/0x1c0 > > > [ 98.224452] ret_from_fork+0x10/0x18 > > > > > > where loop_process_work() sets the memcg override to the memcg that > > > submitted the IO request, and alloc_page_buffers() sets the override > > > to the memcg that instantiated the cache page, which may differ. > > > > > > Make memalloc_use_memcg() return the old memcg and convert existing > > > users to a stacking model. Delete the unused memalloc_unuse_memcg(). > > > > > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> > > > > Reviewed-by: Shakeel Butt <shakeelb@google.com> > > Thanks Shakeel > > > > @@ -316,31 +316,26 @@ static inline void memalloc_nocma_restore(unsigned int flags) > > > * __GFP_ACCOUNT allocations till the end of the scope will be charged to the > > > * given memcg. > > > * > > > - * NOTE: This function is not nesting safe. > > > - */ > > > -static inline void memalloc_use_memcg(struct mem_cgroup *memcg) > > > -{ > > > - WARN_ON_ONCE(current->active_memcg); > > > - current->active_memcg = memcg; > > > -} > > > - > > > -/** > > > - * memalloc_unuse_memcg - Ends the remote memcg charging scope. > > > + * NOTE: This function can nest. Users must save the return value and > > > + * reset the previous value after their own charging scope is over: > > > > Should we mention that this is still not irq safe? At the moment other > > than skmem we don't do memcg charging in the interrupt and skmem has a > > memcg already associated with it. Maybe in future there might be a > > case where we want a specific memcg be charged for kmem in the > > interrupt context. Until then we can mark that this function should > > not be used in interrupt. > > Is it actually unsafe? It's not an RMW operation, being interrupted > doesn't corrupt its state. > > I.e. this is safe: > > process: interrupt: > old = current->active_memcg > old = current->active_memcg > current->active_memcg = new > allocate > current->active_memcg = old > current->active_memcg = new > return old > > This is safe as well: > > process: interrupt: > old = current->active_memcg > current->active_memcg = new > old = current->active_memcg > current->active_memcg = new > allocate > current->active_memcg = old > return old Yes, you are right. Thanks for the explanation. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs 2020-03-03 23:22 ` Shakeel Butt @ 2020-03-04 0:29 ` Andrew Morton 2020-04-20 16:41 ` Shakeel Butt 0 siblings, 1 reply; 19+ messages in thread From: Andrew Morton @ 2020-03-04 0:29 UTC (permalink / raw) To: Shakeel Butt Cc: Johannes Weiner, Yang Shi, Tetsuo Handa, Naresh Kamboju, linux-mm, Mel Gorman, Michal Hocko, Dan Schatzberg On Tue, 3 Mar 2020 15:22:00 -0800 Shakeel Butt <shakeelb@google.com> wrote: > > doesn't corrupt its state. > > > > I.e. this is safe: > > > > process: interrupt: > > old = current->active_memcg > > old = current->active_memcg > > current->active_memcg = new > > allocate > > current->active_memcg = old > > current->active_memcg = new > > return old > > > > This is safe as well: > > > > process: interrupt: > > old = current->active_memcg > > current->active_memcg = new > > old = current->active_memcg > > current->active_memcg = new > > allocate > > current->active_memcg = old > > return old > > Yes, you are right. Thanks for the explanation. Thanks, all. Dan, I dropped the three-patch series loop-use-worker-per-cgroup-instead-of-kworker.patch mm-charge-active-memcg-when-no-mm-is-set.patch loop-charge-i-o-to-mem-and-blk-cg.patch When convenient, could you please prepend Johannes' patch to the series, retest and resend? While doing so, please figure out why you didn't also see this warning in your testing and make the appropriate changes! Thanks. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs 2020-03-04 0:29 ` Andrew Morton @ 2020-04-20 16:41 ` Shakeel Butt 2020-04-20 22:45 ` Dan Schatzberg 0 siblings, 1 reply; 19+ messages in thread From: Shakeel Butt @ 2020-04-20 16:41 UTC (permalink / raw) To: Andrew Morton Cc: Johannes Weiner, Yang Shi, Tetsuo Handa, Naresh Kamboju, linux-mm, Mel Gorman, Michal Hocko, Dan Schatzberg On Tue, Mar 3, 2020 at 4:29 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > On Tue, 3 Mar 2020 15:22:00 -0800 Shakeel Butt <shakeelb@google.com> wrote: > > > > doesn't corrupt its state. > > > > > > I.e. this is safe: > > > > > > process: interrupt: > > > old = current->active_memcg > > > old = current->active_memcg > > > current->active_memcg = new > > > allocate > > > current->active_memcg = old > > > current->active_memcg = new > > > return old > > > > > > This is safe as well: > > > > > > process: interrupt: > > > old = current->active_memcg > > > current->active_memcg = new > > > old = current->active_memcg > > > current->active_memcg = new > > > allocate > > > current->active_memcg = old > > > return old > > > > Yes, you are right. Thanks for the explanation. > > Thanks, all. > > Dan, I dropped the three-patch series > > loop-use-worker-per-cgroup-instead-of-kworker.patch > mm-charge-active-memcg-when-no-mm-is-set.patch > loop-charge-i-o-to-mem-and-blk-cg.patch > > When convenient, could you please prepend Johannes' patch to the > series, retest and resend? While doing so, please figure out why you > didn't also see this warning in your testing and make the appropriate > changes! > Hi Dan, did you get the chance to follow up on Andrew's request? Shakeel ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs 2020-04-20 16:41 ` Shakeel Butt @ 2020-04-20 22:45 ` Dan Schatzberg 2020-04-21 5:02 ` Naresh Kamboju 0 siblings, 1 reply; 19+ messages in thread From: Dan Schatzberg @ 2020-04-20 22:45 UTC (permalink / raw) To: Shakeel Butt Cc: Andrew Morton, Johannes Weiner, Yang Shi, Tetsuo Handa, Naresh Kamboju, linux-mm, Mel Gorman, Michal Hocko On Mon, Apr 20, 2020 at 09:41:33AM -0700, Shakeel Butt wrote: > On Tue, Mar 3, 2020 at 4:29 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > > > On Tue, 3 Mar 2020 15:22:00 -0800 Shakeel Butt <shakeelb@google.com> wrote: > > > > > > doesn't corrupt its state. > > > > > > > > I.e. this is safe: > > > > > > > > process: interrupt: > > > > old = current->active_memcg > > > > old = current->active_memcg > > > > current->active_memcg = new > > > > allocate > > > > current->active_memcg = old > > > > current->active_memcg = new > > > > return old > > > > > > > > This is safe as well: > > > > > > > > process: interrupt: > > > > old = current->active_memcg > > > > current->active_memcg = new > > > > old = current->active_memcg > > > > current->active_memcg = new > > > > allocate > > > > current->active_memcg = old > > > > return old > > > > > > Yes, you are right. Thanks for the explanation. > > > > Thanks, all. > > > > Dan, I dropped the three-patch series > > > > loop-use-worker-per-cgroup-instead-of-kworker.patch > > mm-charge-active-memcg-when-no-mm-is-set.patch > > loop-charge-i-o-to-mem-and-blk-cg.patch > > > > When convenient, could you please prepend Johannes' patch to the > > series, retest and resend? While doing so, please figure out why you > > didn't also see this warning in your testing and make the appropriate > > changes! > > > > Hi Dan, did you get the chance to follow up on Andrew's request? > > Shakeel Just sent out a new patch series ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs 2020-04-20 22:45 ` Dan Schatzberg @ 2020-04-21 5:02 ` Naresh Kamboju 0 siblings, 0 replies; 19+ messages in thread From: Naresh Kamboju @ 2020-04-21 5:02 UTC (permalink / raw) To: Dan Schatzberg Cc: Shakeel Butt, Andrew Morton, Johannes Weiner, Yang Shi, Tetsuo Handa, linux-mm, Mel Gorman, Michal Hocko Dan, > > Just sent out a new patch series Please add reported by if you feel it is appropriate to your patch series. Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> - Naresh ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs 2020-03-03 20:26 ` Johannes Weiner 2020-03-03 20:40 ` Shakeel Butt @ 2020-03-03 20:57 ` Yang Shi 1 sibling, 0 replies; 19+ messages in thread From: Yang Shi @ 2020-03-03 20:57 UTC (permalink / raw) To: Johannes Weiner Cc: Shakeel Butt, Tetsuo Handa, Naresh Kamboju, linux-mm, Andrew Morton, Mel Gorman, Michal Hocko, Dan Schatzberg On Tue, Mar 3, 2020 at 12:26 PM Johannes Weiner <hannes@cmpxchg.org> wrote: > > On Tue, Mar 03, 2020 at 10:14:49AM -0800, Shakeel Butt wrote: > > On Tue, Mar 3, 2020 at 9:47 AM Yang Shi <shy828301@gmail.com> wrote: > > > > > > On Tue, Mar 3, 2020 at 2:53 AM Tetsuo Handa > > > <penguin-kernel@i-love.sakura.ne.jp> wrote: > > > > > > > > Hello, Naresh. > > > > > > > > > [ 98.003346] WARNING: CPU: 2 PID: 340 at > > > > > include/linux/sched/mm.h:323 alloc_page_buffers+0x210/0x288 > > > > > > > > This is > > > > > > > > /** > > > > * memalloc_use_memcg - Starts the remote memcg charging scope. > > > > * @memcg: memcg to charge. > > > > * > > > > * This function marks the beginning of the remote memcg charging scope. All the > > > > * __GFP_ACCOUNT allocations till the end of the scope will be charged to the > > > > * given memcg. > > > > * > > > > * NOTE: This function is not nesting safe. > > > > */ > > > > static inline void memalloc_use_memcg(struct mem_cgroup *memcg) > > > > { > > > > WARN_ON_ONCE(current->active_memcg); > > > > current->active_memcg = memcg; > > > > } > > > > > > > > which is about memcg. Redirecting to linux-mm. > > > > > > Isn't this triggered by ("loop: use worker per cgroup instead of > > > kworker") in linux-next, which converted loop driver to use worker per > > > cgroup, so it may have multiple workers work at the mean time? > > > > > > So they may share the same "current", then it may cause kind of nested > > > call to memalloc_use_memcg(). > > > > > > Could you please try the below debug patch? This is not the proper > > > fix, but it may help us narrow down the problem. > > > > > > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h > > > index c49257a..1cc1cdc 100644 > > > --- a/include/linux/sched/mm.h > > > +++ b/include/linux/sched/mm.h > > > @@ -320,6 +320,10 @@ static inline void > > > memalloc_nocma_restore(unsigned int flags) > > > */ > > > static inline void memalloc_use_memcg(struct mem_cgroup *memcg) > > > { > > > + if ((current->flags & PF_KTHREAD) && > > > + current->active_memcg) > > > + return; > > > + > > > WARN_ON_ONCE(current->active_memcg); > > > current->active_memcg = memcg; > > > } > > > > > > > Maybe it's time to make memalloc_use_memcg() nesting safe. > > Yes, I think so. The stack trace: > > [ 98.137605] alloc_page_buffers+0x210/0x288 > [ 98.141799] __getblk_gfp+0x1d4/0x400 > [ 98.145475] ext4_read_block_bitmap_nowait+0x148/0xbc8 > [ 98.150628] ext4_mb_init_cache+0x25c/0x9b0 > [ 98.154821] ext4_mb_init_group+0x270/0x390 > [ 98.159014] ext4_mb_good_group+0x264/0x270 > [ 98.163208] ext4_mb_regular_allocator+0x480/0x798 > [ 98.168011] ext4_mb_new_blocks+0x958/0x10f8 > [ 98.172294] ext4_ext_map_blocks+0xec8/0x1618 > [ 98.176660] ext4_map_blocks+0x1b8/0x8a0 > [ 98.180592] ext4_writepages+0x830/0xf10 > [ 98.184523] do_writepages+0xb4/0x198 > [ 98.188195] __filemap_fdatawrite_range+0x170/0x1c8 > [ 98.193086] filemap_write_and_wait_range+0x40/0xb0 > [ 98.197974] ext4_punch_hole+0x4a4/0x660 > [ 98.201907] ext4_fallocate+0x294/0x1190 > [ 98.205839] loop_process_work+0x690/0x1100 > [ 98.210032] loop_workfn+0x2c/0x110 > [ 98.213529] process_one_work+0x3e0/0x648 > [ 98.217546] worker_thread+0x70/0x670 > [ 98.221217] kthread+0x1b8/0x1c0 > [ 98.224452] ret_from_fork+0x10/0x18 > > The loop kworker is instantiating cache pages on behalf of who queued > the io request, but if the page already exists, the buffers should be > allocated on behalf of who already owns the page. Nesting makes sense. > > Since the only difference between the use and unuse function is the > warn when we nest, we can remove the unuse and do something like: > > old = memalloc_use_memcg(memcg); > memalloc_use_memcg(old); > > What do you think? Patch below. It should go in before Dan's patches, > and they in turn need a small update to save and restore active_memcg. > (Since loop's use is from a kworker, it's unlikely that there is an > outer scope. But it's probably best to keep this simple and robust.) > > --- > > From e0e5ace069af5a36e41eafe3bf21a67966127c04 Mon Sep 17 00:00:00 2001 > From: Johannes Weiner <hannes@cmpxchg.org> > Date: Tue, 3 Mar 2020 15:15:39 -0500 > Subject: [PATCH] mm: support nesting memalloc_use_memcg() > > The memalloc_use_memcg() function to override the default memcg > accounting context currently doesn't nest. But the patches to make the > loop driver cgroup-aware will end up nesting: > > [ 98.137605] alloc_page_buffers+0x210/0x288 > [ 98.141799] __getblk_gfp+0x1d4/0x400 > [ 98.145475] ext4_read_block_bitmap_nowait+0x148/0xbc8 > [ 98.150628] ext4_mb_init_cache+0x25c/0x9b0 > [ 98.154821] ext4_mb_init_group+0x270/0x390 > [ 98.159014] ext4_mb_good_group+0x264/0x270 > [ 98.163208] ext4_mb_regular_allocator+0x480/0x798 > [ 98.168011] ext4_mb_new_blocks+0x958/0x10f8 > [ 98.172294] ext4_ext_map_blocks+0xec8/0x1618 > [ 98.176660] ext4_map_blocks+0x1b8/0x8a0 > [ 98.180592] ext4_writepages+0x830/0xf10 > [ 98.184523] do_writepages+0xb4/0x198 > [ 98.188195] __filemap_fdatawrite_range+0x170/0x1c8 > [ 98.193086] filemap_write_and_wait_range+0x40/0xb0 > [ 98.197974] ext4_punch_hole+0x4a4/0x660 > [ 98.201907] ext4_fallocate+0x294/0x1190 > [ 98.205839] loop_process_work+0x690/0x1100 > [ 98.210032] loop_workfn+0x2c/0x110 > [ 98.213529] process_one_work+0x3e0/0x648 > [ 98.217546] worker_thread+0x70/0x670 > [ 98.221217] kthread+0x1b8/0x1c0 > [ 98.224452] ret_from_fork+0x10/0x18 > > where loop_process_work() sets the memcg override to the memcg that > submitted the IO request, and alloc_page_buffers() sets the override > to the memcg that instantiated the cache page, which may differ. > > Make memalloc_use_memcg() return the old memcg and convert existing > users to a stacking model. Delete the unused memalloc_unuse_memcg(). Aha, I was thinking about the same appraoch. Looks good to me. Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com> > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> > --- > fs/buffer.c | 6 ++--- > fs/notify/fanotify/fanotify.c | 5 +++-- > fs/notify/inotify/inotify_fsnotify.c | 5 +++-- > include/linux/sched/mm.h | 33 ++++++++++++---------------- > 4 files changed, 23 insertions(+), 26 deletions(-) > > diff --git a/fs/buffer.c b/fs/buffer.c > index d8c7242426bb..54d5df14bd36 100644 > --- a/fs/buffer.c > +++ b/fs/buffer.c > @@ -857,13 +857,13 @@ struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size, > struct buffer_head *bh, *head; > gfp_t gfp = GFP_NOFS | __GFP_ACCOUNT; > long offset; > - struct mem_cgroup *memcg; > + struct mem_cgroup *memcg, *oldmemcg; > > if (retry) > gfp |= __GFP_NOFAIL; > > memcg = get_mem_cgroup_from_page(page); > - memalloc_use_memcg(memcg); > + oldmemcg = memalloc_use_memcg(memcg); > > head = NULL; > offset = PAGE_SIZE; > @@ -882,7 +882,7 @@ struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size, > set_bh_page(bh, page, offset); > } > out: > - memalloc_unuse_memcg(); > + memalloc_use_memcg(oldmemcg); > mem_cgroup_put(memcg); > return head; > /* > diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c > index 5778d1347b35..cb596ad002d8 100644 > --- a/fs/notify/fanotify/fanotify.c > +++ b/fs/notify/fanotify/fanotify.c > @@ -284,6 +284,7 @@ struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, > struct fanotify_event *event = NULL; > gfp_t gfp = GFP_KERNEL_ACCOUNT; > struct inode *id = fanotify_fid_inode(inode, mask, data, data_type); > + struct mem_cgroup *oldmemcg; > > /* > * For queues with unlimited length lost events are not expected and > @@ -297,7 +298,7 @@ struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, > gfp |= __GFP_RETRY_MAYFAIL; > > /* Whoever is interested in the event, pays for the allocation. */ > - memalloc_use_memcg(group->memcg); > + oldmemcg = memalloc_use_memcg(group->memcg); > > if (fanotify_is_perm_event(mask)) { > struct fanotify_perm_event *pevent; > @@ -334,7 +335,7 @@ init: __maybe_unused > event->path.dentry = NULL; > } > out: > - memalloc_unuse_memcg(); > + memalloc_use_memcg(oldmemcg); > return event; > } > > diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c > index d510223d302c..776ce66aaa47 100644 > --- a/fs/notify/inotify/inotify_fsnotify.c > +++ b/fs/notify/inotify/inotify_fsnotify.c > @@ -68,6 +68,7 @@ int inotify_handle_event(struct fsnotify_group *group, > int ret; > int len = 0; > int alloc_len = sizeof(struct inotify_event_info); > + struct mem_cgroup *oldmemcg; > > if (WARN_ON(fsnotify_iter_vfsmount_mark(iter_info))) > return 0; > @@ -95,9 +96,9 @@ int inotify_handle_event(struct fsnotify_group *group, > * trigger OOM killer in the target monitoring memcg as it may have > * security repercussion. > */ > - memalloc_use_memcg(group->memcg); > + oldmemcg = memalloc_use_memcg(group->memcg); > event = kmalloc(alloc_len, GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL); > - memalloc_unuse_memcg(); > + memalloc_use_memcg(oldmemcg); > > if (unlikely(!event)) { > /* > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h > index c49257a3b510..ced06c12daf7 100644 > --- a/include/linux/sched/mm.h > +++ b/include/linux/sched/mm.h > @@ -316,31 +316,26 @@ static inline void memalloc_nocma_restore(unsigned int flags) > * __GFP_ACCOUNT allocations till the end of the scope will be charged to the > * given memcg. > * > - * NOTE: This function is not nesting safe. > - */ > -static inline void memalloc_use_memcg(struct mem_cgroup *memcg) > -{ > - WARN_ON_ONCE(current->active_memcg); > - current->active_memcg = memcg; > -} > - > -/** > - * memalloc_unuse_memcg - Ends the remote memcg charging scope. > + * NOTE: This function can nest. Users must save the return value and > + * reset the previous value after their own charging scope is over: > * > - * This function marks the end of the remote memcg charging scope started by > - * memalloc_use_memcg(). > + * old = memalloc_use_memcg(memcg); > + * // ... allocations ... > + * memalloc_use_memcg(old); > */ > -static inline void memalloc_unuse_memcg(void) > +static inline struct mem_cgroup *__must_check > +memalloc_use_memcg(struct mem_cgroup *memcg) > { > - current->active_memcg = NULL; > + struct mem_cgroup *old = current->active_memcg; > + > + current->active_memcg = memcg; > + return old; > } > #else > -static inline void memalloc_use_memcg(struct mem_cgroup *memcg) > -{ > -} > - > -static inline void memalloc_unuse_memcg(void) > +static inline struct mem_cgroup *__must_check > +memalloc_use_memcg(struct mem_cgroup *memcg) > { > + return NULL; > } > #endif > > -- > 2.24.1 > > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs 2020-03-03 17:47 ` Yang Shi 2020-03-03 18:14 ` Shakeel Butt @ 2020-03-03 18:40 ` Naresh Kamboju 2020-03-03 19:04 ` Yang Shi 1 sibling, 1 reply; 19+ messages in thread From: Naresh Kamboju @ 2020-03-03 18:40 UTC (permalink / raw) To: Yang Shi Cc: Tetsuo Handa, linux-mm, Andrew Morton, Mel Gorman, Michal Hocko, Shakeel Butt, schatzberg.dan On Tue, 3 Mar 2020 at 23:17, Yang Shi <shy828301@gmail.com> wrote: > > On Tue, Mar 3, 2020 at 2:53 AM Tetsuo Handa > <penguin-kernel@i-love.sakura.ne.jp> wrote: > > > > Hello, Naresh. > > > > > [ 98.003346] WARNING: CPU: 2 PID: 340 at > > > include/linux/sched/mm.h:323 alloc_page_buffers+0x210/0x288 <Trim> > > > Could you please try the below debug patch? This is not the proper > fix, but it may help us narrow down the problem. > > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h > index c49257a..1cc1cdc 100644 > --- a/include/linux/sched/mm.h > +++ b/include/linux/sched/mm.h > @@ -320,6 +320,10 @@ static inline void > memalloc_nocma_restore(unsigned int flags) > */ > static inline void memalloc_use_memcg(struct mem_cgroup *memcg) > { > + if ((current->flags & PF_KTHREAD) && > + current->active_memcg) > + return; > + > WARN_ON_ONCE(current->active_memcg); > current->active_memcg = memcg; > } After applying this patch the reported "warning" did not happen. Here is the full test log. https://lkft.validation.linaro.org/scheduler/job/1265325#L1287 - Naresh ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs 2020-03-03 18:40 ` Naresh Kamboju @ 2020-03-03 19:04 ` Yang Shi 0 siblings, 0 replies; 19+ messages in thread From: Yang Shi @ 2020-03-03 19:04 UTC (permalink / raw) To: Naresh Kamboju Cc: Tetsuo Handa, linux-mm, Andrew Morton, Mel Gorman, Michal Hocko, Shakeel Butt, schatzberg.dan On Tue, Mar 3, 2020 at 10:40 AM Naresh Kamboju <naresh.kamboju@linaro.org> wrote: > > On Tue, 3 Mar 2020 at 23:17, Yang Shi <shy828301@gmail.com> wrote: > > > > On Tue, Mar 3, 2020 at 2:53 AM Tetsuo Handa > > <penguin-kernel@i-love.sakura.ne.jp> wrote: > > > > > > Hello, Naresh. > > > > > > > [ 98.003346] WARNING: CPU: 2 PID: 340 at > > > > include/linux/sched/mm.h:323 alloc_page_buffers+0x210/0x288 > <Trim> > > > > > Could you please try the below debug patch? This is not the proper > > fix, but it may help us narrow down the problem. > > > > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h > > index c49257a..1cc1cdc 100644 > > --- a/include/linux/sched/mm.h > > +++ b/include/linux/sched/mm.h > > @@ -320,6 +320,10 @@ static inline void > > memalloc_nocma_restore(unsigned int flags) > > */ > > static inline void memalloc_use_memcg(struct mem_cgroup *memcg) > > { > > + if ((current->flags & PF_KTHREAD) && > > + current->active_memcg) > > + return; > > + > > WARN_ON_ONCE(current->active_memcg); > > current->active_memcg = memcg; > > } > > After applying this patch the reported "warning" did not happen. > Here is the full test log. > https://lkft.validation.linaro.org/scheduler/job/1265325#L1287 Thanks for doing testing. We could confirm this is caused by paralleled memalloc_use_memcg(). > > - Naresh ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2020-04-21 5:02 UTC | newest] Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <CA+G9fYs==eMEmY_OpdhyCHO_1Z5f_M8CAQQTh-AOf5xAvBHKAQ@mail.gmail.com> 2020-03-03 10:52 ` fs/buffer.c: WARNING: alloc_page_buffers while mke2fs Tetsuo Handa 2020-03-03 17:47 ` Yang Shi 2020-03-03 18:14 ` Shakeel Butt 2020-03-03 18:34 ` Yang Shi 2020-03-03 19:42 ` Yang Shi 2020-03-03 20:26 ` Shakeel Butt 2020-03-03 20:33 ` Johannes Weiner 2020-03-03 20:59 ` Yang Shi 2020-03-03 20:26 ` Johannes Weiner 2020-03-03 20:40 ` Shakeel Butt 2020-03-03 21:06 ` Johannes Weiner 2020-03-03 23:22 ` Shakeel Butt 2020-03-04 0:29 ` Andrew Morton 2020-04-20 16:41 ` Shakeel Butt 2020-04-20 22:45 ` Dan Schatzberg 2020-04-21 5:02 ` Naresh Kamboju 2020-03-03 20:57 ` Yang Shi 2020-03-03 18:40 ` Naresh Kamboju 2020-03-03 19:04 ` Yang Shi
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).