From: Johannes Weiner <hannes@cmpxchg.org>
To: Shakeel Butt <shakeelb@google.com>
Cc: Yang Shi <shy828301@gmail.com>,
	Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	Naresh Kamboju <naresh.kamboju@linaro.org>,
	linux-mm <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	Michal Hocko <mhocko@kernel.org>,
	Dan Schatzberg <schatzberg.dan@gmail.com>
Subject: Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs
Date: Tue, 3 Mar 2020 15:26:23 -0500
Message-ID: <20200303202623.GA68565@cmpxchg.org>
In-Reply-To: <CALvZod5x6-_q=CF_bzVqhftWYZUpN9Oyxa4k4=9OKj-Ji0VTjw@mail.gmail.com>

On Tue, Mar 03, 2020 at 10:14:49AM -0800, Shakeel Butt wrote:
> On Tue, Mar 3, 2020 at 9:47 AM Yang Shi <shy828301@gmail.com> wrote:
> >
> > On Tue, Mar 3, 2020 at 2:53 AM Tetsuo Handa
> > <penguin-kernel@i-love.sakura.ne.jp> wrote:
> > >
> > > Hello, Naresh.
> > >
> > > > [   98.003346] WARNING: CPU: 2 PID: 340 at include/linux/sched/mm.h:323 alloc_page_buffers+0x210/0x288
> > >
> > > This is
> > >
> > > /**
> > >  * memalloc_use_memcg - Starts the remote memcg charging scope.
> > >  * @memcg: memcg to charge.
> > >  *
> > >  * This function marks the beginning of the remote memcg charging scope. All the
> > >  * __GFP_ACCOUNT allocations till the end of the scope will be charged to the
> > >  * given memcg.
> > >  *
> > >  * NOTE: This function is not nesting safe.
> > >  */
> > > static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
> > > {
> > >         WARN_ON_ONCE(current->active_memcg);
> > >         current->active_memcg = memcg;
> > > }
> > >
> > > which is about memcg. Redirecting to linux-mm.
> >
> > Isn't this triggered by ("loop: use worker per cgroup instead of
> > kworker") in linux-next, which converted the loop driver to use a
> > worker per cgroup, so it may have multiple workers working at the
> > same time?
> >
> > They may then share the same "current", which could cause a kind of
> > nested call to memalloc_use_memcg().
> >
> > Could you please try the below debug patch? This is not the proper
> > fix, but it may help us narrow down the problem.
> >
> > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> > index c49257a..1cc1cdc 100644
> > --- a/include/linux/sched/mm.h
> > +++ b/include/linux/sched/mm.h
> > @@ -320,6 +320,10 @@ static inline void memalloc_nocma_restore(unsigned int flags)
> >   */
> >  static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
> >  {
> > +       if ((current->flags & PF_KTHREAD) &&
> > +            current->active_memcg)
> > +               return;
> > +
> >         WARN_ON_ONCE(current->active_memcg);
> >         current->active_memcg = memcg;
> >  }
> >
> 
> Maybe it's time to make memalloc_use_memcg() nesting safe.

Yes, I think so. The stack trace:

[   98.137605]  alloc_page_buffers+0x210/0x288
[   98.141799]  __getblk_gfp+0x1d4/0x400
[   98.145475]  ext4_read_block_bitmap_nowait+0x148/0xbc8
[   98.150628]  ext4_mb_init_cache+0x25c/0x9b0
[   98.154821]  ext4_mb_init_group+0x270/0x390
[   98.159014]  ext4_mb_good_group+0x264/0x270
[   98.163208]  ext4_mb_regular_allocator+0x480/0x798
[   98.168011]  ext4_mb_new_blocks+0x958/0x10f8
[   98.172294]  ext4_ext_map_blocks+0xec8/0x1618
[   98.176660]  ext4_map_blocks+0x1b8/0x8a0
[   98.180592]  ext4_writepages+0x830/0xf10
[   98.184523]  do_writepages+0xb4/0x198
[   98.188195]  __filemap_fdatawrite_range+0x170/0x1c8
[   98.193086]  filemap_write_and_wait_range+0x40/0xb0
[   98.197974]  ext4_punch_hole+0x4a4/0x660
[   98.201907]  ext4_fallocate+0x294/0x1190
[   98.205839]  loop_process_work+0x690/0x1100
[   98.210032]  loop_workfn+0x2c/0x110
[   98.213529]  process_one_work+0x3e0/0x648
[   98.217546]  worker_thread+0x70/0x670
[   98.221217]  kthread+0x1b8/0x1c0
[   98.224452]  ret_from_fork+0x10/0x18

The loop kworker is instantiating cache pages on behalf of whoever
queued the I/O request, but if the page already exists, the buffers
should be allocated on behalf of whoever already owns the page. Nesting
makes sense.
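
To make the problem concrete, here is a minimal userspace sketch of how
the current use/unuse pair behaves when two scopes nest. The struct
definitions, current_task, and the old_*/inner_scope/outer_scope_old
helpers are hypothetical stand-ins for illustration, not kernel code,
and the WARN_ON_ONCE() is left out:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel types. */
struct mem_cgroup { const char *name; };
struct task { struct mem_cgroup *active_memcg; };

static struct task current_task;	/* models the kernel's "current" */

/* Model of the existing, non-nesting API (without the warning). */
static void old_use_memcg(struct mem_cgroup *memcg)
{
	current_task.active_memcg = memcg;
}

static void old_unuse_memcg(void)
{
	current_task.active_memcg = NULL;
}

/* Inner scope, in the role of alloc_page_buffers(). */
static void inner_scope(struct mem_cgroup *page_owner)
{
	old_use_memcg(page_owner);
	/* ... allocations charged to page_owner ... */
	old_unuse_memcg();
}

/*
 * Outer scope, in the role of the loop worker. Returns the override
 * that is active once the inner scope has finished.
 */
static struct mem_cgroup *outer_scope_old(struct mem_cgroup *submitter,
					  struct mem_cgroup *page_owner)
{
	struct mem_cgroup *after;

	old_use_memcg(submitter);
	inner_scope(page_owner);
	/* The inner unuse reset the override to NULL, not to submitter. */
	after = current_task.active_memcg;
	old_unuse_memcg();
	return after;
}
```

In this model outer_scope_old() returns NULL rather than the submitter,
so anything the outer scope allocates after the inner call would no
longer be charged to the submitting memcg.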

Since the only difference between the use and unuse functions is the
warning when we nest, we can remove unuse and do something like:

old = memalloc_use_memcg(memcg);
memalloc_use_memcg(old);

What do you think? Patch below. It should go in before Dan's patches,
and they in turn need a small update to save and restore active_memcg.
(Since loop's use is from a kworker, it's unlikely that there is an
outer scope. But it's probably best to keep this simple and robust.)
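
As a rough illustration of the stacking model, the following userspace
sketch mirrors the save-and-restore pattern. The types, current_task,
use_memcg(), inner_scope() and outer_scope() are hypothetical stand-ins
for illustration only; the real helper is in the patch below:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel types. */
struct mem_cgroup { const char *name; };
struct task { struct mem_cgroup *active_memcg; };

static struct task current_task;	/* models the kernel's "current" */

/* Install a new override and hand back the previous one. */
static struct mem_cgroup *use_memcg(struct mem_cgroup *memcg)
{
	struct mem_cgroup *old = current_task.active_memcg;

	current_task.active_memcg = memcg;
	return old;
}

/* Inner scope: returns the memcg that was active for its allocations. */
static struct mem_cgroup *inner_scope(struct mem_cgroup *page_owner)
{
	struct mem_cgroup *old, *charged;

	old = use_memcg(page_owner);
	charged = current_task.active_memcg;	/* page_owner */
	use_memcg(old);				/* restore the outer scope */
	return charged;
}

/* Outer scope: returns the override active after the inner scope ends. */
static struct mem_cgroup *outer_scope(struct mem_cgroup *submitter,
				      struct mem_cgroup *page_owner)
{
	struct mem_cgroup *old, *after;

	old = use_memcg(submitter);
	assert(inner_scope(page_owner) == page_owner);
	after = current_task.active_memcg;	/* submitter again */
	use_memcg(old);
	return after;
}
```

Here outer_scope() gets the submitter back after the inner scope ends,
and the override is NULL again once the outermost scope restores its
saved value. Note that the restoring calls discard the return value;
the sketch carries no warn_unused_result-style annotation, so that is
silent here.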

---

From e0e5ace069af5a36e41eafe3bf21a67966127c04 Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes@cmpxchg.org>
Date: Tue, 3 Mar 2020 15:15:39 -0500
Subject: [PATCH] mm: support nesting memalloc_use_memcg()

The memalloc_use_memcg() function to override the default memcg
accounting context currently doesn't nest. But the patches to make the
loop driver cgroup-aware will end up nesting:

[   98.137605]  alloc_page_buffers+0x210/0x288
[   98.141799]  __getblk_gfp+0x1d4/0x400
[   98.145475]  ext4_read_block_bitmap_nowait+0x148/0xbc8
[   98.150628]  ext4_mb_init_cache+0x25c/0x9b0
[   98.154821]  ext4_mb_init_group+0x270/0x390
[   98.159014]  ext4_mb_good_group+0x264/0x270
[   98.163208]  ext4_mb_regular_allocator+0x480/0x798
[   98.168011]  ext4_mb_new_blocks+0x958/0x10f8
[   98.172294]  ext4_ext_map_blocks+0xec8/0x1618
[   98.176660]  ext4_map_blocks+0x1b8/0x8a0
[   98.180592]  ext4_writepages+0x830/0xf10
[   98.184523]  do_writepages+0xb4/0x198
[   98.188195]  __filemap_fdatawrite_range+0x170/0x1c8
[   98.193086]  filemap_write_and_wait_range+0x40/0xb0
[   98.197974]  ext4_punch_hole+0x4a4/0x660
[   98.201907]  ext4_fallocate+0x294/0x1190
[   98.205839]  loop_process_work+0x690/0x1100
[   98.210032]  loop_workfn+0x2c/0x110
[   98.213529]  process_one_work+0x3e0/0x648
[   98.217546]  worker_thread+0x70/0x670
[   98.221217]  kthread+0x1b8/0x1c0
[   98.224452]  ret_from_fork+0x10/0x18

where loop_process_work() sets the memcg override to the memcg that
submitted the IO request, and alloc_page_buffers() sets the override
to the memcg that instantiated the cache page, which may differ.

Make memalloc_use_memcg() return the old memcg and convert existing
users to a stacking model. Delete the unused memalloc_unuse_memcg().

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 fs/buffer.c                          |  6 ++---
 fs/notify/fanotify/fanotify.c        |  5 +++--
 fs/notify/inotify/inotify_fsnotify.c |  5 +++--
 include/linux/sched/mm.h             | 33 ++++++++++++----------------
 4 files changed, 23 insertions(+), 26 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index d8c7242426bb..54d5df14bd36 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -857,13 +857,13 @@ struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size,
 	struct buffer_head *bh, *head;
 	gfp_t gfp = GFP_NOFS | __GFP_ACCOUNT;
 	long offset;
-	struct mem_cgroup *memcg;
+	struct mem_cgroup *memcg, *oldmemcg;
 
 	if (retry)
 		gfp |= __GFP_NOFAIL;
 
 	memcg = get_mem_cgroup_from_page(page);
-	memalloc_use_memcg(memcg);
+	oldmemcg = memalloc_use_memcg(memcg);
 
 	head = NULL;
 	offset = PAGE_SIZE;
@@ -882,7 +882,7 @@ struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size,
 		set_bh_page(bh, page, offset);
 	}
 out:
-	memalloc_unuse_memcg();
+	memalloc_use_memcg(oldmemcg);
 	mem_cgroup_put(memcg);
 	return head;
 /*
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 5778d1347b35..cb596ad002d8 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -284,6 +284,7 @@ struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
 	struct fanotify_event *event = NULL;
 	gfp_t gfp = GFP_KERNEL_ACCOUNT;
 	struct inode *id = fanotify_fid_inode(inode, mask, data, data_type);
+	struct mem_cgroup *oldmemcg;
 
 	/*
 	 * For queues with unlimited length lost events are not expected and
@@ -297,7 +298,7 @@ struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
 		gfp |= __GFP_RETRY_MAYFAIL;
 
 	/* Whoever is interested in the event, pays for the allocation. */
-	memalloc_use_memcg(group->memcg);
+	oldmemcg = memalloc_use_memcg(group->memcg);
 
 	if (fanotify_is_perm_event(mask)) {
 		struct fanotify_perm_event *pevent;
@@ -334,7 +335,7 @@ init: __maybe_unused
 		event->path.dentry = NULL;
 	}
 out:
-	memalloc_unuse_memcg();
+	memalloc_use_memcg(oldmemcg);
 	return event;
 }
 
diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c
index d510223d302c..776ce66aaa47 100644
--- a/fs/notify/inotify/inotify_fsnotify.c
+++ b/fs/notify/inotify/inotify_fsnotify.c
@@ -68,6 +68,7 @@ int inotify_handle_event(struct fsnotify_group *group,
 	int ret;
 	int len = 0;
 	int alloc_len = sizeof(struct inotify_event_info);
+	struct mem_cgroup *oldmemcg;
 
 	if (WARN_ON(fsnotify_iter_vfsmount_mark(iter_info)))
 		return 0;
@@ -95,9 +96,9 @@ int inotify_handle_event(struct fsnotify_group *group,
 	 * trigger OOM killer in the target monitoring memcg as it may have
 	 * security repercussion.
 	 */
-	memalloc_use_memcg(group->memcg);
+	oldmemcg = memalloc_use_memcg(group->memcg);
 	event = kmalloc(alloc_len, GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL);
-	memalloc_unuse_memcg();
+	memalloc_use_memcg(oldmemcg);
 
 	if (unlikely(!event)) {
 		/*
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index c49257a3b510..ced06c12daf7 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -316,31 +316,26 @@ static inline void memalloc_nocma_restore(unsigned int flags)
  * __GFP_ACCOUNT allocations till the end of the scope will be charged to the
  * given memcg.
  *
- * NOTE: This function is not nesting safe.
- */
-static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
-{
-	WARN_ON_ONCE(current->active_memcg);
-	current->active_memcg = memcg;
-}
-
-/**
- * memalloc_unuse_memcg - Ends the remote memcg charging scope.
+ * NOTE: This function can nest. Users must save the return value and
+ * reset the previous value after their own charging scope is over:
  *
- * This function marks the end of the remote memcg charging scope started by
- * memalloc_use_memcg().
+ *      old = memalloc_use_memcg(memcg);
+ *      // ... allocations ...
+ *      memalloc_use_memcg(old);
  */
-static inline void memalloc_unuse_memcg(void)
+static inline struct mem_cgroup *__must_check
+memalloc_use_memcg(struct mem_cgroup *memcg)
 {
-	current->active_memcg = NULL;
+	struct mem_cgroup *old = current->active_memcg;
+
+	current->active_memcg = memcg;
+	return old;
 }
 #else
-static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
-{
-}
-
-static inline void memalloc_unuse_memcg(void)
+static inline struct mem_cgroup *__must_check
+memalloc_use_memcg(struct mem_cgroup *memcg)
 {
+	return NULL;
 }
 #endif
 
-- 
2.24.1


