From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-14.4 required=3.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH, MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0C606C3F2D1 for ; Tue, 3 Mar 2020 20:40:49 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id A720320870 for ; Tue, 3 Mar 2020 20:40:48 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="pb6JV212" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A720320870 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 535526B0006; Tue, 3 Mar 2020 15:40:48 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 4C00E6B000C; Tue, 3 Mar 2020 15:40:48 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 388A06B000D; Tue, 3 Mar 2020 15:40:48 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0157.hostedemail.com [216.40.44.157]) by kanga.kvack.org (Postfix) with ESMTP id 1FC8B6B0006 for ; Tue, 3 Mar 2020 15:40:48 -0500 (EST) Received: from smtpin09.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id AB77C180AD801 for ; Tue, 3 Mar 2020 20:40:47 +0000 (UTC) X-FDA: 76555219734.09.eye16_6d4c83780c58 X-HE-Tag: eye16_6d4c83780c58 X-Filterd-Recvd-Size: 15027 Received: from mail-ot1-f66.google.com (mail-ot1-f66.google.com [209.85.210.66]) by imf05.hostedemail.com (Postfix) with ESMTP for ; Tue, 3 Mar 2020 20:40:46 +0000 (UTC) Received: by mail-ot1-f66.google.com with SMTP id x97so4448455ota.6 for ; Tue, 03 Mar 2020 12:40:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=djjiQ8IRAfINXqkJ9kPm9Jl1DcWdet5d4P72N2csD/4=; b=pb6JV212QBOKup7giBMhX51a95i+DN6HUEXxuI5gJvolOdu5/9/VZH2Fv3JKkt8vVz 9bk1DBrmqMIENKgm7OB9h9WnG+jDWsKbvwDipmqEe3H8qdLGYCxKM0fImocYRJzdIrEz FRkIeAWqKbv05jphzTTq8Mtjq4Hd5oupgyuHy7wufyrH1tePq55YIzgY4nHHG2LQOOH1 F7CH024e75153Q6tB1cNBy7MM5mJ0OC7HYzTbPtK+iwg8dZPf0lE4JKox9Crnl8q90K6 IyYpNGFQuJHcJleEavkrFy2R0XyJ6aWhdp3jXNXhbHR8b98EmqVdE5ck13WZG1Ql5hdY 8S6Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=djjiQ8IRAfINXqkJ9kPm9Jl1DcWdet5d4P72N2csD/4=; b=dPqrhbhya4oG+8JPcOXzBQzEnQcdsbIRLS2LIzZOgbV13PBMo/sSaMdEifCgZEAXdf aM5svv7TzTR9cP9YztIZab5hcOcYmVuku3ZEBM1Rb00dJW4Vxs8zI+lmoKE9TLArtBD/ QCh+1MGjmWR6n6RHIFMczzEJEWdNi0fs9z+a29MN9lNqU73eynxWzuRKbIpSIF3Tk2wO OUyk4Y5TrI0kuKYpvvAByxJN4uaS/IwNWdervv2+T3xp07JDtfhBkC0Abg9jW7X1Yus5 JdgKOUO2Vht6EY6Fz6BMp07x9vWVbXplgx58YVbLWJY2M6TUmSkgwBYh9cWMPbEWmifM Ascw== X-Gm-Message-State: ANhLgQ3eGenlbd0VBPplHatEkqhW7YiIQoHBCBAcbGPQHNPoJ5humLnW PTYvMnMDEdwxxT1txCrViJn5IqtMsNB3yB25gZa7RYoG X-Google-Smtp-Source: ADFU+vuu8jUDOG3ITuqbybrG7nIy6vbsNVMfkcOrWtPjOcI0SAWiZtZPKa8r17Hgd/VEnfRAG1SHi5zZlNA1NHuXHYQ= X-Received: by 2002:a9d:6:: with SMTP id 6mr4779869ota.191.1583268044756; Tue, 03 Mar 2020 12:40:44 -0800 (PST) MIME-Version: 1.0 References: <0a37bb7d-18a7-c43c-52a5-f13a34decf69@i-love.sakura.ne.jp> <20200303202623.GA68565@cmpxchg.org> In-Reply-To: <20200303202623.GA68565@cmpxchg.org> From: Shakeel Butt Date: Tue, 3 Mar 2020 12:40:33 -0800 Message-ID: Subject: Re: fs/buffer.c: WARNING: alloc_page_buffers while mke2fs To: Johannes Weiner Cc: Yang Shi , Tetsuo Handa , Naresh Kamboju , linux-mm , Andrew Morton , Mel Gorman , Michal Hocko , Dan Schatzberg Content-Type: text/plain; charset="UTF-8" X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, Mar 3, 2020 at 12:26 PM Johannes Weiner wrote: > > On Tue, Mar 03, 2020 at 10:14:49AM -0800, Shakeel Butt wrote: > > On Tue, Mar 3, 2020 at 9:47 AM Yang Shi wrote: > > > > > > On Tue, Mar 3, 2020 at 2:53 AM Tetsuo Handa > > > wrote: > > > > > > > > Hello, Naresh. > > > > > > > > > [ 98.003346] WARNING: CPU: 2 PID: 340 at > > > > > include/linux/sched/mm.h:323 alloc_page_buffers+0x210/0x288 > > > > > > > > This is > > > > > > > > /** > > > > * memalloc_use_memcg - Starts the remote memcg charging scope. > > > > * @memcg: memcg to charge. > > > > * > > > > * This function marks the beginning of the remote memcg charging scope. All the > > > > * __GFP_ACCOUNT allocations till the end of the scope will be charged to the > > > > * given memcg. > > > > * > > > > * NOTE: This function is not nesting safe. > > > > */ > > > > static inline void memalloc_use_memcg(struct mem_cgroup *memcg) > > > > { > > > > WARN_ON_ONCE(current->active_memcg); > > > > current->active_memcg = memcg; > > > > } > > > > > > > > which is about memcg. Redirecting to linux-mm. > > > > > > Isn't this triggered by ("loop: use worker per cgroup instead of > > > kworker") in linux-next, which converted loop driver to use worker per > > > cgroup, so it may have multiple workers work at the mean time? > > > > > > So they may share the same "current", then it may cause kind of nested > > > call to memalloc_use_memcg(). > > > > > > Could you please try the below debug patch? This is not the proper > > > fix, but it may help us narrow down the problem. > > > > > > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h > > > index c49257a..1cc1cdc 100644 > > > --- a/include/linux/sched/mm.h > > > +++ b/include/linux/sched/mm.h > > > @@ -320,6 +320,10 @@ static inline void > > > memalloc_nocma_restore(unsigned int flags) > > > */ > > > static inline void memalloc_use_memcg(struct mem_cgroup *memcg) > > > { > > > + if ((current->flags & PF_KTHREAD) && > > > + current->active_memcg) > > > + return; > > > + > > > WARN_ON_ONCE(current->active_memcg); > > > current->active_memcg = memcg; > > > } > > > > > > > Maybe it's time to make memalloc_use_memcg() nesting safe. > > Yes, I think so. The stack trace: > > [ 98.137605] alloc_page_buffers+0x210/0x288 > [ 98.141799] __getblk_gfp+0x1d4/0x400 > [ 98.145475] ext4_read_block_bitmap_nowait+0x148/0xbc8 > [ 98.150628] ext4_mb_init_cache+0x25c/0x9b0 > [ 98.154821] ext4_mb_init_group+0x270/0x390 > [ 98.159014] ext4_mb_good_group+0x264/0x270 > [ 98.163208] ext4_mb_regular_allocator+0x480/0x798 > [ 98.168011] ext4_mb_new_blocks+0x958/0x10f8 > [ 98.172294] ext4_ext_map_blocks+0xec8/0x1618 > [ 98.176660] ext4_map_blocks+0x1b8/0x8a0 > [ 98.180592] ext4_writepages+0x830/0xf10 > [ 98.184523] do_writepages+0xb4/0x198 > [ 98.188195] __filemap_fdatawrite_range+0x170/0x1c8 > [ 98.193086] filemap_write_and_wait_range+0x40/0xb0 > [ 98.197974] ext4_punch_hole+0x4a4/0x660 > [ 98.201907] ext4_fallocate+0x294/0x1190 > [ 98.205839] loop_process_work+0x690/0x1100 > [ 98.210032] loop_workfn+0x2c/0x110 > [ 98.213529] process_one_work+0x3e0/0x648 > [ 98.217546] worker_thread+0x70/0x670 > [ 98.221217] kthread+0x1b8/0x1c0 > [ 98.224452] ret_from_fork+0x10/0x18 > > The loop kworker is instantiating cache pages on behalf of who queued > the io request, but if the page already exists, the buffers should be > allocated on behalf of who already owns the page. Nesting makes sense. > > Since the only difference between the use and unuse function is the > warn when we nest, we can remove the unuse and do something like: > > old = memalloc_use_memcg(memcg); > memalloc_use_memcg(old); > > What do you think? SGTM. > Patch below. It should go in before Dan's patches, > and they in turn need a small update to save and restore active_memcg. > (Since loop's use is from a kworker, it's unlikely that there is an > outer scope. But it's probably best to keep this simple and robust.) > > --- > > From e0e5ace069af5a36e41eafe3bf21a67966127c04 Mon Sep 17 00:00:00 2001 > From: Johannes Weiner > Date: Tue, 3 Mar 2020 15:15:39 -0500 > Subject: [PATCH] mm: support nesting memalloc_use_memcg() > > The memalloc_use_memcg() function to override the default memcg > accounting context currently doesn't nest. But the patches to make the > loop driver cgroup-aware will end up nesting: > > [ 98.137605] alloc_page_buffers+0x210/0x288 > [ 98.141799] __getblk_gfp+0x1d4/0x400 > [ 98.145475] ext4_read_block_bitmap_nowait+0x148/0xbc8 > [ 98.150628] ext4_mb_init_cache+0x25c/0x9b0 > [ 98.154821] ext4_mb_init_group+0x270/0x390 > [ 98.159014] ext4_mb_good_group+0x264/0x270 > [ 98.163208] ext4_mb_regular_allocator+0x480/0x798 > [ 98.168011] ext4_mb_new_blocks+0x958/0x10f8 > [ 98.172294] ext4_ext_map_blocks+0xec8/0x1618 > [ 98.176660] ext4_map_blocks+0x1b8/0x8a0 > [ 98.180592] ext4_writepages+0x830/0xf10 > [ 98.184523] do_writepages+0xb4/0x198 > [ 98.188195] __filemap_fdatawrite_range+0x170/0x1c8 > [ 98.193086] filemap_write_and_wait_range+0x40/0xb0 > [ 98.197974] ext4_punch_hole+0x4a4/0x660 > [ 98.201907] ext4_fallocate+0x294/0x1190 > [ 98.205839] loop_process_work+0x690/0x1100 > [ 98.210032] loop_workfn+0x2c/0x110 > [ 98.213529] process_one_work+0x3e0/0x648 > [ 98.217546] worker_thread+0x70/0x670 > [ 98.221217] kthread+0x1b8/0x1c0 > [ 98.224452] ret_from_fork+0x10/0x18 > > where loop_process_work() sets the memcg override to the memcg that > submitted the IO request, and alloc_page_buffers() sets the override > to the memcg that instantiated the cache page, which may differ. > > Make memalloc_use_memcg() return the old memcg and convert existing > users to a stacking model. Delete the unused memalloc_unuse_memcg(). > > Signed-off-by: Johannes Weiner Reviewed-by: Shakeel Butt > --- > fs/buffer.c | 6 ++--- > fs/notify/fanotify/fanotify.c | 5 +++-- > fs/notify/inotify/inotify_fsnotify.c | 5 +++-- > include/linux/sched/mm.h | 33 ++++++++++++---------------- > 4 files changed, 23 insertions(+), 26 deletions(-) > > diff --git a/fs/buffer.c b/fs/buffer.c > index d8c7242426bb..54d5df14bd36 100644 > --- a/fs/buffer.c > +++ b/fs/buffer.c > @@ -857,13 +857,13 @@ struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size, > struct buffer_head *bh, *head; > gfp_t gfp = GFP_NOFS | __GFP_ACCOUNT; > long offset; > - struct mem_cgroup *memcg; > + struct mem_cgroup *memcg, *oldmemcg; > > if (retry) > gfp |= __GFP_NOFAIL; > > memcg = get_mem_cgroup_from_page(page); > - memalloc_use_memcg(memcg); > + oldmemcg = memalloc_use_memcg(memcg); > > head = NULL; > offset = PAGE_SIZE; > @@ -882,7 +882,7 @@ struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size, > set_bh_page(bh, page, offset); > } > out: > - memalloc_unuse_memcg(); > + memalloc_use_memcg(oldmemcg); > mem_cgroup_put(memcg); > return head; > /* > diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c > index 5778d1347b35..cb596ad002d8 100644 > --- a/fs/notify/fanotify/fanotify.c > +++ b/fs/notify/fanotify/fanotify.c > @@ -284,6 +284,7 @@ struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, > struct fanotify_event *event = NULL; > gfp_t gfp = GFP_KERNEL_ACCOUNT; > struct inode *id = fanotify_fid_inode(inode, mask, data, data_type); > + struct mem_cgroup *oldmemcg; > > /* > * For queues with unlimited length lost events are not expected and > @@ -297,7 +298,7 @@ struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, > gfp |= __GFP_RETRY_MAYFAIL; > > /* Whoever is interested in the event, pays for the allocation. */ > - memalloc_use_memcg(group->memcg); > + oldmemcg = memalloc_use_memcg(group->memcg); > > if (fanotify_is_perm_event(mask)) { > struct fanotify_perm_event *pevent; > @@ -334,7 +335,7 @@ init: __maybe_unused > event->path.dentry = NULL; > } > out: > - memalloc_unuse_memcg(); > + memalloc_use_memcg(oldmemcg); > return event; > } > > diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c > index d510223d302c..776ce66aaa47 100644 > --- a/fs/notify/inotify/inotify_fsnotify.c > +++ b/fs/notify/inotify/inotify_fsnotify.c > @@ -68,6 +68,7 @@ int inotify_handle_event(struct fsnotify_group *group, > int ret; > int len = 0; > int alloc_len = sizeof(struct inotify_event_info); > + struct mem_cgroup *oldmemcg; > > if (WARN_ON(fsnotify_iter_vfsmount_mark(iter_info))) > return 0; > @@ -95,9 +96,9 @@ int inotify_handle_event(struct fsnotify_group *group, > * trigger OOM killer in the target monitoring memcg as it may have > * security repercussion. > */ > - memalloc_use_memcg(group->memcg); > + oldmemcg = memalloc_use_memcg(group->memcg); > event = kmalloc(alloc_len, GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL); > - memalloc_unuse_memcg(); > + memalloc_use_memcg(oldmemcg); > > if (unlikely(!event)) { > /* > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h > index c49257a3b510..ced06c12daf7 100644 > --- a/include/linux/sched/mm.h > +++ b/include/linux/sched/mm.h > @@ -316,31 +316,26 @@ static inline void memalloc_nocma_restore(unsigned int flags) > * __GFP_ACCOUNT allocations till the end of the scope will be charged to the > * given memcg. > * > - * NOTE: This function is not nesting safe. > - */ > -static inline void memalloc_use_memcg(struct mem_cgroup *memcg) > -{ > - WARN_ON_ONCE(current->active_memcg); > - current->active_memcg = memcg; > -} > - > -/** > - * memalloc_unuse_memcg - Ends the remote memcg charging scope. > + * NOTE: This function can nest. Users must save the return value and > + * reset the previous value after their own charging scope is over: Should we mention that this is still not irq safe? At the moment other than skmem we don't do memcg charging in the interrupt and skmem has a memcg already associated with it. Maybe in future there might be a case where we want a specific memcg be charged for kmem in the interrupt context. Until then we can mark that this function should not be used in interrupt. > * > - * This function marks the end of the remote memcg charging scope started by > - * memalloc_use_memcg(). > + * old = memalloc_use_memcg(memcg); > + * // ... allocations ... > + * memalloc_use_memcg(old); > */ > -static inline void memalloc_unuse_memcg(void) > +static inline struct mem_cgroup *__must_check > +memalloc_use_memcg(struct mem_cgroup *memcg) > { > - current->active_memcg = NULL; > + struct mem_cgroup *old = current->active_memcg; > + > + current->active_memcg = memcg; > + return old; > } > #else > -static inline void memalloc_use_memcg(struct mem_cgroup *memcg) > -{ > -} > - > -static inline void memalloc_unuse_memcg(void) > +static inline struct mem_cgroup *__must_check > +memalloc_use_memcg(struct mem_cgroup *memcg) > { > + return NULL; > } > #endif > > -- > 2.24.1 >