From: Nikolay Borisov <nborisov@suse.com> To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>, peterz@infradead.org Cc: torvalds@linux-foundation.org, davej@codemonkey.org.uk, npiggin@gmail.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, netdev@vger.kernel.org, mhocko@kernel.org, linux-btrfs@vger.kernel.org Subject: Re: [PATCH v2] lockdep: Fix fs_reclaim warning. Date: Mon, 12 Feb 2018 14:08:32 +0200 [thread overview] Message-ID: <e6f5dc9b-2066-4f31-8e0f-c713f53e6592@suse.com> (raw) In-Reply-To: <201802082043.FFJ39503.SVQFFFOJMHLOtO@I-love.SAKURA.ne.jp> On 8.02.2018 13:43, Tetsuo Handa wrote: >>From 361d37a7d36978020dfb4c11ec1f4800937ccb68 Mon Sep 17 00:00:00 2001 > From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > Date: Thu, 8 Feb 2018 10:35:35 +0900 > Subject: [PATCH v2] lockdep: Fix fs_reclaim warning. > > Dave Jones reported fs_reclaim lockdep warnings. > > ============================================ > WARNING: possible recursive locking detected > 4.15.0-rc9-backup-debug+ #1 Not tainted > -------------------------------------------- > sshd/24800 is trying to acquire lock: > (fs_reclaim){+.+.}, at: [<0000000084f438c2>] fs_reclaim_acquire.part.102+0x5/0x30 > > but task is already holding lock: > (fs_reclaim){+.+.}, at: [<0000000084f438c2>] fs_reclaim_acquire.part.102+0x5/0x30 > > other info that might help us debug this: > Possible unsafe locking scenario: > > CPU0 > ---- > lock(fs_reclaim); > lock(fs_reclaim); > > *** DEADLOCK *** > > May be due to missing lock nesting notation > > 2 locks held by sshd/24800: > #0: (sk_lock-AF_INET6){+.+.}, at: [<000000001a069652>] tcp_sendmsg+0x19/0x40 > #1: (fs_reclaim){+.+.}, at: [<0000000084f438c2>] fs_reclaim_acquire.part.102+0x5/0x30 > > stack backtrace: > CPU: 3 PID: 24800 Comm: sshd Not tainted 4.15.0-rc9-backup-debug+ #1 > Call Trace: > dump_stack+0xbc/0x13f > __lock_acquire+0xa09/0x2040 > lock_acquire+0x12e/0x350 > fs_reclaim_acquire.part.102+0x29/0x30 > kmem_cache_alloc+0x3d/0x2c0 > alloc_extent_state+0xa7/0x410 > __clear_extent_bit+0x3ea/0x570 > try_release_extent_mapping+0x21a/0x260 > __btrfs_releasepage+0xb0/0x1c0 > btrfs_releasepage+0x161/0x170 > try_to_release_page+0x162/0x1c0 > shrink_page_list+0x1d5a/0x2fb0 > shrink_inactive_list+0x451/0x940 > shrink_node_memcg.constprop.88+0x4c9/0x5e0 > shrink_node+0x12d/0x260 > try_to_free_pages+0x418/0xaf0 > __alloc_pages_slowpath+0x976/0x1790 > __alloc_pages_nodemask+0x52c/0x5c0 > new_slab+0x374/0x3f0 > ___slab_alloc.constprop.81+0x47e/0x5a0 > __slab_alloc.constprop.80+0x32/0x60 > __kmalloc_track_caller+0x267/0x310 > __kmalloc_reserve.isra.40+0x29/0x80 > __alloc_skb+0xee/0x390 > sk_stream_alloc_skb+0xb8/0x340 > tcp_sendmsg_locked+0x8e6/0x1d30 > tcp_sendmsg+0x27/0x40 > inet_sendmsg+0xd0/0x310 > sock_write_iter+0x17a/0x240 > __vfs_write+0x2ab/0x380 > vfs_write+0xfb/0x260 > SyS_write+0xb6/0x140 > do_syscall_64+0x1e5/0xc05 > entry_SYSCALL64_slow_path+0x25/0x25 > I think I've hit another incarnation of that one. The call stack is: http://paste.opensuse.org/3f22d013 The cleaned up callstack of all the ? entries look like: __lock_acquire+0x2d8a/0x4b70 lock_acquire+0x110/0x330 kmem_cache_alloc+0x29/0x2c0 __clear_extent_bit+0x488/0x800 try_release_extent_mapping+0x288/0x3c0 __btrfs_releasepage+0x6c/0x140 shrink_page_list+0x227e/0x3110 shrink_inactive_list+0x414/0xdb0 shrink_node_memcg+0x7c8/0x1250 shrink_node+0x2ae/0xb50 do_try_to_free_pages+0x2b1/0xe20 try_to_free_pages+0x205/0x570 __alloc_pages_nodemask+0xb91/0x2160 new_slab+0x27a/0x4e0 ___slab_alloc+0x355/0x610 __slab_alloc+0x4c/0xa0 kmem_cache_alloc+0x22d/0x2c0 mempool_alloc+0xe1/0x280 bio_alloc_bioset+0x1d7/0x830 ext4_mpage_readpages+0x99f/0x1000 <- __do_page_cache_readahead+0x4be/0x840 filemap_fault+0x8c8/0xfc0 ext4_filemap_fault+0x7d/0xb0 __do_fault+0x7a/0x150 __handle_mm_fault+0x1542/0x29d0 __do_page_fault+0x557/0xa30 async_page_fault+0x4c/0x60 There is no fs stacking going on here and that is 4.15-rc9. > This warning is caused by commit d92a8cfcb37ecd13 ("locking/lockdep: Rework > FS_RECLAIM annotation") which replaced lockdep_set_current_reclaim_state()/ > lockdep_clear_current_reclaim_state() in __perform_reclaim() and > lockdep_trace_alloc() in slab_pre_alloc_hook() with fs_reclaim_acquire()/ > fs_reclaim_release(). Since __kmalloc_reserve() from __alloc_skb() adds > __GFP_NOMEMALLOC | __GFP_NOWARN to gfp_mask, and all reclaim path simply > propagates __GFP_NOMEMALLOC, fs_reclaim_acquire() in slab_pre_alloc_hook() > is trying to grab the 'fake' lock again when __perform_reclaim() already > grabbed the 'fake' lock. > > The > > /* this guy won't enter reclaim */ > if ((current->flags & PF_MEMALLOC) && !(gfp_mask & __GFP_NOMEMALLOC)) > return false; > > test which causes slab_pre_alloc_hook() to try to grab the 'fake' lock > was added by commit cf40bd16fdad42c0 ("lockdep: annotate reclaim context > (__GFP_NOFS)"). But that test is outdated because PF_MEMALLOC thread won't > enter reclaim regardless of __GFP_NOMEMALLOC after commit 341ce06f69abfafa > ("page allocator: calculate the alloc_flags for allocation only once") > added the PF_MEMALLOC safeguard ( > > /* Avoid recursion of direct reclaim */ > if (p->flags & PF_MEMALLOC) > goto nopage; > > in __alloc_pages_slowpath()). > > Thus, let's fix outdated test by removing __GFP_NOMEMALLOC test and allow > __need_fs_reclaim() to return false. > > Reported-and-tested-by: Dave Jones <davej@codemonkey.org.uk> > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Nick Piggin <npiggin@gmail.com> > --- > mm/page_alloc.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 81e18ce..19fb76b 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -3590,7 +3590,7 @@ static bool __need_fs_reclaim(gfp_t gfp_mask) > return false; > > /* this guy won't enter reclaim */ > - if ((current->flags & PF_MEMALLOC) && !(gfp_mask & __GFP_NOMEMALLOC)) > + if (current->flags & PF_MEMALLOC) > return false; > > /* We're only interested __GFP_FS allocations for now */ >
WARNING: multiple messages have this Message-ID (diff)
From: Nikolay Borisov <nborisov@suse.com> To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>, peterz@infradead.org Cc: torvalds@linux-foundation.org, davej@codemonkey.org.uk, npiggin@gmail.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, netdev@vger.kernel.org, mhocko@kernel.org, linux-btrfs@vger.kernel.org Subject: Re: [PATCH v2] lockdep: Fix fs_reclaim warning. Date: Mon, 12 Feb 2018 14:08:32 +0200 [thread overview] Message-ID: <e6f5dc9b-2066-4f31-8e0f-c713f53e6592@suse.com> (raw) In-Reply-To: <201802082043.FFJ39503.SVQFFFOJMHLOtO@I-love.SAKURA.ne.jp> On 8.02.2018 13:43, Tetsuo Handa wrote: >>From 361d37a7d36978020dfb4c11ec1f4800937ccb68 Mon Sep 17 00:00:00 2001 > From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > Date: Thu, 8 Feb 2018 10:35:35 +0900 > Subject: [PATCH v2] lockdep: Fix fs_reclaim warning. > > Dave Jones reported fs_reclaim lockdep warnings. > > ============================================ > WARNING: possible recursive locking detected > 4.15.0-rc9-backup-debug+ #1 Not tainted > -------------------------------------------- > sshd/24800 is trying to acquire lock: > (fs_reclaim){+.+.}, at: [<0000000084f438c2>] fs_reclaim_acquire.part.102+0x5/0x30 > > but task is already holding lock: > (fs_reclaim){+.+.}, at: [<0000000084f438c2>] fs_reclaim_acquire.part.102+0x5/0x30 > > other info that might help us debug this: > Possible unsafe locking scenario: > > CPU0 > ---- > lock(fs_reclaim); > lock(fs_reclaim); > > *** DEADLOCK *** > > May be due to missing lock nesting notation > > 2 locks held by sshd/24800: > #0: (sk_lock-AF_INET6){+.+.}, at: [<000000001a069652>] tcp_sendmsg+0x19/0x40 > #1: (fs_reclaim){+.+.}, at: [<0000000084f438c2>] fs_reclaim_acquire.part.102+0x5/0x30 > > stack backtrace: > CPU: 3 PID: 24800 Comm: sshd Not tainted 4.15.0-rc9-backup-debug+ #1 > Call Trace: > dump_stack+0xbc/0x13f > __lock_acquire+0xa09/0x2040 > lock_acquire+0x12e/0x350 > fs_reclaim_acquire.part.102+0x29/0x30 > kmem_cache_alloc+0x3d/0x2c0 > alloc_extent_state+0xa7/0x410 > __clear_extent_bit+0x3ea/0x570 > try_release_extent_mapping+0x21a/0x260 > __btrfs_releasepage+0xb0/0x1c0 > btrfs_releasepage+0x161/0x170 > try_to_release_page+0x162/0x1c0 > shrink_page_list+0x1d5a/0x2fb0 > shrink_inactive_list+0x451/0x940 > shrink_node_memcg.constprop.88+0x4c9/0x5e0 > shrink_node+0x12d/0x260 > try_to_free_pages+0x418/0xaf0 > __alloc_pages_slowpath+0x976/0x1790 > __alloc_pages_nodemask+0x52c/0x5c0 > new_slab+0x374/0x3f0 > ___slab_alloc.constprop.81+0x47e/0x5a0 > __slab_alloc.constprop.80+0x32/0x60 > __kmalloc_track_caller+0x267/0x310 > __kmalloc_reserve.isra.40+0x29/0x80 > __alloc_skb+0xee/0x390 > sk_stream_alloc_skb+0xb8/0x340 > tcp_sendmsg_locked+0x8e6/0x1d30 > tcp_sendmsg+0x27/0x40 > inet_sendmsg+0xd0/0x310 > sock_write_iter+0x17a/0x240 > __vfs_write+0x2ab/0x380 > vfs_write+0xfb/0x260 > SyS_write+0xb6/0x140 > do_syscall_64+0x1e5/0xc05 > entry_SYSCALL64_slow_path+0x25/0x25 > I think I've hit another incarnation of that one. The call stack is: http://paste.opensuse.org/3f22d013 The cleaned up callstack of all the ? entries look like: __lock_acquire+0x2d8a/0x4b70 lock_acquire+0x110/0x330 kmem_cache_alloc+0x29/0x2c0 __clear_extent_bit+0x488/0x800 try_release_extent_mapping+0x288/0x3c0 __btrfs_releasepage+0x6c/0x140 shrink_page_list+0x227e/0x3110 shrink_inactive_list+0x414/0xdb0 shrink_node_memcg+0x7c8/0x1250 shrink_node+0x2ae/0xb50 do_try_to_free_pages+0x2b1/0xe20 try_to_free_pages+0x205/0x570 __alloc_pages_nodemask+0xb91/0x2160 new_slab+0x27a/0x4e0 ___slab_alloc+0x355/0x610 __slab_alloc+0x4c/0xa0 kmem_cache_alloc+0x22d/0x2c0 mempool_alloc+0xe1/0x280 bio_alloc_bioset+0x1d7/0x830 ext4_mpage_readpages+0x99f/0x1000 <- __do_page_cache_readahead+0x4be/0x840 filemap_fault+0x8c8/0xfc0 ext4_filemap_fault+0x7d/0xb0 __do_fault+0x7a/0x150 __handle_mm_fault+0x1542/0x29d0 __do_page_fault+0x557/0xa30 async_page_fault+0x4c/0x60 There is no fs stacking going on here and that is 4.15-rc9. > This warning is caused by commit d92a8cfcb37ecd13 ("locking/lockdep: Rework > FS_RECLAIM annotation") which replaced lockdep_set_current_reclaim_state()/ > lockdep_clear_current_reclaim_state() in __perform_reclaim() and > lockdep_trace_alloc() in slab_pre_alloc_hook() with fs_reclaim_acquire()/ > fs_reclaim_release(). Since __kmalloc_reserve() from __alloc_skb() adds > __GFP_NOMEMALLOC | __GFP_NOWARN to gfp_mask, and all reclaim path simply > propagates __GFP_NOMEMALLOC, fs_reclaim_acquire() in slab_pre_alloc_hook() > is trying to grab the 'fake' lock again when __perform_reclaim() already > grabbed the 'fake' lock. > > The > > /* this guy won't enter reclaim */ > if ((current->flags & PF_MEMALLOC) && !(gfp_mask & __GFP_NOMEMALLOC)) > return false; > > test which causes slab_pre_alloc_hook() to try to grab the 'fake' lock > was added by commit cf40bd16fdad42c0 ("lockdep: annotate reclaim context > (__GFP_NOFS)"). But that test is outdated because PF_MEMALLOC thread won't > enter reclaim regardless of __GFP_NOMEMALLOC after commit 341ce06f69abfafa > ("page allocator: calculate the alloc_flags for allocation only once") > added the PF_MEMALLOC safeguard ( > > /* Avoid recursion of direct reclaim */ > if (p->flags & PF_MEMALLOC) > goto nopage; > > in __alloc_pages_slowpath()). > > Thus, let's fix outdated test by removing __GFP_NOMEMALLOC test and allow > __need_fs_reclaim() to return false. > > Reported-and-tested-by: Dave Jones <davej@codemonkey.org.uk> > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Nick Piggin <npiggin@gmail.com> > --- > mm/page_alloc.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 81e18ce..19fb76b 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -3590,7 +3590,7 @@ static bool __need_fs_reclaim(gfp_t gfp_mask) > return false; > > /* this guy won't enter reclaim */ > - if ((current->flags & PF_MEMALLOC) && !(gfp_mask & __GFP_NOMEMALLOC)) > + if (current->flags & PF_MEMALLOC) > return false; > > /* We're only interested __GFP_FS allocations for now */ > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2018-02-12 12:08 UTC|newest] Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top 2018-01-24 1:36 [4.15-rc9] fs_reclaim lockdep trace Dave Jones 2018-01-24 1:36 ` Dave Jones 2018-01-27 22:24 ` Dave Jones 2018-01-27 22:24 ` Dave Jones 2018-01-27 22:43 ` Linus Torvalds 2018-01-27 22:43 ` Linus Torvalds 2018-01-28 1:16 ` Tetsuo Handa 2018-01-28 1:16 ` Tetsuo Handa 2018-01-28 4:25 ` Tetsuo Handa 2018-01-28 4:25 ` Tetsuo Handa 2018-01-28 5:55 ` Tetsuo Handa 2018-01-28 5:55 ` Tetsuo Handa 2018-01-28 5:55 ` Tetsuo Handa 2018-01-29 2:43 ` Dave Jones 2018-01-29 2:43 ` Dave Jones 2018-01-29 10:27 ` Peter Zijlstra 2018-01-29 10:27 ` Peter Zijlstra 2018-01-29 11:47 ` Tetsuo Handa 2018-01-29 11:47 ` Tetsuo Handa 2018-01-29 13:55 ` Peter Zijlstra 2018-01-29 13:55 ` Peter Zijlstra 2018-02-01 11:36 ` Tetsuo Handa 2018-02-01 11:36 ` Tetsuo Handa 2018-02-08 11:43 ` [PATCH v2] lockdep: Fix fs_reclaim warning Tetsuo Handa 2018-02-08 11:43 ` Tetsuo Handa 2018-02-12 12:08 ` Nikolay Borisov [this message] 2018-02-12 12:08 ` Nikolay Borisov 2018-02-12 13:46 ` Tetsuo Handa 2018-02-12 13:46 ` Tetsuo Handa 2018-02-19 11:52 ` Tetsuo Handa 2018-02-19 11:52 ` Tetsuo Handa 2018-02-27 21:50 ` [PATCH v2 (RESEND)] " Tetsuo Handa 2018-03-07 21:44 ` Tetsuo Handa 2018-03-07 23:33 ` Andrew Morton 2018-03-08 15:30 ` Tetsuo Handa
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=e6f5dc9b-2066-4f31-8e0f-c713f53e6592@suse.com \ --to=nborisov@suse.com \ --cc=davej@codemonkey.org.uk \ --cc=linux-btrfs@vger.kernel.org \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-mm@kvack.org \ --cc=mhocko@kernel.org \ --cc=netdev@vger.kernel.org \ --cc=npiggin@gmail.com \ --cc=penguin-kernel@I-love.SAKURA.ne.jp \ --cc=peterz@infradead.org \ --cc=torvalds@linux-foundation.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.