From: Nikolay Borisov <nborisov@suse.com>
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>, peterz@infradead.org
Cc: torvalds@linux-foundation.org, davej@codemonkey.org.uk,
	npiggin@gmail.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, netdev@vger.kernel.org, mhocko@kernel.org,
	linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v2] lockdep: Fix fs_reclaim warning.
Date: Mon, 12 Feb 2018 14:08:32 +0200	[thread overview]
Message-ID: <e6f5dc9b-2066-4f31-8e0f-c713f53e6592@suse.com> (raw)
In-Reply-To: <201802082043.FFJ39503.SVQFFFOJMHLOtO@I-love.SAKURA.ne.jp>



On 8.02.2018 13:43, Tetsuo Handa wrote:
> From 361d37a7d36978020dfb4c11ec1f4800937ccb68 Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Date: Thu, 8 Feb 2018 10:35:35 +0900
> Subject: [PATCH v2] lockdep: Fix fs_reclaim warning.
> 
> Dave Jones reported fs_reclaim lockdep warnings.
> 
>   ============================================
>   WARNING: possible recursive locking detected
>   4.15.0-rc9-backup-debug+ #1 Not tainted
>   --------------------------------------------
>   sshd/24800 is trying to acquire lock:
>    (fs_reclaim){+.+.}, at: [<0000000084f438c2>] fs_reclaim_acquire.part.102+0x5/0x30
> 
>   but task is already holding lock:
>    (fs_reclaim){+.+.}, at: [<0000000084f438c2>] fs_reclaim_acquire.part.102+0x5/0x30
> 
>   other info that might help us debug this:
>    Possible unsafe locking scenario:
> 
>          CPU0
>          ----
>     lock(fs_reclaim);
>     lock(fs_reclaim);
> 
>    *** DEADLOCK ***
> 
>    May be due to missing lock nesting notation
> 
>   2 locks held by sshd/24800:
>    #0:  (sk_lock-AF_INET6){+.+.}, at: [<000000001a069652>] tcp_sendmsg+0x19/0x40
>    #1:  (fs_reclaim){+.+.}, at: [<0000000084f438c2>] fs_reclaim_acquire.part.102+0x5/0x30
> 
>   stack backtrace:
>   CPU: 3 PID: 24800 Comm: sshd Not tainted 4.15.0-rc9-backup-debug+ #1
>   Call Trace:
>    dump_stack+0xbc/0x13f
>    __lock_acquire+0xa09/0x2040
>    lock_acquire+0x12e/0x350
>    fs_reclaim_acquire.part.102+0x29/0x30
>    kmem_cache_alloc+0x3d/0x2c0
>    alloc_extent_state+0xa7/0x410
>    __clear_extent_bit+0x3ea/0x570
>    try_release_extent_mapping+0x21a/0x260
>    __btrfs_releasepage+0xb0/0x1c0
>    btrfs_releasepage+0x161/0x170
>    try_to_release_page+0x162/0x1c0
>    shrink_page_list+0x1d5a/0x2fb0
>    shrink_inactive_list+0x451/0x940
>    shrink_node_memcg.constprop.88+0x4c9/0x5e0
>    shrink_node+0x12d/0x260
>    try_to_free_pages+0x418/0xaf0
>    __alloc_pages_slowpath+0x976/0x1790
>    __alloc_pages_nodemask+0x52c/0x5c0
>    new_slab+0x374/0x3f0
>    ___slab_alloc.constprop.81+0x47e/0x5a0
>    __slab_alloc.constprop.80+0x32/0x60
>    __kmalloc_track_caller+0x267/0x310
>    __kmalloc_reserve.isra.40+0x29/0x80
>    __alloc_skb+0xee/0x390
>    sk_stream_alloc_skb+0xb8/0x340
>    tcp_sendmsg_locked+0x8e6/0x1d30
>    tcp_sendmsg+0x27/0x40
>    inet_sendmsg+0xd0/0x310
>    sock_write_iter+0x17a/0x240
>    __vfs_write+0x2ab/0x380
>    vfs_write+0xfb/0x260
>    SyS_write+0xb6/0x140
>    do_syscall_64+0x1e5/0xc05
>    entry_SYSCALL64_slow_path+0x25/0x25
> 

I think I've hit another incarnation of that one. The call stack is:
http://paste.opensuse.org/3f22d013

The cleaned-up call stack from all the ? entries looks like:

__lock_acquire+0x2d8a/0x4b70
lock_acquire+0x110/0x330
kmem_cache_alloc+0x29/0x2c0
__clear_extent_bit+0x488/0x800
try_release_extent_mapping+0x288/0x3c0
__btrfs_releasepage+0x6c/0x140
shrink_page_list+0x227e/0x3110
shrink_inactive_list+0x414/0xdb0
shrink_node_memcg+0x7c8/0x1250
shrink_node+0x2ae/0xb50
do_try_to_free_pages+0x2b1/0xe20
try_to_free_pages+0x205/0x570
__alloc_pages_nodemask+0xb91/0x2160
new_slab+0x27a/0x4e0
___slab_alloc+0x355/0x610
__slab_alloc+0x4c/0xa0
kmem_cache_alloc+0x22d/0x2c0
mempool_alloc+0xe1/0x280
bio_alloc_bioset+0x1d7/0x830
ext4_mpage_readpages+0x99f/0x1000 <-
__do_page_cache_readahead+0x4be/0x840
filemap_fault+0x8c8/0xfc0
ext4_filemap_fault+0x7d/0xb0
__do_fault+0x7a/0x150
__handle_mm_fault+0x1542/0x29d0
__do_page_fault+0x557/0xa30
async_page_fault+0x4c/0x60


There is no filesystem stacking going on here, and this is on 4.15-rc9.


> This warning is caused by commit d92a8cfcb37ecd13 ("locking/lockdep: Rework
> FS_RECLAIM annotation") which replaced lockdep_set_current_reclaim_state()/
> lockdep_clear_current_reclaim_state() in __perform_reclaim() and
> lockdep_trace_alloc() in slab_pre_alloc_hook() with fs_reclaim_acquire()/
> fs_reclaim_release(). Since __kmalloc_reserve() from __alloc_skb() adds
> __GFP_NOMEMALLOC | __GFP_NOWARN to gfp_mask, and the whole reclaim path
> simply propagates __GFP_NOMEMALLOC, fs_reclaim_acquire() in
> slab_pre_alloc_hook() tries to grab the 'fake' lock again while
> __perform_reclaim() is already holding it.
> 
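To make the double acquisition concrete, here is a minimal user-space sketch
of the pattern lockdep is reporting. The function names mirror the kernel
ones, but the depth counter and the bodies are illustrative stand-ins, not
the real lockdep or slab code:

#include <stdio.h>

/* Toy stand-in for lockdep's "fake" fs_reclaim lock: a depth counter is
 * enough to show the recursive acquisition being reported. */
static int fs_reclaim_depth;

static void fs_reclaim_acquire(void)
{
	if (fs_reclaim_depth++ > 0)
		printf("WARNING: possible recursive locking detected\n");
}

static void fs_reclaim_release(void)
{
	fs_reclaim_depth--;
}

/* Nested slab allocation issued from the reclaim path, e.g.
 * alloc_extent_state() called under shrink_page_list(). */
static void slab_pre_alloc_hook(void)
{
	fs_reclaim_acquire();	/* second acquisition -> lockdep complains */
	fs_reclaim_release();
}

static void __perform_reclaim(void)
{
	fs_reclaim_acquire();	/* first acquisition, held across reclaim */
	slab_pre_alloc_hook();
	fs_reclaim_release();
}

int main(void)
{
	__perform_reclaim();
	return 0;
}

The nested fs_reclaim_acquire() inside slab_pre_alloc_hook() is the second
acquisition that produces the "possible recursive locking detected" report.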
> The
> 
>   /* this guy won't enter reclaim */
>   if ((current->flags & PF_MEMALLOC) && !(gfp_mask & __GFP_NOMEMALLOC))
>           return false;
> 
> test which causes slab_pre_alloc_hook() to try to grab the 'fake' lock
> was added by commit cf40bd16fdad42c0 ("lockdep: annotate reclaim context
> (__GFP_NOFS)"). But that test is outdated because PF_MEMALLOC thread won't
> enter reclaim regardless of __GFP_NOMEMALLOC after commit 341ce06f69abfafa
> ("page allocator: calculate the alloc_flags for allocation only once")
> added the PF_MEMALLOC safeguard (
> 
>   /* Avoid recursion of direct reclaim */
>   if (p->flags & PF_MEMALLOC)
>           goto nopage;
> 
> in __alloc_pages_slowpath()).
> 
> Thus, let's fix the outdated test by removing the __GFP_NOMEMALLOC check
> and allow __need_fs_reclaim() to return false.
> 
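As a rough illustration of the one-line change below, here is a
self-contained sketch contrasting the old and new tests for the allocation
in the report. The flag values and the helpers old_skips_annotation() /
new_skips_annotation() are invented for the example and are not taken from
the kernel headers:

#include <stdbool.h>
#include <stdio.h>

/* Stand-in flag values for illustration only; the real definitions live in
 * the kernel headers and are different. */
#define PF_MEMALLOC		0x1u
#define __GFP_FS		0x1u
#define __GFP_NOMEMALLOC	0x2u

/* Old __need_fs_reclaim() exemption: a PF_MEMALLOC task was only exempted
 * from the fs_reclaim annotation when __GFP_NOMEMALLOC was clear. */
static bool old_skips_annotation(unsigned int pflags, unsigned int gfp_mask)
{
	return (pflags & PF_MEMALLOC) && !(gfp_mask & __GFP_NOMEMALLOC);
}

/* New exemption after the patch: any PF_MEMALLOC task is exempted, matching
 * the "if (p->flags & PF_MEMALLOC) goto nopage;" guard in the allocator. */
static bool new_skips_annotation(unsigned int pflags, unsigned int gfp_mask)
{
	(void)gfp_mask;
	return pflags & PF_MEMALLOC;
}

int main(void)
{
	/* The nested allocation from the report: direct reclaim (PF_MEMALLOC
	 * set) allocating with a mask that still carries __GFP_NOMEMALLOC. */
	unsigned int pflags = PF_MEMALLOC;
	unsigned int gfp_mask = __GFP_FS | __GFP_NOMEMALLOC;

	printf("old test skips annotation: %d\n",
	       old_skips_annotation(pflags, gfp_mask));	/* 0 -> warning */
	printf("new test skips annotation: %d\n",
	       new_skips_annotation(pflags, gfp_mask));	/* 1 -> no recursion */
	return 0;
}

With the old test, a PF_MEMALLOC task allocating with __GFP_NOMEMALLOC is
still annotated, so lockdep sees fs_reclaim taken twice; with the new test
the task is exempted, which matches the allocator's own recursion guard.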
> Reported-and-tested-by: Dave Jones <davej@codemonkey.org.uk>
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Nick Piggin <npiggin@gmail.com>
> ---
>  mm/page_alloc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 81e18ce..19fb76b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3590,7 +3590,7 @@ static bool __need_fs_reclaim(gfp_t gfp_mask)
>  		return false;
>  
>  	/* this guy won't enter reclaim */
> -	if ((current->flags & PF_MEMALLOC) && !(gfp_mask & __GFP_NOMEMALLOC))
> +	if (current->flags & PF_MEMALLOC)
>  		return false;
>  
>  	/* We're only interested __GFP_FS allocations for now */
> 


