linux-kernel.vger.kernel.org archive mirror
* [PATCH RFC] ashmem: Fix lockdep RECLAIM_FS false positive
@ 2018-02-06  0:49 Joel Fernandes
  2018-02-06 22:01 ` Minchan Kim
  2018-02-07  8:07 ` Peter Zijlstra
  0 siblings, 2 replies; 11+ messages in thread
From: Joel Fernandes @ 2018-02-06  0:49 UTC (permalink / raw)
  To: linux-kernel; +Cc: Joel Fernandes, Peter Zijlstra, mhocko, minchan, linux-mm

During invocation of ashmem shrinker under memory pressure, ashmem
calls into VFS code via vfs_fallocate. We however make sure we
don't enter it if the allocation was GFP_FS to prevent looping
into filesystem code. However lockdep doesn't know this and prints
a lockdep splat as below.

This patch fixes the issue by releasing the reclaim_fs lock after
checking for GFP_FS but before calling into the VFS path, and
reacquiring it after so that lockdep can continue reporting any
reclaim issues later.

[ 2115.359650] -(1)[106:kswapd0]=================================
[ 2115.359665] -(1)[106:kswapd0][ INFO: inconsistent lock state ]
[ 2115.359684] -(1)[106:kswapd0]4.9.60+ #2 Tainted: G        W  O
[ 2115.359699] -(1)[106:kswapd0]---------------------------------
[ 2115.359715] -(1)[106:kswapd0]inconsistent {RECLAIM_FS-ON-W} ->
{IN-RECLAIM_FS-W} usage.
[ 2115.359732] -(1)[106:kswapd0]kswapd0/106 [HC0[0]:SC0[0]:HE1:SE1]
takes:
[ 2115.359748] (&sb->s_type->i_mutex_key#9){++++?+}, at:
[<ffffff9008470644>] shmem_fallocate+0x104/0x880
[ 2115.359809] -(1)[106:kswapd0]{RECLAIM_FS-ON-W} state was registered
at:
[ 2115.359828] -(1)[106:kswapd0]  mark_lock+0x2a4/0x10c0
[ 2115.359845] -(1)[106:kswapd0]  mark_held_locks+0xc0/0x128
[ 2115.359862] -(1)[106:kswapd0]  lockdep_trace_alloc+0x284/0x368
[ 2115.359881] -(1)[106:kswapd0]  kmem_cache_alloc+0x3c/0x368
[ 2115.359900] -(1)[106:kswapd0]  __d_alloc+0x3c/0x7a8
[ 2115.359917] -(1)[106:kswapd0]  d_alloc+0x34/0x140
[ 2115.359934] -(1)[106:kswapd0]  d_alloc_parallel+0xfc/0x1480
[ 2115.359953] -(1)[106:kswapd0]  lookup_open+0x3c4/0x12a8
[ 2115.359971] -(1)[106:kswapd0]  path_openat+0xb40/0x1dc0
[ 2115.359987] -(1)[106:kswapd0]  do_filp_open+0x170/0x258
[ 2115.360006] -(1)[106:kswapd0]  do_sys_open+0x1b8/0x2f0
[ 2115.360023] -(1)[106:kswapd0]  SyS_openat+0x10/0x18
[ 2115.360041] -(1)[106:kswapd0]  el0_svc_naked+0x24/0x28
[ 2115.360056] -(1)[106:kswapd0]irq event stamp: 2437365
[ 2115.360079] -(1)[106:kswapd0]hardirqs last  enabled at (2437365):
[<ffffff900a0ec6ec>] mutex_trylock+0x224/0x460
[ 2115.360098] -(1)[106:kswapd0]hardirqs last disabled at (2437364):
[<ffffff900a0ec5b0>] mutex_trylock+0xe8/0x460
[ 2115.360116] -(1)[106:kswapd0]softirqs last  enabled at (2436534):
[<ffffff90080822a0>] __do_softirq+0xc38/0x1190
[ 2115.360138] -(1)[106:kswapd0]softirqs last disabled at (2436515):
[<ffffff90080de05c>] irq_exit+0x1ac/0x228
[ 2115.360153] -(1)[106:kswapd0]\x0aother info that might help us debug
this:
[ 2115.360169] -(1)[106:kswapd0] Possible unsafe locking scenario:\x0a
[ 2115.360184] -(1)[106:kswapd0]       CPU0
[ 2115.360198] -(1)[106:kswapd0]       ----
[ 2115.360211] -(1)[106:kswapd0]  lock(&sb->s_type->i_mutex_key#9);
[ 2115.360252] -(1)[106:kswapd0]  <Interrupt>
[ 2115.360265] -(1)[106:kswapd0]    lock(&sb->s_type->i_mutex_key#9);
[ 2115.360304] -(1)[106:kswapd0]\x0a *** DEADLOCK ***\x0a
[ 2115.360322] -(1)[106:kswapd0] #0:  (shrinker_rwsem){++++..}, at:
[<ffffff9008459e08>] shrink_slab.part.15.constprop.28+0xb0/0xe10
[ 2115.360382] -(1)[106:kswapd0] #1:  (ashmem_mutex){+.+.+.}, at:
[<ffffff90098c4ba0>] ashmem_shrink_scan+0x80/0x308
[ 2115.360439] -(1)[106:kswapd0]\x0astack backtrace:
[ 2115.360462] -(1)[106:kswapd0]CPU: 1 PID: 106 Comm: kswapd0 Tainted: G
W  O    4.9.60+ #2
[ 2115.360478] -(1)[106:kswapd0]Hardware name: MT6765 (DT)
[ 2115.360494] -(1)[106:kswapd0]Call trace:
[ 2115.360515] -(1)[106:kswapd0][<ffffff9008092938>]
dump_backtrace+0x0/0x400
[ 2115.360533] -(1)[106:kswapd0][<ffffff900809302c>]
show_stack+0x14/0x20
[ 2115.360555] -(1)[106:kswapd0][<ffffff9008988bc0>]
dump_stack+0xb0/0xe8
[ 2115.360576] -(1)[106:kswapd0][<ffffff900841e8b0>]
print_usage_bug.part.24+0x548/0x568
[ 2115.360595] -(1)[106:kswapd0][<ffffff90082398bc>]
mark_lock+0x494/0x10c0
[ 2115.360613] -(1)[106:kswapd0][<ffffff900823bf1c>]
__lock_acquire+0xc94/0x58e8
[ 2115.360630] -(1)[106:kswapd0][<ffffff9008241ba0>]
lock_acquire+0x1d0/0x708
[ 2115.360650] -(1)[106:kswapd0][<ffffff900a0f27a8>]
down_write+0x48/0xd0
[ 2115.360669] -(1)[106:kswapd0][<ffffff9008470644>]
shmem_fallocate+0x104/0x880
[ 2115.360688] -(1)[106:kswapd0][<ffffff90098c4cc4>]
ashmem_shrink_scan+0x1a4/0x308
[ 2115.360709] -(1)[106:kswapd0][<ffffff900845a088>]
shrink_slab.part.15.constprop.28+0x330/0xe10
[ 2115.360729] -(1)[106:kswapd0][<ffffff9008463b34>]
shrink_node+0x1b4/0x588
[ 2115.360747] -(1)[106:kswapd0][<ffffff90084655f4>] kswapd+0x774/0x1640
[ 2115.360767] -(1)[106:kswapd0][<ffffff9008128a4c>] kthread+0x28c/0x310
[ 2115.360786] -(1)[106:kswapd0][<ffffff9008083f00>]
ret_from_fork+0x10/0x50

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: mhocko@kernel.org
Cc: minchan@kernel.org
Cc: linux-mm@kvack.org
Signed-off-by: Joel Fernandes <joelaf@google.com>
---
 drivers/staging/android/ashmem.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/android/ashmem.c b/drivers/staging/android/ashmem.c
index 372ce9913e6d..7e060f32aaa8 100644
--- a/drivers/staging/android/ashmem.c
+++ b/drivers/staging/android/ashmem.c
@@ -32,6 +32,7 @@
 #include <linux/bitops.h>
 #include <linux/mutex.h>
 #include <linux/shmem_fs.h>
+#include <linux/sched/mm.h>
 #include "ashmem.h"
 
 #define ASHMEM_NAME_PREFIX "dev/ashmem/"
@@ -446,8 +447,17 @@ ashmem_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 	if (!(sc->gfp_mask & __GFP_FS))
 		return SHRINK_STOP;
 
-	if (!mutex_trylock(&ashmem_mutex))
+	/*
+	 * Release the reclaim-fs marking since we've already checked GFP_FS.
+	 * This will prevent lockdep's reclaim recursion deadlock false positives.
+	 * We'll re-enable it before returning from this function.
+	 */
+	fs_reclaim_release(sc->gfp_mask);
+
+	if (!mutex_trylock(&ashmem_mutex)) {
+		fs_reclaim_acquire(sc->gfp_mask);
 		return -1;
+	}
 
 	list_for_each_entry_safe(range, next, &ashmem_lru_list, lru) {
 		loff_t start = range->pgstart * PAGE_SIZE;
@@ -464,6 +474,8 @@ ashmem_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 			break;
 	}
 	mutex_unlock(&ashmem_mutex);
+
+	fs_reclaim_acquire(sc->gfp_mask);
 	return freed;
 }
 
-- 
2.16.0.rc1.238.g530d649a79-goog

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH RFC] ashmem: Fix lockdep RECLAIM_FS false positive
  2018-02-06  0:49 [PATCH RFC] ashmem: Fix lockdep RECLAIM_FS false positive Joel Fernandes
@ 2018-02-06 22:01 ` Minchan Kim
  2018-02-06 22:32   ` Joel Fernandes
  2018-02-07  8:07 ` Peter Zijlstra
  1 sibling, 1 reply; 11+ messages in thread
From: Minchan Kim @ 2018-02-06 22:01 UTC (permalink / raw)
  To: Joel Fernandes; +Cc: linux-kernel, Peter Zijlstra, mhocko, linux-mm

Hi Joel,

On Mon, Feb 05, 2018 at 04:49:03PM -0800, Joel Fernandes wrote:
> During invocation of ashmem shrinker under memory pressure, ashmem
> calls into VFS code via vfs_fallocate. We however make sure we
> don't enter it if the allocation was GFP_FS to prevent looping
> into filesystem code. However lockdep doesn't know this and prints
> a lockdep splat as below.
> 
> This patch fixes the issue by releasing the reclaim_fs lock after
> checking for GFP_FS but before calling into the VFS path, and
> reacquiring it after so that lockdep can continue reporting any
> reclaim issues later.

At first glance, it looks reasonable. However, couldn't we return
just 0 in ashmem_shrink_count when the context is under FS?

> 
> [ 2115.359650] -(1)[106:kswapd0]=================================
> [ 2115.359665] -(1)[106:kswapd0][ INFO: inconsistent lock state ]
> [ 2115.359684] -(1)[106:kswapd0]4.9.60+ #2 Tainted: G        W  O
> [ 2115.359699] -(1)[106:kswapd0]---------------------------------
> [ 2115.359715] -(1)[106:kswapd0]inconsistent {RECLAIM_FS-ON-W} ->
> {IN-RECLAIM_FS-W} usage.
> [ 2115.359732] -(1)[106:kswapd0]kswapd0/106 [HC0[0]:SC0[0]:HE1:SE1]
> takes:
> [ 2115.359748] (&sb->s_type->i_mutex_key#9){++++?+}, at:
> [<ffffff9008470644>] shmem_fallocate+0x104/0x880
> [ 2115.359809] -(1)[106:kswapd0]{RECLAIM_FS-ON-W} state was registered
> at:
> [ 2115.359828] -(1)[106:kswapd0]  mark_lock+0x2a4/0x10c0
> [ 2115.359845] -(1)[106:kswapd0]  mark_held_locks+0xc0/0x128
> [ 2115.359862] -(1)[106:kswapd0]  lockdep_trace_alloc+0x284/0x368
> [ 2115.359881] -(1)[106:kswapd0]  kmem_cache_alloc+0x3c/0x368
> [ 2115.359900] -(1)[106:kswapd0]  __d_alloc+0x3c/0x7a8
> [ 2115.359917] -(1)[106:kswapd0]  d_alloc+0x34/0x140
> [ 2115.359934] -(1)[106:kswapd0]  d_alloc_parallel+0xfc/0x1480
> [ 2115.359953] -(1)[106:kswapd0]  lookup_open+0x3c4/0x12a8
> [ 2115.359971] -(1)[106:kswapd0]  path_openat+0xb40/0x1dc0
> [ 2115.359987] -(1)[106:kswapd0]  do_filp_open+0x170/0x258
> [ 2115.360006] -(1)[106:kswapd0]  do_sys_open+0x1b8/0x2f0
> [ 2115.360023] -(1)[106:kswapd0]  SyS_openat+0x10/0x18
> [ 2115.360041] -(1)[106:kswapd0]  el0_svc_naked+0x24/0x28
> [ 2115.360056] -(1)[106:kswapd0]irq event stamp: 2437365
> [ 2115.360079] -(1)[106:kswapd0]hardirqs last  enabled at (2437365):
> [<ffffff900a0ec6ec>] mutex_trylock+0x224/0x460
> [ 2115.360098] -(1)[106:kswapd0]hardirqs last disabled at (2437364):
> [<ffffff900a0ec5b0>] mutex_trylock+0xe8/0x460
> [ 2115.360116] -(1)[106:kswapd0]softirqs last  enabled at (2436534):
> [<ffffff90080822a0>] __do_softirq+0xc38/0x1190
> [ 2115.360138] -(1)[106:kswapd0]softirqs last disabled at (2436515):
> [<ffffff90080de05c>] irq_exit+0x1ac/0x228
> [ 2115.360153] -(1)[106:kswapd0]\x0aother info that might help us debug
> this:
> [ 2115.360169] -(1)[106:kswapd0] Possible unsafe locking scenario:\x0a
> [ 2115.360184] -(1)[106:kswapd0]       CPU0
> [ 2115.360198] -(1)[106:kswapd0]       ----
> [ 2115.360211] -(1)[106:kswapd0]  lock(&sb->s_type->i_mutex_key#9);
> [ 2115.360252] -(1)[106:kswapd0]  <Interrupt>
> [ 2115.360265] -(1)[106:kswapd0]    lock(&sb->s_type->i_mutex_key#9);
> [ 2115.360304] -(1)[106:kswapd0]\x0a *** DEADLOCK ***\x0a
> [ 2115.360322] -(1)[106:kswapd0] #0:  (shrinker_rwsem){++++..}, at:
> [<ffffff9008459e08>] shrink_slab.part.15.constprop.28+0xb0/0xe10
> [ 2115.360382] -(1)[106:kswapd0] #1:  (ashmem_mutex){+.+.+.}, at:
> [<ffffff90098c4ba0>] ashmem_shrink_scan+0x80/0x308
> [ 2115.360439] -(1)[106:kswapd0]\x0astack backtrace:
> [ 2115.360462] -(1)[106:kswapd0]CPU: 1 PID: 106 Comm: kswapd0 Tainted: G
> W  O    4.9.60+ #2
> [ 2115.360478] -(1)[106:kswapd0]Hardware name: MT6765 (DT)
> [ 2115.360494] -(1)[106:kswapd0]Call trace:
> [ 2115.360515] -(1)[106:kswapd0][<ffffff9008092938>]
> dump_backtrace+0x0/0x400
> [ 2115.360533] -(1)[106:kswapd0][<ffffff900809302c>]
> show_stack+0x14/0x20
> [ 2115.360555] -(1)[106:kswapd0][<ffffff9008988bc0>]
> dump_stack+0xb0/0xe8
> [ 2115.360576] -(1)[106:kswapd0][<ffffff900841e8b0>]
> print_usage_bug.part.24+0x548/0x568
> [ 2115.360595] -(1)[106:kswapd0][<ffffff90082398bc>]
> mark_lock+0x494/0x10c0
> [ 2115.360613] -(1)[106:kswapd0][<ffffff900823bf1c>]
> __lock_acquire+0xc94/0x58e8
> [ 2115.360630] -(1)[106:kswapd0][<ffffff9008241ba0>]
> lock_acquire+0x1d0/0x708
> [ 2115.360650] -(1)[106:kswapd0][<ffffff900a0f27a8>]
> down_write+0x48/0xd0
> [ 2115.360669] -(1)[106:kswapd0][<ffffff9008470644>]
> shmem_fallocate+0x104/0x880
> [ 2115.360688] -(1)[106:kswapd0][<ffffff90098c4cc4>]
> ashmem_shrink_scan+0x1a4/0x308
> [ 2115.360709] -(1)[106:kswapd0][<ffffff900845a088>]
> shrink_slab.part.15.constprop.28+0x330/0xe10
> [ 2115.360729] -(1)[106:kswapd0][<ffffff9008463b34>]
> shrink_node+0x1b4/0x588
> [ 2115.360747] -(1)[106:kswapd0][<ffffff90084655f4>] kswapd+0x774/0x1640
> [ 2115.360767] -(1)[106:kswapd0][<ffffff9008128a4c>] kthread+0x28c/0x310
> [ 2115.360786] -(1)[106:kswapd0][<ffffff9008083f00>]
> ret_from_fork+0x10/0x50
> 
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: mhocko@kernel.org
> Cc: minchan@kernel.org
> Cc: linux-mm@kvack.org
> Signed-off-by: Joel Fernandes <joelaf@google.com>
> ---
>  drivers/staging/android/ashmem.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/staging/android/ashmem.c b/drivers/staging/android/ashmem.c
> index 372ce9913e6d..7e060f32aaa8 100644
> --- a/drivers/staging/android/ashmem.c
> +++ b/drivers/staging/android/ashmem.c
> @@ -32,6 +32,7 @@
>  #include <linux/bitops.h>
>  #include <linux/mutex.h>
>  #include <linux/shmem_fs.h>
> +#include <linux/sched/mm.h>
>  #include "ashmem.h"
>  
>  #define ASHMEM_NAME_PREFIX "dev/ashmem/"
> @@ -446,8 +447,17 @@ ashmem_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>  	if (!(sc->gfp_mask & __GFP_FS))
>  		return SHRINK_STOP;
>  
> -	if (!mutex_trylock(&ashmem_mutex))
> +	/*
> +	 * Release the reclaim-fs marking since we've already checked GFP_FS.
> +	 * This will prevent lockdep's reclaim recursion deadlock false positives.
> +	 * We'll re-enable it before returning from this function.
> +	 */
> +	fs_reclaim_release(sc->gfp_mask);
> +
> +	if (!mutex_trylock(&ashmem_mutex)) {
> +		fs_reclaim_acquire(sc->gfp_mask);
>  		return -1;
> +	}
>  
>  	list_for_each_entry_safe(range, next, &ashmem_lru_list, lru) {
>  		loff_t start = range->pgstart * PAGE_SIZE;
> @@ -464,6 +474,8 @@ ashmem_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>  			break;
>  	}
>  	mutex_unlock(&ashmem_mutex);
> +
> +	fs_reclaim_acquire(sc->gfp_mask);
>  	return freed;
>  }
>  
> -- 
> 2.16.0.rc1.238.g530d649a79-goog
> 


* Re: [PATCH RFC] ashmem: Fix lockdep RECLAIM_FS false positive
  2018-02-06 22:01 ` Minchan Kim
@ 2018-02-06 22:32   ` Joel Fernandes
  2018-02-06 22:55     ` Minchan Kim
  0 siblings, 1 reply; 11+ messages in thread
From: Joel Fernandes @ 2018-02-06 22:32 UTC (permalink / raw)
  To: Minchan Kim
  Cc: LKML, Peter Zijlstra, Michal Hocko, open list:MEMORY MANAGEMENT

Hi Minchan,

On Tue, Feb 6, 2018 at 2:01 PM, Minchan Kim <minchan@kernel.org> wrote:
[...]
> On Mon, Feb 05, 2018 at 04:49:03PM -0800, Joel Fernandes wrote:
>> During invocation of ashmem shrinker under memory pressure, ashmem
>> calls into VFS code via vfs_fallocate. We however make sure we
>> don't enter it if the allocation was GFP_FS to prevent looping
>> into filesystem code. However lockdep doesn't know this and prints
>> a lockdep splat as below.
>>
>> This patch fixes the issue by releasing the reclaim_fs lock after
>> checking for GFP_FS but before calling into the VFS path, and
>> reacquiring it after so that lockdep can continue reporting any
>> reclaim issues later.
>
> At first glance, it looks reasonable. However, Couldn't we return
> just 0 in ashmem_shrink_count when the context is under FS?
>

We're already checking for GFP_FS in ashmem_shrink_scan and bailing out
though, did I miss something?

The problem is not that a deadlock actually occurs; the problem is
that even when we're not under FS, lockdep reports an issue that can't
happen. The fix is for the lockdep false positive that occurs.

thanks,

- Joel


* Re: [PATCH RFC] ashmem: Fix lockdep RECLAIM_FS false positive
  2018-02-06 22:32   ` Joel Fernandes
@ 2018-02-06 22:55     ` Minchan Kim
  2018-02-06 23:16       ` Joel Fernandes
  0 siblings, 1 reply; 11+ messages in thread
From: Minchan Kim @ 2018-02-06 22:55 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: LKML, Peter Zijlstra, Michal Hocko, open list:MEMORY MANAGEMENT

On Tue, Feb 06, 2018 at 02:32:13PM -0800, Joel Fernandes wrote:
> Hi Minchan,
> 
> On Tue, Feb 6, 2018 at 2:01 PM, Minchan Kim <minchan@kernel.org> wrote:
> [...]
> > On Mon, Feb 05, 2018 at 04:49:03PM -0800, Joel Fernandes wrote:
> >> During invocation of ashmem shrinker under memory pressure, ashmem
> >> calls into VFS code via vfs_fallocate. We however make sure we
> >> don't enter it if the allocation was GFP_FS to prevent looping
> >> into filesystem code. However lockdep doesn't know this and prints
> >> a lockdep splat as below.
> >>
> >> This patch fixes the issue by releasing the reclaim_fs lock after
> >> checking for GFP_FS but before calling into the VFS path, and
> >> reacquiring it after so that lockdep can continue reporting any
> >> reclaim issues later.
> >
> > At first glance, it looks reasonable. However, couldn't we return
> > just 0 in ashmem_shrink_count when the context is under FS?
> >
> 
> We're already checking for GFP_FS in ashmem_shrink_scan and bailing out
> though, did I miss something?

I understand your concern now. Apart from that, if ashmem_shrink_count
is called in a !GFP_FS context, you can just return 0 to avoid the
pointless ashmem_shrink_scan call. But it might be trivial, so up to you. :)
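
Minchan's suggestion amounts to bailing out one step earlier, in the
shrinker's count callback. A hypothetical sketch of what that could look
like (this is not part of the posted patch; `__GFP_FS`'s value, the
`struct shrink_control` layout, and `lru_count` are simplified stand-ins
for the real kernel definitions):

```c
/*
 * Hypothetical sketch of Minchan's suggestion: have the count callback
 * itself report 0 objects when __GFP_FS is clear, so the shrinker core
 * never invokes ashmem_shrink_scan() from a !GFP_FS context at all.
 * All definitions below are simplified stand-ins, not kernel code.
 */
#define __GFP_FS 0x80u                      /* illustrative bit only */

struct shrink_control {
	unsigned int gfp_mask;
};

static unsigned long lru_count = 42;        /* stand-in for ashmem's LRU size */

static unsigned long
ashmem_shrink_count(struct shrink_control *sc)
{
	if (!(sc->gfp_mask & __GFP_FS))
		return 0;	/* nothing to scan in a !GFP_FS context */
	return lru_count;
}
```

With a count of 0, the shrinker core skips the scan callback entirely,
so the GFP_FS check inside ashmem_shrink_scan() would only ever be a
belt-and-braces fallback.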

> 
> The problem is not that there is a deadlock that occurs, the problem
> that even when we're not under FS, lockdep reports an issue that can't
> happen. The fix is for the lockdep false positive that occurs.

Yub, you are right. I am happy to add

Reviewed-by: Minchan Kim <minchan@kernel.org>

Other than that, I thought for a while that we could make this generic:
we could add something like a SHRINKER_FS_AWARE flag so that the VM code
itself could prevent such false positives instead of each driver doing
it itself.

However, if the driver can do it by itself, it could be more flexible.
Also, at this moment, my suggestion would be over-engineering, so
I'm not against you.

Thanks!


* Re: [PATCH RFC] ashmem: Fix lockdep RECLAIM_FS false positive
  2018-02-06 22:55     ` Minchan Kim
@ 2018-02-06 23:16       ` Joel Fernandes
  0 siblings, 0 replies; 11+ messages in thread
From: Joel Fernandes @ 2018-02-06 23:16 UTC (permalink / raw)
  To: Minchan Kim
  Cc: LKML, Peter Zijlstra, Michal Hocko, open list:MEMORY MANAGEMENT

Hi Minchan,

On Tue, Feb 6, 2018 at 2:55 PM, Minchan Kim <minchan@kernel.org> wrote:
> On Tue, Feb 06, 2018 at 02:32:13PM -0800, Joel Fernandes wrote:
[...]
>> On Tue, Feb 6, 2018 at 2:01 PM, Minchan Kim <minchan@kernel.org> wrote:
>> [...]
>> > On Mon, Feb 05, 2018 at 04:49:03PM -0800, Joel Fernandes wrote:
>> >> During invocation of ashmem shrinker under memory pressure, ashmem
>> >> calls into VFS code via vfs_fallocate. We however make sure we
>> >> don't enter it if the allocation was GFP_FS to prevent looping
>> >> into filesystem code. However lockdep doesn't know this and prints
>> >> a lockdep splat as below.
>> >>
>> >> This patch fixes the issue by releasing the reclaim_fs lock after
>> >> checking for GFP_FS but before calling into the VFS path, and
>> >> reacquiring it after so that lockdep can continue reporting any
>> >> reclaim issues later.
>> >
>> > At first glance, it looks reasonable. However, couldn't we return
>> > just 0 in ashmem_shrink_count when the context is under FS?
>> >
>>
>> We're already checking for GFP_FS in ashmem_shrink_scan and bailing out
>> though, did I miss something?
>
> I understand your concern now. Apart from that, if ashmem_shrink_count
> is called in a !GFP_FS context, you can just return 0 to avoid the
> pointless ashmem_shrink_scan call. But it might be trivial, so up to you. :)

Yes, I think we can do that in a subsequent patch since that's a
different optimization. Thanks for the suggestion.

>>
>> The problem is not that a deadlock actually occurs; the problem is
>> that even when we're not under FS, lockdep reports an issue that can't
>> happen. The fix is for the lockdep false positive that occurs.
>
> Yub, you are right. I am happy to add
>
> Reviewed-by: Minchan Kim <minchan@kernel.org>

Great, thanks!

>
> Other than that, I thought for a while that we could make this generic:
> we could add something like a SHRINKER_FS_AWARE flag so that the VM code
> itself could prevent such false positives instead of each driver doing
> it itself.
>
> However, if the driver can do it by itself, it could be more flexible.

Yes, off the top of my head it feels like something the driver should
do at a finer-grained level, since probably only the driver knows
whether we will be looping into the FS.

Thanks for the review,

- Joel


* Re: [PATCH RFC] ashmem: Fix lockdep RECLAIM_FS false positive
  2018-02-06  0:49 [PATCH RFC] ashmem: Fix lockdep RECLAIM_FS false positive Joel Fernandes
  2018-02-06 22:01 ` Minchan Kim
@ 2018-02-07  8:07 ` Peter Zijlstra
  2018-02-07 16:09   ` Joel Fernandes
  1 sibling, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2018-02-07  8:07 UTC (permalink / raw)
  To: Joel Fernandes; +Cc: linux-kernel, mhocko, minchan, linux-mm

On Mon, Feb 05, 2018 at 04:49:03PM -0800, Joel Fernandes wrote:

> [ 2115.359650] -(1)[106:kswapd0]=================================
> [ 2115.359665] -(1)[106:kswapd0][ INFO: inconsistent lock state ]
> [ 2115.359684] -(1)[106:kswapd0]4.9.60+ #2 Tainted: G        W  O
> [ 2115.359699] -(1)[106:kswapd0]---------------------------------
> [ 2115.359715] -(1)[106:kswapd0]inconsistent {RECLAIM_FS-ON-W} ->
> {IN-RECLAIM_FS-W} usage.

Please don't wrap log output, this is unreadable :/

Also, the output is from an ancient kernel and doesn't match the current
code.

> ---
>  drivers/staging/android/ashmem.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/staging/android/ashmem.c b/drivers/staging/android/ashmem.c
> index 372ce9913e6d..7e060f32aaa8 100644
> --- a/drivers/staging/android/ashmem.c
> +++ b/drivers/staging/android/ashmem.c
> @@ -32,6 +32,7 @@
>  #include <linux/bitops.h>
>  #include <linux/mutex.h>
>  #include <linux/shmem_fs.h>
> +#include <linux/sched/mm.h>
>  #include "ashmem.h"
>  
>  #define ASHMEM_NAME_PREFIX "dev/ashmem/"
> @@ -446,8 +447,17 @@ ashmem_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>  	if (!(sc->gfp_mask & __GFP_FS))
>  		return SHRINK_STOP;
>  
> -	if (!mutex_trylock(&ashmem_mutex))
> +	/*
> +	 * Release the reclaim-fs marking since we've already checked GFP_FS.
> +	 * This will prevent lockdep's reclaim recursion deadlock false positives.
> +	 * We'll re-enable it before returning from this function.
> +	 */
> +	fs_reclaim_release(sc->gfp_mask);
> +
> +	if (!mutex_trylock(&ashmem_mutex)) {
> +		fs_reclaim_acquire(sc->gfp_mask);
>  		return -1;
> +	}
>  
>  	list_for_each_entry_safe(range, next, &ashmem_lru_list, lru) {
>  		loff_t start = range->pgstart * PAGE_SIZE;
> @@ -464,6 +474,8 @@ ashmem_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>  			break;
>  	}
>  	mutex_unlock(&ashmem_mutex);
> +
> +	fs_reclaim_acquire(sc->gfp_mask);
>  	return freed;
>  }

Yuck that is horrible.. so if GFP_FS was set, we bail, but if GFP_FS
wasn't set, why is fs_reclaim_*() doing anything at all?

That is, __need_fs_reclaim() should return false when !GFP_FS.
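
The helper Peter refers to is __need_fs_reclaim() in mm/page_alloc.c,
which gates the fs_reclaim lockdep annotation. A simplified user-space
sketch of that gating (flag values are illustrative stand-ins, and the
real helper additionally checks PF_MEMALLOC and __GFP_NOLOCKDEP):

```c
#include <stdbool.h>

/*
 * Simplified sketch of __need_fs_reclaim()-style gating: the fs_reclaim
 * lockdep annotation should be a no-op unless the allocation can both
 * block for reclaim and recurse into filesystem code. Flag values here
 * are illustrative, not the real kernel ones.
 */
#define __GFP_DIRECT_RECLAIM 0x400u	/* illustrative bits only */
#define __GFP_FS             0x80u

static bool need_fs_reclaim(unsigned int gfp_mask)
{
	/* no reclaim without waiting on it */
	if (!(gfp_mask & __GFP_DIRECT_RECLAIM))
		return false;

	/* only __GFP_FS allocations may recurse into filesystem code */
	if (!(gfp_mask & __GFP_FS))
		return false;

	return true;
}
```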


* Re: [PATCH RFC] ashmem: Fix lockdep RECLAIM_FS false positive
  2018-02-07  8:07 ` Peter Zijlstra
@ 2018-02-07 16:09   ` Joel Fernandes
  2018-02-07 16:58     ` Peter Zijlstra
  0 siblings, 1 reply; 11+ messages in thread
From: Joel Fernandes @ 2018-02-07 16:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Michal Hocko, Minchan Kim, open list:MEMORY MANAGEMENT

Hi Peter,

On Wed, Feb 7, 2018 at 12:07 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Feb 05, 2018 at 04:49:03PM -0800, Joel Fernandes wrote:
>
>> [ 2115.359650] -(1)[106:kswapd0]=================================
>> [ 2115.359665] -(1)[106:kswapd0][ INFO: inconsistent lock state ]
>> [ 2115.359684] -(1)[106:kswapd0]4.9.60+ #2 Tainted: G        W  O
>> [ 2115.359699] -(1)[106:kswapd0]---------------------------------
>> [ 2115.359715] -(1)[106:kswapd0]inconsistent {RECLAIM_FS-ON-W} ->
>> {IN-RECLAIM_FS-W} usage.
>
> Please don't wrap log output, this is unreadable :/

Sorry about that, here's the unwrapped output, I'll fix the commit
message in next rev: https://pastebin.com/e0BNGkaN

>
> Also, the output is from an ancient kernel and doesn't match the current
> code.

Right, however the driver hasn't changed and I don't see immediately
how lockdep handles this differently upstream, so I thought of fixing
it upstream.

>> diff --git a/drivers/staging/android/ashmem.c b/drivers/staging/android/ashmem.c
>> index 372ce9913e6d..7e060f32aaa8 100644
>> --- a/drivers/staging/android/ashmem.c
>> +++ b/drivers/staging/android/ashmem.c
>> @@ -32,6 +32,7 @@
>>  #include <linux/bitops.h>
>>  #include <linux/mutex.h>
>>  #include <linux/shmem_fs.h>
>> +#include <linux/sched/mm.h>
>>  #include "ashmem.h"
>>
>>  #define ASHMEM_NAME_PREFIX "dev/ashmem/"
>> @@ -446,8 +447,17 @@ ashmem_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>>       if (!(sc->gfp_mask & __GFP_FS))
>>               return SHRINK_STOP;
>>
>> -     if (!mutex_trylock(&ashmem_mutex))
>> +     /*
>> +      * Release the reclaim-fs marking since we've already checked GFP_FS.
>> +      * This will prevent lockdep's reclaim recursion deadlock false positives.
>> +      * We'll re-enable it before returning from this function.
>> +      */
>> +     fs_reclaim_release(sc->gfp_mask);
>> +
>> +     if (!mutex_trylock(&ashmem_mutex)) {
>> +             fs_reclaim_acquire(sc->gfp_mask);
>>               return -1;
>> +     }
>>
>>       list_for_each_entry_safe(range, next, &ashmem_lru_list, lru) {
>>               loff_t start = range->pgstart * PAGE_SIZE;
>> @@ -464,6 +474,8 @@ ashmem_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>>                       break;
>>       }
>>       mutex_unlock(&ashmem_mutex);
>> +
>> +     fs_reclaim_acquire(sc->gfp_mask);
>>       return freed;
>>  }
>
> Yuck that is horrible.. so if GFP_FS was set, we bail, but if GFP_FS
> wasn't set, why is fs_reclaim_*() doing anything at all?
>
> That is, __need_fs_reclaim() should return false when !GFP_FS.

So my patch is wrong, very sorry about that. That's why I marked it as
RFC and wanted to get your expert eyes on it.
The bail out happens when GFP_FS is *not* set. Lockdep reports this
issue when GFP_FS is in fact set, and we enter this path and acquire
the lock. So lockdep seems to be doing the right thing; however, by
design it is reporting a false positive.

The real issue is that the lock being acquired is of the same lock
class as a different lock instance that is acquired under GFP_FS.

So the issue seems to me to be:
Process A          kswapd
---------          ------
acquire i_mutex    Enter RECLAIM_FS

Enter RECLAIM_FS   acquire different i_mutex

Neil tried to fix this sometime back:
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg623909.html
but it was kind of NAK'ed.

Any thoughts on how we can fix this?

Thanks Peter,

- Joel


* Re: [PATCH RFC] ashmem: Fix lockdep RECLAIM_FS false positive
  2018-02-07 16:09   ` Joel Fernandes
@ 2018-02-07 16:58     ` Peter Zijlstra
  2018-02-07 22:27       ` Joel Fernandes
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2018-02-07 16:58 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: LKML, Michal Hocko, Minchan Kim, open list:MEMORY MANAGEMENT

On Wed, Feb 07, 2018 at 08:09:36AM -0800, Joel Fernandes wrote:
> Hi Peter,
> 
> On Wed, Feb 7, 2018 at 12:07 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Mon, Feb 05, 2018 at 04:49:03PM -0800, Joel Fernandes wrote:
> >
> >> [ 2115.359650] -(1)[106:kswapd0]=================================
> >> [ 2115.359665] -(1)[106:kswapd0][ INFO: inconsistent lock state ]
> >> [ 2115.359684] -(1)[106:kswapd0]4.9.60+ #2 Tainted: G        W  O
> >> [ 2115.359699] -(1)[106:kswapd0]---------------------------------
> >> [ 2115.359715] -(1)[106:kswapd0]inconsistent {RECLAIM_FS-ON-W} ->
> >> {IN-RECLAIM_FS-W} usage.
> >
> > Please don't wrap log output, this is unreadable :/
> 
> Sorry about that, here's the unwrapped output, I'll fix the commit
> message in next rev: https://pastebin.com/e0BNGkaN

So if you trim that leading garbage: "[ 2115.359650] -(1)[106:kswapd0]"
you instantly have half your screen back.

> > Also, the output is from an ancient kernel and doesn't match the current
> > code.
> 
> Right, however the driver hasn't changed and I don't see immediately
> how lockdep handles this differently upstream, so I thought of fixing
> it upstream.

Well, the annotation got a complete rewrite. Granted, it _should_ be
similar, but the output will be different.


> The bail out happens when GFP_FS is *not* set.

Argh, reading is hard.

> Lockdep reports this issue when GFP_FS is in fact set, and we enter
> this path and acquire the lock. So lockdep seems to be doing the right
> thing; however, by design it is reporting a false positive.

So I'm not seeing how it's a false positive. fs/inode.c sets a different
lock class per filesystem type. So recursing on an i_mutex within a
filesystem does sound dodgy.

> The real issue is that the lock being acquired is of the same lock
> class as a different lock instance that is acquired under GFP_FS.
> 
> So the issue seems to me to be:
> Process A          kswapd
> ---------          ------
> acquire i_mutex    Enter RECLAIM_FS
> 
> Enter RECLAIM_FS   acquire different i_mutex

That's not a false positive, that's a 2 process way of writing i_mutex
recursion.

What are the rules of acquiring two i_mutexes within a filesystem?

> Neil tried to fix this sometime back:
> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg623909.html
> but it was kind of NAK'ed.

So that got nacked because Neil tried to fix it in the vfs core. Also
not entirely sure that's the same problem.


* Re: [PATCH RFC] ashmem: Fix lockdep RECLAIM_FS false positive
  2018-02-07 16:58     ` Peter Zijlstra
@ 2018-02-07 22:27       ` Joel Fernandes
  2018-02-08  0:35         ` NeilBrown
  0 siblings, 1 reply; 11+ messages in thread
From: Joel Fernandes @ 2018-02-07 22:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Michal Hocko, Minchan Kim, open list:MEMORY MANAGEMENT,
	Theodore Ts'o, linux-fsdevel, Neil Brown, Dave Chinner

Hi Peter,

On Wed, Feb 7, 2018 at 8:58 AM, Peter Zijlstra <peterz@infradead.org> wrote:
[...]
>
>> Lockdep reports this issue when GFP_FS is in fact set, and we enter
>> this path and acquire the lock. So lockdep seems to be doing the right
>> thing; however, by design it is reporting a false positive.
>
> So I'm not seeing how it's a false positive. fs/inode.c sets a different
> lock class per filesystem type. So recursing on an i_mutex within a
> filesystem does sound dodgy.

But directory inodes and file inodes in the same filesystem share the
same lock class, right? All the issues I've seen (both ours and
Neil's) are similar in that a directory inode's lock is held, followed
by a RECLAIM_FS allocation, and in parallel to that, memory reclaim
involving the same FS is going on in another thread. In the splat I
shared, during the VFS lookup, d_alloc is called with an inode's
lock held (I am guessing this is the lock of the directory inode, which
is locked just before the d_alloc), and in parallel (kswapd or some
other thread) is doing memory reclaim.

>> The real issue is that the lock being acquired is of the same lock
>> class as a different lock instance that is acquired under GFP_FS.
>>
>> So the issue seems to me to be:
>> Process A          kswapd
>> ---------          ------
>> acquire i_mutex    Enter RECLAIM_FS
>>
>> Enter RECLAIM_FS   acquire different i_mutex
>
> That's not a false positive, that's a 2 process way of writing i_mutex
> recursion.

Yes, but I mention false positive since the kswapd->ashmem_shrink_scan
path can never acquire the mutex of a directory inode AFAIK. So from
that perspective it seems a false-positive.

>
> What are the rules of acquiring two i_mutexes within a filesystem?
>

I am not fully sure. I am CC'ing Ted and linux-fsdevel as well for
any input on this question.

>> Neil tried to fix this sometime back:
>> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg623909.html
>> but it was kind of NAK'ed.
>
> So that got nacked because Neil tried to fix it in the vfs core. Also
> not entirely sure that's the same problem.

Yes, a similar fix was proposed internally [1]. I would say the
signature of the problem reported there is quite similar (it's just
that there it is nfsd doing the reclaim instead of kswapd).

thanks,

- Joel

[1] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg623986.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH RFC] ashmem: Fix lockdep RECLAIM_FS false positive
  2018-02-07 22:27       ` Joel Fernandes
@ 2018-02-08  0:35         ` NeilBrown
  2018-02-08  2:29           ` Joel Fernandes
  0 siblings, 1 reply; 11+ messages in thread
From: NeilBrown @ 2018-02-08  0:35 UTC (permalink / raw)
  To: Joel Fernandes, Peter Zijlstra
  Cc: LKML, Michal Hocko, Minchan Kim, open list:MEMORY MANAGEMENT,
	Theodore Ts'o, linux-fsdevel, Dave Chinner


On Wed, Feb 07 2018, Joel Fernandes wrote:

> Hi Peter,
>
> On Wed, Feb 7, 2018 at 8:58 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> [...]
>>
>>> Lockdep reports this issue when GFP_FS is in fact set, and we enter
>>> this path and acquire the lock. So lockdep seems to be doing the right
>>> thing; however, by design it is reporting a false positive.
>>
>> So I'm not seeing how it's a false positive. fs/inode.c sets a different
>> lock class per filesystem type. So recursing on an i_mutex within a
>> filesystem does sound dodgy.
>
> But directory inodes and file inodes in the same filesystem share the
> same lock class right?

Not since v2.6.24
Commit: 14358e6ddaed ("lockdep: annotate dir vs file i_mutex")

You were using 4.9.60, so they should be separate.

Maybe shmem_get_inode() needs to call unlock_new_inode() or just
lockdep_annotate_inode_mutex_key() after inode_init_owner().

Maybe inode_init_owner() should call lockdep_annotate_inode_mutex_key()
directly.
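
Something along these lines, i.e. a completely untested sketch against
mm/shmem.c (names as in 4.9), not a patch:

```c
/*
 * Untested sketch: annotate the freshly created shmem inode with the
 * file/directory-aware i_mutex lockdep class as soon as its type is
 * known, so directory and file i_mutexes land in separate classes.
 */
static struct inode *shmem_get_inode(struct super_block *sb,
				     const struct inode *dir, umode_t mode,
				     dev_t dev, unsigned long flags)
{
	struct inode *inode = new_inode(sb);

	if (inode) {
		inode_init_owner(inode, dir, mode);
		/* pick the S_ISDIR/S_ISREG-specific lock class */
		lockdep_annotate_inode_mutex_key(inode);
		/* ... rest of shmem_get_inode() unchanged ... */
	}
	return inode;
}
```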

NeilBrown


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH RFC] ashmem: Fix lockdep RECLAIM_FS false positive
  2018-02-08  0:35         ` NeilBrown
@ 2018-02-08  2:29           ` Joel Fernandes
  0 siblings, 0 replies; 11+ messages in thread
From: Joel Fernandes @ 2018-02-08  2:29 UTC (permalink / raw)
  To: NeilBrown
  Cc: Peter Zijlstra, LKML, Michal Hocko, Minchan Kim,
	open list:MEMORY MANAGEMENT, Theodore Ts'o, linux-fsdevel,
	Dave Chinner

On Wed, Feb 7, 2018 at 4:35 PM, NeilBrown <neilb@suse.com> wrote:
>> On Wed, Feb 7, 2018 at 8:58 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> [...]
>>>
>>>> Lockdep reports this issue when GFP_FS is in fact set, and we enter
>>>> this path and acquire the lock. So lockdep seems to be doing the right
>>>> thing; however, by design it is reporting a false positive.
>>>
>>> So I'm not seeing how it's a false positive. fs/inode.c sets a different
>>> lock class per filesystem type. So recursing on an i_mutex within a
>>> filesystem does sound dodgy.
>>
>> But directory inodes and file inodes in the same filesystem share the
>> same lock class right?
>
> Not since v2.6.24
> Commit: 14358e6ddaed ("lockdep: annotate dir vs file i_mutex")
>
> You were using 4.9.60, so they should be separate.
>
> Maybe shmem_get_inode() needs to call unlock_new_inode() or just
> lockdep_annotate_inode_mutex_key() after inode_init_owner().
>
> Maybe inode_init_owner() should call lockdep_annotate_inode_mutex_key()
> directly.

Thanks for the ideas! I will test lockdep_annotate_inode_mutex_key()
after inode_init_owner() in shmem and let you know if the issue goes
away. It seems hugetlbfs does this too (I think for similar reasons).

- Joel

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2018-02-08  2:29 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-06  0:49 [PATCH RFC] ashmem: Fix lockdep RECLAIM_FS false positive Joel Fernandes
2018-02-06 22:01 ` Minchan Kim
2018-02-06 22:32   ` Joel Fernandes
2018-02-06 22:55     ` Minchan Kim
2018-02-06 23:16       ` Joel Fernandes
2018-02-07  8:07 ` Peter Zijlstra
2018-02-07 16:09   ` Joel Fernandes
2018-02-07 16:58     ` Peter Zijlstra
2018-02-07 22:27       ` Joel Fernandes
2018-02-08  0:35         ` NeilBrown
2018-02-08  2:29           ` Joel Fernandes
