linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm/list_lru.c: use cond_resched_lock() for nlru->lock
@ 2017-06-12  0:47 Sahitya Tummala
  2017-06-12 13:11 ` Jan Kara
  2017-06-15 21:05 ` Andrew Morton
  0 siblings, 2 replies; 19+ messages in thread
From: Sahitya Tummala @ 2017-06-12  0:47 UTC (permalink / raw)
  To: Alexander Polakov, Andrew Morton, Vladimir Davydov, Jan Kara,
	linux-mm, linux-kernel, linux-fsdevel
  Cc: Sahitya Tummala

__list_lru_walk_one() can hold the spin lock for longer duration
if there are more number of entries to be isolated.

This results in "BUG: spinlock lockup suspected" in the below path -

[<ffffff8eca0fb0bc>] spin_bug+0x90
[<ffffff8eca0fb220>] do_raw_spin_lock+0xfc
[<ffffff8ecafb7798>] _raw_spin_lock+0x28
[<ffffff8eca1ae884>] list_lru_add+0x28
[<ffffff8eca1f5dac>] dput+0x1c8
[<ffffff8eca1eb46c>] path_put+0x20
[<ffffff8eca1eb73c>] terminate_walk+0x3c
[<ffffff8eca1eee58>] path_lookupat+0x100
[<ffffff8eca1f00fc>] filename_lookup+0x6c
[<ffffff8eca1f0264>] user_path_at_empty+0x54
[<ffffff8eca1e066c>] SyS_faccessat+0xd0
[<ffffff8eca084e30>] el0_svc_naked+0x24

This nlru->lock has been acquired by another CPU in this path -

[<ffffff8eca1f5fd0>] d_lru_shrink_move+0x34
[<ffffff8eca1f6180>] dentry_lru_isolate_shrink+0x48
[<ffffff8eca1aeafc>] __list_lru_walk_one.isra.10+0x94
[<ffffff8eca1aec34>] list_lru_walk_node+0x40
[<ffffff8eca1f6620>] shrink_dcache_sb+0x60
[<ffffff8eca1e56a8>] do_remount_sb+0xbc
[<ffffff8eca1e583c>] do_emergency_remount+0xb0
[<ffffff8eca0ba510>] process_one_work+0x228
[<ffffff8eca0bb158>] worker_thread+0x2e0
[<ffffff8eca0c040c>] kthread+0xf4
[<ffffff8eca084dd0>] ret_from_fork+0x10

Link: http://marc.info/?t=149511514800002&r=1&w=2
Fix-suggested-by: Jan kara <jack@suse.cz>
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
---
 mm/list_lru.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/list_lru.c b/mm/list_lru.c
index 5d8dffd..1af0709 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -249,6 +249,8 @@ restart:
 		default:
 			BUG();
 		}
+		if (cond_resched_lock(&nlru->lock))
+			goto restart;
 	}
 
 	spin_unlock(&nlru->lock);
-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] mm/list_lru.c: use cond_resched_lock() for nlru->lock
  2017-06-12  0:47 [PATCH] mm/list_lru.c: use cond_resched_lock() for nlru->lock Sahitya Tummala
@ 2017-06-12 13:11 ` Jan Kara
  2017-06-15 21:05 ` Andrew Morton
  1 sibling, 0 replies; 19+ messages in thread
From: Jan Kara @ 2017-06-12 13:11 UTC (permalink / raw)
  To: Sahitya Tummala
  Cc: Alexander Polakov, Andrew Morton, Vladimir Davydov, Jan Kara,
	linux-mm, linux-kernel, linux-fsdevel

On Mon 12-06-17 06:17:20, Sahitya Tummala wrote:
> __list_lru_walk_one() can hold the spin lock for longer duration
> if there are more number of entries to be isolated.
> 
> This results in "BUG: spinlock lockup suspected" in the below path -
> 
> [<ffffff8eca0fb0bc>] spin_bug+0x90
> [<ffffff8eca0fb220>] do_raw_spin_lock+0xfc
> [<ffffff8ecafb7798>] _raw_spin_lock+0x28
> [<ffffff8eca1ae884>] list_lru_add+0x28
> [<ffffff8eca1f5dac>] dput+0x1c8
> [<ffffff8eca1eb46c>] path_put+0x20
> [<ffffff8eca1eb73c>] terminate_walk+0x3c
> [<ffffff8eca1eee58>] path_lookupat+0x100
> [<ffffff8eca1f00fc>] filename_lookup+0x6c
> [<ffffff8eca1f0264>] user_path_at_empty+0x54
> [<ffffff8eca1e066c>] SyS_faccessat+0xd0
> [<ffffff8eca084e30>] el0_svc_naked+0x24
> 
> This nlru->lock has been acquired by another CPU in this path -
> 
> [<ffffff8eca1f5fd0>] d_lru_shrink_move+0x34
> [<ffffff8eca1f6180>] dentry_lru_isolate_shrink+0x48
> [<ffffff8eca1aeafc>] __list_lru_walk_one.isra.10+0x94
> [<ffffff8eca1aec34>] list_lru_walk_node+0x40
> [<ffffff8eca1f6620>] shrink_dcache_sb+0x60
> [<ffffff8eca1e56a8>] do_remount_sb+0xbc
> [<ffffff8eca1e583c>] do_emergency_remount+0xb0
> [<ffffff8eca0ba510>] process_one_work+0x228
> [<ffffff8eca0bb158>] worker_thread+0x2e0
> [<ffffff8eca0c040c>] kthread+0xf4
> [<ffffff8eca084dd0>] ret_from_fork+0x10
> 
> Link: http://marc.info/?t=149511514800002&r=1&w=2
> Fix-suggested-by: Jan kara <jack@suse.cz>
> Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>

Looks good to me. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  mm/list_lru.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/mm/list_lru.c b/mm/list_lru.c
> index 5d8dffd..1af0709 100644
> --- a/mm/list_lru.c
> +++ b/mm/list_lru.c
> @@ -249,6 +249,8 @@ restart:
>  		default:
>  			BUG();
>  		}
> +		if (cond_resched_lock(&nlru->lock))
> +			goto restart;
>  	}
>  
>  	spin_unlock(&nlru->lock);
> -- 
> Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] mm/list_lru.c: use cond_resched_lock() for nlru->lock
  2017-06-12  0:47 [PATCH] mm/list_lru.c: use cond_resched_lock() for nlru->lock Sahitya Tummala
  2017-06-12 13:11 ` Jan Kara
@ 2017-06-15 21:05 ` Andrew Morton
  2017-06-16 14:44   ` Sahitya Tummala
  2017-06-17 11:14   ` Vladimir Davydov
  1 sibling, 2 replies; 19+ messages in thread
From: Andrew Morton @ 2017-06-15 21:05 UTC (permalink / raw)
  To: Sahitya Tummala
  Cc: Alexander Polakov, Vladimir Davydov, Jan Kara, linux-mm,
	linux-kernel, linux-fsdevel

On Mon, 12 Jun 2017 06:17:20 +0530 Sahitya Tummala <stummala@codeaurora.org> wrote:

> __list_lru_walk_one() can hold the spin lock for longer duration
> if there are more number of entries to be isolated.
> 
> This results in "BUG: spinlock lockup suspected" in the below path -
> 
> [<ffffff8eca0fb0bc>] spin_bug+0x90
> [<ffffff8eca0fb220>] do_raw_spin_lock+0xfc
> [<ffffff8ecafb7798>] _raw_spin_lock+0x28
> [<ffffff8eca1ae884>] list_lru_add+0x28
> [<ffffff8eca1f5dac>] dput+0x1c8
> [<ffffff8eca1eb46c>] path_put+0x20
> [<ffffff8eca1eb73c>] terminate_walk+0x3c
> [<ffffff8eca1eee58>] path_lookupat+0x100
> [<ffffff8eca1f00fc>] filename_lookup+0x6c
> [<ffffff8eca1f0264>] user_path_at_empty+0x54
> [<ffffff8eca1e066c>] SyS_faccessat+0xd0
> [<ffffff8eca084e30>] el0_svc_naked+0x24
> 
> This nlru->lock has been acquired by another CPU in this path -
> 
> [<ffffff8eca1f5fd0>] d_lru_shrink_move+0x34
> [<ffffff8eca1f6180>] dentry_lru_isolate_shrink+0x48
> [<ffffff8eca1aeafc>] __list_lru_walk_one.isra.10+0x94
> [<ffffff8eca1aec34>] list_lru_walk_node+0x40
> [<ffffff8eca1f6620>] shrink_dcache_sb+0x60
> [<ffffff8eca1e56a8>] do_remount_sb+0xbc
> [<ffffff8eca1e583c>] do_emergency_remount+0xb0
> [<ffffff8eca0ba510>] process_one_work+0x228
> [<ffffff8eca0bb158>] worker_thread+0x2e0
> [<ffffff8eca0c040c>] kthread+0xf4
> [<ffffff8eca084dd0>] ret_from_fork+0x10
> 
> Link: http://marc.info/?t=149511514800002&r=1&w=2
> Fix-suggested-by: Jan kara <jack@suse.cz>
> Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
> ---
>  mm/list_lru.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/mm/list_lru.c b/mm/list_lru.c
> index 5d8dffd..1af0709 100644
> --- a/mm/list_lru.c
> +++ b/mm/list_lru.c
> @@ -249,6 +249,8 @@ restart:
>  		default:
>  			BUG();
>  		}
> +		if (cond_resched_lock(&nlru->lock))
> +			goto restart;
>  	}
>  
>  	spin_unlock(&nlru->lock);

This is rather worrying.

a) Why are we spending so long holding that lock that this is occurring?

b) With this patch, we're restarting the entire scan.  Are there
   situations in which this loop will never terminate, or will take a
   very long time?  Suppose that this process is getting rescheds
   blasted at it for some reason?

IOW this looks like a bit of a band-aid and a deeper analysis and
understanding might be needed.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] mm/list_lru.c: use cond_resched_lock() for nlru->lock
  2017-06-15 21:05 ` Andrew Morton
@ 2017-06-16 14:44   ` Sahitya Tummala
  2017-06-17 11:14   ` Vladimir Davydov
  1 sibling, 0 replies; 19+ messages in thread
From: Sahitya Tummala @ 2017-06-16 14:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Polakov, Vladimir Davydov, Jan Kara, linux-mm,
	linux-kernel, linux-fsdevel

On 6/16/2017 2:35 AM, Andrew Morton wrote:

> diff --git a/mm/list_lru.c b/mm/list_lru.c
>> index 5d8dffd..1af0709 100644
>> --- a/mm/list_lru.c
>> +++ b/mm/list_lru.c
>> @@ -249,6 +249,8 @@ restart:
>>   		default:
>>   			BUG();
>>   		}
>> +		if (cond_resched_lock(&nlru->lock))
>> +			goto restart;
>>   	}
>>   
>>   	spin_unlock(&nlru->lock);
> This is rather worrying.
>
> a) Why are we spending so long holding that lock that this is occurring?

At the time of crash I see that __list_lru_walk_one() shows number of
entries isolated as 1774475 with nr_items still pending as 130748. On my
system, I see that for dentries of 100000, it takes around 75ms for
__list_lru_walk_one() to complete. So for a total of 1900000 dentries as
in issue scenario, it will take upto 1425ms, which explains why the spin
lockup condition got hit on the other CPU.

It looks like __list_lru_walk_one() is expected to take more time if
there are more number of dentries present. And I think it is a valid
scenario to have those many number dentries.

> b) With this patch, we're restarting the entire scan.  Are there
>     situations in which this loop will never terminate, or will take a
>     very long time?  Suppose that this process is getting rescheds
>     blasted at it for some reason?

In the above scenario, I observed that the dentry entries from lru list
are removedall the time i.e LRU_REMOVED is returned from the
isolate (dentry_lru_isolate()) callback. I don't know if there is any case
where we skip several entries in the lru list and restartseveral times due
to this cond_resched_lock(). This can happen even with theexisting code
if LRU_RETRY is returned often from the isolate callback.
> IOW this looks like a bit of a band-aid and a deeper analysis and
> understanding might be needed.

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] mm/list_lru.c: use cond_resched_lock() for nlru->lock
  2017-06-15 21:05 ` Andrew Morton
  2017-06-16 14:44   ` Sahitya Tummala
@ 2017-06-17 11:14   ` Vladimir Davydov
  2017-06-20  2:52     ` Sahitya Tummala
  1 sibling, 1 reply; 19+ messages in thread
From: Vladimir Davydov @ 2017-06-17 11:14 UTC (permalink / raw)
  To: Andrew Morton, Sahitya Tummala
  Cc: Alexander Polakov, Jan Kara, linux-mm, linux-kernel, linux-fsdevel

Hello,

On Thu, Jun 15, 2017 at 02:05:23PM -0700, Andrew Morton wrote:
> On Mon, 12 Jun 2017 06:17:20 +0530 Sahitya Tummala <stummala@codeaurora.org> wrote:
> 
> > __list_lru_walk_one() can hold the spin lock for longer duration
> > if there are more number of entries to be isolated.
> > 
> > This results in "BUG: spinlock lockup suspected" in the below path -
> > 
> > [<ffffff8eca0fb0bc>] spin_bug+0x90
> > [<ffffff8eca0fb220>] do_raw_spin_lock+0xfc
> > [<ffffff8ecafb7798>] _raw_spin_lock+0x28
> > [<ffffff8eca1ae884>] list_lru_add+0x28
> > [<ffffff8eca1f5dac>] dput+0x1c8
> > [<ffffff8eca1eb46c>] path_put+0x20
> > [<ffffff8eca1eb73c>] terminate_walk+0x3c
> > [<ffffff8eca1eee58>] path_lookupat+0x100
> > [<ffffff8eca1f00fc>] filename_lookup+0x6c
> > [<ffffff8eca1f0264>] user_path_at_empty+0x54
> > [<ffffff8eca1e066c>] SyS_faccessat+0xd0
> > [<ffffff8eca084e30>] el0_svc_naked+0x24
> > 
> > This nlru->lock has been acquired by another CPU in this path -
> > 
> > [<ffffff8eca1f5fd0>] d_lru_shrink_move+0x34
> > [<ffffff8eca1f6180>] dentry_lru_isolate_shrink+0x48
> > [<ffffff8eca1aeafc>] __list_lru_walk_one.isra.10+0x94
> > [<ffffff8eca1aec34>] list_lru_walk_node+0x40
> > [<ffffff8eca1f6620>] shrink_dcache_sb+0x60
> > [<ffffff8eca1e56a8>] do_remount_sb+0xbc
> > [<ffffff8eca1e583c>] do_emergency_remount+0xb0
> > [<ffffff8eca0ba510>] process_one_work+0x228
> > [<ffffff8eca0bb158>] worker_thread+0x2e0
> > [<ffffff8eca0c040c>] kthread+0xf4
> > [<ffffff8eca084dd0>] ret_from_fork+0x10
> > 
> > Link: http://marc.info/?t=149511514800002&r=1&w=2
> > Fix-suggested-by: Jan kara <jack@suse.cz>
> > Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
> > ---
> >  mm/list_lru.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/mm/list_lru.c b/mm/list_lru.c
> > index 5d8dffd..1af0709 100644
> > --- a/mm/list_lru.c
> > +++ b/mm/list_lru.c
> > @@ -249,6 +249,8 @@ restart:
> >  		default:
> >  			BUG();
> >  		}
> > +		if (cond_resched_lock(&nlru->lock))
> > +			goto restart;
> >  	}
> >  
> >  	spin_unlock(&nlru->lock);
> 
> This is rather worrying.
> 
> a) Why are we spending so long holding that lock that this is occurring?
> 
> b) With this patch, we're restarting the entire scan.  Are there
>    situations in which this loop will never terminate, or will take a
>    very long time?  Suppose that this process is getting rescheds
>    blasted at it for some reason?
> 
> IOW this looks like a bit of a band-aid and a deeper analysis and
> understanding might be needed.

The goal of list_lru_walk is removing inactive entries from the lru list
(LRU_REMOVED). Memory shrinkers may also choose to move active entries
to the tail of the lru list (LRU_ROTATED). LRU_SKIP is supposed to be
returned only to avoid a possible deadlock. So I don't see how
restarting lru walk could have adverse effects.

However, I do find this patch kinda ugly, because:

 - list_lru_walk already gives you a way to avoid a lockup - just make
   the callback reschedule and return LRU_RETRY every now and then, see
   shadow_lru_isolate() for an example. Alternatively, you can limit the
   number of entries scanned in one go (nr_to_walk) and reschedule
   between calls. This is what shrink_slab() does: the number of
   dentries scanned without releasing the lock is limited to 1024, see
   how super_block::s_shrink is initialized.

 - Someone might want to call list_lru_walk with a spin lock held, and I
   don't see anything wrong in doing that. With your patch it can't be
   done anymore.

That said, I think it would be better to patch shrink_dcache_sb() or
dentry_lru_isolate_shrink() instead of list_lru_walk() in order to fix
this lockup.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] mm/list_lru.c: use cond_resched_lock() for nlru->lock
  2017-06-17 11:14   ` Vladimir Davydov
@ 2017-06-20  2:52     ` Sahitya Tummala
  2017-06-21  6:39       ` [PATCH v2] fs/dcache.c: fix spin lockup issue on nlru->lock Sahitya Tummala
  0 siblings, 1 reply; 19+ messages in thread
From: Sahitya Tummala @ 2017-06-20  2:52 UTC (permalink / raw)
  To: Vladimir Davydov, Andrew Morton
  Cc: Alexander Polakov, Jan Kara, linux-mm, linux-kernel, linux-fsdevel

Hello,

On 6/17/2017 4:44 PM, Vladimir Davydov wrote:

>
> That said, I think it would be better to patch shrink_dcache_sb() or
> dentry_lru_isolate_shrink() instead of list_lru_walk() in order to fix
> this lockup.

Thanks for the review. I will enhance the patch as per your suggestion.

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v2] fs/dcache.c: fix spin lockup issue on nlru->lock
  2017-06-20  2:52     ` Sahitya Tummala
@ 2017-06-21  6:39       ` Sahitya Tummala
  2017-06-21 16:31         ` Vladimir Davydov
  0 siblings, 1 reply; 19+ messages in thread
From: Sahitya Tummala @ 2017-06-21  6:39 UTC (permalink / raw)
  To: Alexander Polakov, Andrew Morton, Vladimir Davydov, Jan Kara,
	viro, linux-mm, linux-kernel, linux-fsdevel
  Cc: Sahitya Tummala

__list_lru_walk_one() acquires nlru spin lock (nlru->lock) for
longer duration if there are more number of items in the lru list.
As per the current code, it can hold the spin lock for upto maximum
UINT_MAX entries at a time. So if there are more number of items in
the lru list, then "BUG: spinlock lockup suspected" is observed in
the below path -

[<ffffff8eca0fb0bc>] spin_bug+0x90
[<ffffff8eca0fb220>] do_raw_spin_lock+0xfc
[<ffffff8ecafb7798>] _raw_spin_lock+0x28
[<ffffff8eca1ae884>] list_lru_add+0x28
[<ffffff8eca1f5dac>] dput+0x1c8
[<ffffff8eca1eb46c>] path_put+0x20
[<ffffff8eca1eb73c>] terminate_walk+0x3c
[<ffffff8eca1eee58>] path_lookupat+0x100
[<ffffff8eca1f00fc>] filename_lookup+0x6c
[<ffffff8eca1f0264>] user_path_at_empty+0x54
[<ffffff8eca1e066c>] SyS_faccessat+0xd0
[<ffffff8eca084e30>] el0_svc_naked+0x24

This nlru->lock is acquired by another CPU in this path -

[<ffffff8eca1f5fd0>] d_lru_shrink_move+0x34
[<ffffff8eca1f6180>] dentry_lru_isolate_shrink+0x48
[<ffffff8eca1aeafc>] __list_lru_walk_one.isra.10+0x94
[<ffffff8eca1aec34>] list_lru_walk_node+0x40
[<ffffff8eca1f6620>] shrink_dcache_sb+0x60
[<ffffff8eca1e56a8>] do_remount_sb+0xbc
[<ffffff8eca1e583c>] do_emergency_remount+0xb0
[<ffffff8eca0ba510>] process_one_work+0x228
[<ffffff8eca0bb158>] worker_thread+0x2e0
[<ffffff8eca0c040c>] kthread+0xf4
[<ffffff8eca084dd0>] ret_from_fork+0x10

Fix this lockup by reducing the number of entries to be shrinked
from the lru list to 1024 at once. Also, add cond_resched() before
processing the lru list again.

Link: http://marc.info/?t=149722864900001&r=1&w=2
Fix-suggested-by: Jan kara <jack@suse.cz>
Fix-suggested-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
---
v2: patch shrink_dcache_sb() instead of list_lru_walk()
---
 fs/dcache.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index cddf397..c8ca150 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1133,10 +1133,11 @@ void shrink_dcache_sb(struct super_block *sb)
 		LIST_HEAD(dispose);
 
 		freed = list_lru_walk(&sb->s_dentry_lru,
-			dentry_lru_isolate_shrink, &dispose, UINT_MAX);
+			dentry_lru_isolate_shrink, &dispose, 1024);
 
 		this_cpu_sub(nr_dentry_unused, freed);
 		shrink_dentry_list(&dispose);
+		cond_resched();
 	} while (freed > 0);
 }
 EXPORT_SYMBOL(shrink_dcache_sb);
-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v2] fs/dcache.c: fix spin lockup issue on nlru->lock
  2017-06-21  6:39       ` [PATCH v2] fs/dcache.c: fix spin lockup issue on nlru->lock Sahitya Tummala
@ 2017-06-21 16:31         ` Vladimir Davydov
  2017-06-22 16:31           ` Sahitya Tummala
  0 siblings, 1 reply; 19+ messages in thread
From: Vladimir Davydov @ 2017-06-21 16:31 UTC (permalink / raw)
  To: Sahitya Tummala
  Cc: Alexander Polakov, Andrew Morton, Jan Kara, viro, linux-mm,
	linux-kernel, linux-fsdevel

On Wed, Jun 21, 2017 at 12:09:15PM +0530, Sahitya Tummala wrote:
> __list_lru_walk_one() acquires nlru spin lock (nlru->lock) for
> longer duration if there are more number of items in the lru list.
> As per the current code, it can hold the spin lock for upto maximum
> UINT_MAX entries at a time. So if there are more number of items in
> the lru list, then "BUG: spinlock lockup suspected" is observed in
> the below path -
> 
> [<ffffff8eca0fb0bc>] spin_bug+0x90
> [<ffffff8eca0fb220>] do_raw_spin_lock+0xfc
> [<ffffff8ecafb7798>] _raw_spin_lock+0x28
> [<ffffff8eca1ae884>] list_lru_add+0x28
> [<ffffff8eca1f5dac>] dput+0x1c8
> [<ffffff8eca1eb46c>] path_put+0x20
> [<ffffff8eca1eb73c>] terminate_walk+0x3c
> [<ffffff8eca1eee58>] path_lookupat+0x100
> [<ffffff8eca1f00fc>] filename_lookup+0x6c
> [<ffffff8eca1f0264>] user_path_at_empty+0x54
> [<ffffff8eca1e066c>] SyS_faccessat+0xd0
> [<ffffff8eca084e30>] el0_svc_naked+0x24
> 
> This nlru->lock is acquired by another CPU in this path -
> 
> [<ffffff8eca1f5fd0>] d_lru_shrink_move+0x34
> [<ffffff8eca1f6180>] dentry_lru_isolate_shrink+0x48
> [<ffffff8eca1aeafc>] __list_lru_walk_one.isra.10+0x94
> [<ffffff8eca1aec34>] list_lru_walk_node+0x40
> [<ffffff8eca1f6620>] shrink_dcache_sb+0x60
> [<ffffff8eca1e56a8>] do_remount_sb+0xbc
> [<ffffff8eca1e583c>] do_emergency_remount+0xb0
> [<ffffff8eca0ba510>] process_one_work+0x228
> [<ffffff8eca0bb158>] worker_thread+0x2e0
> [<ffffff8eca0c040c>] kthread+0xf4
> [<ffffff8eca084dd0>] ret_from_fork+0x10
> 
> Fix this lockup by reducing the number of entries to be shrinked
> from the lru list to 1024 at once. Also, add cond_resched() before
> processing the lru list again.
> 
> Link: http://marc.info/?t=149722864900001&r=1&w=2
> Fix-suggested-by: Jan kara <jack@suse.cz>
> Fix-suggested-by: Vladimir Davydov <vdavydov.dev@gmail.com>
> Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
> ---
> v2: patch shrink_dcache_sb() instead of list_lru_walk()
> ---
>  fs/dcache.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/dcache.c b/fs/dcache.c
> index cddf397..c8ca150 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -1133,10 +1133,11 @@ void shrink_dcache_sb(struct super_block *sb)
>  		LIST_HEAD(dispose);
>  
>  		freed = list_lru_walk(&sb->s_dentry_lru,
> -			dentry_lru_isolate_shrink, &dispose, UINT_MAX);
> +			dentry_lru_isolate_shrink, &dispose, 1024);
>  
>  		this_cpu_sub(nr_dentry_unused, freed);
>  		shrink_dentry_list(&dispose);
> +		cond_resched();
>  	} while (freed > 0);

In an extreme case, a single invocation of list_lru_walk() can skip all
1024 dentries, in which case 'freed' will be 0 forcing us to break the
loop prematurely. I think we should loop until there's at least one
dentry left on the LRU, i.e.

	while (list_lru_count(&sb->s_dentry_lru) > 0)

However, even that wouldn't be quite correct, because list_lru_count()
iterates over all memory cgroups to sum list_lru_one->nr_items, which
can race with memcg offlining code migrating dentries off a dead cgroup
(see memcg_drain_all_list_lrus()). So it looks like to make this check
race-free, we need to account the number of entries on the LRU not only
per memcg, but also per node, i.e. add list_lru_node->nr_items.
Fortunately, list_lru entries can't be migrated between NUMA nodes.

>  }
>  EXPORT_SYMBOL(shrink_dcache_sb);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2] fs/dcache.c: fix spin lockup issue on nlru->lock
  2017-06-21 16:31         ` Vladimir Davydov
@ 2017-06-22 16:31           ` Sahitya Tummala
  2017-06-22 17:49             ` Vladimir Davydov
  0 siblings, 1 reply; 19+ messages in thread
From: Sahitya Tummala @ 2017-06-22 16:31 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: Alexander Polakov, Andrew Morton, Jan Kara, viro, linux-mm,
	linux-kernel, linux-fsdevel



On 6/21/2017 10:01 PM, Vladimir Davydov wrote:
>
>> index cddf397..c8ca150 100644
>> --- a/fs/dcache.c
>> +++ b/fs/dcache.c
>> @@ -1133,10 +1133,11 @@ void shrink_dcache_sb(struct super_block *sb)
>>   		LIST_HEAD(dispose);
>>   
>>   		freed = list_lru_walk(&sb->s_dentry_lru,
>> -			dentry_lru_isolate_shrink, &dispose, UINT_MAX);
>> +			dentry_lru_isolate_shrink, &dispose, 1024);
>>   
>>   		this_cpu_sub(nr_dentry_unused, freed);
>>   		shrink_dentry_list(&dispose);
>> +		cond_resched();
>>   	} while (freed > 0);
> In an extreme case, a single invocation of list_lru_walk() can skip all
> 1024 dentries, in which case 'freed' will be 0 forcing us to break the
> loop prematurely. I think we should loop until there's at least one
> dentry left on the LRU, i.e.
>
> 	while (list_lru_count(&sb->s_dentry_lru) > 0)
>
> However, even that wouldn't be quite correct, because list_lru_count()
> iterates over all memory cgroups to sum list_lru_one->nr_items, which
> can race with memcg offlining code migrating dentries off a dead cgroup
> (see memcg_drain_all_list_lrus()). So it looks like to make this check
> race-free, we need to account the number of entries on the LRU not only
> per memcg, but also per node, i.e. add list_lru_node->nr_items.
> Fortunately, list_lru entries can't be migrated between NUMA nodes.
It looks like list_lru_count() is iterating per node before iterating 
over all memory
cgroups as below -

unsigned long list_lru_count_node(struct list_lru *lru, int nid)
{
         long count = 0;
         int memcg_idx;

         count += __list_lru_count_one(lru, nid, -1);
         if (list_lru_memcg_aware(lru)) {
                 for_each_memcg_cache_index(memcg_idx)
                         count += __list_lru_count_one(lru, nid, memcg_idx);
         }
         return count;
}

The first call to __list_lru_count_one() is iterating all the items per 
node i.e, nlru->lru->nr_items.
Is my understanding correct? If not, could you please clarify on how to 
get the lru items per node?

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2] fs/dcache.c: fix spin lockup issue on nlru->lock
  2017-06-22 16:31           ` Sahitya Tummala
@ 2017-06-22 17:49             ` Vladimir Davydov
  2017-06-28  6:07               ` [PATCH v3 1/2] mm/list_lru.c: fix list_lru_count_node() to be race free Sahitya Tummala
  0 siblings, 1 reply; 19+ messages in thread
From: Vladimir Davydov @ 2017-06-22 17:49 UTC (permalink / raw)
  To: Sahitya Tummala
  Cc: Alexander Polakov, Andrew Morton, Jan Kara, viro, linux-mm,
	linux-kernel, linux-fsdevel

On Thu, Jun 22, 2017 at 10:01:39PM +0530, Sahitya Tummala wrote:
> 
> 
> On 6/21/2017 10:01 PM, Vladimir Davydov wrote:
> >
> >>index cddf397..c8ca150 100644
> >>--- a/fs/dcache.c
> >>+++ b/fs/dcache.c
> >>@@ -1133,10 +1133,11 @@ void shrink_dcache_sb(struct super_block *sb)
> >>  		LIST_HEAD(dispose);
> >>  		freed = list_lru_walk(&sb->s_dentry_lru,
> >>-			dentry_lru_isolate_shrink, &dispose, UINT_MAX);
> >>+			dentry_lru_isolate_shrink, &dispose, 1024);
> >>  		this_cpu_sub(nr_dentry_unused, freed);
> >>  		shrink_dentry_list(&dispose);
> >>+		cond_resched();
> >>  	} while (freed > 0);
> >In an extreme case, a single invocation of list_lru_walk() can skip all
> >1024 dentries, in which case 'freed' will be 0 forcing us to break the
> >loop prematurely. I think we should loop until there's at least one
> >dentry left on the LRU, i.e.
> >
> >	while (list_lru_count(&sb->s_dentry_lru) > 0)
> >
> >However, even that wouldn't be quite correct, because list_lru_count()
> >iterates over all memory cgroups to sum list_lru_one->nr_items, which
> >can race with memcg offlining code migrating dentries off a dead cgroup
> >(see memcg_drain_all_list_lrus()). So it looks like to make this check
> >race-free, we need to account the number of entries on the LRU not only
> >per memcg, but also per node, i.e. add list_lru_node->nr_items.
> >Fortunately, list_lru entries can't be migrated between NUMA nodes.
> It looks like list_lru_count() is iterating per node before iterating over
> all memory
> cgroups as below -
> 
> unsigned long list_lru_count_node(struct list_lru *lru, int nid)
> {
>         long count = 0;
>         int memcg_idx;
> 
>         count += __list_lru_count_one(lru, nid, -1);
>         if (list_lru_memcg_aware(lru)) {
>                 for_each_memcg_cache_index(memcg_idx)
>                         count += __list_lru_count_one(lru, nid, memcg_idx);
>         }
>         return count;
> }
> 
> The first call to __list_lru_count_one() is iterating all the items per node
> i.e, nlru->lru->nr_items.

lru->node[nid].lru.nr_items returned by __list_lru_count_one(lru, nid, -1)
only counts items accounted to the root cgroup, not the total number of
entries on the node.

> Is my understanding correct? If not, could you please clarify on how to get
> the lru items per node?

What I mean is iterating over list_lru_node->memcg_lrus to count the
number of entries on the node is racy. For example, suppose you have
three cgroups with the following values of list_lru_one->nr_items:

  0   0   10

While list_lru_count_node() is at #1, cgroup #2 is offlined and its
list_lru_one is drained, i.e. its entries are migrated to the parent
cgroup, which happens to be #0, i.e. we see the following picture:

 10   0   0

     ^^^
  memcg_ids points here in list_lru_count_node() 

Then the count returned by list_lru_count_node() will be 0, although
there are still 10 entries on the list.

To avoid this race, we could keep list_lru_node->lock locked while
walking over list_lru_node->memcg_lrus, but that's too heavy. I'd prefer
adding list_lru_node->nr_count which would be equal to the total number
of list_lru entries on the node, i.e. sum of list_lru_node->lru.nr_lrus
and list_lru_node->memcg_lrus->lru[]->nr_items.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v3 1/2] mm/list_lru.c: fix list_lru_count_node() to be race free
  2017-06-22 17:49             ` Vladimir Davydov
@ 2017-06-28  6:07               ` Sahitya Tummala
  2017-06-28  6:07                 ` [PATCH v3 2/2] fs/dcache.c: fix spin lockup issue on nlru->lock Sahitya Tummala
  2017-06-28 17:18                 ` [PATCH v3 1/2] mm/list_lru.c: fix list_lru_count_node() to be race free Vladimir Davydov
  0 siblings, 2 replies; 19+ messages in thread
From: Sahitya Tummala @ 2017-06-28  6:07 UTC (permalink / raw)
  To: Alexander Polakov, Andrew Morton, Vladimir Davydov, Jan Kara,
	viro, linux-mm, linux-kernel, linux-fsdevel
  Cc: Sahitya Tummala

list_lru_count_node() iterates over all memcgs to get
the total number of entries on the node but it can race with
memcg_drain_all_list_lrus(), which migrates the entries from
a dead cgroup to another. This can return incorrect number of
entries from list_lru_count_node().

Fix this by keeping track of entries per node and simply return
it in list_lru_count_node().

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
---
 include/linux/list_lru.h |  1 +
 mm/list_lru.c            | 14 ++++++--------
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
index cb0ba9f..eff61bc 100644
--- a/include/linux/list_lru.h
+++ b/include/linux/list_lru.h
@@ -44,6 +44,7 @@ struct list_lru_node {
 	/* for cgroup aware lrus points to per cgroup lists, otherwise NULL */
 	struct list_lru_memcg	*memcg_lrus;
 #endif
+	long nr_count;
 } ____cacheline_aligned_in_smp;
 
 struct list_lru {
diff --git a/mm/list_lru.c b/mm/list_lru.c
index 234676e..d417b9f 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -117,6 +117,7 @@ bool list_lru_add(struct list_lru *lru, struct list_head *item)
 		l = list_lru_from_kmem(nlru, item);
 		list_add_tail(item, &l->list);
 		l->nr_items++;
+		nlru->nr_count++;
 		spin_unlock(&nlru->lock);
 		return true;
 	}
@@ -136,6 +137,7 @@ bool list_lru_del(struct list_lru *lru, struct list_head *item)
 		l = list_lru_from_kmem(nlru, item);
 		list_del_init(item);
 		l->nr_items--;
+		nlru->nr_count--;
 		spin_unlock(&nlru->lock);
 		return true;
 	}
@@ -183,15 +185,10 @@ unsigned long list_lru_count_one(struct list_lru *lru,
 
 unsigned long list_lru_count_node(struct list_lru *lru, int nid)
 {
-	long count = 0;
-	int memcg_idx;
+	struct list_lru_node *nlru;
 
-	count += __list_lru_count_one(lru, nid, -1);
-	if (list_lru_memcg_aware(lru)) {
-		for_each_memcg_cache_index(memcg_idx)
-			count += __list_lru_count_one(lru, nid, memcg_idx);
-	}
-	return count;
+	nlru = &lru->node[nid];
+	return nlru->nr_count;
 }
 EXPORT_SYMBOL_GPL(list_lru_count_node);
 
@@ -226,6 +223,7 @@ unsigned long list_lru_count_node(struct list_lru *lru, int nid)
 			assert_spin_locked(&nlru->lock);
 		case LRU_REMOVED:
 			isolated++;
+			nlru->nr_count--;
 			/*
 			 * If the lru lock has been dropped, our list
 			 * traversal is now invalid and so we have to
-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 2/2] fs/dcache.c: fix spin lockup issue on nlru->lock
  2017-06-28  6:07               ` [PATCH v3 1/2] mm/list_lru.c: fix list_lru_count_node() to be race free Sahitya Tummala
@ 2017-06-28  6:07                 ` Sahitya Tummala
  2017-06-28 17:18                 ` [PATCH v3 1/2] mm/list_lru.c: fix list_lru_count_node() to be race free Vladimir Davydov
  1 sibling, 0 replies; 19+ messages in thread
From: Sahitya Tummala @ 2017-06-28  6:07 UTC (permalink / raw)
  To: Alexander Polakov, Andrew Morton, Vladimir Davydov, Jan Kara,
	viro, linux-mm, linux-kernel, linux-fsdevel
  Cc: Sahitya Tummala

__list_lru_walk_one() acquires nlru spin lock (nlru->lock) for
longer duration if there are more number of items in the lru list.
As per the current code, it can hold the spin lock for upto maximum
UINT_MAX entries at a time. So if there are more number of items in
the lru list, then "BUG: spinlock lockup suspected" is observed in
the below path -

[<ffffff8eca0fb0bc>] spin_bug+0x90
[<ffffff8eca0fb220>] do_raw_spin_lock+0xfc
[<ffffff8ecafb7798>] _raw_spin_lock+0x28
[<ffffff8eca1ae884>] list_lru_add+0x28
[<ffffff8eca1f5dac>] dput+0x1c8
[<ffffff8eca1eb46c>] path_put+0x20
[<ffffff8eca1eb73c>] terminate_walk+0x3c
[<ffffff8eca1eee58>] path_lookupat+0x100
[<ffffff8eca1f00fc>] filename_lookup+0x6c
[<ffffff8eca1f0264>] user_path_at_empty+0x54
[<ffffff8eca1e066c>] SyS_faccessat+0xd0
[<ffffff8eca084e30>] el0_svc_naked+0x24

This nlru->lock is acquired by another CPU in this path -

[<ffffff8eca1f5fd0>] d_lru_shrink_move+0x34
[<ffffff8eca1f6180>] dentry_lru_isolate_shrink+0x48
[<ffffff8eca1aeafc>] __list_lru_walk_one.isra.10+0x94
[<ffffff8eca1aec34>] list_lru_walk_node+0x40
[<ffffff8eca1f6620>] shrink_dcache_sb+0x60
[<ffffff8eca1e56a8>] do_remount_sb+0xbc
[<ffffff8eca1e583c>] do_emergency_remount+0xb0
[<ffffff8eca0ba510>] process_one_work+0x228
[<ffffff8eca0bb158>] worker_thread+0x2e0
[<ffffff8eca0c040c>] kthread+0xf4
[<ffffff8eca084dd0>] ret_from_fork+0x10

Fix this lockup by reducing the number of entries to be shrinked
from the lru list to 1024 at once. Also, add cond_resched() before
processing the lru list again.

Link: http://marc.info/?t=149722864900001&r=1&w=2
Fix-suggested-by: Jan kara <jack@suse.cz>
Fix-suggested-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
---
v3: use list_lru_count() instead of freed in while loop to
cover an extreme case where a single invocation of list_lru_walk()
can skip all 1024 dentries, in which case 'freed' will be 0 forcing
us to break the loop prematurely. 

v2: patch shrink_dcache_sb() instead of list_lru_walk()
---

 fs/dcache.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index a9f995f..1161390 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1133,11 +1133,12 @@ void shrink_dcache_sb(struct super_block *sb)
 		LIST_HEAD(dispose);
 
 		freed = list_lru_walk(&sb->s_dentry_lru,
-			dentry_lru_isolate_shrink, &dispose, UINT_MAX);
+			dentry_lru_isolate_shrink, &dispose, 1024);
 
 		this_cpu_sub(nr_dentry_unused, freed);
 		shrink_dentry_list(&dispose);
-	} while (freed > 0);
+		cond_resched();
+	} while (list_lru_count(&sb->s_dentry_lru) > 0);
 }
 EXPORT_SYMBOL(shrink_dcache_sb);
 
-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/2] mm/list_lru.c: fix list_lru_count_node() to be race free
  2017-06-28  6:07               ` [PATCH v3 1/2] mm/list_lru.c: fix list_lru_count_node() to be race free Sahitya Tummala
  2017-06-28  6:07                 ` [PATCH v3 2/2] fs/dcache.c: fix spin lockup issue on nlru->lock Sahitya Tummala
@ 2017-06-28 17:18                 ` Vladimir Davydov
  2017-06-29  3:39                   ` [PATCH v4 " Sahitya Tummala
  2017-06-29  3:39                   ` [PATCH v4 2/2] fs/dcache.c: fix spin lockup issue on nlru->lock Sahitya Tummala
  1 sibling, 2 replies; 19+ messages in thread
From: Vladimir Davydov @ 2017-06-28 17:18 UTC (permalink / raw)
  To: Sahitya Tummala
  Cc: Alexander Polakov, Andrew Morton, Jan Kara, viro, linux-mm,
	linux-kernel, linux-fsdevel

On Wed, Jun 28, 2017 at 11:37:23AM +0530, Sahitya Tummala wrote:
> list_lru_count_node() iterates over all memcgs to get
> the total number of entries on the node but it can race with
> memcg_drain_all_list_lrus(), which migrates the entries from
> a dead cgroup to another. This can return incorrect number of
> entries from list_lru_count_node().
> 
> Fix this by keeping track of entries per node and simply return
> it in list_lru_count_node().
> 
> Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
> ---
>  include/linux/list_lru.h |  1 +
>  mm/list_lru.c            | 14 ++++++--------
>  2 files changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
> index cb0ba9f..eff61bc 100644
> --- a/include/linux/list_lru.h
> +++ b/include/linux/list_lru.h
> @@ -44,6 +44,7 @@ struct list_lru_node {
>  	/* for cgroup aware lrus points to per cgroup lists, otherwise NULL */
>  	struct list_lru_memcg	*memcg_lrus;
>  #endif
> +	long nr_count;

'nr_count' sounds awkward. I think it should be called 'nr_items'.

Other than that, looks good to me.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v4 1/2] mm/list_lru.c: fix list_lru_count_node() to be race free
  2017-06-28 17:18                 ` [PATCH v3 1/2] mm/list_lru.c: fix list_lru_count_node() to be race free Vladimir Davydov
@ 2017-06-29  3:39                   ` Sahitya Tummala
  2017-07-01 16:28                     ` Vladimir Davydov
  2017-06-29  3:39                   ` [PATCH v4 2/2] fs/dcache.c: fix spin lockup issue on nlru->lock Sahitya Tummala
  1 sibling, 1 reply; 19+ messages in thread
From: Sahitya Tummala @ 2017-06-29  3:39 UTC (permalink / raw)
  To: Alexander Polakov, Andrew Morton, Vladimir Davydov, Jan Kara,
	viro, linux-mm, linux-kernel, linux-fsdevel
  Cc: Sahitya Tummala

list_lru_count_node() iterates over all memcgs to get
the total number of entries on the node but it can race with
memcg_drain_all_list_lrus(), which migrates the entries from
a dead cgroup to another. This can return incorrect number of
entries from list_lru_count_node().

Fix this by keeping track of entries per node and simply return
it in list_lru_count_node().

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
---
 include/linux/list_lru.h |  1 +
 mm/list_lru.c            | 14 ++++++--------
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
index cb0ba9f..fa7fd03 100644
--- a/include/linux/list_lru.h
+++ b/include/linux/list_lru.h
@@ -44,6 +44,7 @@ struct list_lru_node {
 	/* for cgroup aware lrus points to per cgroup lists, otherwise NULL */
 	struct list_lru_memcg	*memcg_lrus;
 #endif
+	long nr_items;
 } ____cacheline_aligned_in_smp;
 
 struct list_lru {
diff --git a/mm/list_lru.c b/mm/list_lru.c
index 234676e..7a40fa2 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -117,6 +117,7 @@ bool list_lru_add(struct list_lru *lru, struct list_head *item)
 		l = list_lru_from_kmem(nlru, item);
 		list_add_tail(item, &l->list);
 		l->nr_items++;
+		nlru->nr_items++;
 		spin_unlock(&nlru->lock);
 		return true;
 	}
@@ -136,6 +137,7 @@ bool list_lru_del(struct list_lru *lru, struct list_head *item)
 		l = list_lru_from_kmem(nlru, item);
 		list_del_init(item);
 		l->nr_items--;
+		nlru->nr_items--;
 		spin_unlock(&nlru->lock);
 		return true;
 	}
@@ -183,15 +185,10 @@ unsigned long list_lru_count_one(struct list_lru *lru,
 
 unsigned long list_lru_count_node(struct list_lru *lru, int nid)
 {
-	long count = 0;
-	int memcg_idx;
+	struct list_lru_node *nlru;
 
-	count += __list_lru_count_one(lru, nid, -1);
-	if (list_lru_memcg_aware(lru)) {
-		for_each_memcg_cache_index(memcg_idx)
-			count += __list_lru_count_one(lru, nid, memcg_idx);
-	}
-	return count;
+	nlru = &lru->node[nid];
+	return nlru->nr_items;
 }
 EXPORT_SYMBOL_GPL(list_lru_count_node);
 
@@ -226,6 +223,7 @@ unsigned long list_lru_count_node(struct list_lru *lru, int nid)
 			assert_spin_locked(&nlru->lock);
 		case LRU_REMOVED:
 			isolated++;
+			nlru->nr_items--;
 			/*
 			 * If the lru lock has been dropped, our list
 			 * traversal is now invalid and so we have to
-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 2/2] fs/dcache.c: fix spin lockup issue on nlru->lock
  2017-06-28 17:18                 ` [PATCH v3 1/2] mm/list_lru.c: fix list_lru_count_node() to be race free Vladimir Davydov
  2017-06-29  3:39                   ` [PATCH v4 " Sahitya Tummala
@ 2017-06-29  3:39                   ` Sahitya Tummala
  2017-06-29 22:48                     ` Andrew Morton
  2017-07-01 16:28                     ` Vladimir Davydov
  1 sibling, 2 replies; 19+ messages in thread
From: Sahitya Tummala @ 2017-06-29  3:39 UTC (permalink / raw)
  To: Alexander Polakov, Andrew Morton, Vladimir Davydov, Jan Kara,
	viro, linux-mm, linux-kernel, linux-fsdevel
  Cc: Sahitya Tummala

__list_lru_walk_one() acquires nlru spin lock (nlru->lock) for
longer duration if there are more number of items in the lru list.
As per the current code, it can hold the spin lock for upto maximum
UINT_MAX entries at a time. So if there are more number of items in
the lru list, then "BUG: spinlock lockup suspected" is observed in
the below path -

[<ffffff8eca0fb0bc>] spin_bug+0x90
[<ffffff8eca0fb220>] do_raw_spin_lock+0xfc
[<ffffff8ecafb7798>] _raw_spin_lock+0x28
[<ffffff8eca1ae884>] list_lru_add+0x28
[<ffffff8eca1f5dac>] dput+0x1c8
[<ffffff8eca1eb46c>] path_put+0x20
[<ffffff8eca1eb73c>] terminate_walk+0x3c
[<ffffff8eca1eee58>] path_lookupat+0x100
[<ffffff8eca1f00fc>] filename_lookup+0x6c
[<ffffff8eca1f0264>] user_path_at_empty+0x54
[<ffffff8eca1e066c>] SyS_faccessat+0xd0
[<ffffff8eca084e30>] el0_svc_naked+0x24

This nlru->lock is acquired by another CPU in this path -

[<ffffff8eca1f5fd0>] d_lru_shrink_move+0x34
[<ffffff8eca1f6180>] dentry_lru_isolate_shrink+0x48
[<ffffff8eca1aeafc>] __list_lru_walk_one.isra.10+0x94
[<ffffff8eca1aec34>] list_lru_walk_node+0x40
[<ffffff8eca1f6620>] shrink_dcache_sb+0x60
[<ffffff8eca1e56a8>] do_remount_sb+0xbc
[<ffffff8eca1e583c>] do_emergency_remount+0xb0
[<ffffff8eca0ba510>] process_one_work+0x228
[<ffffff8eca0bb158>] worker_thread+0x2e0
[<ffffff8eca0c040c>] kthread+0xf4
[<ffffff8eca084dd0>] ret_from_fork+0x10

Fix this lockup by reducing the number of entries to be shrinked
from the lru list to 1024 at once. Also, add cond_resched() before
processing the lru list again.

Link: http://marc.info/?t=149722864900001&r=1&w=2
Fix-suggested-by: Jan kara <jack@suse.cz>
Fix-suggested-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
---
 fs/dcache.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index a9f995f..1161390 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1133,11 +1133,12 @@ void shrink_dcache_sb(struct super_block *sb)
 		LIST_HEAD(dispose);
 
 		freed = list_lru_walk(&sb->s_dentry_lru,
-			dentry_lru_isolate_shrink, &dispose, UINT_MAX);
+			dentry_lru_isolate_shrink, &dispose, 1024);
 
 		this_cpu_sub(nr_dentry_unused, freed);
 		shrink_dentry_list(&dispose);
-	} while (freed > 0);
+		cond_resched();
+	} while (list_lru_count(&sb->s_dentry_lru) > 0);
 }
 EXPORT_SYMBOL(shrink_dcache_sb);
 
-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 2/2] fs/dcache.c: fix spin lockup issue on nlru->lock
  2017-06-29  3:39                   ` [PATCH v4 2/2] fs/dcache.c: fix spin lockup issue on nlru->lock Sahitya Tummala
@ 2017-06-29 22:48                     ` Andrew Morton
  2017-06-30  3:16                       ` Sahitya Tummala
  2017-07-01 16:28                     ` Vladimir Davydov
  1 sibling, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2017-06-29 22:48 UTC (permalink / raw)
  To: Sahitya Tummala
  Cc: Alexander Polakov, Vladimir Davydov, Jan Kara, viro, linux-mm,
	linux-kernel, linux-fsdevel

On Thu, 29 Jun 2017 09:09:35 +0530 Sahitya Tummala <stummala@codeaurora.org> wrote:

> __list_lru_walk_one() acquires nlru spin lock (nlru->lock) for
> longer duration if there are more number of items in the lru list.
> As per the current code, it can hold the spin lock for upto maximum
> UINT_MAX entries at a time. So if there are more number of items in
> the lru list, then "BUG: spinlock lockup suspected" is observed in
> the below path -
> 
> ...
>
> Fix this lockup by reducing the number of entries to be shrinked
> from the lru list to 1024 at once. Also, add cond_resched() before
> processing the lru list again.
> 
> ...
>
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -1133,11 +1133,12 @@ void shrink_dcache_sb(struct super_block *sb)
>  		LIST_HEAD(dispose);
>  
>  		freed = list_lru_walk(&sb->s_dentry_lru,
> -			dentry_lru_isolate_shrink, &dispose, UINT_MAX);
> +			dentry_lru_isolate_shrink, &dispose, 1024);
>  
>  		this_cpu_sub(nr_dentry_unused, freed);
>  		shrink_dentry_list(&dispose);
> -	} while (freed > 0);
> +		cond_resched();
> +	} while (list_lru_count(&sb->s_dentry_lru) > 0);
>  }
>  EXPORT_SYMBOL(shrink_dcache_sb);

I'll add a cc:stable to this one - a large dentry list is a relatively
common thing.

I'm assumng that [1/2] does not need to be backported, OK?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 2/2] fs/dcache.c: fix spin lockup issue on nlru->lock
  2017-06-29 22:48                     ` Andrew Morton
@ 2017-06-30  3:16                       ` Sahitya Tummala
  0 siblings, 0 replies; 19+ messages in thread
From: Sahitya Tummala @ 2017-06-30  3:16 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Polakov, Vladimir Davydov, Jan Kara, viro, linux-mm,
	linux-kernel, linux-fsdevel


On 6/30/2017 4:18 AM, Andrew Morton wrote:
>
>> --- a/fs/dcache.c
>> +++ b/fs/dcache.c
>> @@ -1133,11 +1133,12 @@ void shrink_dcache_sb(struct super_block *sb)
>>   		LIST_HEAD(dispose);
>>   
>>   		freed = list_lru_walk(&sb->s_dentry_lru,
>> -			dentry_lru_isolate_shrink, &dispose, UINT_MAX);
>> +			dentry_lru_isolate_shrink, &dispose, 1024);
>>   
>>   		this_cpu_sub(nr_dentry_unused, freed);
>>   		shrink_dentry_list(&dispose);
>> -	} while (freed > 0);
>> +		cond_resched();
>> +	} while (list_lru_count(&sb->s_dentry_lru) > 0);
>>   }
>>   EXPORT_SYMBOL(shrink_dcache_sb);
> I'll add a cc:stable to this one - a large dentry list is a relatively
> common thing.
>
> I'm assumng that [1/2] does not need to be backported, OK?

I think we should include [1/2] as well along with this patch, as this patch
is using list_lru_count(), which can return incorrect count if [1/2] is 
not included.

Also, all the previous patches submitted for fixing this issue must be 
dropped i.e,
mm/list_lru.c: use cond_resched_lock() for nlru->lock must be dropped.

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 1/2] mm/list_lru.c: fix list_lru_count_node() to be race free
  2017-06-29  3:39                   ` [PATCH v4 " Sahitya Tummala
@ 2017-07-01 16:28                     ` Vladimir Davydov
  0 siblings, 0 replies; 19+ messages in thread
From: Vladimir Davydov @ 2017-07-01 16:28 UTC (permalink / raw)
  To: Sahitya Tummala
  Cc: Alexander Polakov, Andrew Morton, Jan Kara, viro, linux-mm,
	linux-kernel, linux-fsdevel

On Thu, Jun 29, 2017 at 09:09:15AM +0530, Sahitya Tummala wrote:
> list_lru_count_node() iterates over all memcgs to get
> the total number of entries on the node but it can race with
> memcg_drain_all_list_lrus(), which migrates the entries from
> a dead cgroup to another. This can return incorrect number of
> entries from list_lru_count_node().
> 
> Fix this by keeping track of entries per node and simply return
> it in list_lru_count_node().
> 
> Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>

Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 2/2] fs/dcache.c: fix spin lockup issue on nlru->lock
  2017-06-29  3:39                   ` [PATCH v4 2/2] fs/dcache.c: fix spin lockup issue on nlru->lock Sahitya Tummala
  2017-06-29 22:48                     ` Andrew Morton
@ 2017-07-01 16:28                     ` Vladimir Davydov
  1 sibling, 0 replies; 19+ messages in thread
From: Vladimir Davydov @ 2017-07-01 16:28 UTC (permalink / raw)
  To: Sahitya Tummala
  Cc: Alexander Polakov, Andrew Morton, Jan Kara, viro, linux-mm,
	linux-kernel, linux-fsdevel

On Thu, Jun 29, 2017 at 09:09:35AM +0530, Sahitya Tummala wrote:
> __list_lru_walk_one() acquires nlru spin lock (nlru->lock) for
> longer duration if there are more number of items in the lru list.
> As per the current code, it can hold the spin lock for upto maximum
> UINT_MAX entries at a time. So if there are more number of items in
> the lru list, then "BUG: spinlock lockup suspected" is observed in
> the below path -
> 
> [<ffffff8eca0fb0bc>] spin_bug+0x90
> [<ffffff8eca0fb220>] do_raw_spin_lock+0xfc
> [<ffffff8ecafb7798>] _raw_spin_lock+0x28
> [<ffffff8eca1ae884>] list_lru_add+0x28
> [<ffffff8eca1f5dac>] dput+0x1c8
> [<ffffff8eca1eb46c>] path_put+0x20
> [<ffffff8eca1eb73c>] terminate_walk+0x3c
> [<ffffff8eca1eee58>] path_lookupat+0x100
> [<ffffff8eca1f00fc>] filename_lookup+0x6c
> [<ffffff8eca1f0264>] user_path_at_empty+0x54
> [<ffffff8eca1e066c>] SyS_faccessat+0xd0
> [<ffffff8eca084e30>] el0_svc_naked+0x24
> 
> This nlru->lock is acquired by another CPU in this path -
> 
> [<ffffff8eca1f5fd0>] d_lru_shrink_move+0x34
> [<ffffff8eca1f6180>] dentry_lru_isolate_shrink+0x48
> [<ffffff8eca1aeafc>] __list_lru_walk_one.isra.10+0x94
> [<ffffff8eca1aec34>] list_lru_walk_node+0x40
> [<ffffff8eca1f6620>] shrink_dcache_sb+0x60
> [<ffffff8eca1e56a8>] do_remount_sb+0xbc
> [<ffffff8eca1e583c>] do_emergency_remount+0xb0
> [<ffffff8eca0ba510>] process_one_work+0x228
> [<ffffff8eca0bb158>] worker_thread+0x2e0
> [<ffffff8eca0c040c>] kthread+0xf4
> [<ffffff8eca084dd0>] ret_from_fork+0x10
> 
> Fix this lockup by reducing the number of entries to be shrinked
> from the lru list to 1024 at once. Also, add cond_resched() before
> processing the lru list again.
> 
> Link: http://marc.info/?t=149722864900001&r=1&w=2
> Fix-suggested-by: Jan kara <jack@suse.cz>
> Fix-suggested-by: Vladimir Davydov <vdavydov.dev@gmail.com>
> Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>

Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2017-07-01 16:29 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-12  0:47 [PATCH] mm/list_lru.c: use cond_resched_lock() for nlru->lock Sahitya Tummala
2017-06-12 13:11 ` Jan Kara
2017-06-15 21:05 ` Andrew Morton
2017-06-16 14:44   ` Sahitya Tummala
2017-06-17 11:14   ` Vladimir Davydov
2017-06-20  2:52     ` Sahitya Tummala
2017-06-21  6:39       ` [PATCH v2] fs/dcache.c: fix spin lockup issue on nlru->lock Sahitya Tummala
2017-06-21 16:31         ` Vladimir Davydov
2017-06-22 16:31           ` Sahitya Tummala
2017-06-22 17:49             ` Vladimir Davydov
2017-06-28  6:07               ` [PATCH v3 1/2] mm/list_lru.c: fix list_lru_count_node() to be race free Sahitya Tummala
2017-06-28  6:07                 ` [PATCH v3 2/2] fs/dcache.c: fix spin lockup issue on nlru->lock Sahitya Tummala
2017-06-28 17:18                 ` [PATCH v3 1/2] mm/list_lru.c: fix list_lru_count_node() to be race free Vladimir Davydov
2017-06-29  3:39                   ` [PATCH v4 " Sahitya Tummala
2017-07-01 16:28                     ` Vladimir Davydov
2017-06-29  3:39                   ` [PATCH v4 2/2] fs/dcache.c: fix spin lockup issue on nlru->lock Sahitya Tummala
2017-06-29 22:48                     ` Andrew Morton
2017-06-30  3:16                       ` Sahitya Tummala
2017-07-01 16:28                     ` Vladimir Davydov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).