From: Andrew Morton <akpm@linux-foundation.org>
To: Sahitya Tummala <stummala@codeaurora.org>
Cc: Alexander Polakov <apolyakov@beget.ru>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Jan Kara <jack@suse.cz>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] mm/list_lru.c: use cond_resched_lock() for nlru->lock
Date: Thu, 15 Jun 2017 14:05:23 -0700
Message-ID: <20170615140523.76f8fc3ca21dae3704f06a56@linux-foundation.org>
In-Reply-To: <1497228440-10349-1-git-send-email-stummala@codeaurora.org>

On Mon, 12 Jun 2017 06:17:20 +0530 Sahitya Tummala <stummala@codeaurora.org> wrote:

> __list_lru_walk_one() can hold the spin lock for a long time
> if there are many entries to be isolated.
> 
> This results in "BUG: spinlock lockup suspected" in the path below -
> 
> [<ffffff8eca0fb0bc>] spin_bug+0x90
> [<ffffff8eca0fb220>] do_raw_spin_lock+0xfc
> [<ffffff8ecafb7798>] _raw_spin_lock+0x28
> [<ffffff8eca1ae884>] list_lru_add+0x28
> [<ffffff8eca1f5dac>] dput+0x1c8
> [<ffffff8eca1eb46c>] path_put+0x20
> [<ffffff8eca1eb73c>] terminate_walk+0x3c
> [<ffffff8eca1eee58>] path_lookupat+0x100
> [<ffffff8eca1f00fc>] filename_lookup+0x6c
> [<ffffff8eca1f0264>] user_path_at_empty+0x54
> [<ffffff8eca1e066c>] SyS_faccessat+0xd0
> [<ffffff8eca084e30>] el0_svc_naked+0x24
> 
> This nlru->lock has been acquired by another CPU in this path -
> 
> [<ffffff8eca1f5fd0>] d_lru_shrink_move+0x34
> [<ffffff8eca1f6180>] dentry_lru_isolate_shrink+0x48
> [<ffffff8eca1aeafc>] __list_lru_walk_one.isra.10+0x94
> [<ffffff8eca1aec34>] list_lru_walk_node+0x40
> [<ffffff8eca1f6620>] shrink_dcache_sb+0x60
> [<ffffff8eca1e56a8>] do_remount_sb+0xbc
> [<ffffff8eca1e583c>] do_emergency_remount+0xb0
> [<ffffff8eca0ba510>] process_one_work+0x228
> [<ffffff8eca0bb158>] worker_thread+0x2e0
> [<ffffff8eca0c040c>] kthread+0xf4
> [<ffffff8eca084dd0>] ret_from_fork+0x10
> 
> Link: http://marc.info/?t=149511514800002&r=1&w=2
> Fix-suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
> ---
>  mm/list_lru.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/mm/list_lru.c b/mm/list_lru.c
> index 5d8dffd..1af0709 100644
> --- a/mm/list_lru.c
> +++ b/mm/list_lru.c
> @@ -249,6 +249,8 @@ restart:
>  		default:
>  			BUG();
>  		}
> +		if (cond_resched_lock(&nlru->lock))
> +			goto restart;
>  	}
>  
>  	spin_unlock(&nlru->lock);

This is rather worrying.

a) Why are we holding that lock for so long that this occurs?

b) With this patch, we're restarting the entire scan.  Are there
   situations in which this loop will never terminate, or will take a
   very long time?  Suppose that this process is getting rescheds
   blasted at it for some reason?

IOW this looks like a bit of a band-aid and a deeper analysis and
understanding might be needed.
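
To make concern (b) concrete, here is a minimal userspace sketch of the
"restart from the head after a resched" pattern. It is a toy model and not
the kernel code: struct toy_entry, toy_walk(), the resched_every knob and
the budget cap are all hypothetical names introduced for illustration, and
the periodic restart only stands in for cond_resched_lock() returning
nonzero. Entries marked skip stay on the list, so every restart has to
revisit them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy list entry: either it can be isolated, or the callback skips it. */
struct toy_entry {
	bool skip;    /* true: the callback leaves this entry on the list */
	bool on_list; /* false once the entry has been isolated */
};

/*
 * Walk entries[0..n) and isolate what can be isolated.
 *
 * resched_every: after every resched_every callback invocations, pretend
 *                the lock was dropped and restart from the head
 *                (0 means never restart). Models the patched loop only.
 * budget:        hypothetical cap on callback invocations, so that the
 *                model always terminates.
 *
 * Returns the number of callback invocations performed.
 */
static unsigned long toy_walk(struct toy_entry *entries, size_t n,
			      unsigned long resched_every,
			      unsigned long budget)
{
	unsigned long calls = 0;
	size_t i;

	assert(entries != NULL || n == 0);

restart:
	for (i = 0; i < n; i++) {
		if (!entries[i].on_list)
			continue; /* already isolated, no longer on the list */
		if (budget == 0)
			return calls; /* cap reached, give up */
		budget--;
		calls++;
		if (!entries[i].skip)
			entries[i].on_list = false; /* isolated */
		/* stands in for: if (cond_resched_lock(...)) goto restart; */
		if (resched_every && calls % resched_every == 0)
			goto restart;
	}
	return calls;
}
```

In this model, a list whose first four entries are skipped and whose last
four can be isolated is finished in 8 callback invocations when no restart
ever happens. With a restart every 2 invocations, the walk never gets past
the second skipped entry: it stops only when the budget runs out, and the
isolatable entries behind the skipped ones are never reached. Whether the
real loop can get into that state depends on how often a resched is
actually pending and on what the isolate callback returns, which is exactly
the analysis being asked for above.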


