From: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"cgroups@vger.kernel.org" <cgroups@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Vladimir Davydov <vdavydov.dev@gmail.com>,
Roman Gushchin <guro@fb.com>, Shakeel Butt <shakeelb@google.com>,
Chris Down <chris@chrisdown.name>,
Yang Shi <yang.shi@linux.alibaba.com>, Tejun Heo <tj@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
Konstantin Khorenko <khorenko@virtuozzo.com>,
Kirill Tkhai <ktkhai@virtuozzo.com>,
Andrey Ryabinin <aryabinin@virtuozzo.com>,
Dave Chinner <david@fromorbit.com>,
Trond Myklebust <trond.myklebust@hammerspace.com>,
Anna Schumaker <anna.schumaker@netapp.com>,
"J. Bruce Fields" <bfields@fieldses.org>,
Chuck Lever <chuck.lever@oracle.com>,
"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
Alexander Viro <viro@zeniv.linux.org.uk>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH] mm: fix hanging shrinker management on long do_shrink_slab
Date: Thu, 19 Dec 2019 10:20:36 +0000 [thread overview]
Message-ID: <f3fb2869-e89f-707b-c6e2-05d6e3abfb74@virtuozzo.com> (raw)
In-Reply-To: <20191204083514.GC25242@dhcp22.suse.cz>
On 12/4/19 11:35 AM, Michal Hocko wrote:
> On Sat 30-11-19 00:45:41, Pavel Tikhomirov wrote:
>> We have a problem that shrinker_rwsem can be held for a long time for
>> read in shrink_slab, at the same time any process which is trying to
>> manage shrinkers hangs.
>>
>> The shrinker_rwsem is taken in shrink_slab while traversing shrinker_list.
>> It tries to shrink something on nfs (hard) but nfs server is dead at
>> these moment already and rpc will never succeed. Generally any shrinker
>> can take significant time to do_shrink_slab, so it's a bad idea to hold
>> the list lock here.
>
> Yes, this is a known problem and people have already tried to address it
> in the past. Have you checked previous attempts? SRCU based one
> http://lkml.kernel.org/r/153365347929.19074.12509495712735843805.stgit@localhost.localdomain
> but I believe there were others (I only had this one in my notes).
> Please make sure to Cc Dave Chinner when posting a next version because
> he had some concerns about the change of the behavior.
The approach of the patch you are referencing is quiet different, it
will still hold global srcu_read_lock(&srcu) when diving in
do_shrink_slab and hanging nfs will still block all [un]register_shrinker.
>
>> We have a similar problem in shrink_slab_memcg, except that we are
>> traversing shrinker_map+shrinker_idr there.
>>
>> The idea of the patch is to inc a refcount to the chosen shrinker so it
>> won't disappear and release shrinker_rwsem while we are in
>> do_shrink_slab, after that we will reacquire shrinker_rwsem, dec
>> the refcount and continue the traversal.
>
> The reference count part makes sense to me. RCU role needs a better
> explanation.
I have 2 rcu's in patch, 1-st is to protect shrinker_map same as it was
before in memcg_set_shrinker_bit, 2-nd is to protect shrinker struct in
put_shrinker from being freed, as unregister_shrinker can see refcnt==0
without actually going to schedule().
> Also do you have any reason to not use completion for
> the final step? Openconding essentially the same concept sounds a bit
> awkward to me.
Thanks for a good hint, from the first glance we can rework wait_event
part to wait_for_completion.
>
>> We also need a wait_queue so that unregister_shrinker can wait for the
>> refcnt to become zero. Only after these we can safely remove the
>> shrinker from list and idr, and free the shrinker.
> [...]
>> crash> bt ...
>> PID: 18739 TASK: ... CPU: 3 COMMAND: "bash"
>> #0 [...] __schedule at ...
>> #1 [...] schedule at ...
>> #2 [...] rpc_wait_bit_killable at ... [sunrpc]
>> #3 [...] __wait_on_bit at ...
>> #4 [...] out_of_line_wait_on_bit at ...
>> #5 [...] _nfs4_proc_delegreturn at ... [nfsv4]
>> #6 [...] nfs4_proc_delegreturn at ... [nfsv4]
>> #7 [...] nfs_do_return_delegation at ... [nfsv4]
>> #8 [...] nfs4_evict_inode at ... [nfsv4]
>> #9 [...] evict at ...
>> #10 [...] dispose_list at ...
>> #11 [...] prune_icache_sb at ...
>> #12 [...] super_cache_scan at ...
>> #13 [...] do_shrink_slab at ...
>
> Are NFS people aware of this? Because this is simply not acceptable
> behavior. Memory reclaim cannot be block indefinitely or for a long
> time. There must be a way to simply give up if the underlying inode
> cannot be reclaimed.
Sorry that I didn't cc nfs people from the begining.
>
> I still have to think about the proposed solution. It sounds a bit over
> complicated to me.
>
--
Best regards, Tikhomirov Pavel
Software Developer, Virtuozzo.
next prev parent reply other threads:[~2019-12-19 10:20 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-11-29 21:45 [PATCH] mm: fix hanging shrinker management on long do_shrink_slab Pavel Tikhomirov
2019-12-02 16:36 ` Andrey Ryabinin
2019-12-03 0:13 ` Shakeel Butt
2019-12-03 11:03 ` Kirill Tkhai
2019-12-05 3:13 ` Pavel Tikhomirov
2019-12-06 2:09 ` Dave Chinner
2019-12-06 10:09 ` Michal Hocko
2019-12-06 17:11 ` Shakeel Butt
2019-12-10 1:20 ` Dave Chinner
2019-12-19 10:35 ` Pavel Tikhomirov
2019-12-03 11:38 ` Kirill Tkhai
2019-12-05 3:29 ` Pavel Tikhomirov
2019-12-04 8:35 ` Michal Hocko
2019-12-19 10:20 ` Pavel Tikhomirov [this message]
2022-10-18 3:48 ` Zhang Tianci
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f3fb2869-e89f-707b-c6e2-05d6e3abfb74@virtuozzo.com \
--to=ptikhomirov@virtuozzo.com \
--cc=akpm@linux-foundation.org \
--cc=anna.schumaker@netapp.com \
--cc=aryabinin@virtuozzo.com \
--cc=bfields@fieldses.org \
--cc=cgroups@vger.kernel.org \
--cc=chris@chrisdown.name \
--cc=chuck.lever@oracle.com \
--cc=david@fromorbit.com \
--cc=guro@fb.com \
--cc=hannes@cmpxchg.org \
--cc=khorenko@virtuozzo.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=ktkhai@virtuozzo.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-nfs@vger.kernel.org \
--cc=mhocko@kernel.org \
--cc=shakeelb@google.com \
--cc=tglx@linutronix.de \
--cc=tj@kernel.org \
--cc=trond.myklebust@hammerspace.com \
--cc=vdavydov.dev@gmail.com \
--cc=viro@zeniv.linux.org.uk \
--cc=yang.shi@linux.alibaba.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).