From: Pavel Tikhomirov <email@example.com> To: Michal Hocko <firstname.lastname@example.org> Cc: Andrew Morton <email@example.com>, "firstname.lastname@example.org" <email@example.com>, "firstname.lastname@example.org" <email@example.com>, "firstname.lastname@example.org" <email@example.com>, Johannes Weiner <firstname.lastname@example.org>, Vladimir Davydov <email@example.com>, Roman Gushchin <firstname.lastname@example.org>, Shakeel Butt <email@example.com>, Chris Down <firstname.lastname@example.org>, Yang Shi <email@example.com>, Tejun Heo <firstname.lastname@example.org>, Thomas Gleixner <email@example.com>, "Kirill A . Shutemov" <firstname.lastname@example.org>, Konstantin Khorenko <email@example.com>, Kirill Tkhai <firstname.lastname@example.org>, Andrey Ryabinin <email@example.com>, Dave Chinner <firstname.lastname@example.org>, Trond Myklebust <email@example.com>, Anna Schumaker <firstname.lastname@example.org>, "J. Bruce Fields" <email@example.com>, Chuck Lever <firstname.lastname@example.org>, "email@example.com" <firstname.lastname@example.org>, Alexander Viro <email@example.com>, linux-fsdevel <firstname.lastname@example.org> Subject: Re: [PATCH] mm: fix hanging shrinker management on long do_shrink_slab Date: Thu, 19 Dec 2019 10:20:36 +0000 Message-ID: <email@example.com> (raw) In-Reply-To: <20191204083514.GC25242@dhcp22.suse.cz> On 12/4/19 11:35 AM, Michal Hocko wrote: > On Sat 30-11-19 00:45:41, Pavel Tikhomirov wrote: >> We have a problem that shrinker_rwsem can be held for a long time for >> read in shrink_slab, at the same time any process which is trying to >> manage shrinkers hangs. >> >> The shrinker_rwsem is taken in shrink_slab while traversing shrinker_list. >> It tries to shrink something on nfs (hard) but nfs server is dead at >> these moment already and rpc will never succeed. Generally any shrinker >> can take significant time to do_shrink_slab, so it's a bad idea to hold >> the list lock here. > > Yes, this is a known problem and people have already tried to address it > in the past. Have you checked previous attempts? SRCU based one > http://firstname.lastname@example.org > but I believe there were others (I only had this one in my notes). > Please make sure to Cc Dave Chinner when posting a next version because > he had some concerns about the change of the behavior. The approach of the patch you are referencing is quiet different, it will still hold global srcu_read_lock(&srcu) when diving in do_shrink_slab and hanging nfs will still block all [un]register_shrinker. > >> We have a similar problem in shrink_slab_memcg, except that we are >> traversing shrinker_map+shrinker_idr there. >> >> The idea of the patch is to inc a refcount to the chosen shrinker so it >> won't disappear and release shrinker_rwsem while we are in >> do_shrink_slab, after that we will reacquire shrinker_rwsem, dec >> the refcount and continue the traversal. > > The reference count part makes sense to me. RCU role needs a better > explanation. I have 2 rcu's in patch, 1-st is to protect shrinker_map same as it was before in memcg_set_shrinker_bit, 2-nd is to protect shrinker struct in put_shrinker from being freed, as unregister_shrinker can see refcnt==0 without actually going to schedule(). > Also do you have any reason to not use completion for > the final step? Openconding essentially the same concept sounds a bit > awkward to me. Thanks for a good hint, from the first glance we can rework wait_event part to wait_for_completion. > >> We also need a wait_queue so that unregister_shrinker can wait for the >> refcnt to become zero. Only after these we can safely remove the >> shrinker from list and idr, and free the shrinker. > [...] >> crash> bt ... >> PID: 18739 TASK: ... CPU: 3 COMMAND: "bash" >> #0 [...] __schedule at ... >> #1 [...] schedule at ... >> #2 [...] rpc_wait_bit_killable at ... [sunrpc] >> #3 [...] __wait_on_bit at ... >> #4 [...] out_of_line_wait_on_bit at ... >> #5 [...] _nfs4_proc_delegreturn at ... [nfsv4] >> #6 [...] nfs4_proc_delegreturn at ... [nfsv4] >> #7 [...] nfs_do_return_delegation at ... [nfsv4] >> #8 [...] nfs4_evict_inode at ... [nfsv4] >> #9 [...] evict at ... >> #10 [...] dispose_list at ... >> #11 [...] prune_icache_sb at ... >> #12 [...] super_cache_scan at ... >> #13 [...] do_shrink_slab at ... > > Are NFS people aware of this? Because this is simply not acceptable > behavior. Memory reclaim cannot be block indefinitely or for a long > time. There must be a way to simply give up if the underlying inode > cannot be reclaimed. Sorry that I didn't cc nfs people from the begining. > > I still have to think about the proposed solution. It sounds a bit over > complicated to me. > -- Best regards, Tikhomirov Pavel Software Developer, Virtuozzo.
prev parent reply index Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top [not found] <email@example.com> 2019-12-02 16:36 ` Andrey Ryabinin 2019-12-03 0:13 ` Shakeel Butt 2019-12-03 11:03 ` Kirill Tkhai 2019-12-06 2:09 ` Dave Chinner 2019-12-06 10:09 ` Michal Hocko 2019-12-06 17:11 ` Shakeel Butt 2019-12-10 1:20 ` Dave Chinner 2019-12-19 10:35 ` Pavel Tikhomirov [not found] ` <20191204083514.GC25242@dhcp22.suse.cz> 2019-12-19 10:20 ` Pavel Tikhomirov [this message]
Reply instructions: You may reply publically to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
Linux-NFS Archive on lore.kernel.org Archives are clonable: git clone --mirror https://lore.kernel.org/linux-nfs/0 linux-nfs/git/0.git # If you have public-inbox 1.1+ installed, you may # initialize and index your mirror using the following commands: public-inbox-init -V2 linux-nfs linux-nfs/ https://lore.kernel.org/linux-nfs \ email@example.com public-inbox-index linux-nfs Example config snippet for mirrors Newsgroup available over NNTP: nntp://nntp.lore.kernel.org/org.kernel.vger.linux-nfs AGPL code for this site: git clone https://public-inbox.org/public-inbox.git