Date: Wed, 4 Dec 2019 09:35:14 +0100
From: Michal Hocko
To: Pavel Tikhomirov
Cc: Andrew Morton, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
 linux-mm@kvack.org, Johannes Weiner, Vladimir Davydov, Roman Gushchin,
 Shakeel Butt, Chris Down, Yang Shi, Tejun Heo, Thomas Gleixner, "Kirill A .
 Shutemov", Konstantin Khorenko, Kirill Tkhai, Andrey Ryabinin
Subject: Re: [PATCH] mm: fix hanging shrinker management on long do_shrink_slab
Message-ID: <20191204083514.GC25242@dhcp22.suse.cz>
References: <20191129214541.3110-1-ptikhomirov@virtuozzo.com>
In-Reply-To: <20191129214541.3110-1-ptikhomirov@virtuozzo.com>

On Sat 30-11-19 00:45:41, Pavel Tikhomirov wrote:
> We have a problem that shrinker_rwsem can be held for a long time for
> read in shrink_slab, at the same time any process which is trying to
> manage shrinkers hangs.
>
> The shrinker_rwsem is taken in shrink_slab while traversing shrinker_list.
> It tries to shrink something on nfs (hard) but nfs server is dead at
> these moment already and rpc will never succeed. Generally any shrinker
> can take significant time to do_shrink_slab, so it's a bad idea to hold
> the list lock here.

Yes, this is a known problem and people have already tried to address it
in the past. Have you checked previous attempts? SRCU based one
http://lkml.kernel.org/r/153365347929.19074.12509495712735843805.stgit@localhost.localdomain
but I believe there were others (I only had this one in my notes).
Please make sure to Cc Dave Chinner when posting a next version because
he had some concerns about the change of the behavior.

> We have a similar problem in shrink_slab_memcg, except that we are
> traversing shrinker_map+shrinker_idr there.
>
> The idea of the patch is to inc a refcount to the chosen shrinker so it
> won't disappear and release shrinker_rwsem while we are in
> do_shrink_slab, after that we will reacquire shrinker_rwsem, dec
> the refcount and continue the traversal.

The reference count part makes sense to me.
The role of RCU needs a better explanation. Also, do you have any reason
not to use a completion for the final step? Open-coding essentially the
same concept sounds a bit awkward to me.

> We also need a wait_queue so that unregister_shrinker can wait for the
> refcnt to become zero. Only after these we can safely remove the
> shrinker from list and idr, and free the shrinker.
[...]
> crash> bt ...
> PID: 18739 TASK: ... CPU: 3 COMMAND: "bash"
> #0 [...] __schedule at ...
> #1 [...] schedule at ...
> #2 [...] rpc_wait_bit_killable at ... [sunrpc]
> #3 [...] __wait_on_bit at ...
> #4 [...] out_of_line_wait_on_bit at ...
> #5 [...] _nfs4_proc_delegreturn at ... [nfsv4]
> #6 [...] nfs4_proc_delegreturn at ... [nfsv4]
> #7 [...] nfs_do_return_delegation at ... [nfsv4]
> #8 [...] nfs4_evict_inode at ... [nfsv4]
> #9 [...] evict at ...
> #10 [...] dispose_list at ...
> #11 [...] prune_icache_sb at ...
> #12 [...] super_cache_scan at ...
> #13 [...] do_shrink_slab at ...

Are NFS people aware of this? Because this is simply not acceptable
behavior. Memory reclaim cannot be blocked indefinitely or for a long
time. There must be a way to simply give up if the underlying inode
cannot be reclaimed.

I still have to think about the proposed solution. It sounds a bit
overcomplicated to me.
-- 
Michal Hocko
SUSE Labs