References: <20191129214541.3110-1-ptikhomirov@virtuozzo.com> <4e2d959a-0b0e-30aa-59b4-8e37728e9793@virtuozzo.com>
In-Reply-To: <4e2d959a-0b0e-30aa-59b4-8e37728e9793@virtuozzo.com>
From: Shakeel Butt
Date: Mon, 2 Dec 2019 16:13:56 -0800
Subject: Re: [PATCH] mm: fix hanging shrinker management on long do_shrink_slab
To: Andrey Ryabinin
Cc: Pavel Tikhomirov, Andrew Morton, LKML, Cgroups, Linux MM,
 Johannes Weiner, Michal Hocko, Vladimir Davydov, Roman Gushchin,
 Chris Down, Yang Shi, Tejun Heo, Thomas Gleixner, "Kirill A. Shutemov",
 Konstantin Khorenko, Kirill Tkhai, Trond Myklebust, Anna Schumaker, "J.
 Bruce Fields", Chuck Lever, linux-nfs@vger.kernel.org, Alexander Viro,
 linux-fsdevel
X-Mailing-List: linux-nfs@vger.kernel.org

On Mon, Dec 2, 2019 at 8:37 AM Andrey Ryabinin wrote:
>
>
> On 11/30/19 12:45 AM, Pavel Tikhomirov wrote:
> > We have a problem that shrinker_rwsem can be held for a long time for
> > read in shrink_slab, at the same time any process which is trying to
> > manage shrinkers hangs.
> >
> > The shrinker_rwsem is taken in shrink_slab while traversing shrinker_list.
> > It tries to shrink something on nfs (hard) but nfs server is dead at
> > this moment already and rpc will never succeed. Generally any shrinker
> > can take significant time to do_shrink_slab, so it's a bad idea to hold
> > the list lock here.
> >
> > We have a similar problem in shrink_slab_memcg, except that we are
> > traversing shrinker_map+shrinker_idr there.
> >
> > The idea of the patch is to inc a refcount to the chosen shrinker so it
> > won't disappear and release shrinker_rwsem while we are in
> > do_shrink_slab, after that we will reacquire shrinker_rwsem, dec
> > the refcount and continue the traversal.
> >
> > We also need a wait_queue so that unregister_shrinker can wait for the
> > refcnt to become zero. Only after this can we safely remove the
> > shrinker from list and idr, and free the shrinker.
> >
> > I've reproduced the nfs hang in do_shrink_slab with the patch applied on
> > ms kernel, all other mount/unmount pass fine without any hang.
> >
> > Here is a reproduction on kernel without patch:
> >
> > 1) Setup nfs on server node with some files in it (e.g. 200)
> >
> >   [server]# cat /etc/exports
> >   /vz/nfs2 *(ro,no_root_squash,no_subtree_check,async)
> >
> > 2) Hard mount it on client node
> >
> >   [client]# mount -ohard 10.94.3.40:/vz/nfs2 /mnt
> >
> > 3) Open some (e.g. 200) files on the mount
> >
> >   [client]# for i in $(find /mnt/ -type f | head -n 200); \
> >     do setsid sleep 1000 &>/dev/null <$i & done
> >
> > 4) Kill all openers
> >
> >   [client]# killall sleep -9
> >
> > 5) Pull your network cable out on client node
> >
> > 6) Drop caches on the client, it will hang on nfs while holding
> >    shrinker_rwsem lock for read
> >
> >   [client]# echo 3 > /proc/sys/vm/drop_caches
> >
> >   crash> bt ...
> >   PID: 18739 TASK: ... CPU: 3 COMMAND: "bash"
> >    #0 [...] __schedule at ...
> >    #1 [...] schedule at ...
> >    #2 [...] rpc_wait_bit_killable at ... [sunrpc]
> >    #3 [...] __wait_on_bit at ...
> >    #4 [...] out_of_line_wait_on_bit at ...
> >    #5 [...] _nfs4_proc_delegreturn at ... [nfsv4]
> >    #6 [...] nfs4_proc_delegreturn at ... [nfsv4]
> >    #7 [...] nfs_do_return_delegation at ... [nfsv4]
> >    #8 [...] nfs4_evict_inode at ... [nfsv4]
> >    #9 [...] evict at ...
> >   #10 [...] dispose_list at ...
> >   #11 [...] prune_icache_sb at ...
> >   #12 [...] super_cache_scan at ...
> >   #13 [...] do_shrink_slab at ...
> >   #14 [...] shrink_slab at ...
> >   #15 [...] drop_slab_node at ...
> >   #16 [...] drop_slab at ...
> >   #17 [...] drop_caches_sysctl_handler at ...
> >   #18 [...] proc_sys_call_handler at ...
> >   #19 [...] vfs_write at ...
> >   #20 [...] ksys_write at ...
> >   #21 [...] do_syscall_64 at ...
> >   #22 [...] entry_SYSCALL_64_after_hwframe at ...
> >
> > 7) All other mount/umount activity now hangs, unable to take
> >    shrinker_rwsem for write.
> >
> >   [client]# mount -t tmpfs tmpfs /tmp
> >
> >   crash> bt ...
> >   PID: 5464 TASK: ... CPU: 3 COMMAND: "mount"
> >    #0 [...] __schedule at ...
> >    #1 [...] schedule at ...
> >    #2 [...] rwsem_down_write_slowpath at ...
> >    #3 [...] prealloc_shrinker at ...
> >    #4 [...] alloc_super at ...
> >    #5 [...] sget at ...
> >    #6 [...] mount_nodev at ...
> >    #7 [...] legacy_get_tree at ...
> >    #8 [...] vfs_get_tree at ...
> >    #9 [...] do_mount at ...
> >   #10 [...] ksys_mount at ...
> >   #11 [...] __x64_sys_mount at ...
> >   #12 [...] do_syscall_64 at ...
> >   #13 [...] entry_SYSCALL_64_after_hwframe at ...
> >
>
>
> I don't think this patch solves the problem, it only fixes one minor symptom of it.
> The actual problem here is the reclaim hang in the nfs.
> It means that any process, including kswapd, may go into nfs inode reclaim and get stuck there.
>
> Even mount() itself has GFP_KERNEL allocations in its path, so it still might get stuck there even with your patch.
>
> I think this should be handled on nfs/vfs level by making inode eviction during reclaim more asynchronous.

Though I agree that we should be fixing shrinkers to not get stuck
(and be more async), I still think the problem this patch is solving
is worth fixing. On machines running multiple workloads, one job stuck
in a slab shrinker blocks all other unrelated jobs that want
shrinker_rwsem, which breaks isolation and causes a DoS.

Shakeel