From: Michael Wakabayashi <mwakabayashi@vmware.com>
To: Olga Kornievskaia <aglo@umich.edu>
Cc: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine
Date: Fri, 21 May 2021 19:11:28 +0000	[thread overview]
Message-ID: <CO1PR05MB8101F210648CF41D02510E23B7299@CO1PR05MB8101.namprd05.prod.outlook.com> (raw)
In-Reply-To: <CAN-5tyGmq=LrO4SgqjJdhJkEgNfAHEbVNkNtEeuA3vvW7rjV=g@mail.gmail.com>

Hi Steve and Olga,

We run multiple Kubernetes clusters.
These clusters are composed of hundreds of Kubernetes nodes.
Any of these nodes can NFS mount on behalf of the containers running on these nodes.
Several times in the past few months we have seen an NFS mount hang, and then several hundred to several thousand other NFS mounts become blocked behind that hung mount process (we have many "testing" workloads that access NFS).

Having several hundred NFS mounts blocked on a node causes the Kubernetes node to become unstable and **severely** degrades service.

We did not expect a hung NFS mount to block every other NFS mount, especially when the other mounts are unrelated and otherwise working properly/healthy.

Can this behavior be changed?

Thanks, Mike



From: Olga Kornievskaia <aglo@umich.edu>
Sent: Thursday, May 20, 2021 4:51 PM
To: Michael Wakabayashi <mwakabayashi@vmware.com>
Cc: linux-nfs@vger.kernel.org <linux-nfs@vger.kernel.org>
Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine 
 
Hi Mike,

Ok, so I can reproduce it, but the scenario is not exactly what you are
stating. This is not an unreachable server; it requires that the
server is reachable but not responsive.

Using iptables I can drop packets to port 2049 on one machine,
then mount that IP from the client. Your stack trace requires an "active"
but unresponsive connection. And indeed, until the first mount times
out (which it does), no other mounts go through.
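
For reference, the reproduction described above can be sketched as the following command sequence (a sketch, not a tested script: it requires root, 192.0.2.10 stands in for the machine acting as the NFS server, and <healthy-server> is a placeholder for a working NFS server):

```shell
# On the machine acting as the NFS server: keep the host reachable
# (it still answers ping) but silently drop NFSv4 traffic, so TCP
# connections to port 2049 hang instead of being refused.
iptables -A INPUT -p tcp --dport 2049 -j DROP

# On the client: this mount hangs against the unresponsive server...
mount.nfs 192.0.2.10:/export /mnt/hung &

# ...and until it times out, a second mount to a healthy server is
# blocked waiting in nfs_match_client()/nfs_wait_client_init_complete().
mount.nfs <healthy-server>:/export /mnt/ok &

# Cleanup on the server machine when done:
iptables -D INPUT -p tcp --dport 2049 -j DROP
```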

But I think I agree with the previous comment Trond made about
the need to wait for any client that is in the middle of initializing
before making any matches against it. I think a match can be made, but the
initializing client can fail its initialization (based on what the
server returns).

My conclusion is: an unresponsive server will block other mounts, but
only until the timeout is reached.
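
If the length of that blocking window is the immediate pain point, one partial mitigation (a sketch based on the mount options documented in nfs(5), not a fix for the shared client initialization itself) is to shorten the hung mount's own retry window so it fails sooner:

```shell
# retry=0  -> for a foreground mount, give up after the first major
#             timeout instead of retrying for the default 2 minutes
# timeo=50 -> wait 5 seconds (timeo is in tenths of a second) for an
#             RPC reply before retransmitting
mount.nfs -o retry=0,timeo=50 10.1.1.1:/nopath /tmp/mnt.dead
```

Note this only shortens the window during which other mounts are blocked behind the initializing client; it does not remove it.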

On Thu, May 20, 2021 at 6:43 AM Michael Wakabayashi
<mwakabayashi@vmware.com> wrote:
>
> Hi Olga,
>
> If you are able to run privileged Docker containers
> you might be able to reproduce the issue running
> the following docker commands.
>
> docker pull ubuntu:hirsute-20210514 # ubuntu hirsute is the latest version of ubuntu
>
> # Run this to find the id of the ubuntu image, which is needed in the next command
> docker image ls # image id looks like "274cadba4412"
>
> # Run the ubuntu container and start a bash shell.
> # Replace <ubuntu_hirsute_image_id> with your ubuntu image id from the previous step.
> docker container run --rm -it --privileged <ubuntu_hirsute_image_id> /bin/bash
>
>
> # You should be inside the ubuntu container now and can run these Linux commands
> apt-get update # this is needed, otherwise the next command fails
>
> # This installs mount.nfs. Answer the two questions about geographic area and city.
> # Ignore all the debconf error messages (readline, dialog, frontend, etc.)
> apt install -y nfs-common
>
> # execute mount commands
> mkdir /tmp/mnt1 /tmp/mnt2
> mount.nfs 1.1.1.1:/ /tmp/mnt1 &
> mount.nfs <accessible-nfs-host:path> /tmp/mnt2 &
> jobs  # shows both mounts are hung
>
> Thanks, Mike
>
>
> From: Michael Wakabayashi <mwakabayashi@vmware.com>
> Sent: Thursday, May 20, 2021 2:51 AM
> To: Olga Kornievskaia <aglo@umich.edu>
> Cc: linux-nfs@vger.kernel.org <linux-nfs@vger.kernel.org>
> Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine
>
> Hi Olga,
>
> Thank you for looking.
>
> I spent a couple of hours trying to get various
> SystemTap NFS scripts working but mostly got errors.
>
> For example:
> > root@mikes-ubuntu-21-04:~/src/systemtap-scripts/tracepoints# stap nfs4_fsinfo.stp
> > semantic error: unable to find tracepoint variable '$status' (alternatives: $$parms, $$vars, $task, $$name): identifier '$status' at nfs4_fsinfo.stp:7:11
> >         source: terror = $status
> >                         ^
> > Pass 2: analysis failed.  [man error::pass2]
>
> If you have any stap scripts that work on Ubuntu
> that you'd like me to run, or pointers on how
> to set up my Ubuntu environment to run them
> successfully, please let me know and I can try again.
>
>
> Here's the call trace for the mount.nfs command
> mounting the bad NFS server (10.1.1.1):
>
> [Thu May 20 08:53:35 2021] task:mount.nfs       state:D stack:    0 pid:13903 ppid: 13900 flags:0x00004000
> [Thu May 20 08:53:35 2021] Call Trace:
> [Thu May 20 08:53:35 2021]  ? rpc_init_task+0x150/0x150 [sunrpc]
> [Thu May 20 08:53:35 2021]  __schedule+0x23d/0x670
> [Thu May 20 08:53:35 2021]  ? rpc_init_task+0x150/0x150 [sunrpc]
> [Thu May 20 08:53:35 2021]  schedule+0x4f/0xc0
> [Thu May 20 08:53:35 2021]  rpc_wait_bit_killable+0x25/0xb0 [sunrpc]
> [Thu May 20 08:53:35 2021]  __wait_on_bit+0x33/0xa0
> [Thu May 20 08:53:35 2021]  ? call_reserveresult+0xa0/0xa0 [sunrpc]
> [Thu May 20 08:53:35 2021]  out_of_line_wait_on_bit+0x8d/0xb0
> [Thu May 20 08:53:35 2021]  ? var_wake_function+0x30/0x30
> [Thu May 20 08:53:35 2021]  __rpc_execute+0xd4/0x290 [sunrpc]
> [Thu May 20 08:53:35 2021]  rpc_execute+0x5e/0x80 [sunrpc]
> [Thu May 20 08:53:35 2021]  rpc_run_task+0x13d/0x180 [sunrpc]
> [Thu May 20 08:53:35 2021]  rpc_call_sync+0x51/0xa0 [sunrpc]
> [Thu May 20 08:53:35 2021]  rpc_create_xprt+0x177/0x1c0 [sunrpc]
> [Thu May 20 08:53:35 2021]  rpc_create+0x11f/0x220 [sunrpc]
> [Thu May 20 08:53:35 2021]  ? __memcg_kmem_charge+0x7d/0xf0
> [Thu May 20 08:53:35 2021]  ? _cond_resched+0x1a/0x50
> [Thu May 20 08:53:35 2021]  nfs_create_rpc_client+0x13a/0x180 [nfs]
> [Thu May 20 08:53:35 2021]  nfs4_init_client+0x205/0x290 [nfsv4]
> [Thu May 20 08:53:35 2021]  ? __fscache_acquire_cookie+0x10a/0x210 [fscache]
> [Thu May 20 08:53:35 2021]  ? nfs_fscache_get_client_cookie+0xa9/0x120 [nfs]
> [Thu May 20 08:53:35 2021]  ? nfs_match_client+0x37/0x2a0 [nfs]
> [Thu May 20 08:53:35 2021]  nfs_get_client+0x14d/0x190 [nfs]
> [Thu May 20 08:53:35 2021]  nfs4_set_client+0xd3/0x120 [nfsv4]
> [Thu May 20 08:53:35 2021]  nfs4_init_server+0xf8/0x270 [nfsv4]
> [Thu May 20 08:53:35 2021]  nfs4_create_server+0x58/0xa0 [nfsv4]
> [Thu May 20 08:53:35 2021]  nfs4_try_get_tree+0x3a/0xc0 [nfsv4]
> [Thu May 20 08:53:35 2021]  nfs_get_tree+0x38/0x50 [nfs]
> [Thu May 20 08:53:35 2021]  vfs_get_tree+0x2a/0xc0
> [Thu May 20 08:53:35 2021]  do_new_mount+0x14b/0x1a0
> [Thu May 20 08:53:35 2021]  path_mount+0x1d4/0x4e0
> [Thu May 20 08:53:35 2021]  __x64_sys_mount+0x108/0x140
> [Thu May 20 08:53:35 2021]  do_syscall_64+0x38/0x90
> [Thu May 20 08:53:35 2021]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
>
> Here's the call trace for the mount.nfs command
> mounting an available NFS server (10.188.76.67), which was
> blocked by the first mount.nfs command above:
> [Thu May 20 08:53:35 2021] task:mount.nfs       state:D stack:    0 pid:13910 ppid: 13907 flags:0x00004000
> [Thu May 20 08:53:35 2021] Call Trace:
> [Thu May 20 08:53:35 2021]  __schedule+0x23d/0x670
> [Thu May 20 08:53:35 2021]  schedule+0x4f/0xc0
> [Thu May 20 08:53:35 2021]  nfs_wait_client_init_complete+0x5a/0x90 [nfs]
> [Thu May 20 08:53:35 2021]  ? wait_woken+0x80/0x80
> [Thu May 20 08:53:35 2021]  nfs_match_client+0x1de/0x2a0 [nfs]
> [Thu May 20 08:53:35 2021]  ? pcpu_block_update_hint_alloc+0xcc/0x2d0
> [Thu May 20 08:53:35 2021]  nfs_get_client+0x62/0x190 [nfs]
> [Thu May 20 08:53:35 2021]  nfs4_set_client+0xd3/0x120 [nfsv4]
> [Thu May 20 08:53:35 2021]  nfs4_init_server+0xf8/0x270 [nfsv4]
> [Thu May 20 08:53:35 2021]  nfs4_create_server+0x58/0xa0 [nfsv4]
> [Thu May 20 08:53:35 2021]  nfs4_try_get_tree+0x3a/0xc0 [nfsv4]
> [Thu May 20 08:53:35 2021]  nfs_get_tree+0x38/0x50 [nfs]
> [Thu May 20 08:53:35 2021]  vfs_get_tree+0x2a/0xc0
> [Thu May 20 08:53:35 2021]  do_new_mount+0x14b/0x1a0
> [Thu May 20 08:53:35 2021]  path_mount+0x1d4/0x4e0
> [Thu May 20 08:53:35 2021]  __x64_sys_mount+0x108/0x140
> [Thu May 20 08:53:35 2021]  do_syscall_64+0x38/0x90
> [Thu May 20 08:53:35 2021]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> I've pasted the entire dmesg output here: https://pastebin.com/90QJyAL9
>
>
> This is the command I ran to mount an unreachable NFS server:
> date; time strace mount.nfs 10.1.1.1:/nopath /tmp/mnt.dead; date
> The strace log: https://pastebin.com/5yVhm77u
>
> This is the command I ran to mount the available NFS server:
> date; time strace mount.nfs 10.188.76.67:/ /tmp/mnt.alive ; date
> The strace log: https://pastebin.com/kTimQ6vH
>
> The procedure:
> - run dmesg -C to clear dmesg logs
> - run mount.nfs on 10.1.1.1 (this IP address is down/not responding to ping) which hung
> - run mount.nfs on 10.188.76.67  which also hung
> - "echo t > /proc/sysrq-trigger" to dump the call traces for hung processes
> - dmesg -T > dmesg.log to save the dmesg logs
> - control-Z the mount.nfs command to 10.1.1.1
> - "kill -9 %1" in the terminal to kill the mount.nfs to 10.1.1.1
> - mount.nfs to 10.188.76.67 immediately mounts successfully
>   after the first mount is killed (we can see this by the timestamps in the log files)
>
>
> Thanks, Mike
>
>
>
> From: Olga Kornievskaia <aglo@umich.edu>
> Sent: Wednesday, May 19, 2021 12:15 PM
> To: Michael Wakabayashi <mwakabayashi@vmware.com>
> Cc: linux-nfs@vger.kernel.org <linux-nfs@vger.kernel.org>
> Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine
>
> On Sun, May 16, 2021 at 11:18 PM Michael Wakabayashi
> <mwakabayashi@vmware.com> wrote:
> >
> > Hi,
> >
> > We're seeing what looks like an NFSv4 issue.
> >
> > Mounting an NFS server that is down (ping to this NFS server's IP address does not respond) will block _all_ other NFS mount attempts even if the NFS servers are available and working properly (these subsequent mounts hang).
> >
> > If I kill the NFS mount process that's trying to mount the dead NFS server, the NFS mounts that were blocked will immediately unblock and mount successfully, which suggests the first mount command is blocking the other mount commands.
> >
> >
> > I verified this behavior using a newly built mount.nfs command from the recent nfs-utils 2.5.3 package installed on a recent version of Ubuntu Cloud Image 21.04:
> > * https://sourceforge.net/projects/nfs/files/nfs-utils/2.5.3/
> > * https://cloud-images.ubuntu.com/releases/hirsute/release-20210513/ubuntu-21.04-server-cloudimg-amd64.ova
> >
> >
> > The reason this looks like it is specific to NFSv4 is from the following output showing "vers=4.2":
> > > $ strace /sbin/mount.nfs <unreachable-IP-address>:/path /tmp/mnt
> > > [ ... cut ... ]
> > > mount("<unreadhable-IP-address>:/path", "/tmp/mnt", "nfs", 0, "vers=4.2,addr=<unreachable-IP-address>,clien"...^C^Z
> >
> > Also, if I try the same mount.nfs commands but specify NFSv3, the mount to the dead NFS server hangs, but the mounts to the operational NFS servers do not block and mount successfully; this bug doesn't happen when using NFSv3.
> >
> >
> > We reported this issue under util-linux here:
> > https://github.com/karelzak/util-linux/issues/1309
> > [mounting nfs server which is down blocks all other nfs mounts on same machine #1309]
> >
> > I also found an older bug on this mailing list that had similar symptoms (but could not tell if it was the same problem or not):
> > https://patchwork.kernel.org/project/linux-nfs/patch/87vaori26c.fsf@notabene.neil.brown.name/
> > [[PATCH/RFC] NFSv4: don't let hanging mounts block other mounts]
> >
> > Thanks, Mike
>
> Hi Mike,
>
> This is not a helpful reply, but I was curious whether I could reproduce
> your issue and was not successful. I'm able to initiate a mount to an
> unreachable-IP-address which hangs and then do another mount to an
> existing server without issues. Ubuntu 21.04 seems to be 5.11 based so
> I tried upstream 5.11 and I tried the latest upstream nfs-utils
> (instead of what my distro has, which was an older version).
>
> To debug, perhaps get an output of the nfs4 and sunrpc tracepoints.
> Or also get output from dmesg after doing “echo t >
> /proc/sysrq-trigger” to see where the mounts are hanging.


Thread overview: 26+ messages
2021-05-17  1:37 NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine Michael Wakabayashi
2021-05-19 19:15 ` Olga Kornievskaia
2021-05-20  9:51   ` Michael Wakabayashi
2021-05-20 10:43     ` Michael Wakabayashi
2021-05-20 23:51       ` Olga Kornievskaia
2021-05-21 19:11         ` Michael Wakabayashi [this message]
2021-05-20 18:42   ` Steve Dickson
     [not found]     ` <CO1PR05MB8101FD5E77B386A75786FF41B7299@CO1PR05MB8101.namprd05.prod.outlook.com>
2021-05-21 19:35       ` Olga Kornievskaia
2021-05-21 20:31         ` Michael Wakabayashi
2021-05-21 21:06           ` Olga Kornievskaia
2021-05-21 22:08             ` Trond Myklebust
2021-05-21 22:41               ` Olga Kornievskaia
2021-06-08  9:16                 ` Michael Wakabayashi
2021-06-08 16:10                   ` Olga Kornievskaia
2021-06-09  5:31                     ` Michael Wakabayashi
2021-06-09 13:50                       ` Olga Kornievskaia
2021-06-09 20:19                         ` Alex Romanenko
2021-06-11  5:26                           ` Michael Wakabayashi
2021-06-09 14:31                       ` Benjamin Coddington
2021-06-09 14:41                         ` Olga Kornievskaia
2021-06-09 17:14                           ` Michael Wakabayashi
2021-06-09 14:41                         ` Trond Myklebust
2021-06-09 15:00                           ` Benjamin Coddington
2021-06-09 15:19                             ` Trond Myklebust
2021-06-09  6:46                     ` Alex Romanenko
2021-05-21 22:38             ` Olga Kornievskaia
