Linux-NFS Archive on lore.kernel.org
From: Indivar Nair <indivar.nair@techterra.in>
To: Scott Mayhew <smayhew@redhat.com>
Cc: linux-nfs@vger.kernel.org
Subject: Re: rpc.statd dies because of pacemaker monitoring
Date: Sat, 13 Jul 2019 20:31:06 +0530
Message-ID: <CALuPYL13mQDg_F4rTv2FK8SEgyS56aRnv2MgNu=KUL0W3BmhEQ@mail.gmail.com> (raw)
In-Reply-To: <20190712141657.GB4131@coeurl.usersys.redhat.com>

Thanks once again, Scott,

The patch seems like a neat solution.
It will declare rpc.statd dead only after checking multiple times. Super.
I will patch the nfsserver resource file.
(It should probably be added to the original source code.)

We have a proper entry for 127.0.0.1 in the hosts file, and the
nsswitch.conf file says "files dns".
So if it checks /etc/hosts first, why would Pacemaker's check time out?
Shouldn't Pacemaker get a quick response?

Localhost entries in the /etc/hosts file -
-------------------------------------------------------------------------------------------------------------------------------------
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
-------------------------------------------------------------------------------------------------------------------------------------
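To check, I will time both the name lookup and the probe itself during
one of the failure windows (getent follows the same nsswitch.conf
order, so it is a rough proxy for what rpc.statd's lookups do; the
100024 probe is the same one Pacemaker runs):
-------------------------------------------------------------------------------------------------------------------------------------
time getent hosts localhost        # should return instantly from /etc/hosts
time rpcinfo -t localhost 100024   # the same probe Pacemaker runs
-------------------------------------------------------------------------------------------------------------------------------------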

Regards,


Indivar Nair






On Fri, Jul 12, 2019 at 7:47 PM Scott Mayhew <smayhew@redhat.com> wrote:
>
> On Fri, 12 Jul 2019, Indivar Nair wrote:
>
> > Hi Scott,
> >
> > Thanks a lot.
> > Yes, it is a 10+ year old AD setup, which was migrated to Samba4AD
> > (samba+named) a few years ago.
> > It has a lot of stale entries and forward/reverse lookup mismatches.
> >
> > Will start cleaning up DNS right away.
> >
> > In the meantime, is there any way to increase the rpc ping timeout?
>
> You could build the rpcinfo.c program from source (it's in the rpcbind
> git tree) with a longer timeout.
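> Roughly, something like this (a sketch only; the repo URL and the
> exact timeout variable are from memory, so grep the source rather
> than trusting the names here):
>
>     git clone git://git.infradead.org/users/steved/rpcbind.git
>     cd rpcbind
>     grep -n 'timeval' rpcinfo.c     # locate the ~10 second default timeout
>     autoreconf -fi && ./configure && make
>
> and then point the resource agent at the rebuilt rpcinfo binary.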
>
> > OR
> > Is there any way to temporarily disable DNS lookups by lockd?
>
> rpc.statd is doing the DNS lookups, and no, there's no way to disable them.
> Doing so would probably make the reboot notifications less reliable,
> clients wouldn't know to reclaim locks, and the risk of data corruption
> would go up.
>
> You could prevent the DNS lookups from occurring by adding entries (properly
> formatted... see the hosts(5) man page) for all your clients into the
> /etc/hosts file on your NFS server nodes.  That's assuming that your nsswitch
> configuration has "files" before "dns" for host lookups.  Depending on how
> many clients you have, that might be the easiest option.
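> For example (the addresses and names below are made up; see hosts(5)
> for the format -- address, canonical name, then aliases):
>
>     192.168.10.21   client01.example.com   client01
>     192.168.10.22   client02.example.com   client02
>
> With "files" ahead of "dns", rpc.statd's getaddrinfo()/getnameinfo()
> calls for those clients are answered locally and never hit the DNS
> server.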
>
> Another option might be to try adding a simple retry mechanism to the part
> of the nfsserver resource agent that checks rpc.statd, something like:
>
> diff --git a/heartbeat/nfsserver b/heartbeat/nfsserver
> index bf59da98..c9dcc74e 100755
> --- a/heartbeat/nfsserver
> +++ b/heartbeat/nfsserver
> @@ -334,8 +334,13 @@ nfsserver_systemd_monitor()
>         fi
>
>         ocf_log debug "Status: rpc-statd"
> -       rpcinfo -t localhost 100024 > /dev/null 2>&1
> -       rc=$?
> +       for i in `seq 1 3`; do
> +               rpcinfo -t localhost 100024 >/dev/null 2>&1
> +               rc=$?
> +               if [ $rc -eq 0 ]; then
> +                       break
> +               fi
> +       done
>         if [ "$rc" -ne "0" ]; then
>                 ocf_exit_reason "rpc-statd is not running"
>                 return $OCF_NOT_RUNNING
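>
> (One caveat: when rpc.statd is merely blocked -- the scenario here --
> each failed attempt waits out the full ~10 second RPC ping timeout, so
> three tries can hold the monitor for up to ~30 seconds.  Make sure the
> monitor operation's timeout in pacemaker is comfortably above that, or
> add a short sleep between tries.)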
> >
> > Regards,
> >
> >
> > Indivar Nair
> >
> > On Thu, Jul 11, 2019 at 10:19 PM Scott Mayhew <smayhew@redhat.com> wrote:
> > >
> > > On Thu, 11 Jul 2019, Indivar Nair wrote:
> > >
> > > > Hi ...,
> > > >
> > > > I have a 2 node Pacemaker cluster built using CentOS 7.6.1810
> > > > It serves files using NFS and Samba.
> > > >
> > > > Every 15-20 minutes, the rpc.statd service fails, and the whole NFS
> > > > service is restarted.
> > > > After investigating, we found that the service fails after a few
> > > > rounds of monitoring by Pacemaker.
> > > > Pacemaker's script runs the following commands to check whether all
> > > > the services are running -
> > > > ---------------------------------------------------------------------------------------------------------------------------------------
> > > >     rpcinfo > /dev/null 2>&1
> > > >     rpcinfo -t localhost 100005 > /dev/null 2>&1
> > > >     nfs_exec status nfs-idmapd > $fn 2>&1
> > > >     rpcinfo -t localhost 100024 > /dev/null 2>&1
> > >
> > > I would check to make sure your DNS setup is working properly.
> > > rpc.statd uses the canonical hostnames for comparison purposes whenever
> > > it gets an SM_MON or SM_UNMON request from lockd and when it gets an
> > > SM_NOTIFY from a rebooted NFS client.  That involves calls to
> > > getaddrinfo() and getnameinfo() which in turn could result in requests
> > > to a DNS server.  rpc.statd is single-threaded, so if it's blocked
> > > waiting for one of those requests, then it's unable to respond to the
> > > RPC ping (which has a timeout of 10 seconds) generated by the rpcinfo
> > > program.
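> > >
> > > An easy way to confirm that's what's happening is to time the same
> > > probe during one of the failures -- if rpc.statd is blocked rather
> > > than dead, the command hangs for the full timeout instead of
> > > failing fast:
> > >
> > >     time rpcinfo -t localhost 100024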
> > >
> > > I ran into a similar scenario in the past where a client was launching
> > > multiple instances of rpc.statd.  When the client does a v3 mount it
> > > does a similar RPC ping (with a more aggressive timeout) to see if
> > > rpc.statd is running... if not then it calls out to
> > > /usr/sbin/start-statd (which in the past simply called 'exec rpc.statd
> > > --no-notify' but now has additional checks).  Likewise rpc.statd does
> > > its own RPC ping to make sure there's not one already running.  It
> > > turned out that the user had a flaky DNS server and requests were taking
> > > over 30 seconds to time out, thus thwarting all those additional checks,
> > > and they wound up with multiple copies of rpc.statd running.
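> > >
> > > You can also measure the DNS side directly; timing a reverse lookup
> > > for one of your client addresses (the address below is an example)
> > > goes through the same NSS path rpc.statd's lookups use:
> > >
> > >     time getent hosts 192.168.10.21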
> > >
> > > You could be running into a similar scenario here and pacemaker could be
> > > deciding that rpc.statd's not running when it's actually fine.
> > >
> > > -Scott
> > >
> > > > ---------------------------------------------------------------------------------------------------------------------------------------
> > > > The script is scheduled to check every 20 seconds.
> > > >
> > > > This is the message we get in the logs -
> > > > -------------------------------------------------------------------------------------------------------------------------------------
> > > > Jul 09 07:33:56 virat-nd01 rpc.mountd[51641]: check_default: access by 127.0.0.1 ALLOWED
> > > > Jul 09 07:33:56 virat-nd01 rpc.mountd[51641]: Received NULL request from 127.0.0.1
> > > > Jul 09 07:33:56 virat-nd01 rpc.mountd[51641]: check_default: access by 127.0.0.1 ALLOWED (cached)
> > > > Jul 09 07:33:56 virat-nd01 rpc.mountd[51641]: Received NULL request from 127.0.0.1
> > > > Jul 09 07:33:56 virat-nd01 rpc.mountd[51641]: check_default: access by 127.0.0.1 ALLOWED (cached)
> > > > Jul 09 07:33:56 virat-nd01 rpc.mountd[51641]: Received NULL request from 127.0.0.1
> > > > -------------------------------------------------------------------------------------------------------------------------------------
> > > >
> > > > After 10 seconds, we get this message -
> > > > -------------------------------------------------------------------------------------------------------------------------------------
> > > > Jul 09 07:34:09 virat-nd01 nfsserver(virat-nfs-daemon)[54087]: ERROR: rpc-statd is not running
> > > > -------------------------------------------------------------------------------------------------------------------------------------
> > > > Once we get this error, the NFS service is automatically restarted.
> > > >
> > > > "ERROR: rpc-statd is not running" message is from the pacemaker's
> > > > monitoring script.
> > > > I have pasted that part of the script below.
> > > >
> > > > I disabled monitoring and everything is working fine, since then.
> > > >
> > > > I cant keep the cluster monitoring disabled forever.
> > > >
> > > > Kindly help.
> > > >
> > > > Regards,
> > > >
> > > >
> > > > Indivar Nair
> > > >
> > > > Part of the pacemaker script that does the monitoring
> > > > (/usr/lib/ocf/resource.d/heartbeat/nfsserver)
> > > > =======================================================================
> > > > nfsserver_systemd_monitor()
> > > > {
> > > >     local threads_num
> > > >     local rc
> > > >     local fn
> > > >
> > > >     ocf_log debug "Status: rpcbind"
> > > >     rpcinfo > /dev/null 2>&1
> > > >     rc=$?
> > > >     if [ "$rc" -ne "0" ]; then
> > > >         ocf_exit_reason "rpcbind is not running"
> > > >         return $OCF_NOT_RUNNING
> > > >     fi
> > > >
> > > >     ocf_log debug "Status: nfs-mountd"
> > > >     rpcinfo -t localhost 100005 > /dev/null 2>&1
> > > >     rc=$?
> > > >     if [ "$rc" -ne "0" ]; then
> > > >         ocf_exit_reason "nfs-mountd is not running"
> > > >         return $OCF_NOT_RUNNING
> > > >     fi
> > > >
> > > >     ocf_log debug "Status: nfs-idmapd"
> > > >     fn=`mktemp`
> > > >     nfs_exec status nfs-idmapd > $fn 2>&1
> > > >     rc=$?
> > > >     ocf_log debug "$(cat $fn)"
> > > >     rm -f $fn
> > > >     if [ "$rc" -ne "0" ]; then
> > > >         ocf_exit_reason "nfs-idmapd is not running"
> > > >         return $OCF_NOT_RUNNING
> > > >     fi
> > > >
> > > >     ocf_log debug "Status: rpc-statd"
> > > >     rpcinfo -t localhost 100024 > /dev/null 2>&1
> > > >     rc=$?
> > > >     if [ "$rc" -ne "0" ]; then
> > > >         ocf_exit_reason "rpc-statd is not running"
> > > >         return $OCF_NOT_RUNNING
> > > >     fi
> > > >
> > > >     nfs_exec is-active nfs-server
> > > >     rc=$?
> > > >
> > > >     # Note: systemctl is-active can't detect the failure of a kernel
> > > >     # process like nfsd.  So if it returns 0, also check the thread
> > > >     # count to make sure the process is really running.
> > > >     # /proc/fs/nfsd/threads has the number of nfsd threads.
> > > >     if [ $rc -eq 0 ]; then
> > > >         threads_num=`cat /proc/fs/nfsd/threads 2>/dev/null`
> > > >         if [ $? -eq 0 ]; then
> > > >             if [ $threads_num -gt 0 ]; then
> > > >                 return $OCF_SUCCESS
> > > >             else
> > > >                 return 3
> > > >             fi
> > > >         else
> > > >             return $OCF_ERR_GENERIC
> > > >         fi
> > > >     fi
> > > >
> > > >     return $rc
> > > > }
> > > > =======================================================================
