linux-nfs.vger.kernel.org archive mirror
From: "Kraus, Sebastian" <sebastian.kraus@tu-berlin.de>
To: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Subject: Re: RPC Pipefs: Frequent parsing errors in client database
Date: Sat, 20 Jun 2020 11:35:56 +0000
Message-ID: <28a44712b25c4420909360bd813f8bfd@tu-berlin.de>
In-Reply-To: <20200619220434.GB1594@fieldses.org>

Hi Bruce,
OK, here is the somewhat longer story 8-O:

I maintain several virtualized NFS server instances running on the VMware ESXi hypervisor. The operating system is Debian Stretch/Buster.
Most of the time, the NFS servers are nearly idle, and there is only moderate CPU load during rush hours. So the servers are far from being overloaded.

Anyway, more than a year ago, the rpc.gssd daemon started becoming unstable in production use.
The daemon triggers segmentation violations several times a day, at irregular intervals. Unfortunately, without any obvious reason. :-(

The observed violations look like this:
Jun 11 21:52:08 all kernel: rpc.gssd[12043]: segfault at 0 ip 000056065d50e38e sp 00007fde27ffe880 error 4 in rpc.gssd[56065d50b000+9000]
or like this:
Mar 17 10:32:10 all kernel: rpc.gssd[25793]: segfault at ffffffffffffffc0 ip 00007ffa61f246e4 sp 00007ffa6145f0f8 error 5 in libc-2.24.so[7ffa61ea4000+195000]
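
For readers who want to take such lines apart, here is a minimal Python sketch that extracts the fields and decodes the low three bits of the x86 page-fault error code (bit 0: page was present, bit 1: write access, bit 2: fault in user mode). The function name decode_segfault and the returned field names are made up for illustration. Note that the computed offset is relative to the start of the mapping printed in the log, which may or may not coincide with the load base of the object.

```python
import re

# Pattern for x86 kernel segfault lines in the format quoted above.
# All numeric fields in these lines are hexadecimal.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Parse one segfault log line; return a dict, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m["err"], 16)
    return {
        "object": m["obj"],                       # mapping containing the faulting ip
        "fault_addr": int(m["addr"], 16),         # address that was accessed
        "offset": int(m["ip"], 16) - int(m["base"], 16),  # ip relative to mapping start
        "page_present": bool(err & 1),            # 0 = page not mapped, 1 = protection fault
        "write": bool(err & 2),                   # 0 = read, 1 = write
        "user_mode": bool(err & 4),               # 1 = fault happened in user mode
    }


if __name__ == "__main__":
    sample = ("Jun 11 21:52:08 all kernel: rpc.gssd[12043]: segfault at 0 "
              "ip 000056065d50e38e sp 00007fde27ffe880 error 4 "
              "in rpc.gssd[56065d50b000+9000]")
    info = decode_segfault(sample)
    print(info["object"], hex(info["offset"]), info)
```

For the first line this yields a user-mode read of address 0 on a page that is not mapped, at offset 0x338e inside the rpc.gssd mapping; for the second, a user-mode read at offset 0x806e4 inside the libc-2.24.so mapping. With matching debug symbols installed, such an offset can be a starting point for a tool like addr2line.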

As a quick and dirty workaround, I enabled automatic restart of the rpc-gssd.service unit for "on-failure" conditions.


Several months ago, I decided to investigate the problem further by launching the rpc.svcgssd and rpc.gssd daemons with an increased debug level from their service units.
Sadly, this gave me no clue about the root cause of these strange segmentation violations.

Some of my colleagues urged me to migrate the server instances from Debian Stretch (current oldstable) to Debian Buster (current stable).
They argued that rpc.gssd's crashes might be rooted in NFS stack instabilities. So, about three weeks ago, I upgraded two of my server instances.
Unexpectedly, not only did the problem not disappear, but the frequency of the segmentation violations even increased slightly.

Debian Stretch ships with nfs-common v1.3.4-2.1 and Buster with nfs-common v1.3.4-2.5. So both are based on the same nfs-common point release.


As a consequence, about a week ago, I decided to investigate the problem in more depth by running the rpc.gssd daemon under strace.
Since then, the segmentation violations have been gone, but now lots of complaints of the following type appear in the system log:

 Jun 19 11:14:00 all rpc.gssd[23620]: ERROR: can't open nfsd4_cb/clnt3bb/info: No such file or directory
 Jun 19 11:14:00 all rpc.gssd[23620]: ERROR: failed to parse nfsd4_cb/clnt3bb/info


This behaviour seems somewhat strange to me.
But one possible explanation could be: rpc.gssd runs more slowly while being straced, and so the "true" reason for the segmentation violations surfaces.
I would argue that rpc.gssd trying to parse non-existent files points, in any case, to unsound and defective behaviour in the RPC GSS user space daemon implementation.
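
To illustrate the kind of situation meant here, the following is a minimal Python sketch, not rpc.gssd's actual C code. It assumes (this is a hypothesis, not something established in this thread) that a clntXX directory can disappear between the moment a scanner sees it and the moment it opens the info file inside it. The function names read_client_info and scan_clients are hypothetical. The point is only that "No such file or directory" on open can be handled as an expected outcome rather than as a parse failure.

```python
import os


def read_client_info(clnt_dir):
    """Return the text of <clnt_dir>/info, or None if it no longer exists."""
    try:
        with open(os.path.join(clnt_dir, "info")) as f:
            return f.read()
    except FileNotFoundError:
        # The entry vanished between discovery and open; nothing left to parse.
        return None


def scan_clients(topdir):
    """Collect info contents for every clnt* entry that still exists under topdir."""
    infos = {}
    try:
        names = os.listdir(topdir)
    except FileNotFoundError:
        # The whole directory is gone; report no clients.
        return infos
    for name in names:
        if not name.startswith("clnt"):
            continue
        info = read_client_info(os.path.join(topdir, name))
        if info is not None:
            infos[name] = info
    return infos


if __name__ == "__main__":
    # Hypothetical location; adjust to wherever rpc_pipefs is mounted.
    print(scan_clients("/run/rpc_pipefs/nfsd4_cb"))
```

Whether the messages above come from such a race or from a different defect cannot be decided from the log lines alone.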


Best regards, and have a nice weekend
Sebastian


Sebastian Kraus
Team IT am Institut für Chemie
Gebäude C, Straße des 17. Juni 115, Raum C7

Technische Universität Berlin
Fakultät II
Institut für Chemie
Sekretariat C3
Straße des 17. Juni 135
10623 Berlin

________________________________________
From: J. Bruce Fields <bfields@fieldses.org>
Sent: Saturday, June 20, 2020 00:04
To: Kraus, Sebastian
Cc: linux-nfs@vger.kernel.org
Subject: Re: RPC Pipefs: Frequent parsing errors in client database

On Fri, Jun 19, 2020 at 09:24:27PM +0000, Kraus, Sebastian wrote:
> Hi all,
> for several weeks now, I have been seeing errors like the following, on a regular basis, in the system log of one of my NFSv4 file servers:
>
> Jun 19 11:14:00 all rpc.gssd[23620]: ERROR: can't open nfsd4_cb/clnt3bb/info: No such file or directory
> Jun 19 11:14:00 all rpc.gssd[23620]: ERROR: failed to parse nfsd4_cb/clnt3bb/info

I'm not sure what exactly is happening.

Are the log messages the only problem you're seeing, or is there some
other problem?

--b.

>
> Looks like premature closing of client connections.
> The security flavor of the NFS export is set to krb5p (integrity+privacy).
>
> Does anyone have a hint on how to efficiently track down the problem?
>
>
> Best and thanks
> Sebastian
>
>
> Sebastian Kraus
> Team IT am Institut für Chemie
> Gebäude C, Straße des 17. Juni 115, Raum C7
>
> Technische Universität Berlin
> Fakultät II
> Institut für Chemie
> Sekretariat C3
> Straße des 17. Juni 135
> 10623 Berlin


Thread overview: 21+ messages
2020-06-19 21:24 RPC Pipefs: Frequent parsing errors in client database Kraus, Sebastian
2020-06-19 22:04 ` J. Bruce Fields
2020-06-20 11:35   ` Kraus, Sebastian [this message]
2020-06-20 17:03     ` J. Bruce Fields
2020-06-20 21:08       ` Kraus, Sebastian
2020-06-22 22:36         ` J. Bruce Fields
2020-06-25 17:43           ` Strange segmentation violations of rpc.gssd in Debian Buster Kraus, Sebastian
2020-06-25 20:14             ` J. Bruce Fields
2020-06-25 21:44             ` Doug Nazar
2020-06-26 12:31               ` Kraus, Sebastian
2020-06-26 17:23                 ` Doug Nazar
2020-06-26 19:46                   ` J. Bruce Fields
2020-06-26 20:15                     ` Doug Nazar
2020-06-26 21:02                       ` J. Bruce Fields
2020-06-26 21:30                         ` [PATCH v2] " Doug Nazar
2020-06-26 21:44                           ` J. Bruce Fields
2020-06-29  5:39                           ` Kraus, Sebastian
2020-06-29 14:09                             ` Doug Nazar
2020-07-01  7:39                               ` Kraus, Sebastian
2020-07-01  8:13                                 ` [PATCH v2] " Peter Eriksson
2020-07-01 18:45                                 ` [PATCH v2] " Doug Nazar
