From: "Kraus, Sebastian" <sebastian.kraus@tu-berlin.de>
To: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Subject: Re: RPC Pipefs: Frequent parsing errors in client database
Date: Sat, 20 Jun 2020 21:08:55 +0000
Message-ID: <5c45562c90404838944ee71a1d926c74@tu-berlin.de>
In-Reply-To: <20200620170316.GH1514@fieldses.org>

Hi Bruce,

> But I think it'd be more useful to stay focused on the segfaults.

Is it worthwhile to analyze core dumps, or are there better debugging techniques for RPC daemons?
I am now running further tests while experimenting with the timeout parameters "-T" and "-t" on the rpc.gssd command line.
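What I have in mind is roughly the following (assuming systemd-coredump and debug symbols for the nfs-utils binaries are installed; adjust for your distribution):

	# allow core dumps, then run the daemon in the foreground
	# with maximum verbosity
	ulimit -c unlimited
	rpc.gssd -f -vvv

	# or, on a systemd machine, inspect the most recent crash:
	coredumpctl list rpc.gssd
	coredumpctl gdb rpc.gssd	# then: bt full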

There are several things I do not really understand about the trace shown below:

1) Why does rpc.gssd try to parse the info file even though it has just determined that the file is absent?
2) Why are two identifiers, clnt36e and clnt36f, used for the same client?
3) What does the <?> in "inotify event for clntdir (nfsd4_cb/clnt36e) - ev->wd (600) ev->name (<?>) ev->mask (0x00008000)" mean? (My own guess follows below.)
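
For reference, here is how the three masks decode against the flag values from <sys/inotify.h> (a throwaway helper I wrote for this, not gssd code):

	#include <stdio.h>
	#include <sys/inotify.h>

	static void decode(unsigned int mask)
	{
		printf("0x%08x =", mask);
		if (mask & IN_CREATE)  printf(" IN_CREATE");
		if (mask & IN_DELETE)  printf(" IN_DELETE");
		if (mask & IN_IGNORED) printf(" IN_IGNORED");
		if (mask & IN_ISDIR)   printf(" IN_ISDIR");
		printf("\n");
	}

	int main(void)
	{
		decode(0x40000100);	/* topdir event:  IN_CREATE|IN_ISDIR */
		decode(0x00000200);	/* clntdir event: IN_DELETE          */
		decode(0x00008000);	/* clntdir event: IN_IGNORED         */
		return 0;
	}

So the last event is IN_IGNORED, which the kernel delivers when a watch itself goes away (here: the clnt36e directory was removed). If I read inotify(7) correctly, such events carry no name, so the <?> is presumably just gssd's placeholder for an empty ev->name.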


Jun 18 18:55:07 server  rpcbind[30464]: connect from 130.149.XXX.YYY to null()
Jun 18 18:55:07 server  rpcbind[30465]: connect from 130.149.XXX.YYY to getport/addr(nfs)
Jun 18 18:55:07 server  rpcbind[30466]: connect from 130.149.XXX.YYY to null()
Jun 18 18:55:07 server  rpcbind[30467]: connect from 130.149.XXX.YYY to getport/addr(nfs)
Jun 18 18:55:07 server  rpcbind[30468]: connect from 130.149.XXX.YYY to null()
Jun 18 18:55:07 server  rpcbind[30469]: connect from 130.149.XXX.YYY to getport/addr(nfs)
Jun 18 18:55:07 server  rpcbind[30470]: connect from 130.149.XXX.YYY to null()
Jun 18 18:55:07 server  rpcbind[30471]: connect from 130.149.XXX.YYY to getport/addr(nfs)
Jun 18 18:55:07 server  rpc.svcgssd[20418]: leaving poll
Jun 18 18:55:07 server  rpc.svcgssd[20418]: handling null request
Jun 18 18:55:07 server  rpc.svcgssd[20418]: svcgssd_limit_krb5_enctypes: Calling gss_set_allowable_enctypes with 7 enctypes from the kernel
Jun 18 18:55:07 server  rpc.svcgssd[20418]: sname = host/client.domain.tu-berlin.de@TU-BERLIN.DE
Jun 18 18:55:07 server  rpc.svcgssd[20418]: nfs4_gss_princ_to_ids: calling nsswitch->princ_to_ids
Jun 18 18:55:07 server  rpc.svcgssd[20418]: nss_getpwnam: name 'host/client.domain.tu-berlin.de@TU-BERLIN.DE' domain '(null)': resulting localname 'host/test0815.chem.tu-berlin.de'
Jun 18 18:55:07 server  rpc.svcgssd[20418]: nss_ldap: reconnecting to LDAP server...
Jun 18 18:55:07 server  rpc.svcgssd[20418]: nss_ldap: reconnected to LDAP server  ldaps://ldap-server.tu-berlin.de after 1 attempt
Jun 18 18:55:07 server  rpc.svcgssd[20418]: nfs4_gss_princ_to_ids: nsswitch->princ_to_ids returned -2
Jun 18 18:55:07 server  rpc.svcgssd[20418]: nfs4_gss_princ_to_ids: final return value is -2
Jun 18 18:55:07 server  rpc.svcgssd[20418]: DEBUG: serialize_krb5_ctx: lucid version!
Jun 18 18:55:07 server  rpc.svcgssd[20418]: prepare_krb5_rfc4121_buffer: protocol 1
Jun 18 18:55:07 server  rpc.svcgssd[20418]: prepare_krb5_rfc4121_buffer: serializing key with enctype 18 and size 32
Jun 18 18:55:07 server  rpc.svcgssd[20418]: doing downcall
Jun 18 18:55:07 server  rpc.svcgssd[20418]: mech: krb5, hndl len: 4, ctx len 52, timeout: 1592510348 (11041 from now), clnt: host@client.domain.tu-berlin.de, uid: -1, gid: -1, num aux grps: 0:
Jun 18 18:55:07 server  rpc.svcgssd[20418]: sending null reply
Jun 18 18:55:07 server  rpc.svcgssd[20418]: writing message: \x \x608202cf06092a864886f71201020201006e8202be308202baa003020105a10302010ea20703050020000000a3820194618201903082018ca003020105a10e1b0c54552d4245524c494e2e4445a2273025a003020103a11e301c1b036e66731b15616c6c2e6368656d2e74752d6265726c696e2e6465a382014a30820146a003
Jun 18 18:55:07 server  rpc.svcgssd[20418]: finished handling null request
Jun 18 18:55:07 server  rpc.svcgssd[20418]: entering poll

Jun 18 18:55:07 server  rpc.gssd[23620]: inotify event for topdir (nfsd4_cb) - ev->wd (5) ev->name (clnt36e) ev->mask (0x40000100)
Jun 18 18:55:07 server  rpc.gssd[23620]: ERROR: can't open nfsd4_cb/clnt36e/info: No such file or directory
Jun 18 18:55:07 server  rpc.gssd[23620]: ERROR: failed to parse nfsd4_cb/clnt36e/info

Jun 18 18:55:07 server  rpc.gssd[23620]: inotify event for clntdir (nfsd4_cb/clnt36e) - ev->wd (600) ev->name (krb5) ev->mask (0x00000200)
Jun 18 18:55:07 server  rpc.gssd[23620]: inotify event for clntdir (nfsd4_cb/clnt36e) - ev->wd (600) ev->name (gssd) ev->mask (0x00000200)
Jun 18 18:55:07 server  rpc.gssd[23620]: inotify event for clntdir (nfsd4_cb/clnt36e) - ev->wd (600) ev->name (info) ev->mask (0x00000200)
Jun 18 18:55:07 server  rpc.gssd[23620]: inotify event for topdir (nfsd4_cb) - ev->wd (5) ev->name (clnt36f) ev->mask (0x40000100)
Jun 18 18:55:07 server  rpc.gssd[23620]: inotify event for clntdir (nfsd4_cb/clnt36e) - ev->wd (600) ev->name (<?>) ev->mask (0x00008000)
Jun 18 18:55:07 server  rpc.svcgssd[20418]: leaving poll
Jun 18 18:55:07 server  rpc.svcgssd[20418]: handling null request
Jun 18 18:55:07 server  rpc.gssd[23620]: handle_gssd_upcall: 'mech=krb5 uid=0 target=host@client.domain.tu-berlin.de service=nfs enctypes=18,17,16,23,3,1,2 ' (nfsd4_cb/clnt36f)
Jun 18 18:55:07 server  rpc.gssd[23620]: krb5_use_machine_creds: uid 0 tgtname host@client.domain.tu-berlin.de
Jun 18 18:55:07 server  rpc.svcgssd[20418]: svcgssd_limit_krb5_enctypes: Calling gss_set_allowable_enctypes with 7 enctypes from the kernel
Jun 18 18:55:07 server  rpc.gssd[23620]: Full hostname for 'client.domain.tu-berlin.de' is 'client.domain.tu-berlin.de'
Jun 18 18:55:07 server  rpc.gssd[23620]: Full hostname for 'server.domain.tu-berlin.de' is 'server.domain.tu-berlin.de'
Jun 18 18:55:07 server  rpc.gssd[23620]: Success getting keytab entry for 'nfs/server.domain.tu-berlin.de@TU-BERLIN.DE'
Jun 18 18:55:07 server  rpc.gssd[23620]: INFO: Credentials in CC 'FILE:/tmp/krb5ccmachine_TU-BERLIN.DE' are good until 1592526348
Jun 18 18:55:07 server  rpc.gssd[23620]: INFO: Credentials in CC 'FILE:/tmp/krb5ccmachine_TU-BERLIN.DE' are good until 1592526348
Jun 18 18:55:07 server  rpc.gssd[23620]: creating tcp client for server  client.domain.tu-berlin.de
Jun 18 18:55:07 server  rpc.gssd[23620]: DEBUG: port already set to 39495
Jun 18 18:55:07 server  rpc.gssd[23620]: creating context with server  host@client.domain.tu-berlin.de
Jun 18 18:55:07 server  rpc.gssd[23620]: in authgss_create_default()
Jun 18 18:55:07 server  rpc.gssd[23620]: in authgss_create()
Jun 18 18:55:07 server  rpc.gssd[23620]: authgss_create: name is 0x7fa5ac011970
Jun 18 18:55:07 server  rpc.gssd[23620]: authgss_create: gd->name is 0x7fa5ac005770
Jun 18 18:55:07 server  rpc.gssd[23620]: in authgss_refresh()
Jun 18 18:55:07 server  rpc.svcgssd[20418]: sname = host/client.domain.tu-berlin.de@TU-BERLIN.DE
Jun 18 18:55:07 server  rpc.svcgssd[20418]: nfs4_gss_princ_to_ids: calling nsswitch->princ_to_ids
Jun 18 18:55:07 server  rpc.svcgssd[20418]: nss_getpwnam: name 'host/client.domain.tu-berlin.de@TU-BERLIN.DE' domain '(null)': resulting localname 'host/client.domain.tu-berlin.de'
Jun 18 18:55:07 server  rpc.svcgssd[20418]: nfs4_gss_princ_to_ids: nsswitch->princ_to_ids returned -2
Jun 18 18:55:07 server  rpc.svcgssd[20418]: nfs4_gss_princ_to_ids: final return value is -2
Jun 18 18:55:07 server  rpc.svcgssd[20418]: DEBUG: serialize_krb5_ctx: lucid version!
Jun 18 18:55:07 server  rpc.svcgssd[20418]: prepare_krb5_rfc4121_buffer: protocol 1
Jun 18 18:55:07 server  rpc.svcgssd[20418]: prepare_krb5_rfc4121_buffer: serializing key with enctype 18 and size 32
Jun 18 18:55:07 server  rpc.svcgssd[20418]: doing downcall
Jun 18 18:55:07 server  rpc.svcgssd[20418]: mech: krb5, hndl len: 4, ctx len 52, timeout: 1592510348 (11041 from now), clnt: host@client.domain.tu-berlin.de, uid: -1, gid: -1, num aux grps: 0:
Jun 18 18:55:07 server  rpc.svcgssd[20418]: sending null reply
Jun 18 18:55:07 server  rpc.svcgssd[20418]: writing message: \x \x608202cf06092a864886f71201020201006e8202be308202baa003020105a10302010ea20703050020000000a3820194618201903082018ca003020105a10e1b0c54552d4245524c494e2e4445a2273025a003020103a11e301c1b036e66731b15616c6c2e6368656d2e74752d6265726c696e2e6465a382014a30820146a003
Jun 18 18:55:07 server  rpc.svcgssd[20418]: finished handling null request
Jun 18 18:55:07 server  rpc.svcgssd[20418]: entering poll



Best and keep well
Sebastian


Sebastian Kraus
Team IT am Institut für Chemie
Gebäude C, Straße des 17. Juni 115, Raum C7

Technische Universität Berlin
Fakultät II
Institut für Chemie
Sekretariat C3
Straße des 17. Juni 135
10623 Berlin

________________________________________
From: J. Bruce Fields <bfields@fieldses.org>
Sent: Saturday, June 20, 2020 19:03
To: Kraus, Sebastian
Cc: linux-nfs@vger.kernel.org
Subject: Re: RPC Pipefs: Frequent parsing errors in client database

On Sat, Jun 20, 2020 at 11:35:56AM +0000, Kraus, Sebastian wrote:
> In consequence, about a week ago, I decided to investigate the problem
> in a deep manner by stracing the rpc.gssd daemon while running.  Since
> then, the segmentation violations were gone, but now lots of
> complaints of the following type appear in the system log:
>
>  Jun 19 11:14:00 all rpc.gssd[23620]: ERROR: can't open nfsd4_cb/clnt3bb/info: No such file or directory
>  Jun 19 11:14:00 all rpc.gssd[23620]: ERROR: failed to parse nfsd4_cb/clnt3bb/info
>
>
> This behaviour seems somehow strange to me.  But one possible
> explanation could be: the execution speed of rpc.gssd slows down while
> being straced, and the "true" reason for the segmentation violations
> pops up.  I would argue that rpc.gssd trying to parse non-existent
> files points to insane and defective behaviour in the RPC GSS
> user-space daemon implementation.

Those files under rpc_pipefs come and go.  rpc.gssd monitors directories
under rpc_pipefs to learn about changes, and then tries to parse the
files under any new directories.

The thing is, if rpc.gssd is a little fast, I think it's possible that
it could get the notification about clnt3bb/ being created, and try to
look up "info", before "info" itself is actually created.

Or alternatively, if clnt3bb/ is short-lived, it might not look up
"info" until the directory's already been deleted again.

Neither problem should be fatal--rpc.gssd will get another update and
adjust to the new situation soon enough.

So it may be that the real problem here is an unconditional log message
in a case that's expected, not actually an error.
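
Something like this would demote the expected case (again just a
sketch, assuming gssd's printerr() verbosity-level helper; not a
tested patch):

	/* info_path: hypothetical variable holding e.g.
	 * "nfsd4_cb/clnt3bb/info" */
	fd = open(info_path, O_RDONLY);
	if (fd < 0) {
		if (errno == ENOENT)
			/* expected while the clntXXX directory is being
			 * set up or torn down; the next inotify event
			 * will trigger another attempt */
			printerr(3, "%s not there (yet), will retry\n",
				 info_path);
		else
			printerr(0, "ERROR: can't open %s: %s\n",
				 info_path, strerror(errno));
	}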

Or I could be wrong and maybe something else is happening.

But I think it'd be more useful to stay focused on the segfaults.

--b.

