From: "Kraus, Sebastian" <sebastian.kraus@tu-berlin.de>
To: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Subject: Re: RPC Pipefs: Frequent parsing errors in client database
Date: Sat, 20 Jun 2020 21:08:55 +0000 [thread overview]
Message-ID: <5c45562c90404838944ee71a1d926c74@tu-berlin.de> (raw)
In-Reply-To: <20200620170316.GH1514@fieldses.org>
Hi Bruce,
>> But I think it'd be more useful to stay focused on the segfaults.
would it be a good idea to analyze core dumps? Or are there better debugging techniques for RPC daemons?
I am now running further tests while varying the timeout parameters "-T" and "-t" on the rpc.gssd command line.
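In case it helps the discussion: on a systemd-based system like Buster, one way to get at the segfaults is via coredumpctl. A rough sketch (unit and package names are assumptions on my side, please double-check against your install):

```shell
# Allow the daemon to dump core: add "LimitCORE=infinity" under [Service]
# in a drop-in for the gssd unit, then restart it.
systemctl edit rpc-gssd.service
systemctl restart rpc-gssd.service

# After the next segfault, inspect the dump with symbols:
coredumpctl list rpc.gssd     # locate the matching crash entry
coredumpctl gdb rpc.gssd      # open gdb on the most recent core
# (gdb) bt full               # full backtrace of the crashing thread
```

For a useful backtrace you would also want the matching debug symbols for nfs-utils and its libraries (on Debian, from the dbgsym repositories).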
There are several things I do not really understand about the trace shown below:
1) Why does rpc.gssd try to parse the info file even though it already knows, from the failed open, that the file is absent?
2) Why are there two identifiers, clnt36e and clnt36f, in use for the same client?
3) What does the <?> in "inotify event for clntdir (nfsd4_cb/clnt36e) - ev->wd (600) ev->name (<?>) ev->mask (0x00008000)" mean?
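Decoding the masks myself with the standard constant values from <sys/inotify.h> (a small sketch, not rpc.gssd code), the three masks in the trace come out as IN_CREATE|IN_ISDIR, IN_DELETE and IN_IGNORED. Per inotify(7), IN_IGNORED means the watch itself was dropped and the event carries no name, which would explain the <?>:

```python
# Decode inotify mask bits as printed in the rpc.gssd trace.
# Constant values are the standard ones from <sys/inotify.h>.
IN_CREATE  = 0x00000100  # file/dir created in watched dir
IN_DELETE  = 0x00000200  # file/dir deleted from watched dir
IN_IGNORED = 0x00008000  # watch was removed; event carries no name
IN_ISDIR   = 0x40000000  # subject of the event is a directory

NAMES = {IN_CREATE: "IN_CREATE", IN_DELETE: "IN_DELETE",
         IN_IGNORED: "IN_IGNORED", IN_ISDIR: "IN_ISDIR"}

def decode(mask):
    """Return the symbolic names of the bits set in an event mask."""
    return [name for bit, name in NAMES.items() if mask & bit]

print(decode(0x40000100))  # clnt36e created -> ['IN_CREATE', 'IN_ISDIR']
print(decode(0x00000200))  # krb5/gssd/info deleted -> ['IN_DELETE']
print(decode(0x00008000))  # watch dropped -> ['IN_IGNORED'], no name, hence <?>
```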
Jun 18 18:55:07 server rpcbind[30464]: connect from 130.149.XXX.YYY to null()
Jun 18 18:55:07 server rpcbind[30465]: connect from 130.149.XXX.YYY to getport/addr(nfs)
Jun 18 18:55:07 server rpcbind[30466]: connect from 130.149.XXX.YYY to null()
Jun 18 18:55:07 server rpcbind[30467]: connect from 130.149.XXX.YYY to getport/addr(nfs)
Jun 18 18:55:07 server rpcbind[30468]: connect from 130.149.XXX.YYY to null()
Jun 18 18:55:07 server rpcbind[30469]: connect from 130.149.XXX.YYY to getport/addr(nfs)
Jun 18 18:55:07 server rpcbind[30470]: connect from 130.149.XXX.YYY to null()
Jun 18 18:55:07 server rpcbind[30471]: connect from 130.149.XXX.YYY to getport/addr(nfs)
Jun 18 18:55:07 server rpc.svcgssd[20418]: leaving poll
Jun 18 18:55:07 server rpc.svcgssd[20418]: handling null request
Jun 18 18:55:07 server rpc.svcgssd[20418]: svcgssd_limit_krb5_enctypes: Calling gss_set_allowable_enctypes with 7 enctypes from the kernel
Jun 18 18:55:07 server rpc.svcgssd[20418]: sname = host/client.domain.tu-berlin.de@TU-BERLIN.DE
Jun 18 18:55:07 server rpc.svcgssd[20418]: nfs4_gss_princ_to_ids: calling nsswitch->princ_to_ids
Jun 18 18:55:07 server rpc.svcgssd[20418]: nss_getpwnam: name 'host/client.domain.tu-berlin.de@TU-BERLIN.DE' domain '(null)': resulting localname 'host/test0815.chem.tu-berlin.de'
Jun 18 18:55:07 server rpc.svcgssd[20418]: nss_ldap: reconnecting to LDAP server...
Jun 18 18:55:07 server rpc.svcgssd[20418]: nss_ldap: reconnected to LDAP server ldaps://ldap-server.tu-berlin.de after 1 attempt
Jun 18 18:55:07 server rpc.svcgssd[20418]: nfs4_gss_princ_to_ids: nsswitch->princ_to_ids returned -2
Jun 18 18:55:07 server rpc.svcgssd[20418]: nfs4_gss_princ_to_ids: final return value is -2
Jun 18 18:55:07 server rpc.svcgssd[20418]: DEBUG: serialize_krb5_ctx: lucid version!
Jun 18 18:55:07 server rpc.svcgssd[20418]: prepare_krb5_rfc4121_buffer: protocol 1
Jun 18 18:55:07 server rpc.svcgssd[20418]: prepare_krb5_rfc4121_buffer: serializing key with enctype 18 and size 32
Jun 18 18:55:07 server rpc.svcgssd[20418]: doing downcall
Jun 18 18:55:07 server rpc.svcgssd[20418]: mech: krb5, hndl len: 4, ctx len 52, timeout: 1592510348 (11041 from now), clnt: host@client.domain.tu-berlin.de, uid: -1, gid: -1, num aux grps: 0:
Jun 18 18:55:07 server rpc.svcgssd[20418]: sending null reply
Jun 18 18:55:07 server rpc.svcgssd[20418]: writing message: \x \x608202cf06092a864886f71201020201006e8202be308202baa003020105a10302010ea20703050020000000a3820194618201903082018ca003020105a10e1b0c54552d4245524c494e2e4445a2273025a003020103a11e301c1b036e66731b15616c6c2e6368656d2e74752d6265726c696e2e6465a382014a30820146a003
Jun 18 18:55:07 server rpc.svcgssd[20418]: finished handling null request
Jun 18 18:55:07 server rpc.svcgssd[20418]: entering poll
Jun 18 18:55:07 server rpc.gssd[23620]: inotify event for topdir (nfsd4_cb) - ev->wd (5) ev->name (clnt36e) ev->mask (0x40000100)
Jun 18 18:55:07 server rpc.gssd[23620]: ERROR: can't open nfsd4_cb/clnt36e/info: No such file or directory
Jun 18 18:55:07 server rpc.gssd[23620]: ERROR: failed to parse nfsd4_cb/clnt36e/info
Jun 18 18:55:07 server rpc.gssd[23620]: inotify event for clntdir (nfsd4_cb/clnt36e) - ev->wd (600) ev->name (krb5) ev->mask (0x00000200)
Jun 18 18:55:07 server rpc.gssd[23620]: inotify event for clntdir (nfsd4_cb/clnt36e) - ev->wd (600) ev->name (gssd) ev->mask (0x00000200)
Jun 18 18:55:07 server rpc.gssd[23620]: inotify event for clntdir (nfsd4_cb/clnt36e) - ev->wd (600) ev->name (info) ev->mask (0x00000200)
Jun 18 18:55:07 server rpc.gssd[23620]: inotify event for topdir (nfsd4_cb) - ev->wd (5) ev->name (clnt36f) ev->mask (0x40000100)
Jun 18 18:55:07 server rpc.gssd[23620]: inotify event for clntdir (nfsd4_cb/clnt36e) - ev->wd (600) ev->name (<?>) ev->mask (0x00008000)
Jun 18 18:55:07 server rpc.svcgssd[20418]: leaving poll
Jun 18 18:55:07 server rpc.svcgssd[20418]: handling null request
Jun 18 18:55:07 server rpc.gssd[23620]: handle_gssd_upcall: 'mech=krb5 uid=0 target=host@client.domain.tu-berlin.de service=nfs enctypes=18,17,16,23,3,1,2 ' (nfsd4_cb/clnt36f)
Jun 18 18:55:07 server rpc.gssd[23620]: krb5_use_machine_creds: uid 0 tgtname host@client.domain.tu-berlin.de
Jun 18 18:55:07 server rpc.svcgssd[20418]: svcgssd_limit_krb5_enctypes: Calling gss_set_allowable_enctypes with 7 enctypes from the kernel
Jun 18 18:55:07 server rpc.gssd[23620]: Full hostname for 'client.domain.tu-berlin.de' is 'client.domain.tu-berlin.de'
Jun 18 18:55:07 server rpc.gssd[23620]: Full hostname for 'server.domain.tu-berlin.de' is 'server.domain.tu-berlin.de'
Jun 18 18:55:07 server rpc.gssd[23620]: Success getting keytab entry for 'nfs/server.domain.tu-berlin.de@TU-BERLIN.DE'
Jun 18 18:55:07 server rpc.gssd[23620]: INFO: Credentials in CC 'FILE:/tmp/krb5ccmachine_TU-BERLIN.DE' are good until 1592526348
Jun 18 18:55:07 server rpc.gssd[23620]: INFO: Credentials in CC 'FILE:/tmp/krb5ccmachine_TU-BERLIN.DE' are good until 1592526348
Jun 18 18:55:07 server rpc.gssd[23620]: creating tcp client for server client.domain.tu-berlin.de
Jun 18 18:55:07 server rpc.gssd[23620]: DEBUG: port already set to 39495
Jun 18 18:55:07 server rpc.gssd[23620]: creating context with server host@client.domain.tu-berlin.de
Jun 18 18:55:07 server rpc.gssd[23620]: in authgss_create_default()
Jun 18 18:55:07 server rpc.gssd[23620]: in authgss_create()
Jun 18 18:55:07 server rpc.gssd[23620]: authgss_create: name is 0x7fa5ac011970
Jun 18 18:55:07 server rpc.gssd[23620]: authgss_create: gd->name is 0x7fa5ac005770
Jun 18 18:55:07 server rpc.gssd[23620]: in authgss_refresh()
Jun 18 18:55:07 server rpc.svcgssd[20418]: sname = host/client.domain.tu-berlin.de@TU-BERLIN.DE
Jun 18 18:55:07 server rpc.svcgssd[20418]: nfs4_gss_princ_to_ids: calling nsswitch->princ_to_ids
Jun 18 18:55:07 server rpc.svcgssd[20418]: nss_getpwnam: name 'host/client.domain.tu-berlin.de@TU-BERLIN.DE' domain '(null)': resulting localname 'host/client.domain.tu-berlin.de'
Jun 18 18:55:07 server rpc.svcgssd[20418]: nfs4_gss_princ_to_ids: nsswitch->princ_to_ids returned -2
Jun 18 18:55:07 server rpc.svcgssd[20418]: nfs4_gss_princ_to_ids: final return value is -2
Jun 18 18:55:07 server rpc.svcgssd[20418]: DEBUG: serialize_krb5_ctx: lucid version!
Jun 18 18:55:07 server rpc.svcgssd[20418]: prepare_krb5_rfc4121_buffer: protocol 1
Jun 18 18:55:07 server rpc.svcgssd[20418]: prepare_krb5_rfc4121_buffer: serializing key with enctype 18 and size 32
Jun 18 18:55:07 server rpc.svcgssd[20418]: doing downcall
Jun 18 18:55:07 server rpc.svcgssd[20418]: mech: krb5, hndl len: 4, ctx len 52, timeout: 1592510348 (11041 from now), clnt: host@client.domain.tu-berlin.de, uid: -1, gid: -1, num aux grps: 0:
Jun 18 18:55:07 server rpc.svcgssd[20418]: sending null reply
Jun 18 18:55:07 server rpc.svcgssd[20418]: writing message: \x \x608202cf06092a864886f71201020201006e8202be308202baa003020105a10302010ea20703050020000000a3820194618201903082018ca003020105a10e1b0c54552d4245524c494e2e4445a2273025a003020103a11e301c1b036e66731b15616c6c2e6368656d2e74752d6265726c696e2e6465a382014a30820146a003
Jun 18 18:55:07 server rpc.svcgssd[20418]: finished handling null request
Jun 18 18:55:07 server rpc.svcgssd[20418]: entering poll
Best and keep well
Sebastian
Sebastian Kraus
Team IT am Institut für Chemie
Gebäude C, Straße des 17. Juni 115, Raum C7
Technische Universität Berlin
Fakultät II
Institut für Chemie
Sekretariat C3
Straße des 17. Juni 135
10623 Berlin
________________________________________
From: J. Bruce Fields <bfields@fieldses.org>
Sent: Saturday, June 20, 2020 19:03
To: Kraus, Sebastian
Cc: linux-nfs@vger.kernel.org
Subject: Re: RPC Pipefs: Frequent parsing errors in client database
On Sat, Jun 20, 2020 at 11:35:56AM +0000, Kraus, Sebastian wrote:
> In consequence, about a week ago, I decided to investigate the problem
> in a deep manner by stracing the rpc.gssd daemon while running. Since
> then, the segmentation violations were gone, but now lots of
> complaints of the following type appear in the system log:
>
> Jun 19 11:14:00 all rpc.gssd[23620]: ERROR: can't open nfsd4_cb/clnt3bb/info: No such file or directory
> Jun 19 11:14:00 all rpc.gssd[23620]: ERROR: failed to parse nfsd4_cb/clnt3bb/info
>
>
> This behaviour seems somehow strange to me. But, one possible
> explanation could be: The execution speed of rpc.gssd slows down while
> being straced and the "true" reason for the segmentation violations
> pops up. I would argue, rpc.gssd trying to parse non-existing files
> points anyway to an insane and defective behaviour of the RPC GSS user
> space daemon implementation.
Those files under rpc_pipefs come and go. rpc.gssd monitors directories
under rpc_pipefs to learn about changes, and then tries to parse the
files under any new directories.
The thing is, if rpc.gssd is a little fast, I think it's possible that
it could get the notification about clnt3bb/ being created, and try to
look up "info", before "info" itself is actually created.
Or alternatively, if clnt3bb/ is short-lived, it might not look up
"info" until the directory's already been deleted again.
Neither problem should be fatal--rpc.gssd will get another update and
adjust to the new situation soon enough.
So it may be that the real error here is an unconditional log message
in a case that's expected, not actually an error.
Or I could be wrong and maybe something else is happening.
But I think it'd be more useful to stay focused on the segfaults.
--b.
Thread overview: 21+ messages
2020-06-19 21:24 RPC Pipefs: Frequent parsing errors in client database Kraus, Sebastian
2020-06-19 22:04 ` J. Bruce Fields
2020-06-20 11:35 ` Kraus, Sebastian
2020-06-20 17:03 ` J. Bruce Fields
2020-06-20 21:08 ` Kraus, Sebastian [this message]
2020-06-22 22:36 ` J. Bruce Fields
2020-06-25 17:43 ` Strange segmentation violations of rpc.gssd in Debian Buster Kraus, Sebastian
2020-06-25 20:14 ` J. Bruce Fields
2020-06-25 21:44 ` Doug Nazar
2020-06-26 12:31 ` Kraus, Sebastian
2020-06-26 17:23 ` Doug Nazar
2020-06-26 19:46 ` J. Bruce Fields
2020-06-26 20:15 ` Doug Nazar
2020-06-26 21:02 ` J. Bruce Fields
2020-06-26 21:30 ` [PATCH v2] " Doug Nazar
2020-06-26 21:44 ` J. Bruce Fields
2020-06-29 5:39 ` Kraus, Sebastian
2020-06-29 14:09 ` Doug Nazar
2020-07-01 7:39 ` Kraus, Sebastian
2020-07-01 8:13 ` [PATCH v2] " Peter Eriksson
2020-07-01 18:45 ` [PATCH v2] " Doug Nazar