linux-nfs.vger.kernel.org archive mirror
From: Simo Sorce <simo@redhat.com>
To: Chuck Lever <chuck.lever@oracle.com>,
	Robbie Harwood <rharwood@redhat.com>
Cc: Jeff Layton <jlayton@redhat.com>,
	Bruce Fields <bfields@fieldses.org>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: Fedora 32 rpc.gssd misbehavior
Date: Thu, 30 Jul 2020 15:10:11 -0400
Message-ID: <ee4b7c47bc37a53afd751159ae39d01d7cd3ee34.camel@redhat.com>
In-Reply-To: <4EB4AE01-F6D4-4E8F-86BF-C8BB07E63517@oracle.com>

On Thu, 2020-07-30 at 13:59 -0400, Chuck Lever wrote:
> > On Jul 30, 2020, at 1:08 PM, Robbie Harwood <rharwood@redhat.com> wrote:
> > 
> > Simo Sorce <simo@redhat.com> writes:
> > 
> > > On Wed, 2020-07-29 at 14:27 -0400, Chuck Lever wrote:
> > > > > On Jul 29, 2020, at 1:19 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
> > > > > 
> > > > > Hi!
> > > > > 
> > > > > I recently updated my test systems from EL7 to Fedora 32, and
> > > > > NFSv4.0 with Kerberos has stopped working.
> > > > > 
> > > > > I mount with "klimt.ib" as before. The client workload stops
> > > > > dead when the server tries to perform its first CB_RECALL.
> > > > > 
> > > > > I added some client instrumentation:
> > > > > 
> > > > >  kernel: NFSv4: Callback principal (nfs@klimt.ib.1015granger.net) does not match acceptor (nfs@klimt.ib).
> > > > >  kernel: NFS: NFSv4 callback contains invalid cred
> > > > > 
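The two kernel messages above come down to a name comparison that fails. As a hypothetical sketch (plain Python, not the kernel's actual C code, and assuming an exact, case-sensitive match with no DNS lookup), the check can be modelled with the two names from the log:

```python
from typing import NamedTuple


class HostBasedName(NamedTuple):
    """A GSS host-based service name such as "nfs@klimt.ib"."""
    service: str
    host: str


def parse_host_based_name(name: str) -> HostBasedName:
    # Host-based service names take the form "service@host".
    service, sep, host = name.partition("@")
    if not sep or not service or not host:
        raise ValueError(f"not a host-based name: {name!r}")
    return HostBasedName(service, host)


def callback_principal_matches(callback_principal: str, acceptor: str) -> bool:
    # Hypothetical model: nothing is canonicalized or qualified here, so a
    # short and a fully-qualified spelling of the same host never match.
    return parse_host_based_name(callback_principal) == parse_host_based_name(acceptor)


if __name__ == "__main__":
    print(callback_principal_matches("nfs@klimt.ib.1015granger.net", "nfs@klimt.ib"))  # False
```

The parsing step only adds validation; the point is that "klimt.ib" and "klimt.ib.1015granger.net" are different strings.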
> > > > > I boosted gssd verbosity, and it says:
> > > > > 
> > > > >  rpc.gssd[986]: doing downcall: lifetime_rec=72226 acceptor=nfs@klimt.ib
> > > > > 
> > > > > But it knows the full hostname for the server:
> > > > > 
> > > > >  rpc.gssd[986]: Full hostname for 'klimt.ib' is 'klimt.ib.1015granger.net'
> > > > > 
> > > > > 
> > > > > The acceptor appears to come from the Kerberos library. Shouldn't
> > > > > it be canonicalized? If so, should the Kerberos library do it, or
> > > > > should gssd? Since this behavior appeared after an upgrade, I
> > > > > suspect a Kerberos library regression. But it could be config-
> > > > > related, since both systems were re-imaged from the ground up.
> > > > > 
> > > > > Also noticing some other problems on the server (missing hostname
> > > > > strings in debug messages, sssd_kcm infinite loops, and gssd
> > > > > sending garbage to the client after the NULL request that
> > > > > establishes the callback context).
> > > > > 
> > > > > But let's look at the client acceptor problem first.
> > > > 
> > > > I believe I found the problem.
> > > > 
> > > > 8bffe8c5ec1a ("gssd: add /etc/nfs.conf support") added a number of gssd
> > > > config options to /etc/nfs.conf, including "avoid-dns". The default
> > > > setting of avoid-dns is 1. When I set this option on my client system
> > > > explicitly to 0, NFSv4.0 with Kerberos works again.
> > > > 
> > > > Is there a reason the default setting is 1?
> > > > 
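Why avoid-dns matters here: rpc.gssd(8) describes the server name passed to GSSAPI as normally the name exactly as given in the mount request, with a reverse DNS lookup only when that name looks like an IP address or has no dots, or always when -D is given. A hypothetical sketch of that decision follows; it assumes "avoid-dns = 0" corresponds to -D, and the FAKE_DNS table is a made-up stand-in for real DNS:

```python
import ipaddress
from typing import Callable, Optional

# Hypothetical stand-in for DNS: maps a name to its canonical FQDN.
FAKE_DNS = {"klimt.ib": "klimt.ib.1015granger.net"}


def looks_like_ip(name: str) -> bool:
    try:
        ipaddress.ip_address(name)
    except ValueError:
        return False
    return True


def gssd_target_host(
    mount_host: str,
    avoid_dns: bool,
    lookup: Callable[[str], Optional[str]] = FAKE_DNS.get,
) -> str:
    """Pick the host part of the "nfs@host" target name (sketch only)."""
    needs_lookup = (
        not avoid_dns               # avoid-dns = 0: always ask DNS
        or looks_like_ip(mount_host)
        or "." not in mount_host    # bare short names are resolved too
    )
    if needs_lookup:
        # Keep the name as given if the lookup finds nothing.
        return lookup(mount_host) or mount_host
    # avoid-dns = 1 and the name contains a dot: trust it as-is.
    return mount_host


if __name__ == "__main__":
    print(gssd_target_host("klimt.ib", avoid_dns=True))   # klimt.ib
    print(gssd_target_host("klimt.ib", avoid_dns=False))  # klimt.ib.1015granger.net
```

Under these assumptions, "klimt.ib" contains a dot, so with the default avoid-dns = 1 it is used verbatim, which matches the unqualified acceptor in the downcall log.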
> > > 
> > > Now that you mention DNS, this may be an interaction between a new
> > > default in Fedora 32 and how DNS is set up in your environment.
> > > 
> > > In F32 we changed the option dns_canonicalize_hostname from 'true' to
> > > 'fallback'.
> > > This is a transitional state to eventually move it to 'false' at some
> > > point in the future.
> > > 
> > > What it changes in practice is that it will first try the name passed
> > > in *as is* and only as a fallback try a CNAME if the name passed is not
> > > resolved as an A name. If you have principals in the KDC for both
> > > names, but you do not have keys in the keytab for both, you can have
> > > transitional issues.
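To make the 'fallback' ordering concrete, here is a minimal, hypothetical model of the three dns_canonicalize_hostname modes. It assumes the client walks a list of candidate host names and uses the first service principal the KDC knows; `canonicalize` and the `kdc_principals` set are illustrative stand-ins, not MIT krb5 APIs, and the principal names are made up:

```python
from typing import Callable, List, Optional, Set


def candidate_hosts(host: str, mode: str,
                    canonicalize: Callable[[str], Optional[str]]) -> List[str]:
    """Host names to try, in order, for a given dns_canonicalize_hostname mode."""
    if mode == "false":
        return [host]
    canon = canonicalize(host) or host
    if mode == "true":
        return [canon]
    if mode == "fallback":
        # The name as given comes first; the DNS form is only a fallback.
        return [host] if canon == host else [host, canon]
    raise ValueError(f"unknown mode: {mode!r}")


def choose_principal(host: str, mode: str,
                     canonicalize: Callable[[str], Optional[str]],
                     kdc_principals: Set[str],
                     service: str = "nfs") -> Optional[str]:
    """Return the first candidate principal that the (hypothetical) KDC knows."""
    for candidate in candidate_hosts(host, mode, canonicalize):
        principal = f"{service}/{candidate}"
        if principal in kdc_principals:
            return principal
    return None


if __name__ == "__main__":
    dns = {"klimt.ib": "klimt.ib.1015granger.net"}.get
    kdc = {"nfs/klimt.ib", "nfs/klimt.ib.1015granger.net"}
    # With both names in the KDC, 'fallback' settles on the short one,
    # even if the server's keytab only holds keys for the long one.
    print(choose_principal("klimt.ib", "fallback", dns, kdc))  # nfs/klimt.ib
    print(choose_principal("klimt.ib", "true", dns, kdc))      # nfs/klimt.ib.1015granger.net
```

This is the "principals in the KDC for both names, keys in the keytab for only one" trap described above.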
> > > 
> > > Additionally, we discovered a bug that causes non-qualified names to
> > > fail resolution with the 'fallback' option.
> > > If the name in the principal really is not qualified, it will try to
> > > qualify it anyway: if your principal is literally nfs/foo@FOO,
> > > libgssapi may try to use nfs/foo.my.domain@FOO, where "my.domain" is
> > > the domain defined in the resolv.conf search path.
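A minimal sketch of the qualification rule, and of why it cannot help with a dotted name like "klimt.ib". It assumes, as described in this thread, that only dot-free names are qualified and that an empty qualify_shortname turns qualification off; `qualify_host` is a hypothetical helper, not a libkrb5 function:

```python
def qualify_host(host: str, qualify_shortname: str) -> str:
    """Append the configured domain to a short (dot-free) host name."""
    if not qualify_shortname:
        # qualify_shortname = "" disables qualification entirely.
        return host
    if "." in host:
        # Anything with a dot counts as already qualified, even "klimt.ib".
        return host
    return f"{host}.{qualify_shortname}"


if __name__ == "__main__":
    print(qualify_host("foo", "my.domain"))             # foo.my.domain
    print(qualify_host("foo", ""))                      # foo
    print(qualify_host("klimt.ib", "1015granger.net"))  # klimt.ib
```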
> > > 
> > > We are trying to address this regression.
> > > 
> > > So try to set dns_canonicalize_hostname to true to see if that may
> > > influence your issue. If so, please let me know, as we still need to
> > > address this where possible.
> > 
> > Also, please try setting `qualify_shortname = ""`.  (I did update the
> > config file we ship with Fedora, but upstream's default turns that on.
> > This is a temporary workaround while we merge something better
> > upstream.)
> 
> For completeness, I tried:
> 
> avoid-dns = 1
> dns_canonicalize_hostname = fallback
> qualify_shortname = ""
> 
> which is the default configuration out of the shrink wrap.
> 
> The workload hangs as before, and the acceptor is unqualified:
> 
> rpc.gssd[985]: doing downcall: lifetime_rec=84046 acceptor=nfs@klimt.ib
> 
> 
> The test is:
> 
> Configured domain name is "1015granger.net"
> 
> Fully-qualified client hostname is "manet.ib.1015granger.net"
> 
> Fully-qualified server hostname is "klimt.ib.1015granger.net"
> 
> mount command is "mount -o vers=4.0,sec=sys klimt.ib:/export /mnt"
> 
> In this case, both systems have keytabs and service principals, so
> the client automatically attempts to establish a GSS context for
> lease management and callback operations. The failure occurs because
> the server's principal is nfs@klimt.ib.1015granger.net but the
> acceptor now matches the server hostname from the mount command line,
> which is not always fully qualified.

OK, TBH I personally consider the syntax you are currently using to be
working by accident, and you should really use the FQDN on the command
line (I assume it works that way, right?). However, I understand this is
also technically a regression. That said, I do not think we can really
fix this case: your "shortname" is not short (it has a dot in it), so the
heuristics won't trigger to qualify it even when you set
qualify_shortname="".

I have the feeling we'll break this case, and our answer will have to
be "use the fqdn on the command line".

Simo.
 
-- 
Simo Sorce
RHEL Crypto Team
Red Hat, Inc






Thread overview: 15+ messages
2020-07-29 17:19 Fedora 32 rpc.gssd misbehavior Chuck Lever
2020-07-29 18:27 ` Chuck Lever
2020-07-30 14:43   ` Steve Dickson
2020-07-30 16:14   ` Simo Sorce
2020-07-30 17:08     ` Robbie Harwood
2020-07-30 17:59       ` Chuck Lever
2020-07-30 19:10         ` Simo Sorce [this message]
2020-07-30 19:39           ` Chuck Lever
2020-08-10 15:28             ` Chuck Lever
2020-07-30 17:09     ` Chuck Lever
2020-07-30 17:57       ` Simo Sorce
2020-07-30 18:07         ` Chuck Lever
2020-07-30 18:20           ` Simo Sorce
2020-07-30 18:29             ` Chuck Lever
2020-07-30 18:55               ` Simo Sorce
