* when rpc.mountd flushes auth.unix.gid
@ 2014-10-17 14:42 Colin Hudler
  2014-10-17 16:15 ` Tom Haynes
  2014-10-17 21:06 ` Jeff Layton
  0 siblings, 2 replies; 7+ messages in thread
From: Colin Hudler @ 2014-10-17 14:42 UTC (permalink / raw)
  To: linux-nfs

We have a few hundred computers mounting an NFS server in a typical 
LDAP-based users (nss) setup. We frequently add and remove exports and 
use exportfs -r to update etab. Every time we do so, the clients report 
"NFS server not responding" and start backing off their requests. After 
a painful 3-5 minutes, they recover and life is normal again.

We discovered that when the rpc.mountd cache flushing occurs, our NIS 
system is overwhelmed with grouplist requests and this obviously blocks 
things. We are working on that problem separately, and I admit this to 
be a weakness in our setup. My question is simple.

Why does it flush auth.unix.gid when the etab changed? I think it makes 
unnecessary work for rpc.mountd because the gids are unlikely to have 
changed, and they already have a reasonable expiration policy.

--
Colin


* Re: when rpc.mountd flushes auth.unix.gid
  2014-10-17 14:42 when rpc.mountd flushes auth.unix.gid Colin Hudler
@ 2014-10-17 16:15 ` Tom Haynes
  2014-10-17 18:19   ` Colin Hudler
  2014-10-17 21:06 ` Jeff Layton
  1 sibling, 1 reply; 7+ messages in thread
From: Tom Haynes @ 2014-10-17 16:15 UTC (permalink / raw)
  To: Colin Hudler; +Cc: linux-nfs


On Oct 17, 2014, at 9:42 AM, Colin Hudler <chudler@cs.uchicago.edu> wrote:

> We have a few hundred computers mounting an NFS server in a typical LDAP-based users (nss) setup. We frequently add and remove exports and use exportfs -r to update etab.

I know this isn’t your question, but you would be better served by using explicit exportfs -a and exportfs -r commands for the specific changes.



> Every time we do so, the clients report "NFS server not responding" and start backing off their requests. After a painful 3-5 minutes, they recover and life is normal again.
> 
> We discovered that when the rpc.mountd cache flushing occurs, our NIS system is overwhelmed with grouplist requests and this obviously blocks things. We are working on that problem separately, and I admit this to be a weakness in our setup. My question is simple.
> 
> Why does it flush auth.unix.gid when the etab changed? I think it makes unnecessary work for rpc.mountd because the gids are unlikely to have changed,

Another assumption is that exports rarely change. I expect your setup is an exception to the rule.


> and they already have a reasonable expiration policy.

One way to read what the man page states for exportfs -r:

       -r     Reexport all directories, synchronizing /var/lib/nfs/etab with /etc/exports and files under /etc/exports.d.
              This option removes entries in /var/lib/nfs/etab which have been deleted from /etc/exports or files under
              /etc/exports.d, and removes any entries from the kernel export table which are no longer valid.

is that it only removes entries which have been deleted.

Instead, it removes all entries and reexports those that are still valid. Removing everything is what blows away the auth.unix.gid cache.

Using exportfs -a <path> and exportfs -r <path> should solve this for you.
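
For reference, exportfs(8) documents per-export changes in the forms `exportfs -o options client:/path` (export one entry) and `exportfs -u client:/path` (unexport one entry). A minimal Python sketch that builds those command lines follows. The helper names, the client and the path are hypothetical, and nothing in this thread establishes whether per-export updates actually avoid the cache flush.

```python
import shlex
from typing import List


def export_cmd(client: str, path: str, options: str = "") -> List[str]:
    """Build the argv that exports a single client:/path pair."""
    argv = ["exportfs"]
    if options:
        argv += ["-o", options]
    argv.append(f"{client}:{path}")
    return argv


def unexport_cmd(client: str, path: str) -> List[str]:
    """Build the argv that unexports a single client:/path pair."""
    return ["exportfs", "-u", f"{client}:{path}"]


# Print the commands instead of running them; passing an argv list to
# subprocess.run(argv, check=True) as root would execute it.
# The client and path below are made-up examples.
print(shlex.join(export_cmd("client1.example.org", "/srv/proj", "rw,sync")))
print(shlex.join(unexport_cmd("client1.example.org", "/srv/old")))
```

Keeping the commands as argv lists avoids shell quoting problems with wildcard clients such as `*.example.org`.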




> 
> --
> Colin



* Re: when rpc.mountd flushes auth.unix.gid
  2014-10-17 16:15 ` Tom Haynes
@ 2014-10-17 18:19   ` Colin Hudler
  0 siblings, 0 replies; 7+ messages in thread
From: Colin Hudler @ 2014-10-17 18:19 UTC (permalink / raw)
  To: linux-nfs



On 10/17/2014 11:15 AM, Tom Haynes wrote:
>
> On Oct 17, 2014, at 9:42 AM, Colin Hudler <chudler@cs.uchicago.edu> wrote:
>
>> We have a few hundred computers mounting an NFS server in a typical LDAP-based users (nss) setup. We frequently add and remove exports and use exportfs -r to update etab.
>
> I know this isn’t your question, but you would be better served by using explicit exportfs -a and exportfs -r commands for the specific changes.
>

Thank you for the insight and suggestions. We are considering changing 
our methods but it requires breaking some (long-standing, internal) 
abstractions and weighing the risk associated with that. In short, our 
automations suck and cannot be changed so easily. However, manual-mode 
is always an option.

>
>
>> Every time we do so, the clients report "NFS server not responding" and start backing off their requests. After a painful 3-5 minutes, they recover and life is normal again.
>>
>> We discovered that when the rpc.mountd cache flushing occurs, our NIS system is overwhelmed with grouplist requests and this obviously blocks things. We are working on that problem separately, and I admit this to be a weakness in our setup. My question is simple.
>>
>> Why does it flush auth.unix.gid when the etab changed? I think it makes unnecessary work for rpc.mountd because the gids are unlikely to have changed,
>
> Another assumption is that exports rarely change. I expect your setup is an exception to the rule.
>
>
>> and they already have a reasonable expiration policy.
>
> One way to read what the man page states for exportfs -r:
>
>         -r     Reexport  all  directories,  synchronizing  /var/lib/nfs/etab with /etc/exports and files under /etc/exports.d.
>                This option removes entries in /var/lib/nfs/etab which have been  deleted  from  /etc/exports  or  files  under
>                /etc/exports.d, and removes any entries from the kernel export table which are no longer valid.
>
> is that it only removes entries which have been deleted.
>
> Instead, it removes all entries and reexports those that are still valid. The remove of all is what blows away the auth.unix.gid caching.
>
> Using exportfs -a <path> and exportfs -r <path> should solve this for you.


Understood. I am now tempted to rework exportfs -r into a loop over 
dump(). Thanks again.
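
One way to read the idea above is as a delta computation: compare the exports currently in effect with the desired ones, and change only the difference. A hedged Python sketch follows. `plan_changes` and the (client, path) tuples are hypothetical, and obtaining the two lists (for example from etab and /etc/exports) is left out.

```python
from typing import Iterable, List, Tuple

Export = Tuple[str, str]  # (client, path); a made-up representation


def plan_changes(
    current: Iterable[Export], desired: Iterable[Export]
) -> Tuple[List[Export], List[Export]]:
    """Return (to_add, to_remove) so that unchanged exports are left alone."""
    current_set, desired_set = set(current), set(desired)
    to_add = sorted(desired_set - current_set)
    to_remove = sorted(current_set - desired_set)
    return to_add, to_remove


# Example with invented entries: only the difference is acted on.
current = [("hostA", "/srv/a"), ("hostB", "/srv/b")]
desired = [("hostB", "/srv/b"), ("hostC", "/srv/c")]
to_add, to_remove = plan_changes(current, desired)
for client, path in to_add:
    print(f"would export   {client}:{path}")
for client, path in to_remove:
    print(f"would unexport {client}:{path}")
```

This sketch ignores changes to the options of an export that exists on both sides; a real tool would need to compare those as well.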


* Re: when rpc.mountd flushes auth.unix.gid
  2014-10-17 14:42 when rpc.mountd flushes auth.unix.gid Colin Hudler
  2014-10-17 16:15 ` Tom Haynes
@ 2014-10-17 21:06 ` Jeff Layton
  2014-10-17 22:24   ` Tom Haynes
  1 sibling, 1 reply; 7+ messages in thread
From: Jeff Layton @ 2014-10-17 21:06 UTC (permalink / raw)
  To: Colin Hudler; +Cc: linux-nfs

On Fri, 17 Oct 2014 09:42:14 -0500
Colin Hudler <chudler@cs.uchicago.edu> wrote:

> We have a few hundred computers mounting an NFS server in a typical 
> LDAP-based users (nss) setup. We frequently add and remove exports and 
> use exportfs -r to update etab. Every time we do so, the clients report 
> "NFS server not responding" and start backing off their requests. After 
> a painful 3-5 minutes, they recover and life is normal again.
> 
> We discovered that when the rpc.mountd cache flushing occurs, our NIS 
> system is overwhelmed with grouplist requests and this obviously blocks 
> things. We are working on that problem separately, and I admit this to 
> be a weakness in our setup. My question is simple.
> 
> Why does it flush auth.unix.gid when the etab changed? I think it makes 
> unnecessary work for rpc.mountd because the gids are unlikely to have 
> changed, and they already have a reasonable expiration policy.
> 

Most likely because no one really cared until now.

When exports change, cache_flush() is called and that function flushes
out all of the kernel caches.

I expect that could be made to do something a bit more granular, but
you may need to do some archaeology in mountd/exportfs (and the kernel)
to ensure that you're not missing anything.

-- 
Jeff Layton <jlayton@primarydata.com>


* Re: when rpc.mountd flushes auth.unix.gid
  2014-10-17 21:06 ` Jeff Layton
@ 2014-10-17 22:24   ` Tom Haynes
  2014-10-17 23:37     ` Jeff Layton
  0 siblings, 1 reply; 7+ messages in thread
From: Tom Haynes @ 2014-10-17 22:24 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Colin Hudler, linux-nfs




> On Oct 17, 2014, at 4:06 PM, Jeff Layton <jeff.layton@primarydata.com> wrote:
> 
> On Fri, 17 Oct 2014 09:42:14 -0500
> Colin Hudler <chudler@cs.uchicago.edu> wrote:
> 
>> We have a few hundred computers mounting an NFS server in a typical 
>> LDAP-based users (nss) setup. We frequently add and remove exports and 
>> use exportfs -r to update etab. Every time we do so, the clients report 
>> "NFS server not responding" and start backing off their requests. After 
>> a painful 3-5 minutes, they recover and life is normal again.
>> 
>> We discovered that when the rpc.mountd cache flushing occurs, our NIS 
>> system is overwhelmed with grouplist requests and this obviously blocks 
>> things. We are working on that problem separately, and I admit this to 
>> be a weakness in our setup. My question is simple.
>> 
>> Why does it flush auth.unix.gid when the etab changed? I think it makes 
>> unnecessary work for rpc.mountd because the gids are unlikely to have 
>> changed, and they already have a reasonable expiration policy.
> 
> Most likely because no one really cared until now.
> 
> When exports change, cache_flush() is called and that function flushes
> out all of the kernel caches.
> 
> I expect that could be made to do something a bit more granular, but
> you may need to do some archaeology in mountd/exportfs (and the kernel)
> to ensure that you're not missing anything.
> 

One thing would be to not remove the exports which are going to be added back in.

The catch here is that you have to account for new entries which need to be added.



> -- 
> Jeff Layton <jlayton@primarydata.com>


* Re: when rpc.mountd flushes auth.unix.gid
  2014-10-17 22:24   ` Tom Haynes
@ 2014-10-17 23:37     ` Jeff Layton
       [not found]       ` <60F78F70-72B4-41CA-8B13-3C7D607569E0@primarydata.com>
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff Layton @ 2014-10-17 23:37 UTC (permalink / raw)
  To: Tom Haynes; +Cc: Jeff Layton, Colin Hudler, linux-nfs

On Fri, 17 Oct 2014 17:24:14 -0500
Tom Haynes <thomas.haynes@primarydata.com> wrote:

> 
> 
> 
> > On Oct 17, 2014, at 4:06 PM, Jeff Layton <jeff.layton@primarydata.com> wrote:
> > 
> > On Fri, 17 Oct 2014 09:42:14 -0500
> > Colin Hudler <chudler@cs.uchicago.edu> wrote:
> > 
> >> We have a few hundred computers mounting an NFS server in a typical 
> >> LDAP-based users (nss) setup. We frequently add and remove exports and 
> >> use exportfs -r to update etab. Every time we do so, the clients report 
> >> "NFS server not responding" and start backing off their requests. After 
> >> a painful 3-5 minutes, they recover and life is normal again.
> >> 
> >> We discovered that when the rpc.mountd cache flushing occurs, our NIS 
> >> system is overwhelmed with grouplist requests and this obviously blocks 
> >> things. We are working on that problem separately, and I admit this to 
> >> be a weakness in our setup. My question is simple.
> >> 
> >> Why does it flush auth.unix.gid when the etab changed? I think it makes 
> >> unnecessary work for rpc.mountd because the gids are unlikely to have 
> >> changed, and they already have a reasonable expiration policy.
> > 
> > Most likely because no one really cared until now.
> > 
> > When exports change, cache_flush() is called and that function flushes
> > out all of the kernel caches.
> > 
> > I expect that could be made to do something a bit more granular, but
> > you may need to do some archaeology in mountd/exportfs (and the kernel)
> > to ensure that you're not missing anything.
> > 
> 
> One thing would be to not remove the exports which are going to be added back in.
> 
> The catch here is that you have to account for new entries which need to be added.
> 
> 

I'm not sure that flushing the uid or gid caches is really necessary on
an exports change at all. I don't think we expect that info to change.

In practical terms, we might be able to change exportfs to just flush
the nfsd.fh and nfsd.export caches instead of a full cache_flush() ?

-- 
Jeff Layton <jlayton@poochiereds.net>


* Re: when rpc.mountd flushes auth.unix.gid
       [not found]       ` <60F78F70-72B4-41CA-8B13-3C7D607569E0@primarydata.com>
@ 2014-10-18 10:10         ` Jeff Layton
  0 siblings, 0 replies; 7+ messages in thread
From: Jeff Layton @ 2014-10-18 10:10 UTC (permalink / raw)
  To: Tom Haynes; +Cc: Colin Hudler, linux-nfs

On Fri, 17 Oct 2014 21:21:18 -0500
Tom Haynes <thomas.haynes@primarydata.com> wrote:

> 
> On Oct 17, 2014, at 6:37 PM, Jeff Layton <jlayton@poochiereds.net> wrote:
> 
> > On Fri, 17 Oct 2014 17:24:14 -0500
> > Tom Haynes <thomas.haynes@primarydata.com> wrote:
> > 
> >> 
> >> 
> >> 
> >>> On Oct 17, 2014, at 4:06 PM, Jeff Layton <jeff.layton@primarydata.com> wrote:
> >>> 
> >>> On Fri, 17 Oct 2014 09:42:14 -0500
> >>> Colin Hudler <chudler@cs.uchicago.edu> wrote:
> >>> 
> >>>> We have a few hundred computers mounting an NFS server in a typical 
> >>>> LDAP-based users (nss) setup. We frequently add and remove exports and 
> >>>> use exportfs -r to update etab. Every time we do so, the clients report 
> >>>> "NFS server not responding" and start backing off their requests. After 
> >>>> a painful 3-5 minutes, they recover and life is normal again.
> >>>> 
> >>>> We discovered that when the rpc.mountd cache flushing occurs, our NIS 
> >>>> system is overwhelmed with grouplist requests and this obviously blocks 
> >>>> things. We are working on that problem separately, and I admit this to 
> >>>> be a weakness in our setup. My question is simple.
> >>>> 
> >>>> Why does it flush auth.unix.gid when the etab changed? I think it makes 
> >>>> unnecessary work for rpc.mountd because the gids are unlikely to have 
> >>>> changed, and they already have a reasonable expiration policy.
> >>> 
> >>> Most likely because no one really cared until now.
> >>> 
> >>> When exports change, cache_flush() is called and that function flushes
> >>> out all of the kernel caches.
> >>> 
> >>> I expect that could be made to do something a bit more granular, but
> >>> you may need to do some archaeology in mountd/exportfs (and the kernel)
> >>> to ensure that you're not missing anything.
> >>> 
> >> 
> >> One thing would be to not remove the exports which are going to be added back in.
> >> 
> >> The catch here is that you have to account for new entries which need to be added.
> >> 
> >> 
> > 
> > I'm not sure that flushing the uid or gid caches is really necessary on
> > an exports change at all. I don't think we expect that info to change.
> 
> Is there a manual way to flush these caches?
> 
> Bump down the default TTL?
> 
> 

The manual way is to write to /proc/net/rpc/*/flush (which is what
cache_flush() in nfs-utils does). The comment above it says:

/* flush the kNFSd caches.
 * Set the flush time to the mtime of _PATH_ETAB or
 * if force, to now.
 * the caches to flush are:
 *  auth.unix.ip nfsd.export nfsd.fh
 */

...but it looks like auth.unix.gid was added in 2007 and the
comment wasn't updated.
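
As an illustration of the manual route, here is a minimal Python sketch that writes a flush time to the flush file of selected caches only. It is not the nfs-utils implementation: `flush_caches`, its parameters and the default cache list are assumptions for this example, as is writing the time as decimal seconds since the epoch. The `root` parameter exists so the function can be tried against a scratch directory instead of the real /proc.

```python
import os
import time
from typing import Iterable, List, Optional

# The four cache names that come up in this thread; treating them as the
# complete list, and this order, are assumptions of this sketch.
ALL_CACHES = ("auth.unix.ip", "auth.unix.gid", "nfsd.fh", "nfsd.export")


def flush_caches(
    names: Iterable[str] = ALL_CACHES,
    root: str = "/proc/net/rpc",
    when: Optional[int] = None,
) -> List[str]:
    """Write a flush time to <root>/<name>/flush for each named cache.

    Caches whose flush file does not exist are skipped.
    Returns the list of paths that were written.
    """
    stamp = int(time.time()) if when is None else when
    written = []
    for name in names:
        path = os.path.join(root, name, "flush")
        if not os.path.isfile(path):
            continue
        with open(path, "w") as f:
            f.write(f"{stamp}\n")
        written.append(path)
    return written


# Usage (as root on the NFS server), flushing only the export-related caches
# and leaving auth.unix.gid alone:
#   flush_caches(["nfsd.fh", "nfsd.export"])
```

Passing an explicit list is what keeps the gid cache out of the flush here; whether skipping it is safe on an exports change is the open question in this thread.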

> > 
> > In practical terms, we might be able to change exportfs to just flush
> > the nfsd.fh and nfsd.export caches instead of a full cache_flush() ?
> > 
> > -- 
> > Jeff Layton <jlayton@poochiereds.net>
> 


-- 
Jeff Layton <jlayton@poochiereds.net>


