* VM issue causing high CPU loads
@ 2009-08-24 14:23 Yohan
  2009-08-24 23:21   ` Andrew Morton
  0 siblings, 1 reply; 24+ messages in thread
From: Yohan @ 2009-08-24 14:23 UTC (permalink / raw)
  To: linux-kernel

Hi,

    Does anyone have an idea about this:

        http://bugzilla.kernel.org/show_bug.cgi?id=14024

Thanks
Yohan

* Re: VM issue causing high CPU loads
  2009-08-24 14:23 VM issue causing high CPU loads Yohan
@ 2009-08-24 23:21   ` Andrew Morton
  0 siblings, 0 replies; 24+ messages in thread
From: Andrew Morton @ 2009-08-24 23:21 UTC (permalink / raw)
  To: Yohan; +Cc: linux-kernel, linux-mm

On Mon, 24 Aug 2009 16:23:22 +0200
Yohan <kernel@yohan.staff.proxad.net> wrote:

> Hi,
> 
>     Does anyone have an idea about this:
> 
>         http://bugzilla.kernel.org/show_bug.cgi?id=14024
> 

Please generate a kernel profile to work out where all the CPU time is
being spent.  Documentation/basic_profiling.txt is a starting point.

Thanks.


* Re: VM issue causing high CPU loads
  2009-08-24 23:21   ` Andrew Morton
@ 2009-08-26 11:08     ` Mel Gorman
  -1 siblings, 0 replies; 24+ messages in thread
From: Mel Gorman @ 2009-08-26 11:08 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Yohan, linux-kernel, linux-mm

On Mon, Aug 24, 2009 at 04:21:55PM -0700, Andrew Morton wrote:
> On Mon, 24 Aug 2009 16:23:22 +0200
> Yohan <kernel@yohan.staff.proxad.net> wrote:
> 
> > Hi,
> > 
> >     Does anyone have an idea about this:
> > 
> >         http://bugzilla.kernel.org/show_bug.cgi?id=14024
> > 
> 
> Please generate a kernel profile to work out where all the CPU time is
> being spent.  Documentation/basic_profiling.txt is a starting point.
> 

In the absence of a profile, here is a total stab in the dark. Is this a
NUMA machine? If so, is /proc/sys/vm/zone_reclaim_mode set to 1 and does
setting it to 0 help?
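A minimal C sketch of the check suggested here, assuming a Linux system with procfs mounted: it reads the current value of /proc/sys/vm/zone_reclaim_mode, where bit 0 (value 1) enables zone reclaim. The helper names read_sysctl_int() and zone_reclaim_enabled() are hypothetical. Setting the value to 0 is then just a matter of writing "0" to the same file as root (for example with sysctl).

```c
#include <assert.h>
#include <stdio.h>

/* Read one integer from a sysctl file such as
 * /proc/sys/vm/zone_reclaim_mode.  Returns 0 and stores the value in
 * *out on success, or -1 if the file cannot be opened or parsed. */
static int read_sysctl_int(const char *path, long *out)
{
    FILE *f;
    int ok;

    assert(path != NULL && out != NULL);
    f = fopen(path, "r");
    if (!f)
        return -1;
    ok = (fscanf(f, "%ld", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Bit 0 of zone_reclaim_mode switches zone reclaim on. */
static int zone_reclaim_enabled(long mode)
{
    return (mode & 1) != 0;
}
```

Calling read_sysctl_int("/proc/sys/vm/zone_reclaim_mode", &mode) and then zone_reclaim_enabled(mode) tells you whether this guess can apply at all.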

This is based on a relatively recent bug where malloc() could stall for
long times with large amounts of CPU usage due to useless scanning in
page reclaim.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

* Re: VM issue causing high CPU loads
  2009-08-24 23:21   ` Andrew Morton
  (?)
  (?)
@ 2009-08-26 11:53   ` Yohan
  -1 siblings, 0 replies; 24+ messages in thread
From: Yohan @ 2009-08-26 11:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm

Andrew Morton wrote:
> On Mon, 24 Aug 2009 16:23:22 +0200
> Yohan <kernel@yohan.staff.proxad.net> wrote:
>   
>> Hi,
>>
>>     Does anyone have an idea about this:
>>
>>         http://bugzilla.kernel.org/show_bug.cgi?id=14024
>>     
> Please generate a kernel profile to work out where all the CPU time is
> being spent.  Documentation/basic_profiling.txt is a starting point.
I did, and posted the profiles on the bug tracker.
I did it with a 2.6.31-rc7-git2 kernel
(it takes at least two weekdays after a reboot or drop_caches to really show
the bug).

Thanks

* Re: VM issue causing high CPU loads
  2009-08-26 11:08     ` Mel Gorman
  (?)
@ 2009-08-26 11:55     ` Yohan
  -1 siblings, 0 replies; 24+ messages in thread
From: Yohan @ 2009-08-26 11:55 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Andrew Morton, linux-kernel, linux-mm

Mel Gorman wrote:
> On Mon, Aug 24, 2009 at 04:21:55PM -0700, Andrew Morton wrote:
>   
>> On Mon, 24 Aug 2009 16:23:22 +0200
>> Yohan <kernel@yohan.staff.proxad.net> wrote:
>>     
>>> Hi,
>>>
>>>     Does anyone have an idea about this:
>>>
>>>         http://bugzilla.kernel.org/show_bug.cgi?id=14024
>>>       
>> Please generate a kernel profile to work out where all the CPU time is
>> being spent.  Documentation/basic_profiling.txt is a starting point.
>>     
> In the absence of a profile, here is a total stab in the dark. Is this a
> NUMA machine? 
This is an Intel(R) Xeon(R) CPU E5520 on a Dell R610.
> If so, is /proc/sys/vm/zone_reclaim_mode set to 1 and does
> setting it to 0 help?
>   
The value is already 0...


Thanks

* Re: VM issue causing high CPU loads
  2009-08-24 23:21   ` Andrew Morton
@ 2009-08-27  8:39     ` Yohan
  -1 siblings, 0 replies; 24+ messages in thread
From: Yohan @ 2009-08-27  8:39 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm

Andrew Morton wrote:
> On Mon, 24 Aug 2009 16:23:22 +0200
> Yohan <kernel@yohan.staff.proxad.net> wrote:
>   
>> Hi,
>>
>>     Does anyone have an idea about this:
>>
>>         http://bugzilla.kernel.org/show_bug.cgi?id=14024
>>     
> Please generate a kernel profile to work out where all the CPU time is
> being spent.  Documentation/basic_profiling.txt is a starting point.
>   
I have posted some new reports; it seems that the problem is in
rpcauth_lookup_credcache().

For information, this is an IMAP mail server that mounts ~10 NetApp filers
over ~300 mount points.

Thanks
Yohan

* Re: VM issue causing high CPU loads
  2009-08-27  8:39     ` Yohan
  (?)
@ 2009-08-31 20:39     ` Yohan
  2009-09-03  0:06         ` Andrew Morton
  -1 siblings, 1 reply; 24+ messages in thread
From: Yohan @ 2009-08-31 20:39 UTC (permalink / raw)
  To: Yohan; +Cc: Andrew Morton, linux-kernel

Yohan wrote:
> Andrew Morton wrote:
>> On Mon, 24 Aug 2009 16:23:22 +0200
>> Yohan <kernel@yohan.staff.proxad.net> wrote:  
>>> Hi,
>>>
>>>     Does anyone have an idea about this:
>>>
>>>         http://bugzilla.kernel.org/show_bug.cgi?id=14024
>>>     
>> Please generate a kernel profile to work out where all the CPU time is
>> being spent.  Documentation/basic_profiling.txt is a starting point.
>>   
> I have posted some new reports; it seems that the problem is in
> rpcauth_lookup_credcache().
>
> For information, this is an IMAP mail server that mounts ~10 NetApp
> filers over ~300 mount points.
I saw this: http://patchwork.kernel.org/patch/24747/

I applied only this change:

--- linux-2.6.27.21/include/linux/sunrpc/auth.h	2009-03-23 23:04:09.000000000 +0100
+++ linux-2.6.27.21/include/linux/sunrpc/auth.h	2009-05-19 16:02:35.000000000 +0200
@@ -62,8 +62,12 @@ 
  */
-#define RPC_CREDCACHE_HASHBITS	4
+#define RPC_CREDCACHE_HASHBITS	12


I have been testing it in production since Sunday: system time is only 36% of
one core, versus more than 3 cores of system time on another server that did
a drop_caches this morning...

Yohan

* Re: VM issue causing high CPU loads
@ 2009-09-03  0:06         ` Andrew Morton
  0 siblings, 0 replies; 24+ messages in thread
From: Andrew Morton @ 2009-09-03  0:06 UTC (permalink / raw)
  To: Yohan
  Cc: ytordjman, linux-kernel, linux-nfs, Neil Brown, J. Bruce Fields,
	Trond Myklebust, mikevs

On Mon, 31 Aug 2009 22:39:20 +0200
Yohan <ytordjman@corp.free.fr> wrote:

> Yohan wrote:
> > Andrew Morton wrote:
> >> On Mon, 24 Aug 2009 16:23:22 +0200
> >> Yohan <kernel@yohan.staff.proxad.net> wrote:  
> >>> Hi,
> >>>
> >>>     Does anyone have an idea about this:
> >>>
> >>>         http://bugzilla.kernel.org/show_bug.cgi?id=14024
> >>>     
> >> Please generate a kernel profile to work out where all the CPU time is
> >> being spent.  Documentation/basic_profiling.txt is a starting point.
> >>   
> > I have posted some new reports; it seems that the problem is in
> > rpcauth_lookup_credcache().

Thanks, that helps a lot.

> > For information, this is an IMAP mail server that mounts ~10 NetApp
> > filers over ~300 mount points.
> I saw this: http://patchwork.kernel.org/patch/24747/

I wonder what happened with Miquel's patch?

> I applied only this change:
> 
> --- linux-2.6.27.21/include/linux/sunrpc/auth.h	2009-03-23 23:04:09.000000000 +0100
> +++ linux-2.6.27.21/include/linux/sunrpc/auth.h	2009-05-19 16:02:35.000000000 +0200
> @@ -62,8 +62,12 @@ 
>   */
> -#define RPC_CREDCACHE_HASHBITS	4
> +#define RPC_CREDCACHE_HASHBITS	12
> 
> 
> I have been testing it in production since Sunday: system time is only 36%
> of one core, versus more than 3 cores of system time on another server that
> did a drop_caches this morning...
> 

OK, but it's still pretty bad.  Let's tell the NFS guys.

In http://bugzilla.kernel.org/show_bug.cgi?id=14024 we appear to have a
major meltdown caused by the linear search in
rpcauth_lookup_credcache() with Yohan's workload.
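To see why a linear chain walk melts down, here is a small userspace sketch of a chained hash table keyed by uid. It assumes a multiplicative hash in the spirit of the kernel's hash_long() (the constant below is an assumption, not necessarily the kernel's), and every name in it is hypothetical. A lookup for a uid that is not cached walks its whole chain, so the cost grows with entries / (1 << bits).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Multiplicative hash in the spirit of hash_long(): multiply by a large
 * odd constant (2^64 / golden ratio, assumed here) and keep the top bits. */
static unsigned int hash_uid(uint64_t uid, unsigned int bits)
{
    assert(bits >= 1 && bits <= 32);
    return (unsigned int)((uid * 0x9E3779B97F4A7C15ULL) >> (64 - bits));
}

struct entry {
    uint64_t uid;
    struct entry *next;
};

struct table {
    unsigned int bits;
    struct entry **buckets;     /* (1 << bits) chain heads */
};

static int table_init(struct table *t, unsigned int bits)
{
    t->bits = bits;
    t->buckets = calloc((size_t)1 << bits, sizeof(*t->buckets));
    return t->buckets ? 0 : -1;
}

static int table_insert(struct table *t, uint64_t uid)
{
    struct entry *e = malloc(sizeof(*e));
    if (!e)
        return -1;
    unsigned int nr = hash_uid(uid, t->bits);
    e->uid = uid;
    e->next = t->buckets[nr];   /* push at the head of the chain */
    t->buckets[nr] = e;
    return 0;
}

/* Linear walk of one chain, as in the hlist walk of
 * rpcauth_lookup_credcache(); returns how many entries were compared. */
static size_t table_lookup_cost(const struct table *t, uint64_t uid)
{
    size_t compared = 0;
    for (const struct entry *e = t->buckets[hash_uid(uid, t->bits)]; e; e = e->next) {
        compared++;
        if (e->uid == uid)
            break;
    }
    return compared;
}

static void table_free(struct table *t)
{
    for (size_t i = 0; i < ((size_t)1 << t->bits); i++) {
        struct entry *e = t->buckets[i];
        while (e) {
            struct entry *next = e->next;
            free(e);
            e = next;
        }
    }
    free(t->buckets);
}
```

Filled with 100,000 consecutive uids, a miss in this sketch costs roughly 6,000 comparisons with 4 hash bits but only a few dozen with 12.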


* Re: VM issue causing high CPU loads
  2009-09-03  0:06         ` Andrew Morton
  (?)
@ 2009-09-03 13:01         ` Trond Myklebust
  2009-09-03 13:39           ` Yohan
  -1 siblings, 1 reply; 24+ messages in thread
From: Trond Myklebust @ 2009-09-03 13:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Yohan, linux-kernel, linux-nfs, Neil Brown, J. Bruce Fields, mikevs

On Wed, 2009-09-02 at 17:06 -0700, Andrew Morton wrote:
> On Mon, 31 Aug 2009 22:39:20 +0200
> Yohan <ytordjman@corp.free.fr> wrote:
> 
> > Yohan wrote:
> > > Andrew Morton wrote:
> > >> On Mon, 24 Aug 2009 16:23:22 +0200
> > >> Yohan <kernel@yohan.staff.proxad.net> wrote:  
> > >>> Hi,
> > >>>
> > >>>     Does anyone have an idea about this:
> > >>>
> > >>>         http://bugzilla.kernel.org/show_bug.cgi?id=14024
> > >>>     
> > >> Please generate a kernel profile to work out where all the CPU time is
> > >> being spent.  Documentation/basic_profiling.txt is a starting point.
> > >>   
> > > I have posted some new reports; it seems that the problem is in
> > > rpcauth_lookup_credcache().
> 
> Thanks, that helps a lot.
> 
> > > For information, this is an IMAP mail server that mounts ~10 NetApp
> > > filers over ~300 mount points.
> > I saw this: http://patchwork.kernel.org/patch/24747/
> 
> I wonder what happened with Miquel's patch?

At the time, I asked him to split out the various changes into several
patches.

His patch did a lot of different things that would impact workloads in
different ways. For instance, while increasing the hash table size is
not likely to have a huge performance degradation for most people, the
change that decreases the garbage collection timeout is very likely to
cause issues (particularly with RPCSEC_GSS setups)...

> > I applied only this change:
> > 
> > --- linux-2.6.27.21/include/linux/sunrpc/auth.h	2009-03-23 23:04:09.000000000 +0100
> > +++ linux-2.6.27.21/include/linux/sunrpc/auth.h	2009-05-19 16:02:35.000000000 +0200
> > @@ -62,8 +62,12 @@ 
> >   */
> > -#define RPC_CREDCACHE_HASHBITS	4
> > +#define RPC_CREDCACHE_HASHBITS	12
> > 
> > 
> > I have been testing it in production since Sunday: system time is only
> > 36% of one core, versus more than 3 cores of system time on another
> > server that did a drop_caches this morning...
> > 
> 
> OK, but it's still pretty bad.  Let's tell the NFS guys.
> 
> In http://bugzilla.kernel.org/show_bug.cgi?id=14024 we appear to have a
> major meltdown caused by the linear search in
> rpcauth_lookup_credcache() with Yohan's workload.
> 

OK. Could we please have some more details about the actual workload
involved here?
As far as I can see, there is no RPCSEC_GSS involved, so credentials
should never expire. They will be reused as long as processes aren't
switching between thousands and thousands of different combinations of
uid, gid and groups.
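The point about combinations can be made concrete with a hypothetical credential key: in this sketch a cached entry is reused only when uid, gid and the supplementary group list all match, so every distinct combination costs its own cache entry. It is a simplified illustration (fixed-size group array, order-sensitive comparison, invented names), not the AUTH_UNIX code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

#define SKETCH_NGROUPS 16   /* hypothetical cap, for this sketch only */

/* Hypothetical cache key: uid, gid and supplementary groups. */
struct cred_key {
    uid_t uid;
    gid_t gid;
    size_t ngroups;
    gid_t groups[SKETCH_NGROUPS];
};

/* True only if every field of the two keys matches. */
static bool cred_key_match(const struct cred_key *a, const struct cred_key *b)
{
    assert(a->ngroups <= SKETCH_NGROUPS && b->ngroups <= SKETCH_NGROUPS);
    if (a->uid != b->uid || a->gid != b->gid || a->ngroups != b->ngroups)
        return false;
    for (size_t i = 0; i < a->ngroups; i++)
        if (a->groups[i] != b->groups[i])
            return false;
    return true;
}

/* Number of distinct keys, i.e. how many cache entries they would need.
 * O(n^2), which is fine for an illustration. */
static size_t count_distinct(const struct cred_key *keys, size_t n)
{
    size_t distinct = 0;
    for (size_t i = 0; i < n; i++) {
        bool seen = false;
        for (size_t j = 0; j < i && !seen; j++)
            seen = cred_key_match(&keys[i], &keys[j]);
        if (!seen)
            distinct++;
    }
    return distinct;
}
```

When each mailbox owner has a uid of their own, the number of distinct keys follows the number of users that touch the mounts rather than the number of processes.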




* Re: VM issue causing high CPU loads
  2009-09-03 13:01         ` Trond Myklebust
@ 2009-09-03 13:39           ` Yohan
  2009-09-03 14:02             ` Trond Myklebust
  0 siblings, 1 reply; 24+ messages in thread
From: Yohan @ 2009-09-03 13:39 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Andrew Morton, linux-kernel, linux-nfs, Neil Brown,
	J. Bruce Fields, mikevs


>>> I applied only this change:
>>>
>>> --- linux-2.6.27.21/include/linux/sunrpc/auth.h	2009-03-23 23:04:09.000000000 +0100
>>> +++ linux-2.6.27.21/include/linux/sunrpc/auth.h	2009-05-19 16:02:35.000000000 +0200
>>> @@ -62,8 +62,12 @@ 
>>>   */
>>> -#define RPC_CREDCACHE_HASHBITS	4
>>> +#define RPC_CREDCACHE_HASHBITS	12
>>>
>>>
>>> I have been testing it in production since Sunday: system time is only
>>> 36% of one core, versus more than 3 cores of system time on another
>>> server that did a drop_caches this morning...
>>>       
>> OK, but it's still pretty bad.  Let's tell the NFS guys.
>>
>> In http://bugzilla.kernel.org/show_bug.cgi?id=14024 we appear to have a
>> major meltdown caused by the linear search in
>> rpcauth_lookup_credcache() with Yohan's workload.
>>     
> OK. Could we please have some more details about the actual workload involved here?
>   
I have added a new server CPU graph and a 60s readprofile to the bugzilla entry.

> As far as I can see, there is no RPCSEC_GSS involved, so credentials
> should never expire. They will be reused as long as processes aren't
> switching between thousands and thousands of different combinations of
> uid, gid and groups.
My servers are IMAP servers.
Each user (~15 million of them) has a specific uid, spread over ~10 NetApp NFS
storage systems.


* Re: VM issue causing high CPU loads
  2009-09-03 13:39           ` Yohan
@ 2009-09-03 14:02             ` Trond Myklebust
  2009-09-03 14:08               ` Yohan
                                 ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Trond Myklebust @ 2009-09-03 14:02 UTC (permalink / raw)
  To: Yohan
  Cc: Andrew Morton, linux-kernel, linux-nfs, Neil Brown,
	J. Bruce Fields, mikevs

On Thu, 2009-09-03 at 15:39 +0200, Yohan wrote:
> > As far as I can see, there is no RPCSEC_GSS involved, so credentials
> > should never expire. They will be reused as long as processes aren't
> > switching between thousands and thousands of different combinations of
> > uid, gid and groups.
> My servers are IMAP servers.
> Each user (~15 million of them) has a specific uid, spread over ~10 NetApp
> NFS storage systems.

OK, so 16 hash buckets are likely to be filled with ~10^6 entries each.
I can see that might be a performance issue...

So afaics, you did try adjusting the hashtable size. How much larger
does it have to be before you start to get acceptable performance? If it
solves your problem we could make hash table sizes adjustable via a
module parameter, for instance.
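The arithmetic behind this estimate, as a small C helper: with N cached credentials spread evenly over 1 << bits buckets, the average chain length is N / 2^bits. The even spread is an idealizing assumption, and the helper name is hypothetical.

```c
#include <assert.h>

/* Average chain length when `entries` credentials are spread evenly
 * over (1 << bits) hash buckets. */
static double avg_chain_len(double entries, unsigned int bits)
{
    assert(bits < 32);
    return entries / (double)(1UL << bits);
}
```

For ~15 million uids, avg_chain_len(15e6, 4) is 937,500 (the ~10^6 above), avg_chain_len(15e6, 12) is about 3,662, and avg_chain_len(15e6, 14), i.e. the 16384-bucket maximum in Miquel's patch, is about 916. So if every uid stayed cached on one client, even 4096 buckets would leave chains thousands of entries long.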

Cheers
  Trond


* Re: VM issue causing high CPU loads
  2009-09-03 14:02             ` Trond Myklebust
@ 2009-09-03 14:08               ` Yohan
  2009-09-03 14:35               ` sunrpc: dynamically allocate credcache hashtables [was: Re: VM issue causing high CPU loads] Miquel van Smoorenburg
  2009-09-03 20:05               ` VM issue causing high CPU loads Simon Kirby
  2 siblings, 0 replies; 24+ messages in thread
From: Yohan @ 2009-09-03 14:08 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Andrew Morton, linux-kernel, linux-nfs, Neil Brown,
	J. Bruce Fields, mikevs

Trond Myklebust wrote:
> On Thu, 2009-09-03 at 15:39 +0200, Yohan wrote:
>   
>>> As far as I can see, there is no RPCSEC_GSS involved, so credentials
>>> should never expire. They will be reused as long as processes aren't
>>> switching between thousands and thousands of different combinations of
>>> uid, gid and groups.
>>>       
>> My servers are IMAP servers.
>> Each user (~15 million of them) has a specific uid, spread over ~10
>> NetApp NFS storage systems.
>>     
> OK, so 16 hash buckets are likely to be filled with ~10^6 entries each.
> I can see that might be a performance issue...
>
> So afaics, you did try adjusting the hashtable size. How much larger
> does it have to be before you start to get acceptable performance? If it
> solves your problem we could make hash table sizes adjustable via a
> module parameter, for instance.
>   
I am now running with a value of 12, and it works great for me...


* sunrpc: dynamically allocate credcache hashtables [was: Re: VM issue causing high CPU loads]
  2009-09-03 14:02             ` Trond Myklebust
  2009-09-03 14:08               ` Yohan
@ 2009-09-03 14:35               ` Miquel van Smoorenburg
  2009-09-03 20:05               ` VM issue causing high CPU loads Simon Kirby
  2 siblings, 0 replies; 24+ messages in thread
From: Miquel van Smoorenburg @ 2009-09-03 14:35 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Yohan, Andrew Morton, linux-kernel, linux-nfs, Neil Brown,
	J. Bruce Fields, mikevs

[-- Attachment #1: Type: text/plain, Size: 2128 bytes --]

On Thu, 2009-09-03 at 10:02 -0400, Trond Myklebust wrote:
> On Thu, 2009-09-03 at 15:39 +0200, Yohan wrote:
> > > As far as I can see, there is no RPCSEC_GSS involved, so credentials
> > > should never expire. They will be reused as long as processes aren't
> > > switching between thousands and thousands of different combinations of
> > > uid, gid and groups.
> > My servers are IMAP servers.
> > Each user (~15 million of them) has a specific uid, spread over ~10
> > NetApp NFS storage systems.
> 
> OK, so 16 hash buckets are likely to be filled with ~10^6 entries each.
> I can see that might be a performance issue...
> 
> So afaics, you did try adjusting the hashtable size. How much larger
> does it have to be before you start to get acceptable performance? If it
> solves your problem we could make hash table sizes adjustable via a
> module parameter, for instance.

That is *exactly* what my patch does :)
I ported it to 2.6.31-rc8-bk2 this afternoon; that was trivial.

What I wanted to discuss was whether there was another solution: whether
we should build something that auto-tunes hashtable sizes, or whether there
was some other way to limit the size of the cache.
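One reading of "limit the size of the cache" is to drop entries that have sat unused for longer than some idle limit. Below is a hypothetical sketch of that for a single hash chain; the names and the idle policy are assumptions, not sunrpc code. As Trond notes above, shortening the garbage collection timeout can cause trouble for RPCSEC_GSS setups, so any real limit would need care.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical cache entry that records when it was last used. */
struct cached {
    unsigned long uid;
    time_t last_used;
    struct cached *next;
};

/* Unlink and free every entry idle for more than max_idle seconds at time
 * `now`.  Returns the number of entries removed. */
static size_t prune_idle(struct cached **head, time_t now, time_t max_idle)
{
    size_t removed = 0;
    struct cached **pp = head;

    assert(max_idle >= 0);
    while (*pp) {
        struct cached *e = *pp;
        if (now - e->last_used > max_idle) {
            *pp = e->next;      /* unlink e from the chain */
            free(e);
            removed++;
        } else {
            pp = &e->next;
        }
    }
    return removed;
}
```

Walking through a pointer-to-pointer avoids special-casing the head of the chain when it is the entry being removed.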

I have the same usage pattern as Yohan (also an IMAP server for
potentially a few million different uids) - lots of uids are used, but
not simultaneously (maybe a few hundred or a thousand at the same time).
It's just that the inode/dentry/cred caches never expire because modern
boxes have lots and lots of memory.

Due to personal circumstances though I haven't been able to work on
anything much for the last few months. I apologize for keeping quiet.

Patch attached. I've removed the debugging stuff, this is only the
"dynamically allocate credcache hashtables" patch.

Patch description:

   auth.h: increase RPC_CREDCACHE_HASHBITS from 4 to 12
           (16 hashtable entries -> 4096). This is just the default.
   auth.c: allocate hashtables dynamically
           add sysctl for credcache_hashsize
   auth_generic.c: use rpcauth_init_credcache
   auth_unix.c: use rpcauth_init_credcache
   sunrpc_syms.c: add hashsize module parameter
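The size handling in the patch can be mirrored in userspace to check its behaviour: the hashsize module parameter is clamped to [RPC_CREDCACHE_MIN, RPC_CREDCACHE_MAX] and rounded down to a power of two (0 keeps the 4096 default), while the debug sysctl rejects anything out of range or not a power of two. The helpers below are hypothetical stand-ins for min()/max(), rounddown_pow_of_two(), is_power_of_2() and ilog2(); unlike the patch, they take unsigned values.

```c
#include <assert.h>
#include <stdbool.h>

#define CREDCACHE_MIN     4u          /* RPC_CREDCACHE_MIN in the patch */
#define CREDCACHE_MAX     16384u      /* RPC_CREDCACHE_MAX in the patch */
#define CREDCACHE_DEFAULT (1u << 12)  /* RPC_CREDCACHE_NR with HASHBITS 12 */

static bool is_pow2(unsigned int x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Largest power of two <= x, for x > 0. */
static unsigned int round_down_pow2(unsigned int x)
{
    unsigned int p = 1;

    assert(x > 0);
    while (p <= x / 2)
        p <<= 1;
    return p;
}

/* Module-parameter path: 0 keeps the default, otherwise clamp and round. */
static unsigned int hashsize_from_param(unsigned int requested)
{
    if (requested == 0)
        return CREDCACHE_DEFAULT;
    if (requested > CREDCACHE_MAX)
        requested = CREDCACHE_MAX;
    if (requested < CREDCACHE_MIN)
        requested = CREDCACHE_MIN;
    return round_down_pow2(requested);
}

/* sysctl path: reject out-of-range values and non-powers of two. */
static bool hashsize_valid_for_sysctl(unsigned int value)
{
    return value >= CREDCACHE_MIN && value <= CREDCACHE_MAX && is_pow2(value);
}

/* Number of hash bits for a power-of-two size, like ilog2(). */
static unsigned int hashbits_for(unsigned int size)
{
    unsigned int bits = 0;

    assert(is_pow2(size));
    while (size >>= 1)
        bits++;
    return bits;
}
```

Keeping the size a power of two is what lets hash_long(uid, hashbits) address every bucket exactly once.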

Mike.

[-- Attachment #2: linux-2.6.31-rc8-git2-sunprc-credcache_hashsize.patch --]
[-- Type: text/x-patch, Size: 9129 bytes --]

diff -ruN linux-2.6.31-rc8-git2.orig/include/linux/sunrpc/auth.h linux-2.6.31-rc8-git2/include/linux/sunrpc/auth.h
--- linux-2.6.31-rc8-git2.orig/include/linux/sunrpc/auth.h	2009-08-28 02:59:04.000000000 +0200
+++ linux-2.6.31-rc8-git2/include/linux/sunrpc/auth.h	2009-09-03 12:29:45.000000000 +0200
@@ -60,10 +60,14 @@
 /*
  * Client authentication handle
  */
-#define RPC_CREDCACHE_HASHBITS	4
+#define RPC_CREDCACHE_HASHBITS	12
 #define RPC_CREDCACHE_NR	(1 << RPC_CREDCACHE_HASHBITS)
+#define RPC_CREDCACHE_MIN	4
+#define RPC_CREDCACHE_MAX	16384
 struct rpc_cred_cache {
-	struct hlist_head	hashtable[RPC_CREDCACHE_NR];
+	int			hashsize;
+	int			hashbits;
+	struct hlist_head	*hashtable;
 	spinlock_t		lock;
 };
 
@@ -124,9 +128,8 @@
 extern const struct rpc_authops	authunix_ops;
 extern const struct rpc_authops	authnull_ops;
 
-void __init		rpc_init_authunix(void);
-void __init		rpc_init_generic_auth(void);
-void __init		rpcauth_init_module(void);
+int __init		rpc_init_generic_auth(void);
+int __init		rpcauth_init_module(int);
 void __exit		rpcauth_remove_module(void);
 void __exit		rpc_destroy_generic_auth(void);
 
diff -ruN linux-2.6.31-rc8-git2.orig/net/sunrpc/auth.c linux-2.6.31-rc8-git2/net/sunrpc/auth.c
--- linux-2.6.31-rc8-git2.orig/net/sunrpc/auth.c	2009-08-28 02:59:04.000000000 +0200
+++ linux-2.6.31-rc8-git2/net/sunrpc/auth.c	2009-09-03 13:59:01.000000000 +0200
@@ -14,6 +14,8 @@
 #include <linux/hash.h>
 #include <linux/sunrpc/clnt.h>
 #include <linux/spinlock.h>
+#include <linux/vmalloc.h>
+#include <linux/sysctl.h>
 
 #ifdef RPC_DEBUG
 # define RPCDBG_FACILITY	RPCDBG_AUTH
@@ -28,6 +30,7 @@
 
 static LIST_HEAD(cred_unused);
 static unsigned long number_cred_unused;
+int credcache_hashsize = RPC_CREDCACHE_NR;
 
 static u32
 pseudoflavor_to_flavor(u32 flavor) {
@@ -147,7 +150,14 @@
 	new = kmalloc(sizeof(*new), GFP_KERNEL);
 	if (!new)
 		return -ENOMEM;
-	for (i = 0; i < RPC_CREDCACHE_NR; i++)
+	new->hashsize = credcache_hashsize;
+	new->hashbits = ilog2(new->hashsize);
+	new->hashtable = vmalloc(new->hashsize * sizeof(struct hlist_head));
+	if (!new->hashtable) {
+		kfree(new);
+		return -ENOMEM;
+	}
+	for (i = 0; i < new->hashsize; i++)
 		INIT_HLIST_HEAD(&new->hashtable[i]);
 	spin_lock_init(&new->lock);
 	auth->au_credcache = new;
@@ -184,7 +194,7 @@
 
 	spin_lock(&rpc_credcache_lock);
 	spin_lock(&cache->lock);
-	for (i = 0; i < RPC_CREDCACHE_NR; i++) {
+	for (i = 0; i < cache->hashsize; i++) {
 		head = &cache->hashtable[i];
 		while (!hlist_empty(head)) {
 			cred = hlist_entry(head->first, struct rpc_cred, cr_hash);
@@ -213,6 +223,8 @@
 	if (cache) {
 		auth->au_credcache = NULL;
 		rpcauth_clear_credcache(cache);
+		if (cache->hashtable)
+			vfree(cache->hashtable);
 		kfree(cache);
 	}
 }
@@ -291,7 +303,7 @@
 			*entry, *new;
 	unsigned int nr;
 
-	nr = hash_long(acred->uid, RPC_CREDCACHE_HASHBITS);
+	nr = hash_long(acred->uid, cache->hashbits);
 
 	rcu_read_lock();
 	hlist_for_each_entry_rcu(entry, pos, &cache->hashtable[nr], cr_hash) {
@@ -568,19 +580,87 @@
 		test_bit(RPCAUTH_CRED_UPTODATE, &cred->cr_flags) != 0;
 }
 
+#ifdef RPC_DEBUG
+static int proc_credcache_hashsize(struct ctl_table *table, int write,
+                        struct file *file, void __user *buffer,
+                        size_t *length, loff_t *ppos)
+{
+	int tmp = credcache_hashsize;
+
+	table->data = &tmp;
+	table->maxlen = sizeof(int);
+	proc_dointvec(table, write, file, buffer, length, ppos);
+	if (write) {
+		if (tmp < RPC_CREDCACHE_MIN ||
+		    tmp > RPC_CREDCACHE_MAX ||
+		    !is_power_of_2(tmp))
+			return -EINVAL;
+		credcache_hashsize = tmp;
+	}
+	return 0;
+}
+
+static ctl_table sunrpc_credcache_knobs_table [] = {
+	{
+		.procname	= "credcache_hashsize",
+		.data		= NULL,
+		.mode		= 0644,
+		.proc_handler	= &proc_credcache_hashsize,
+	},
+	{
+		.ctl_name	= 0,
+	}
+};
+
+static ctl_table sunrpc_credcache_table[] = {
+	{
+		.ctl_name	= CTL_SUNRPC,
+		.procname	= "sunrpc",
+		.mode		= 0555,
+		.child		= sunrpc_credcache_knobs_table,
+	},
+	{
+		.ctl_name = 0,
+	}
+};
+
+static struct ctl_table_header *sunrpc_credcache_table_header;
+#endif
+
 static struct shrinker rpc_cred_shrinker = {
 	.shrink = rpcauth_cache_shrinker,
 	.seeks = DEFAULT_SEEKS,
 };
 
-void __init rpcauth_init_module(void)
+int __init rpcauth_init_module(int hashsize)
 {
-	rpc_init_authunix();
-	rpc_init_generic_auth();
+	int err;
+
+	if (hashsize) {
+		hashsize = min(hashsize, RPC_CREDCACHE_MAX);
+		hashsize = max(hashsize, RPC_CREDCACHE_MIN);
+		credcache_hashsize = rounddown_pow_of_two(hashsize);
+		printk(KERN_INFO "RPC: credcache hashtable size %d\n",
+							credcache_hashsize);
+	}
+
+	err = rpc_init_generic_auth();
+	if (err)
+		goto out;
+#ifdef RPC_DEBUG
+	sunrpc_credcache_table_header =
+		register_sysctl_table(sunrpc_credcache_table);
+#endif
 	register_shrinker(&rpc_cred_shrinker);
+out:
+	return err;
 }
 
 void __exit rpcauth_remove_module(void)
 {
+#ifdef RPC_DEBUG
+	if (sunrpc_credcache_table_header)
+		unregister_sysctl_table(sunrpc_credcache_table_header);
+#endif
 	unregister_shrinker(&rpc_cred_shrinker);
 }
diff -ruN linux-2.6.31-rc8-git2.orig/net/sunrpc/auth_generic.c linux-2.6.31-rc8-git2/net/sunrpc/auth_generic.c
--- linux-2.6.31-rc8-git2.orig/net/sunrpc/auth_generic.c	2009-08-28 02:59:04.000000000 +0200
+++ linux-2.6.31-rc8-git2/net/sunrpc/auth_generic.c	2009-09-03 12:29:45.000000000 +0200
@@ -26,7 +26,6 @@
 };
 
 static struct rpc_auth generic_auth;
-static struct rpc_cred_cache generic_cred_cache;
 static const struct rpc_credops generic_credops;
 
 /*
@@ -158,20 +157,16 @@
 	return 0;
 }
 
-void __init rpc_init_generic_auth(void)
+int __init rpc_init_generic_auth(void)
 {
-	spin_lock_init(&generic_cred_cache.lock);
+	return rpcauth_init_credcache(&generic_auth);
 }
 
 void __exit rpc_destroy_generic_auth(void)
 {
-	rpcauth_clear_credcache(&generic_cred_cache);
+	rpcauth_destroy_credcache(&generic_auth);
 }
 
-static struct rpc_cred_cache generic_cred_cache = {
-	{{ NULL, },},
-};
-
 static const struct rpc_authops generic_auth_ops = {
 	.owner = THIS_MODULE,
 	.au_name = "Generic",
@@ -182,7 +177,6 @@
 static struct rpc_auth generic_auth = {
 	.au_ops = &generic_auth_ops,
 	.au_count = ATOMIC_INIT(0),
-	.au_credcache = &generic_cred_cache,
 };
 
 static const struct rpc_credops generic_credops = {
diff -ruN linux-2.6.31-rc8-git2.orig/net/sunrpc/auth_unix.c linux-2.6.31-rc8-git2/net/sunrpc/auth_unix.c
--- linux-2.6.31-rc8-git2.orig/net/sunrpc/auth_unix.c	2009-08-28 02:59:04.000000000 +0200
+++ linux-2.6.31-rc8-git2/net/sunrpc/auth_unix.c	2009-09-03 12:29:45.000000000 +0200
@@ -28,15 +28,23 @@
 #endif
 
 static struct rpc_auth		unix_auth;
-static struct rpc_cred_cache	unix_cred_cache;
 static const struct rpc_credops	unix_credops;
 
 static struct rpc_auth *
 unx_create(struct rpc_clnt *clnt, rpc_authflavor_t flavor)
 {
+	int err;
+
 	dprintk("RPC:       creating UNIX authenticator for client %p\n",
 			clnt);
 	atomic_inc(&unix_auth.au_count);
+	if (!unix_auth.au_credcache) {
+		err = rpcauth_init_credcache(&unix_auth);
+		if (err) {
+			atomic_dec(&unix_auth.au_count);
+			return ERR_PTR(err);
+		}
+	}
 	return &unix_auth;
 }
 
@@ -202,11 +210,6 @@
 	return p;
 }
 
-void __init rpc_init_authunix(void)
-{
-	spin_lock_init(&unix_cred_cache.lock);
-}
-
 const struct rpc_authops authunix_ops = {
 	.owner		= THIS_MODULE,
 	.au_flavor	= RPC_AUTH_UNIX,
@@ -218,17 +221,12 @@
 };
 
 static
-struct rpc_cred_cache	unix_cred_cache = {
-};
-
-static
 struct rpc_auth		unix_auth = {
 	.au_cslack	= UNX_WRITESLACK,
 	.au_rslack	= 2,			/* assume AUTH_NULL verf */
 	.au_ops		= &authunix_ops,
 	.au_flavor	= RPC_AUTH_UNIX,
 	.au_count	= ATOMIC_INIT(0),
-	.au_credcache	= &unix_cred_cache,
 };
 
 static
diff -ruN linux-2.6.31-rc8-git2.orig/net/sunrpc/sunrpc_syms.c linux-2.6.31-rc8-git2/net/sunrpc/sunrpc_syms.c
--- linux-2.6.31-rc8-git2.orig/net/sunrpc/sunrpc_syms.c	2009-08-28 02:59:04.000000000 +0200
+++ linux-2.6.31-rc8-git2/net/sunrpc/sunrpc_syms.c	2009-09-03 12:29:45.000000000 +0200
@@ -23,6 +23,7 @@
 #include <linux/sunrpc/xprtsock.h>
 
 extern struct cache_detail ip_map_cache, unix_gid_cache;
+static int hashsize;
 
 static int __init
 init_sunrpc(void)
@@ -31,13 +32,14 @@
 	if (err)
 		goto out;
 	err = rpc_init_mempool();
-	if (err) {
-		unregister_rpc_pipefs();
-		goto out;
-	}
+	if (err)
+		goto out_err1;
 #ifdef RPC_DEBUG
 	rpc_register_sysctl();
 #endif
+	err = rpcauth_init_module(hashsize);
+	if (err)
+		goto out_err2;
 #ifdef CONFIG_PROC_FS
 	rpc_proc_init();
 #endif
@@ -45,7 +47,14 @@
 	cache_register(&unix_gid_cache);
 	svc_init_xprt_sock();	/* svc sock transport */
 	init_socket_xprt();	/* clnt sock transport */
-	rpcauth_init_module();
+	goto out;
+out_err2:
+	rpc_destroy_mempool();
+#ifdef RPC_DEBUG
+	rpc_unregister_sysctl();
+#endif
+out_err1:
+	unregister_rpc_pipefs();
 out:
 	return err;
 }
@@ -68,6 +77,8 @@
 #endif
 	rcu_barrier(); /* Wait for completion of call_rcu()'s */
 }
+module_param(hashsize, int, 0);
+MODULE_PARM_DESC(hashsize, "size of hashtables for credential caches");
 MODULE_LICENSE("GPL");
 module_init(init_sunrpc);
 module_exit(cleanup_sunrpc);


* Re: VM issue causing high CPU loads
  2009-09-03 14:02             ` Trond Myklebust
  2009-09-03 14:08               ` Yohan
  2009-09-03 14:35               ` sunrpc: dynamically allocate credcache hashtables [was: Re: VM issue causing high CPU loads] Miquel van Smoorenburg
@ 2009-09-03 20:05               ` Simon Kirby
  2009-09-03 20:49                 ` Trond Myklebust
  2009-09-03 21:21                   ` Muntz, Daniel
  2 siblings, 2 replies; 24+ messages in thread
From: Simon Kirby @ 2009-09-03 20:05 UTC
  To: Trond Myklebust
  Cc: Yohan, Andrew Morton, linux-kernel, linux-nfs, Neil Brown,
	J. Bruce Fields, mikevs

On Thu, Sep 03, 2009 at 10:02:06AM -0400, Trond Myklebust wrote:

> OK, so 16 hash buckets are likely to be filled with ~10^6 entries each.
> I can see that might be a performance issue...

We have a similar setup with millions of UIDs over NFS (currently NFSv3).
I _wish_ there were a way to use NFSv4 without having to use name-mapped
UIDs and GIDs, since our user and group names come from MySQL anyway, and
are guaranteed to be consistent across machines.

Why on earth does NFSv4 force the use of names?

I was considering hacking the code to stick IDs in there anyway, but I
haven't looked at the feasibility of this.  I suspect this would break or
complicate other things, but the current NFSv4 design just seems like an
incredible waste for this case.

Simon-


* Re: VM issue causing high CPU loads
  2009-09-03 20:05               ` VM issue causing high CPU loads Simon Kirby
@ 2009-09-03 20:49                 ` Trond Myklebust
  2009-09-03 22:22                   ` Simon Kirby
  2009-09-03 21:21                   ` Muntz, Daniel
  1 sibling, 1 reply; 24+ messages in thread
From: Trond Myklebust @ 2009-09-03 20:49 UTC
  To: Simon Kirby
  Cc: Yohan, Andrew Morton, linux-kernel, linux-nfs, Neil Brown,
	J. Bruce Fields, mikevs

On Thu, 2009-09-03 at 13:05 -0700, Simon Kirby wrote:
> On Thu, Sep 03, 2009 at 10:02:06AM -0400, Trond Myklebust wrote:
> 
> > OK, so 16 hash buckets are likely to be filled with ~10^6 entries each.
> > I can see that might be a performance issue...
> 
> We have a similar setup with millions of UIDs over NFS (currently NFSv3).
> I _wish_ there were a way to use NFSv4 without having to use name-mapped
> UIDs and GIDs, since our user and group names come from MySQL anyway, and
> are guaranteed to be consistent across machines.

That's a separate issue.

I'm working on increasing the idmapper scalability, however another
project is currently taking up most of my time. I can't guarantee that
the revised idmapper code will be finished in time to allow for
inclusion in 2.6.32.

> Why on earth does NFSv4 force the use of names?

NFSv4 aspires to be an internet-wide protocol, and so you cannot use
uids/gids: they just aren't guaranteed to represent a unique user
outside your local LDAP/NIS or /etc/passwd domain. Furthermore, uids and
gids are a posix construct. They simply don't work in environments where
you may have lots of non-posix systems.

Trond



* RE: VM issue causing high CPU loads
  2009-09-03 20:05               ` VM issue causing high CPU loads Simon Kirby
@ 2009-09-03 21:21                   ` Muntz, Daniel
  2009-09-03 21:21                   ` Muntz, Daniel
  1 sibling, 0 replies; 24+ messages in thread
From: Muntz, Daniel @ 2009-09-03 21:21 UTC
  To: Simon Kirby, Trond Myklebust
  Cc: Yohan, Andrew Morton, linux-kernel, linux-nfs, Neil Brown,
	J. Bruce Fields, mikevs

Amen.  I understand that v4 wants to extend across domains, etc., but it
goes out of its way to prevent the use of uids/gids, which in the vast
majority of installations would work just fine and wouldn't incur the
overhead of the mapping/unmapping operations.  There's no reason
uids/gids couldn't coexist with string names.  If the 4.0 spec had a
slightly different version of this paragraph:

   To provide a greater degree of compatibility with previous versions
   of NFS (i.e., v2 and v3), which identified users and groups by 32-bit
   unsigned uid's and gid's, owner and group strings that consist of
   decimal numeric values with no leading zeros can be given a special
   interpretation by clients and servers which choose to provide such
   support.  The receiver may treat such a user or group string as
   representing the same user as would be represented by a v2/v3 uid or
   gid having the corresponding numeric value.  A server is not
   obligated to accept such a string, but may return an NFS4ERR_BADOWNER
   instead.  To avoid this mechanism being used to subvert user and
   group translation, so that a client might pass all of the owners and
   groups in numeric form, a server SHOULD return an NFS4ERR_BADOWNER
   error when there is a valid translation for the user or owner
   designated in this way.  In that case, the client must use the
   appropriate name@domain string and not the special form for
   compatibility.

i.e., take out the "subvert" portion, and just plain allow string
representations of uids/gids, then at least the conversion would just be
an atoi and itoa.  Even better, allow the uids/gids to be used directly
and avoid the atoi/itoa, perhaps with a flag.  Either case is better
than idmapd and getting EDELAY and an X-second pause in odd places
because NFS has to go to userspace for a translation.

  -Dan Quixote

> -----Original Message-----
> From: Simon Kirby [mailto:sim@hostway.ca] 
> Sent: Thursday, September 03, 2009 1:06 PM
> To: Trond Myklebust
> Cc: Yohan; Andrew Morton; linux-kernel@vger.kernel.org; 
> linux-nfs@vger.kernel.org; Neil Brown; J. Bruce Fields; 
> mikevs@xs4all.net
> Subject: Re: VM issue causing high CPU loads
> 
> On Thu, Sep 03, 2009 at 10:02:06AM -0400, Trond Myklebust wrote:
> 
> > OK, so 16 hash buckets are likely to be filled with ~10^6 entries each.
> > I can see that might be a performance issue...
> 
> We have a similar setup with millions of UIDs over NFS (currently NFSv3).
> I _wish_ there were a way to use NFSv4 without having to use name-mapped
> UIDs and GIDs, since our user and group names come from MySQL anyway, and
> are guaranteed to be consistent across machines.
> 
> Why on earth does NFSv4 force the use of names?
> 
> I was considering hacking the code to stick IDs in there anyway, but I
> haven't looked at the feasibility of this.  I suspect this would break or
> complicate other things, but the current NFSv4 design just seems like an
> incredible waste for this case.
> 
> Simon-
> 

* Re: VM issue causing high CPU loads
  2009-09-03 20:49                 ` Trond Myklebust
@ 2009-09-03 22:22                   ` Simon Kirby
  2009-09-04 12:31                       ` Trond Myklebust
  0 siblings, 1 reply; 24+ messages in thread
From: Simon Kirby @ 2009-09-03 22:22 UTC
  To: Trond Myklebust
  Cc: Yohan, Andrew Morton, linux-kernel, linux-nfs, Neil Brown,
	J. Bruce Fields, mikevs

On Thu, Sep 03, 2009 at 04:49:25PM -0400, Trond Myklebust wrote:

> I'm working on increasing the idmapper scalability, however another
> project is currently taking up most of my time. I can't guarantee that
> the revised idmapper code will be finished in time to allow for
> inclusion in 2.6.32.

Sure, improving it would be nice for cases where it's needed, but in
environments where all IDs are consistent (by design), it just seems
silly to force this extra work for zero gain.

> NFSv4 aspires to be an internet-wide protocol, and so you cannot use
> uids/gids: they just aren't guaranteed to represent a unique user
> outside your local LDAP/NIS or /etc/passwd domain. Furthermore, uids and
> gids are a posix construct. They simply don't work in environments where
> you may have lots of non-posix systems.

So, for environments with all POSIX systems, what do you think about
perhaps a mount or export flag that violates the spec on purpose to allow
numeric IDs to be used?

I can understand that the quiet use of IDs if name-to-user mapping fails
will cause security issues in environments without consistent users, so
it would now be unsafe to turn this on silently.  However, making this
an option seems reasonable to me.

(Not that I know what I'm doing.)

Simon-


* Re: VM issue causing high CPU loads
  2009-09-03 22:22                   ` Simon Kirby
@ 2009-09-04 12:31                       ` Trond Myklebust
  0 siblings, 0 replies; 24+ messages in thread
From: Trond Myklebust @ 2009-09-04 12:31 UTC
  To: Simon Kirby
  Cc: Yohan, Andrew Morton, linux-kernel, linux-nfs, Neil Brown,
	J. Bruce Fields, mikevs

On Thu, 2009-09-03 at 15:22 -0700, Simon Kirby wrote:
> So, for environments with all POSIX systems, what do you think about
> perhaps a mount or export flag that violates the spec on purpose to allow
> numeric IDs to be used?

No! I'm not interested in starting a LinuxPrivateNFSv4 protocol on top
of everything else we've got...

Trond


end of thread, other threads:[~2009-09-04 12:31 UTC | newest]

Thread overview: 24+ messages
2009-08-24 14:23 VM issue causing high CPU loads Yohan
2009-08-24 23:21 ` Andrew Morton
2009-08-24 23:21   ` Andrew Morton
2009-08-26 11:08   ` Mel Gorman
2009-08-26 11:08     ` Mel Gorman
2009-08-26 11:55     ` Yohan
2009-08-26 11:53   ` Yohan
2009-08-27  8:39   ` Yohan
2009-08-27  8:39     ` Yohan
2009-08-31 20:39     ` Yohan
2009-09-03  0:06       ` Andrew Morton
2009-09-03  0:06         ` Andrew Morton
2009-09-03 13:01         ` Trond Myklebust
2009-09-03 13:39           ` Yohan
2009-09-03 14:02             ` Trond Myklebust
2009-09-03 14:08               ` Yohan
2009-09-03 14:35               ` sunrpc: dynamically allocate credcache hashtables [was: Re: VM issue causing high CPU loads] Miquel van Smoorenburg
2009-09-03 20:05               ` VM issue causing high CPU loads Simon Kirby
2009-09-03 20:49                 ` Trond Myklebust
2009-09-03 22:22                   ` Simon Kirby
2009-09-04 12:31                     ` Trond Myklebust
2009-09-04 12:31                       ` Trond Myklebust
2009-09-03 21:21                 ` Muntz, Daniel
2009-09-03 21:21                   ` Muntz, Daniel
