* RE: multiple servers per automount
@ 2003-10-10 15:16 Ogden, Aaron A.
2003-10-13 3:23 ` [NFS] " Ian Kent
0 siblings, 1 reply; 21+ messages in thread
From: Ogden, Aaron A. @ 2003-10-10 15:16 UTC (permalink / raw)
To: Ian Kent, Mike Waychison; +Cc: autofs mailing list, nfs
-----Original Message-----
From: Ian Kent [mailto:raven@themaw.net]
Sent: Thursday, October 09, 2003 8:09 PM
To: Mike Waychison
Cc: Ogden, Aaron A.; autofs mailing list; nfs@lists.sourceforge.net
Subject: Re: [autofs] multiple servers per automount
>> The maximum number of plain pseudo-block device filesystems on a given
>> filesystem is limited to 256. (This includes proc, autofs, nfs..).
>>
>> This is because pseudo-block filesystems all use major 0, and each have
>> a different minor (thus the 256 limit).
>>
>> There are however patches floating around (look at SuSE's kernels, I'm
>> not sure about RH) that allow n majors to be used (default 5). This
>> gives you 1280 mounts, a big step up :)
>>
>
> But as Aaron and I know things go pear shaped at just shy of 800 mounts
> with RedHat kernels. They have the more-unnamed patch.
>
> So this would indicate that even if there is a device system that can
> increase the number of unnamed devices that subsystems like NFS cannot
> handle this many mounts.
Maybe. I'm not 100% certain though. Currently I am holding steady at
710 active mounts. I am going to write a little script to mount more in
small increments, i.e. read a list of ~1000 mountpoints from /home, mount
a few of them, check the filesystems, and repeat. This way I will know
exactly where things break down.
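The incremental probe described above could be sketched roughly as follows. This is a minimal illustration, not the script Aaron wrote: probe_mount_limit(), the two callback types, and the simulated backend (fake_mount, fake_check, FAKE_LIMIT) are all hypothetical names, and the 800 figure in the simulation is only a stand-in for wherever the real limit turns out to be.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callbacks (not from the thread): try_mount() returns 0 on
 * success; check_ok() returns nonzero while the mounts still look healthy. */
typedef int (*mount_fn)(const char *mountpoint);
typedef int (*check_fn)(void);

/*
 * Mount entries from `list` in batches of `step`, checking the filesystems
 * after each batch. Returns how many entries were mounted before the first
 * failure, or `count` if nothing failed.
 */
size_t probe_mount_limit(const char *const *list, size_t count, size_t step,
                         mount_fn try_mount, check_fn check_ok)
{
    size_t mounted = 0;

    if (step == 0)
        step = 1;

    while (mounted < count) {
        size_t end = mounted + step;

        if (end > count)
            end = count;

        for (size_t i = mounted; i < end; i++) {
            if (try_mount(list[i]) != 0)
                return i;       /* index of the first mount that failed */
        }
        mounted = end;

        if (!check_ok())
            return mounted;     /* batch mounted, but something is unhealthy */
    }
    return mounted;
}

/* Simulated backend, for illustration only: mounts start failing once
 * FAKE_LIMIT are active. The starting value mirrors the 710 active mounts. */
enum { FAKE_LIMIT = 800 };
static size_t fake_active = 710;

static int fake_mount(const char *mountpoint)
{
    (void)mountpoint;
    if (fake_active >= FAKE_LIMIT)
        return -1;
    fake_active++;
    return 0;
}

static int fake_check(void)
{
    return fake_active <= FAKE_LIMIT;
}
```

A real version would replace the two fake callbacks with something that actually mounts and stats each path; the small step size is what pins down the exact breaking point.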
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [NFS] RE: multiple servers per automount
2003-10-10 15:16 multiple servers per automount Ogden, Aaron A.
@ 2003-10-13 3:23 ` Ian Kent
2003-10-14 7:05 ` Joseph V Moss
0 siblings, 1 reply; 21+ messages in thread
From: Ian Kent @ 2003-10-13 3:23 UTC (permalink / raw)
To: Ogden, Aaron A.; +Cc: autofs mailing list, nfs, Mike Waychison
On Fri, 10 Oct 2003, Ogden, Aaron A. wrote:
>
>
> > So this would indicate that even if there is a device system that can
> > increase the number of unnamed devices that subsystems like NFS cannot
> > handle this many mounts.
>
> Maybe. I'm not 100% certain though. Currently I am holding steady at
> 710 active mounts, I am going to write a little script to mount more in
> small increments, ie. read a list of ~1000 mountpoints from /home, mount
> a few of them, check the filesystems, and repeat... this way I will know
> exactly where things break down.
Interesting.
If you can edge it up then it's probably not an available port
restriction.
There may be more than one issue at work here.
--
,-._|\ Ian Kent
/ \ Perth, Western Australia
*_.--._/ E-mail: raven@themaw.net
v Web: http://themaw.net/
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [NFS] RE: multiple servers per automount
2003-10-13 3:23 ` [NFS] " Ian Kent
@ 2003-10-14 7:05 ` Joseph V Moss
2003-10-14 13:37 ` Ian Kent
0 siblings, 1 reply; 21+ messages in thread
From: Joseph V Moss @ 2003-10-14 7:05 UTC (permalink / raw)
To: Ian Kent; +Cc: Ogden, Aaron A., autofs mailing list, nfs, Mike Waychison
> On Fri, 10 Oct 2003, Ogden, Aaron A. wrote:
>
> >
> >
> > > So this would indicate that even if there is a device system that can
> > > increase the number of unnamed devices that subsystems like NFS cannot
> > > handle this many mounts.
> >
> > Maybe. I'm not 100% certain though. Currently I am holding steady at
> > 710 active mounts, I am going to write a little script to mount more in
> > small increments, ie. read a list of ~1000 mountpoints from /home, mount
> > a few of them, check the filesystems, and repeat... this way I will know
> > exactly where things break down.
>
> Interesting.
>
> If you can edge it up then it's probably not an available port
> restriction.
>
> There may be more than one issue at work here.
>
The limit is 800 as others have stated. Although, it can be less than that
if something else is already using up some of the reserved UDP ports.
I wrote a patch long ago against a 2.2.x kernel to enable it to use
multiple majors for NFS mounts (like the patches now common in several
distros). I then ran into the 800 limit in the RPC layer. After changing
the RPC layer to count up from 0, instead of down from 800, with no real
upper limit, I was able to mount more than 2000 NFS filesystems simultaneously.
I'm sure I could have done many thousand if I had had that many filesystems
around to mount. Obviously, after 1024, it wasn't using reserved ports
anymore, but it didn't seem to matter.
Unfortunately, while the changes to NFS were easy to port to the 2.4 kernel,
the RPC layer is different enough between 2.2 and 2.4 that it didn't work
right off. Bumping it up to somewhere around 1024 should work, but using
non-reserved ports didn't seem to work when I made a simple attempt.
Of course, the real fix for the NFS layer is the expansion of the minor
numbers that's already occurred in 2.6 and the RPC layer problems should
be fixed by multiplexing multiple mounts on the same port.
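To make the 800 figure concrete, here is a toy userspace model of the two allocation orders described above. It is a sketch under stated assumptions, not the kernel's RPC code: alloc_port_down(), alloc_port_up(), reset_ports() and the port_in_use table are invented for illustration, and the upward search starts at port 1 rather than 0 because port 0 conventionally means "any port".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { RESV_START = 800, PORT_MAX = 65535 };

/* Toy table of bound ports; index 0 is unused. */
static bool port_in_use[PORT_MAX + 1];

/* Forget all bindings. */
void reset_ports(void)
{
    memset(port_in_use, 0, sizeof(port_in_use));
}

/*
 * Search downward from 800. At most 800 ports (800..1) can ever be handed
 * out, fewer if other users already hold some of them.
 * Returns the port, or -1 when none is left.
 */
int alloc_port_down(void)
{
    for (int port = RESV_START; port >= 1; port--) {
        if (!port_in_use[port]) {
            port_in_use[port] = true;
            return port;
        }
    }
    return -1;
}

/*
 * Search upward with no reserved-port cap. Ports above 1023 are no longer
 * reserved ports, which is the trade-off mentioned in the message.
 */
int alloc_port_up(void)
{
    for (int port = 1; port <= PORT_MAX; port++) {
        if (!port_in_use[port]) {
            port_in_use[port] = true;
            return port;
        }
    }
    return -1;
}
```

With one port per mount, the downward search fails on the 801st request, while the upward search passes 2000 without trouble; whether a server accepts requests from non-reserved ports is a separate question the model does not cover.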
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: RE: [autofs] multiple servers per automount
2003-10-14 7:05 ` Joseph V Moss
@ 2003-10-14 13:37 ` Ian Kent
0 siblings, 0 replies; 21+ messages in thread
From: Ian Kent @ 2003-10-14 13:37 UTC (permalink / raw)
To: Joseph V Moss; +Cc: Ogden, Aaron A., autofs mailing list, nfs, Mike Waychison
On Tue, 14 Oct 2003, Joseph V Moss wrote:
> The limit is 800 as others have stated. Although, it can be less than that
> if something else is already using up some of the reserved UDP ports.
>
> I wrote a patch long ago against a 2.2.x kernel to enable it to use
> multiple majors for NFS mounts (like the patches now common in several
> distros). I then ran into the 800 limit in the RPC layer. After changing
> the RPC layer to count up from 0, instead of down from 800, with no real
> upper limit, I was able to mount more than 2000 NFS filesystems simultaneously.
> I'm sure I could have done many thousand if I had had that many filesystems
> around to mount. Obviously, after 1024, it wasn't using reserved ports
> anymore, but it didn't seem to matter.
>
> Unfortunately, while the changes to NFS were easy to port to the 2.4 kernel,
> the RPC layer is different enough between 2.2 and 2.4 that it didn't work
> right off. Bumping it up to somewhere around 1024 should work, but using
> non-reserved ports didn't seem to work when I made a simple attempt.
>
> Of course, the real fix for the NFS layer is the expansion of the minor
> numbers that's already occurred in 2.6 and the RPC layer problems should
> be fixed by multiplexing multiple mounts on the same port.
>
>
I don't see that expansion in 2.6 (test6). It looks to me like the
allocation is done in set_anon_super (in fs/super.c) and that looks like
it is restricted to 256. Please correct me if I'm wrong. I can't see how
there is any change to the number of unnamed devices.
--
,-._|\ Ian Kent
/ \ Perth, Western Australia
*_.--._/ E-mail: raven@themaw.net
v Web: http://themaw.net/
-------------------------------------------------------
This SF.net email is sponsored by: SF.net Giveback Program.
SourceForge.net hosts over 70,000 Open Source Projects.
See the people who have HELPED US provide better services:
Click here: http://sourceforge.net/supporters.php
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [NFS] RE: [autofs] multiple servers per automount
2003-10-14 13:37 ` Ian Kent
@ 2003-10-14 15:52 ` Mike Waychison
-1 siblings, 0 replies; 21+ messages in thread
From: Mike Waychison @ 2003-10-14 15:52 UTC (permalink / raw)
To: Ian Kent
Cc: Joseph V Moss, Ogden, Aaron A.,
autofs mailing list, nfs, Kernel Mailing List
Ian Kent wrote:
>On Tue, 14 Oct 2003, Joseph V Moss wrote:
>
>
>
>>The limit is 800 as others have stated. Although, it can be less than that
>>if something else is already using up some of the reserved UDP ports.
>>
>>I wrote a patch long ago against a 2.2.x kernel to enable it to use
>>multiple majors for NFS mounts (like the patches now common in several
>>distros). I then ran into the 800 limit in the RPC layer. After changing
>>the RPC layer to count up from 0, instead of down from 800, with no real
>>upper limit, I was able to mount more than 2000 NFS filesystems simultaneously.
>>I'm sure I could have done many thousand if I had had that many filesystems
>>around to mount. Obviously, after 1024, it wasn't using reserved ports
>>anymore, but it didn't seem to matter.
>>
>>Unfortunately, while the changes to NFS were easy to port to the 2.4 kernel,
>>the RPC layer is different enough between 2.2 and 2.4 that it didn't work
>>right off. Bumping it up to somewhere around 1024 should work, but using
>>non-reserved ports didn't seem to work when I made a simple attempt.
>>
>>Of course, the real fix for the NFS layer is the expansion of the minor
>>numbers that's already occurred in 2.6 and the RPC layer problems should
>>be fixed by multiplexing multiple mounts on the same port.
>>
>>
>>
>>
>
>I don't see that expansion in 2.6 (test6). It looks to me like the
>allocation is done in set_anon_super (in fs/super.c) and that looks like
>it is restricted to 256. Please correct this for me. I can't see how there
>is any change to the number of unnmaed devices.
>
>
>
Here is the quick fix for this in RH 2.1AS kernels:
http://www.kernelnewbies.org/kernels/rh21as/SOURCES/linux-2.4.9-moreunnamed.patch
It makes unnamed block devices use majors 12, 14, 38, 39, as well as 0.
I don't know if anyone is working out a better scheme for
get_unnamed_dev in 2.6 yet. It does need to be done though. A simple
patch for 2.6 would maybe see the unnamed_dev_in_use bitmap grow to
PAGE_SIZE, automatically allowing for 32768 unnamed devices.
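The numbers in this thread follow from simple arithmetic, sketched below. The helper names are made up for illustration, and the 32768 figure assumes a 4096-byte PAGE_SIZE and 8-bit minors, neither of which is guaranteed on every architecture or kernel.

```c
#include <assert.h>

enum { MINORS_PER_MAJOR = 256 };    /* 8-bit minor number */

/* Unnamed devices available when n_majors majors are set aside for them. */
unsigned long unnamed_devs_with_majors(unsigned long n_majors)
{
    return n_majors * MINORS_PER_MAJOR;
}

/* Unnamed devices trackable by a bitmap of bitmap_bytes bytes (8 bits each). */
unsigned long unnamed_devs_with_bitmap(unsigned long bitmap_bytes)
{
    return bitmap_bytes * 8;
}
```

So major 0 alone gives 256, the five majors 0, 12, 14, 38 and 39 give 5 * 256 = 1280, and a one-page bitmap gives 4096 * 8 = 32768.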
Mike Waychison
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [NFS] RE: [autofs] multiple servers per automount
2003-10-14 15:52 ` [NFS] " Mike Waychison
(?)
@ 2003-10-14 20:44 ` H. Peter Anvin
2003-10-14 23:12 ` Mike Waychison
-1 siblings, 1 reply; 21+ messages in thread
From: H. Peter Anvin @ 2003-10-14 20:44 UTC (permalink / raw)
To: linux-kernel
Followup to: <3F8C1BB6.9010202@sun.com>
By author: Mike Waychison <Michael.Waychison@Sun.COM>
In newsgroup: linux.dev.kernel
>
> Here is the quick fix for this in RH 2.1AS kernels:
>
> http://www.kernelnewbies.org/kernels/rh21as/SOURCES/linux-2.4.9-moreunnamed.patch
>
> It makes unnamed block devices use majors 12, 14, 38, 39, as well as 0.
>
> I don't know if anyone is working out a better scheme for
> get_unnamed_dev in 2.6 yet. It does need to be done though. A simple
> patch for 2.6 would maybe see the unnamed_dev_in_use bitmap grow to
> PAGE_SIZE, automatically allowing for 32768 unnamed devices.
>
dev_t enlargement, which solves this without a bunch of auxiliary
majors, should be in 2.6.
-hpa
--
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
If you send me mail in HTML format I will assume it's spam.
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [NFS] RE: [autofs] multiple servers per automount
2003-10-14 20:44 ` [NFS] RE: [autofs] " H. Peter Anvin
@ 2003-10-14 23:12 ` Mike Waychison
2003-10-15 10:28 ` Ingo Oeser
0 siblings, 1 reply; 21+ messages in thread
From: Mike Waychison @ 2003-10-14 23:12 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: linux-kernel, Ian Kent
[-- Attachment #1: Type: text/plain, Size: 1092 bytes --]
H. Peter Anvin wrote:
> Followup to: <3F8C1BB6.9010202@sun.com>
> By author: Mike Waychison <Michael.Waychison@Sun.COM>
> In newsgroup: linux.dev.kernel
>
>>Here is the quick fix for this in RH 2.1AS kernels:
>>
>>http://www.kernelnewbies.org/kernels/rh21as/SOURCES/linux-2.4.9-moreunnamed.patch
>>
>>It makes unnamed block devices use majors 12, 14, 38, 39, as well as 0.
>>
>>I don't know if anyone is working out a better scheme for
>>get_unnamed_dev in 2.6 yet. It does need to be done though. A simple
>>patch for 2.6 would maybe see the unnamed_dev_in_use bitmap grow to
>>PAGE_SIZE, automatically allowing for 32768 unnamed devices.
>>
>
>
> dev_t enlargement, which solves this without a bunch of auxilliary
> majors, should be in 2.6.
>
> -hpa
The problem still remains in 2.6 that we limit the count to 256. I've
attached a quick patch that I've compiled and tested. I don't know if
there is a better way to handle dynamic assignment of minors (haven't
kept up to date in that realm), but if there is, then we should probably
use it instead.
Mike Waychison
[-- Attachment #2: max_anon.patch --]
[-- Type: text/plain, Size: 881 bytes --]
===== fs/super.c 1.108 vs edited =====
--- 1.108/fs/super.c Wed Oct 1 15:36:45 2003
+++ edited/fs/super.c Tue Oct 14 22:52:12 2003
@@ -528,14 +528,22 @@
* filesystems which don't use real block-devices. -- jrs
*/
-enum {Max_anon = 256};
-static unsigned long unnamed_dev_in_use[Max_anon/(8*sizeof(unsigned long))];
+enum {Max_anon = PAGE_SIZE * 8};
+static void *unnamed_dev_in_use = NULL;
static spinlock_t unnamed_dev_lock = SPIN_LOCK_UNLOCKED;/* protects the above */
int set_anon_super(struct super_block *s, void *data)
{
int dev;
spin_lock(&unnamed_dev_lock);
+
+ if (!unnamed_dev_in_use)
+ unnamed_dev_in_use = (void *)get_zeroed_page(GFP_KERNEL);
+ if (!unnamed_dev_in_use) {
+ spin_unlock(&unnamed_dev_lock);
+ return -ENOMEM;
+ }
+
dev = find_first_zero_bit(unnamed_dev_in_use, Max_anon);
if (dev == Max_anon) {
spin_unlock(&unnamed_dev_lock);
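For readers who want to experiment outside the kernel, the idea behind the patch (a lazily allocated one-page bitmap, searched for the first zero bit) can be modelled in userspace roughly like this. It is a sketch, not the kernel code: anon_dev_alloc(), anon_dev_free() and anon_map are hypothetical names, calloc() stands in for get_zeroed_page(), locking is omitted, and a 4096-byte page is assumed.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

enum { MAP_BYTES = 4096, MAX_ANON = MAP_BYTES * CHAR_BIT };

/* One "page" of bits; bit n set means device number n is in use. */
static unsigned char *anon_map;

/* Returns the lowest free device number, or -1 if none (or out of memory). */
int anon_dev_alloc(void)
{
    if (!anon_map) {
        anon_map = calloc(1, MAP_BYTES);    /* zeroed, like get_zeroed_page() */
        if (!anon_map)
            return -1;
    }

    for (int byte = 0; byte < MAP_BYTES; byte++) {
        if (anon_map[byte] == UCHAR_MAX)
            continue;                       /* every bit in this byte is taken */

        for (int bit = 0; bit < CHAR_BIT; bit++) {
            if (!(anon_map[byte] & (1u << bit))) {
                anon_map[byte] |= (unsigned char)(1u << bit);
                return byte * CHAR_BIT + bit;
            }
        }
    }
    return -1;                              /* all MAX_ANON numbers in use */
}

/* Marks a device number as free again; out-of-range values are ignored. */
void anon_dev_free(int dev)
{
    if (anon_map && dev >= 0 && dev < MAX_ANON)
        anon_map[dev / CHAR_BIT] &= (unsigned char)~(1u << (dev % CHAR_BIT));
}
```

Freed numbers are reused lowest-first, and the allocator only reports exhaustion once all 32768 bits are set.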
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [NFS] RE: [autofs] multiple servers per automount
2003-10-14 23:12 ` Mike Waychison
@ 2003-10-15 10:28 ` Ingo Oeser
2003-10-15 16:16 ` Mike Waychison
2003-10-23 13:37 ` Ian Kent
0 siblings, 2 replies; 21+ messages in thread
From: Ingo Oeser @ 2003-10-15 10:28 UTC (permalink / raw)
To: Mike Waychison
Cc: linux-kernel, Ian Kent
On Wednesday 15 October 2003 01:12, Mike Waychison wrote:
> The problem still remains in 2.6 that we limit the count to 256. I've
> attached a quick patch that I've compiled and tested. I don't know if
> there is a better way to handle dynamic assignment of minors (haven't
> kept up to date in that realm), but if there is, then we should probably
> use it instead.
In your patch you allocate inside the spinlock.
I would suggest doing something like the following:
void *local = NULL;
/* Allocate before taking the lock: GFP_KERNEL may sleep. */
if (!unnamed_dev_in_use) {
	local = (void *)get_zeroed_page(GFP_KERNEL);
	if (!local)
		return -ENOMEM;
}
spin_lock(&unnamed_dev_lock);
mb();
/* Re-check under the lock; someone else may have installed a page. */
if (!unnamed_dev_in_use) {
	unnamed_dev_in_use = local;
	/* Used globally, don't free now */
	local = NULL;
}
/*
 * Do the lookup and alloc
 */
spin_unlock(&unnamed_dev_lock);
/* Free page, because of race on allocation. */
if (local)
	free_page((unsigned long)local);
Which will swap the pointers atomically and still allocate outside the
non-sleeping lock.
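The same "allocate first, publish under the lock, free the loser" pattern can be tried in userspace. The sketch below is only an analogue: ensure_map(), shared_map and map_lock are invented names, a pthread mutex stands in for the spinlock, and calloc() stands in for get_zeroed_page(). Unlike the snippet above, the first check is also done under the lock, so the userspace version has no unlocked read of the shared pointer.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

enum { MAP_BYTES = 4096 };

static void *shared_map;    /* published once, never replaced */
static pthread_mutex_t map_lock = PTHREAD_MUTEX_INITIALIZER;

/* Makes sure shared_map exists. Returns 0 on success, -1 on out of memory. */
int ensure_map(void)
{
    void *local = NULL;
    int need;

    /* Peek under the lock, but do not allocate while holding it. */
    pthread_mutex_lock(&map_lock);
    need = (shared_map == NULL);
    pthread_mutex_unlock(&map_lock);

    if (need) {
        local = calloc(1, MAP_BYTES);   /* may block; no lock is held here */
        if (!local)
            return -1;
    }

    /* Re-check: another thread may have published its page meanwhile. */
    pthread_mutex_lock(&map_lock);
    if (!shared_map) {
        shared_map = local;
        local = NULL;                   /* now owned globally, do not free */
    }
    pthread_mutex_unlock(&map_lock);

    free(local);                        /* lost the race; free(NULL) is a no-op */
    return 0;
}
```

The cost of the pattern is an occasional wasted allocation when two callers race; the benefit is that nothing that can block ever runs with the lock held.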
Regards
Ingo Oeser
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [NFS] RE: [autofs] multiple servers per automount
2003-10-15 10:28 ` Ingo Oeser
@ 2003-10-15 16:16 ` Mike Waychison
2003-10-23 13:37 ` Ian Kent
1 sibling, 0 replies; 21+ messages in thread
From: Mike Waychison @ 2003-10-15 16:16 UTC (permalink / raw)
To: Ingo Oeser; +Cc: Mike Waychison, linux-kernel, Ian Kent
[-- Attachment #1: Type: text/plain, Size: 717 bytes --]
Ingo Oeser wrote:
> On Wednesday 15 October 2003 01:12, Mike Waychison wrote:
>
>>The problem still remains in 2.6 that we limit the count to 256. I've
>>attached a quick patch that I've compiled and tested. I don't know if
>>there is a better way to handle dynamic assignment of minors (haven't
>>kept up to date in that realm), but if there is, then we should probably
>> use it instead.
>
>
>
> In your patch you allocate inside the spinlock.
>
> I would suggest to do sth. like the following:
>
Better yet.. we could move it into an __init section that will panic if
the allocation fails (this should be the desired behaviour..). This way
we don't even have to grab the lock either.
Mike Waychison
[-- Attachment #2: max_anon_2.patch --]
[-- Type: text/plain, Size: 1592 bytes --]
===== fs/namespace.c 1.49 vs edited =====
--- 1.49/fs/namespace.c Thu Jul 17 22:30:49 2003
+++ edited/fs/namespace.c Wed Oct 15 15:59:11 2003
@@ -23,6 +23,7 @@
#include <linux/mount.h>
#include <asm/uaccess.h>
+extern void __init super_init(void);
extern int __init init_rootfs(void);
extern int __init sysfs_init(void);
@@ -1154,6 +1155,7 @@
d++;
i--;
} while (i);
+ super_init();
sysfs_init();
init_rootfs();
init_mount_tree();
===== fs/super.c 1.108 vs edited =====
--- 1.108/fs/super.c Wed Oct 1 15:36:45 2003
+++ edited/fs/super.c Wed Oct 15 15:59:50 2003
@@ -24,6 +24,7 @@
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/smp_lock.h>
+#include <linux/init.h>
#include <linux/acct.h>
#include <linux/blkdev.h>
#include <linux/quotaops.h>
@@ -527,15 +528,22 @@
* Unnamed block devices are dummy devices used by virtual
* filesystems which don't use real block-devices. -- jrs
*/
-
-enum {Max_anon = 256};
-static unsigned long unnamed_dev_in_use[Max_anon/(8*sizeof(unsigned long))];
+enum {Max_anon = PAGE_SIZE * 8};
+static void *unnamed_dev_in_use;
static spinlock_t unnamed_dev_lock = SPIN_LOCK_UNLOCKED;/* protects the above */
+void __init super_init(void)
+{
+ unnamed_dev_in_use = (void *)get_zeroed_page(GFP_KERNEL);
+ if (!unnamed_dev_in_use)
+ panic("Could not allocate anonymous device map");
+}
+
int set_anon_super(struct super_block *s, void *data)
{
int dev;
spin_lock(&unnamed_dev_lock);
+
dev = find_first_zero_bit(unnamed_dev_in_use, Max_anon);
if (dev == Max_anon) {
spin_unlock(&unnamed_dev_lock);
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [NFS] RE: [autofs] multiple servers per automount
2003-10-15 10:28 ` Ingo Oeser
2003-10-15 16:16 ` Mike Waychison
@ 2003-10-23 13:37 ` Ian Kent
2003-10-23 17:00 ` Mike Waychison
1 sibling, 1 reply; 21+ messages in thread
From: Ian Kent @ 2003-10-23 13:37 UTC (permalink / raw)
To: Ingo Oeser; +Cc: Mike Waychison, Kernel Mailing List
Please forgive my ignorance Ingo but ...
I suffer from race condition blindness. A terrible affliction when one is
trying to understand the subtleties of the kernel, but I'm trying.
While I am not questioning your suggestion, I have thought about the code
and fail to see the race you point out. Please help me along.
On Wed, 15 Oct 2003, Ingo Oeser wrote:
> On Wednesday 15 October 2003 01:12, Mike Waychison wrote:
> > The problem still remains in 2.6 that we limit the count to 256. I've
> > attached a quick patch that I've compiled and tested. I don't know if
> > there is a better way to handle dynamic assignment of minors (haven't
> > kept up to date in that realm), but if there is, then we should probably
> > use it instead.
>
>
> In your patch you allocate inside the spinlock.
Do you mean we don't want to sleep under the spin lock?
Would a GFP_ATOMIC make a difference to the analysis?
>
> I would suggest to do sth. like the following:
>
> void *local;
> if (!unamed_dev_inuse) {
> local = get_zeroed_page(GFP_KERNEL);
>
> if (!local)
> return -ENOMEM;
> }
>
> spinlock(&unamed_dev_lock);
> mb();
> if (!unamed_dev_inuse) {
> unamed_dev_inuse = local;
>
> /* Used globally, don't free now */
> local = NULL;
> }
>
> /*
> Do the lookup and alloc
> */
>
> spinunlock(&unamed_dev_lock);
>
> /* Free page, because of race on allocation. */
> if (local)
> free_page(local);
>
>
> Which will swap the pointers atomically and still alloc outside the
> non-sleeping locking.
As I said please give me a hint about your thinking here.
And the use of a memory barrier as well ... umm?
--
,-._|\ Ian Kent
/ \ Perth, Western Australia
*_.--._/ E-mail: raven@themaw.net
v Web: http://themaw.net/
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [NFS] RE: [autofs] multiple servers per automount
2003-10-23 13:37 ` Ian Kent
@ 2003-10-23 17:00 ` Mike Waychison
2003-10-23 17:09 ` Tim Hockin
2003-10-24 0:47 ` Ian Kent
0 siblings, 2 replies; 21+ messages in thread
From: Mike Waychison @ 2003-10-23 17:00 UTC (permalink / raw)
To: Ian Kent; +Cc: Ingo Oeser, Kernel Mailing List
Ian Kent wrote:
>On Wed, 15 Oct 2003, Ingo Oeser wrote:
>
>
>>In your patch you allocate inside the spinlock.
>>
>>
>
>Do you mean we don't want to sleep under the spin lock?
>Would a GFP_ATOMIC make a difference to the analysis?
>
>
Yes, sleeping within a spinlock is bad practice because it may
eventually deadlock. Pretend that the lock is taken, the call to
kmalloc is made, the mm system doesn't have any immediately free memory
and through some flow of execution requires that some pseudo-block
device backed filesystem be mounted -> deadlock. I have no
idea if this is currently a likely scenario, however not sleeping within
a lock is 'The Right Thing'; sleeping there should be avoided at all costs.
GFP_ATOMIC should be avoided in most circumstances, particularly in
environments where the code can be refactored to allow for the sleep.
It is less likely to find free memory atomically and is thus more likely
to fail.
>>I would suggest to do sth. like the following:
>>
>>void *local;
>>if (!unamed_dev_inuse) {
>> local = get_zeroed_page(GFP_KERNEL);
>>
>> if (!local)
>> return -ENOMEM;
>>}
>>
>>spinlock(&unamed_dev_lock);
>>mb();
>>if (!unamed_dev_inuse) {
>> unamed_dev_inuse = local;
>>
>> /* Used globally, don't free now */
>> local = NULL;
>>}
>>
>>/*
>> Do the lookup and alloc
>> */
>>
>>spinunlock(&unamed_dev_lock);
>>
>>/* Free page, because of race on allocation. */
>>if (local)
>> free_page(local);
>>
>>
>>Which will swap the pointers atomically and still alloc outside the
>>non-sleeping locking.
>>
>>
>
>As I said please give me a hint about your thinking here.
>And the use of a memory barrier as well ... umm?
>
>
>
Ingo's patch simply moved the allocation outside the spinlock. See my
later patch about moving the allocation to an __init section, which is
probably the cleaner thing to do and doesn't require grabbing the page
and using it conditionally.
As for the mb(), I *thought* that a spinlock implied a memory barrier,
however I think he put it there because it solves the age-old badness of
double-checked locking (search Google for good explanations of the badness).
--
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice
mailto: Michael.Waychison@Sun.COM
http://www.sun.com
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE: The opinions expressed in this email are held by me,
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [NFS] RE: [autofs] multiple servers per automount
2003-10-23 17:00 ` Mike Waychison
@ 2003-10-23 17:09 ` Tim Hockin
2003-10-24 0:47 ` Ian Kent
1 sibling, 0 replies; 21+ messages in thread
From: Tim Hockin @ 2003-10-23 17:09 UTC (permalink / raw)
To: Mike Waychison; +Cc: Ian Kent, Ingo Oeser, Kernel Mailing List
On Thu, Oct 23, 2003 at 01:00:57PM -0400, Mike Waychison wrote:
> >Would a GFP_ATOMIC make a difference to the analysis?
> Yes, sleeping within a spinlock is bad practice because it may
> eventually deadlock. Pretend that the lock is taken, the call to
> kmalloc is made, the mm system doesn't have any immidiately free memory
> and through some flow of execution requires that a some pseudo-block
> device backed filesystem needs to be mounted -> deadlock. I have no
> idea if this is currently a likely scenario, however not sleeping within
> a lock is 'The Right Thing' and should be avoided at all costs.
It's worse than that. It's forbidden. It's a VERY likely deadlock scenario
in the general sense, even if this particular case is not. If you need to
lock something and you need to sleep holding that lock, use a semaphore.
--
Notice that as computers are becoming easier and easier to use,
suddenly there's a big market for "Dummies" books. Cause and effect,
or merely an ironic juxtaposition of unrelated facts?
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [NFS] RE: [autofs] multiple servers per automount
2003-10-23 17:00 ` Mike Waychison
2003-10-23 17:09 ` Tim Hockin
@ 2003-10-24 0:47 ` Ian Kent
2003-10-24 1:42 ` Tim Hockin
1 sibling, 1 reply; 21+ messages in thread
From: Ian Kent @ 2003-10-24 0:47 UTC (permalink / raw)
To: Mike Waychison; +Cc: Ingo Oeser, Kernel Mailing List
Thanks for the description.
I thought it was bad to call a function that could block while
holding a lock. At least I was close to right this time.
I wasn't aware of the badness; I'll see what I can find.
On Thu, 23 Oct 2003, Mike Waychison wrote:
>
> Ingo's patch simply moved the allocation outside the spinlock.. See my
> later patch about moving the allocation to and __init section, which is
> probably the cleaner thing to do and doesn't require grabbing the page
> and using it conditionally.
>
Missed that when I returned to it. Found it now.
That is clearly a better way to do it.
Is there any chance this would be accepted into 2.6.0?
I think it's quite important, hopefully others do as well.
--
,-._|\ Ian Kent
/ \ Perth, Western Australia
*_.--._/ E-mail: raven@themaw.net
v Web: http://themaw.net/
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [NFS] RE: [autofs] multiple servers per automount
2003-10-24 0:47 ` Ian Kent
@ 2003-10-24 1:42 ` Tim Hockin
0 siblings, 0 replies; 21+ messages in thread
From: Tim Hockin @ 2003-10-24 1:42 UTC (permalink / raw)
To: Ian Kent; +Cc: Mike Waychison, Ingo Oeser, Kernel Mailing List, torvalds
Recap: Mike Waychison posted a simple patch to make the Max_anon bit array
(NFS mounts etc.) use exactly one page.
On Fri, Oct 24, 2003 at 08:47:57AM +0800, Ian Kent wrote:
> I there any chance this would be accepted into 2.6.0?
>
> I think it's quite important, hopefully others do as well.
Wouldn't it be saner to have a sysctl to adjust that? From 1 page to
2^20/(PAGE_SIZE * CHAR_BIT) pages? Perhaps just in page-sized increments?
This would be a simple patch... But maybe it's not 'stabilization' for
2.6.0.
Maybe the simple version in 2.6.0 and the right version in 2.6.1?
Linus?
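The range proposed for such a sysctl works out as below. This is just the arithmetic from the message written as code; max_anon_pages() and anon_devs_for_pages() are hypothetical helpers, the 2^20 comes from the message itself (it matches a 20-bit minor number), and the concrete results assume a 4096-byte page.

```c
#include <assert.h>
#include <limits.h>

enum { MINOR_BITS = 20 };   /* 2^20 possible minors, as in the message */

/* Largest useful bitmap size, in pages: 2^20 / (page_size * CHAR_BIT). */
unsigned long max_anon_pages(unsigned long page_size)
{
    return (1UL << MINOR_BITS) / (page_size * CHAR_BIT);
}

/* Unnamed devices trackable with the given number of bitmap pages. */
unsigned long anon_devs_for_pages(unsigned long pages, unsigned long page_size)
{
    return pages * page_size * CHAR_BIT;
}
```

With 4096-byte pages the setting would range from 1 page (32768 devices) to 32 pages (1048576 devices).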
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [NFS] RE: [autofs] multiple servers per automount
2003-10-14 15:52 ` [NFS] " Mike Waychison
(?)
@ 2003-10-15 7:22 ` Ian Kent
-1 siblings, 0 replies; 21+ messages in thread
From: Ian Kent @ 2003-10-15 7:22 UTC (permalink / raw)
To: Mike Waychison
Cc: Joseph V Moss, Ogden, Aaron A.,
autofs mailing list, nfs, Kernel Mailing List
On Tue, 14 Oct 2003, Mike Waychison wrote:
> Ian Kent wrote:
>
> >On Tue, 14 Oct 2003, Joseph V Moss wrote:
> >
> >
> >
> >>The limit is 800 as others have stated. Although, it can be less than that
> >>if something else is already using up some of the reserved UDP ports.
> >>
> >>I wrote a patch long ago against a 2.2.x kernel to enable it to use
> >>multiple majors for NFS mounts (like the patches now common in several
> >>distros). I then ran into the 800 limit in the RPC layer. After changing
> >>the RPC layer to count up from 0, instead of down from 800, with no real
> >>upper limit, I was able to mount more than 2000 NFS filesystems simultaneously.
> >>I'm sure I could have done many thousand if I had had that many filesystems
> >>around to mount. Obviously, after 1024, it wasn't using reserved ports
> >>anymore, but it didn't seem to matter.
> >>
> >>Unfortunately, while the changes to NFS were easy to port to the 2.4 kernel,
> >>the RPC layer is different enough between 2.2 and 2.4 that it didn't work
> >>right off. Bumping it up to somewhere around 1024 should work, but using
> >>non-reserved ports didn't seem to work when I made a simple attempt.
> >>
> >>Of course, the real fix for the NFS layer is the expansion of the minor
> >>numbers that's already occurred in 2.6 and the RPC layer problems should
> >>be fixed by multiplexing multiple mounts on the same port.
> >>
> >>
> >>
> >>
> >
> >I don't see that expansion in 2.6 (test6). It looks to me like the
> >allocation is done in set_anon_super (in fs/super.c) and that looks like
> >it is restricted to 256. Please correct this for me. I can't see how there
> >is any change to the number of unnamed devices.
> >
> >
> >
>
> Here is the quick fix for this in RH 2.1AS kernels:
>
> http://www.kernelnewbies.org/kernels/rh21as/SOURCES/linux-2.4.9-moreunnamed.patch
>
> It makes unnamed block devices use majors 12, 14, 38, 39, as well as 0.
>
> I don't know if anyone is working out a better scheme for
> get_unnamed_dev in 2.6 yet. It does need to be done though. A simple
> patch for 2.6 would maybe see the unnamed_dev_in_use bitmap grow to
> PAGE_SIZE, automatically allowing for 32768 unnamed devices.
>
OK. Sounds like a good job for me to do (simple - maybe).
I'll spend a while looking for possible side effects.
Do you think that the possible NFS port allocation problems should hold up
this work or should it drive updates to NFS?
Comments from anyone about where to check and what to watch out for are
welcome.
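
The two capacity figures quoted above (majors 0, 12, 14, 38, 39 versus a
PAGE_SIZE bitmap) can be checked with a small model. This is a sketch only: it
assumes 4 KiB pages and 8-bit minors per major, and AnonBitmap is a
hypothetical stand-in for the unnamed_dev_in_use bitmap, not the kernel's
implementation.

```python
CHAR_BIT = 8
PAGE_SIZE = 4096                          # assumption: 4 KiB pages
MINORS_PER_MAJOR = 256                    # 8-bit minor numbers
MOREUNNAMED_MAJORS = (0, 12, 14, 38, 39)  # majors named in the quoted message


def capacity_with_majors(majors=MOREUNNAMED_MAJORS):
    """Unnamed devices available when each major contributes 256 minors."""
    return len(majors) * MINORS_PER_MAJOR


def capacity_with_bitmap(bitmap_bytes=PAGE_SIZE):
    """Unnamed devices available with one bit per device in the bitmap."""
    return bitmap_bytes * CHAR_BIT


class AnonBitmap:
    """Toy first-free-bit allocator over a bitmap of `size_bytes` bytes."""

    def __init__(self, size_bytes=PAGE_SIZE):
        self.bits = bytearray(size_bytes)

    def get(self):
        """Mark and return the lowest free index, or None if all are used."""
        for index in range(len(self.bits) * CHAR_BIT):
            byte, mask = index // CHAR_BIT, 1 << (index % CHAR_BIT)
            if not self.bits[byte] & mask:
                self.bits[byte] |= mask
                return index
        return None

    def put(self, index):
        """Release a previously allocated index."""
        self.bits[index // CHAR_BIT] &= ~(1 << (index % CHAR_BIT)) & 0xFF
```

With these assumptions the five-major scheme gives 5 * 256 = 1280 devices and
the one-page bitmap gives 4096 * 8 = 32768.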
--
,-._|\ Ian Kent
/ \ Perth, Western Australia
*_.--._/ E-mail: raven@themaw.net
v Web: http://themaw.net/
^ permalink raw reply [flat|nested] 21+ messages in thread
* RE: [NFS] RE: [autofs] multiple servers per automount
@ 2003-10-15 14:31 ` Lever, Charles
0 siblings, 0 replies; 21+ messages in thread
From: Lever, Charles @ 2003-10-15 14:31 UTC (permalink / raw)
To: Ian Kent
Cc: Joseph V Moss, Ogden, Aaron A.,
Mike Waychison, autofs mailing list, nfs, Kernel Mailing List
Ian Kent said:
> Do you think that the possible NFS port allocation problems
> should hold up this work or should it drive updates to NFS?
hi ian-
the port stuff has to be addressed at some point, but i don't
think you should wait for it, because it is behind a long queue
of other RPC work (like Kerberos for Linux NFS) that has a
higher priority. also, there are other patches that partially
address this limitation, and certainly those will be used by
the desperate few who need it now. :^)
IMHO.
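
For readers following the port discussion: earlier in the thread Joseph V Moss
describes the RPC layer as counting down from 800 to find a free reserved
port, which caps the number of mounts at 800 or fewer. The toy model below
illustrates that behaviour and his count-up change. It is not the kernel's
code; all names are hypothetical, and starting the upward search at port 1
rather than 0 is an assumption of this sketch.

```python
START_PORT = 800   # starting point of the downward search, per the thread


def pick_port_counting_down(in_use, start=START_PORT):
    """Return the highest free port at or below `start`, or None."""
    for port in range(start, 0, -1):
        if port not in in_use:
            return port
    return None


def pick_port_counting_up(in_use, start=1, limit=65535):
    """Return the lowest free port from `start` up to `limit`, or None."""
    for port in range(start, limit + 1):
        if port not in in_use:
            return port
    return None


def try_mounts(picker, wanted, preoccupied=()):
    """Attempt `wanted` mounts, one port each; return how many succeeded."""
    in_use = set(preoccupied)
    mounted = 0
    for _ in range(wanted):
        port = picker(in_use)
        if port is None:
            break
        in_use.add(port)
        mounted += 1
    return mounted
```

In this model, asking for 1000 mounts with the count-down picker stops at 800,
and at fewer if other services already hold ports in that range, while the
count-up picker is limited only by the port space.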
^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2003-10-24 1:52 UTC | newest]
Thread overview: 21+ messages
2003-10-10 15:16 multiple servers per automount Ogden, Aaron A.
2003-10-13 3:23 ` [NFS] " Ian Kent
2003-10-14 7:05 ` Joseph V Moss
2003-10-14 13:37 ` RE: [autofs] " Ian Kent
2003-10-14 13:37 ` Ian Kent
2003-10-14 15:52 ` [NFS] " Mike Waychison
2003-10-14 15:52 ` [NFS] " Mike Waychison
2003-10-14 20:44 ` [NFS] RE: [autofs] " H. Peter Anvin
2003-10-14 23:12 ` Mike Waychison
2003-10-15 10:28 ` Ingo Oeser
2003-10-15 16:16 ` Mike Waychison
2003-10-23 13:37 ` Ian Kent
2003-10-23 17:00 ` Mike Waychison
2003-10-23 17:09 ` Tim Hockin
2003-10-24 0:47 ` Ian Kent
2003-10-24 1:42 ` Tim Hockin
2003-10-15 7:22 ` Ian Kent
2003-10-15 7:22 ` Ian Kent
2003-10-15 7:22 ` Ian Kent
2003-10-15 14:31 [NFS] " Lever, Charles
2003-10-15 14:31 ` Lever, Charles