* "simultaneous" mounts causing weird behavior
@ 2003-11-04 14:42 Matthew Mitchell
  2003-11-04 17:02 ` H. Peter Anvin
  2003-11-05  0:46 ` Ian Kent
  0 siblings, 2 replies; 12+ messages in thread
From: Matthew Mitchell @ 2003-11-04 14:42 UTC
  To: autofs

Hello,

On some SMP processing nodes we have in our cluster we are noticing the
following odd behavior.  It seems like there might be a race condition
somewhere in automount that results in the same (in this case NFS)
device mounted twice on the same mountpoint.

In our case we have a (closed-source, vendor provided) data processing
app that runs 2-4 processes at a time on each of these nodes.  The
processes communicate via MPI.  What ends up happening is that each of
them tries to read data from these NFS-mounted volumes at exactly the
same time, and sometimes (about one node out of every 10) we get unlucky
and the disk gets double-mounted.

Here is the entry from the messages file where the disks are getting
mounted:
Nov  2 16:52:53 fir32 automount[674]: attempting to mount entry /etvf/data0
Nov  2 16:52:53 fir32 automount[674]: attempting to mount entry /etvf/data0

(Yes, there are two of them.)

/proc/mounts looks as follows:

rootfs / rootfs rw 0 0
/dev/root / ext3 rw 0 0
/proc /proc proc rw 0 0
usbdevfs /proc/bus/usb usbdevfs rw 0 0
/dev/hda1 /boot ext3 rw 0 0
none /dev/pts devpts rw 0 0
none /dev/shm tmpfs rw 0 0
automount(pid626) /etvp autofs rw 0 0
automount(pid674) /etvf autofs rw 0 0
automount(pid695) /nova autofs rw 0 0
automount(pid589) /home autofs rw 0 0
automount(pid601) /etve autofs rw 0 0
automount(pid649) /etvo autofs rw 0 0
odin:/export/users /home/users nfs rw,v3,rsize=8192,wsize=8192,hard,intr,tcp,lock,addr=odin 0 0
pecan:/etvp/data8 /etvp/data8 nfs rw,v3,rsize=32768,wsize=32768,hard,intr,tcp,lock,addr=pecan 0 0
fenris:/etvf/data0 /etvf/data0 nfs rw,v3,rsize=8192,wsize=8192,hard,intr,tcp,lock,addr=fenris 0 0
fenris:/etvf/data0 /etvf/data0 nfs rw,v3,rsize=8192,wsize=8192,hard,intr,tcp,lock,addr=fenris 0 0
odin:/export/prog /home/prog nfs rw,v3,rsize=8192,wsize=8192,hard,intr,tcp,lock,addr=odin 0 0

The mount in question is "fenris:/etvf/data0".  (We have an automount
process running for each of our big disk servers.  Each has a different,
NIS provided map of disks to serve.)
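
A doubled entry like this is easy to miss by eye, so a small script can flag it. The following is only a minimal sketch in Python and is not part of autofs: find_duplicate_mounts is a hypothetical helper that counts (device, mountpoint) pairs in /proc/mounts-style text, and the sample input is trimmed from the listing above with each entry on one line.

```python
from collections import Counter


def find_duplicate_mounts(mounts_text):
    """Return {(device, mountpoint): count} for pairs listed more than once."""
    seen = Counter()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        seen[(fields[0], fields[1])] += 1
    return {pair: n for pair, n in seen.items() if n > 1}


# Sample trimmed from the /proc/mounts listing in this message.
sample = """\
automount(pid674) /etvf autofs rw 0 0
fenris:/etvf/data0 /etvf/data0 nfs rw,v3,rsize=8192,wsize=8192,hard,intr,tcp,lock,addr=fenris 0 0
fenris:/etvf/data0 /etvf/data0 nfs rw,v3,rsize=8192,wsize=8192,hard,intr,tcp,lock,addr=fenris 0 0
odin:/export/prog /home/prog nfs rw,v3,rsize=8192,wsize=8192,hard,intr,tcp,lock,addr=odin 0 0
"""

duplicates = find_duplicate_mounts(sample)
for (device, mountpoint), count in duplicates.items():
    print(f"{device} on {mountpoint}: listed {count} times")
```

On a live node the same function could be fed the contents of /proc/mounts instead of the sample string.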

Something odd, possibly related: when you use 'df', you get a strange
message:
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/hda3             74754492   1524216  69432912   3% /
/dev/hda1               101089      6976     88894   8% /boot
none                   2069232         0   2069232   0% /dev/shm
df: `/tmp/autofs-bind-3fa390d2-259/dir2': No such file or directory
odin:/export/users    44038844  39681000   4357844  91% /home/users
pecan:/etvp/data8    872779558 682596377 181455386  79% /etvp/data8
fenris:/etvf/data0   1662282384 1409676396 168166904  90% /etvf/data0
fenris:/etvf/data0   1662282384 1409676396 168166904  90% /etvf/data0
odin:/export/prog     31456316  18282432  13173884  59% /home/prog

This is automount 3.1.7 as provided in Red Hat 8.0.  We are running a
2.4.20 kernel patched with Trond Myklebust's NFS client patches and
support for Broadcom's gigabit ethernet cards.

Any help or suggestions appreciated.  If the problem is fixed in autofs4
client tools I'll be happy to try them and report back.  Since this is a
cluster, though, I'm reluctant to commit to upgrading all of the
machines without some idea if it'll make a difference.

Oh -- the reason that we care!  Based on anecdotal evidence, nodes that
do this double-mount run their processing jobs much slower than those
that don't.  I suspect the reason for that is some negative effect on
caching due to the duplicated mount.  In any event, though, it does seem
like a bug.

---
Matthew Mitchell
Systems Programmer/Administrator
Geophysical Development Corporation

* Re: "simultaneous" mounts causing weird behavior
  2003-11-04 14:42 "simultaneous" mounts causing weird behavior Matthew Mitchell
@ 2003-11-04 17:02 ` H. Peter Anvin
  2003-11-04 20:56   ` Matthew Mitchell
  2003-11-05  0:46 ` Ian Kent
  1 sibling, 1 reply; 12+ messages in thread
From: H. Peter Anvin @ 2003-11-04 17:02 UTC
  To: Matthew Mitchell; +Cc: autofs

Matthew Mitchell wrote:
> Hello,
> 
> On some SMP processing nodes we have in our cluster we are noticing the
> following odd behavior.  It seems like there might be a race condition
> somewhere in automount that results in the same (in this case NFS)
> device mounted twice on the same mountpoint.
> 
> In our case we have a (closed-source, vendor provided) data processing
> app that runs 2-4 processes at a time on each of these nodes.  The
> processes communicate via MPI.  What ends up happening is that each of
> them tries to read data from these NFS-mounted volumes at exactly the
> same time, and sometimes (about one node out of every 10) we get unlucky
> and the disk gets double-mounted.
> 
> Here is the entry from the messages file where the disks are getting
> mounted:
> Nov  2 16:52:53 fir32 automount[674]: attempting to mount entry
> /etvf/data0
> Nov  2 16:52:53 fir32 automount[674]: attempting to mount entry
> /etvf/data0
> 
> (Yes, there are two of them.)
> 

This happens because mount silently changed behaviour -- autofs relies 
on mount only allowing one thing to be mounted on each mount point, but 
that was suddenly changed without warning.

	-hpa

* Re: "simultaneous" mounts causing weird behavior
  2003-11-04 17:02 ` H. Peter Anvin
@ 2003-11-04 20:56   ` Matthew Mitchell
  2003-11-04 21:03     ` H. Peter Anvin
  2003-11-05  0:49     ` Ian Kent
  0 siblings, 2 replies; 12+ messages in thread
From: Matthew Mitchell @ 2003-11-04 20:56 UTC
  To: H. Peter Anvin; +Cc: autofs

On Tue, 2003-11-04 at 11:02, H. Peter Anvin wrote:
> Matthew Mitchell wrote:
> > Hello,
> > 
> > On some SMP processing nodes we have in our cluster we are noticing the
> > following odd behavior.  It seems like there might be a race condition
> > somewhere in automount that results in the same (in this case NFS)
> > device mounted twice on the same mountpoint.
> > 
> > In our case we have a (closed-source, vendor provided) data processing
> > app that runs 2-4 processes at a time on each of these nodes.  The
> > processes communicate via MPI.  What ends up happening is that each of
> > them tries to read data from these NFS-mounted volumes at exactly the
> > same time, and sometimes (about one node out of every 10) we get unlucky
> > and the disk gets double-mounted.
> > 
> > Here is the entry from the messages file where the disks are getting
> > mounted:
> > Nov  2 16:52:53 fir32 automount[674]: attempting to mount entry
> > /etvf/data0
> > Nov  2 16:52:53 fir32 automount[674]: attempting to mount entry
> > /etvf/data0
> > 
> > (Yes, there are two of them.)
> > 
> 
> This happens because mount silently changed behaviour -- autofs relies 
> on mount only allowing one thing to be mounted on each mount point, but 
> that was suddenly changed without warning.

Yes, I know that mount can do that now.  But automount is spawning the
second mount before the first one completes, no?  Still seems like there
should be some sort of mutual exclusion around the lookup_mount call.  I
realize our workload is kind of funny in that the processes effectively
_try_ to synchronize their automount request.  

Has any of this changed in the 4.0-pre autofs?  I haven't read through
its source yet.

-m

* Re: "simultaneous" mounts causing weird behavior
  2003-11-04 20:56   ` Matthew Mitchell
@ 2003-11-04 21:03     ` H. Peter Anvin
  2003-11-05  0:51       ` Ian Kent
  2003-11-05  0:49     ` Ian Kent
  1 sibling, 1 reply; 12+ messages in thread
From: H. Peter Anvin @ 2003-11-04 21:03 UTC
  To: Matthew Mitchell; +Cc: autofs

Matthew Mitchell wrote:
> 
> Yes, I know that mount can do that now.  But automount is spawning the
> second mount before the first one completes, no?  Still seems like there
> should be some sort of mutual exclusion around the lookup_mount call.  I
> realize our workload is kind of funny in that the processes effectively
> _try_ to synchronize their automount request.  
> 

autofs 3 relied on the mutual exclusion inherent in mount(2).  When
this was changed, it introduced a race condition in autofs 3.  I have
asked to add a mount flag to override this new behaviour; I don't know
if one was ever introduced or added to mount(8).

> Has any of this changed in the 4.0-pre autofs?  I haven't read through
> its source yet.

autofs 4 uses a different mutual exclusion strategy, so I would *think*
it would not be affected.
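
The race described here can be modelled in a few lines. This is only a simulation in Python, under loud assumptions: two threads stand in for two overlapping mount requests, a list stands in for a mount table that allows stacked mounts, and none of the names come from autofs. A barrier forces the bad interleaving so that the result is reproducible.

```python
import threading


def mount_unlocked(table, path, gate):
    """Check-then-mount with no exclusion (the racy pattern)."""
    already = path in table      # stands in for the "is it mounted?" test
    gate.wait()                  # force both callers past the check first
    if not already:
        table.append(path)       # stands in for a mount that now succeeds twice


def mount_locked(table, path, lock):
    """Same logic, but the check and the mount form one critical section."""
    with lock:
        if path not in table:
            table.append(path)


def run_pair(worker, helper):
    """Run two overlapping requests for the same path and return the table."""
    table = []
    threads = [
        threading.Thread(target=worker, args=(table, "/etvf/data0", helper))
        for _ in range(2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return table


racy = run_pair(mount_unlocked, threading.Barrier(2))
safe = run_pair(mount_locked, threading.Lock())
print(len(racy), len(safe))  # prints "2 1"
```

With the older mount behaviour the second append would simply have failed, which is the exclusion autofs 3 is said to have relied on.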

	-hpa

* Re: "simultaneous" mounts causing weird behavior
  2003-11-04 14:42 "simultaneous" mounts causing weird behavior Matthew Mitchell
  2003-11-04 17:02 ` H. Peter Anvin
@ 2003-11-05  0:46 ` Ian Kent
  2003-11-05 16:57   ` Matthew Mitchell
  1 sibling, 1 reply; 12+ messages in thread
From: Ian Kent @ 2003-11-05  0:46 UTC
  To: Matthew Mitchell; +Cc: autofs

On Tue, 4 Nov 2003, Matthew Mitchell wrote:

> Hello,
>
> On some SMP processing nodes we have in our cluster we are noticing the
> following odd behavior.  It seems like there might be a race condition
> somewhere in automount that results in the same (in this case NFS)
> device mounted twice on the same mountpoint.
>
> In our case we have a (closed-source, vendor provided) data processing
> app that runs 2-4 processes at a time on each of these nodes.  The
> processes communicate via MPI.  What ends up happening is that each of
> them tries to read data from these NFS-mounted volumes at exactly the
> same time, and sometimes (about one node out of every 10) we get unlucky
> and the disk gets double-mounted.
>
> Here is the entry from the messages file where the disks are getting
> mounted:
> Nov  2 16:52:53 fir32 automount[674]: attempting to mount entry
> /etvf/data0
> Nov  2 16:52:53 fir32 automount[674]: attempting to mount entry
> /etvf/data0
>
> (Yes, there are two of them.)
>
> /proc/mounts looks as follows:
>
> rootfs / rootfs rw 0 0
> /dev/root / ext3 rw 0 0
> /proc /proc proc rw 0 0
> usbdevfs /proc/bus/usb usbdevfs rw 0 0
> /dev/hda1 /boot ext3 rw 0 0
> none /dev/pts devpts rw 0 0
> none /dev/shm tmpfs rw 0 0
> automount(pid626) /etvp autofs rw 0 0
> automount(pid674) /etvf autofs rw 0 0
> automount(pid695) /nova autofs rw 0 0
> automount(pid589) /home autofs rw 0 0
> automount(pid601) /etve autofs rw 0 0
> automount(pid649) /etvo autofs rw 0 0
> odin:/export/users /home/users nfs
> rw,v3,rsize=8192,wsize=8192,hard,intr,tcp,lock,addr=odin 0 0
> pecan:/etvp/data8 /etvp/data8 nfs
> rw,v3,rsize=32768,wsize=32768,hard,intr,tcp,lock,addr=pecan 0 0
> fenris:/etvf/data0 /etvf/data0 nfs
> rw,v3,rsize=8192,wsize=8192,hard,intr,tcp,lock,addr=fenris 0 0
> fenris:/etvf/data0 /etvf/data0 nfs
> rw,v3,rsize=8192,wsize=8192,hard,intr,tcp,lock,addr=fenris 0 0
> odin:/export/prog /home/prog nfs
> rw,v3,rsize=8192,wsize=8192,hard,intr,tcp,lock,addr=odin 0 0
>
> The mount in question is "fenris:/etvf/data0".  (We have an automount
> process running for each of our big disk servers.  Each has a different,
> NIS provided map of disks to serve.)
>
> Something odd, possibly related: when you use 'df', you get a strange
> message:
> Filesystem           1K-blocks      Used Available Use% Mounted on
> /dev/hda3             74754492   1524216  69432912   3% /
> /dev/hda1               101089      6976     88894   8% /boot
> none                   2069232         0   2069232   0% /dev/shm
> df: `/tmp/autofs-bind-3fa390d2-259/dir2': No such file or directory
> odin:/export/users    44038844  39681000   4357844  91% /home/users
> pecan:/etvp/data8    872779558 682596377 181455386  79% /etvp/data8
> fenris:/etvf/data0   1662282384 1409676396 168166904  90% /etvf/data0
> fenris:/etvf/data0   1662282384 1409676396 168166904  90% /etvf/data0
> odin:/export/prog     31456316  18282432  13173884  59% /home/prog

The /tmp entry is caused by mount failing to handle overlapping requests.
Aaron Ogden and I have been there recently with autofs v4.

The overlapping mount problem is likely causing the other problem as well.
I put some altogether ugly code, which shouldn't work at all, but seems
to, into autofs v4 to deal with this. In fact I hated it so much, I
removed it at one point and Aaron was horrified to find everything broken
again.

Also, since the bind mount was only a test I added the -n flag to it to
get rid of the /tmp mount entries. Maybe Peter would like to try something
like that in autofs v3.

-- 

   ,-._|\    Ian Kent
  /      \   Perth, Western Australia
  *_.--._/   E-mail: raven@themaw.net
        v    Web: http://themaw.net/

* Re: "simultaneous" mounts causing weird behavior
  2003-11-04 20:56   ` Matthew Mitchell
  2003-11-04 21:03     ` H. Peter Anvin
@ 2003-11-05  0:49     ` Ian Kent
  1 sibling, 0 replies; 12+ messages in thread
From: Ian Kent @ 2003-11-05  0:49 UTC
  To: Matthew Mitchell; +Cc: autofs, H. Peter Anvin

On Tue, 4 Nov 2003, Matthew Mitchell wrote:

> On Tue, 2003-11-04 at 11:02, H. Peter Anvin wrote:
> > Matthew Mitchell wrote:
> > > Hello,
> > >
> > > On some SMP processing nodes we have in our cluster we are noticing the
> > > following odd behavior.  It seems like there might be a race condition
> > > somewhere in automount that results in the same (in this case NFS)
> > > device mounted twice on the same mountpoint.
> > >
> > > In our case we have a (closed-source, vendor provided) data processing
> > > app that runs 2-4 processes at a time on each of these nodes.  The
> > > processes communicate via MPI.  What ends up happening is that each of
> > > them tries to read data from these NFS-mounted volumes at exactly the
> > > same time, and sometimes (about one node out of every 10) we get unlucky
> > > and the disk gets double-mounted.
> > >
> > > Here is the entry from the messages file where the disks are getting
> > > mounted:
> > > Nov  2 16:52:53 fir32 automount[674]: attempting to mount entry
> > > /etvf/data0
> > > Nov  2 16:52:53 fir32 automount[674]: attempting to mount entry
> > > /etvf/data0
> > >
> > > (Yes, there are two of them.)
> > >
> >
> > This happens because mount silently changed behaviour -- autofs relies
> > on mount only allowing one thing to be mounted on each mount point, but
> > that was suddenly changed without warning.
>
> Yes, I know that mount can do that now.  But automount is spawning the
> second mount before the first one completes, no?  Still seems like there
> should be some sort of mutual exclusion around the lookup_mount call.  I
> realize our workload is kind of funny in that the processes effectively
> _try_ to synchronize their automount request.
>
> Has any of this changed in the 4.0-pre autofs?  I haven't read through
> its source yet.

Yep. If you are willing, try the latest autofs-4.1.0-beta.


-- 

   ,-._|\    Ian Kent
  /      \   Perth, Western Australia
  *_.--._/   E-mail: raven@themaw.net
        v    Web: http://themaw.net/

* Re: "simultaneous" mounts causing weird behavior
  2003-11-04 21:03     ` H. Peter Anvin
@ 2003-11-05  0:51       ` Ian Kent
  0 siblings, 0 replies; 12+ messages in thread
From: Ian Kent @ 2003-11-05  0:51 UTC
  To: H. Peter Anvin; +Cc: autofs, Matthew Mitchell

On Tue, 4 Nov 2003, H. Peter Anvin wrote:

> Matthew Mitchell wrote:
> >
> > Yes, I know that mount can do that now.  But automount is spawning the
> > second mount before the first one completes, no?  Still seems like there
> > should be some sort of mutual exclusion around the lookup_mount call.  I
> > realize our workload is kind of funny in that the processes effectively
> > _try_ to synchronize their automount request.
> >
>
> autofs 3 relied on the mutual exclusion inherent in mount(2).  When
> this was changed, it introduced a race condition in autofs 3.  I have
> asked to add a mount flag to override this new behaviour; I don't know
> if one was ever introduced or added to mount(8).
>
> > Has any of this changed in the 4.0-pre autofs?  I haven't read through
> > its source yet.
>
> autofs 4 uses a different mutual exclusion strategy, so I would *think*
> it would not be affected.

autofs v4 was also affected. Overlapping requests caused some odd
problems.


-- 

   ,-._|\    Ian Kent
  /      \   Perth, Western Australia
  *_.--._/   E-mail: raven@themaw.net
        v    Web: http://themaw.net/

* Re: "simultaneous" mounts causing weird behavior
  2003-11-05  0:46 ` Ian Kent
@ 2003-11-05 16:57   ` Matthew Mitchell
  2003-11-06  3:53     ` Ian Kent
  0 siblings, 1 reply; 12+ messages in thread
From: Matthew Mitchell @ 2003-11-05 16:57 UTC
  To: Ian Kent; +Cc: autofs

On Tue, 2003-11-04 at 18:46, Ian Kent wrote:

> The /tmp entry is caused by mount failing to handle overlapping requests.
> Aaron Ogden and I have been there recently with autofs v4.
> 
> The overlapping mount problem is likely causing the other problem as well.
> I put some altogether ugly code, which shouldn't work at all, but seems
> to, into autofs v4 to deal with this. In fact I hated it so much, I
> removed it at one point and Aaron was horrified to find everything broken
> again.

Looking at the 4.1.0-beta2 code, I don't see any mutex-looking code in
handle_packet_missing.  Isn't that where it should be?  Or are you
taking care of it elsewhere?  (Or are you allowing the bind mounts to go
through in the odd case where you get past the lstat() test while still
waiting on the lookup_mount to finish?)

My memory of signal gymnastics is fuzzy, but would this work?

(pseudocode)

if (lstat tests fail) {
  if (mount is pending) {
    /* someone is already trying to mount this path */
    spin && retry from start;
  } else {
    set mount is pending;
    f = fork();
    if (f == -1) {
      couldn't fork, error case;
    } else if (!f) {
      /* child */
      close fds;
      attempt mount;
      if (mount succeeds) { exit 0; }
      else { exit 1; }
    } else {
      /* parent */
      success = wait(f);
      unset mount is pending;
    }
  }
} else {
  /* already there */
}
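
The pseudocode above can be turned into a small runnable sketch. This version is in Python rather than the daemon's C and assumes a POSIX system with fork(). Everything in it is hypothetical: handle_missing, the pending set, and the is_mounted and do_mount callbacks are illustration names, and a directory created under a temporary path stands in for a real mount.

```python
import os
import tempfile

pending = set()  # paths with a mount in flight (hypothetical bookkeeping)


def handle_missing(path, is_mounted, do_mount):
    """Return 'present', 'busy', 'mounted' or 'failed' for one request."""
    if is_mounted(path):
        return "present"          # the lstat-style test passed
    if path in pending:
        return "busy"             # someone is already mounting this path
    pending.add(path)
    try:
        pid = os.fork()
        if pid == 0:
            # Child: attempt the mount and report the result via exit status.
            status = 1
            try:
                if do_mount(path):
                    status = 0
            finally:
                os._exit(status)
        # Parent: block until the child has finished.
        _, wait_status = os.waitpid(pid, 0)
    finally:
        pending.discard(path)
    ok = os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0
    return "mounted" if ok else "failed"


def fake_mount(path):
    """Stand-in for mount(8): create the directory and report success."""
    os.mkdir(path)
    return True


target = os.path.join(tempfile.mkdtemp(), "data0")
first = handle_missing(target, os.path.lexists, fake_mount)
second = handle_missing(target, os.path.lexists, fake_mount)
print(first, second)  # prints "mounted present"
```

Because the parent waits for the child before returning, the "busy" branch can only be reached if requests are handled concurrently, for example from several threads.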

I just am not seeing how you can properly guarantee that you don't end
up with an unnecessary bind mount without waiting to see if the mount
succeeds or not.  I do realize you might not want to wait forever if
there is a problem mounting the device...

Am I missing something?

-m

* Re: "simultaneous" mounts causing weird behavior
  2003-11-05 16:57   ` Matthew Mitchell
@ 2003-11-06  3:53     ` Ian Kent
  2003-11-10 16:57       ` Matthew Mitchell
  0 siblings, 1 reply; 12+ messages in thread
From: Ian Kent @ 2003-11-06  3:53 UTC
  To: Matthew Mitchell; +Cc: autofs

On Wed, 5 Nov 2003, Matthew Mitchell wrote:

> On Tue, 2003-11-04 at 18:46, Ian Kent wrote:
>
> > The /tmp entry is caused by mount failing to handle overlapping requests.
> > Aaron Ogden and I have been there recently with autofs v4.
> >
> > The overlapping mount problem is likely causing the other problem as well.
> > I put some altogether ugly code, which shouldn't work at all, but seems
> > to, into autofs v4 to deal with this. In fact I hated it so much, I
> > removed it at one point and Aaron was horrified to find everything broken
> > again.
>
> Looking at the 4.1.0-beta2 code, I don't see any mutex-looking code in
> handle_packet_missing.  Isn't that where it should be?  Or are you
> taking care of it elsewhere?  (Or are you allowing the bind mounts to go
> through in the odd case where you get past the lstat() test while still
> waiting on the lookup_mount to finish?)

Huh?

Do you know of a user space semaphore implementation other than SysV IPC?
Tell me and I'll use it.

There is no MUTEX type code and the code that is there is not around the
lookup.

First I believe one of the problems is caused by contention for the mtab
file within mount. This can cause mount to return a fail even though the
mount has happened. Similar things happen at umount time when there are
many master map entries. Hence using a -n switch on the bind mount test
helps when there are many entries in the master map. This problem can also
occur when mount requests occur in rapid succession. The other possibility
is that the kernel module incorrectly fires multiple mount requests. If
this is the case (as it was at one point) then that needs to be fixed in
the kernel module not the daemon.

I can't remember whether you gave details of your scenario.

All I do is use a lock file around mount calls and a timed delay once
the lock file is acquired. I'm not giving any guarantee that this will
always work or that it will even work at all. However it does seem to work
better than I expected. It is a temporary workaround until a better
solution is implemented.
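
For readers who want to picture the approach, here is a minimal sketch of a lock file with a timed delay, written in Python. It is not the autofs v4 code: mount_lock, its parameters, and the choice of an O_CREAT | O_EXCL lock file are all assumptions made for illustration, and the actual mount call is left out.

```python
import contextlib
import os
import tempfile
import time


@contextlib.contextmanager
def mount_lock(lock_path, settle=0.1, poll=0.05, timeout=10.0):
    """Hold an exclusive lock file for the duration of the with-block."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails atomically if the file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
            os.close(fd)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll)
    try:
        time.sleep(settle)   # timed delay once the lock file is acquired
        yield
    finally:
        os.unlink(lock_path)  # release the lock for the next caller


lock_path = os.path.join(tempfile.mkdtemp(), "autofs-demo.lock")
with mount_lock(lock_path):
    held = os.path.exists(lock_path)   # a real caller would run mount here
released = not os.path.exists(lock_path)
print(held, released)  # prints "True True"
```

A lock file of this kind is left behind if the holder is killed, so a real implementation would also need some way to detect and clear stale locks.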

Perhaps you can help here with a patch. But first try it and see if it
works. If it ain't broke then I don't have the time to fix it.

You should be using the latest autofs4 kernel module with this. It is
available from the same place you got the beta autofs-4.


-- 

   ,-._|\    Ian Kent
  /      \   Perth, Western Australia
  *_.--._/   E-mail: raven@themaw.net
        v    Web: http://themaw.net/

* Re: "simultaneous" mounts causing weird behavior
  2003-11-06  3:53     ` Ian Kent
@ 2003-11-10 16:57       ` Matthew Mitchell
  2003-11-11 13:33         ` [autofs] " Ian Kent
  0 siblings, 1 reply; 12+ messages in thread
From: Matthew Mitchell @ 2003-11-10 16:57 UTC
  To: Ian Kent; +Cc: autofs

On Wed, 2003-11-05 at 21:53, Ian Kent wrote:
> On Wed, 5 Nov 2003, Matthew Mitchell wrote:
> 
> > Looking at the 4.1.0-beta2 code, I don't see any mutex-looking code in
> > handle_packet_missing.  Isn't that where it should be?  Or are you
> > taking care of it elsewhere?  (Or are you allowing the bind mounts to go
> > through in the odd case where you get past the lstat() test while still
> > waiting on the lookup_mount to finish?)
> 
> Huh?
> 
> Do you know of a user space semaphore implementation other than SysV IPC?
> Tell me and I'll use it.
> 
> There is no MUTEX type code and the code that is there is not around the
> lookup.

Sorry, what I wrote was a bit unclear.  You don't need a real mutex
because there is only one automount process per mount point.  The
problem (it seems to me) is that the automount process runs mount
asynchronously (fork && exec without wait()) which can lead to
unnecessary bind mounts.   I would think that you either need to
wait() on the mount or put in a manual exclusion check around the fork
&& exec.  Of course I am only thinking about a simple NFS mount
situation; there may be other automount uses that preclude this.  If so
please educate me before I take the naive path. :)

> First I believe one of the problems is caused by contention for the mtab
> file within mount. This can cause mount to return a fail even though the
> mount has happened. Similar things happen at umount time when there are
> many master map entries. Hence using a -n switch on the bind mount test
> helps when there are many entries in the master map. This problem can also
> occur when mount requests occur in rapid succession. The other possibility
> is that the kernel module incorrectly fires multiple mount requests. If
> this is the case (as it was at one point) then that needs to be fixed in
> the kernel module not the daemon.

Contention for mtab might explain the weird /tmp/mount-foobar garbage
that df complains about.  I don't have any idea what was in mtab when
this happened, though.  Have to check it the next time the problem shows
up.
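
One way to do that check is to diff mtab against /proc/mounts. The sketch below is Python and purely illustrative: compare_tables and mount_pairs are hypothetical helpers, and both sample tables are made up for the example (the /tmp/example-* paths in particular), since the real mtab contents were not captured.

```python
from collections import Counter


def mount_pairs(text):
    """Return the (device, mountpoint) pairs listed in a mounts-style table."""
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            pairs.append((fields[0], fields[1]))
    return pairs


def compare_tables(mtab_text, proc_text):
    """Return (entries only in mtab, entries only in /proc/mounts) with counts."""
    mtab = Counter(mount_pairs(mtab_text))
    proc = Counter(mount_pairs(proc_text))
    # Counter subtraction keeps only positive counts, so duplicates show up too.
    return dict(mtab - proc), dict(proc - mtab)


# Hypothetical inputs: a stale bind entry in mtab, a doubled NFS mount in /proc.
mtab_sample = """\
fenris:/etvf/data0 /etvf/data0 nfs rw 0 0
/tmp/example-src /tmp/example-dst none rw,bind 0 0
"""
proc_sample = """\
fenris:/etvf/data0 /etvf/data0 nfs rw 0 0
fenris:/etvf/data0 /etvf/data0 nfs rw 0 0
"""

only_mtab, only_proc = compare_tables(mtab_sample, proc_sample)
print("only in mtab:", only_mtab)
print("only in /proc/mounts:", only_proc)
```

On a node showing the problem, the two inputs would be the contents of /etc/mtab and /proc/mounts.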

> I can't remember whether you gave details of your scenario.
> 
> All I do is use a lock file around mount calls and a timed delay once
> the lock file is acquired. I'm not giving any guarantee that this will
> always work or that it will even work at all. However it does seem to work
> better than I expected. It is a temporary workaround until a better
> solution is implemented.
> 
> Perhaps you can help here with a patch. But first try it and see if it
> works. If it ain't broke then I don't have the time to fix it.

Understood.

> You should be using the latest autofs4 kernel module with this. It is
> available from the same place you got the beta autofs-4.

OK.  Looks like a big change to the kernel setup (from a quick browse of
the patch).  In the interest of isolating the change to userspace, can
you point me to a version of autofs4 that works with the stock 2.4.20
kernel module?  I'd be happy to forward-port anything that works and
test it again with the newer code.  (I can test the new stuff pretty
easily on one machine but I don't want to undertake the week-long
process of updating the kernel on the whole cluster just for this
problem.)

-m

* Re: [autofs] "simultaneous" mounts causing weird behavior
  2003-11-10 16:57       ` Matthew Mitchell
@ 2003-11-11 13:33         ` Ian Kent
  0 siblings, 0 replies; 12+ messages in thread
From: Ian Kent @ 2003-11-11 13:33 UTC
  To: Matthew Mitchell; +Cc: Kernel Mailing List

On Tue, 2003-11-11 at 00:57, Matthew Mitchell wrote:

> > You should be using the latest autofs4 kernel module with this. It is
> > available from the same place you got the beta autofs-4.
> 
> OK.  Looks like a big change to the kernel setup (from a quick browse of
> the patch).  In the interest of isolating the change to userspace, can
> you point me to a version of autofs4 that works with the stock 2.4.20
> kernel module?  I'd be happy to forward-port anything that works and
> test it again with the newer code.  (I can test the new stuff pretty
> easily on one machine but I don't want to undertake the week-long
> process of updating the kernel on the whole cluster just for this
> problem.)
> 

What is the question here? autofs is a kernel automounter. The autofs4
kernel module is as much a part of the package as the daemon. There is
no userspace version of the kernel module. I can't see how there can be.

Using the module distributed with 2.4 kernels is fine if you don't want
browsable mount points and that's it.

<rant>

I must also add that you don't need to change the kernel source tree at
all to use my updated kernel module. This was one of the reasons I
constructed the module build kit (available on kernel.org in the same
directory as the autofs v4 package). Furthermore, the changes I have
made are exclusively to the autofs4 module, as should be the case for
any filesystem module.

The other reason I put the kit together was that when I took over
maintenance of autofs v4, the previous maintainer, although happy to
hand over the maintenance of the daemon, was less than happy for me to
maintain the kernel module. So I will, as time passes, try to get my
changes into the kernel although I haven't made any progress there yet.

</rant>

You need to read the doco in the kit to use it. Briefly, you need the
kernel source tree corresponding to the running kernel and matching
kernel config file. In Makefile.conf set the macro variables as
instructed and 'make' then 'make install' (install must be done as
root). This will build the module, make a copy of the kernel's module
and place the updated module in the right place. Subsequent 'make
install' will not overwrite the backup if it already exists. 'make
uninstall' will reinstate the original kernel module.

I have tried to cater for people who are reluctant to patch and build
kernels with the kit.  

What else can I do?

-- 

   ,-._|\    Ian Kent
  /      \   Perth, Western Australia
  *_.--._/   E-mail: raven@themaw.net
        v    Web: http://themaw.net/


* RE: "simultaneous" mounts causing weird behavior
@ 2003-11-05 18:45 Ogden, Aaron A.
  0 siblings, 0 replies; 12+ messages in thread
From: Ogden, Aaron A. @ 2003-11-05 18:45 UTC
  To: Ian Kent, Matthew Mitchell; +Cc: autofs



-----Original Message-----
From: autofs-bounces@linux.kernel.org
[mailto:autofs-bounces@linux.kernel.org] On Behalf Of Ian Kent
Sent: Tuesday, November 04, 2003 6:47 PM
To: Matthew Mitchell
Cc: autofs@linux.kernel.org
Subject: Re: [autofs] "simultaneous" mounts causing weird behavior

> 
> The /tmp entry is caused by mount failing to handle overlapping requests.
> Aaron Ogden and I have been there recently with autofs v4.
> 
> The overlapping mount problem is likely causing the other problem as well.
> I put some altogether ugly code, which shouldn't work at all, but seems
> to, into autofs v4 to deal with this. In fact I hated it so much, I
> removed it at one point and Aaron was horrified to find everything broken
> again.
> 
> Also, since the bind mount was only a test I added the -n flag to it to
> get rid of the /tmp mount entries. Maybe Peter would like to try something
> like that in autofs v3.
Ugly or not, autofs 4.1.0 beta2 is working pretty well over here.  :-)
Once the hung mountpoint problem gets resolved I will be a happy camper.
Ian, any word on when autofs 4.1.0 final might be released?

Re: /tmp mount entries, I found it to be a good indicator of problems
with autofs (overlapping mounts, etc.) so it was actually useful in a
way.

-A
