linux-kernel.vger.kernel.org archive mirror
* [RFC] Towards a Modern Autofs
@ 2004-01-06 19:55 Mike Waychison
  2004-01-06 21:01 ` [autofs] " H. Peter Anvin
  2004-01-07 21:14 ` [autofs] [RFC] Towards a Modern Autofs Jim Carter
  0 siblings, 2 replies; 85+ messages in thread
From: Mike Waychison @ 2004-01-06 19:55 UTC (permalink / raw)
  To: Kernel Mailing List, autofs mailing list


[-- Attachment #1.1: Type: text/plain, Size: 2049 bytes --]

Hi Everyone,

We've spent some time over the past couple months researching how Linux 
autofs can be brought to a level that is comparable to that found on 
other major Unix systems out there.

The attached paper was written as an attempt to design an automount system 
with complete Solaris-style autofs functionality.  This includes 
browsing, direct maps and lazy mounting of multimounts.  The paper can 
also be found online at:

ftp://ftp-eng.cobalt.com/pub/whitepapers/autofs/towards_a_modern_autofs.txt

and

ftp://ftp-eng.cobalt.com/pub/whitepapers/autofs/towards_a_modern_autofs.pdf

Over this time, we've discovered that Linux namespaces completely wreak 
havoc on the current autofs implementation for several reasons.  We've 
tried to define consistent semantics for dealing with automounts in a 
multi-namespace environment.

After a lot of contemplation, these design ideas are the most reasonable 
we could find.  If you have alternative ideas, or better ideas, or just 
think we're idiots, let us know.  Suggestions and comments are not just 
welcome, they are NEEDED.

Thanks,

-- 
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice
mailto: Michael.Waychison@Sun.COM
http://www.sun.com

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE:  The opinions expressed in this email are held by me, 
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 


[-- Attachment #1.2: towards_a_modern_autofs.txt --]
[-- Type: text/plain, Size: 82334 bytes --]

Sun Microsystems Inc. -- Linux Software Engineering

Towards a Modern AutoFS
=======================

By: Mike Waychison <michael.waychison@sun.com>
Edited By: Tim Hockin <thockin@sun.com>
Copyright 2003-2004, Sun Microsystems, Inc.

Table of Contents
=================
1	Abstract
2	Introduction
3	Requirements
4	Analyzing the Alternatives
5	Proposed implementation
5.1	Indirect Maps
5.1.1	Browsing
5.2	Direct Maps
5.3	Multimounts and Offsets
5.3.1	Explanation
5.3.2	Implementation
5.3.3	Multimounts without root offsets
5.4	Expiry
5.5	Handling Changing Maps
5.5.1	Base Triggers
5.5.2	Forcing Expiry to Occur
5.6	The Userspace Utility
6	New Facilities
6.1	Mountpoint file descriptors
6.2	Native Expiry Support
6.3	Cloning super_block
6.3.1	The --bind problem
7	Scalability
8	Conclusion


1 Abstract
==========

Automounting is a system that allows local and network filesystems to be mounted
as needed.  The automount configuration for a system is distributed via flat
files or via network based lookups.  This can become a difficult system to get
right given the large set of features required and some new features available
in Linux today.  By breaking out the task of automounting from a daemon into a
usermode helper application, we are able to simplify the architecture in both
userspace and kernelspace.  This enables us to solve some existing problems,
deal with Linux filesystem namespaces, and build an architecture that provides
per-namespace automount configurations.  This document describes such a system
and details what infrastructure needs to be added to the Linux kernel before
such a system can be implemented.

2 Introduction
==============

Traditionally, automounting has been implemented in one of two ways.  The
earlier implementations usually handled the entire problem of automounting by
creating a userspace NFS server.  More modern implementations have added kernel
support directly by either modifying the VFS layer or by creating filesystems
that cope with the problem.  Both of these architectures have traditionally
relied on one or more daemons that handled all policy in userspace and were
responsible for performing the actual mounting of filesystems.

The earlier implementations that used a userspace NFS server worked by mounting
NFS shares at appropriate locations in the filesystem tree.  The daemon that
served those shares would then be able to trap all directory traversals and
perform mount actions as required.  Some systems have also used similar
techniques to catch triggering actions and have the desired filesystem mounted
elsewhere, using symbolic links that point back to them.  These systems lent
themselves to difficult administration and often led to hung filesystems and
cruft mounts when the daemon was unexpectedly killed.

Later implementations placed traps directly into the kernel by creating a new
filesystem called 'autofs'.  This filesystem would be responsible for triggering
on directory traversals and would pass mount requests up to userspace.  The
kernel infrastructure became necessary as it grew more and more evident that
implementing everything in userspace was extremely tedious and difficult to
manage properly.

Different architectural models exist for daemon implementations.  Solaris
currently uses a single daemon approach that handles all requests coming from
kernelspace.  This allows for easy management of changing maps and for dealing
with the expiry of nested maps.  The single daemon approach was also preferred
because it consolidated any process overhead for the entire system into one
process.  

One of the difficulties in managing a single daemon automount system is that the
entire system must be tooled to work asynchronously.  This includes all
components from performing NIS lookups to performing NFS mounts.  Although this
is an achievable goal, it requires a lot of work.  It is much simpler to have a
single process for each automount trigger that uses the existing synchronous
facilities.

Unlike Solaris, Linux uses a multi-process daemon approach.  This system works
adequately given the level of functionality it aims to support, but it is not
without flaws.  Linux currently only supports the use of indirect maps and the
nesting of maps via the fstype=autofs mount option.  Each indirect mountpoint
has exactly one daemon and one map associated with it.  All map lookups are
performed synchronously within the daemon.  This means that a lookup for an
entry within a given mount may block and may cause a second lookup within the
same indirect mount to block unnecessarily.  This is a fair design decision as
both entries are determined to be coming from the same source and will equally
block [[[The exception to this is of course any networked map that is being
served from a slave server.  Any cached entries may be returned at different
speeds and blocking times may actually vary. A second example of where this rule
doesn't apply is when a local file-backed map is referenced, which in turn
includes another map from the network.]]].  Mounts are handled in an
asynchronous fashion by forking and executing the mount(1) command.  Linux's
implementation is multi-process in the sense that each indirect mountpoint has
an associated daemon process.  Nesting of maps is handled by maintaining
parent-child relationships between daemon processes.  A parent process manages
the parent map.  When an entry in this map is accessed which has the
'fstype=autofs' option specified, the daemon forks and executes a new copy of
itself.  This child daemon is responsible for the nested map.  The two processes
communicate with each other using signal IPC so that they may synchronize expiry
of the nested map.

One problem with Linux's multi-process approach is that it does not handle lazy
mounts in multimount entries.  This is evident in the implementation of
automount 3.9.99-4.0.0-pre10 (the most current release at the time of this
writing).  In this daemon implementation, multimounts are not lazy mounted and
will, by default, attempt to mount all entries in the multimount immediately.
It will also, by default, fail if any of the filesystems fail to mount properly.
This latter problem has been quick-fixed by the addition of a 'nostrict' option
for multimounts.  Eagerly mounting everything leads to large numbers of
potentially unneeded filesystems being mounted and causes unnecessary latency.
A multimount entry may contain multiple shares from different hosts, and
mounting them all can cause a noticeable lag in a user application.  Mounting
unneeded remote filesystems
also increases the likelihood that one of the filesystems will go stale and hang
processes which attempt to access them.  A filesystem can go stale when the
system serving it crashes, is rebooted or when network connectivity is lost.
For example, depending on the configuration given to an NFS client, a crashed
server will cause all processes accessing the NFS filesystem to hang
indefinitely. 

Another flaw that both Solaris' and Linux's automount models have is that
neither lends itself to dealing nicely with filesystem namespaces, a new feature
in Linux as of kernels 2.4.19 and 2.5.22.  Namespaces allow a system to have
multiple distinct mount hierarchies.  Namespaces are created by a call to the
clone(2) system call with the CLONE_NEWNS flag.  This call creates a new process
as it usually would, but the new process receives an entirely new copy of the
parent process's namespace.  This is done by creating a new mount table for the
given process, and essentially re-executing all previous mounts in the new mount
table.  This creates a completely distinct mount table, and allows any changes
such as bind mounts, moved mounts, new mounts, and unmounts to only be reflected
within that namespace.  This idea was partially borrowed from distributed
operating systems such as Plan 9 ("The Use of Name Spaces in Plan 9", Rob Pike
et al. "http://plan9.bell-labs.com/sys/doc/names.html") and Spring ("A Uniform
Name Service for Spring's UNIX Environment" Michael N. Nelson / Sanjay R. Radia
http://www.usenix.org/publications/library/proceedings/sf94/full_papers/nelson.ps),
which allows users to create their own mount hierarchies, independent from any
other user.  

The reason these automounting implementations cause problems when using
namespaces is that they rely on a daemon process that inherently resides
within a single namespace.  Whenever an autofs mount is
triggered, the kernel communicates with the daemon, which in turn mounts the
filesystem onto the given path.  Unfortunately, this breaks cross namespace
functionality because the mounted filesystem is grafted into the daemon's
namespace, which may or may not be the same namespace as used by the triggering
application.  Namespaces are designed such that cross-namespace facilities are
deliberately absent.  The easiest method for performing any cross-namespace
functions is to execute within the alternative namespace.

Determining whether a process is a user application causing a trigger or an
automount daemon performing a mount has also traditionally been difficult and
required special casing.  We can avoid any such special casing by providing a
file descriptor that describes the target directory to the automounter, which
would in turn fchdir(2) to the target location.

With the current state of automount understood, we can explore the problems that
exist today and look at new approaches to automounting.  

3 Requirements
==============

A new automount system involves several new requirements in order to work
gracefully with new Linux facilities.  To enumerate these requirements we must
start by examining the current implementations and determining where things
begin to break.  Specifically, we will look at the current modes of userspace /
kernelspace communication used by both the current Linux autofs3 and autofs4
implementations.

Traditionally the autofs filesystem has needed a way to distinguish whether an
application that traverses into an autofs mount is a regular user process, or
the daemon coming in to perform a mount.  This has previously been handled in
Linux by identifying the daemon using its process group, as registered at mount
time.  The use of the daemon's process group not only abuses Unix semantics, but
also makes handling complex automount hierarchies very difficult.  It forces the
implementation to handle nested mounts using distinct processes in order to
traverse the outer directories as if it were not the automounter. 

Another big caveat to the current approach is the system's reliance on the
automount daemon registering an open pipe with the kernel.  This registration is
made at mount time using a mount option to pass the pipe's file descriptor.
This kind of communication channel registration makes for a system that is
incapable of self-healing.  It is impossible in this form of communication for
the daemon to disconnect from the kernel and reconnect.  A daemon that dies
(accidentally or forcefully) will leave the system with autofs filesystems
mounted yet stuck in what is called a 'catatonic' state.  The autofs filesystems
will give up trying to communicate with their respective daemons and will not
process any new triggers.  On top of that, any expiry runs that should be
occurring will cease to run as they are invoked by the daemon itself [[[Expiry
is triggered from userspace via an ioctl on the root directory of the autofs
filesystem.  The filesystem will in turn check to see if any of the current
sub-mounts have been inactive for some period of time and will return the
path(!) of the entry to expire back to userspace.   Userspace will then attempt
to unmount the path using umount(8).]]].  This forces an administrator to either
manually unmount all the filesystems left behind, or more often than not, simply
restart the daemon, causing more filesystems to be mounted over the existing
stale ones while awaiting the next reboot.

Another reason one would want to move away from a single daemon approach is
that automounting semantics are not very clear when namespaces are used.
One of the driving forces behind implementing distinct namespaces in Linux is to
allow the root user to create distinct mount environments for differing services
and users.  This is different from chrooting because processes outside of the
chroot environment can still navigate any new mountpoints within the chroot.
When using namespaces, processes cannot navigate mounts that are not within
their own namespace.  

One particularly useful advantage to namespaces is that a user may mount a
privileged filesystem such as a Samba share, without allowing any other users to
see the mount in question.  Not even the root user himself would be able to gain
access to the contents of the filesystem.  Another possibility of namespaces is
that a system may be configured such that upon login, the login process could
create a new namespace for that user and bind mount $HOME/tmp over /tmp.  In
effect, the user has a private /tmp directory that no other user is capable of
accessing.

Namespaces are currently implemented such that root can create a new namespace
by deriving from an existing namespace.  From this derivation, the namespace may
be completely customized by adding and removing mounts in the system.  Given the
current Linux autofs implementation, any derived namespace will inherit autofs
filesystems, but they do not work as expected, as the persistent daemon has no
access to this namespace and thus cannot mount new filesystems upon trigger.
Instead of the mount occurring in the derived namespace, where it was triggered,
the mount will occur in the original namespace in which the daemon is running
and not be visible from the triggering namespace.  Further, any filesystems that
were automounted in the original namespace will persist in the new namespace and
will never expire.  The original mount will expire in its own namespace but the
cloned copy of it will not be visible to the daemon.  Even if the kernel (which
can see all namespaces) told the daemon that the mount needed to be expired, the
daemon itself has no way to unmount the filesystem in a namespace other than its
own. This is clearly not the desired functionality.

Ideally, one would like to be able to properly inherit automount triggers when
creating a new namespace.  Inherited triggers would work as configured in the
parent namespace, but also be removable and replaceable with a different
automount configuration.  It is also desirable to have a system that is not
reliant on a persistent daemon and which is capable of healing any stale
triggers.  The most obvious approach to handle these kinds of problems is to
remove any persistent namespace context - namely the kernel's reliance on a
single daemon, while providing more namespace context during the mounting
process.

The following new set of architectural requirements become necessary:

o Automount triggers should continue to operate properly within a cloned
  namespace.  We want to be sure that an automount trigger that exists in both
  the parent and child namespaces will cause a mount to occur in the appropriate
  namespace only.

o Automount triggers that are inherited from a parent namespace should remain
  distinct from their parent counterpart.  We cannot allow a user in one
  namespace to alter the automount configuration across multiple namespaces.

o Filesystems that have been automounted and duplicated into a cloned namespace
  should continue to expire.

o The addition or removal of an automount trigger should only affect the
  namespace in which the change applies.

In addition, the following functions are required above and beyond the existing
Linux automount implementation in order to be in line with the functionality
provided by other Unix implementations:

o Both direct and indirect maps should work as expected.

o The system should expire and unmount any unused automounted filesystems.

o Lazy mounting should occur wherever possible.

o The system must be able to scale to thousands of mounts.

o The browsing of indirect maps should be supported.

o The system should be able to handle changing maps and update the current
  configuration as required.

4 Analyzing the Alternatives
============================

Working with these requirements in mind, different types of architectures can be
considered.  Several facets of each potential architecture need to be examined.

1) Are any of the required facilities to implement this architecture already in
   place?

2) How much state is duplicated between userspace and in kernelspace?

3) How well can automount triggers be handled in a multi-namespace environment?

4) How simple is the implementation and how prone is it to error?

With these questions, we can evaluate different architectures for our new
system.  The following are several differing ways a new automounting system can
be architected.

1) Perform everything in kernelspace.  There is no need for a daemon.   A
   utility will communicate with the kernel to install all the triggers.  It is
   the kernel's responsibility to catch all directory traversals that require a
   new mount to occur.  The kernel also handles name-service lookups, map entry
   parsing and performing the actual mounts.

Pros:
   o Makes handling cross-namespace triggers a lot easier as full access to
     kernel data-structures is available.

   o Managing atomicity when handling a trigger is greatly simplified.

   o Full access to map resources is available.

Cons:
   o Lookups being performed in the kernel places an enormous amount of logic in
     the kernel that is probably better left in userspace.

   o Does not leverage the benefit of using the mount(8) utility which already
     handles mounting different filesystems very well.  Many filesystems,
     notably NFS and SMB, have differing APIs for handling mounts and require
     packed structures to be passed to the kernel.

   o Requires new APIs to be put in place that will allow userspace applications
     to remove triggers from their mount table.

   o Canceling a trigger action (e.g.: via a SIGINT) becomes much more difficult
     to handle properly.

2) Continue using a multiprocess daemon using file descriptors to describe the
   target mountpoints.  Use a daemon similar to that used in the current Linux
   automount package.  Augment the kernelspace/userspace communication protocol
   so that we can have the daemon mount and unmount on file descriptors (which
   are namespace aware) instead of by pathnames (which are namespace dependent).

Pros:
   o Automounting continues to work across cloned namespaces.

Cons:
   o Requires new API that allows the passed back file descriptor to be
     re-associated with a map and key.

   o Would require one persistent process per direct/indirect mountpoint.  

   o Difficult to handle lazy mounting of multimounts.

   o Difficult to manage a large hierarchy of processes that is continuously in
     flux.

   o Duplicates structure information found in the kernel.

   o Doesn't allow for clean administration of differing automount schemas
     across different namespaces.

   o Requires new system calls to natively support mounting and unmounting on a
     file descriptor.

   o Cloned namespaces are left with automount triggers that do not have a
     daemon running in the new namespace.

3) Create a single process daemon that is capable of handling all trigger
   requests across the system.  Again, uses file descriptors passed back from
   kernelspace to describe mountpoint targets.

Pros:
   o Consolidated process and memory overhead.

   o Can be done without maintaining too much state in the userspace daemon.

   o Continues to work as desired across cloned namespaces.

Cons:
   o Requires new API for grabbing the file descriptor on which to mount, and
     associate the proper map sources.

   o Map information in other namespaces is difficult to access.  Files may
     differ, as may network service client configurations.

   o Requires new system calls to natively support mounting and unmounting on a
     file descriptor.

   o Requires asynchronous infrastructure to handle synchronous name service
     APIs.
     
   o Managing differing automount configurations becomes difficult.

4) Use a usermode helper application that handles the trigger requests.
   Contextual information is passed to the kernel when installing the automount
   trigger.  This information is then passed back to a usermode helper
   application that is invoked on each triggering action.  The usermode helper
   is invoked within the triggering action's namespace. All lookup logic and
   mounting is handled by the usermode helper which then mounts the desired
   filesystem on a given file descriptor which describes the target directory.

Pros:
   o API for passing file descriptor and associated map information is already
     in place. All information can be passed in to the helper application via
     command line arguments, environment variables and through open file
     descriptors.

   o No daemon means state is only maintained in kernelspace.

   o Allows in place replacement of the userspace infrastructure.
   
   o No need to worry about a daemon dying and leaving the system with stale
     automount triggers.

   o Easy access to local namespace configuration for both file maps and network
     services.

Cons:
   o A lot of triggers occurring simultaneously would invoke many processes.

   o A new facility that allows mounting operations using file descriptors of
     directories is needed.

The alternative approach of using a usermode helper application to handle the
mount requests quickly becomes a viable option when one realizes the benefits
in both cross-namespace use and
reliability. By moving any logic that was previously in the daemon out into a
usermode application, we can enrich the userspace/kernelspace protocol by giving
the process context about where the triggering action occurred.  The use of the
hotplug system is preferred in this implementation because it is already a
well-defined and accepted form of kernelspace to userspace communication, though
a separate but similar system could be used instead.  /sbin/hotplug is currently
invoked with any number of arguments and any number of environment variables.
The goal is to have all trigger events be performed by the userspace agent.
Unfortunately, as we will discover, implementing expiry is a more difficult task
and must be done completely in the kernel.

Implementing automounting without a single persistent daemon also has its own
problems.  It assumes that the system upon which the automounting
is occurring will have enough system resources to be able to handle a high
automounting load.  By invoking a single process per automount action, we are
consuming more resources than a more traditional automount system would
otherwise consume, and doing so in bursts.  It is the belief of the author that
these extra resources are reasonable and will not grossly affect the performance
of the system.  These assumptions should however be properly qualified by
performing relevant benchmarks and stress tests on a prototype implementation.

The rest of this document describes a way to implement an automount system that
uses a usermode helper application to perform automount requests.

5 Proposed implementation
=========================

By removing the need for a persistent daemon and by adding mountpoint navigation
facilities we are able to address all of the shortcomings of the current Linux
automount system and fulfill all of the new requirements introduced by
namespaces.  The preferred approach is to use a userspace helper application
similar in nature to that used by the hotplug subsystem.  /sbin/hotplug already
provides userspace defined agents for a variety of systems and adding an
automount agent is as simple as dropping a file in the /etc/hotplug directory.

It must be noted that the hotplug action will run outside of any chroot(2)
environments.  The current Linux automount implementations do not enforce any
such restriction and mixing automounting with chroot(2) leads to undefined
behavior.  Chroots are different from namespaces because they share portions of
the mount-table while differing namespaces do not.  Forcing the hotplug
invocation to occur at the root of a namespace enforces a single automount
configuration per namespace.  These semantics are similar to those on other
operating systems when automounting and chroots are used in conjunction.

Registering an automount in a namespace will still be handled as a filesystem
that will be responsible for catching any triggering actions.  In the current
Linux autofs implementations, the file descriptor for the writing end of an open
pipe is passed as a mount option and used for kernelspace to userspace
communication.  This makes the kernel dependent on the pipe being open for
communication with userspace.  This causes an automount trigger to become
catatonic when the reading end of the pipe is closed.  This communication
artifact will be completely removed as part of the new protocol.  

The daemon's process group is used in the existing automounter implementation to
let the filesystem determine if the process causing a trigger was a user process
accessing automounted resources or an automount daemon satisfying a prior
request.  In the design outlined in this document, we avoid this issue
altogether by allowing the servicing process to bypass pathname walks.  This is
done by using file descriptors to describe target locations of mounts.  

In addition to describing target directories as file descriptors, mount
operations that are capable of dealing directly with file descriptors are
needed.  Assuming new mount facilities are in place, mount operations throughout
this document are done in terms of directory file descriptors. Rudimentary
requirements are summarized in section 6.1.

Installing automount triggers in a system will be handled by mounting 'autofs'
filesystems at the appropriate locations.  Mount options will be used to pass
all the context information needed later by the helper application when
responding to triggering actions.  Most of these mount options will not be
interpreted by the kernel itself.  They solely serve to pass contextual
information to the helper application upon invocation.  All mount options that
are interpreted by the kernel are noted as such.

5.1 Indirect Maps
-----------------

The implementation of indirect maps will be done using an autofs filesystem
similar to that found in the current implementation.  The main difference being
that it will take a list of mount options indicating that it is an indirect map
as well as where the indirect map entries can be found.  For example, if the
directory /home is to be an indirect mountpoint using the map auto_home, the
following mount command would be used:

----------
mount -o maptype=indirect,mapname=auto_home -t autofs autofs /home
----------

This would mount a filesystem of type autofs on the /home directory in the
current namespace.  The 'maptype' mount option is used by the filesystem code
and tells it to use indirect map semantics [[[The difference between direct and
indirect semantics is that a direct map requires a trigger to occur on traversal
into the autofs filesystem while an indirect map requires a trigger to occur on
traversal into each subdirectory.  Direct maps are described in more detail in
the next section.]]].  

A simple example indirect map might have a single entry as follows:

---------
mikew		host:/export/home/mikew
---------

Later on, if user mikew were to access his home directory /home/mikew, the
system hotplug handler would be invoked as root in the same namespace as the
triggering process:

---------
/sbin/hotplug autofs mount
---------
 
This process is invoked in the same namespace as the triggering process because
in order for the triggering process to see the mounts, we require that all
mounts occur in the namespace of the triggering application.  Also, the hotplug
helper needs to access the configuration of the triggering application's
namespace.  This configuration may include the /etc directory, as well as any
NIS and/or LDAP settings.  Execution of the hotplug system is currently
hard-coded to run in init's context.  Running /sbin/hotplug in an arbitrary
namespace differs from the existing hotplug functionality and should be
documented as such. [[[This semantic difference may justify using a different
executable rather than /sbin/hotplug.   Either way, hotplug is used for the sake
of discussion.]]]

When invoked, the following environment variables would be set [[[This document
uses environment variables to pass values to the hotplug agent because it is
easier to convey their relations in pseudo-code terms.  An actual implementation
may choose to use command line arguments instead of environment variables
because '/sbin/hotplug autofs mount auto_home mikew 0' appears clearer.  This is
an implementation detail and of little importance to the discussion at hand.]]]:

---------
MOUNTFD=0
MAPNAME=auto_home
MAPKEY=mikew
---------

The hotplug agent would be responsible for performing the keyed lookup of
$MAPKEY in the map named $MAPNAME.  It would then use the information in the
entry to perform the mount directly on the $MOUNTFD specified before returning a
successful exit code.  For the simple indirect mount case, these three
environment variables comprise all the information that is required to properly
perform the userspace actions.  The $MOUNTFD environment variable refers to the
number of an open file descriptor of the directory upon which to mount.  The new
mount system call will be used to allow for file descriptor based mount
operations.  A file descriptor is preferred because it allows any mount-related
system calls to completely bypass pathname resolution, which lets the
automounter avoid tripping any triggers.  This simplifies any blocking logic
while a mount is occurring and eliminates the need to identify the helper
application as the one performing the mount.  This allows us to have automount
triggers
handled by individual processes without any special reliance on their process
group.  It also alleviates the need for persistence (again, due to the process
group dependency).
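The agent's duties for the simple indirect case can be sketched in shell.  This
is a minimal, hypothetical sketch: it assumes a sun-format file map (whitespace
separated key/location pairs) and a stand-in for the proposed fd-based mount
operation, which does not exist as a real utility; a real agent would also
consult NIS/LDAP sources.

```shell
#!/bin/bash
# Hypothetical sketch of the 'mount' action of the hotplug agent for the
# simple indirect case.  MAPNAME, MAPKEY and MOUNTFD would be supplied by
# the kernel via the environment.

# Look up a key in a sun-format file map; print its location field.
lookup_key() {            # lookup_key <mapfile> <key>
    awk -v k="$2" '$1 == k { print $2; exit }' "$1"
}

agent_mount() {           # agent_mount <mapfile> <key> <fd>
    loc=$(lookup_key "$1" "$2") || return 1
    [ -n "$loc" ] || return 1
    # A real agent would use the proposed fd-based mount operation here,
    # e.g. something like: mount_on_fd "$3" -t nfs "$loc"
    echo "would mount $loc on fd $3"
}
```

With a map containing the single entry from above, `agent_mount auto_home
mikew 0` would print `would mount host:/export/home/mikew on fd 0`.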

Once an autofs filesystem is mounted, we no longer rely on its absolute path for
automount functionality.  We effectively disassociate any map context
information from the actual location of the mount.  This allows autofs mounts to
be moved (mount(8) --move option) or bound (mount(8) --bind option) without
affecting automount functionality.  It also allows an administrator to install
automount triggers without modifying the /etc/auto_master file.  For example, a
map auto_ws could be manually installed on directory /ws using a command such
as:

---------
mkdir /ws
mount -o maptype=indirect,mapname=auto_ws -t autofs autofs /ws
---------

This can be done without affecting any currently configured automount triggers.

5.1.1 Browsing
``````````````

When an indirect map is installed on a directory, the resulting filesystem has
no files or directories within it.  Subdirectories are created upon lookup.  For
instance, the indirect mount on /home mentioned above would have no contents
(other than the usual '.' and '..' entries) until access to some subdirectory is
performed.  

The exception to this rule is when the map entry for /home contains the option
'browse':

----------
/home		auto_home	-browse
----------

In this case, a directory listing of /home should return a directory entry for
each valid key in the associated map.  None of the entries should be automounted
when this is performed.  Such actions are delayed until the directories are
traversed.  This is useful from a user perspective, allowing a user to enumerate
all entries that are available without requiring any mounts to occur.

In order to implement this functionality, we begin by adding a 'browse' mount
option to the autofs filesystem.  This option switches behavior such that an
indirect mount filesystem will invoke the usermode helper upon the first
directory listing request (triggered by the ->readdir file operation on the
root directory of the filesystem).  The usermode helper will be called with the
'browse' action and will receive the following information on invocation:

----------
MAPNAME=auto_home
OUTPUTFD=0
----------

It is then the helper application's responsibility to retrieve the map and
validate the entries.  It will then pass the keys of the map back to kernelspace
by printing them out to the file descriptor described by $OUTPUTFD.  The kernel
will take the values written to $OUTPUTFD and will later use them to fill in
requests to readdir.  It will need to create dummy directory entries so that
lookups caused by calling stat(2) will return valid results.  Once again, the
usermode helper application will run within the same namespace as the triggering
application so that namespace-local configuration is used.
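The key-enumeration half of the 'browse' action can be sketched as follows.
This assumes a sun-format file map; in the real agent the keys would be
written to the descriptor named by $OUTPUTFD, so here they go to stdout and
the caller redirects.

```shell
#!/bin/bash
# Sketch of the 'browse' action of the usermode helper: enumerate the
# valid keys of a sun-format file map, one per line, skipping comments
# and blank lines.  A real agent would redirect this to $OUTPUTFD, e.g.:
#   browse_keys /etc/auto_home >&"$OUTPUTFD"

browse_keys() {           # browse_keys <mapfile>
    awk '!/^#/ && NF { print $1 }' "$1"
}
```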

In order to maintain some form of coherency between changing maps, these dummy
directory entries will remain in place within the dcache so that the kernel
doesn't need to query the usermode helper as often.  These entries will
periodically time out and be unhashed from the dcache.  Any subsequent
directory listing then requires that the kernel refresh these entries with a
new call to the usermode helper.  The timeout will be specified as another mount option
('browsetimeout=<seconds>') to the autofs filesystem.  The value will be passed
back to the usermode helper when mounting as the environment variable
$BROWSETIMEOUT, so that the usermode helper may inherit these values for any
nested maps.  This environment variable will be specified for all automount
types; however, the browsetimeout mount option will only be used by autofs
mounts that have maptype=indirect and the browse option set.  Other
configurations will silently ignore this value.  A default value of 10 minutes
(600 seconds) will be assumed.

Executing the usermode helper within the namespace of the triggering application
does have a problem when browsing is used.  We are caching map keys in
kernelspace and can run into coherency problems when an autofs super_block is
associated with multiple namespaces which have differing automount maps in /etc.
This kind of situation may occur if a namespace is cloned and a new /etc
directory with a different auto_home map is mounted.  The results from a readdir
within the first namespace may differ from the expected results from a readdir
in the derived namespace.  In order to handle this, facilities need to be added
that allow autofs super_blocks to be cloned when cloning namespaces.  Doing so
ensures that an autofs super_block is local to its namespace and the
namespace-local configuration.  Cloning of super_blocks is described in section
6.3.

5.2 Direct Maps
---------------

Direct maps will be handled in a similar fashion to indirect maps.  The main
differences are outlined as follows:

1) The mount option 'maptype' is now 'direct'.  This tells the filesystem code
   to have direct map semantics.

2) The map key for the direct mount entry is now passed as a new mount option
   called 'mapkey'.  It will be the key used when looking up the entry in the
   direct map.  For direct map entries, this will always be the same as the path
   upon which the trigger is mounted; however, lazy mounts will also use this
   value, as they rely on the same kind of automount trigger.

This is different from indirect maps where the map key is produced by a
directory lookup.  Direct automounts have no such directory lookup and this
contextual information must be explicitly specified at mount time.  The value of
this mount option is used as the $MAPKEY environment variable when the hotplug
agent is invoked.

When a user process traverses into the root of an autofs filesystem that has
maptype=direct, a mount needs to be performed.  The triggering process will
block while the hotplug userspace helper application is again invoked in the
triggering process's namespace.  For example, assume that the auto_master file
has the following entry:

----------
/-	/etc/auto_direct
----------

This tells the installing application (see below: The Userspace Utility) to
iterate over the /etc/auto_direct map and install a direct automount trigger for
each of the entries in the map.  Assume the auto_direct file contains one entry:

----------
/usr/share	hostname:/export/share
----------

To install this entry, the following mount command would be used:

----------
mount -o maptype=direct,mapname=/etc/auto_direct,mapkey=/usr/share \
  -t autofs autofs /usr/share
----------

This hands the kernel all the information it needs to pass back to the hotplug
agent in order to let it perform the mount when necessary.  When the agent is
invoked, it is again called with the 'mount' action and it is passed the same
environment variables as in the case of an indirect mount.  In our example these
are:

----------
MOUNTFD=0
MAPNAME=/etc/auto_direct
MAPKEY=/usr/share
BROWSETIMEOUT=600
----------

The helper application will need to go through and lookup the key '/usr/share'
in the map '/etc/auto_direct', parse the entry and finally mount the relevant
filesystem on the directory specified by the given file descriptor [[[Even
though the value of the key looks like an absolute path, it should not be
interpreted as such.  Its sole purpose is to index into the given map.]]]. This
is exactly the same logic as required for handling indirect maps.

5.3 Multimounts and Offsets
---------------------------

5.3.1 Explanation
`````````````````

A multimount is a map entry with an extended syntax that allows for a
potentially complex hierarchy of filesystems to be mounted on a given directory.
Multimounts may occur in both direct and indirect maps.  They are most often
used to enable the automounting of one NFS share nested within another.  For
example, if we want to automount hosta:/export/src on /usr/src and
hostb:/export/linuxsrc on /usr/src/linux, we would need to use a multimount. In
this case the multimount entry would be placed in a direct map and would look
like the following:

----------
/usr/src                hosta:/export/src	\
            /linux      hostb:/export/linuxsrc
----------

In this example, hosta:/export/src is to be mounted directly on the /usr/src
directory, and hostb:/export/linuxsrc on the /usr/src/linux directory.  The
mount information for /usr/src could have also been written as:

----------
/usr/src    /           hosta:/export/src	\
            /linux      hostb:/export/linuxsrc
----------

In this example, the '/' of the multimount is explicit whereas in the first
example it was implied.  Both path components '/' and '/linux' are called
offsets.  A multimount is comprised of a set of offsets, each of which has a set
of sources.  In all the examples in this document, only one source (such as an
NFS share) is given for each offset.  There can very well be more than one
source per offset.  This technique of listing multiple sources is used to
specify fail-over redundancy.  Handling NFS fail-over redundancy is better
implemented within the NFS subsystem and is not described in this document.

By design, the multimount syntax is really just a superset of the regular map
entry syntax.  For example, the following two map entries are equivalent:

----------
Entry 1:
	mikew			hostc:/export/home/mikew

Entry 2:
	mikew		/	hostc:/export/home/mikew
----------

In the first entry, the '/' offset is implied.  So by design, all map entries
may be treated as multimounts, most of which simply have only the 'root
offset' defined.

One of the interesting aspects of multimounts is that entries do not have to
have a 'root offset' defined at all.  For instance, consider the situation where
three users exist on the system and their home directories all come from NFS
servers.  The indirect map for /home may look something like this:

----------
userA		host:/export/home/userA
userB		host:/export/home/userB
userC		host:/export/home/userC
----------

A new user is then added to the system who needs /home/userD/server1 to come
from one server, while /home/userD/server2 is mounted from a second server.
There is no need to mount anything directly on /home/userD.  This can be quickly
added to the above map as the following entry:

----------
userD		/server1	host1:/export/share1		\
		/server2	host2:/export/share2
----------

In this entry, there are two different offsets defined, namely '/server1' and
'/server2' but there is no 'root offset' defined.

To complicate matters even more, offsets can also nest within each other:

----------
/usr		/		hosta:/export/share/usr	\
		/src		hostb:/export/src		\
		/src/linux	hostc:/linuxsrc
----------

The desired behavior is to 'lazy-mount' all these mounts.  This means that only
those directories that are accessed are ever mounted.  So, if only /usr is being
accessed, then only the share from hosta is mounted.  Only when /usr/src is
first accessed will the share from hostb be mounted.  The same 'laziness' holds
for /usr/src/linux from hostc.
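Entries like those above can be reduced to a uniform list of offset/source
pairs.  The following is a minimal sketch, assuming sun-format entries with
backslash continuations and host:path sources (so that, after the key, any
field beginning with '/' is an offset); it is illustrative only, not a full
map-entry parser.

```shell
#!/bin/bash
# Sketch: parse one (possibly backslash-continued) multimount entry from
# stdin into "offset source" lines, normalizing an implied root offset
# to '/'.

parse_multimount() {
    sed 's/\\$//' |       # join continuation lines: strip trailing backslashes
    tr '\n' ' ' |
    awk '{
        # $1 is the map key.  The remaining fields alternate
        # offset/source, except that a source directly after the key
        # means the root offset "/" was implied.
        i = 2
        if ($i !~ /^\//) { print "/", $i; i++ }
        for (; i < NF; i += 2) print $i, $(i+1)
    }'
}
```

For example, the simple entry `mikew  hostc:/export/home/mikew` yields the
single pair `/ hostc:/export/home/mikew`.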

5.3.2 Implementation
````````````````````

An interesting aspect of implementing lazy mounts is that a multimount entry can
be broken down into several direct mounts.  This is done by associating an
offset value with each direct mount trigger.  This offset value is used at
trigger time to identify which portion of the mount has just triggered and which
subsequent triggers need to be installed.  This offset value will be specified
at autofs mount-time using a new mount option, 'mapoffset', and will be passed
down to the hotplug agent as a new environment variable: $MAPOFFSET.  The
'mapoffset' mount option will default to '/' if it is not explicitly specified.
This builds on the definitions explained above for both direct and indirect
maps.

With this in mind, we provide an example using the following direct multimount
entry from map auto_direct:

----------
/usr            /                hosta:/export/share/usr        \
                /src             hostb:/export/src              \
                /src/linux       hostc:/linuxsrc
----------

The mount command used to install the trigger would now look as follows:

----------
mount -o maptype=direct,mapname=auto_direct,mapkey=/usr\
  ,mapoffset=/ -t autofs autofs /usr
----------

Once this automount trigger has been installed, a first access to the directory
/usr will cause /sbin/hotplug to be invoked with the following environment
variables:

----------
MOUNTFD=0
MAPNAME=auto_direct
MAPKEY=/usr
BROWSETIMEOUT=600
MAPOFFSET=/
----------

$MOUNTFD, $MAPNAME, $MAPKEY are still defined as in the explanations of both
direct and indirect map handling.  The agent is to retrieve the entry with key
'/usr' from the map 'auto_direct' and parse it.  The key addition is that it now
uses the $MAPOFFSET to figure out which part of the entry is being mounted.
Once the filesystem is mounted, the agent then mounts any other required child
offsets on top of the filesystem before exiting.  So, in the case of traversing
into the /usr directory, the following actions are performed:
	
o lookup key '/usr' in map 'auto_direct'
o parse entry
o lookup offset '/' in entry
o mkdir('/tmp/<unique_dir>')
o mount 'hosta:/export/share/usr' '/tmp/<unique_dir>'
o mkdir('/tmp/<unique_dir>/src')
o mount -o maptype=direct,mapname=auto_direct,mapkey=/usr\
  ,mapoffset=/src -t autofs 'autofs' '/tmp/<unique_dir>/src'
o fchdir($MOUNTFD)
o mount --move '/tmp/<unique_dir>' '.'
o rmdir /tmp/<unique_dir>
o exit(EXIT_SUCCESS)

In this and following examples, we choose to use a temporary directory
'/tmp/<unique_dir>' as an intermediate root of our mount because we need to be
able to reach into the newly mounted filesystem to install the child offsets.
If we had directly mounted the share from hosta on $MOUNTFD, we would not be
able to change the current working directory into the newly mounted filesystem
without first traversing back into the parent directory and then walking back
across the trigger.  Using this intermediate directory allows us to bypass this
completely.  Once we have finished performing all of the nested mounts, we
complete the transaction by moving the tree of mounts directly onto the target
directory and returning a successful exit code. [[[A final implementation would
preferably use what we refer to as 'floating mountpoints' as described in
section 6.1, 'Mountpoint file descriptors' to achieve the same desired effect
without requiring the building of mountpoints in a temporary directory.]]]
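The action lists above follow a fixed pattern, which can be sketched as a
dry run: given the entry's offset/source pairs and the offset being
triggered, print the commands the agent would run.  Nothing is executed; the
'/tmp/<unique_dir>' placeholder is kept from the text, and the immediate-child
test is a naive prefix check that suffices for these examples.

```shell
#!/bin/bash
# Dry-run sketch of the agent's actions for one trigger of a direct
# multimount entry.  OFFSETS holds one "offset source" pair per line for
# the whole entry; we mount the triggered offset's source, then install
# nested autofs triggers for its immediate child offsets only.

trigger_actions() {       # trigger_actions <mapname> <mapkey> <mapoffset>
    local map=$1 key=$2 off=$3 tmp='/tmp/<unique_dir>'
    echo "mkdir $tmp"
    # Mount the source belonging to the triggered offset (an entry with
    # no root offset would print nothing here; see section 5.3.3).
    awk -v o="$off" -v t="$tmp" '$1 == o { print "mount " $2 " " t }' <<<"$OFFSETS"
    # Install a nested trigger for each immediate child offset.
    awk -v o="$off" -v m="$map" -v k="$key" -v t="$tmp" '
        $1 != o && index($1, o) == 1 {
            rel = substr($1, (o == "/") ? 2 : length(o) + 2)
            if (rel != "" && rel !~ /\//) {   # immediate children only
                print "mkdir " t "/" rel
                print "mount -o maptype=direct,mapname=" m ",mapkey=" k \
                      ",mapoffset=" $1 " -t autofs autofs " t "/" rel
            }
        }' <<<"$OFFSETS"
    echo 'fchdir $MOUNTFD'
    echo "mount --move $tmp ."
    echo "rmdir $tmp"
}
```

Running this for the /usr entry with mapoffset=/ reproduces the first action
list; mapoffset=/src and mapoffset=/src/linux reproduce the other two.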

Comparing the initial autofs mount and the nested autofs mount, we notice that
the only difference between the trigger on /usr and the trigger on /usr/src is
the mapoffset mount option.  This differentiator is enough to distinguish the
two automount triggers.

If a user were then to traverse into /usr/src, similar actions are performed by
the agent:

o lookup key '/usr' in map 'auto_direct'
o parse entry
o lookup offset '/src' in entry
o mkdir('/tmp/<unique_dir>')
o mount 'hostb:/export/src' '/tmp/<unique_dir>'
o mkdir('/tmp/<unique_dir>/linux')
o mount -o maptype=direct,mapname=auto_direct,mapkey=/usr\
  ,mapoffset=/src/linux -t autofs 'autofs' '/tmp/<unique_dir>/linux'
o fchdir($MOUNTFD)
o mount --move '/tmp/<unique_dir>' '.'
o rmdir('/tmp/<unique_dir>')
o exit(EXIT_SUCCESS)

Finally, if one walks into the /usr/src/linux directory:

o lookup '/usr' in map 'auto_direct'
o parse entry
o lookup offset '/src/linux' in entry
o mkdir('/tmp/<unique_dir>')
o mount 'hostc:/linuxsrc' '/tmp/<unique_dir>'
o fchdir($MOUNTFD)
o mount --move '/tmp/<unique_dir>' '.'
o rmdir('/tmp/<unique_dir>')
o exit(EXIT_SUCCESS)

5.3.3 Multimounts without root offsets
``````````````````````````````````````

The only remaining problem to be dealt with is multimounts that have no 'root
offset'.  These are a special case of regular multimounts and can be handled by
still installing the direct mount trigger on the root of the multimount.
However, instead of mounting a real filesystem upon trigger, a tmpfs filesystem
is mounted before the agent proceeds to install child trigger mounts.  Following
is the auto_home map bound to /home from a previous example:
	
----------
userA		host:/export/home/userA
userB		host:/export/home/userB
userC		host:/export/home/userC
userD		/server1	host1:/export/share1		\
		/server2	host2:/export/share2
----------

We still install the indirect trigger on /home as before:

----------
mount -o maptype=indirect,mapname=auto_home -t autofs autofs /home
----------

When a process traverses into the /home/userD directory, the following
environment variables are passed to the /sbin/hotplug agent:

----------
MOUNTFD=0
MAPNAME=auto_home
MAPKEY=userD
MAPOFFSET=/
----------

The agent takes this information and performs the following actions:

o lookup 'userD' in map 'auto_home'
o parse entry
o lookup offset '/' in entry
o mkdir('/tmp/<unique_dir>')
// no root offset found!  Install dummy filesystem:
o mount -t tmpfs 'tmpfs' '/tmp/<unique_dir>'
// handle child offsets
o mkdir(/tmp/<unique_dir>/server1)
o mount -o maptype=indirect,mapname=auto_home,\
  mapkey=userD,mapoffset=/server1 -t autofs 'autofs' '/tmp/<unique_dir>/server1'
o mkdir(/tmp/<unique_dir>/server2)
o mount -o maptype=indirect,mapname=auto_home,\
  mapkey=userD,mapoffset=/server2 -t autofs 'autofs' '/tmp/<unique_dir>/server2'
// remount the tmpfs filesystem read-only because it is just a dummy filesystem.
o mount -o remount,ro '/tmp/<unique_dir>'
// move the tree of mounts onto the target directory
o fchdir($MOUNTFD)
o mount --move '/tmp/<unique_dir>' '.'
o rmdir('/tmp/<unique_dir>')
o exit(EXIT_SUCCESS)

We use a tmpfs filesystem on /home/userD because we need to be able to create
directories and we would like to have these directories exist on a filesystem
that is expirable.  Traditionally, the directory of the root offset for entries
with no defined root offset is immutable.  It may not be changed by any
userspace program.  We use the simple approach of remounting the filesystem
read-only once we have created the directories to simulate this effect.

The two nested direct mount triggers act as they normally would.

5.4 Expiry
----------

Handling expiry of mounts is difficult to get right.  Several different aspects
need to be considered before being able to properly perform expiry.

In the existing Linux autofs implementations, the system works such that the
userspace daemon will ask the autofs filesystem code to check to see if any of
the automounted filesystems can expire (this is done by calling an ioctl on the
base directory of the autofs filesystem).  The autofs filesystem will then
acquire the necessary locks and walk each of the currently mounted filesystems
to see if anybody is using them.  If the kernel code determines that a mount is
ready to be expired, it sends the path back to the daemon.  The daemon in turn
unmounts it from userspace.  This method of expiry has several problems:

o The autofs filesystem really should know as little about VFS internal
  structures as possible.  In this case, the filesystem code is charged with
  walking across mountpoints and manually counting reference counts.  This task
  is much better left to the VFS internals.

o Unmounting the filesystem from userspace is racy, as any program can begin
  using a mount between the time the daemon has received a path to expire and
  the time it actually makes the umount(2) system call.  This sequence of events
  would make the expiry fail.  Even worse, manually unmounting several mounts in
  a multimount can possibly lead to an expiry that fails to unmount after some
  of the mounts have already been unmounted, leaving the multimount in an
  inconsistent state.

o Having userspace initiate mount expiry requires a userspace application to
  periodically query the kernel.  This is done using a daemon, but as we have
  already discovered, automounting with a daemon does not work well when you
  are working in a multi-namespace environment.

These points suggest that the kernel's VFS sub-system should be charged with
handling expiry.  Some of the benefits of having it perform this functionality
over other ad-hoc solutions are:

o All data structure specifics (like navigation and lock semantics) are
  maintained within the same component of the kernel.  This improves
  maintainability and sustainability of the kernel proper and of individual
  filesystem implementations.

o Other filesystems would like to have expiry functionality in the VFS
  sub-system.  Providing this service at the VFS layer would reduce duplicated
  efforts between filesystems to support this functionality.  Similar to this is
  the way the VFS layer provides read-only functionality for all filesystems
  from a higher level of abstraction.

The following questions must be answered before a complete expiry solution is
designed:

o How will the kernel determine the expiry timeout value?  In other words, how
  does it know how much time must pass for an unused mountpoint before it
  expires?

    We will need to pass timeout values in from userspace.  The simplest method
    to pass this information to the kernel is to pass it to the VFS layer as a
    mount option. This option is tentatively named 'vfsexpire' and will accept a
    timeout value given in seconds. [[[Unfortunately, the current mount system
    calls do not allow arbitrary information to be passed directly to the VFS
    layer if they cannot be represented as a boolean flag.  A new set of system
    calls and interface semantics will need to be thought about and implemented
    for this mount option to be available.]]]

    As described above, we may be installing multiple mounts upon each trigger.
    This tree of mounts will need to expire together as an atomic unit.  We will
    need to register this block of mounts to some expiry system.  This will be
    done by performing a remount on the base automounted filesystem after any
    nested offset mounts have been installed.

o How will the VFS layer verify that a filesystem is inactive?

    The VFS layer can atomically peek into the mountpoint structures (struct
    vfsmount) and look at the given reference counts to determine whether a
    filesystem is currently active or not. 

Reference counting alone does not solve the issue of having to be able to
atomically unmount several mountpoints.  This is evident when lazy-mounting is
considered.  We would like to expire a base mountpoint that may optionally have
nested autofs mounts ready to catch a trigger.  These nested mounts increase the
reference count on the base mount, and thus need to be considered as counting
towards the total reference count.  These nested mounts in turn must recursively
also be inactive for the base mount to expire.

The proposed semantics are as follows:

o A mount may be made without the vfsexpire mount option.  In this case, the
  value defaults to 0, specifying that this mountpoint will never expire.

o A mount may be made with the vfsexpire=n mount option.  This specifies that
  the kernel may detach this mount at some time after at least n seconds have
  passed with the mount inactive.

o An existing mountpoint may be remounted with vfsexpire=0.  This signifies
  that if this mountpoint was previously set to expire, it no longer will.

o An existing mountpoint may be remounted with vfsexpire=n, where n is non-zero.
  This signifies that this mountpoint together with any mountpoints currently
  underneath it will expire atomically.  That is to say, if all of the said
  mounts are inactive (no one is using any of them, and nothing else is later
  mounted within them), only then will the entire tree of mounts expire
  together.  This is an all or nothing expiry, where a hierarchy of mountpoints
  expires as a single unit.

We require that a tree of mounts be able to expire atomically together to ensure
that we do not wind up with a partial expiry.  A partial expiry would break our
ability to lazy mount as some of the nested autofs filesystems would no longer
be mounted.  Such an arrangement would remain inconsistent until the root of the
expiry is unmounted.
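The all-or-nothing rule above can be illustrated with a toy model.  This is
purely illustrative: each mount in a candidate expiry tree is summarized as a
"path busy idle_seconds" line (a format invented here, not a kernel
interface), and the tree expires only if every mount is inactive and has been
idle at least the vfsexpire timeout.

```shell
#!/bin/bash
# Toy model of the proposed vfsexpire semantics.  Reads one
# "path busy idle_seconds" line per mount on stdin and prints "expire"
# or "keep" for the tree as a whole.  vfsexpire=0 means never expire.

tree_expires() {          # tree_expires <timeout_seconds>
    awk -v n="$1" '
        BEGIN                 { ok = (n > 0) }
        $2 != 0 || $3 + 0 < n { ok = 0 }      # busy, or not idle long enough
        END                   { print (ok ? "expire" : "keep") }'
}
```

For example, with a 600-second timeout, a base mount idle for 900 seconds
plus a nested trigger idle for 700 expires; if any mount in the tree is busy,
the whole tree is kept.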

The unmount itself will be performed within the kernel.  Doing so ensures that
the unmount occurs while nobody is accessing the filesystem.  Further details
on how native expiry support may be implemented are described below in section
6.2.

5.5 Handling Changing Maps
--------------------------

In a network that uses automounting in abundance, it is expected that maps will
change fairly often.  It is desirable that systems using the new automounting
architecture will stay coherent with the maps provided by the nameservices on
the network.

Before designing a strategy to handle changing maps, it is important to first
understand what types of changes can occur.  Table 1 describes a cross-section
of map entry types and of the types of changes that may occur.  This
cross-section view allows us to identify how map changes are propagated to a
running configuration given the automount system described thus far.

Table 1 - Strategies for Changing Maps

                 | Entry Modified  | Entry Removed   | Entry Added
-----------------|-----------------|-----------------|------------------
Direct Entry     |                 |                 |
(in direct map   | Updated on      | Requires        | Requires
included from    | Expiry          | Removal         | Addition
auto_master)     |                 |                 |
-----------------|-----------------|-----------------|------------------
Indirect Map     | Requires        |                 |
(as listed in    | updated         | Requires        | Requires
auto_master)     | associated      | Removal         | Addition
                 | context         |                 |
-----------------|-----------------|-----------------|------------------
Indirect Entry   | Updated on      | Updated on      | Works
                 | Expiry          | Expiry          |
-----------------|-----------------|-----------------|------------------

Most of the changes that may occur get propagated to a running system the next
time a trigger is performed.  This means that any updates to maps for an
already mounted system become active after an expiry occurs.  Each triggering
action causes a new map lookup to occur.  These map lookups will cause the
trigger to receive any new modified entries.

5.5.1 Base Triggers
```````````````````

There are however certain conditions where a running system will not be
completely in sync with changing maps. These changes involve the modification of
the master map as well as any direct maps.  Entries in these maps will need to
be reflected on the running system by running a utility program that will
synchronize the map contents against the filesystem layout on a running machine.
This will involve adding or removing direct and indirect mountpoints as well as
refreshing the context associated with each indirect mountpoint.

A utility program will need to create a delta between the running system and the
master map and direct maps involved.  This information is available from the
proc filesystem (/proc/self/mounts).  The program will then be able to identify
any entries that would have come from the master or direct maps (by finding the
autofs filesystems that are mounted on unique path prefixes) and add and remove
filesystems from the running namespace to bring the mount table in line with the
maps.
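The delta computation itself is a set difference, which can be sketched with
comm(1).  This assumes the two inputs have already been reduced to
newline-separated lists of trigger mountpoints; a real utility would derive
the first from /proc/self/mounts and the second from parsing auto_master and
the direct maps.

```shell
#!/bin/bash
# Sketch of the refresh delta: compare the autofs trigger mountpoints
# currently installed with those the maps call for, printing one
# "add <path>" or "remove <path>" line per difference.

trigger_delta() {         # trigger_delta <current_file> <desired_file>
    comm -13 <(sort "$1") <(sort "$2") | sed 's/^/add /'
    comm -23 <(sort "$1") <(sort "$2") | sed 's/^/remove /'
}
```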

We must also consider the case where indirect entries from the master map and
direct entries from direct maps are installed and the maps subsequently change.
In order to handle updating the context associated with the indirect trigger
filesystem atomically, a remount is performed on the autofs filesystem with the
new context passed as mount options.  A simple approach would allow the remount
to happen on a pathname because the following assumptions hold:

o The filesystem is an indirect filesystem, which will never be covered by
  another filesystem.  If it is, then it is not updated.

o Because it is an indirect filesystem, remounting it will not cause any other
  filesystems to be incorrectly triggered (because the base directory of the
  filesystem is immediately available).

However, there remains the issue of a direct map entry that changes from one map
to another, or is removed from the direct map set.  Access to a direct map mount
is not available when it is covered by another filesystem, and accessing it
directly by pathname would in turn cause the direct mount to trigger and mount a
different filesystem.  Because of these problems, we need to define some method
that allows a direct mount to be accessible in a manner that would not trigger a
new mount, nor follow into any overlaying mounts.  The proposed solution is to
adopt a new interface that allows user space to navigate mountpoints on a given
system.  The goal is to use this navigation in conjunction with mount operations
(such as unmounting and re-mounting with new options) to reconfigure an
automount system and bring it up-to-date with all of the changing maps.  Such a
system for navigating mountpoints is described below in section 6.1.

5.5.2 Forcing Expiry to Occur
`````````````````````````````

Given a new interface that allows the navigation of mountpoints within a
namespace, we now have the ability to force expiry completely from userspace.
Forcing expiry to occur becomes as trivial as writing a simple utility that gets
the mountpoint file descriptor for the root filesystem and traverses across all
mountpoints.  Whenever this utility sees a mountpoint of type 'autofs', it
would walk amongst its immediate child mountpoints and perform a lazy
unmount on each child mountpoint [[[See umount(8), 'Lazy unmount'.]]].
Similarly, we can also remove all autofs filesystems from a given namespace by
lazy unmounting them as well.
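Until such a mountpoint-navigation interface exists, the traversal can be
approximated from the mount table.  The following dry-run sketch reads text
in /proc/self/mounts format (source, mountpoint, fstype as the first three
fields), assumes no autofs filesystem is mounted on '/' itself, and only
prints the lazy unmounts it would perform.

```shell
#!/bin/bash
# Dry-run sketch of forced expiry: for every autofs mountpoint in a
# mount table read from stdin, print a lazy unmount command for each of
# its immediate child mountpoints.  Nothing is actually unmounted.

force_expire_cmds() {
    awk '
        $3 == "autofs" { auto[$2] = 1 }
        { mp[NR] = $2 }
        END {
            for (a in auto)
                for (i = 1; i <= NR; i++) {
                    pre = a "/"               # assumes autofs is not on "/"
                    if (index(mp[i], pre) == 1) {
                        rel = substr(mp[i], length(pre) + 1)
                        if (rel !~ /\//)      # immediate children only
                            print "umount -l " mp[i]
                    }
                }
        }'
}
```

Replacing the echo with a real umount -l (or feeding the output to sh) would
perform the actual detach; the same walk with an unconditional match removes
the autofs filesystems themselves.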

5.6 The Userspace Utility
-------------------------

The userspace utility program to be used in administrating an automounted system
would preferably be called 'automount'.  It would fulfill the following
functions:

----------
automount install [mastermapname]
----------

This action would go through the master map (optionally overridden by the
mastermapname argument) and would install triggers within the running
namespace. [[[The master map (with
default value '/etc/auto_master') will need to be accessible from the calling
namespace, as would any other file map references.]]]

----------
automount refresh
----------

This action would go through the current namespace and update the base autofs
filesystems as described in the section titled "Base Triggers".  It would not
perform a lazy unmount of all the mounted filesystems.

----------
automount detachall
----------

This action would perform a lazy unmount on all the automounted filesystems.

----------
automount uninstall
----------

This action would remove all autofs triggers from the current namespace.

6 New Facilities
================

The following sub-sections describe in high-level detail the new facilities that
are needed in order to fully support a robust automount system.  The
descriptions that follow are in places deliberately over-simplified as several
of their design aspects are open for much discussion and debate.

It is hoped that the ideas below are well entertained.  It is the intent of the
author to further investigate details for each concept introduced and to propose
more elaborate requests for comments to the community.  Suggestions and comments
are most welcomed for the sections that follow.

6.1 Mountpoint file descriptors
-------------------------------

Mountpoint file descriptors are intended to describe mountpoints as first-class
citizens within the Linux environment.  By being able to describe mountpoints
using file descriptors, we allow programmers and system administrators to
continue using the tools they are used to, while at the same time enriching the
semantics allowed for mountpoints.   Some of the desired benefits of describing
mountpoints as file descriptors are as follows:

o We wish to be able to use common APIs such as read(2) and write(2) to
  communicate with a mountpoint.  This would be useful for communicating mount
  options specific to the filesystem, as well as with the VFS layer directly.

o We wish to be able to enumerate mountpoints somehow such that they may be
  modified without causing any path traversals to occur.  This has the added
  benefit that we may access mountpoint configurations for mountpoints that are
  covered by other filesystems.

These mountpoint descriptors will most likely be accessible via a new mount
system call, mount2.  Mount2 will multiplex the following actions:

o 'Mount' -- Take a mountpoint file descriptor and mount it on a directory,
  specified by a second file descriptor.

o 'Unmount' -- Given a mountpoint file descriptor, attempt to unmount the
  filesystem if it isn't busy.

o 'LazyUnmount' -- Given a mountpoint file descriptor, detach the filesystem
  from its namespace.  Perform a lazy cleanup of resources when the filesystem
  is no longer in use.

o 'ForcedUnmount' -- Given a mountpoint file descriptor, force an unmount to
  occur. Forcing unmounts is useful for filesystems such as hung NFS shares.

o 'Bind' -- Given a source directory file descriptor, create a new mountpoint
  file descriptor that can later be mounted on any given directory file
  descriptor using the Mount sub-command.

o 'GetMfd' -- Given a directory file descriptor, this command will return the
  directory's associated mountpoint file descriptor if the directory is the base
  of a mountpoint.

o 'GetDirFd' -- Given a mountpoint file descriptor, this command will return an
  open directory as a file descriptor.  This directory file descriptor will
  represent the base of the mountpoint as described by the mountpoint file
  descriptor.

o 'GetFirstChild/GetNextChild' -- Facilities will also be put in place to
  navigate the children mountpoint file descriptors of a given mountpoint file
  descriptor.

Reading from a mountpoint file descriptor will result in a summary of the
underlying filesystem, such as its type, the options it is using and its
absolute path within the current namespace.

When a mountpoint file descriptor is unmounted using either the Unmount or
LazyUnmount commands, the mountpoint it represents would remain valid.  Instead
of being directly associated within a namespace, the mountpoint is considered
'floating'.  A floating mountpoint can be re-associated with a namespace by
performing the Mount command.  One of the benefits of floating mountpoints is
that one can mount a filesystem without associating it with a namespace.  The
floating mountpoint can then be navigated by first acquiring the base directory
of the mountpoint using the GetDirFd command and then changing the current
working directory to it using fchdir(2).

Because of the way support for forcing unmounts is implemented, the
ForcedUnmount command will invalidate the given mountpoint file descriptor upon
successful completion.  Any attempts to access the base directory on a
forcefully unmounted filesystem will result in an error.

Together, these commands allow one to implement all of the mount operations with
which we are familiar.  For example, assuming a filesystem is mounted at /from,
a move operation can be achieved in the following steps:

----------
sourcefd = open("/from")
targetfd = open("/to")
mfd = mount2(GetMfd, sourcefd)
mount2(LazyUnmount, mfd)
mount2(Mount, mfd, targetfd)
----------

This example takes advantage of the fact that the underlying filesystem is still
valid when it is lazily unmounted.  We effectively disassociate the filesystem
with the current namespace (using LazyUnmount) and then re-associate it back
with the namespace by calling Mount.  Similarly, a recursive bind operation may
be done by recursively visiting each mountpoint and creating new floating
mountpoints using the Bind operation.  These new mountpoints may be stitched
together in userspace using the Mount operation, along with directory file
descriptors obtained using the GetDirFd operation, before finally associating
the new tree of mountpoints into the namespace with a final Mount operation.
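
The recursive bind described above can be sketched as follows.  This is a
userspace simulation only: the Bind and Mount sub-commands are modeled as
plain Python functions and objects, since the mount2 interface is merely
proposed here.

```python
# Each mountpoint is modeled as an object holding a filesystem type and a
# map of relative path -> child mountpoint.  These stand in for mountpoint
# file descriptors.
class Mountpoint:
    def __init__(self, fstype):
        self.fstype = fstype
        self.children = {}   # relative path -> Mountpoint

def bind(src):
    """'Bind': create a floating mountpoint for src's filesystem.
    Child mountpoints are not carried over automatically."""
    return Mountpoint(src.fstype)

def recursive_bind(src):
    """Recursively Bind each mountpoint and stitch the copies together,
    standing in for the Mount operation on directory file descriptors."""
    copy = bind(src)
    for rel, child in src.children.items():
        copy.children[rel] = recursive_bind(child)   # 'Mount' the copy
    return copy

root = Mountpoint("ext3")
root.children["usr"] = Mountpoint("ext3")
root.children["usr"].children["src"] = Mountpoint("ext3")

tree = recursive_bind(root)   # a new floating tree mirroring the original
```

The resulting tree shares the underlying filesystems but consists entirely
of new mountpoints, ready to be associated with the namespace.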

6.2 Native Expiry Support
-------------------------

David Howells of Red Hat has already implemented an expiry system that may
eventually make it into the mainline kernel.  His implementation is used to add
automount functionality to the AFS filesystem. Specifically, the AFS filesystem
implementation catches dangling symlinks whose symlink target is formatted to
contain all the information needed in order to mount an AFS cell.  His expiry
implementation extends the VFS API such that one can construct a mountpoint and
have it grafted into the current namespace's tree, while simultaneously linking
the mountpoint into an expiry run list.  This list is provided by the filesystem
implementation.  Linking into an expiry run list is handled by the VFS layer so
that the filesystem itself need not worry about the locking semantics involved.

The experimental AFS automount patch periodically calls a new VFS function,
mark_mounts_for_expiry.  This function traverses a list of vfsmounts,
determines which are not in use, and marks them appropriately.  Such a
marking states that the mountpoint has been inactive since the last
mark_mounts_for_expiry run.  If a later run comes across a vfsmount that is
already marked and still inactive, the mountpoint is scheduled to be detached
from the namespace.  These markings are cleared on all calls to mntput, so
any user who uses the mount between runs will either put the mountpoint in an
active state, or transition it back to an inactive state while also clearing
the marking.
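
The two-pass marking scheme can be illustrated with a small simulation.
Everything here is a stand-in (the real logic lives in the kernel and
operates on vfsmount structures under VFS locks), but the mark, clear, and
expire behaviour follows the description above.

```python
# Simulation of the boolean marking scheme: a mount found inactive on two
# consecutive mark_mounts_for_expiry runs is expired; any use in between
# (mntput) clears its mark.
class VfsMount:
    def __init__(self, name):
        self.name = name
        self.in_use = False
        self.marked = False

def mark_mounts_for_expiry(expiry_list):
    """One expiry run; returns the mounts detached on this pass."""
    expired = []
    for m in list(expiry_list):
        if m.in_use:
            continue                  # active mounts are never marked
        if m.marked:
            expired.append(m)         # inactive for two runs: detach it
            expiry_list.remove(m)
        else:
            m.marked = True           # first inactive sighting
    return expired

def mntput(m):
    m.marked = False                  # any use clears the marking

mounts = [VfsMount("/home/alice"), VfsMount("/home/bob")]
run_list = list(mounts)

mark_mounts_for_expiry(run_list)      # both mounts become marked
mntput(mounts[0])                     # /home/alice was touched in between
expired = mark_mounts_for_expiry(run_list)   # only /home/bob expires
```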

The mark_mounts_for_expiry patch has a few limitations that will need to be
dealt with in order to completely integrate it with the VFS sub-system:

o The VFS layer currently delegates the running of mark_mounts_for_expiry to
  each individual filesystem.  This delegation forces duplicate code between
  filesystems that wish to support mountpoint expiry.  It also keeps a user
  from marking arbitrary mounts as expirable.  Each filesystem type must hold
  onto a list_head for its own expiry list, which the filesystem code may not
  traverse without acquiring VFS-owned locks.  These lists should be
  consolidated into the VFS layer directly, and the VFS layer would in turn
  periodically call mark_mounts_for_expiry.

o Using a boolean marking forces the expiry timeout to be between one and two
  times the period between calls to mark_mounts_for_expiry.  This is fine,
  but it neglects the possibility of per-mountpoint configurable timeouts.
  Greater configurability and granularity can be achieved by having each
  vfsmount store a timeout period value.  Instead of using a boolean marking,
  a counter would count up to the timeout value before the mountpoint
  expires.
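
The counter-based variant can be sketched similarly.  This is again only a
simulation; for simplicity the timeout is expressed in expiry runs rather
than seconds, and the names are invented for illustration.

```python
# Per-mountpoint timeout variant: each vfsmount counts the expiry runs for
# which it has been inactive, up to its own configurable limit.
class ExpirableMount:
    def __init__(self, name, timeout_runs):
        self.name = name
        self.timeout_runs = timeout_runs
        self.inactive_runs = 0

def expiry_run(mounts, in_use):
    """One periodic pass: count inactive runs and expire on timeout."""
    expired = []
    for m in list(mounts):
        if m in in_use:
            m.inactive_runs = 0       # any activity resets the counter
            continue
        m.inactive_runs += 1
        if m.inactive_runs >= m.timeout_runs:
            expired.append(m)
            mounts.remove(m)
    return expired

quick = ExpirableMount("/net/fast", timeout_runs=1)
slow = ExpirableMount("/net/slow", timeout_runs=3)
mounts = [quick, slow]
first = expiry_run(mounts, in_use=set())   # only /net/fast expires
```

Mounts with different timeouts now expire independently, which the single
boolean mark cannot express.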

In the mark_mounts_for_expiry patch, expiry is specified by a call to
do_add_mount.  This call now takes an additional argument, a list_head used
to enumerate all mountpoints that should expire.  By having the VFS layer
handle expiry natively, we would no longer need this API addition.  Instead,
the VFS layer would intercept the vfsexpire mount option and update its mount
table and internal expiry run list to reflect the change.

The proposed solution would see child mountpoints recursively associated with
an expiry when the parent mountpoint is linked into the expiry list.  These
associations need to be cleared whenever any mountpoint manipulation occurs
on a child mountpoint, and are consulted when checking the active state of
the parent mountpoint to determine whether a child mountpoint is part of the
parent's expiry.  The consistency of these associations will be managed by
the VFS layer, which will simply remove an association whenever a mountpoint
is modified (for example via a bind or a mountpoint move operation).  The
exception occurs when a namespace is cloned; in that case, any markings will
need to be updated to remain consistent within the new namespace.

The following sequence of events illustrates the semantics described above by
example:

----------
mount -o vfsexpire=10 /dev/hda1 /usr
----------

The mountpoint at /usr is set to expire after ten seconds.

----------
mount /dev/hda2 /usr/src
----------

The mountpoint at /usr cannot expire because it is held busy by the filesystem
mounted at /usr/src.

----------
mount -o remount,vfsexpire=20 /usr
----------

The mountpoint at /usr will now expire along with /usr/src after 20 seconds of
both mountpoints being inactive.  They will expire together atomically; that
is, under no circumstances will /usr/src be unmounted by an expiry run without
also removing the mountpoint at /usr.

----------
mount /dev/hda3 /usr/local
----------

The mountpoint at /usr cannot expire because it now has a new child mountpoint
that is not associated with the expiry.

----------
mount --move /usr/local /local
----------

The mountpoint at /usr can now expire along with /usr/src after 20 seconds
because it no longer has any child mountpoints that aren't associated with the
expiry.

----------
mount --move /usr/src /src
----------

The mountpoint that was at /usr/src will no longer expire.  Its association with
the expiry of /usr is lost.  The mountpoint at /usr will continue to expire
after 20 seconds of inactivity.

----------
mount --move /src /usr/src
----------

The mountpoint at /usr will not expire because it is held busy by the mountpoint
at /usr/src.

----------
mount -o remount,vfsexpire=0 /usr
----------

The mountpoint has its expiry disabled.

6.3 Cloning super_block
-----------------------

When a namespace is cloned, all the super_blocks for each of the currently
mounted filesystems are shared between both old and new namespaces.  Because
filesystem-specific mount options are stored at the super_block layer, this
creates the problem that changes to a mounted filesystem will affect all
occurrences of the associated super_block.  Sharing a super_block across
namespaces opens the door to cross namespace tampering and contradicts our goal
of keeping namespace configurations as isolated as possible.

The implications are less apparent with other types of filesystems.  For
example, given that an ext3 filesystem may be mounted in several places, it is a
fundamental requirement that there only exists one running configuration of the
ext3 filesystem at a given time, i.e. you wouldn't want to mount the filesystem
in one place with data=journal and in another location with data=ordered (two
contradicting options).  This running configuration is represented as a single
super_block, and the VFS layer ensures that only one super_block exists for any
block device-backed filesystem.  There is no such requirement for pseudo-device
filesystems (those which do not have block devices backing them).

In order to allow namespaces to be cloned without letting changes within one
namespace affect the other, we must develop a way for mount options to be kept
distinct across the clone.  Several alternatives are possible, some more
immediate than others:

1) Do nothing.  Allow cloned namespaces to share automount configuration within
   shared super_blocks.

Pros:
   o No special work needs to be done

Cons:
   o Can never be sure if a super_block is associated with a different
     namespace. This is a breach of isolation between namespaces.

   o It becomes impossible to clone a namespace and update the automount
     configuration without affecting other namespaces, short of unmounting
     all autofs filesystem occurrences and replacing them with new instances.

Unfortunately, this option is not very viable as it does not achieve our goal of
isolating automount configuration across cloned namespaces.  A more complex
method needs to be devised:

2) Allow a super_block to clone itself for the purposes of namespace cloning.
   This is preferably implemented as a new optional callback in
   super_operations.  When called, the callback will generate a new super_block
   instance with the same configuration as the input instance.  All directory
   entries (dentries) and inodes of the input super_block will also need to be
   duplicated so that filesystems mounted on top of the cloned filesystem may be
   stitched into the new namespace.  

Pros
   o Allows completely distinct automount triggers across cloned-namespaces.

   o Filesystems that are mounted within a cloned super_block will still be
     accessible within the new namespace.

Cons
   o Duplicating all dentries and inodes for a given super_block in a consistent
     manner is not feasible given the locking and coherency semantics involved.

Unfortunately, the second option does not lend itself to dealing with cloning
any sub-mountpoints easily.  Mountpoints are internally dependent on dentries,
which in turn are dependent on super_blocks.  In order to clone a complete
namespace while allowing the cloning of super_blocks as discussed in the second
option above, we would have to not only clone the super_block, but also recreate
any dentries and inodes associated with the super_block.  This is a very
difficult task to accomplish given the locking and coherency semantics involved.

Nevertheless, this method is the only way we have conceived of guaranteeing
the isolation of automount trigger configurations across cloned namespaces.
The capability to clone super_blocks is needed, and further investigation
into how this can be accomplished is required.

6.3.1 The --bind problem
````````````````````````

When a mountpoint is bound (using mount(8)'s --bind option), the system is left
in a state where two mountpoints exist that both use the same super_block.  This
leads to questionable behavior.  Should remount options on one mountpoint affect
the other?  These semantics are currently being worked out, especially with the
soon-to-be introduced per-mountpoint read-only mount option.

For the sake of simplicity, we may choose not to clone super_blocks for
mountpoints when the mount bind operation occurs.  However, this leads to
strange semantics when mixed with the cloning of namespaces.  For example,
consider an autofs filesystem located at /foo.  Super_blocks are shared on bind
operations, so,

----------
mount --bind /foo /bar
----------

would result in two mountpoints sharing the same super_block.  This allows any
configuration changes performed on /foo to also affect /bar.  

Assuming we naively clone super_blocks for autofs filesystems and a new
namespace is then created, each of the mountpoints mentioned would get its own
super_block.  With independent super_blocks for each mountpoint, changes to
/foo would no longer affect the autofs mountpoint on /bar.  Blindly cloning
super_blocks for each mountpoint, regardless of how many mountpoints share
the super_block, results in a derived namespace that does not behave in
exactly the same way as its parent namespace.

For these reasons, we extend the semantics of cloning super_blocks when
cloning namespaces.  Instead of simply cloning each super_block that requires
it as we traverse the namespace, we keep a list of the cloned super_block
pairs and re-use the newly cloned super_block for each duplicated mountpoint
that referred to the original.  This solves the --bind problem by ensuring
that any mountpoints that referred to a single super_block will continue
referring to a single super_block within the new namespace, and that the two
namespaces will continue to behave alike.
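
The pairing scheme can be sketched as follows.  This is a simulation only:
super_blocks are modeled as plain dictionaries, and the point of interest is
the clone-once bookkeeping that preserves sharing relationships.

```python
# Simulated namespace clone: each distinct super_block is cloned exactly
# once, and every mountpoint that shared the original shares the clone.
def clone_namespace(mounts):
    """mounts: {mountpoint path: super_block object}.  Returns a new
    mapping in which sharing relationships are preserved."""
    clones = {}                           # original sb identity -> clone
    new_mounts = {}
    for path, sb in mounts.items():
        if id(sb) not in clones:
            clones[id(sb)] = dict(sb)     # stand-in for sb cloning
        new_mounts[path] = clones[id(sb)]
    return new_mounts

autofs_sb = {"type": "autofs", "map": "auto_home"}
# mount --bind /foo /bar leaves both mountpoints sharing one super_block.
mounts = {"/foo": autofs_sb, "/bar": autofs_sb}

new = clone_namespace(mounts)
```

In the cloned namespace, /foo and /bar still share one (new) super_block,
so the derived namespace behaves like its parent while remaining isolated
from it.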

7 Scalability
=============

Moving from the customary practice of using a daemon to using a usermode
helper to perform automounting raises the question of scalability.  In this
design, a new process is created every time a trigger occurs, which may lead
to many processes with very short lifespans.  Both the process-creation
overhead and the memory footprint of running many small processes become
possible issues.

The argument against these claims is that process overhead on Linux is
comparatively small, and is far outweighed by the network communication that
occurs as part of the automount process.  The latency of communicating with
networked nameservices (such as NIS or LDAP), as well as the network
communication with a remote NFS server, is many orders of magnitude larger
than the overhead introduced by spawning a new process.

There does, however, remain the possibility of a denial of service attack by a
user attempting to simultaneously trigger all of the automount triggers on a
large system.  Appropriate countermeasures can be put in place, such as
enforcing a maximum number of simultaneous automounts triggered by a given
user.  This remains an area of research, and suggestions for dealing with the
problem are welcome.
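
One such countermeasure might look like the following sketch.  The limit
value and the interface are entirely hypothetical; the sketch only
illustrates the idea of bounding in-flight automounts per user.

```python
# Hypothetical per-user limiter: deny a new automount trigger when the
# user already has too many in flight.
MAX_CONCURRENT = 3          # invented limit, for illustration only

class AutomountLimiter:
    def __init__(self, limit=MAX_CONCURRENT):
        self.limit = limit
        self.in_flight = {}             # uid -> current automount count

    def try_trigger(self, uid):
        """Return True if this user may start another automount."""
        if self.in_flight.get(uid, 0) >= self.limit:
            return False                # deny: too many simultaneous mounts
        self.in_flight[uid] = self.in_flight.get(uid, 0) + 1
        return True

    def finished(self, uid):
        self.in_flight[uid] -= 1        # mount completed or failed

lim = AutomountLimiter()
results = [lim.try_trigger(1000) for _ in range(5)]   # 5 rapid triggers
```

The first three triggers are admitted and the rest are refused until one of
the in-flight automounts completes.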

8 Conclusion
============

Linux automounting has always lacked full support for Solaris-style automount
maps.  This has long been the case due to technical limitations imposed by
the design, as well as a lack of interest and time on the part of the primary
developers.  It is our goal to make Linux support Solaris-style automounter
maps completely and reliably.  In order to achieve this goal, we need to
redesign the way automounting works.

Namespaces provide a new and exciting way of dealing with security concerns;
however, they make the problem space of automounting much more complex.  By
using a usermode helper in lieu of a daemon, we gain namespace accessibility:
namespace-local automount configuration and mount operations are at our
disposal.  We also gain the benefit of no longer having to maintain state in
userspace, a task which is vulnerable to subtle changes in semantics (see
'"Simultaneous" mounts causing weird behaviour',
http://linux.kernel.org/pipermail/autofs/2003-November/000367.html).

We also take the opportunity to define the semantics of automounting across
cloned namespaces.  These semantics require the ability to clone super_blocks in
order to isolate automount configurations across namespaces.  This appears at
first to be an ugly hack, but in reality it makes sense considering the options
that are available.

Another automounting task that has always caused problems in the past is the
expiry of mountpoints.  By moving mountpoint expiry into the VFS layer where it
belongs, we eliminate any possible races.  Expiring mountpoints also becomes
available to anyone wishing to do so, whether it be part of the automount
process or not.

Related to expiry is the ability for userspace to reliably navigate mountpoints
so that covered mountpoints may be accessed and remounted.  We've outlined a
possible solution that will accommodate this need.   The semantics involved are
not yet completely defined and require insight from the primary consumers of
such an interface.

It is hoped that the design outlined in this document is thorough enough to
spark discussion as to how automounting should be implemented in the future.  By
implementing the core kernel facilities listed above, it is felt that a complete
automount solution may be developed.  This implementation would be completely
capable of handling Solaris-style automount maps and would continue to work
reliably in a multi-namespace environment.

[-- Attachment #2: Type: application/pgp-signature, Size: 251 bytes --]

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-06 19:55 [RFC] Towards a Modern Autofs Mike Waychison
@ 2004-01-06 21:01 ` H. Peter Anvin
  2004-01-06 21:44   ` Mike Waychison
  2004-01-06 21:50   ` Tim Hockin
  2004-01-07 21:14 ` [autofs] [RFC] Towards a Modern Autofs Jim Carter
  1 sibling, 2 replies; 85+ messages in thread
From: H. Peter Anvin @ 2004-01-06 21:01 UTC (permalink / raw)
  To: Mike Waychison; +Cc: Kernel Mailing List, autofs mailing list

Mike Waychison wrote:
> 
> The attached paper was written an attempt to design an automount system
> with complete Solaris-style autofs functionality.  This includes
> browsing, direct maps and lazy mounting of multimounts.  The paper can
> also be found online at:
>                                                                               

Sorry to sound like sour grapes, but this is a requirements document,
not a proposed implementation.  Furthermore, as I have expressed before,
I think your claim that expiry should be done in the VFS to be incorrect.

I think you're on the completely wrong track, because you're starting
with the wrong problem.  The implementation needs to start with the VFS
implementation and derive from that.

Finally, throwing out the daemon is a huge step backwards.  Most of the
problems with autofs v3 (and to a lesser extent v4) are due to the
*lack* of state in userspace (the current daemon is mostly stateless);
putting additional state in userspace would be a benefit in my experience.

Pardon me for sounding harsh, but I'm seriously sick of the oft-repeated
idiocy that effectively boils down to "the daemon can die and would lose
its state, so let's put it all in the kernel."  A dead daemon is a
painful recovery, admitted.  It is also a THIS SHOULD NOT HAPPEN
condition.  By cramming it into the kernel, you're in fact making the
system less stable, not more, because the kernel being tainted with
faulty code is a total system malfunction; a crashed userspace daemon is
"merely" a messy cleanup.  In practice, the autofs daemon does not die
unless a careless system administrator kills it.  It is a non-problem.

	-hpa



* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-06 21:01 ` [autofs] " H. Peter Anvin
@ 2004-01-06 21:44   ` Mike Waychison
  2004-01-06 21:50   ` Tim Hockin
  1 sibling, 0 replies; 85+ messages in thread
From: Mike Waychison @ 2004-01-06 21:44 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Kernel Mailing List, autofs mailing list

[-- Attachment #1: Type: text/plain, Size: 4328 bytes --]

Hi Peter,

H. Peter Anvin wrote:

>Mike Waychison wrote:
>  
>
>>The attached paper was written an attempt to design an automount system
>>with complete Solaris-style autofs functionality.  This includes
>>browsing, direct maps and lazy mounting of multimounts.  The paper can
>>also be found online at:
>>                                                                              
>>    
>>
>
>Sorry to sound like sour grapes, but this is a requirements document,
>not a proposed implementation.  
>
You surely read the whole thing, didn't you?

>Furthermore, as I have expressed before,
>I think your claim that expiry should be done in the VFS to be incorrect.
>  
>
Why?  You haven't convinced me that it should be elsewhere. 

>I think you're on the completely wrong track, because you're starting
>with the wrong problem.  The implementation needs to start with the VFS
>implementation and derive from that.
>  
>

In which sense?   Re-design it?

>Finally, throwing out the daemon is a huge step backwards.  Most of the
>problems with autofs v3 (and to a lesser extent v4) are due to the
>*lack* of state in userspace (the current daemon is mostly stateless);
>putting additional state in userspace would be a benefit in my experience.
>  
>
Bull.   Having a single process for each autofs filesystem is state in 
itself.   Eg:

- setup an auto_home map on /home
- mkdir /home2
- mount --bind /home /home2

The state that you manage with your automount processes themselves is 
now inconsistent with what the kernel has.  

>Pardon me for sounding harsh, but I'm seriously sick of the oft-repeated
>idiocy that effectively boils down to "the daemon can die and would lose
>its state, so let's put it all in the kernel."  A dead daemon is a
>painful recovery, admitted.  It is also a THIS SHOULD NOT HAPPEN
>condition.  
>

You've completely discarded the fact that a daemon breaks namespaces in 
your argument.

You somehow mistook the arguments I've presented and assume that we get 
rid of the daemon solely so that we eliminate state in userspace.  The 
point of getting rid of the daemon is that tying a single process to 
each mountpoint:

- breaks on mount --bind operations
- breaks on namespace clones

These _can_ be circumvented by using a single process daemon which 
catches _ALL_ automount requests from the kernel, however:

- There are NO facilities for changing namespaces, and there doesn't 
appear to be any plans to implement them.   This doesn't only affect the 
mount operations themselves, but also reading the /etc/auto_* maps in 
the different namespace.
- This limits a running system to _exactly_ one policy system for 
handling automount points.  Differing namespaces may have different 
automounter maps and even automounters themselves if they want to under 
the scheme I've outlined.

Also, the current implementation uses pathnames to do everything.  This 
breaks:

- mountpoint binds in another way
- mountpoint moves

My goal here is to fix all of the mountpoint logic in automounting that 
relies on there being a single namespace. 

Now, going back to your argument of reliability and reconnectivity, yes, 
I agree that the daemon dying is something that _SHOULD NOT HAPPEN_.  
But it does in practice.  Getting rid of the daemon the way I've 
outlined simply eliminates that from ever happening as an added bonus.

>By cramming it into the kernel, you're in fact making the
>system less stable, not more, because the kernel being tainted with
>faulty code is a total system malfunction; a crashed userspace daemon is
>"merely" a messy cleanup.  In practice, the autofs daemon does not die
>unless a careless system administrator kills it.  It is a non-problem.
>  
>
"Faulty code"?    I haven't even presented you with code yet.  Nice.

Somehow, you got the impression that the system I've proposed would be 
more complex than what we have today, when in fact I believe it's a lot 
simpler.

-- 
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice
mailto: Michael.Waychison@Sun.COM
http://www.sun.com

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE:  The opinions expressed in this email are held by me, 
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 


[-- Attachment #2: Type: application/pgp-signature, Size: 251 bytes --]


* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-06 21:01 ` [autofs] " H. Peter Anvin
  2004-01-06 21:44   ` Mike Waychison
@ 2004-01-06 21:50   ` Tim Hockin
  2004-01-06 22:06     ` H. Peter Anvin
  1 sibling, 1 reply; 85+ messages in thread
From: Tim Hockin @ 2004-01-06 21:50 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Mike Waychison, autofs mailing list, Kernel Mailing List

On Tue, Jan 06, 2004 at 01:01:46PM -0800, H. Peter Anvin wrote:
> Finally, throwing out the daemon is a huge step backwards.  Most of the
> problems with autofs v3 (and to a lesser extent v4) are due to the
> *lack* of state in userspace (the current daemon is mostly stateless);
> putting additional state in userspace would be a benefit in my experience.

Can you maybe share some details?  I think this design moves MORE state to
userspace (expiry aside).  The "state" in kernel is really mostly sent back
to userspace.  No more passing pipes into the kernel (state) or tracking the
pgid of the daemon (state).

> Pardon me for sounding harsh, but I'm seriously sick of the oft-repeated
> idiocy that effectively boils down to "the daemon can die and would lose
> its state, so let's put it all in the kernel."  A dead daemon is a
> painful recovery, admitted.  It is also a THIS SHOULD NOT HAPPEN

But it *does* happen.

> condition.  By cramming it into the kernel, you're in fact making the
> system less stable, not more, because the kernel being tainted with
> faulty code is a total system malfunction; a crashed userspace daemon is

I don't think this design crams anything into the kernel.  It doesn't put a
whole lot more into the kernel than is currently in there (expiry and new
mount stuff, aside).  All the work still happens in userland.

The daemon as it stands does NOT handle namespaces, does NOT handle expiry
well, and is a pretty sad copy of an old design.

> "merely" a messy cleanup.  In practice, the autofs daemon does not die
> unless a careless system administrator kills it.  It is a non-problem.

I have some customers I'd love to send to you, if you really think that's
true.


* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-06 21:50   ` Tim Hockin
@ 2004-01-06 22:06     ` H. Peter Anvin
  2004-01-06 22:17       ` Tim Hockin
                         ` (2 more replies)
  0 siblings, 3 replies; 85+ messages in thread
From: H. Peter Anvin @ 2004-01-06 22:06 UTC (permalink / raw)
  To: thockin; +Cc: Mike Waychison, autofs mailing list, Kernel Mailing List

Tim Hockin wrote:
> On Tue, Jan 06, 2004 at 01:01:46PM -0800, H. Peter Anvin wrote:
> 
>>Finally, throwing out the daemon is a huge step backwards.  Most of the
>>problems with autofs v3 (and to a lesser extent v4) are due to the
>>*lack* of state in userspace (the current daemon is mostly stateless);
>>putting additional state in userspace would be a benefit in my experience.
> 
> Can you maybe share some details?  I think this deign moves MORE state to
> userspace (expiry aside).  The "state" in kernel is really mostly sent back
> to userspace.  No more passing pipes into the kernel (state) or tracking the
> pgid of the daemon (state).
> 

If you want to fire up a new daemon, all that state that was supposed to
be kept in userspace has to be reconstructed.  That means the kernel has
to have all that information; this would include stuff like what kind of
umount policy you want for each key entry (the current daemon doesn't do
that because it doesn't have the proper state.)

>>Pardon me for sounding harsh, but I'm seriously sick of the oft-repeated
>>idiocy that effectively boils down to "the daemon can die and would lose
>>its state, so let's put it all in the kernel."  A dead daemon is a
>>painful recovery, admitted.  It is also a THIS SHOULD NOT HAPPEN
>  
> But it *does* happen.

I don't believe it happens to any significant degree in cases where you
wouldn't have a kernel panic if you put the stuff in the kernel, *or* a
careless system administrator killed it.  In fact, I suspect it's
virtually all the latter.

>>condition.  By cramming it into the kernel, you're in fact making the
>>system less stable, not more, because the kernel being tainted with
>>faulty code is a total system malfunction; a crashed userspace daemon is
> 
> I don't think this design crams anything into the kernel.  It doesn't put a
> whole lot more into the kernel than is currently in there (expiry and new
> mount stuff, aside).  All the work still happens in userland.
> 
> The daemon as it stands does NOT handle namespaces, does NOT handle expiry
> well, and is a pretty sad copy of an old design.

First of all, I'll be blunt: namespaces currently provide zero benefit
in Linux, and virtually no one uses them.  I have discussed this with
Linus in the past, and neither one of us sees namespaces as being worth
jumping through hoops to support.  That being said, it's doable by either
having different daemons for different namespaces (useful for policy) or
by having them gain access to the requisite namespaces.

Second, what you say about the state of the daemon is obviously true.
autofs v3 was developed on Linux 2.0 which had a vastly different VFS,
and it has by and large bitrotted.  Furthermore, at that point Linux
didn't support threading in any useful way, which meant that keeping the
appropriate state in the daemon was too painful -- hence the largely
stateless design with its associated problems.

>>"merely" a messy cleanup.  In practice, the autofs daemon does not die
>>unless a careless system administrator kills it.  It is a non-problem.
> 
> I have some customers I'd love to send to you, if you really think that's
> true.

As root, I can kill the system too by doing "cat /dev/zero > /dev/mem".
 If you do stupid shit as root you're dead.  What's the news?

	-hpa


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-06 22:06     ` H. Peter Anvin
@ 2004-01-06 22:17       ` Tim Hockin
       [not found]       ` <20040106221502.GA7398@hockin.org>
  2004-01-06 22:28       ` name spaces good (was: [autofs] [RFC] Towards a Modern Autofs) Dax Kelson
  2 siblings, 0 replies; 85+ messages in thread
From: Tim Hockin @ 2004-01-06 22:17 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: autofs mailing list, Kernel Mailing List

(sorry for the resend, forgot to CC the lists)

On Tue, Jan 06, 2004 at 02:06:34PM -0800, H. Peter Anvin wrote:
> > Can you maybe share some details?  I think this design moves MORE state to
> > userspace (expiry aside).  The "state" in kernel is really mostly sent back
> > to userspace.  No more passing pipes into the kernel (state) or tracking the
> > pgid of the daemon (state).
> 
> If you want to fire up a new daemon, all that state that was supposed to
> be kept in userspace has to be reconstructed.  That means the kernel has
> to have all that information; this would include stuff like what kind of
> umount policy you want for each key entry (the current daemon doesn't do
> that because it doesn't have the proper state.)

I'm not really sure what you're saying here.  I'm sorry.  Not trying to be
thick, just not understanding.

What umount policy?  What state is supposed to be kept in userspace that isn't?

> > The daemon as it stands does NOT handle namespaces, does NOT handle expiry
> > well, and is a pretty sad copy of an old design.
> 
> First of all, I'll be blunt: namespaces currently provide zero benefit
> in Linux, and virtually no one uses them.  I have discussed this with
> Linus in the past, and neither one of us sees namespaces as being worth

Let's get rid of them, then.  Make life that much easier.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
       [not found]       ` <20040106221502.GA7398@hockin.org>
@ 2004-01-06 22:20         ` H. Peter Anvin
  2004-01-07 16:19           ` Mike Waychison
  0 siblings, 1 reply; 85+ messages in thread
From: H. Peter Anvin @ 2004-01-06 22:20 UTC (permalink / raw)
  To: Tim Hockin; +Cc: autofs, linux-kernel

Tim Hockin wrote:
> On Tue, Jan 06, 2004 at 02:06:34PM -0800, H. Peter Anvin wrote:
> 
>>>Can you maybe share some details?  I think this design moves MORE state to
>>>userspace (expiry aside).  The "state" in kernel is really mostly sent back
>>>to userspace.  No more passing pipes into the kernel (state) or tracking the
>>>pgid of the daemon (state).
>>
>>If you want to fire up a new daemon, all that state that was supposed to
>>be kept in userspace has to be reconstructed.  That means the kernel has
>>to have all that information; this would include stuff like what kind of
>>umount policy you want for each key entry (the current daemon doesn't do
>>that because it doesn't have the proper state.)
> 
> I'm not really sure what you're saying here.  I'm sorry.  Not trying to be
> thick, just not understanding.
> 
> What umount policy?  What state is supposed to be kept in userspace that isn't?
> 

The current autofs daemon, for example, does not handle different
procedures on umount.  This is particularly important when you have
mount trees.

> 
>>>The daemon as it stands does NOT handle namespaces, does NOT handle expiry
>>>well, and is a pretty sad copy of an old design.
>>
>>First of all, I'll be blunt: namespaces currently provide zero benefit
>>in Linux, and virtually no one uses them.  I have discussed this with
>>Linus in the past, and neither one of us sees namespaces as being worth
> 
> Let's get rid of them, then.  Make life that much easier.
> 

That's what the Linux community is doing, de facto.  The Linux userspace
simply is not set up to handle namespaces, and the autofs daemon is no
exception.  Consider such a simple thing as /etc/mtab - /proc/mounts
which is necessary for most of the mount(8) functionality to work.  It
doesn't support namespaces and really cannot be made to.

Namespace support in Linux is at best a far-off future goal.  It is
one thing to put in infrastructure, especially since it has some other
nice benefits; it's another thing to revamp all of userspace to use it.
Userspace is nowhere close, and autofs is no exception.

	-hpa


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: name spaces good (was: [autofs] [RFC] Towards a Modern Autofs)
  2004-01-06 22:06     ` H. Peter Anvin
  2004-01-06 22:17       ` Tim Hockin
       [not found]       ` <20040106221502.GA7398@hockin.org>
@ 2004-01-06 22:28       ` Dax Kelson
  2004-01-06 22:48         ` name spaces good H. Peter Anvin
  2 siblings, 1 reply; 85+ messages in thread
From: Dax Kelson @ 2004-01-06 22:28 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: thockin, Mike Waychison, autofs mailing list, Kernel Mailing List

On Tue, 2004-01-06 at 15:06, H. Peter Anvin wrote:
> First of all, I'll be blunt: namespaces currently provide zero benefit
> in Linux, and virtually no one uses them.

I strongly disagree.

I find them very useful, and there are lots of problems that are not
cleanly solved any other way. In particular they are very useful in
security hardening, compartmentalization scenarios.

The abysmal state of Linux autofs is something that needs fixing
yesterday.

Dax Kelson


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: name spaces good
  2004-01-06 22:28       ` name spaces good (was: [autofs] [RFC] Towards a Modern Autofs) Dax Kelson
@ 2004-01-06 22:48         ` H. Peter Anvin
  0 siblings, 0 replies; 85+ messages in thread
From: H. Peter Anvin @ 2004-01-06 22:48 UTC (permalink / raw)
  To: Dax Kelson
  Cc: thockin, Mike Waychison, autofs mailing list, Kernel Mailing List

Dax Kelson wrote:
> On Tue, 2004-01-06 at 15:06, H. Peter Anvin wrote:
> 
>>First of all, I'll be blunt: namespaces currently provide zero benefit
>>in Linux, and virtually no one uses them.
> 
> 
> I strongly disagree.
> 
> I find them very useful, and there are lots of problems that are not
> cleanly solved any other way. In particular they are very useful in
> security hardening, compartmentalization scenarios.
> 

Excellent... if so, it would be useful to have a discussion about the
proper semantics for these scenarios.  So far the consensus opinion
among most of the VFS people seems to have been "when you clone a
namespace you get an unanimated namespace"; it would be useful to know
whether that applies to your scenario, assuming it matters, and if so why
or why not.

Al Viro has been working on a key piece of infrastructure for doing
autofs right called mount traps.  This is the main reason -- even more
so than the lack of time on my part -- that not much work has been done
on the new version of autofs.  mount traps, combined with
"pseudo-symlinks" (non-S_IFLNK nodes which have follow_link methods), do
most of the tasks that have been proven necessary in the kernel.

The consensus I have seen is that namespaces are mostly used, as
you said, for compartmentalization and security.  You pretty much have two
scenarios as far as I can see:

a) You're running autofs "outside" the compartmentalization, in a global
namespace.
b) You're running autofs "inside" the compartmentalization, in which case
you don't want access to anything on the outside.  The autofs running
"inside" thus can't access anything else.

	-hpa


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-06 22:20         ` H. Peter Anvin
@ 2004-01-07 16:19           ` Mike Waychison
  2004-01-07 17:55             ` H. Peter Anvin
  0 siblings, 1 reply; 85+ messages in thread
From: Mike Waychison @ 2004-01-07 16:19 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Tim Hockin, autofs, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2431 bytes --]

H. Peter Anvin wrote:
> Tim Hockin wrote:
> 
>>On Tue, Jan 06, 2004 at 02:06:34PM -0800, H. Peter Anvin wrote:
 >>
>>>
>>>First of all, I'll be blunt: namespaces currently provide zero benefit
>>>in Linux, and virtually no one uses them.  I have discussed this with
>>>Linus in the past, and neither one of us sees namespaces as being worth
>>
>>Let's get rid of them, then.  Make life that much easier.
>>
> 
> 
> That's what the Linux community is doing, de facto.  The Linux userspace
> simply is not set up to handle namespaces, and the autofs daemon is no
> exception.  Consider such a simple thing as /etc/mtab - /proc/mounts
> which is necessary for most of the mount(8) functionality to work.  It
> doesn't support namespaces and really cannot be made to.
> 
> namespace support in Linux is at the best a far-off future goal.  It is
> one thing to put in infrastructure, especially since it has some other
> nice benefits; it's another thing to revamp all of userspace to use it;
> it's nowhere close and autofs is no exception.
> 

This is clearly not 'all of userspace'.  Autofs is an exception.  As is 
/etc/mtab.  The way I see it, automounting is a 'mount facility', as are 
namespaces.  The two should be made to work together.  Yes, mount(8) 
should probably be fixed one way or another as well due to /etc/mtab 
breakage. Why? Because it too is a mount facility.

There are a couple of problems inherent in namespaces.  Most of these are 
mount facilities that are broken, as mentioned above.  They *should* 
be fixed to work nicely.

Other parts of userspace get confused by namespaces, e.g. cron and atd. 
  These programs clearly need infrastructure added that somehow allows 
for arbitrary namespace joining/saving.  If you have suggestions for how 
we can solve this issue, please do let me know.  I'm stumped :\  I'd be 
more than happy to discuss this with you.

One not-so-far fetched approach would be to associate cron/at jobs with 
automount configurations so that a namespace can be re-constructed at 
runtime.


-- 
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice
mailto: Michael.Waychison@Sun.COM
http://www.sun.com

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE:  The opinions expressed in this email are held by me,
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[-- Attachment #2: Type: application/pgp-signature, Size: 251 bytes --]

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-07 16:19           ` Mike Waychison
@ 2004-01-07 17:55             ` H. Peter Anvin
  2004-01-07 21:13               ` Mike Waychison
  0 siblings, 1 reply; 85+ messages in thread
From: H. Peter Anvin @ 2004-01-07 17:55 UTC (permalink / raw)
  To: linux-kernel

Mike Waychison wrote:

> This is clearly not 'all of userspace'.  Autofs is an exception.  As is 
> /etc/mtab.  The way I see it, automounting is a 'mount facility', as are 
> namespaces.  The two should be made to work together.  Yes, mount(8) 
> should probably be fixed one way or another as well due to /etc/mtab 
> breakage. Why? Because it too is a mount facility.
> 
> There are a couple problems inherent with namespaces.  Most of these are 
> mount facilities that are broken such as mentioned above.  They *should* 
> be fixed to work nicely.

For that one needs to know how the namespaces are used, not just how 
they are implemented.  There was a long discussion on this on #kernel 
yesterday, by the way.

> Other parts of userspace get confused with namespaces, eg: cron and atd. 
>  These programs clearly need infrastructure added that somehow allows 
> for arbitrary namespace joining/saving.  If you have suggestions for how 
> we can solve this issue, please do let me know.  I'm stumped :\  I'd be 
> more than happy to discuss this with you.

Do they?  In order for that to be a "clearly", I believe one needs to 
understand how namespaces are used in practice.  It may not be desirable 
or even possible; this starts getting into a policy decision.

> One not-so-far fetched approach would be to associate cron/at jobs with 
> automount configurations so that a namespace can be re-constructed at 
> runtime.

I am not entirely sure what you mean with this, but it sounds incredibly 
dangerous to me.

	-hpa


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-07 17:55             ` H. Peter Anvin
@ 2004-01-07 21:13               ` Mike Waychison
  0 siblings, 0 replies; 85+ messages in thread
From: Mike Waychison @ 2004-01-07 21:13 UTC (permalink / raw)
  To: Kernel Mailing List


[-- Attachment #1.1: Type: text/plain, Size: 3434 bytes --]

H. Peter Anvin wrote:

> Mike Waychison wrote:
>
>> This is clearly not 'all of userspace'.  Autofs is an exception.  As 
>> is /etc/mtab.  The way I see it, automounting is a 'mount facility', 
>> as are namespaces.  The two should be made to work together.  Yes, 
>> mount(8) should probably be fixed one way or another as well due to 
>> /etc/mtab breakage. Why? Because it too is a mount facility.
>>
>> There are a couple problems inherent with namespaces.  Most of these 
>> are mount facilities that are broken such as mentioned above.  They 
>> *should* be fixed to work nicely.
>
>
> For that one needs to know how the namespaces are used, not just how 
> they are implemented.  There was a long discussion on this on #kernel 
> yesterday, by the way.
>
The one between you and viro?   I read the logs last night.  I didn't
see much discussion at all.

>> Other parts of userspace get confused with namespaces, eg: cron and 
>> atd.  These programs clearly need infrastructure added that somehow 
>> allows for arbitrary namespace joining/saving.  If you have 
>> suggestions for how we can solve this issue, please do let me know.  
>> I'm stumped :\  I'd be more than happy to discuss this with you.
>
>
> Do they?  In order for that to be a "clearly", I believe one needs to 
> understand how namespaces are used in practice.  It may not be 
> desirable or even possible; this starts getting into a policy decision.
>
Yes.  It is somewhat policy, but as it currently stands, the kernel not
being able to give userspace the option is itself quite restricting.

>> One not-so-far fetched approach would be to associate cron/at jobs 
>> with automount configurations so that a namespace can be 
>> re-constructed at runtime.
>
>
> I am not entirely sure what you mean with this, but it sounds 
> incredibly dangerous to me.
>
>
Basically, consider Plan 9 namespaces (I admit I'm no expert on Plan 9
:).  I'm told it allows one to somehow share namespaces with other users
/ processes as long as they exist.   Linux has a per-process namespace
model that (currently) doesn't allow this to happen.  Once all processes
that share a namespace die, the entire namespace ceases to exist.  In
order to 'reconstruct' a namespace, a service could somehow say: "Use
this automount configuration" (manually by creating a fresh namespace,
removing all non-essential mounts, and installing new mount-traps within
this namespace).

This of course has corner cases like the chicken and egg problem where
the configuration files have to be available in that namespace already,
but with some thought we could figure that out.  (Something similar to
the way 2.6 uses rootfs could be used to strap into this fresh
namespace, entirely from userspace).

This would in effect allow me to say "Take a snapshot of my namespace"
(this would probably require more help from individual filesystem
implementations in order to get all the mount information used) which
would dump an automount map that could later be used to lazily recreate it.

Just a thought.
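To make the snapshot idea slightly more concrete: one could walk /proc/mounts
and emit Sun-style direct-map entries that would let the namespace be lazily
recreated later.  A rough sketch (Python purely for illustration; the function
name and the filtering on filesystem type are my own assumptions, and real
option recovery would need per-filesystem help, as noted above):

```python
def mounts_to_direct_map(proc_mounts_text, fstypes=('nfs', 'nfs4')):
    """Turn /proc/mounts lines into direct-map-style entries.

    Each /proc/mounts line has the form:
        source mountpoint fstype options dump pass
    Only network filesystems are kept here, since local mounts would
    normally be handled by fstab rather than the automounter.
    """
    entries = []
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip malformed lines
        source, mountpoint, fstype, options = fields[:4]
        if fstype in fstypes:
            # key <tab> -options <tab> location, as in a Sun direct map
            entries.append('%s\t-%s\t%s' % (mountpoint, options, source))
    return entries
```

Feeding such a map back to the automounter would then recreate the
mounts on first access instead of eagerly at namespace-join time.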

-- 
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice
mailto: Michael.Waychison@Sun.COM
http://www.sun.com

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE:  The opinions expressed in this email are held by me,
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



[-- Attachment #1.2: file:///tmp/nsmail.pgp --]
[-- Type: application/pgp-signature, Size: 252 bytes --]

[-- Attachment #2: Type: application/pgp-signature, Size: 251 bytes --]

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-06 19:55 [RFC] Towards a Modern Autofs Mike Waychison
  2004-01-06 21:01 ` [autofs] " H. Peter Anvin
@ 2004-01-07 21:14 ` Jim Carter
  2004-01-07 22:55   ` Mike Waychison
  2004-01-08  0:48   ` Ian Kent
  1 sibling, 2 replies; 85+ messages in thread
From: Jim Carter @ 2004-01-07 21:14 UTC (permalink / raw)
  To: Mike Waychison; +Cc: Kernel Mailing List, autofs mailing list

On Tue, 6 Jan 2004, Mike Waychison wrote:
> We've spent some time over the past couple months researching how Linux
> autofs can be brought to a level that is comparable to that found on
> other major Unix systems out there.
>
> ftp://ftp-eng.cobalt.com/pub/whitepapers/autofs/towards_a_modern_autofs.txt
> ftp://ftp-eng.cobalt.com/pub/whitepapers/autofs/towards_a_modern_autofs.pdf

Mounting on a file descriptor is nice but it takes work for all filesystems
to perform it.  Not to discourage work toward this goal, I suggest not
entangling autofs with that work.  Instead, if we're doing the userspace
helper thing, the kernel knows the process group of the helper it started.
Do "oz" mode for that PG, and revoke the privilege when it exits.  Do the
same thing again for unmounting.

If the userspace helper is invoked in the triggering process' namespace,
any full paths given to it will be resolved in that namespace.  This
bypasses one of the main justifications for having autofs work only with FD
mounts.

If a sysop mounts autofs filesystems (installs triggers), that will and
should happen in the namespace inhabited by him, not in any cloned
namespaces.  Without needing to wait for someone to work through kernel
politics and make FD mounts happen.

> The exception to this rule is when the map entry for /home contains the
> option 'browse':

Solaris 2.6 and above has the -browse option on indirect maps, so the set
of subdirs potentially mountable can be seen, without mounting them. I
don't see where this is implemented in Linux, nor do I see how it's done,
documented in Solaris NFS man pages, but I didn't put a lot of time into
the search.  I *hope* rpc.mountd has an opcode to enumerate every
filesystem it's willing to export.  Does it "stat" and return the stat
data?  That would be important for "ls".

> In order to maintain some form of coherency between changing maps, these
> dummy directory entries will remain in place within the dcache so that
> the kernel doesn't need to query the usermode helper as often.  These
> entries will periodically timeout and will be unhashed from the dcache.

Browsetimeout -- Each autofs instance necessarily has an in-core list of
its subdirectories.  If the caller stats any of these and that one (or
alternatively, any of the known subdirs) is not in the dcache, the module
needs to run the helper again, refreshing all dcache entries.  But you
still need a timeout because the mode etc. might change on the server, though
that's rare.  Let's avoid committing a lot of coding effort and CPU time to
supporting events that might happen once per year.
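The refresh-with-timeout policy described here amounts to a TTL cache over
the set of browsable map keys.  A minimal userspace sketch of the policy
(hypothetical names; the real work would happen against the dcache inside
the kernel module):

```python
import time

class BrowseCache:
    """Cache the set of browsable map keys, re-running the map helper
    only after a timeout, mirroring the proposed dcache behaviour."""

    def __init__(self, lookup, ttl=60.0, clock=time.monotonic):
        self.lookup = lookup      # callable returning the current key set
        self.ttl = ttl            # seconds before entries go stale
        self.clock = clock
        self._keys = None
        self._stamp = -float('inf')

    def keys(self):
        now = self.clock()
        if self._keys is None or now - self._stamp > self.ttl:
            # Stale: refresh all entries at once, as jimc suggests,
            # rather than querying the helper per-entry.
            self._keys = set(self.lookup())
            self._stamp = now
        return self._keys
```

The point of the single timestamp is exactly the one made above: one
refresh per timeout window, not one helper invocation per stat.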

> Executing the usermode helper within the namespace of the triggering
> application does have a problem when browsing is used.  We are caching
> map keys in kernelspace and can run into coherency problems when an
> autofs super_block is associated with multiple namespaces which have
> differing automount maps in /etc. This kind of situation may occur if a
> namespace is cloned and a new /etc directory with a different auto_home
> map is mounted.

The uncloned superblock problem is discussed later in the paper.  It looks
to me like the VFS layer ought to be responsible for cloning superblocks.
Not to discourage work towards that goal, but I suggest not delaying autofs
until it happens.  The result is that some users will see mount points
(mounted or potentially mountable) that within-namespace policy says should
be invisible.  That's not too bad, since we rely on UNIX file permissions
or ACLs for security, not visibility in the automount map.  If an indirect
map entry was formerly absent but now present, presumably the userspace
helper will consult the then-prevailing automount map and find it
successfully.

> Sect. 5.2 Direct Maps

> 2) The map key for the direct mount entry is now passed as a new mount
> option called 'mapkey'.

I don't quite see the need for the mapkey mount option.  It seems to me
that the name of the mount point is always equal to the map key.  In my
model, mounting on open FDs isn't going to be implementable, and so the
userspace helper has to know the full path name of the mount point, anyway.

> 5.3 Multimounts and Offsets

> /usr/src                hosta:/export/src	\
>             /linux      hostb:/export/linuxsrc

Suppose someone accesses /usr/src/linux.  Is it not true that both the
original process and mount(8) have to first access /usr/src, triggering
automounting of hostA:/export/src, and only when the stat info and readdir
from that step have come through at least twice, can they go on to monkey
with /usr/src/linux, triggering mounting of hostB:/export/linuxsrc? Thus
I don't see the need for multimounts.  The conceptual idea of mounting both
dirs "as a unit" is maybe attractive when not looked at too closely, but it
seems to me that by just punting, you get infinitesimally slower service to
the user and a significant section of logic avoided in the code.

The kernel would need to know to install an autofs structure (trigger) on
/usr/src/linux even though /usr/src was represented by only an autofs
structure, not actually mounted yet, just like we see in procfs.  I doubt
that's a showstopper, although you'd have to write the kernel code
carefully.  The example of userD/server{1,2} indicates that you intend
the autofs structure, with nothing mounted on it, to be a really
existing and traversable directory on whose subdirs other autofs FS's can
be mounted.  Good.

But in sec. 5.3.2 I see you making filesystem dirs in /tmp which seem to
substitute for the synthetic autofs directories.  Bad, if I've understood
the example.  Comments suggest that you need the /tmp directory to avoid
setting off the autofs trigger.  Better: if a synthetic autofs directory
has no corresponding entry in an automount map, you don't mount anything on
it.  But if it *does* have a map entry, you need to mount it in order to
stat it (the server's instance) to determine if the user has permission to
traverse it, before even considering whether to mount the subdir. Remember
that in my model I'm leaving aside FD mounts, so traversing containing
directories by name is a valid concept.

What is the significance of "lazy mount"?  I don't see the word "lazy" in
any of the Solaris NFS or automount docs I looked at.  In sec. 5.3.1
you say it means "mount only when accessed".  Thus the whole idea of autofs
is to "lazy mount" vast numbers of filesystems.  Right?

> 5.4 Expiry

> Handling expiry of mounts is difficult to get right.  Several different
> aspects need to be considered before being able to properly perform
> expiry.

The current daemon (with latest patches) seems to get it right most of the
time.

> The autofs filesystem really should know as little about VFS internal
> structures as possible.  In this case, the filesystem code is charged
> with walking across mountpoints and manually counting reference counts.
> This task is much better left to the VFS internals.

Someone with a more thorough understanding of the code should comment on
this, but I didn't notice the module rooting through VFS data; it looks
like it relies on use counts maintained by the VFS layer, similar to what
mount(2) relies on to declare a mount to be busy.

> Unmounting the filesystem from userspace is racy, as any program can
> begin using a mount between the time the daemon has received a path to
> expire and the time it actually makes the umount(2) system call.

So the helper's umount() will fail.  OK, it failed.  The kernel module
should not recognize the mounted dir as being gone, until the module itself
has seen that it's gone.  This policy also helps in cases where the sysop
manually unmounts an automounted directory for repair purposes.

A common problem is stale NFS filehandles, and in this case we'd like the
userspace helper to be aggressive in using "umount -f" or other advanced
techniques.  The freedom to fail is important here.

> These points suggest that the kernel's VFS sub-system should be charged
> with handling expiry.

The point is well taken that a VFS layer expiry mechanism would be welcomed
by many filesystems.  But autofs has to work with the kernel as it lies
now.

> As described above, we may be installing multiple mounts upon each
> trigger. This tree of mounts will need to expire together as an atomic
> unit.  We will need to register this block of mounts to some expiry
> system.  This will be done by performing a remount on the base
> automounted filesystem after any nested offset mounts have been installed

A filesystem is "in use" if anything is mounted on its subdirs.  That
precludes premature auto-unmounting of a containing directory, in the case
of a multi-mount or jimc's recommended non-implementation thereof.  I don't
see that a multi-mount stack needs to expire as a unit -- just let the
components expire normally, leaf to root.  It doesn't bother jimc that some
members are mounted and some aren't; by the principle of lazy mounting,
that's what we're trying to accomplish.

> 5.5 Handling Changing Maps

The whole issue of changed maps is closely related to the case of cloning a
namespace and discovering that an autofs map is non-identical in the new
namespace.

As pointed out in 5.5.1, when the maps change a userspace program will have
to detect some added or deleted items.  This program will have to run
separately in the context of every namespace.  Thus, we should probably
burden the sysop with remembering to run it if he wants his new/deleted
maps to be recognized. But we'll have to use some ioctl to stimulate the
kernel module to enumerate all known namespaces and run the updater for
each one.

> 5.5.2 Forcing Expiry to Occur

When I do this the reason is generally that I'm going to take down a
server.  Then I don't want "lazy unmounts"; I want immediate unmounts that
will be fatal to the processes using the filesystem.  When the server is
already dead, then I may do a lazy unmount with the expectation that the
structure will never be cleaned up until the client is rebooted, but at
least the client can continue to run.

> 7 Scalability

Necessarily mount(8) is used to mount filesystems, since only it has all
the spaghetti code and pseudo-object-oriented executables to deal with the
various filesystem types.  Hence at least one process (and most likely a
parent shell script) is expected per mount.  We need to be frugal in
writing the userspace helper (and this is a reason to roll our own, not use
hotplug), but the idea of using a userspace helper to mount, rather than a
persistent daemon, doesn't sound scary to me.

For me the biggest attraction of a Solaris-style automount upgrade is
the ability to create wildcard maps with substitutable variables, e.g.
rather than having a kludgey programmatic map that creates little map
files on the fly looking like "* tupelo:/&", a host map can be implemented
via "* $SERVER:/&".  Of course Solaris has a native "-host" map type,
which is also good.
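The substitution in such an entry is simple to sketch: '&' stands for the
looked-up key and $VAR for a map variable such as SERVER.  (Python here for
illustration; `expand_wildcard_entry` is a hypothetical helper, not the
actual automount parser, which handles quoting and more option syntax.)

```python
import string

def expand_wildcard_entry(entry, key, variables):
    """Expand a Sun-style wildcard map entry for a given key.

    '$SERVER:/&' with key 'jimc' and SERVER=tupelo becomes 'tupelo:/jimc',
    matching the '* $SERVER:/&' example above.
    """
    # Substitute map variables first ($SERVER etc.), then the '&'
    # references to the map key itself.
    expanded = string.Template(entry).substitute(variables)
    return expanded.replace('&', key)
```

A wildcard map then needs no per-key entries at all: the automounter
matches '*' and expands the single template at lookup time.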


James F. Carter          Voice 310 825 2897    FAX 310 206 6673
UCLA-Mathnet;  6115 MSA; 405 Hilgard Ave.; Los Angeles, CA, USA  90095-1555
Email: jimc@math.ucla.edu    http://www.math.ucla.edu/~jimc (q.v. for PGP key)

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-07 21:14 ` [autofs] [RFC] Towards a Modern Autofs Jim Carter
@ 2004-01-07 22:55   ` Mike Waychison
  2004-01-08 12:00     ` Ian Kent
                       ` (3 more replies)
  2004-01-08  0:48   ` Ian Kent
  1 sibling, 4 replies; 85+ messages in thread
From: Mike Waychison @ 2004-01-07 22:55 UTC (permalink / raw)
  To: Jim Carter; +Cc: Kernel Mailing List, autofs mailing list

[-- Attachment #1: Type: text/plain, Size: 21233 bytes --]

Hi Jim

Thanks for taking the time to read the document thoroughly and for the 
great feedback!

Please see responses inlined below.

Jim Carter wrote:

>On Tue, 6 Jan 2004, Mike Waychison wrote:
>  
>
>>We've spent some time over the past couple months researching how Linux
>>autofs can be brought to a level that is comparable to that found on
>>other major Unix systems out there.
>>
>>ftp://ftp-eng.cobalt.com/pub/whitepapers/autofs/towards_a_modern_autofs.txt
>>ftp://ftp-eng.cobalt.com/pub/whitepapers/autofs/towards_a_modern_autofs.pdf
>>    
>>
>
>Mounting on a file descriptor is nice but it takes work for all filesystems
>to perform it.  Not to discourage work toward this goal, I suggest not
>entangling autofs with that work.  Instead, if we're doing the userspace
>helper thing, the kernel knows the process group of the helper it started.
>Do "oz" mode for that PG, and revoke the privilege when it exits.  Do the
>same thing again for unmounting.
>
>If the userspace helper is invoked in the triggering process' namespace,
>any full paths given to it will be resolved in that namespace.  This
>bypasses one of the main justifications for having autofs work only with FD
>mounts.
>
>If a sysop mounts autofs filesystems (installs triggers), that will and
>should happen in the namespace inhabited by him, not in any cloned
>namespaces.  Without needing to wait for someone to work through kernel
>politics and make FD mounts happen.
>
>  
>

Yes, this is most likely the way it will happen.  Note that I 'mounted 
on a file descriptor' in the multimount examples by doing an fchdir(fd) 
followed by a mount --move /tmp/<unique_dir> '.'.  Using file 
descriptors is, however, important for keeping direct mounts on the 
system up to date.
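For readers who skipped the paper, the shape of that sequence is roughly the following.  Mounting needs CAP_SYS_ADMIN, so this sketch only echoes the steps instead of executing them; the staging path, source, and fd are illustrative placeholders:

```shell
# Sketch of 'mounting on a file descriptor': mount into a private
# staging directory, fchdir() to the held directory fd, then move the
# mount onto '.'.  Echoed rather than executed (no privileges needed).
move_mount_sketch() {
    staging=$1
    echo "mount hosta:/export/src $staging"   # 1. mount in staging dir
    echo "fchdir(fd)"                         # 2. enter target dir by its fd
    echo "mount --move $staging ."            # 3. relocate the mount onto '.'
}
move_mount_sketch "/tmp/unique_dir.$$"
```

The point is that no path is resolved in the triggering namespace to name the final mountpoint; only the held directory fd identifies it.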

>>The exception to this rule is when the map entry for /home contains the
>>option 'browse':
>>    
>>
>
>Solaris 2.6 and above has the -browse option on indirect maps, so the set
>of subdirs potentially mountable can be seen, without mounting them. I
>don't see where this is implemented in Linux, nor do I see how it's done,
>documented in Solaris NFS man pages, but I didn't put a lot of time into
>the search.  
>

Yes.   Ian Kent has something similar in his release of autofs 4.1.0 
called ghosting.  Unfortunately, I haven't had the chance to play with 
it very much.

>I *hope* rpc.mountd has an opcode to enumerate every
>filesystem it's willing to export.  
>

# showmount -e hostname    ?

>Does it "stat" and return the stat
>data?  That would be important for "ls".
>
>  
>
Yes, an 'ls' actually does an lstat on every file.   This is cool 
because it doesn't follow links, which is how direct mounts and most 
likely browsing will work.   There are other cases where userspace will 
inadvertently stat (instead of lstat) or getxattr (instead of lgetxattr), 
and these will need to be fixed.
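The distinction is easy to demonstrate from the shell, since stat(1) performs an lstat by default and a stat with -L (the temporary paths below are placeholders):

```shell
# lstat describes a symlink itself; stat follows it to the target --
# the same follow/no-follow difference that decides whether a trigger
# is merely observed or actually tripped.
tmp=$(mktemp -d)
touch "$tmp/target"
ln -s target "$tmp/link"
link_ino=$(stat -c %i "$tmp/link")       # lstat: the link's own inode
target_ino=$(stat -L -c %i "$tmp/link")  # stat: the target's inode
[ "$link_ino" != "$target_ino" ] && echo "lstat and stat disagree on a symlink"
rm -r "$tmp"
```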

Another known thing that will break is GNU find(1).   For some reason, it 
now does:

lstat('dir')
chdir('dir')
lstat('.')

and compares st_dev and st_ino from the two lstat calls.

This obviously breaks when you use browsing and direct mounts.
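The check can be reproduced from the shell.  On a plain directory (assumed here) the two samples agree; on a browsed direct-mount point, the mount triggered by the chdir() would change the answer between the two calls:

```shell
# Reproduce gnu find's sanity check: lstat the directory, chdir into
# it, lstat '.', then compare the (st_dev, st_ino) pairs.
dir=$(mktemp -d)
before=$(stat -c '%d:%i' "$dir")   # lstat('dir')
cd "$dir"                          # chdir('dir')
after=$(stat -c '%d:%i' .)         # lstat('.')
cd "$OLDPWD"
[ "$before" = "$after" ] && echo "same device and inode"
rmdir "$dir"
```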

>>In order to maintain some form of coherency between changing maps, these
>>dummy directory entries will remain in place within the dcache so that
>>the kernel doesn't need to query the usermode helper as often.  These
>>entries will periodically timeout and will be unhashed from the dcache.
>>    
>>
>
>Browsetimeout -- Each autofs instance necessarily has an in-core list of
>its subdirectories.  If the caller stats any of these and that one (or
>alternatively, any of the known subdirs) is not in the dcache, the module
>needs to run the helper again, refreshing all dcache entries.  But you
>still need a timeout because the mode etc. might change on the server, but
>it's rare.  Let's avoid committing a lot of coding effort and CPU time to
>supporting events that might happen once per year.
>
>  
>
In some environments, maps change fairly often (a couple of times a day).  
A timeout of 10 or 15 minutes seems reasonable to me here.  Of course, 
the way things are set up, a stale entry will still fail and return 
ENOENT if it has been removed from the maps since the last browse update.

>>Executing the usermode helper within the namespace of the triggering
>>application does have a problem when browsing is used.  We are caching
>>map keys in kernelspace and can run into coherency problems when an
>>autofs super_block is associated with multiple namespaces which have
>>differing automount maps in /etc. This kind of situation may occur if a
>>namespace is cloned and a new /etc directory with a different auto_home
>>map is mounted.
>>    
>>
>
>The uncloned superblock problem is discussed later in the paper.  It looks
>to me like the VFS layer ought to be responsible for cloning superblocks.
>Not to discourage work towards that goal, but I suggest not delaying autofs
>until it happens.  The result is that some users will see mount points
>(mounted or potentially mountable) that within-namespace policy says should
>be invisible.
>
Agreed.  This can hold off until later, as it isn't necessarily an easy 
thing to do either.

>  That's not too bad, since we rely on UNIX file permissions
>or ACLs for security, not visibility in the automount map.  If an indirect
>map entry was formerly absent but now present, presumably the userspace
>helper will consult the then-prevailing automount map and find it
>successfully.
>
>  
>
Yes, but then when the other namespace accesses this entry, attempts to 
mount it, and no longer finds it in the map, it is unhashed and no 
longer enumerated as a cache entry, even though it is still valid in 
the first namespace.  This cache coherency is a subtle point.  The main 
point is that, without super_block cloning, we are left with two 
namespaces that can effectively alter each other's automount policy by 
remounting the filesystem.

>>Sect. 5.2 Direct Maps
>>    
>>
>
>  
>
>>2) The map key for the direct mount entry is now passed as a new mount
>>option called 'mapkey'.
>>    
>>
>
>I don't quite see the need for the mapkey mount option.  It seems to me
>that the name of the mount point is always equal to the map key.  In my
>model, mounting on open FDs isn't going to be implementable, and so the
>userspace helper has to know the full path name of the mount point, anyway.
>
>  
>
This is the subtle difference between direct and indirect maps.   The 
direct map keys are absolute paths, not path components.  We are 
implementing direct mounts as individual filesystems that will trap on 
traversal into their base directory.  This filesystem has no idea where 
it is located as far as the user is concerned.  We need to tell the 
filesystem directly so that the usermode helper can look it up.  
Conversely, the indirect map uses the sub-directory name as a mapkey.

As noted, we don't actually rely on this value as an absolute path.  
This means that we can move or bind the direct mount trapping 
filesystem.  As for mounting on open fds, the fchdir(fd); mount --move 
/tmp/foo '.' sequence still works.

>>5.3 Multimounts and Offsets
>>    
>>
>
>  
>
>>/usr/src                hosta:/export/src	\
>>            /linux      hostb:/export/linuxsrc
>>    
>>
>
>Suppose someone accesses /usr/src/linux.  Is it not true that both the
>original process and mount(8) have to first access /usr/src, triggering
>automounting of hostA:/export/src, and only when the stat info and readdir
>from that step have come through at least twice, can they go on to monkey
>with /usr/src/linux, triggering mounting of hostB:/export/otherlinux? Thus
>I don't see the need for multimounts.  The conceptual idea of mounting both
>dirs "as a unit" is maybe attractive when not looked at too closely, but it
>seems to me that by just punting, you get infinitesimally slower service to
>the user and a significant section of logic avoided in the code.
>
>  
>
This is pretty much needed no matter how you look at it.   If you set it 
up so that it peeked at the NFS share for /usr/src to get permission 
information, you would also have to verify that it contains a directory 
'linux'.  This doesn't seem like much, but these things can change from 
underneath us.

My understanding of NFS is that you cannot 'pin' a directory on the 
server in order to keep it there as your mountpoint in the client.  You 
have to simply look it up and pin it in the client.  If you don't mount 
/usr/src, then you also won't have permission changes on its base 
directory reflected on your system either.

>The kernel would need to know to install an autofs structure (trigger) on
>/usr/src/linux even though /usr/src was represented by only an autofs
>structure, not actually mounted yet, just like we see in procfs.  I doubt
>that's a showstopper, although you'd have to write the kernel code
>carefully.  The example of userD/server{1,2} indicates that you intend for
>the autofs structure, with nothing mounted on it, ought to be a really
>existing and traversable directory on whose subdirs other autofs FS's can
>be mounted.  Good.
>
>But in sec. 5.3.2 I see you making filesystem dirs in /tmp which seem to
>substitute for the synthetic autofs directories.  Bad, if I've understood
>the example.  Comments suggest that you need the /tmp directory to avoid
>setting off the autofs trigger.  Better: if a synthetic autofs directory
>has no corresponding entry in an automount map, you don't mount anything on
>it.  But if it *does* have a map entry, you need to mount it in order to
>stat it (the server's instance) to determine if the user has permission to
>traverse it, before even considering whether to mount the subdir. Remember
>that in my model I'm leaving aside FD mounts, so traversing containing
>directories by name is a valid concept.
>
>  
>
The directory /tmp/<unique_dir> is _not_ a synthetic autofs directory, 
it is a point where we perform our mounts before we move them.  The 
synthetic directories for multimounts w/o root offsets are handled by a 
tmpfs filesystem simply because it reduces code duplication.

>What is the significance of "lazy mount"?  I don't see the word "lazy" in
>any of the Solaris NFS or automount docs I looked at.  In sec. 5.3.1
>you say it means "mount only when accessed".  Thus the whole idea of autofs
>is to "lazy mount" vast numbers of filesystems.  Right?
>
>  
>
The term 'lazy mount' as used in the document refers to lazily mounting 
the offsets (subdirectories) of a multimount on an as-needed basis.  
From the Solaris 9 automount(1M) manpage:

  Multiple Mounts
     A multiple mount entry takes the form:
                                                                                 

     key [-mount-options] [[mountpoint] [-mount-options] location...]...
                                                                                 

     The initial /[mountpoint] is optional for  the  first  mount
     and  mandatory  for  all  subsequent  mounts.  The  optional
     mountpoint is taken as a pathname relative to the  directory
     named  by  key.  If  mountpoint  is  omitted  in  the  first
     occurrence, a mountpoint of / (root) is implied.
                                                                                 

     Given an entry in the indirect map for /src
                                                                                 

     beta     -ro\
       /           svr1,svr2:/export/src/beta  \
       /1.0        svr1,svr2:/export/src/beta/1.0  \
       /1.0/man    svr1,svr2:/export/src/beta/1.0/man
                                                                                 

     All offsets must exist on the server under  beta.  automount
     will   automatically  mount  /src/beta,  /src/beta/1.0,  and
     /src/beta/1.0/man, as needed,  from  either  svr1  or  svr2,
     whichever host is nearest and responds first.

The key is the 'as needed' bit, something we don't have in Linux yet.  
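To make the offset structure concrete, here is a toy parse of the entry quoted above into offset/location pairs, with the continuation lines already joined onto one line and per-offset mount options ignored (real maps allow them):

```shell
# Split a Solaris-style multimount entry into the offsets that an
# automounter would then mount lazily, one by one, as each is accessed.
entry='beta -ro / svr1,svr2:/export/src/beta /1.0 svr1,svr2:/export/src/beta/1.0 /1.0/man svr1,svr2:/export/src/beta/1.0/man'
set -- $entry
key=$1; opts=$2; shift 2
while [ "$#" -ge 2 ]; do
    echo "offset=$1 location=$2"   # e.g. offset=/ location=svr1,svr2:/export/src/beta
    shift 2
done
```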

As justification of its worth: some institutions have file servers 
that export hundreds or even thousands of shares over NFS.   As /net is 
really just a kind of executable indirect map that returns a multimount 
for each hostname used as a key,  just doing 'cd /net/hostname' may 
potentially mount hundreds of filesystems.  This is not cool! 
                                                                               


>>5.4 Expiry
>>    
>>
>
>  
>
>>Handling expiry of mounts is difficult to get right.  Several different
>>aspects need to be considered before being able to properly perform
>>expiry.
>>    
>>
>
>The current daemon (with latest patches) seems to get it right most of the
>time.
>
>  
>
It's the rest of the time we want to deal with.  I know Ian has done a 
lot of good work on this over the past few months and I hope we will be 
able to use his insight to get everything right.

>>The autofs filesystem really should know as little about VFS internal
>>structures as possible.  In this case, the filesystem code is charged
>>with walking across mountpoints and manually counting reference counts.
>>This task is much better left to the VFS internals.
>>    
>>
>
>Someone with a more thorough understanding of the code should comment on
>this, but I didn't notice the module rooting through VFS data; it looks
>like it relies on use counts maintained by the VFS layer, similar to what
>mount(2) relies on to declare a mount to be busy.
>
>  
>
It manually walks through dentry trees and vfsmount trees (albeit the v3 
code doesn't do the latter). It manually does reference count checks for 
busyness, which can change over time.  It also has to do all of this 
under locking, by grabbing VFS-specific locks.  I'm pretty sure these 
structures are _not_ meant to be traversed by anything outside the VFS, 
and the fact that autofs has gotten away with it is a remnant of the 
fact that dcache_lock used to encompass a lot.  In fact, in 2.5, the 
vfsmount structures that autofs walks got their own split locking and 
are now protected by vfsmount_lock, which isn't exported to modules at all.

This is a good example of why this stuff should probably be merged into 
the VFS; autofs4 has yet to be updated to use this lock.  Doing so comes 
with the decision to either a) no longer support it as a module, only 
built in, or b) make vfsmount_lock accessible to modules.

But yes, someone with a more thorough understanding of the code should 
comment  :) 

>>Unmounting the filesystem from userspace is racy, as any program can
>>begin using a mount between the time the daemon has received a path to
>>expire and the time it actually makes the umount(2) system call.
>>    
>>
>
>So the helper's umount() will fail.  OK, it failed.  The kernel module
>should not recognize the mounted dir as being gone, until the module itself
>has seen that it's gone.  This policy also helps in cases where the sysop
>manually unmounts an automounted directory for repair purposes.
>  
>
But this leads to races which cause partial expiries to occur in autofs4.

>A common problem is stale NFS filehandles, and in this case we'd like the
>userspace helper to be aggressive in using "umount -f" or other advanced
>techniques.  The freedom to fail is important here.
>  
>
I'd much much rather see umount -l happen.  At least with -l, there is a 
slight chance that the file system will come back and the processes 
affected will be able to continue operating as usual.

>  
>
>>These points suggest that the kernel's VFS sub-system should be charged
>>with handling expiry.
>>    
>>
>
>The point is well taken that a VFS layer expiry mechanism would be welcomed
>by many filesystems.  But autofs has to work with the kernel as it lies
>now.
>
>  
>
Why? Things change in the kernel all the time.  Please note, we will be 
doing development against 2.6. 

I'd like to see an independent patch out there for those who want it on 
2.4, but the fact of the matter is that a lot has changed since 2.4, and 
the amount of work required may not be worth it.

>>As described above, we may be installing multiple mounts upon each
>>trigger. This tree of mounts will need to expire together as an atomic
>>unit.  We will need to register this block of mounts to some expiry
>>system.  This will be done by performing a remount on the base
>>automounted filesystem after any nested offset mounts have been installed
>>    
>>
>
>A filesystem is "in use" if anything is mounted on its subdirs.  That
>precludes premature auto-unmounting of a containing directory, in the case
>of a multi-mount or jimc's recommended non-implementation thereof.  I don't
>see that a multi-mount stack needs to expire as a unit -- just let the
>components expire normally, leaf to root.  It doesn't bother jimc that some
>members are mounted and some aren't; by the principle of lazy mounting,
>that's what we're trying to accomplish.
>
>  
>
The thing is that we use autofs filesystems as traps.  Following from 
the previous /usr/src/linux example:

# cat /proc/mounts
rootfs /
autofs /usr/src
# cd /usr/src
# cat /proc/mounts
rootfs /
autofs /usr/src
hosta:/src /usr/src
autofs /usr/src/linux
# cd linux
# cat /proc/mounts
rootfs /
autofs /usr/src
hosta:/src /usr/src
autofs /usr/src/linux
hostb:/linux /usr/src/linux
#cd /

Now, assume that nobody is using /usr/src and /usr/src/linux.   The 
first fs to expire is going to be the NFS mount from hostb on /usr/src/linux:

# cat /proc/mounts
rootfs /
autofs /usr/src
hosta:/src /usr/src
autofs /usr/src/linux

Next, /usr/src should go.  The thing is, we do _not_ want to unmount the 
autofs filesystem at /usr/src/linux before unmounting the nfs filesystem 
at /usr/src, because that would open us up to a user coming in and 
doing chdir(/usr/src/linux): we would miss the traversal because our 
trigger on 'linux' would be gone.  We also shouldn't unmount the nfs 
filesystem from hosta at that point, because somebody would then be 
using it.  

However, if we remove the two filesystems together atomically, 
then everything works fine.

Does that clear it up a bit?  

>>5.5 Handling Changing Maps
>>    
>>
>
>The whole issue of changed maps is closely related to the case of cloning a
>namespace and discovering that an autofs map is non-identical in the new
>namespace.
>
>As pointed out in 5.5.1, when the maps change a userspace program will have
>to detect some added or deleted items.  This program will have to run
>separately in the context of every namespace.  Thus, we should probably
>burden the sysop with remembering to run it if he wants his new/deleted
>maps to be recognized. But we'll have to use some ioctl to stimulate the
>kernel module to enumerate all known namespaces and run the updater for
>each one.
>
>  
>
Nah.   I leave that as a namespace-aware cron job problem ;)


>>5.5.2 Forcing Expiry to Occur
>>    
>>
>
>When I do this the reason is generally that I'm going to take down a
>server.  Then I don't want "lazy unmounts"; I want immediate unmounts that
>will be fatal to the processes using the filesystem.  When the server is
>already dead, then I may do a lazy unmount with the expectation that the
>structure will never be cleaned up until the client is rebooted, but at
>least the client can continue to run.
>
>  
>
Lazy unmounts take effect immediately as far as the rest of the system 
can see.  

This may not be the only functionality needed, yes.  I'm sure there are 
more options required given the circumstances of the kill.  I probably 
shouldn't have mentioned the lazy unmounting for the forced expiry. 

I'd be interested to hear more about the different types of 
(expire/kill) operations that sysadmins prefer.


>>7 Scalability
>>    
>>
>
>Necessarily mount(8) is used to mount filesystems, since only it has all
>the spaghetti code and pseudo-object-oriented executables to deal with the
>various filesystem types.  Hence at least one process (and most likely a
>parent shell script) is expected per mount.  We need to be frugal in
>writing the userspace helper (and this is a reason to roll our own, not use
>hotplug), but the idea of using a userspace helper to mount, rather than a
>persistent daemon, doesn't sound scary to me.
>
>For me the biggest attraction of a Solaris-style automount upgrade is
>the ability to create wildcard maps with substitutible variables, e.g.
>rather than having a kludgey programmatic map that creates little map
>files on the fly looking like "* tupelo:/&", a host map can be implemented
>via "* $SERVER:/&".  Of course Solaris has a native "-host" map type,
>which is also good.
>
>  
>
The substitution stuff I think Ian has worked on; Ian, correct me if I'm 
wrong here.

The -host map really does act like an executable indirect map.   This 
is traditionally implemented on Linux as scripts, but that does keep you 
from using 'The Same Automounter Maps' on Linux and Solaris.   (It's 
also a big Linux customer complaint, afaict.)
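A minimal sketch of what such a script can look like, with heavy hedging: the automounter invokes an executable map with the key (here a hostname) as its single argument and reads a map entry from stdout; the canned export list below stands in for a live showmount query, and the option string is illustrative:

```shell
#!/bin/sh
# Executable indirect map sketch approximating Solaris' -host behaviour:
# emit one multimount entry covering every share the host exports.
key=${1:-hostb}
list_exports() {
    # In real use, something like: showmount --no-headers -e "$key"
    printf '/export/src\n/export/linuxsrc\n'
}
entry="-fstype=nfs"
for share in $(list_exports); do
    entry="$entry $share $key:$share"
done
echo "$entry"
```

With lazy multimount support, a 'cd /net/hostname' would install all of these offsets as triggers but mount each share only on first access.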

-- 
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice
mailto: Michael.Waychison@Sun.COM
http://www.sun.com

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE:  The opinions expressed in this email are held by me, 
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 


[-- Attachment #2: Type: application/pgp-signature, Size: 251 bytes --]

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-07 21:14 ` [autofs] [RFC] Towards a Modern Autofs Jim Carter
  2004-01-07 22:55   ` Mike Waychison
@ 2004-01-08  0:48   ` Ian Kent
  1 sibling, 0 replies; 85+ messages in thread
From: Ian Kent @ 2004-01-08  0:48 UTC (permalink / raw)
  To: Jim Carter; +Cc: Mike Waychison, autofs mailing list, Kernel Mailing List

On Wed, 7 Jan 2004, Jim Carter wrote:

>
> > The exception to this rule is when the map entry for /home contains the
> > option 'browse':
>
> Solaris 2.6 and above has the -browse option on indirect maps, so the set
> of subdirs potentially mountable can be seen, without mounting them. I
> don't see where this is implemented in Linux, nor do I see how it's done,
> documented in Solaris NFS man pages, but I didn't put a lot of time into
> the search.  I *hope* rpc.mountd has an opcode to enumerate every
> filesystem it's willing to export.  Does it "stat" and return the stat
> data?  That would be important for "ls".

So, even after our most recent email conversation, you still haven't
checked out autofs 4.1.0 and my kernel module kit.



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-07 22:55   ` Mike Waychison
@ 2004-01-08 12:00     ` Ian Kent
  2004-01-08 15:39       ` Mike Waychison
  2004-01-08 17:34       ` H. Peter Anvin
  2004-01-08 12:29     ` Olivier Galibert
                       ` (2 subsequent siblings)
  3 siblings, 2 replies; 85+ messages in thread
From: Ian Kent @ 2004-01-08 12:00 UTC (permalink / raw)
  To: Mike Waychison; +Cc: Jim Carter, autofs mailing list, Kernel Mailing List


Don't expect we'll get many readers of posts this long ...

On Wed, 7 Jan 2004, Mike Waychison wrote:

Mike, can you enlighten me with a few words about how namespaces are useful
in the design? I have not seen or heard much about them, so please be
gentle.

I don't understand the super block cloning problem you describe either.
Some words on that would be greatly appreciated as well.

What is the form of the trigger talked about? Identifying the automount
points in the autofs filesystem has always been hard and error prone.

Please clarify what we are talking about WRT kernel support for
automount. Is the plan a new kernel module, unspecified 'in VFS'
support, or both?

> >
> >Solaris 2.6 and above has the -browse option on indirect maps, so the set
> >of subdirs potentially mountable can be seen, without mounting them. I
> >don't see where this is implemented in Linux, nor do I see how it's done,
> >documented in Solaris NFS man pages, but I didn't put a lot of time into
> >the search.
> >
>
> Yes.   Ian Kent has something similar in his release of autofs 4.1.0
> called ghosting.  Unfortunately, I haven't had the chance to play with
> it very much.

Yes. In 4.1, NIS, LDAP and file maps are browsable for both direct and
indirect maps. Browsability alone requires my kernel patch.
The daemon detects the updated module's presence and, if the option is
specified, 'ghosts' the directories, mounting them only when accessed.

>
> >I *hope* rpc.mountd has an opcode to enumerate every
> >filesystem it's willing to export.
> >
>
> # showmount -e hostname    ?
>
> >Does it "stat" and return the stat
> >data?  That would be important for "ls".
> >
> >
> >
> Yes, an 'ls' actually does an lstat on every file.   This is cool
> because it doesn't follow links, which is how direct mounts and most
> likely browsing will work.   There are other cases where userspace will
> inadvertedly stat (instead of lstat) or getxattr (instead of lgetxattr)
> and these will need to be fixed.
>
> Other known things that will break is gnu find(1).   For some reason, it
> now does:
>
> lstat('dir')
> chdir('dir')
> lstat('.')

This suggestion has been made by others several times but doesn't seem
to be a problem in practice. In all my testing I have only been able to
find one case that doesn't work as needed when ghosted. This is the
situation where a home directory in a map exported from a server is
actually not available (e.g. does not exist) and someone logs into the
account using wu-ftpd. In this case wu-ftpd thinks all is OK, but of course
an error is returned when the directory access is attempted. In fact an
error should have been returned at login. Further, I believe this can be
solved with as little as an additional revalidate call in sys_stat (I
think the problem call was sys_stat?).

> >
> >
> In some environments, maps change fairly often (a couple times a day).
> A timeout of 10 or 15 minutes is reasonable to me for this timeout to
> occur.  Of course, the way things are setup, a stale entry will still
> fail and return ENOENT if it has been removed from the maps since the
> last browse update.

My thoughts on map info and cacheing of it will come when I have had more
time to digest your paper.

> This is the subtle difference between direct and indirect maps.   The
> direct map keys are absolute paths, not path components.  We are
> implementing direct mounts as individual filesystems that will trap on
> traversal into their base directory.  This filesystem has no idea where
> it is located as far as the user is concerned.  We need to tell the
> filesystem directly so that the usermode helper can look it up.
> Conversely, the indirect map uses the sub-directory name as a mapkey.

I'm not sure what you are saying here. Does this mean there is a mount for
every direct mount (this might be what you call a trigger)?

AIX implemented automounts by mounting everything in each map. This
made the mount listing very ugly.

>
> >What is the significance of "lazy mount"?  I don't see the word "lazy" in
> >any of the Solaris NFS or automount docs I looked at.  In sec. 5.3.1
> >you say it means "mount only when accessed".  Thus the whole idea of autofs
> >is to "lazy mount" vast numbers of filesystems.  Right?
> >

>
> The key is the 'as needed' bit, something we don't have in Linux yet.
>
> For justification to it's worth, some institutions have file servers
> that export hundreds or even thousands of shares over NFS.   As /net is
> really just a kind of executable indirect map that returns multimounts
> for each hostname used as a key,  just doing 'cd /net/hostname' may
> potentially mount hundreds of filesystems.  This is not cool!

This sounds like the stat/lstat question again.

I have been able to provide lazy mounts in 4.1 with directory
browsing, but have had to resort to internal sub-mounts when browsing is
not requested or available. This process sounds similar to some of the
discussion of multi-mount maps in the paper.

>
>
>
> >>5.4 Expiry
> >>
> >>
> >
> >
> >
> >>Handling expiry of mounts is difficult to get right.  Several different
> >>aspects need to be considered before being able to properly perform
> >>expiry.
> >>
> >>
> >
> >The current daemon (with latest patches) seems to get it right most of the
> >time.
> >
> >
> >
> It's the rest of the time we want to deal with.  I know Ian has done a
> lot of good work on this over the past few months and I hope we will be
> able to use his insight to get everything right.
>
> >>The autofs filesystem really should know as little about VFS internal
> >>structures as possible.  In this case, the filesystem code is charged
> >>with walking across mountpoints and manually counting reference counts.
> >>This task is much better left to the VFS internals.
> >>
> >>
> >
> >Someone with a more thorough understanding of the code should comment on
> >this, but I didn't notice the module rooting through VFS data; it looks
> >like it relies on use counts maintained by the VFS layer, similar to what
> >mount(2) relies on to declare a mount to be busy.
> >
> >
> >
> It manually walks through dentry trees and vfsmount trees (albeit the v3
> code doesn't do the latter). It manually does reference count checks for
> business which can change over time.  It also has to do this all with
> locking, by grabbing vfs specific locks.  I'm pretty sure these
> structures are _not_ meant to be traversed by anything outside the vfs
> and the fact that autofs has gotten away with it is a remnant of the
> fact that dcache_lock used to encompass a lot.  In fact, in 2.5, the
> vfsmount structures that autofs walks is has split locks and now uses
> vfsmount_lock, which isn't exported to modules at all.
>
> This is a good example of why this stuff should probably be merged into
> VFS,  autofs4 has yet to be updated to use this lock.  This comes with
> the decision to a) no longer support it as a module, only built in, or
> b) make vfsmount_lock accessible to modules.
>
> But yes, someone with a more thorough understanding of the code should
> comment  :)

Mmm. The vfsmount_lock is available to modules in 2.6; at least it was in
test11. I'm sure I compiled the module under 2.6 as well?

I thought that taking the dcache_lock was the correct thing to do when
traversing a dentry list?

In any case, after a mail discussion with Maneesh Soni regarding the
autofs4 expiry code, I rewrote it. Maneesh felt that using reference counts
was unreliable and recommended that it use VFS API calls where possible. I
did that, and that code is now part of my autofs4 module kit for 2.4 and is
also present in the patch set I offered to Andrew Morton for inclusion
in 2.6. It seems to work well. The dentry structures are traversed
and the dcache_lock is obtained as needed. When I can go no further
within the autofs filesystem I resort to traversing the vfsmount
structures to check the mount counts. Maybe we can get some useful code
from this.

>
> >>Unmounting the filesystem from userspace is racy, as any program can
> >>begin using a mount between the time the daemon has received a path to
> >>expire and the time it actually makes the umount(2) system call.
> >>
> >>
> >
> >So the helper's umount() will fail.  OK, it failed.  The kernel module
> >should not recognize the mounted dir as being gone, until the module itself
> >has seen that it's gone.  This policy also helps in cases where the sysop
> >manually unmounts an automounted directory for repair purposes.

The autofs4 module blocks (auto) mounts during the umount callback.
Surely this is the sensible thing to do.

> >
> >>These points suggest that the kernel's VFS sub-system should be charged
> >>with handling expiry.
> >>
> >>
> >
> >The point is well taken that a VFS layer expiry mechanism would be welcomed
> >by many filesystems.  But autofs has to work with the kernel as it lies
> >now.
> >
> >
> >
> Why? Things change in the kernel all the time.  Please note, we will be
> doing development against 2.6.

Mmm ... expiry in VFS ... later also.

>
> I'd like to see an independent patch out there for those who want it on
> 2.4, but the fact of the matter is that a lot has changed since 2.4 and
> the amount of work required may not be worth it.
>
> >>As described above, we may be installing multiple mounts upon each
> >>trigger. This tree of mounts will need to expire together as an atomic
> >>unit.  We will need to register this block of mounts to some expiry
> >>system.  This will be done by performing a remount on the base
> >>automounted filesystem after any nested offset mounts have been installed
> >>
> >>
> >
> >A filesystem is "in use" if anything is mounted on its subdirs.  That
> >precludes premature auto-unmounting of a containing directory, in the case
> >of a multi-mount or jimc's recommended non-implementation thereof.  I don't
> >see that a multi-mount stack needs to expire as a unit -- just let the
> >components expire normally, leaf to root.  It doesn't bother jimc that some
> >members are mounted and some aren't; by the principle of lazy mounting,
> >that's what we're trying to accomplish.

My understanding of the multi-mount/tree mounts is flawed. Don't look to
autofs v4 for correct functionality ... bummer ... missed that.

>
> >>5.5 Handling Changing Maps
> >>
> >>
> >
> >The whole issue of changed maps is closely related to the case of cloning a
> >namespace and discovering that an autofs map is non-identical in the new
> >namespace.
> >
> >As pointed out in 5.5.1, when the maps change a userspace program will have
> >to detect some added or deleted items.  This program will have to run
> >separately in the context of every namespace.  Thus, we should probably
> >burden the sysop with remembering to run it if he wants his new/deleted
> >maps to be recognized. But we'll have to use some ioctl to stimulate the
> >kernel module to enumerate all known namespaces and run the updater for
> >each one.
> >
> >
> >
> Nah.   I leave that as a namespace-aware cron job problem ;)

More info please?
Cloning namespaces?

>
>
> >>5.5.2 Forcing Expiry to Occur
> >>
> >>
> >
> >When I do this the reason is generally that I'm going to take down a
> >server.  Then I don't want "lazy unmounts"; I want immediate unmounts that
> >will be fatal to the processes using the filesystem.  When the server is
> >already dead, then I may do a lazy unmount with the expectation that the
> >structure will never be cleaned up until the client is rebooted, but at
> >least the client can continue to run.
> >
> >
> >
> Lazy unmounts appear immediately in your system.
>
> This may not be the only functionality needed, yes.  I'm sure there are
> more options required given the circumstances of the kill.  I probably
> shouldn't have mentioned the lazy unmounting for the forced expiry.
>
> I'd be interested to hear more about the different types of
> (expire/kill) operations that sysadmins prefer.

Hang on. From the discussion my impression of a lazy mount is that it is
not actually mounted!

Indeed, why should it be, it's basically a directory or a dentry in the
kernel.

>
>
> >>7 Scalability
> >>
> >>
> >
> >Necessarily mount(8) is used to mount filesystems, since only it has all
> >the spaghetti code and pseudo-object-oriented executables to deal with the
> >various filesystem types.  Hence at least one process (and most likely a
> >parent shell script) is expected per mount.  We need to be frugal in
> >writing the userspace helper (and this is a reason to roll our own, not use
> >hotplug), but the idea of using a userspace helper to mount, rather than a
> >persistent daemon, doesn't sound scary to me.
> >
> >For me the biggest attraction of a Solaris-style automount upgrade is
> >the ability to create wildcard maps with substitutable variables, e.g.
> >rather than having a kludgey programmatic map that creates little map
> >files on the fly looking like "* tupelo:/&", a host map can be implemented
> >via "* $SERVER:/&".  Of course Solaris has a native "-host" map type,
> >which is also good.
> >
> >
> >
> The substitution stuff I think Ian had worked on: Ian correct me if I'm
> wrong here.
>
> The -host map really does act like an executable indirect map.   This
> is traditionally implemented on Linux as scripts, but that does keep you
> from using 'The Same Automounter Maps' on Linux and Solaris.   (It's
> also a big Linux customer complaint afaict).

If wildcard map entries are not in autofs v3 then Jeremy implemented this
in v4.

And yes the host map is basically a program map and that's all. Worse, as
pointed out in the paper it mounts everything under it. This is a source
of stress for mount and umount. I have put in a fair bit of time on ugly
hacks to work around this. This same problem is also evident in startup
and shutdown for master maps with a good number of entries (~50 or more).
A consequence of the current multiple daemon approach.

Ian


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-07 22:55   ` Mike Waychison
  2004-01-08 12:00     ` Ian Kent
@ 2004-01-08 12:29     ` Olivier Galibert
  2004-01-08 13:20       ` Robin Rosenberg
  2004-01-08 16:23       ` Mike Waychison
  2004-01-08 12:35     ` Ian Kent
  2004-01-08 18:20     ` Jim Carter
  3 siblings, 2 replies; 85+ messages in thread
From: Olivier Galibert @ 2004-01-08 12:29 UTC (permalink / raw)
  To: Kernel Mailing List

On Wed, Jan 07, 2004 at 05:55:23PM -0500, Mike Waychison wrote:
> Yes, an 'ls' actually does an lstat on every file.

I guess you haven't met the plague called color-ls yet.  Lucky you.

Most modern file browsers also seem to feel obligated to follow
symlinks to check whether they're dangling.  A mis-click on "up" when
you're on your home directory could cause a beautiful mount-storm.

  OG.



* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-07 22:55   ` Mike Waychison
  2004-01-08 12:00     ` Ian Kent
  2004-01-08 12:29     ` Olivier Galibert
@ 2004-01-08 12:35     ` Ian Kent
  2004-01-08 13:08       ` Ian Kent
  2004-01-08 18:20     ` Jim Carter
  3 siblings, 1 reply; 85+ messages in thread
From: Ian Kent @ 2004-01-08 12:35 UTC (permalink / raw)
  To: Mike Waychison; +Cc: Jim Carter, autofs mailing list, Kernel Mailing List

On Wed, 7 Jan 2004, Mike Waychison wrote:

>
> This is a good example of why this stuff should probably be merged into
> VFS,  autofs4 has yet to be updated to use this lock.  This comes with
> the decision to a) no longer support it as a module, only built in, or
> b) make vfsmount_lock accessible to modules.

Please don't say it this way.

A new implementation may mean the current autofs becomes deprecated, but
this is a deprecation process, not a slash and burn, and needs to be
managed.

Ian




* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-08 12:35     ` Ian Kent
@ 2004-01-08 13:08       ` Ian Kent
  0 siblings, 0 replies; 85+ messages in thread
From: Ian Kent @ 2004-01-08 13:08 UTC (permalink / raw)
  To: Mike Waychison; +Cc: Jim Carter, autofs mailing list, Kernel Mailing List

On Thu, 8 Jan 2004, Ian Kent wrote:

Oh! This should have related to the comments about removing autofs from
the kernel.

Sorry about the confusion.

> On Wed, 7 Jan 2004, Mike Waychison wrote:
>
> >
> > This is a good example of why this stuff should probably be merged into
> > VFS,  autofs4 has yet to be updated to use this lock.  This comes with
> > the decision to a) no longer support it as a module, only built in, or
> > b) make vfsmount_lock accessible to modules.
>
> Please don't say it this way.
>
> A new implementation may mean the current autofs becomes deprecated, but
> this is a deprecation process, not a slash and burn, and needs to be
> managed.
>




* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-08 12:29     ` Olivier Galibert
@ 2004-01-08 13:20       ` Robin Rosenberg
  2004-01-08 16:23       ` Mike Waychison
  1 sibling, 0 replies; 85+ messages in thread
From: Robin Rosenberg @ 2004-01-08 13:20 UTC (permalink / raw)
  To: Olivier Galibert, Kernel Mailing List

torsdagen den 8 januari 2004 13.29 skrev Olivier Galibert:
> On Wed, Jan 07, 2004 at 05:55:23PM -0500, Mike Waychison wrote:
> > Yes, an 'ls' actually does an lstat on every file.
>
> I guess you haven't met the plague called color-ls yet.  Lucky you.
>
> Most modern file browsers also seem to feel obligated to follow
> symlinks to check whether they're dangling.  A mis-click on "up" when
> you're on your home directory could cause a beautiful mount-storm.
>

Not to mention the more complex graphical environments like Konqueror in KDE, which produces a 
nice icon with a preview of whatever a link points to. It also scans directories in 
order to tag the large icon with even smaller icons indicating what types of files the directory 
contains. It is very nice, but very different from ls.

-- robin



* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-08 12:00     ` Ian Kent
@ 2004-01-08 15:39       ` Mike Waychison
  2004-01-09 18:20         ` Ian Kent
  2004-01-08 17:34       ` H. Peter Anvin
  1 sibling, 1 reply; 85+ messages in thread
From: Mike Waychison @ 2004-01-08 15:39 UTC (permalink / raw)
  To: Ian Kent; +Cc: Jim Carter, autofs mailing list, Kernel Mailing List

Ian Kent wrote:

>Don't expect we'll get many readers of posts this long ...
>
>On Wed, 7 Jan 2004, Mike Waychison wrote:
>
>Mike can you enlighten me with a few words about how namespaces are useful
>in the design. I have not seen or heard much about them so please be
>gentle.
>  
>

Your best bet to learn more about namespaces is probably to read 
copy_namespace() in fs/namespace.c.  There isn't much to google for, 
other than the CLONE_NEWNS flag for clone(2).  Basically, the idea is 
that you can give a new process its own independent mount table to play 
with.  Any changes to it are not seen by any other processes and vice-versa.

As for usefulness, the use of namespaces in general is up for debate.  
IMHO, namespaces in Linux are ill-designed; however, I'm told that their 
uses are still far off and it is understood that they break several 
things.

AFAIK, the long-term goal of namespaces is to one day be able to do 
user-privileged mounting.  Basically allowing users to play in their 
own sandbox mount table, mounting/moving/binding/unmounting filesystems 
as they see fit, without affecting the overall security of the machine 
and without disturbing other users.  Someone correct me here if I'm wrong.

>I don't understand the super block cloning problem you describe either.
>Some words on that would be greatly appreciated as well.
>
>  
>
One of the benefits of namespace cloning is complete mount configuration 
isolation between processes.  In my eyes, automounting is a part of that 
configuration.  To over-simplify the problem, any given filesystem may 
have a single set of mount options.  When a namespace is cloned, every 
mounted filesystem is shared between the two namespaces.  Now we have 
the problem that a change in mount options in one namespace affects the 
other.  This breaks the mountpoint isolation namespaces tried to achieve. 

The 'quick-fix' to this is that filesystems should be allowed to 
determine if they should clone themselves when a namespace is cloned.  
This would ensure that each namespace now has its own copy of the 
filesystem, each with individual sets of mount options.

>What is the form of the trigger talked about? Identifying the automount
>points in the autofs filesystem has always been hard and error prone.
>
>  
>
I don't understand what you mean by the identifying part.   However, the 
'trigger' would be the traditional method used in autofsv3/4 for indirect 
maps and probably based off what you already have for doing the browsing 
stuff.

The direct map 'triggers' will be taken care of by another filesystem 
with a magic root directory that will catch traversals using some 
follow_link magic.   I wrote a prototype for this last summer, but 
haven't released it as the userspace stuff completely does not fit in 
with the existing daemon that was out at the time, due to the mess of 
glue that was pgids, pipes and processes.   It worked in the simple 
case, but it didn't extend to being able to direct mount an indirect 
map, nor was it able to do the lazy mounting in multimounts as I had 
desired.

>Please clearify what we are talking about WRT kernel support for
>automount. Is the plan a new kernel module or are we talking about
>unspecified 'in VFS' support or both?
>
>  
>
This design will have its own new autofs module (hopefully named 
something other than autofs to avoid confusion/mishaps).  The VFS will 
have native support for expiry.  The VFS will also be slightly extended 
to allow the super_block cloning on namespace clone (although this can 
probably hold off a while, it's more a semantic issue than anything else).

>
>Yes. In 4.1 NIS, LDAP and file maps are browsable for both direct and
>indirect maps. The browsability, only, requires my kernel patch.
>The daemon detects the updated modules' presence, and if the option is
>specified 'ghosts' the directories, mounting them only when accessed.
>
>  
>
What is the difference between Solaris's -browse and your ghosting then? 

>>lstat('dir')
>>chdir('dir')
>>lstat('.')
>>    
>>
>
>This suggestion has been made by others several times but doesn't seem
>to be a problem in practice. In all my testing I have only been able to
>find one case that doesn't work as needed when ghosted. This is the
>situation where a home directory in a map exported from a server, is
>actually not available (eg does not exist) and someone logs into the
>account using wu-ftpd. In this case wu-ftpd thinks all is ok but of course
>an error is returned when the directory access is attempted. In fact an
>error should have been returned at login. Further, I believe this can be
>solved with as little as an additional revalidate call in sys_stat (I
>think the problem call was sys_stst ???).
>
>  
>
The find(1) issue is fairly recent.   This check was added some time 
within the last two years (?) and only appears in the latest distros.

Another problem was the ACL patches for ls(1) and friends.  I *really* 
think they should be lgetxattr'ing instead of getxattr'ing.  They even 
explicitly check via an lstat _beforehand_ to verify if the file is 
S_ISLNK, and only then will it getxattr if it isn't.  Why not extend 
it?   I dunno.

>>This is the subtle difference between direct and indirect maps.   The
>>direct map keys are absolute paths, not path components.  We are
>>implementing direct mounts as individual filesystems that will trap on
>>traversal into their base directory.  This filesystem has no idea where
>>it is located as far as the user is concerned.  We need to tell the
>>filesystem directly so that the usermode helper can look it up.
>>Conversely, the indirect map uses the sub-directory name as a mapkey.
>>    
>>
>
>I'm not sure what you are saying here. Does this mean there is a mount for
>every direct mount (this might be what you call a trigger)?
>
>  
>
Yes, it is its own filesystem (type autofs).  This is needed because we 
need to overlay direct triggers within NFS filesystems for multimounts.

Browsing however obviously doesn't need that because we control the 
parent directory.

>AIX implemented automounts by mounting everything in each map. This
>made the mount listing very ugly.
>
>  
>
??  Really?  I find that hard to believe.  I thought Solaris shared its 
automounter with HPUX and AIX.  I may be wrong though.

>This sounds like the stat/lstat question again.
>
>I have been able to provide lazy mounts in 4.1 with directory
>browsing but have had to resort to internal sub-mounts when browsing is
>not requested or available. This process sounds similar to some of the
>discussion of multi-mount maps in the paper.
>
>  
>
Yup. We use your browsing stuff for indirect maps with -browse, and we 
use nested direct triggers for the offsets within the multimounts.

>>>>5.4 Expiry
>>>>
>>>>Handling expiry of mounts is difficult to get right.  Several different
>>>>aspects need to be considered before being able to properly perform
>>>>expiry.
>>>>
>>>The current daemon (with latest patches) seems to get it right most of the
>>>time.
>>>
>>It's the rest of the time we want to deal with.  I know Ian has done a
>>lot of good work on this over the past few months and I hope we will be
>>able to use his insight to get everything right.
>>
>>    
>>
>>>>The autofs filesystem really should know as little about VFS internal
>>>>structures as possible.  In this case, the filesystem code is charged
>>>>with walking across mountpoints and manually counting reference counts.
>>>>This task is much better left to the VFS internals.
>>>>
>>>>
>>>>        
>>>>
>>>Someone with a more thorough understanding of the code should comment on
>>>this, but I didn't notice the module rooting through VFS data; it looks
>>>like it relies on use counts maintained by the VFS layer, similar to what
>>>mount(2) relies on to declare a mount to be busy.
>>>
>>>
>>>
>>>      
>>>
>>It manually walks through dentry trees and vfsmount trees (albeit the v3
>>code doesn't do the latter). It manually does reference count checks for
>>busyness, which can change over time.  It also has to do this all with
>>locking, by grabbing vfs specific locks.  I'm pretty sure these
>>structures are _not_ meant to be traversed by anything outside the vfs
>>and the fact that autofs has gotten away with it is a remnant of the
>>fact that dcache_lock used to encompass a lot.  In fact, in 2.5, the
>>vfsmount structures that autofs walks have split locks and now use
>>vfsmount_lock, which isn't exported to modules at all.
>>
>>This is a good example of why this stuff should probably be merged into
>>VFS,  autofs4 has yet to be updated to use this lock.  This comes with
>>the decision to a) no longer support it as a module, only built in, or
>>b) make vfsmount_lock accessible to modules.
>>
>>But yes, someone with a more thorough understanding of the code should
>>comment  :)
>>    
>>
>
>Mmm. The vfsmount_lock is available to modules in 2.6. At least it was in
>test11. I'm sure I compiled the module under 2.6 as well???
>
>I thought that, taking the dcache_lock was the correct thing to do when
>traversing a dentry list?
>
>  
>
Walking dentrys still takes the dcache_lock, however walking vfsmounts 
takes the vfsmount_lock.  dcache_lock is no longer used for fast path 
walking either (to the best of my understanding).

find . -name '*.[ch]' -not -path '*SCCS*' | xargs grep vfsmount_lock | grep EXPORT

shows no results for vfsmount_lock being exported to modules in 2.6.

>In any case after a mail discussion with Maneesh Soni regarding the
>autofs4 expiry code I rewrote it. Maneesh felt that using reference counts
>was unreliable and recommended that it use VFS api calls where possible. I
>did that and that code is now part of my autofs4 module kit for 2.4 and is
>also present in the patch set I offered to Andrew Morton for inclusion
>in 2.6. It seems to work well. The dentry structures are traversed
>and the dcache_lock is obtained as needed. When I can go no further
>within the autofs filesystem I resort to traversing the vfsmount
>structures to check the mount counts. Maybe we can get some useful code
>from this.
>
>  
>
I haven't had the chance to step through your new module code 
completely.  Sorry.

>>>>Unmounting the filesystem from userspace is racy, as any program can
>>>>begin using a mount between the time the daemon has received a path to
>>>>expire and the time it actually makes the umount(2) system call.
>>>>
>>>>
>>>>        
>>>>
>>>So the helper's umount() will fail.  OK, it failed.  The kernel module
>>>should not recognize the mounted dir as being gone, until the module itself
>>>has seen that it's gone.  This policy also helps in cases where the sysop
>>>manually unmounts an automounted directory for repair purposes.
>>>      
>>>
>
>The autofs4 module blocks (auto) mounts during the umount callback.
>Surely this is the sensible thing to do.
>
>  
>
The raciness comes from the fact that we now support the lazy-mounting 
of multimount offsets using embedded direct mounts.  Autofs4 mounts all 
(or as much as it can) from the multimount all together, and unmounts it 
all on expiry.

>>>As pointed out in 5.5.1, when the maps change a userspace program will have
>>>to detect some added or deleted items.  This program will have to run
>>>separately in the context of every namespace.  Thus, we should probably
>>>burden the sysop with remembering to run it if he wants his new/deleted
>>>maps to be recognized. But we'll have to use some ioctl to stimulate the
>>>kernel module to enumerate all known namespaces and run the updater for
>>>each one.
>>>
>>>      
>>>
>>Nah.   I leave that as a namespace-aware cron job problem ;)
>>    
>>
>
>More info please?
>Cloning namespaces?
>
>  
>
I think this 'stimulation', as you called it, should be the responsibility of 
the namespace cloner.  They could fork off their own little daemon that 
will call 'automount update' every so often.

>>Lazy unmounts appear immediately in your system.
>>
>>This may not be the only functionality needed, yes.  I'm sure there are
>>more options required given the circumstances of the kill.  I probably
>>shouldn't have mentioned the lazy unmounting for the forced expiry.
>>
>>I'd be interested to hear more about the different types of
>>(expire/kill) operations that sysadmins prefer.
>>    
>>
>
>Hang on. From the discussion my impression of a lazy mount is that it is
>not actually mounted!
>
>  
>
Lazy _un_mounts as opposed to lazy mounts. Lazy unmounts are described 
in umount(8):

       -l     Lazy unmount. Detach the filesystem from the filesystem
              hierarchy now, and cleanup all references to the filesystem
              as soon as it is not busy anymore.  (Requires kernel 2.4.11
              or later.)

HTH,

-- 
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice
mailto: Michael.Waychison@Sun.COM
http://www.sun.com

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE:  The opinions expressed in this email are held by me, 
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 




* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-08 12:29     ` Olivier Galibert
  2004-01-08 13:20       ` Robin Rosenberg
@ 2004-01-08 16:23       ` Mike Waychison
  1 sibling, 0 replies; 85+ messages in thread
From: Mike Waychison @ 2004-01-08 16:23 UTC (permalink / raw)
  To: Olivier Galibert; +Cc: Kernel Mailing List

Olivier Galibert wrote:
> On Wed, Jan 07, 2004 at 05:55:23PM -0500, Mike Waychison wrote:
> 
>>Yes, an 'ls' actually does an lstat on every file.
> 
> 
> I guess you haven't met the plague called color-ls yet.  Lucky you.
> 
> Most modern file browsers also seem to feel obligated to follow
> symlinks to check whether they're dangling.  A mis-click on "up" when
> you're on your home directory could cause a beautiful mount-storm.

Why would any file browser or even ls feel compelled to 'stat' something 
right after an 'lstat' says it is not a symbolic link though?




* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-08 12:00     ` Ian Kent
  2004-01-08 15:39       ` Mike Waychison
@ 2004-01-08 17:34       ` H. Peter Anvin
  2004-01-08 19:41         ` Mike Waychison
                           ` (2 more replies)
  1 sibling, 3 replies; 85+ messages in thread
From: H. Peter Anvin @ 2004-01-08 17:34 UTC (permalink / raw)
  To: Ian Kent; +Cc: Mike Waychison, autofs mailing list, Kernel Mailing List

Ian Kent wrote:
> 
> If wildcard map entries are not in autofs v3 then Jeremy implemented this
> in v4.
> 

v3 has had wildcard map entries and substitutions for a very, very, very 
long time... it was a v2 feature, in fact.

> And yes the host map is basically a program map and that's all. Worse, as
> pointed out in the paper it mounts everything under it. This is a source
> of stress for mount and umount. I have put in a fair bit of time on ugly
> hacks to work around this. This same problem is also evident in startup
> and shutdown for master maps with a good number of entries (~50 or more).
> A consequence of the current multiple daemon approach.

This is why one wants to implement a mount tree with "direct mount 
pads", which also means keeping some state in the daemon.

For example, let's say one has a mount tree like:

/foo		server1:/export/foo \
/foo/bar	server1:/export/bar \
/bar		server2:/export/bar

... then you actually have four different filesystems involved: first, 
some kind of "scaffolding" (this can be part of the autofs filesystem 
itself or a ramfs) that holds the "foo" and "bar" directories, and then 
foo, foo/bar, and bar.

Consider the following implementation: when one encounters the above, 
the daemon stashes this away as an already-encountered map entry (in 
case the map entries change, we don't want to be inconsistent), creates 
a ramfs for the scaffolding, creates the "foo" and "bar" subdirectories 
and mount-traps "foo" and "bar".  Then it releases userspace.  When it 
encounters an access on "foo", it gets invoked again, looks it up in its 
"partial mounts" state, then mounts "foo" and mount-traps "foo/bar", 
then releases userspace.

In many ways this returns to the simplicity of the autofs v3 design 
where the atomicity constraints were guaranteed by the VFS itself, *as 
long as* mount traps can be atomically destroyed with umounting the 
underlying filesystem.

	-hpa


* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-07 22:55   ` Mike Waychison
                       ` (2 preceding siblings ...)
  2004-01-08 12:35     ` Ian Kent
@ 2004-01-08 18:20     ` Jim Carter
  2004-01-08 21:01       ` H. Peter Anvin
  3 siblings, 1 reply; 85+ messages in thread
From: Jim Carter @ 2004-01-08 18:20 UTC (permalink / raw)
  To: Mike Waychison; +Cc: Kernel Mailing List, autofs mailing list

On Wed, 7 Jan 2004, Mike Waychison wrote:
> Jim Carter wrote:

> >  That's not too bad, since we rely on UNIX file permissions
> >or ACLs for security, not visibility in the automount map.  If an indirect
> >map entry was formerly absent but now present, presumably the userspace
> >helper will consult the then-prevailing automount map and find it
> >successfully.
>
> Yes, but then when the other namespace accesses this entry and attempts
> to mount it and no longer finds it in the map, it is unhashed and not
> enumerated as a cache entry, which is still valid in the first
> namespace.  This cache coherency is a subtle point.  The main point is
> that without super_block cloning, we are left with two namespaces that
> can effectively alter each other's automount policy by remounting the
> filesystem.

So for browsing ("ls" an indirect map's mountpoint without statting each
file), one namespace will see targets not in its version of the map, or the
other namespace will fail to see targets in its map.  Hmm, in the strict
userspace helper model, how does the helper get the file list into the
kernel module's data structures?  Perhaps we need an "inverse stat" ioctl
to pass a stat struct down to the kernel.  Plus another ioctl or a special
variant of mkdir, to populate the kernel's view of an indirect map with
names, but not stat data.  Running a pipe/socket/etc. between the kernel
and userspace is yucky.  By the way, IPSec handles the problem by letting
its userspace daemon create a socket with address family PF_KEY.

(About multimounts:)

> This is pretty much needed no matter how you look at it.   If you set it
> up so that it peeked at the NFS share for /usr/src to get permission
> information, you also have to verify that it contains a directory
> 'linux'.  This doesn't seem like much, but these things can change from
> underneath us.

I don't see that.  What I do see is, if /usr/src/linux is an autofs direct
map, and /usr/src is also a direct (or indirect?) map, then both
/usr/src/linux and /usr/src must have autofs filesystems (local kernel data
structures) mounted on them at all times, whether or not the NFS
filesystems were mounted.  And when /usr/src eventually gets NFS mounted,
the /usr/src/linux autofs FS has to percolate upward, and percolate back
when /usr/src is unmounted.  Or else, after /usr/src is NFS mounted you
need some magic (the multimount mechanism) to install an autofs filesystem
on /usr/src/linux.  The two approaches are very similar, but I think the
difference is that in Sun's implementation you have this special feature
with syntax and logic to support it, whereas as described by me, the man
page would just say "don't worry about autofs mount points located in a
filesystem that isn't mounted yet; we'll take care of it one way or
another."

> For whatever it's worth, some institutions have file servers
> that export hundreds or even thousands of shares over NFS.   As /net is
> really just a kind of executable indirect map that returns multimounts
> for each hostname used as a key,  just doing 'cd /net/hostname' may
> potentially mount hundreds of filesystems.  This is not cool!

Definitely not cool.  But some users (yours truly among them) do "alias ls
'ls -F'", which requires "ls" to stat (and thus mount) every exported
filesystem.  More uncool, and I don't see any non-disgusting way around it.

> >So the helper's umount() will fail.  OK, it failed.  The kernel module
> >should not recognize the mounted dir as being gone, until the module itself
> >has seen that it's gone.  This policy also helps in cases where the sysop
> >manually unmounts an automounted directory for repair purposes.

> But this leads to races which cause partial expiries to occur in autofs4.

But it's a fact of life that some umounts will fail.  Perhaps that's one
reason why I'm dragging my heels so hard about the multimounts: they depend
on being mounted and unmounted as a unit, and that atomicity can't be
guaranteed.  Whereas if the subdir and containing dir are unmounted
> independently, the use counts will ensure that the subdir is unmounted
first, and the containing dir is unmounted (and the subdir's autofs FS
mount is put back in a "storage" state) only after successful unmounting of
the subdir.

Aha, I hear someone snarling, "you can't umount the containing dir if an
autofs FS is mounted on the subdir, and conversely, you can't mount the
subdir autofs FS until after the containing dir is mounted".  So the autofs
private data for the containing dir needs a chain saying "there are
supposed to be autofs subdirs mounted on these subdirs (relative paths or
> "offsets")."  Perhaps we're both talking about the same mechanism for
multimounts, but I'm just resisting some of the extras that go with them,
such as the atomicity and the special syntax.

> >A filesystem is "in use" if anything is mounted on its subdirs.  That
> >precludes premature auto-unmounting of a containing directory, in the case
> >of a multi-mount or jimc's recommended non-implementation thereof.  I don't
> >see that a multi-mount stack needs to expire as a unit -- just let the
> >components expire normally, leaf to root.  It doesn't bother jimc that some
> >members are mounted and some aren't; by the principle of lazy mounting,
> >that's what we're trying to accomplish.

> The thing is that we use autofs filesystems as traps.  Following from
> the previous /usr/src/linux example:

---- snip most of example ----

> Now, assume that nobody is using /usr/src and /usr/src/linux.   The
> first fs to expire is going to be the nfs from hostb on /usr/src/linux
>
> # cat /proc/mounts
> rootfs /
> autofs /usr/src
> hosta:/src /usr/src
> autofs /usr/src/linux
>
> Next, /usr/src should go.  The thing is, we do _not_ want to unmount the
> autofs filesystem at /usr/src/linux before unmounting the nfs filesystem
> at /usr/src because that would open ourselves up to a user coming in and
> doing chdir(/usr/src/linux).  We would catch the traversal because our
> trigger on 'linux' is gone.  We also shouldn't unmount the nfs
> filesystem from hosta now, because somebody is using it.

Solution: do a "move" remount, remounting the NFS filesystem from /usr/src
to /tmp/_garbage/src.  In the instant after that finishes, a wayward user
does "cd /usr/src/linux".  Since only the autofs FS is currently on
/usr/src, it triggers and forks another userspace helper to mount
serverA:/export/src on /usr/src, and it *atomically* mounts an autofs FS
on /usr/src/linux before signalling the caller that /usr/src is ready for
use.  Then when the first userspace helper regains the CPU, all the stuff
on /tmp/_garbage/src would be broken down with no need to worry about race
conditions.
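The "move" remount described above maps onto the kernel's MS_MOVE support
(available since Linux 2.4.18 via "mount --move").  A rough, illustrative
sketch of the sequence -- the /tmp/_garbage path is the convention used
above, and the commands require root:

```sh
# Atomically relocate the live NFS mount out of the automount tree,
# re-exposing the autofs trigger underneath it.
mount --move /usr/src /tmp/_garbage/src

# Later, with no race against the main tree, tear the old mount down;
# fall back to a lazy detach if it is still busy.
umount /tmp/_garbage/src || umount -l /tmp/_garbage/src
```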

Minor detail, applying to both Sun-style multimounts and my ideas: can you
"mount" an autofs FS without statting its mount point?  Probably not.
This means that the kernel has to run the userspace helper twice, once to
mount the containing dir and again to implant the autofs FS on the subdir,
before reporting to the caller that the containing dir is ready.
Alternatively, the helper should infer that the subdir needs an autofs FS
when it's mounting the containing dir (potentially needing to consult every
map file and NIS map in the system to figure that out).  Hmm, am I arguing
in favor of the special syntax of Sun multimounts?

More on /tmp/_garbage: when a server crashes and you aren't sure whether
forced or lazy unmounts will get rid of the mount structures, if you move
the mount into /tmp/_garbage then the main automount tree will still be
functional.  A problem I see from time to time is, serverX is rebooted, the
client has a stale NFS filehandle, and I can't make the broken mount
disappear, hence can't mount that filesystem from the revived serverX.
This is particularly a problem on Solaris 2.6; on Linux I can usually
recover by sufficiently many "umount -f" or "umount -l" or "kill -9".

James F. Carter          Voice 310 825 2897    FAX 310 206 6673
UCLA-Mathnet;  6115 MSA; 405 Hilgard Ave.; Los Angeles, CA, USA  90095-1555
Email: jimc@math.ucla.edu    http://www.math.ucla.edu/~jimc (q.v. for PGP key)

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-08 17:34       ` H. Peter Anvin
@ 2004-01-08 19:41         ` Mike Waychison
  2004-01-08 23:42         ` Michael Clark
  2004-01-09 18:32         ` Ian Kent
  2 siblings, 0 replies; 85+ messages in thread
From: Mike Waychison @ 2004-01-08 19:41 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ian Kent, Mike Waychison, autofs mailing list, Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 2726 bytes --]

H. Peter Anvin wrote:
> Ian Kent wrote:
> 
>>
>> If wildcard map entries are not in autofs v3 then Jeremy implemented this
>> in v4.
>>
> 
> v3 has had wildcard map entries and substitutions for a very, very, very 
> long time... it was a v2 feature, in fact.
> 
>> And yes the host map is basically a program map and that's all. Worse, as
>> pointed out in the paper it mounts everything under it. This is a source
>> of stress for mount and umount. I have put in a fair bit of time on ugly
>> hacks to work around this. This same problem is also evident in startup
>> and shutdown for master maps with a good number of entries (~50 or more).
>> A consequence of the current multiple daemon approach.
> 
> 
> This is why one wants to implement a mount tree with "direct mount 
> pads"; which also means keeping some state in the daemon.
> 
> For example, let's say one has a mount tree like:
> 
> /foo        server1:/export/foo \
> /foo/bar    server1:/export/bar \
> /bar        server2:/export/bar
> 
> ... then you actually have four different filesystems involved: first, 
> some kind of "scaffolding" (this can be part of the autofs filesystem 
> itself or a ramfs) that holds the "foo" and "bar" directories, and then 
> foo, foo/bar, and bar.
> 
> Consider the following implementation: when one encounters the above, 
> the daemon stashes this away as an already-encountered map entry (in 
> case the map entries change, we don't want to be inconsistent), creates 
> a ramfs for the scaffolding, creates the "foo" and "bar" subdirectories 
> and mount-traps "foo" and "bar".  Then it releases userspace.  When it 
> encounters an access on "foo", it gets invoked again, looks it up in its 
> "partial mounts" state, then mounts "foo" and mount-traps "foo/bar", 
> then releases userspace.
> 
> In many ways this returns to the simplicity of the autofs v3 design 
> where the atomicity constraints were guaranteed by the VFS itself, *as 
> long as* mount traps can be atomically destroyed with umounting the 
> underlying filesystem.
> 

Great!

This is exactly what I found when looking into the situation.  However, 
namespaces still break automounting unless you can rid yourself of the 
daemon.  Move events into call_usermodehelper calls in current's 
namespace and maintain what little state you need as a set of tokens.
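The "partial mounts" bookkeeping described in the quoted scheme above can be
sketched in a few lines of illustrative Python.  All names, the map layout
and the action strings are invented for the example; real mount(2) and
mount-trap operations are reduced to returned strings:

```python
# Minimal sketch of the scheme: stash the multimount entry when first
# seen, then on each trap hit mount that one offset and re-trap the
# offsets nested directly below it.
MAP = {
    "/foo": "server1:/export/foo",
    "/foo/bar": "server1:/export/bar",
    "/bar": "server2:/export/bar",
}

class Automounter:
    def __init__(self, entries):
        # Stash the map entry as already-encountered, so later triggers
        # stay consistent even if the live map changes underneath us.
        self.entries = dict(entries)
        self.mounted = set()

    def trigger(self, path):
        """Called when the trap at `path` is hit: mount that offset,
        then install traps for entries nested directly below it."""
        actions = []
        if path in self.entries and path not in self.mounted:
            actions.append("mount %s on %s" % (self.entries[path], path))
            self.mounted.add(path)
        for sub in sorted(self.entries):
            if sub.startswith(path + "/") and sub not in self.mounted:
                actions.append("trap %s" % sub)
        return actions

am = Automounter(MAP)
print(am.trigger("/foo"))   # mounts /foo and lazily re-traps /foo/bar
```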


-- 
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice
mailto: Michael.Waychison@Sun.COM
http://www.sun.com

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE:  The opinions expressed in this email are held by me,
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[-- Attachment #2: Type: application/pgp-signature, Size: 251 bytes --]

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-08 18:20     ` Jim Carter
@ 2004-01-08 21:01       ` H. Peter Anvin
  0 siblings, 0 replies; 85+ messages in thread
From: H. Peter Anvin @ 2004-01-08 21:01 UTC (permalink / raw)
  To: Jim Carter; +Cc: Mike Waychison, autofs mailing list, Kernel Mailing List

Jim Carter wrote:
> 
>>As justification, for what it's worth, some institutions have file servers
>>that export hundreds or even thousands of shares over NFS.   As /net is
>>really just a kind of executable indirect map that returns multimounts
>>for each hostname used as a key,  just doing 'cd /net/hostname' may
>>potentially mount hundreds of filesystems.  This is not cool!
> 
> Definitely not cool.  But some users (yours truly among them) do "alias ls
> 'ls -F'", which requires "ls" to stat (and thus mount) every exported
> filesystem.  More uncool, and I don't see any non-disgusting way around it.
> 

No, it doesn't... this has been covered several times already.  It
requires ls to *lstat* the point; it only does a stat() if the resulting
entry is S_IFLNK.  The same is true for GUI tools.  There is a fairly
easy way to distinguish lstat() from virtually all other filesystem
calls -- it doesn't invoke follow_link.  So the answer is simply to
create an inode which is S_IFDIR but has a follow_link method.  The
follow_link method triggers a mount.  This is called a "pseudo-symlink
directory" or sometimes "ghost directory".
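The lstat()/stat() distinction is easy to observe from userspace.  A small,
illustrative Python check (the file names are invented, and this only
demonstrates the syscall behaviour, not the kernel-side follow_link trap):

```python
import os
import tempfile

# Build a symlink standing in for a trap that only fires on follow.
d = tempfile.mkdtemp()
target = os.path.join(d, "real_dir")
os.mkdir(target)
trap = os.path.join(d, "trap")
os.symlink(target, trap)

# lstat() examines the entry itself and never follows it, so an
# lstat()-based `ls -F` would not invoke a follow_link trigger...
lstat_sees_link = os.path.islink(trap)          # islink() uses lstat()

# ...whereas stat() follows the link, which is where a trap would fire.
stat_follows = os.stat(trap).st_ino == os.stat(target).st_ino

print(lstat_sees_link, stat_follows)
```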

	-hpa


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-08 17:34       ` H. Peter Anvin
  2004-01-08 19:41         ` Mike Waychison
@ 2004-01-08 23:42         ` Michael Clark
  2004-01-09 20:28           ` Mike Waychison
  2004-01-09 18:32         ` Ian Kent
  2 siblings, 1 reply; 85+ messages in thread
From: Michael Clark @ 2004-01-08 23:42 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ian Kent, Mike Waychison, autofs mailing list, Kernel Mailing List

On 01/09/04 01:34, H. Peter Anvin wrote:
> In many ways this returns to the simplicity of the autofs v3 design 
> where the atomicity constraints were guaranteed by the VFS itself, *as 
> long as* mount traps can be atomically destroyed with umounting the 
> underlying filesystem.

Do we need to revive Tigran's forced unmount patch 'badfs', à la FreeBSD's
deadfs?  Although it doesn't guarantee atomic unmount, it could help
a lot with the tendency to get stuck autofs mounts.

   http://tinyurl.com/2hto8

I've been long waiting for this functionality in mainline.

I wonder if binding badfs over the mountpoint at the beginning of the
potentially lengthy unmount process would improve the atomicity
to userspace.  I.e., although the unmount would proceed in the background,
badfs would have been mounted at that point, at the start of the process
- mounts are atomic, no?

~mc


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-08 15:39       ` Mike Waychison
@ 2004-01-09 18:20         ` Ian Kent
  2004-01-09 20:06           ` Mike Waychison
  2004-01-09 20:51           ` Jim Carter
  0 siblings, 2 replies; 85+ messages in thread
From: Ian Kent @ 2004-01-09 18:20 UTC (permalink / raw)
  To: Mike Waychison; +Cc: Jim Carter, autofs mailing list, Kernel Mailing List

On Thu, 8 Jan 2004, Mike Waychison wrote:

> >
> >Mike can you enlighten me with a few words about how namespaces are useful
> >in the design. I have not seen or heard much about them so please be
> >gentle.
> >
> >
>

Think I have enough on namespaces to understand your proposal now. Thanks.

> >What is the form of the trigger talked about? Identifying the automount
> >points in the autofs filesystem has always been hard and error prone.
> >
> >
> >
> I don't understand what you mean by the identifying part.   However, the
> 'trigger' would be the traditional method used in autofsv3/4 for indirect
> maps and probably based on what you already have for doing the browsing
> stuff.
>
> The direct map 'triggers' will be taken care of by another filesystem
> with a magic root directory that will catch traversals using some
> follow_link magic.   I wrote a prototype for this last summer, but
> haven't released it as the userspace stuff completely does not fit in
> with the existing daemon that was out at the time, due to the mess of
> glue that was pgids, pipes and processes.   It worked in the simple
> case, but it didn't extend to being able to direct mount an indirect
> map, nor was it able to do the lazy mounting in multimounts as I had
> desired.

Is this the stuff that Al Viro is working on?

>
> >Please clearify what we are talking about WRT kernel support for
> >automount. Is the plan a new kernel module or are we talking about
> >unspecified 'in VFS' support or both?
> >
> >
> >
> This module will have its own new autofs module (hopefully named
> something other than autofs to avoid confusion/mishaps).  The VFS will
> have native support for expiry.  The VFS will also be slightly extended
> to allow the super_block cloning on namespace clone (although this can
> probably hold off a while, it's more a semantic issue than anything else).

Yep. Got that as well.

>
> >
> >Yes. In 4.1 NIS, LDAP and file maps are browsable for both direct and
> >indirect maps. The browsability, only, requires my kernel patch.
> >The daemon detects the updated modules' presence, and if the option is
> >specified 'ghosts' the directories, mounting them only when accessed.
> >
> >
> >
> What is the difference between Solaris's -browse and your ghosting then?

Well I don't know, nothing really. I was working to the requirement of
providing browsable mount trees. The 'doing it properly' was secondary to
satisfying my spec. Mind there are a number of things I haven't done.
Since I don't have a need for tree-mounts (closest would be multi-mount) I
haven't done anything there. As you say, in v4 they are a mount/umount
everything. Consequently, only the top level leaves are browsable. Indeed, I
haven't solved my requirement of a transparent autofs filesystem, aka the
Solaris automounter. A difficult problem that will require
considerable effort.

>
> >>lstat('dir')
> >>chdir('dir')
> >>lstat('.')
> >>
> >>
> >
> >This suggestion has been made by others several times but doesn't seem
> >to be a problem in practice. In all my testing I have only been able to
> >find one case that doesn't work as needed when ghosted. This is the
> >situation where a home directory in a map exported from a server, is
> >actually not available (eg does not exist) and someone logs into the
> >account using wu-ftpd. In this case wu-ftpd thinks all is ok but of course
> >an error is returned when the directory access is attempted. In fact an
> >error should have been returned at login. Further, I believe this can be
> >solved with as little as an additional revalidate call in sys_stat (I
> >think the problem call was sys_stat?).
> >
> >
> >
> The find(1) issue is fairly recent.   This check was added some time
> within the last two years (?) and only appears in the latest distros.
>
> Another problem was the ACL patches for ls(1) and friends.  I *really*
> think they should be lgetxattr'ing instead of getxattr'ing.  They even
> explicitly check via an lstat _beforehand_ to verify whether the file is
> S_ISLNK, and only then will they getxattr if it isn't.  Why not extend
> it?   I dunno.

Looks like I have more testing to do to get a better feel for the way this
behaves.

>
> >>This is the subtle difference between direct and indirect maps.   The
> >>direct map keys are absolute paths, not path components.  We are
> >>implementing direct mounts as individual filesystems that will trap on
> >>traversal into their base directory.  This filesystem has no idea where
> >>it is located as far as the user is concerned.  We need to tell the
> >>filesystem directly so that the usermode helper can look it up.
> >>Conversely, the indirect map uses the sub-directory name as a mapkey.
> >>
> >>
> >
> >I'm not sure what you are saying here. Does this mean there is a mount for
> >every direct mount (this might be what you call a trigger)?
> >
> >
> >
> Yes, it is its own filesystem (type autofs).  This is needed because we
> need to overlay direct triggers within NFS filesystems for multimounts.

Ahh. I see, you are talking about the cross filesystem problem. I haven't
solved that in what I have done either. Fortunately I still get a good
hit rate in satisfying people's needs, as in practice many people don't use
the full automounter functionality.

>
> Browsing however obviously doesn't need that because we control the
> parent directory.
>
> >AIX implemented automounts by mounting everything in each map. This
> >made the mount listing very ugly.
> >
> >
> >
> ??  Really?  I find that hard to believe.  I thought Solaris shared its
> automounter with HPUX and AIX.  I may be wrong though.

Old versions perhaps. AIX 4.x was the last I used. It was definitely like
that then. 500+ automounts tends to clutter the mount display a bit.

> >Mmm. The vfsmount_lock is available to modules in 2.6. At least it was in
> >test11. I'm sure I compiled the module under 2.6 as well???
> >
> >I thought that, taking the dcache_lock was the correct thing to do when
> >traversing a dentry list?
> >
> >
> >
> Walking dentries still takes the dcache_lock; however, walking vfsmounts
> takes the vfsmount_lock.  dcache_lock is no longer used for fast path
> walking either (to the best of my understanding).
>
> find . -name '*.[ch]' -not -path '*SCCS*' | xargs grep vfsmount_lock |
> grep EXPORT

Strange. How does the module compile I wonder? How does it load without
unresolved symbols? Another little mystery to work on.

>
> shows no results for vfsmount_lock being exported to modules in 2.6.
>
> >
> >The autofs4 moudule blocks (auto) mounts during the umount callback.
> >Surely this is the sensible thing to do.
> >
> >
> >
> The raciness comes from the fact that we now support the lazy-mounting
> of multimount offsets using embedded direct mounts.  Autofs4 mounts all
> (or as much as it can) from the multimount all together, and unmounts it
> all on expiry.

But 4.1 does lazy mount for several maps. Mounts that are triggered
during the umount step of the expire are put on a wait queue along with
the task requesting the umount. I think autofs always worked that way.

> >
> >Hang on. From the discussion my impression of a lazy mount is that it is
> >not actually mounted!
> >
> >
> >
> Lazy _un_mounts as opposed to lazy mounts. Lazy unmounts are described
> in umount(8):

But will this leave the filesystem in a consistent state and allow further
mount activity on that mount point?

Ian



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-08 17:34       ` H. Peter Anvin
  2004-01-08 19:41         ` Mike Waychison
  2004-01-08 23:42         ` Michael Clark
@ 2004-01-09 18:32         ` Ian Kent
  2004-01-09 20:52           ` Mike Waychison
  2 siblings, 1 reply; 85+ messages in thread
From: Ian Kent @ 2004-01-09 18:32 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Mike Waychison, autofs mailing list, Kernel Mailing List

On Thu, 8 Jan 2004, H. Peter Anvin wrote:

> Ian Kent wrote:
> >
> > If wildcard map entries are not in autofs v3 then Jeremy implemented this
> > in v4.
> >
>
> v3 has had wildcard map entries and substitutions for a very, very, very
> long time... it was a v2 feature, in fact.
>
> > And yes the host map is basically a program map and that's all. Worse, as
> > pointed out in the paper it mounts everything under it. This is a source
> > of stress for mount and umount. I have put in a fair bit of time on ugly
> > hacks to work around this. This same problem is also evident in startup
> > and shutdown for master maps with a good number of entries (~50 or more).
> > A consequence of the current multiple daemon approach.
>
> This is why one wants to implement a mount tree with "direct mount
> pads"; which also means keeping some state in the daemon.
>
> For example, let's say one has a mount tree like:
>
> /foo		server1:/export/foo \
> /foo/bar	server1:/export/bar \
> /bar		server2:/export/bar
>
> ... then you actually have four different filesystems involved: first,
> some kind of "scaffolding" (this can be part of the autofs filesystem
> itself or a ramfs) that holds the "foo" and "bar" directories, and then
> foo, foo/bar, and bar.
>
> Consider the following implementation: when one encounters the above,
> the daemon stashes this away as an already-encountered map entry (in
> case the map entries change, we don't want to be inconsistent), creates
> a ramfs for the scaffolding, creates the "foo" and "bar" subdirectories
> and mount-traps "foo" and "bar".  Then it releases userspace.  When it
> encounters an access on "foo", it gets invoked again, looks it up in its
> "partial mounts" state, then mounts "foo" and mount-traps "foo/bar",
> then releases userspace.
>

Umm. The cross filesystem problem again.

This may sound a little silly, but it may be possible using
stackable filesystem methods (à la Zadok et al.). I'm thinking of an
autofs filesystem stacked on a host filesystem, with the dentries
corresponding to mount points marked in some way and the mount occurring
under them, on top of the host filesystem. Yes, I know it sounds ugly, but
maybe it's not. Maybe it's actually quite simple. I can't give an opinion
yet as I'm still thinking it through and haven't done any feasibility work.
However, this approach would lend itself to providing autofs filesystem
transparency, a requirement as yet not discussed.

Ian





^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-09 18:20         ` Ian Kent
@ 2004-01-09 20:06           ` Mike Waychison
  2004-01-10  5:43             ` Ian Kent
  2004-01-09 20:51           ` Jim Carter
  1 sibling, 1 reply; 85+ messages in thread
From: Mike Waychison @ 2004-01-09 20:06 UTC (permalink / raw)
  To: Ian Kent; +Cc: Jim Carter, autofs mailing list, Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 5300 bytes --]

Ian Kent wrote:

>On Thu, 8 Jan 2004, Mike Waychison wrote:
>
>  
>
>>The direct map 'triggers' will be taken care of by another filesystem
>>with a magic root directory that will catch traversals using some
>>follow_link magic.   I wrote a prototype for this last summer, but
>>haven't released it as the userspace stuff completely does not fit in
>>with the existing daemon that was out at the time do the the mess of
>>glue that was pgids, pipes and processes.   It worked in the simple
>>case, but it didn't extend to being able to direct mount an indirect
>>map, nor was it able to do the lazy mounting in multimounts as I had
>>desired.
>>    
>>
>
>Is this the stuff that Al Viro is working on?
>
>  
>
Al is proposing doing the same thing directly in the VFS instead of 
using a magic filesystem as I've described in the document.  

> Indeed, I
>haven't solved my requirement of a transparent autofs filesystem, aka the
>Solaris automounter. A difficult problem that will require
>considerable effort.
>
>  
>
What do you mean by this?  Something that doesn't show up in 
/proc/mounts?  I don't see this as much of an issue.  On any decently 
large machine, there are so many entries anyway that /etc/mtab and 
/proc/mounts become humanly unparseable anyhow.

>>>>This is the subtle difference between direct and indirect maps.   The
>>>>direct map keys are absolute paths, not path components.  We are
>>>>implementing direct mounts as individual filesystems that will trap on
>>>>traversal into their base directory.  This filesystem has no idea where
>>>>it is located as far as the user is concerned.  We need to tell the
>>>>filesystem directly so that the usermode helper can look it up.
>>>>Conversely, the indirect map uses the sub-directory name as a mapkey.
>>>>
>>>>
>>>>        
>>>>
>>>I'm not sure what you are saying here. Does this mean there is a mount for
>>>every direct mount (this might be what you call a trigger)?
>>>
>>>
>>>
>>>      
>>>
>>Yes, it is its own filesystem (type autofs).  This is needed because we
>>need to overlay direct triggers within NFS filesystems for multimounts.
>>    
>>
>
>Ahh. I see, you are talking about the cross filesystem problem. I haven't
>solved that in what I have done either. Fortunately I still get a good
>hit rate in satisfying people's needs, as in practice many people don't use
>the full automounter functionality.
>
>  
>
Yup.  But still, one of the nice things about a full automounter 
solution is accessing a netapp with hundreds of exports through /net in 
a reasonably fast way.

>>??  Really?  I find that hard to believe.  I thought Solaris shared its
>>automounter with HPUX and AIX.  I may be wrong though.
>>    
>>
>
>Old versions perhaps. AIX 4.x was the last I used. It was definitely like
>that then. 500+ automounts tends to clutter the mount display a bit.
>
>  
>
Could be.  Either way, on a system with a thousand NFS shares 
automounted, I'm not really concerned about what the mount table looks 
like.  It won't be human-parseable anyway.

>>>Mmm. The vfsmount_lock is available to modules in 2.6. At least it was in
>>>test11. I'm sure I compiled the module under 2.6 as well???
>>>
>>>I thought that, taking the dcache_lock was the correct thing to do when
>>>traversing a dentry list?
>>>
>>>
>>>
>>>      
>>>
>>Walking dentries still takes the dcache_lock; however, walking vfsmounts
>>takes the vfsmount_lock.  dcache_lock is no longer used for fast path
>>walking either (to the best of my understanding).
>>
>>find . -name '*.[ch]' -not -path '*SCCS*' | xargs grep vfsmount_lock |
>>grep EXPORT
>>    
>>
>
>Strange. How does the module compile I wonder? How does it load without
>unresolved symbols? Another little mystery to work on.
>
>  
>
No, your module doesn't use vfsmount_lock.  At least the module in 
autofs4-2.4-module-20031201.tar.gz doesn't.

>>The raciness comes from the fact that we now support the lazy-mounting
>>of multimount offsets using embedded direct mounts.  Autofs4 mounts all
>>(or as much as it can) from the multimount all together, and unmounts it
>>all on expiry.
>>    
>>
>
>But 4.1 does lazy mount for several maps. Mounts that are triggered
>during the umount step of the expire are put on a wait queue along with
>the task requesting the umount. I think autofs always worked that way.
>
>  
>
This isn't lazy mounting per se.  If you are talking about autofs4's use 
of AUTOFS_INF_EXPIRING, it's there to prevent somebody from walking into 
a multimount while it is expiring. 

>>>Hang on. From the discussion my impression of a lazy mount is that it is
>>>not actually mounted!
>>>
>>>
>>>
>>>      
>>>
>>Lazy _un_mounts as opposed to lazy mounts. Lazy unmounts are described
>>in umount(8):
>>    
>>
>
>But will this leave the filesystem in a consistent state and allow further
>mount activity on that mount point?
>  
>
The underlying autofs filesystem?  Sure.


-- 
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice
mailto: Michael.Waychison@Sun.COM
http://www.sun.com

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE:  The opinions expressed in this email are held by me, 
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 


[-- Attachment #2: Type: application/pgp-signature, Size: 251 bytes --]

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-08 23:42         ` Michael Clark
@ 2004-01-09 20:28           ` Mike Waychison
  2004-01-09 20:54             ` H. Peter Anvin
  0 siblings, 1 reply; 85+ messages in thread
From: Mike Waychison @ 2004-01-09 20:28 UTC (permalink / raw)
  To: Michael Clark
  Cc: H. Peter Anvin, Ian Kent, autofs mailing list, Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 2194 bytes --]

Michael Clark wrote:

> On 01/09/04 01:34, H. Peter Anvin wrote:
>
>> In many ways this returns to the simplicity of the autofs v3 design 
>> where the atomicity constraints were guaranteed by the VFS itself, 
>> *as long as* mount traps can be atomically destroyed with umounting 
>> the underlying filesystem.
>
>
> Do we need to revive Tigran's forced unmount patch 'badfs', à la FreeBSD's
> deadfs?  Although it doesn't guarantee atomic unmount, it could help
> a lot with the tendency to get stuck autofs mounts.
>
>   http://tinyurl.com/2hto8
>
> I've been long waiting for this functionality in mainline.


This is an interesting approach to killing off a mountpoint.  However, 
the problem in question is not the destruction of the mountpoints, but 
rather being able to 
check_activity_of_a_hierarchy_of_mountpoints/unmount_them_together 
atomically.  This cannot be done cleanly in userspace: even when given an 
interface to do the check, someone can race in before userspace 
initiates the unmounts.  The alternative is to have userspace detach the 
hierarchy of mountpoints using the '-l' option to umount(8), but then we 
may still unnecessarily unmount the filesystem while someone is in it. 

I think that both HPA and I agree that this capability is needed in 
order to support lazy mounting of multimounts properly.    The issue 
that remains is *how* to do it. 

>
> I wonder if binding badfs over the mountpoint at the beginning of the
> potentially lengthy unmount process would improve the atomicity
> to userspace.  I.e., although the unmount would proceed in the background,
> badfs would have been mounted at that point, at the start of the process
> - mounts are atomic, no?
>
> ~mc
>
The time required to unmount something is constant if we detach the 
mountpoint using a lazy umount.

-- 
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice
mailto: Michael.Waychison@Sun.COM
http://www.sun.com

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE:  The opinions expressed in this email are held by me, 
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 


[-- Attachment #2: Type: application/pgp-signature, Size: 251 bytes --]

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-09 18:20         ` Ian Kent
  2004-01-09 20:06           ` Mike Waychison
@ 2004-01-09 20:51           ` Jim Carter
  2004-01-10  5:56             ` Ian Kent
  1 sibling, 1 reply; 85+ messages in thread
From: Jim Carter @ 2004-01-09 20:51 UTC (permalink / raw)
  To: Ian Kent; +Cc: Mike Waychison, autofs mailing list, Kernel Mailing List

On Sat, 10 Jan 2004, Ian Kent wrote:
> On Thu, 8 Jan 2004, Mike Waychison wrote:
> > This module will have its own new autofs module (hopefully named
> > something other than autofs to avoid confusion/mishaps).  The VFS will

autofs v3 -> autofs.o
autofs v4 -> autofs4.o
May I suggest autofs5.o?  It should still be named "autofs-something",
after all.

James F. Carter          Voice 310 825 2897    FAX 310 206 6673
UCLA-Mathnet;  6115 MSA; 405 Hilgard Ave.; Los Angeles, CA, USA  90095-1555
Email: jimc@math.ucla.edu    http://www.math.ucla.edu/~jimc (q.v. for PGP key)

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-09 18:32         ` Ian Kent
@ 2004-01-09 20:52           ` Mike Waychison
  2004-01-10  6:05             ` Ian Kent
  0 siblings, 1 reply; 85+ messages in thread
From: Mike Waychison @ 2004-01-09 20:52 UTC (permalink / raw)
  To: Ian Kent; +Cc: H. Peter Anvin, autofs mailing list, Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 3481 bytes --]

Ian Kent wrote:

>On Thu, 8 Jan 2004, H. Peter Anvin wrote:
>
>  
>
>>Ian Kent wrote:
>>    
>>
>>>If wildcard map entries are not in autofs v3 then Jeremy implemented this
>>>in v4.
>>>
>>>      
>>>
>>v3 has had wildcard map entries and substitutions for a very, very, very
>>long time... it was a v2 feature, in fact.
>>
>>    
>>
>>>And yes the host map is basically a program map and that's all. Worse, as
>>>pointed out in the paper it mounts everything under it. This is a source
>>>of stress for mount and umount. I have put in a fair bit of time on ugly
>>>hacks to work around this. This same problem is also evident in startup
>>>and shutdown for master maps with a good number of entries (~50 or more).
>>>A consequence of the current multiple daemon approach.
>>>      
>>>
>>This is why one wants to implement a mount tree with "direct mount
>>pads"; which also means keeping some state in the daemon.
>>
>>For example, let's say one has a mount tree like:
>>
>>/foo		server1:/export/foo \
>>/foo/bar	server1:/export/bar \
>>/bar		server2:/export/bar
>>
>>... then you actually have four different filesystems involved: first,
>>some kind of "scaffolding" (this can be part of the autofs filesystem
>>itself or a ramfs) that holds the "foo" and "bar" directories, and then
>>foo, foo/bar, and bar.
>>
>>Consider the following implementation: when one encounters the above,
>>the daemon stashes this away as an already-encountered map entry (in
>>case the map entries change, we don't want to be inconsistent), creates
>>a ramfs for the scaffolding, creates the "foo" and "bar" subdirectories
>>and mount-traps "foo" and "bar".  Then it releases userspace.  When it
>>encounters an access on "foo", it gets invoked again, looks it up in its
>>"partial mounts" state, then mounts "foo" and mount-traps "foo/bar",
>>then releases userspace.
>>
>>    
>>
>
>Umm. The cross filesystem problem again.
>
>This may sound a little silly, but it may be possible using
>stackable filesystem methods (à la Zadok et al.). I'm thinking of an
>autofs filesystem stacked on a host filesystem, with the dentries
>corresponding to mount points marked in some way and the mount occurring
>under them, on top of the host filesystem. Yes, I know it sounds ugly, but
>maybe it's not. Maybe it's actually quite simple. I can't give an opinion
>yet as I'm still thinking it through and haven't done any feasibility work.
>However, this approach would lend itself to providing autofs filesystem
>transparency, a requirement as yet not discussed.
>
>Ian
>
>  
>
Doing stackable filesystems is still an area of OS research.  It turns 
out to be a very hard problem to solve (if it's possible at all).   
Although there are systems in the wild that appear to work, they are 
usually sub-optimal, because there remain a lot of issues such as 
maintaining coherent caches, as well as just staying coherent given that 
one filesystem may be directly accessible while also accessed through 
another overlaid filesystem.

Not really something you'd want to spend a lot of time on unless you're 
looking for a PhD thesis. ;)

-- 
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice
mailto: Michael.Waychison@Sun.COM
http://www.sun.com

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE:  The opinions expressed in this email are held by me, 
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 


[-- Attachment #2: Type: application/pgp-signature, Size: 251 bytes --]

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-09 20:28           ` Mike Waychison
@ 2004-01-09 20:54             ` H. Peter Anvin
  2004-01-09 21:43               ` Mike Waychison
  0 siblings, 1 reply; 85+ messages in thread
From: H. Peter Anvin @ 2004-01-09 20:54 UTC (permalink / raw)
  To: Mike Waychison
  Cc: Michael Clark, Ian Kent, autofs mailing list, Kernel Mailing List

Mike Waychison wrote:
> 
> This is an interesting approach to killing off a mountpoint.  However,
> the problem in question is not the destruction of the mountpoints, but
> rather being able to
> check_activity_of_a_hierarchy_of_mountpoints/unmount_them_together
> atomically.  This cannot be done cleanly in userspace even when given an
> interface to do the check; someone can race in before userspace
> initiates the unmounts.  The alternative is to have userspace detach the
> hierarchy of mountpoints using the '-l' option to umount(8), but then we
> may still unnecessarily unmount the filesystem while someone is in it.
> I think that both HPA and I agree that this capability is needed in
> order to support lazy mounting of multimounts properly.    The issue
> that remains is *how* to do it.
> 

I would argue even stronger: allowing the administrator to umount
directories manually is a hard requirement.  This means that partial
hierarchies *will* occur.  Thus, relying on the hierarchy being
atomically destructed is inherently broken.

This means that constructing the hierarchy with direct-mount automount
triggers in between the filesystems is mandatory; you get lazy mounting
for free, then -- it's a userspace policy decision whether or not to
release the waiting processes before the hierarchy is complete.

Now, once you recognize that the administrator needs to be able to do
umounts, expiry in userspace becomes quite trivial, since expiry is
inherently probabilistic: it can simply mimic an administrator preening
the trees, and if it fails, stop (or re-mount the submounts, policy
decision.)  Having a simple kernel-assist to avoid needless umount
operations is a good thing if (and only if!) it's cheap, but it doesn't
have to be foolproof.
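That preening loop can be sketched in a few lines. The paths and busy set below are invented, and sorting by path length is a stand-in for real mount-tree depth; a real implementation would discover busyness by attempting the umount (or via the cheap kernel assist):

```python
# Sketch of userspace expiry as tree-preening: attempt umounts on the
# deepest mounts first and stop at the first busy one.  Paths and the
# busy set are invented; path length stands in for real mount depth.

def expire(mounts, busy):
    """Return the paths successfully expired, deepest first; stop at the
    first busy mount (re-mounting its submounts would be a policy choice)."""
    expired = []
    for path in sorted(mounts, key=len, reverse=True):
        if path in busy:
            break
        expired.append(path)
    return expired

mounts = ["/home/mikew", "/home/mikew/src", "/home/mikew/src/linux"]
# /home/mikew/src is in use, so preening stops after its child goes away;
# /home/mikew stays mounted, which is also correct since a child remains.
assert expire(mounts, busy={"/home/mikew/src"}) == ["/home/mikew/src/linux"]
```

Failing partway through leaves a partial hierarchy, which is fine precisely because partial hierarchies must already be handled (the administrator can create them by hand).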

Again, the atomicity constraint that umounting a filesystem needs to
destroy the mount traps above it derives from the need to cleanly deal
with nonatomic destruction.

>
> The time required to unmount something is constant if we detach the
> mountpoint using a lazy umount.
> 

You probably don't want to do that -- you could end up with some really
odd timing-related bugs if you then re-mount the filesystem.  It's also
unnecessary, since expiry is not a triggered event and therefore doesn't
keep anything that needs to happen from happening.

	-hpa


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-09 20:54             ` H. Peter Anvin
@ 2004-01-09 21:43               ` Mike Waychison
  0 siblings, 0 replies; 85+ messages in thread
From: Mike Waychison @ 2004-01-09 21:43 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Michael Clark, Ian Kent, autofs mailing list, Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 3876 bytes --]

H. Peter Anvin wrote:

>Mike Waychison wrote:
>  
>
>>This is an interesting approach to killing off a mountpoint.  However,
>>the problem in question is not the destruction of the mountpoints, but
>>rather being able to
>>check_activity_of_a_hierarchy_of_mountpoints/unmount_them_together
>>atomically.  This cannot be done cleanly in userspace even when given an
>>interface to do the check; someone can race in before userspace
>>initiates the unmounts.  The alternative is to have userspace detach the
>>hierarchy of mountpoints using the '-l' option to umount(8), but then we
>>may still unnecessarily unmount the filesystem while someone is in it.
>>I think that both HPA and I agree that this capability is needed in
>>order to support lazy mounting of multimounts properly.    The issue
>>that remains is *how* to do it.
>>
>>    
>>
>
>I would argue even stronger: allowing the administrator to umount
>directories manually is a hard requirement.  This means that partial
>hierarchies *will* occur.  Thus, relying on the hierarchy being
>atomically destructed is inherently broken.
>  
>
Yes, but they shouldn't occur due to normal operation of the system.  
Yes, the administrator can manually prune things away, yet the remaining 
bits should still be able to expire atomically.

On the other end of the spectrum is the situation where if I had 
accessed my homedir, /home/mikew, and then I manually mounted something 
in /home/mikew/mnt as root in another window, /home/mikew should _not_ 
expire.  /home/mikew/mnt is not managed by the automounter, so it 
shouldn't be expired by it either.

>This means that constructing the hierarchy with direct-mount automount
>triggers in between the filesystems is mandatory; you get lazy mounting
>for free, then -- it's a userspace policy decision whether or not to
>release the waiting processes before the hierarchy is complete.
>
>  
>
Yes, and this policy in my proposal is handled by the automount 
useragent.  The system is constructed such that any waiting processes 
are released when the useragent dies off.  If userspace wanted to let 
people in before it finished construction, it would fork and exit in the 
parent process.

>Now, once you recognize that the administrator needs to be able to do
>umounts, expiry in userspace becomes quite trivial, since expiry is
>inherently probabilistic: it can simply mimic an administrator preening
>the trees, and if it fails, stop (or re-mount the submounts, policy
>decision.)  Having a simple kernel-assist to avoid needless umount
>operations is a good thing if (and only if!) it's cheap, but it doesn't
>have to be foolproof.
>
>  
>
But it doesn't work as a daemon when you have namespaces created left 
and right.  It *would maybe* work as a cron job, if cron were 
namespace-aware.

>Again, the atomicity constraint that umounting a filesystem needs to
>destroy the mount traps above it derives from the need to cleanly deal
>with nonatomic destruction.
>
>  
>
??

>>The time required to unmount something is constant if we detach the
>>mountpoint using a lazy umount.
>>
>>    
>>
>
>You probably don't want to do that -- you could end up with some really
>odd timing-related bugs if you then re-mount the filesystem.  It's also
>unnecessary, since expiry is not a triggered event and therefore doesn't
>keep anything that needs to happen from happening.
>
>  
>
Off the top of my head, I don't see any issues, but you are right that 
something may creep up.

-- 
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice
mailto: Michael.Waychison@Sun.COM
http://www.sun.com

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE:  The opinions expressed in this email are held by me, 
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 


[-- Attachment #2: Type: application/pgp-signature, Size: 251 bytes --]

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-09 20:06           ` Mike Waychison
@ 2004-01-10  5:43             ` Ian Kent
  2004-01-12 13:07               ` Mike Waychison
  0 siblings, 1 reply; 85+ messages in thread
From: Ian Kent @ 2004-01-10  5:43 UTC (permalink / raw)
  To: Mike Waychison; +Cc: Jim Carter, autofs mailing list, Kernel Mailing List

On Fri, 9 Jan 2004, Mike Waychison wrote:

>
> > Indeed, I
> >haven't solved my requirement of a transparent autofs filesystem aka.
> >Solaris automounter again. A difficult problem that will require
> >considerable effort.
> >
> >
> >
> What do you mean by this?  Something that doesn't show up in
> /proc/mounts?  I don't see this as much of an issue..  On any decently
> large machine, there are so many entries anyway that /etc/mtab and
> /proc/mounts become humanly unparseable anyhow.

Transparency of an autofs filesystem (as I'm calling it) is the situation
where, given a map

/usr	/man1	server:/usr/man1
	/man2	server:/usr/man2

where the filesystem /usr contains, say, a directory lib that needs to be
available while also seeing the automounted directories.
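One toy way to state the requirement (both listings below are invented): the visible contents of /usr must be the union of what is really on the underlying filesystem and the offsets from the map:

```python
# Toy statement of the transparency requirement: the directory listing
# of /usr must show both the real entries and the map's offsets.
# Both sets below are invented for illustration.

underlying = {"lib", "bin", "share"}   # really present on /usr
map_offsets = {"man1", "man2"}         # offsets from the autofs map

def transparent_listing(host_entries, offsets):
    # A transparent autofs presents both sets in one directory listing.
    return sorted(host_entries | offsets)

assert transparent_listing(underlying, map_offsets) == \
    ["bin", "lib", "man1", "man2", "share"]
```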

>
> >>>Mmm. The vfsmount_lock is available to modules in 2.6. At least it was in
> >>>test11. I'm sure I compiled the module under 2.6 as well???
> >>>
> >>>I thought that, taking the dcache_lock was the correct thing to do when
> >>>traversing a dentry list?
> >>>
> >>>
> >>>
> >>>
> >>>
> >>Walking dentrys still takes the dcache_lock, however walking vfsmounts
> >>takes the vfsmount_lock.  dcache_lock is no longer used for fast path
> >>walking either (to the best of my understanding).
> >>
> >>find . -name '*.[ch]' -not -path '*SCCS*' | xargs grep vfsmount_lock |
> >>grep EXPORT
> >>
> >>
> >
> >Strange. How does the module compile I wonder? How does it load without
> >unresolved symbols? Another little mystery to work on.
> >
> >
> >
> No, you're module doesn't use vfsmount_lock.  At least the module in
> autofs4-2.4-module-20031201.tar.gz doesn't.

This is the 2.4 code. I do (or thought I was able to) use the vfsmount_lock
in the 2.6 patches I have in
kernel.org/pub/linux/kernel/people/raven/autofs4-2.6. This is bad for me.

>
> >>The raciness comes from the fact that we now support the lazy-mounting
> >>of multimount offsets using embedded direct mounts.  Autofs4 mounts all
> >>(or as much as it can) from the multimount all together, and unmounts it
> >>all on expiry.
> >>
> >>
> >
> >But 4.1 does lazy mount for several maps. Mounts that are triggered
> >during the umount step of the expire are put on a wait queue along with
> >the task requesting the umount. I think autofs always worked that way.
> >
> >
> >
> This isn't lazy mounting per se.  If you are talking about autofs4's use
> of AUTOFS_INF_EXPIRING, it's there to prevent somebody from walking into
> a multimount while it is expiring.

Or any umount when sending the expire request to userspace.




^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-09 20:51           ` Jim Carter
@ 2004-01-10  5:56             ` Ian Kent
  0 siblings, 0 replies; 85+ messages in thread
From: Ian Kent @ 2004-01-10  5:56 UTC (permalink / raw)
  To: Jim Carter; +Cc: Mike Waychison, autofs mailing list, Kernel Mailing List

On Fri, 9 Jan 2004, Jim Carter wrote:

> On Sat, 10 Jan 2004, Ian Kent wrote:
> > On Thu, 8 Jan 2004, Mike Waychison wrote:
> > > This module will have its own new autofs module (hopefully named
> > > something other than autofs to avoid confusion/mishaps).  The VFS will
>
> autofs v3 -> autofs.o
> autofs v4 -> autofs4.o
> May I suggest autofs5.o?  It should still be named "autofs-something",
> after all.
>

Nope. I will continue to develop under the v4 banner. As far as I'm
concerned Peter Anvin has claimed v5 and I don't want to challenge that.
Mike Waychison's initiative may possibly be called v6???

In any case the module works fine with v3 and v4 (I haven't tested
4.0.0pre10 for a while though). The 4.1 daemon detects the enhanced module
if present. It is currently dubbed 4.04. The 'plays well with others' is a
self-imposed design requirement.

Ian



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-09 20:52           ` Mike Waychison
@ 2004-01-10  6:05             ` Ian Kent
  0 siblings, 0 replies; 85+ messages in thread
From: Ian Kent @ 2004-01-10  6:05 UTC (permalink / raw)
  To: Mike Waychison; +Cc: H. Peter Anvin, autofs mailing list, Kernel Mailing List

On Fri, 9 Jan 2004, Mike Waychison wrote:

> >
> >This may sound a little silly but it may be able to be done using
> >stackable filesystem methods (aka. Zadok et. al.). I'm thinking of an
> >autofs filesystem stacked on a host filesystem. The dentrys corresponding
> >to mount points marked in some way and the mount occuring under it, on top
> >of the host filesystem. Yes I know it sounds ugly but maybe it's not.
> >Maybe it's actually quite simple. I can't give an opinion yet as I'm still
> >thinking it through and haven't done any feasibility. However, this
> >approach would lend itself to providing autofs filesystem transparency. A
> >requirement as yet not discussed.
> >
> >Ian
> >
> >
> >
> Doing stackable filesystems is still an area of OS research.  It turns
> out to be a very hard problem to solve (if it's possible at all).
> Although there are systems in the wild that appear to work, they are
> usually sub-optimal because there remain a lot of issues, such as
> maintaining coherent caches, as well as just staying coherent given that
> one filesystem may be directly accessible while also accessed through
> another overlaid filesystem.

Yes I see that in what I've read.

But I'm thinking of a very tightly controlled autofs layer controlled
only by automount. Once owned by automount, that part of the underlying fs
could only be accessed via automount. The boundary cases obviously are
a sensitive area.

>
> Not really something you'd want to waste a lot of time on unless you're
> looking for a PhD thesis. ;)

A masters one day might be good.

Ian




^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-10  5:43             ` Ian Kent
@ 2004-01-12 13:07               ` Mike Waychison
  2004-01-12 16:01                 ` raven
  0 siblings, 1 reply; 85+ messages in thread
From: Mike Waychison @ 2004-01-12 13:07 UTC (permalink / raw)
  To: Ian Kent; +Cc: Jim Carter, autofs mailing list, Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 1398 bytes --]

Ian Kent wrote:

>On Fri, 9 Jan 2004, Mike Waychison wrote:
>
>  
>
>>>Indeed, I
>>>haven't solved my requirement of a transparent autofs filesystem aka.
>>>Solaris automounter again. A difficult problem that will require
>>>considerable effort.
>>>
>>>
>>>
>>>      
>>>
>>What do you mean by this?  Something that doesn't show up in
>>/proc/mounts?  I don't see this as much of an issue..  On any decently
>>large machine, there are so many entries anyway that /etc/mtab and
>>/proc/mounts become humanly unparseable anyhow.
>>    
>>
>
>Transparency of an autofs filesystem (as I'm calling it) is the situation
>where, given a map
>
>/usr	/man1	server:/usr/man1
>	/man2	server:/usr/man2
>
>where the filesystem /usr contains, say a directory lib, that needs to be
>available while also seeing the automounted directories.
>
>  
>
I see.  This requires direct mount triggers to do properly.  Trying to 
do it with some sort of passthrough to the underlying filesystem is a 
nightmare waiting to happen.

-- 
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice
mailto: Michael.Waychison@Sun.COM
http://www.sun.com

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE:  The opinions expressed in this email are held by me, 
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 


[-- Attachment #2: Type: application/pgp-signature, Size: 251 bytes --]

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-12 13:07               ` Mike Waychison
@ 2004-01-12 16:01                 ` raven
  2004-01-12 16:26                   ` Mike Waychison
                                     ` (2 more replies)
  0 siblings, 3 replies; 85+ messages in thread
From: raven @ 2004-01-12 16:01 UTC (permalink / raw)
  To: Mike Waychison; +Cc: Jim Carter, autofs mailing list, Kernel Mailing List

On Mon, 12 Jan 2004, Mike Waychison wrote:

> >
> >Transparency of an autofs filesystem (as I'm calling it) is the situation
> >where, given a map
> >
> >/usr	/man1	server:/usr/man1
> >	/man2	server:/usr/man2
> >
> >where the filesystem /usr contains, say a directory lib, that needs to be
> >available while also seeing the automounted directories.
> >
> >  
> >
> I see.  This requires direct mount triggers to do properly.  Trying to 
> do it with some sort of passthrough to the underlying filesystem is a 
> nightmare waiting to happen..
> 

So what are we saying here?

We install triggers at /usr/man1 and /usr/man2.
Then suppose the map had a nobrowse option.
Does the trigger also take care of hiding man1 and man2?

Is there some definition of these triggers?

Ian


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-12 16:01                 ` raven
@ 2004-01-12 16:26                   ` Mike Waychison
  2004-01-12 22:50                     ` Tim Hockin
  2004-01-12 16:28                   ` raven
  2004-01-13 18:46                   ` Mike Waychison
  2 siblings, 1 reply; 85+ messages in thread
From: Mike Waychison @ 2004-01-12 16:26 UTC (permalink / raw)
  To: raven; +Cc: Jim Carter, autofs mailing list, Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 1876 bytes --]

raven@themaw.net wrote:

>On Mon, 12 Jan 2004, Mike Waychison wrote:
>
>  
>
>>>Transparency of an autofs filesystem (as I'm calling it) is the situation
>>>where, given a map
>>>
>>>/usr	/man1	server:/usr/man1
>>>	/man2	server:/usr/man2
>>>
>>>where the filesystem /usr contains, say a directory lib, that needs to be
>>>available while also seeing the automounted directories.
>>>
>>> 
>>>
>>>      
>>>
>>I see.  This requires direct mount triggers to do properly.  Trying to 
>>do it with some sort of passthrough to the underlying filesystem is a 
>>nightmare waiting to happen..
>>
>>    
>>
>
>So what are we saying here?
>
>We install triggers at /usr/man1 and /usr/man2.
>Then suppose the map had a nobrowse option.
>Does the trigger also take care of hiding man1 and man2?
>
>Is there some definition of these triggers?
>  
>
The example above is a direct map entry with no root offset.  The 
semantics are different from those of an indirect map with browsing 
enabled.

I tested this out against other automount implementations and discovered 
that direct map entries with no root offsets should be broken down into 
several direct map entries with root offsets, so:

/usr   /man1   server:/usr/man1   \
          /man2   server:/usr/man2

is the same as the two distinct entries:

/usr/man1   server:/usr/man1
/usr/man2   server:/usr/man2

Now that I think about it, the discussion in my proposal paper about 
multimounts with no root offsets probably isn't required.
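The decomposition above can be sketched mechanically; decompose() and the sample entry below are illustrative only:

```python
# Mechanical sketch of the decomposition above: a direct multimount with
# no root ('/') offset is equivalent to one direct entry per offset.
# decompose() and the sample entry are illustrative only.

def decompose(key, offsets):
    """key: the direct map key (e.g. '/usr');
    offsets: {offset: location}, with no root offset present."""
    return {key.rstrip("/") + off: loc for off, loc in offsets.items()}

entry = {"/man1": "server:/usr/man1", "/man2": "server:/usr/man2"}
assert decompose("/usr", entry) == {
    "/usr/man1": "server:/usr/man1",
    "/usr/man2": "server:/usr/man2",
}
```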

-- 
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice
mailto: Michael.Waychison@Sun.COM
http://www.sun.com

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE:  The opinions expressed in this email are held by me, 
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 


[-- Attachment #2: Type: application/pgp-signature, Size: 251 bytes --]

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-12 16:01                 ` raven
  2004-01-12 16:26                   ` Mike Waychison
@ 2004-01-12 16:28                   ` raven
  2004-01-12 16:58                     ` Mike Waychison
  2004-01-13 18:46                   ` Mike Waychison
  2 siblings, 1 reply; 85+ messages in thread
From: raven @ 2004-01-12 16:28 UTC (permalink / raw)
  To: Mike Waychison; +Cc: Jim Carter, autofs mailing list, Kernel Mailing List

On Tue, 13 Jan 2004 raven@themaw.net wrote:

> On Mon, 12 Jan 2004, Mike Waychison wrote:
> 
> > >
> > >Transparency of an autofs filesystem (as I'm calling it) is the situation
> > >where, given a map
> > >
> > >/usr	/man1	server:/usr/man1
> > >	/man2	server:/usr/man2
> > >
> > >where the filesystem /usr contains, say a directory lib, that needs to be
> > >available while also seeing the automounted directories.
> > >
> > >  
> > >
> > I see.  This requires direct mount triggers to do properly.  Trying to 
> > do it with some sort of passthrough to the underlying filesystem is a 
> > nightmare waiting to happen..
> > 
> 
> So what are we saying here?
> 
> We install triggers at /usr/man1 and /usr/man2.
> Then suppose the map had a nobrowse option.
> Does the trigger also take care of hiding man1 and man2?
> 
> Is there some definition of these triggers?
> 

And I have another question concerning namespaces.

Given that there may be several namespaces, each of which may or may not 
have a trigger on this dentry, is there some sort of list of triggers?

How do the triggers know who owns them?

Ian


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-12 16:28                   ` raven
@ 2004-01-12 16:58                     ` Mike Waychison
  2004-01-13  1:54                       ` Ian Kent
  0 siblings, 1 reply; 85+ messages in thread
From: Mike Waychison @ 2004-01-12 16:58 UTC (permalink / raw)
  To: raven; +Cc: Jim Carter, autofs mailing list, Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 2683 bytes --]

raven@themaw.net wrote:

>On Tue, 13 Jan 2004 raven@themaw.net wrote:
>
>  
>
>>On Mon, 12 Jan 2004, Mike Waychison wrote:
>>
>>    
>>
>>>>Transparency of an autofs filesystem (as I'm calling it) is the situation
>>>>where, given a map
>>>>
>>>>/usr	/man1	server:/usr/man1
>>>>	/man2	server:/usr/man2
>>>>
>>>>where the filesystem /usr contains, say a directory lib, that needs to be
>>>>available while also seeing the automounted directories.
>>>>
>>>> 
>>>>
>>>>        
>>>>
>>>I see.  This requires direct mount triggers to do properly.  Trying to 
>>>do it with some sort of passthrough to the underlying filesystem is a 
>>>nightmare waiting to happen..
>>>
>>>      
>>>
>>So what are we saying here?
>>
>>We install triggers at /usr/man1 and /usr/man2.
>>Then suppose the map had a nobrowse option.
>>Does the trigger also take care of hiding man1 and man2?
>>
>>Is there some definition of these triggers?
>>
>>    
>>
>
>And I have another question concerning namespaces.
>
>Given that there may be several namespaces, each of which may or may not 
>have a trigger on this dentry, is there some sort of list of triggers?
>
>How do the triggers know who owns them?
>
>
>  
>
This is the reason I went with using distinct filesystems to perform the 
triggers.  If we use follow_link logic, we will have a reference to the 
respective vfsmount.  Dentries themselves know nothing about the 
triggers, as the triggers just look like a mounted filesystem.  The 
vfsmount carries enough information for autofs to call a userspace 
agent through hotplug and have userspace handle the mount.  In effect, 
there is no daemon, so nobody 'owns' a trigger in the same sense as 
with autofs3/4.

As far as userspace is concerned, an autofs filesystem is mounted as is 
any other filesystem.  All that is required is a proper set of mount 
options.  For example, mounting auto_home on /home is:

mount -t autofs -o maptype=indirect,mapname=auto_home auto_home /home

Whenever somebody traverses into a subdir of /home within any namespace 
into which this autofs filesystem has been inherited, userspace is invoked 
(in that namespace) to perform the mount.  Again, there is no 'ownership' 
other than maybe calling the namespace it resides in the 'owner', as you 
would for any other mountpoint.

-- 
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice
mailto: Michael.Waychison@Sun.COM
http://www.sun.com

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE:  The opinions expressed in this email are held by me, 
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 


[-- Attachment #2: Type: application/pgp-signature, Size: 251 bytes --]

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-12 16:26                   ` Mike Waychison
@ 2004-01-12 22:50                     ` Tim Hockin
  2004-01-12 23:28                       ` Mike Waychison
  2004-01-13  1:30                       ` Ian Kent
  0 siblings, 2 replies; 85+ messages in thread
From: Tim Hockin @ 2004-01-12 22:50 UTC (permalink / raw)
  To: Mike Waychison
  Cc: raven, Jim Carter, autofs mailing list, Kernel Mailing List

On Mon, Jan 12, 2004 at 11:26:30AM -0500, Mike Waychison wrote:
> /usr   /man1   server:/usr/man1   \
>          /man2   server:/usr/man2
> 
> is the same as the two distinct entries:
> 
> /usr/man1   server:/usr/man1
> /usr/man2   server:/usr/man2
> 
> Now that I think about it, the discussion in my proposal paper about 
> multimounts with no root offsets probably isn't required.

The latter requires /usr/man1 and /usr/man2 to exist.  The former only
requires /usr to exist, right?


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-12 22:50                     ` Tim Hockin
@ 2004-01-12 23:28                       ` Mike Waychison
  2004-01-13  1:30                       ` Ian Kent
  1 sibling, 0 replies; 85+ messages in thread
From: Mike Waychison @ 2004-01-12 23:28 UTC (permalink / raw)
  To: Tim Hockin; +Cc: raven, Jim Carter, autofs mailing list, Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 1008 bytes --]

Tim Hockin wrote:

>On Mon, Jan 12, 2004 at 11:26:30AM -0500, Mike Waychison wrote:
>  
>
>>/usr   /man1   server:/usr/man1   \
>>         /man2   server:/usr/man2
>>
>>is the same as the two distinct entries:
>>
>>/usr/man1   server:/usr/man1
>>/usr/man2   server:/usr/man2
>>
>>Now that I think about it, the discussion in my proposal paper about 
>>multimounts with no root offsets probably isn't required.
>>    
>>
>
>The latter requires /usr/man1 and /usr/man2 to exist.  The former only
>requires /usr to exist, right?
>
>  
>
Traditionally, the automount system is allowed to create directories as 
needed.


-- 
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice
mailto: Michael.Waychison@Sun.COM
http://www.sun.com

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE:  The opinions expressed in this email are held by me, 
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 


[-- Attachment #2: Type: application/pgp-signature, Size: 251 bytes --]

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-12 22:50                     ` Tim Hockin
  2004-01-12 23:28                       ` Mike Waychison
@ 2004-01-13  1:30                       ` Ian Kent
  1 sibling, 0 replies; 85+ messages in thread
From: Ian Kent @ 2004-01-13  1:30 UTC (permalink / raw)
  To: Tim Hockin
  Cc: Mike Waychison, Jim Carter, autofs mailing list, Kernel Mailing List

On Mon, 12 Jan 2004, Tim Hockin wrote:

> On Mon, Jan 12, 2004 at 11:26:30AM -0500, Mike Waychison wrote:
> > /usr   /man1   server:/usr/man1   \
> >          /man2   server:/usr/man2
> >
> > is the same as the two distinct entries:
> >
> > /usr/man1   server:/usr/man1
> > /usr/man2   server:/usr/man2
> >
> > Now that I think about it, the discussion in my proposal paper about
> > multimounts with no root offsets probably isn't required.
>
> The latter requires /usr/man1 and /usr/man2 to exist.  The former only
> requires /usr to exist, right?
>

That's one possibility, but man1 and man2 could simply not call filler in
the readdir call.

Ian



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-12 16:58                     ` Mike Waychison
@ 2004-01-13  1:54                       ` Ian Kent
  2004-01-13 19:01                         ` Mike Waychison
  0 siblings, 1 reply; 85+ messages in thread
From: Ian Kent @ 2004-01-13  1:54 UTC (permalink / raw)
  To: Mike Waychison; +Cc: Jim Carter, autofs mailing list, Kernel Mailing List

On Mon, 12 Jan 2004, Mike Waychison wrote:

> >
> >And I have another question concerning namespaces.
> >
> >Given that there may be several namespaces, each of which may or may not
> >have a trigger on this dentry, is there some sort of list of triggers?
> >
> >How do the triggers know who owns them?
> >
> >
> >
> >
> This is the reason I went with using distinct filesystems to perform the
> triggers.  If we use follow_link logic, we will have a reference to the
> respective vfsmount.  Dentries themselves know nothing about the
> triggers, as the triggers just look like a mounted filesystem.   The
> vfsmount information has enough information for autofs to call a
> userspace agent through hotplug and have userspace handle the mount.  In
> effect, there is no daemon so nobody 'owns' a trigger in the same sense
> as with autofs3/4.

I'm not familiar with the follow_link mechanism (no prob. I'll pick it up
as I go).

Correct me if I'm wrong, but the only thing that I can see that is
duplicated in cloning a namespace is the root dentry. The rest of the
dentries on the system remain the same. The increase in complexity to the
VFS to change this would be prohibitive.

I see we want the triggers in the vfsmount struct. Is this a good idea?
The vfsmount struct has always been difficult for me to get hold of during
lookup and revalidate (would someone like to help here?).

>
> As far as userspace is concerned, an autofs filesystem is mounted as is
> any other filesystem.  All that is required is a proper set of mount
> options.  For example, mounting auto_home on /home is:
>
> mount -t autofs -o maptype=indirect,mapname=auto_home auto_home /home
>
> Whenever somebody traverses into a subdir in /home within any namespace
> this autofs filesystem has been inherited, userspace is invoked (in that
> namespace) to perform the mount.  Again, there is no 'ownership' other
> than maybe calling the namespace it resides it the 'owner', as you would
> for any other mountpoint.

The "mount all automount entries" approach has always been the simpler
option but, as you know, the kernel still allows only 255 anonymous
mounts. This would have to be the first order of business. Ohh, I was
supposed to be working on a sysctl interface for that. I'll just be quiet
now.

Also, something needs to be done about mount table noise. Several hundred
entries is very bad from an administration viewpoint.

Except for the cross-namespace issues, which I'm still digesting, I can't
see why your design can't be done entirely as a filesystem using dentries
instead of vfsmounts, including expiry. Perhaps you could reiterate a few
of the reasons for this.

Ian



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-12 16:01                 ` raven
  2004-01-12 16:26                   ` Mike Waychison
  2004-01-12 16:28                   ` raven
@ 2004-01-13 18:46                   ` Mike Waychison
  2 siblings, 0 replies; 85+ messages in thread
From: Mike Waychison @ 2004-01-13 18:46 UTC (permalink / raw)
  To: raven; +Cc: autofs mailing list, Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 2519 bytes --]

raven@themaw.net wrote:

>On Mon, 12 Jan 2004, Mike Waychison wrote:
>
>  
>
>>>Transparency of an autofs filesystem (as I'm calling it) is the situation
>>>where, given a map
>>>
>>>/usr	/man1	server:/usr/man1
>>>	/man2	server:/usr/man2
>>>
>>>where the filesystem /usr contains, say a directory lib, that needs to be
>>>available while also seeing the automounted directories.
>>>
>>> 
>>>
>>>      
>>>
>>I see.  This requires direct mount triggers to do properly.  Trying to 
>>do it with some sort of passthrough to the underlying filesystem is a 
>>nightmare waiting to happen..
>>
>>    
>>
>
>So what are we saying here?
>
>We install triggers at /usr/man1 and /usr/man2.
>Then suppose the map had a nobrowse option.
>  
>
This is a direct map.  The browse / nobrowse options do not apply to 
direct maps.

>Does the trigger also take care of hiding man1 and man2?
>
>  
>
No.  man1 and man2 appear as directories to anyone doing an lstat on 
them.  Traversing *into* them will cause filesystems to be mounted on 
them.  This appears to be similar to browsing of an indirect map at 
first, however it is a different beast.  With indirect maps, we are 
given the right to cover up /usr to help us detect stats and traversals 
into its sub-directories.  With direct entries, we don't have that 
luxury.  Everything in /usr must be accessible at all times.

Your need for 'transparency' comes from the fact that you convert direct 
maps into indirect maps, which require the covering of /usr.
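To make the distinction concrete, the two map styles look roughly like this
in Solaris-style syntax (the server names and paths here are only
illustrative):

    # auto_master: a direct map hangs off the special /- mount point,
    # while an indirect map such as auto_home owns all of /home.
    /-        auto_direct
    /home     auto_home

    # auto_direct: keys are absolute paths; a trigger sits on each entry
    # and the rest of /usr remains accessible at all times.
    /usr/man1    server:/usr/man1
    /usr/man2    server:/usr/man2

    # auto_home: keys are relative to /home, which the automounter covers.
    mike         server:/export/home/mike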

>Is there some definition of these triggers?
>
>  
>
This question is up in the air. 

I propose using a magic filesystem, whose root dentry has a follow_link 
callback defined.  When somebody walks into the filesystem, the 
follow_link is called, which does the mount onto a different dentry, and 
then forwards the original caller to the new vfsmount/dentry pair. 

HPA and Viro believe this is better done in the VFS layer directly by 
using special vfsmounts without super_blocks.  The path walking code 
would be modified to know of these 'traps' or 'triggers' natively.
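As a rough sketch, the follow_link variant might look something like this in
2.6-era pseudocode (the helper names below are invented for illustration and
are not real kernel interfaces):

    /* Root inode of the magic 'trigger' filesystem.  When the path walk
     * steps onto it, follow_link fires: we upcall to userspace (e.g. via
     * hotplug) to perform the real mount, then forward the walker onto
     * the newly mounted vfsmount/dentry pair. */
    static int trigger_follow_link(struct dentry *dentry, struct nameidata *nd)
    {
            /* ask userspace to mount the real filesystem here */
            upcall_mount_request(dentry, nd);

            /* step onto the fresh mount so the walk continues there */
            forward_to_new_vfsmount(nd);
            return 0;
    }

    static struct inode_operations trigger_root_iops = {
            .follow_link = trigger_follow_link,
    };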

Which solution is best is left as an exercise.

-- 
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice
mailto: Michael.Waychison@Sun.COM
http://www.sun.com

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE:  The opinions expressed in this email are held by me, 
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 


[-- Attachment #2: Type: application/pgp-signature, Size: 251 bytes --]

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-13  1:54                       ` Ian Kent
@ 2004-01-13 19:01                         ` Mike Waychison
  2004-01-14 15:58                           ` raven
  0 siblings, 1 reply; 85+ messages in thread
From: Mike Waychison @ 2004-01-13 19:01 UTC (permalink / raw)
  To: Ian Kent; +Cc: autofs mailing list, Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 3103 bytes --]

Ian Kent wrote:

>On Mon, 12 Jan 2004, Mike Waychison wrote:
>
>  
>
>>>And I have another question concerning namespaces.
>>>
>>>Given that there may be several namespaces, each of which may or may not
>>>have a trigger on this dentry, is there some sort of list of triggers?
>>>
>>>How do the triggers know who owns them?
>>>
>>>
>>>
>>>
>>>      
>>>
>>This is the reason I went with using distinct filesystems to perform the
>>triggers.  If we use follow_link logic, we will have a reference to the
>>respective vfsmount.  Dentry's themselves know nothing about the
>>triggers, as the triggers just look like a mounted filesystem.   The
>>vfsmount information has enough information for autofs to call a
>>userspace agent through hotplug and have userspace handle the mount.  In
>>effect, there is no daemon so nobody 'owns' a trigger in the same sense
>>as with autofs3/4.
>>    
>>
>
>I'm not familiar with the follow_link mechanism (no prob. I'll pick it up
>as I go).
>
>Correct me if I'm wrong but, the only thing that I can see that is
>duplicated in cloning a namespace is the root dentry. The rest of the
>dentries on the system remain the same. The increase in complexity to the
>VFS to change this would be prohibitive.
>  
>
No.  Dentries are *never* duplicated.  This goes back to Viro's work on 
allowing a filesystem to be mounted in multiple locations.  See 
http://kt.zork.net/kernel-traffic/kt20000424_64.html#9 .

What is duplicated is the current->namespace tree of vfsmounts.  After 
this is done, current->fs vfsmount members are updated to point to their 
cloned counterparts.
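In pseudocode, the clone looks roughly like this (helper names approximate):

    /* CLONE_NEWNS: copy only the tree of vfsmounts, never dentries. */
    new_ns->root = copy_vfsmount_tree(old_ns->root);

    /* Then repoint the task's fs_struct at the cloned mounts; the
     * dentries (fs->root, fs->pwd) are shared and left untouched. */
    current->fs->rootmnt = clone_of(old_rootmnt);
    current->fs->pwdmnt  = clone_of(old_pwdmnt);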

>I see you want the triggers in the vfsmount struct. Is this a good idea?
>The vfsmount struct has always been difficult for me to get hold of during
>lookup and revalidate (would someone like to help here?).
>
>  
>
If triggers in the vfsmount struct are done, then there will be no need 
to handle lookups or revalidates.  In fact, triggers in the vfsmount 
struct will not help at all for indirect maps.

>
>Also, something needs to be done about mount table noise. Several hundred
>entries is very bad from an administration viewpoint.
>  
>
I don't see what you want here.  If you have hundreds of users logged 
into the same machine, you *will* have hundreds of entries in the 
mount-table.

>Except for the cross-namespace issues, which I'm still digesting, I can't
>see why your design can't be done entirely as a filesystem using dentries
>instead of vfsmounts, including expiry. Perhaps you could reiterate a few
>of the reasons for this.
>  
>
My proposal uses filesystems for all automount mechanisms *except* 
expiry.  I see expiry as a VFS service, and strongly believe that this is 
where it belongs.

-- 
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice
mailto: Michael.Waychison@Sun.COM
http://www.sun.com

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE:  The opinions expressed in this email are held by me, 
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 


[-- Attachment #2: Type: application/pgp-signature, Size: 251 bytes --]

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-13 19:01                         ` Mike Waychison
@ 2004-01-14 15:58                           ` raven
  0 siblings, 0 replies; 85+ messages in thread
From: raven @ 2004-01-14 15:58 UTC (permalink / raw)
  To: Mike Waychison; +Cc: autofs mailing list, Kernel Mailing List

On Tue, 13 Jan 2004, Mike Waychison wrote:

> >
> My proposal uses filesystems for all automount mechanisms *except* 
> expiry.  I see expiry as a VFS service, and strongly believe that this is 
> where it belongs.
> 

I'm certainly thinking a lot about this and have made quite a bit of 
progress, thanks to the patience of all.

Now I think it may be time to ponder the expire mechanism.

I was thinking it might be good for me to write up a summary based on the 
discussion so far, to make sure that we all have the same understanding of 
what has been discussed. Perhaps this could allow a specification to follow.

Good idea or not?

Ian


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-12 16:57               ` Mike Waychison
@ 2004-01-13  7:39                 ` Ian Kent
  0 siblings, 0 replies; 85+ messages in thread
From: Ian Kent @ 2004-01-13  7:39 UTC (permalink / raw)
  To: Mike Waychison; +Cc: Jeff Garzik, H. Peter Anvin, linux-kernel

On Mon, 12 Jan 2004, Mike Waychison wrote:

> Jeff Garzik wrote:
>
> >
> >
> > You're still using arguments -against- putting software in the kernel.
> > You don't decrease software's chances of "being broken" by putting it
> > in the kernel, the opposite occurs -- you increase the likelihood of
> > making the entire system unstable.  This is one point that Solaris and
> > Win32 have both missed :)
> >
> >     Jeff
> >
> I get what you're saying. :)
>
> However, doing so achieves two goals:
>     - I want kernelspace to provide mechanism, and let userspace define
> policy. In this case, the policy is even finer grained than what we had
> before and can be set at trigger time, rather than at initscript start time.
>     - I want to get rid of the old ioctl poll interface that didn't work
> in namespaces.
>
> The namespace problem effectively limits what we can do in userspace to
> simply prodding the kernel to tell _it_ to unmount stuff.  A daemon
> alone cannot unmount across namespaces.

Another important consideration is whether implementing this functionality
can be significantly simplified by doing it within the kernel, or whether
the functionality cannot be done in userspace at all. I believe that these
points were made in the original proposal.

Ian





^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-07 23:56             ` Jeff Garzik
@ 2004-01-12 16:57               ` Mike Waychison
  2004-01-13  7:39                 ` Ian Kent
  0 siblings, 1 reply; 85+ messages in thread
From: Mike Waychison @ 2004-01-12 16:57 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: H. Peter Anvin, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1356 bytes --]

Jeff Garzik wrote:

>
>
> You're still using arguments -against- putting software in the kernel. 
> You don't decrease software's chances of "being broken" by putting it 
> in the kernel, the opposite occurs -- you increase the likelihood of 
> making the entire system unstable.  This is one point that Solaris and 
> Win32 have both missed :)
>
>     Jeff
>
I get what you're saying. :)

However, doing so achieves two goals:
    - I want kernelspace to provide mechanism, and let userspace define 
policy. In this case, the policy is even finer grained than what we had 
before and can be set at trigger time, rather than at initscript start time.
    - I want to get rid of the old ioctl poll interface that didn't work 
in namespaces.

The namespace problem effectively limits what we can do in userspace to 
simply prodding the kernel to tell _it_ to unmount stuff.  A daemon 
alone cannot unmount across namespaces. 

I hope this clarifies where I stand :)

-- 
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice
mailto: Michael.Waychison@Sun.COM
http://www.sun.com

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE:  The opinions expressed in this email are held by me, 
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 


[-- Attachment #2: Type: application/pgp-signature, Size: 251 bytes --]

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-09 21:02         ` H. Peter Anvin
@ 2004-01-09 21:52           ` Mike Waychison
  0 siblings, 0 replies; 85+ messages in thread
From: Mike Waychison @ 2004-01-09 21:52 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: trond.myklebust, viro, linux-kernel, raven, thockin

[-- Attachment #1: Type: text/plain, Size: 2131 bytes --]

H. Peter Anvin wrote:

>Mike Waychison wrote:
>  
>
>>H. Peter Anvin wrote:
>>
>>    
>>
>>>My point is that it's what you get for having an automounter.
>>>
>>>We can't solve Sun's designed-in braindamage, unfortunately.  This is
>>>partially why I'd like people to consider the scope of what automounting
>>>does; there are tons of policy issues not all of which are going to be
>>>appropriate in all contexts.  To some degree, if you have to have an
>>>automounter you have already lost.
>>> 
>>>      
>>>
>>However, we can solve Linux's designed-in braindamage.
>>
>>    
>>
>
>I was referring to the visibility of server-side mount points in NFS 2/3
>and the fact that most uses of the automounter are to work around
>this shortcoming.  This is a protocol limitation.
>
>  
>
It's just a different way of looking at it.  NFS exports filesystems, 
not namespaces.  It's the server implementation that decides it should 
try to map these exports to its local namespace.  Indeed, this is what 
exportfs and /etc/exports try to do.  Nobody said this mapping made 
a lot of sense.

>(Don't get me started on stuff like "plus lines" in map, which breaks
>the map paradigm completely.  That's brokenness on a whole other level,
>but which can be reasonably ignored.)
>
>  
>
You're on your own on that one.  I don't see it as an issue, as the 
semantics are pretty well defined.

>It's trivial to crash most filesystem drivers (or get to a security leak
>level) by feeding them deliberately bad input.  Robustness against
>corruption in Linux has been with respect to likely data corruption much
>more than deliberate attacks.  It's a major effort; security-auditing
>every filesystem driver.
>  
>
Ok.  Thanks for clearing that up. 

-- 
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice
mailto: Michael.Waychison@Sun.COM
http://www.sun.com

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE:  The opinions expressed in this email are held by me, 
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 


[-- Attachment #2: Type: application/pgp-signature, Size: 251 bytes --]

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-09 21:31             ` Mike Waychison
@ 2004-01-09 21:36               ` H. Peter Anvin
  0 siblings, 0 replies; 85+ messages in thread
From: H. Peter Anvin @ 2004-01-09 21:36 UTC (permalink / raw)
  To: Mike Waychison
  Cc: viro, Ian Kent, autofs mailing list, Ogden, Aaron A.,
	Kernel Mailing List

Mike Waychison wrote:
>
> Unless I'm missing something, implementing this as a separate filesystem
> type still has the appropriate atomicity guarantees as long as the VFS
> supports complex expiry, whereby userspace would tag submounts as being
> part of the overall expiry for a base mountpoint.
> 

It would, but it seems like a vastly more invasive change to the VFS
than ought to be necessary.

	-hpa


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-09 19:57           ` H. Peter Anvin
@ 2004-01-09 21:31             ` Mike Waychison
  2004-01-09 21:36               ` H. Peter Anvin
  0 siblings, 1 reply; 85+ messages in thread
From: Mike Waychison @ 2004-01-09 21:31 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: viro, Ian Kent, autofs mailing list, Ogden, Aaron A.,
	Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 2050 bytes --]

H. Peter Anvin wrote:

>Mike Waychison wrote:
>  
>
>>>Special vfsmount mounted somewhere; has no superblock associated with it;
>>>attempt to step on it triggers event; normal result of that event is to
>>>get a normal mount on top of it, at which point usual chaining logics
>>>will make sure that we don't see the trap until it's uncovered by removal
>>>of covering filesystem.  Trap (and everything mounted on it, etc.) can
>>>be removed by normal lazy umount.
>>>
>>>Basically, it's a single-point analog of autofs done entirely in VFS.
>>>The job of automounter is to maintain the traps and react to events.
>>>
>>>      
>>>
>>Is there any clear advantage to doing this in the VFS other than saving
>>a superblock and a dentry/inode pair or two?
>>
>>I remember talking to you about this, and I seem to recall that these
>>mount traps would probably communicate using a struct file, so a
>>trap-user would somehow receive events about when the trap was set
>>off.   Will this communication model continue to work within a cloned
>>namespace?  What happens if the trap-client closes the file?
>>
>>    
>>
>
>The biggest issue is to ensure that the appropriate atomicity guarantees
>can be maintained.  In particular, it must be possible to umount the
>underlying filesystem and all mount traps on top of it atomically.
>Anything less will result in race conditions.
>
>	-hpa
>
>  
>
Unless I'm missing something, implementing this as a separate filesystem 
type still has the appropriate atomicity guarantees as long as the VFS 
supports complex expiry, whereby userspace would tag submounts as being 
part of the overall expiry for a base mountpoint.

-- 
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice
mailto: Michael.Waychison@Sun.COM
http://www.sun.com

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE:  The opinions expressed in this email are held by me, 
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 


[-- Attachment #2: Type: application/pgp-signature, Size: 251 bytes --]

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-09 20:37       ` Mike Waychison
@ 2004-01-09 21:02         ` H. Peter Anvin
  2004-01-09 21:52           ` Mike Waychison
  0 siblings, 1 reply; 85+ messages in thread
From: H. Peter Anvin @ 2004-01-09 21:02 UTC (permalink / raw)
  To: Mike Waychison; +Cc: trond.myklebust, viro, linux-kernel, raven, thockin

Mike Waychison wrote:
> H. Peter Anvin wrote:
> 
>> My point is that it's what you get for having an automounter.
>>
>> We can't solve Sun's designed-in braindamage, unfortunately.  This is
>> partially why I'd like people to consider the scope of what automounting
>> does; there are tons of policy issues not all of which are going to be
>> appropriate in all contexts.  To some degree, if you have to have an
>> automounter you have already lost.
>>  
> 
> However, we can solve Linux's designed-in braindamage.
> 

I was referring to the visibility of server-side mount points in NFS 2/3
and the fact that most uses of the automounter are to work around
this shortcoming.  This is a protocol limitation.

(Don't get me started on stuff like "plus lines" in maps, which breaks
the map paradigm completely.  That's brokenness on a whole other level,
but which can be reasonably ignored.)

>> in particular there is no security
>> against root.  Stupid tricks like remapping uid 0 are just that; stupid
>> tricks without any real security value.  You know this, of course.
>> However, if you think the automounter doesn't have the privilege to
>> access the remote server but the user does, then that's false security.
>
> No, the security lies in the fact that the remote server knows the user
> is privileged to access it.  It's a side issue that the mount itself is
> made using an automounter.

Again, it doesn't matter if the user passes credentials to an
automounter or to the kernel.

>> Linux at this point has no ability to support actual user-mounted
>> filesystems.  There are things that could be done to remedy this, but it
>> would require massive changes to every filesystem driver as well as to
>> the VFS. 
> 
> ??  As part of our research into namespaces, we at Sun have gone through
> and tried to identify the number of semantic changes required to achieve
> user-privileged mounting, however we never saw the need to do anything
> special at all in 'each filesystem driver'.  The issue is one of a
> permission model and should be out of scope for individual filesystems.

It's trivial to crash most filesystem drivers (or get to a security leak
level) by feeding them deliberately bad input.  Robustness against
corruption in Linux has been with respect to likely data corruption much
more than deliberate attacks.  It's a major effort; security-auditing
every filesystem driver.

	-hpa


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-08 21:13     ` H. Peter Anvin
  2004-01-08 22:20       ` J. Bruce Fields
@ 2004-01-09 20:37       ` Mike Waychison
  2004-01-09 21:02         ` H. Peter Anvin
  1 sibling, 1 reply; 85+ messages in thread
From: Mike Waychison @ 2004-01-09 20:37 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: trond.myklebust, viro, linux-kernel, raven, thockin

[-- Attachment #1: Type: text/plain, Size: 2631 bytes --]

H. Peter Anvin wrote:

>My point is that it's what you get for having an automounter.
>
>We can't solve Sun's designed-in braindamage, unfortunately.  This is
>partially why I'd like people to consider the scope of what automounting
>does; there are tons of policy issues not all of which are going to be
>appropriate in all contexts.  To some degree, if you have to have an
>automounter you have already lost.
>  
>
However, we can solve Linux's designed-in braindamage.

>Also, your global machine credential is to some degree "all the security
>you get."  Any security which isn't enforced by the filesystem driver
>doesn't exist in a Unix environment;
>

What does this mean?   I don't understand.

> in particular there is no security
>against root.  Stupid tricks like remapping uid 0 are just that; stupid
>tricks without any real security value.  You know this, of course.
>However, if you think the automounter doesn't have the privilege to
>access the remote server but the user does, then that's false security.
>
>  
>
No, the security lies in the fact that the remote server knows the user 
is privileged to access it.  It's a side issue that the mount itself is 
made using an automounter.

>Linux at this point has no ability to support actual user-mounted
>filesystems.  There are things that could be done to remedy this, but it
>would require massive changes to every filesystem driver as well as to
>the VFS.  
>
??  As part of our research into namespaces, we at Sun have gone through 
and tried to identify the number of semantic changes required to achieve 
user-privileged mounting; however, we never saw the need to do anything 
special at all in 'each filesystem driver'.  The issue is one of a 
permission model and should be out of scope for individual filesystems.

>Would it be desirable?  Absolutely.  However, it's partially
>the quagmire that got the HURD stuck for a very long time, even though
>they had the huge advantage of being able to run their filesystem
>drivers in a nonprivileged context.
>
>  
>
Other systems such as Plan 9 have done it, though.  If anything is 
keeping us from doing it, it's the traditional Unix mount semantics and 
the security models that have been built on top of them.

-- 
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice
mailto: Michael.Waychison@Sun.COM
http://www.sun.com

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE:  The opinions expressed in this email are held by me, 
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 


[-- Attachment #2: Type: application/pgp-signature, Size: 251 bytes --]

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-08 19:41 ` H. Peter Anvin
  2004-01-08 20:08   ` trond.myklebust
@ 2004-01-09 20:16   ` Mike Waychison
  1 sibling, 0 replies; 85+ messages in thread
From: Mike Waychison @ 2004-01-09 20:16 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: trond.myklebust, viro, linux-kernel, raven, thockin

[-- Attachment #1: Type: text/plain, Size: 2847 bytes --]

H. Peter Anvin wrote:

>trond.myklebust@fys.uio.no wrote:
>  
>
>>Finally, because the upcall is done in the user's own context, you avoid
>>the whole problem of automounter credentials that are a constant plague
>>to all those daemon-based implementations when working in an environment
>>where you have strong authentication.
>>If anyone wants evidence of how broken the whole daemon thing is, then see
>>the workarounds that had to be made in RFC-2623 to disable strong
>>authentication for GETATTR etc. on the NFSv2/v3 mount point.
>>
>>    
>>
>
>It's not broken as much as what you want to do is outside the scope of
>automount.  automount is one particular user of these facilities, and as
>you correctly point out, it can't solve the problems for all of them.
>The right thing for AFS and NFSv4 is clearly to do something different.
>
>  
>
If automount is going to be mounting NFS shares for users, I don't see 
how this is out of scope.

>Mount traps by themselves are not sufficient for automount, which is why
>I think we will always have a special "autofs" filesystem, for the
>simple reason that automount in typical use doesn't even have an a
>priori complete list of directories!  Even with ghosting you might find
>that you're accessing a new key which has not yet been ghosted, and it
>needs to be handled correctly.  Additionally, not all map types can be
>enumerated, and some aren't even finite in size (consider /net, program
>maps and wildcard map entries.)  Thus, for indirect mountpoints you
>still need a filesystem which can trap on non-enumerated entries.
>
>  
>
Yup.

>That being said, mount traps in particular, and possibly this "trap
>filesystem" are more generic kernel facilities which should be of use to
>other things than automount.  AFS/NFSv4 are the obvious examples, quite
>possibly other things like intermezzo might be interested, and we don't
>want to have to reinvent the wheel every time.
>
>  
>
I could see AFS using these mount traps; however, I don't see any benefit 
for NFS.  If anything, the migration issue is about getting rid of the 
daemon, not about mount traps.  The issues I think Trond is putting forward 
are:

a) The kernel needs to initiate a remount, but doesn't have nameservice 
functionality.

b) User credentials are needed to perform the initial mount itself 
because some servers don't allow non-authenticated calls to the MOUNT 
program, keeping the system from grabbing a root filehandle.

-- 
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice
mailto: Michael.Waychison@Sun.COM
http://www.sun.com

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE:  The opinions expressed in this email are held by me, 
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 


[-- Attachment #2: Type: application/pgp-signature, Size: 251 bytes --]

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-09 19:41         ` Mike Waychison
@ 2004-01-09 19:57           ` H. Peter Anvin
  2004-01-09 21:31             ` Mike Waychison
  0 siblings, 1 reply; 85+ messages in thread
From: H. Peter Anvin @ 2004-01-09 19:57 UTC (permalink / raw)
  To: Mike Waychison
  Cc: viro, Ian Kent, autofs mailing list, Ogden, Aaron A.,
	Kernel Mailing List

Mike Waychison wrote:
>>
>> Special vfsmount mounted somewhere; has no superblock associated with it;
>> attempt to step on it triggers event; normal result of that event is to
>> get a normal mount on top of it, at which point usual chaining logics
>> will make sure that we don't see the trap until it's uncovered by removal
>> of covering filesystem.  Trap (and everything mounted on it, etc.) can
>> be removed by normal lazy umount.
>>
>> Basically, it's a single-point analog of autofs done entirely in VFS.
>> The job of automounter is to maintain the traps and react to events.
>>
> Is there any clear advantage to doing this in the VFS other than saving
> a superblock and a dentry/inode pair or two?
> 
> I remember talking to you about this, and I seem to recall that these
> mount traps would probably communicate using a struct file, so a
> trap-user would somehow receive events about when the trap was set
> off.   Will this communication model continue to work within a cloned
> namespace?  What happens if the trap-client closes the file?
> 

The biggest issue is to ensure that the appropriate atomicity guarantees
can be maintained.  In particular, it must be possible to umount the
underlying filesystem and all mount traps on top of it atomically.
Anything less will result in race conditions.

	-hpa


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-08 18:31       ` viro
  2004-01-09 18:43         ` Ian Kent
@ 2004-01-09 19:41         ` Mike Waychison
  2004-01-09 19:57           ` H. Peter Anvin
  1 sibling, 1 reply; 85+ messages in thread
From: Mike Waychison @ 2004-01-09 19:41 UTC (permalink / raw)
  To: viro
  Cc: Ian Kent, autofs mailing list, H. Peter Anvin, Ogden, Aaron A.,
	Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 1778 bytes --]

viro@parcelfarce.linux.theplanet.co.uk wrote:

>On Thu, Jan 08, 2004 at 08:52:31PM +0800, Ian Kent wrote:
>  
>
>>On Wed, 7 Jan 2004, H. Peter Anvin wrote:
>>
>>    
>>
>>>These are the mount traps Al Viro has been architecting.
>>>
>>>      
>>>
>>Please tell me about these.
>>
>>I haven't seen any discussion on the implementation.
>>
>>Just a few sentences ....
>>    
>>
>
>Special vfsmount mounted somewhere; has no superblock associated with it;
>attempt to step on it triggers event; normal result of that event is to
>get a normal mount on top of it, at which point usual chaining logics
>will make sure that we don't see the trap until it's uncovered by removal
>of covering filesystem.  Trap (and everything mounted on it, etc.) can
>be removed by normal lazy umount.
>
>Basically, it's a single-point analog of autofs done entirely in VFS.
>The job of automounter is to maintain the traps and react to events.
>
>  
>
Is there any clear advantage to doing this in the VFS other than saving 
a superblock and a dentry/inode pair or two?

I remember talking to you about this, and I seem to recall that these 
mount traps would probably communicate using a struct file, so a 
trap-user would somehow receive events about when the trap was set 
off.   Will this communication model continue to work within a cloned 
namespace?  What happens if the trap-client closes the file?

-- 
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice
mailto: Michael.Waychison@Sun.COM
http://www.sun.com

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE:  The opinions expressed in this email are held by me, 
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 


[-- Attachment #2: Type: application/pgp-signature, Size: 251 bytes --]

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-08 18:31       ` viro
@ 2004-01-09 18:43         ` Ian Kent
  2004-01-09 19:41         ` Mike Waychison
  1 sibling, 0 replies; 85+ messages in thread
From: Ian Kent @ 2004-01-09 18:43 UTC (permalink / raw)
  To: viro
  Cc: H. Peter Anvin, Jim Carter, Ogden, Aaron A.,
	thockin, autofs mailing list, Mike Waychison,
	Kernel Mailing List

On Thu, 8 Jan 2004 viro@parcelfarce.linux.theplanet.co.uk wrote:

> Basically, it's a single-point analog of autofs done entirely in VFS.
> The job of automounter is to maintain the traps and react to events.
>
> And yes, I should've done that months ago.  Waaaaay too long backlog -
> bdev work, dev_t stuff, netdev, yadda, yadda.
>

So that's why Peter appears not to have made progress.

Yes. Tell me about the 24-hour days that feel like an hour, where it feels
like only an hour's progress has been made.

Ian




^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-08 22:20       ` J. Bruce Fields
@ 2004-01-08 22:24         ` H. Peter Anvin
  0 siblings, 0 replies; 85+ messages in thread
From: H. Peter Anvin @ 2004-01-08 22:24 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: trond.myklebust, viro, linux-kernel, raven, Michael.Waychison, thockin

J. Bruce Fields wrote:
>
> On Thu, Jan 08, 2004 at 01:13:24PM -0800, H. Peter Anvin wrote:
> 
>>Also, your global machine credential is to some degree "all the security
>>you get."  Any security which isn't enforced by the filesystem driver
>>doesn't exist in a Unix environment; in particular there is no security
>>against root.
> 
> I only have to trust root on the nfs client machines that I actually
> use.  (In fact, I only really have to trust those machines with a
> short-lived ticket, preventing even those machines from impersonating me
> beyond a limited time.)
> 

And when that ticket expires, you better have NFS itself know how to
renew its credentials, or you're up the creek.  Nothing that autofs can
help you with.

> 
>>Stupid tricks like remapping uid 0 are just that; stupid
>>tricks without any real security value.  You know this, of course.
>>However, if you think the automounter doesn't have the privilege to
>>access the remote server but the user does, then that's false security.
> 
> If the server requires kerberos credentials that only a user has, then
> the automounter can't do anything until the user coughs up those
> credentials.
> 

True, but giving them to a privileged daemon is no different from giving
them to the kernel in that way.

	-hpa


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-08 21:13     ` H. Peter Anvin
@ 2004-01-08 22:20       ` J. Bruce Fields
  2004-01-08 22:24         ` H. Peter Anvin
  2004-01-09 20:37       ` Mike Waychison
  1 sibling, 1 reply; 85+ messages in thread
From: J. Bruce Fields @ 2004-01-08 22:20 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: trond.myklebust, viro, linux-kernel, raven, Michael.Waychison, thockin

On Thu, Jan 08, 2004 at 01:13:24PM -0800, H. Peter Anvin wrote:
> Also, your global machine credential is to some degree "all the security
> you get."  Any security which isn't enforced by the filesystem driver
> doesn't exist in a Unix environment; in particular there is no security
> against root.

I only have to trust root on the nfs client machines that I actually
use.  (In fact, I only really have to trust those machines with a
short-lived ticket, preventing even those machines from impersonating me
beyond a limited time.)

> Stupid tricks like remapping uid 0 are just that; stupid
> tricks without any real security value.  You know this, of course.
> However, if you think the automounter doesn't have the privilege to
> access the remote server but the user does, then that's false security.

If the server requires kerberos credentials that only a user has, then
the automounter can't do anything until the user coughs up those
credentials.

--Bruce Fields

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-08 20:08   ` trond.myklebust
@ 2004-01-08 21:13     ` H. Peter Anvin
  2004-01-08 22:20       ` J. Bruce Fields
  2004-01-09 20:37       ` Mike Waychison
  0 siblings, 2 replies; 85+ messages in thread
From: H. Peter Anvin @ 2004-01-08 21:13 UTC (permalink / raw)
  To: trond.myklebust; +Cc: viro, linux-kernel, raven, Michael.Waychison, thockin

trond.myklebust@fys.uio.no wrote:
> 
> My point is that the above problem crops up in almost *all* combinations
> of automounter daemon with remote filesystem and strong authentication.
> In order to correctly mount the remote filesystem, the automounter
> itself needs a minimum set of remote privileges (typically it needs to be
> able to browse the remote filesystem).
> 
> RFC-2623 describes how to add RPCSEC_GSS to NFSv2/v3. The
> workarounds (hacks really) that I refer to above had to be deliberately
> added in order to make Sun's automounter work in this environment.
> The alternative would have been to have a global "machine" credential
> for use by the automounter when browsing /net. Hardly secure...
> 

My point is that it's what you get for having an automounter.

We can't solve Sun's designed-in braindamage, unfortunately.  This is
partially why I'd like people to consider the scope of what automounting
does; there are tons of policy issues not all of which are going to be
appropriate in all contexts.  To some degree, if you have to have an
automounter you have already lost.

Also, your global machine credential is to some degree "all the security
you get."  Any security which isn't enforced by the filesystem driver
doesn't exist in a Unix environment; in particular there is no security
against root.  Stupid tricks like remapping uid 0 are just that; stupid
tricks without any real security value.  You know this, of course.
However, if you think the automounter doesn't have the privilege to
access the remote server but the user does, then that's false security.

Linux at this point has no ability to support actual user-mounted
filesystems.  There are things that could be done to remedy this, but it
would require massive changes to every filesystem driver as well as to
the VFS.  Would it be desirable?  Absolutely.  However, it's partially
the quagmire that got the HURD stuck for a very long time, even though
they had the huge advantage of being able to run their filesystem
drivers in a nonprivileged context.

	-hpa


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-08 19:41 ` H. Peter Anvin
@ 2004-01-08 20:08   ` trond.myklebust
  2004-01-08 21:13     ` H. Peter Anvin
  2004-01-09 20:16   ` Mike Waychison
  1 sibling, 1 reply; 85+ messages in thread
From: trond.myklebust @ 2004-01-08 20:08 UTC (permalink / raw)
  To: hpa; +Cc: viro, linux-kernel, raven, Michael.Waychison, thockin

>> If anyone wants evidence of how broken the whole daemon thing is, then
>> see the workarounds that had to be made in RFC-2623 to disable strong
>> authentication for GETATTR etc. on the NFSv2/v3 mount point.
>>
>
> It's not broken as much as what you want to do is outside the scope of
> automount.  automount is one particular user of these facilities, and as
> you correctly point out, it can't solve the problems for all of them.
> The right thing for AFS and NFSv4 is clearly to do something different.

My point is that the above problem crops up in almost *all* combinations
of automounter daemon with remote filesystem and strong authentication.
In order to correctly mount the remote filesystem, the automounter
itself needs a minimum set of remote privileges (typically it needs to be
able to browse the remote filesystem).

RFC-2623 describes how to add RPCSEC_GSS to NFSv2/v3. The
workarounds (hacks really) that I refer to above had to be deliberately
added in order to make Sun's automounter work in this environment.
The alternative would have been to have a global "machine" credential
for use by the automounter when browsing /net. Hardly secure...

> That being said, mount traps in particular, and possibly this "trap
> filesystem" are more generic kernel facilities which should be of use to
> other things than automount.  AFS/NFSv4 are the obvious examples, quite
> possibly other things like intermezzo might be interested, and we don't
> want to have to reinvent the wheel every time.

Certainly. I believe CIFS might also have a similar mechanism.

Cheers,
   Trond



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-08 19:32 trond.myklebust
@ 2004-01-08 19:41 ` H. Peter Anvin
  2004-01-08 20:08   ` trond.myklebust
  2004-01-09 20:16   ` Mike Waychison
  0 siblings, 2 replies; 85+ messages in thread
From: H. Peter Anvin @ 2004-01-08 19:41 UTC (permalink / raw)
  To: trond.myklebust; +Cc: viro, linux-kernel, raven, Michael.Waychison, thockin

trond.myklebust@fys.uio.no wrote:
> 
> Finally, because the upcall is done in the user's own context, you avoid
> the whole problem of automounter credentials that are a constant plague
> to all those daemon-based implementations when working in an environment
> where you have strong authentication.
> If anyone wants evidence of how broken the whole daemon thing is, then see
> the workarounds that had to be made in RFC-2623 to disable strong
> authentication for GETATTR etc. on the NFSv2/v3 mount point.
> 

It's not broken as much as what you want to do is outside the scope of
automount.  automount is one particular user of these facilities, and as
you correctly point out, it can't solve the problems for all of them.
The right thing for AFS and NFSv4 is clearly to do something different.

Mount traps by themselves are not sufficient for automount, which is why
I think we will always have a special "autofs" filesystem, for the
simple reason that automount in typical use doesn't have an a priori
complete list of directories either!  Even with ghosting you might find
that you're accessing a new key which has not yet been ghosted, and it
needs to be handled correctly.  Additionally, not all map types can be
enumerated, and some aren't even finite in size (consider /net, program
maps and wildcard map entries.)  Thus, for indirect mountpoints you
still need a filesystem which can trap on non-enumerated entries.

That being said, mount traps in particular, and possibly this "trap
filesystem" are more generic kernel facilities which should be of use to
other things than automount.  AFS/NFSv4 are the obvious examples, quite
possibly other things like intermezzo might be interested, and we don't
want to have to reinvent the wheel every time.

	-hpa


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
@ 2004-01-08 19:32 trond.myklebust
  2004-01-08 19:41 ` H. Peter Anvin
  0 siblings, 1 reply; 85+ messages in thread
From: trond.myklebust @ 2004-01-08 19:32 UTC (permalink / raw)
  To: viro, linux-kernel; +Cc: raven, hpa, Michael.Waychison, thockin

On Thu, 08/01/2004 at 13:31,
viro@parcelfarce.linux.theplanet.co.uk wrote:

> Special vfsmount mounted somewhere; has no superblock associated with
> it; attempt to step on it triggers event; normal result of that event
> is to get a normal mount on top of it, at which point usual chaining
> logics will make sure that we don't see the trap until it's uncovered
> by removal of covering filesystem.  Trap (and everything mounted on
> it, etc.) can be removed by normal lazy umount.
>
> Basically, it's a single-point analog of autofs done entirely in VFS.
> The job of automounter is to maintain the traps and react to events.

What if the trap is set by the filesystem? I'm thinking about AFS
volumes and NFSv4 migration events here.

Both these need something that goes beyond the current autofs "daemon
waiting on top of a single trap" thinking.

In the NFSv4 migration case we can be walking down the filesystem path
and enter a directory where we are basically told by the server that
"this volume has been moved" and are given a list of replicated
servername/pathname fields. Those then need to be interpreted in
userland by means of an upcall of some sort, and the new volume needs to
be mounted.

Neither autofs3 nor autofs4 can currently help us do this, because we
don't a priori have a complete list of directories on which to start a
bunch of "automount" daemons (and it wouldn't help anyway since a server
failover event etc. might cause the list to change).

Setting up our own traps, however, and then doing the upcall by means of
an exec and an "intelligent" mount program (as Mike & co. propose)
would very much simplify matters, since that allows us to do simple
string parameter passing from the kernel to direct how the mount is to
be set up.
It still leaves the final policy decisions of which server to mount &
where in userland.

Finally, because the upcall is done in the user's own context, you avoid
the whole problem of automounter credentials that are a constant plague
to all those daemon-based implementations when working in an environment
where you have strong authentication.
If anyone wants evidence of how broken the whole daemon thing is, then see
the workarounds that had to be made in RFC-2623 to disable strong
authentication for GETATTR etc. on the NFSv2/v3 mount point.

Cheers,
  Trond





^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-08 12:52     ` Ian Kent
@ 2004-01-08 18:31       ` viro
  2004-01-09 18:43         ` Ian Kent
  2004-01-09 19:41         ` Mike Waychison
  0 siblings, 2 replies; 85+ messages in thread
From: viro @ 2004-01-08 18:31 UTC (permalink / raw)
  To: Ian Kent
  Cc: H. Peter Anvin, Jim Carter, Ogden, Aaron A.,
	thockin, autofs mailing list, Mike Waychison,
	Kernel Mailing List

On Thu, Jan 08, 2004 at 08:52:31PM +0800, Ian Kent wrote:
> On Wed, 7 Jan 2004, H. Peter Anvin wrote:
> 
> >
> > These are the mount traps Al Viro has been architecting.
> >
> 
> Please tell me about these.
> 
> I haven't seen any discussion of the implementation.
> 
> Just a few sentences ....

Special vfsmount mounted somewhere; has no superblock associated with it;
attempt to step on it triggers event; normal result of that event is to
get a normal mount on top of it, at which point usual chaining logics
will make sure that we don't see the trap until it's uncovered by removal
of covering filesystem.  Trap (and everything mounted on it, etc.) can
be removed by normal lazy umount.

Basically, it's a single-point analog of autofs done entirely in VFS.
The job of automounter is to maintain the traps and react to events.

And yes, I should've done that months ago.  Waaaaay too long backlog -
bdev work, dev_t stuff, netdev, yadda, yadda.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-07 23:32   ` H. Peter Anvin
@ 2004-01-08 12:52     ` Ian Kent
  2004-01-08 18:31       ` viro
  0 siblings, 1 reply; 85+ messages in thread
From: Ian Kent @ 2004-01-08 12:52 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jim Carter, Ogden, Aaron A.,
	thockin, autofs mailing list, Mike Waychison,
	Kernel Mailing List

On Wed, 7 Jan 2004, H. Peter Anvin wrote:

>
> These are the mount traps Al Viro has been architecting.
>

Please tell me about these.

I haven't seen any discussion of the implementation.

Just a few sentences ....

Ian



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-07 23:47           ` Mike Waychison
@ 2004-01-07 23:56             ` Jeff Garzik
  2004-01-12 16:57               ` Mike Waychison
  0 siblings, 1 reply; 85+ messages in thread
From: Jeff Garzik @ 2004-01-07 23:56 UTC (permalink / raw)
  To: Mike Waychison; +Cc: H. Peter Anvin, linux-kernel

Mike Waychison wrote:
> You wouldn't put a bdflush daemon in userspace either would you?  The 
> loop in question is just that; (overly simplified):
> 
> while (1) {
>     f = ask_kernel_if_anything_looks_inactive();
>     if (f) {
>         try_to_umount(f);
>         continue;
>     } else {
>         sleep(x seconds);
>     }
> }
> 
> My point is, if this is the only active action done by userspace, why 
> open it up to being broken?


You're still using arguments -against- putting software in the kernel. 
You don't decrease software's chances of "being broken" by putting it in 
the kernel; the opposite occurs -- you increase the likelihood of making 
the entire system unstable.  This is one point that Solaris and Win32 
have both missed :)

	Jeff




^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-07 21:24         ` Jeff Garzik
@ 2004-01-07 23:47           ` Mike Waychison
  2004-01-07 23:56             ` Jeff Garzik
  0 siblings, 1 reply; 85+ messages in thread
From: Mike Waychison @ 2004-01-07 23:47 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Mike Waychison, H. Peter Anvin, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1789 bytes --]

Jeff Garzik wrote:
> Mike Waychison wrote:
> 
>> To put it into perspective, I'm calling for the following major 
>> changes:
> 
> [...]
> 
>> 2) move the loop that used to spin around and ask kernelspace if there 
>> was anything to expire into the VFS as well, where it won't be killed.
> 
> [...]
> 
>> (1) and (2) shouldn't be hard at all to do considering David Howells 
>> has done the majority of this already. (3) is needed in order to 
>> manage direct mounts properly for when they are 'covered'.  
>> Admittedly, (4) comes off as an ugly hack.
>>
>> Also, (2) was the only 'active' task the automount daemon was doing. 
>> Everything else it did can be rewritten in the form of a usermode 
>> helper that runs only when it is needed.  This simplifies the 
>> userspace code a lot.
> 
> 
> Just going by your own explanation here, #2 should not be in the kernel.
> 
> If we move daemons into the kernel just because they won't be killed, 
> we'll have Oracle in-kernel before you know it.  Completely spurious 
> reason.
> 

You wouldn't put a bdflush daemon in userspace either, would you?  The 
loop in question is just that (overly simplified):

while (1) {
	f = ask_kernel_if_anything_looks_inactive();
	if (f) {
		try_to_umount(f);
		continue;
	} else {
		sleep(x seconds);
	}
}

My point is, if this is the only active action done by userspace, why 
open it up to being broken?


[-- Attachment #2: Type: application/pgp-signature, Size: 251 bytes --]

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-07 21:11         ` Mike Fedyk
@ 2004-01-07 23:40           ` Jesper Juhl
  0 siblings, 0 replies; 85+ messages in thread
From: Jesper Juhl @ 2004-01-07 23:40 UTC (permalink / raw)
  To: Mike Fedyk; +Cc: Mike Waychison, H. Peter Anvin, linux-kernel



On Wed, 7 Jan 2004, Mike Fedyk wrote:

> On Wed, Jan 07, 2004 at 04:04:41PM -0500, Mike Waychison wrote:
> > H. Peter Anvin wrote:
> >
> > >>Also when /home or other important fs are mounted via autofs there is
> > >>not much practical difference between a hung kernel and a hung
> > >>daemon. You have to reboot the system anyways.
> > >
> > >
> > >a) Guess which one is easier to debug?
> >
> > When they may both equally hang your machine, neither.
>
> Let's see.
>
> If it's in userspace, then set up your debug area somewhere your system
> doesn't depend on, and wham, the hang won't affect the entire system anymore.
>
> Also, if you have /home automounted then it only affects the users on /home,
> and root's $HOME shouldn't be under /home anyway...
>

From a user point of view I have to agree with you. Keeping it out of the
kernel makes perfect sense to me.

Easier to test your setup - errors will not hang the box.

In the case the implementation is buggy a daemon can easily be restarted
nightly without disrupting other things running on the box (a nightly
reboot is not as friendly).


From a developer point of view, I also agree.

Debugging kernel code is in general a much harder thing to do than
debugging a userspace daemon. I'd also guess that more people will be
inclined to contribute development time to a userspace program than a
kernel based implementation - just the fact that it's in-kernel will be
percieved as having a much higher barrier-to-entry and I suspect that fact
alone might discourage potential contributers.


- Jesper Juhl


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-07 23:14 ` Jim Carter
@ 2004-01-07 23:32   ` H. Peter Anvin
  2004-01-08 12:52     ` Ian Kent
  0 siblings, 1 reply; 85+ messages in thread
From: H. Peter Anvin @ 2004-01-07 23:32 UTC (permalink / raw)
  To: Jim Carter
  Cc: Ogden, Aaron A.,
	thockin, autofs mailing list, Mike Waychison,
	Kernel Mailing List

Jim Carter wrote:
> 
> To my mind the ideal design goes something like this:
> 
> 1.  you can mount a synthetic autofs filesystem on lots of directories,
> including subdirs of other autofs filesystems.
> 
> 2.  Whenever anything tries to access one of those directories (for a
> direct map) or one of its subdirs whether visible or not (indirect map), if
> nothing is mounted on it [and it hasn't been told by a special flag that
> it's non-mountable, see the /home/user/server{A,B} example], the autofs
> kernel module runs a script in user space (in the namespace context of the
> originally requesting process).  Upon exit, if something is now mounted on
> the subdir, fine.  Otherwise, ENOENT.  The module is not required to know
> anything about autofs maps that the userspace helper may or may not
> consult.
> 
> 3.  Periodically the module should check if mounted filesystems are
> potentially unmountable (this seems to be inexpensive), and if so it should
> run the userspace helper to unmount them.  If the unmount fails, the helper
> (not the kernel) should try to distinguish a race condition from a dead NFS
> server, and whether the mount will be viable once the server comes back. If
> not, it should be more aggressive than the present daemon in unmounting. At
> present the module carefully keeps up-to-date a last_used field and a
> timeout potentially different for each mount, but it's probably sufficient
> to merely poll all the mount points periodically all at once, perhaps with
> a one-time exemption when something is first mounted.
> 
> And that's *all* the complexity that should be in the kernel.  That's quite
> complex enough in my opinion.  If the userspace helper needs state, it can
> lock and read/write a file.  I don't really see the need for the autofs
> system to have state beyond "it's mounted".
> 

What you've described above is more or less the autofs v3 design.  There
are reasons why you really want to have a simple-minded timeout in the
kernel, mostly because attempting umount is more expensive than it
should be on some filesystems.  It only needs to be statistically
accurate, though, and thus it does not introduce a race.

Once you have to deal with mount trees (multiple filesystems on the same
mount point which you want to have appear to userspace as a unit),
things get significantly more complex, unfortunately.  Mounting is not a
problem, since the nonprivileged processes are simply held, but
umounting is, since in order to make sure there are no race conditions
userspace needs to be locked out from filesystem "a" while umounting
filesystem "a/b", *or* the equivalent of a direct mount autofs point has
to be imposed on node "a/b" of filesystem "a" which can be atomically
deleted together with the umounting of filesystem "a".

These are the mount traps Al Viro has been architecting.

	-hpa


^ permalink raw reply	[flat|nested] 85+ messages in thread

* RE: [autofs] [RFC] Towards a Modern Autofs
  2004-01-06 22:28 Ogden, Aaron A.
                   ` (2 preceding siblings ...)
  2004-01-06 22:53 ` Paul Raines
@ 2004-01-07 23:14 ` Jim Carter
  2004-01-07 23:32   ` H. Peter Anvin
  3 siblings, 1 reply; 85+ messages in thread
From: Jim Carter @ 2004-01-07 23:14 UTC (permalink / raw)
  To: Ogden, Aaron A.
  Cc: thockin, H. Peter Anvin, autofs mailing list, Mike Waychison,
	Kernel Mailing List

On Tue, 6 Jan 2004, Ogden, Aaron A. wrote:
> If you've read this far, what I'm trying to say is that having userspace
> related code remain in userland is a good thing since you can restart
> the daemon if something goes wrong.

Hear, hear.  But...

> If you move all of this to
> kernel-space you can't do anything about it if there is a problem.  In
> Solaris there is a command called 'automount' that tells the kernel to
> re-read the automount maps, perhaps it resets the autofs subsystem in
> the kernel as well.  If linux autofs had the same capability we might
> not need the daemon, but until then, having the daemon in userland is a
> good thing.

To my mind the ideal design goes something like this:

1.  you can mount a synthetic autofs filesystem on lots of directories,
including subdirs of other autofs filesystems.

2.  Whenever anything tries to access one of those directories (for a
direct map) or one of its subdirs whether visible or not (indirect map), if
nothing is mounted on it [and it hasn't been told by a special flag that
it's non-mountable, see the /home/user/server{A,B} example], the autofs
kernel module runs a script in user space (in the namespace context of the
originally requesting process).  Upon exit, if something is now mounted on
the subdir, fine.  Otherwise, ENOENT.  The module is not required to know
anything about autofs maps that the userspace helper may or may not
consult.

3.  Periodically the module should check if mounted filesystems are
potentially unmountable (this seems to be inexpensive), and if so it should
run the userspace helper to unmount them.  If the unmount fails, the helper
(not the kernel) should try to distinguish a race condition from a dead NFS
server, and whether the mount will be viable once the server comes back. If
not, it should be more aggressive than the present daemon in unmounting. At
present the module carefully keeps up-to-date a last_used field and a
timeout potentially different for each mount, but it's probably sufficient
to merely poll all the mount points periodically all at once, perhaps with
a one-time exemption when something is first mounted.

And that's *all* the complexity that should be in the kernel.  That's quite
complex enough in my opinion.  If the userspace helper needs state, it can
lock and read/write a file.  I don't really see the need for the autofs
system to have state beyond "it's mounted".

James F. Carter          Voice 310 825 2897    FAX 310 206 6673
UCLA-Mathnet;  6115 MSA; 405 Hilgard Ave.; Los Angeles, CA, USA  90095-1555
Email: jimc@math.ucla.edu    http://www.math.ucla.edu/~jimc (q.v. for PGP key)

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-07 21:04       ` Mike Waychison
  2004-01-07 21:11         ` Mike Fedyk
@ 2004-01-07 21:24         ` Jeff Garzik
  2004-01-07 23:47           ` Mike Waychison
  1 sibling, 1 reply; 85+ messages in thread
From: Jeff Garzik @ 2004-01-07 21:24 UTC (permalink / raw)
  To: Mike Waychison; +Cc: H. Peter Anvin, linux-kernel

Mike Waychison wrote:
> To put it into perspective, I'm calling for the following major 
> changes:
[...]
> 2) move the loop that used to spin around and ask kernelspace if there 
> was anything to expire into the VFS as well, where it won't be killed.
[...]
> (1) and (2) shouldn't be hard at all to do considering David Howells has 
> done the majority of this already. (3) is needed in order to manage 
> direct mounts properly for when they are 'covered'.  Admittedly, (4) 
> comes off as an ugly hack.
> 
> Also, (2) was the only 'active' task the automount daemon was doing. 
> Everything else it did can be rewritten in the form of a usermode helper 
> that runs only when it is needed.  This simplifies the userspace code a 
> lot.

Just going by your own explanation here, #2 should not be in the kernel.

If we move daemons into the kernel just because they won't be killed, 
we'll have Oracle in-kernel before you know it.  Completely spurious reason.

	Jeff




^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-07 21:04       ` Mike Waychison
@ 2004-01-07 21:11         ` Mike Fedyk
  2004-01-07 23:40           ` Jesper Juhl
  2004-01-07 21:24         ` Jeff Garzik
  1 sibling, 1 reply; 85+ messages in thread
From: Mike Fedyk @ 2004-01-07 21:11 UTC (permalink / raw)
  To: Mike Waychison; +Cc: H. Peter Anvin, linux-kernel

On Wed, Jan 07, 2004 at 04:04:41PM -0500, Mike Waychison wrote:
> H. Peter Anvin wrote:
> 
> >>Also when /home or other important fs are mounted via autofs there is
> >>not much practical difference between a hung kernel and a hung
> >>daemon. You have to reboot the system anyways.
> >
> >
> >a) Guess which one is easier to debug?
> 
> When they may both equally hang your machine, neither.

Let's see.

If it's in userspace, then set up your debug area somewhere your system
doesn't depend on, and wham, the hang won't affect the entire system anymore.

Also, if you have /home automounted then it only affects the users on /home,
and root's $HOME shouldn't be under /home anyway...

Though, you can debug in-kernel code with UML...

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-07 17:50     ` H. Peter Anvin
@ 2004-01-07 21:04       ` Mike Waychison
  2004-01-07 21:11         ` Mike Fedyk
  2004-01-07 21:24         ` Jeff Garzik
  0 siblings, 2 replies; 85+ messages in thread
From: Mike Waychison @ 2004-01-07 21:04 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2256 bytes --]

H. Peter Anvin wrote:

>> Also when /home or other important fs are mounted via autofs there is
>> not much practical difference between a hung kernel and a hung
>> daemon. You have to reboot the system anyways.
> 
> 
> a) Guess which one is easier to debug?

When they may both equally hang your machine, neither.

> b) Do people around here really believe that putting things in the 
> kernel magically makes them work right?
> 

No magic involved.

When atomicity is needed wrt. mountpoints, moving the logic into the 
kernel is a much simpler solution.

How much code was required to handle the corner cases and races in the 
existing autofs implementations?

To put it into perspective, I'm calling for the following major changes:

1) move expiry logic out of autofs and into the VFS where others can use 
it and notice when it breaks when VFS internals change.  For example, I 
just noticed that autofs4 in 2.6 hasn't been updated to grab the new 
vfsmount_lock instead of dcache_lock in certain circumstances.

2) move the loop that used to spin around and ask kernelspace if there 
was anything to expire into the VFS as well, where it won't be killed.

3) introduce some way to let userspace walk the mountpoints using file 
descriptors as references.

4) figure out a way to get super_blocks to clone so that we can have 
some consistent automount functionality across cloned namespaces.

(1) and (2) shouldn't be hard at all to do considering David Howells has 
done the majority of this already. (3) is needed in order to manage 
direct mounts properly for when they are 'covered'.  Admittedly, (4) 
comes off as an ugly hack.

Also, (2) was the only 'active' task the automount daemon was doing. 
Everything else it did can be rewritten in the form of a usermode helper 
that runs only when it is needed.  This simplifies the userspace code a lot.



[-- Attachment #2: Type: application/pgp-signature, Size: 251 bytes --]

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-07  4:21   ` Andi Kleen
@ 2004-01-07 17:50     ` H. Peter Anvin
  2004-01-07 21:04       ` Mike Waychison
  0 siblings, 1 reply; 85+ messages in thread
From: H. Peter Anvin @ 2004-01-07 17:50 UTC (permalink / raw)
  To: linux-kernel

Andi Kleen wrote:
> 
> I personally would be in favour of doing it all in the kernel because
> autofs3 and autofs4 are not fully compatible and break in subtle ways
> when not matching and in my experience when you have autofs3 compiled
> into the kernel the system happens to have an autofs 4 daemon
> installed and vice versa. Doing it in the kernel would avoid this
> nasty dependency problem.
> 

"Don't do that then."  Really.  Originally the autofs v3 filesystem was 
called "autofs" and the autofs v4 filesystem was called "autofs4" and 
the intent was that you should *never* run them across versions.

Jeremy tried nevertheless to be compatible (mistake #1) and Linus then 
renamed the autofs4 filesystem "autofs" (mistake #2).  There was no good 
reason for this and it should never have happened -- it broke the design 
that was intended to make sure the above wasn't going to be a problem.

> Also when /home or other important fs are mounted via autofs there is
> not much practical difference between a hung kernel and a hung
> daemon. You have to reboot the system anyways.

a) Guess which one is easier to debug?
b) Do people around here really believe that putting things in the 
kernel magically makes them work right?

	-hpa



* Re: [autofs] [RFC] Towards a Modern Autofs
       [not found] ` <1b6CO-3v0-15@gated-at.bofh.it>
@ 2004-01-07  4:21   ` Andi Kleen
  2004-01-07 17:50     ` H. Peter Anvin
  0 siblings, 1 reply; 85+ messages in thread
From: Andi Kleen @ 2004-01-07  4:21 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel, Michael.Waychison

"H. Peter Anvin" <hpa@zytor.com> writes:

>  A dead daemon is a
> painful recovery, admitted.  It is also a THIS SHOULD NOT HAPPEN
> condition.  By cramming it into the kernel, you're in fact making the
> system less stable, not more, because the kernel being tainted with
> faulty code is a total system malfunction; a crashed userspace daemon is
> "merely" a messy cleanup.  In practice, the autofs daemon does not die
> unless a careless system administrator kills it.  It is a non-problem.

I personally would be in favour of doing it all in the kernel because
autofs3 and autofs4 are not fully compatible and break in subtle ways
when not matching and in my experience when you have autofs3 compiled
into the kernel the system happens to have an autofs 4 daemon
installed and vice versa. Doing it in the kernel would avoid this
nasty dependency problem.

Also when /home or other important fs are mounted via autofs there is
not much practical difference between a hung kernel and a hung
daemon. You have to reboot the system anyways.

-Andi


* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-06 23:34 Ogden, Aaron A.
@ 2004-01-06 23:47 ` Tim Hockin
  0 siblings, 0 replies; 85+ messages in thread
From: Tim Hockin @ 2004-01-06 23:47 UTC (permalink / raw)
  To: Ogden, Aaron A.
  Cc: thockin, H. Peter Anvin, autofs mailing list, Mike Waychison,
	Kernel Mailing List

On Tue, Jan 06, 2004 at 05:34:08PM -0600, Ogden, Aaron A. wrote:
> autofs work like Solaris autofs.  Is Sun willing to devote man-hours to
> help implement the new autofs?  I think Ian has done a tremendous job

Yes. :)


* RE: [autofs] [RFC] Towards a Modern Autofs
@ 2004-01-06 23:34 Ogden, Aaron A.
  2004-01-06 23:47 ` Tim Hockin
  0 siblings, 1 reply; 85+ messages in thread
From: Ogden, Aaron A. @ 2004-01-06 23:34 UTC (permalink / raw)
  To: Tim Hockin
  Cc: thockin, H. Peter Anvin, autofs mailing list, Mike Waychison,
	Kernel Mailing List



> -----Original Message-----
> From: Tim Hockin [mailto:thockin@hockin.org] 
> Sent: Tuesday, January 06, 2004 4:48 PM
> To: Ogden, Aaron A.
> Cc: thockin@Sun.COM; H. Peter Anvin; autofs mailing list; 
> Mike Waychison; Kernel Mailing List
> Subject: Re: [autofs] [RFC] Towards a Modern Autofs
> 
> 
> On Tue, Jan 06, 2004 at 04:28:59PM -0600, Ogden, Aaron A. wrote:
> > Solaris there is a command called 'automount' that tells the kernel to
> > re-read the automount maps, perhaps it resets the autofs subsystem in
> > the kernel as well.  If linux autofs had the same capability we might
> > not need the daemon, but until then, having the daemon in userland is a
> > good thing.
> 
> That's more or less exactly what is proposed.
> 

Excellent!  I haven't read through the proposal yet but I have it open
in another window.  :-)
The detailed proposal you've written implies that Sun as a whole has
given serious thought to the problem, which IMHO is how to make linux
autofs work like Solaris autofs.  Is Sun willing to devote man-hours to
help implement the new autofs?  I think Ian has done a tremendous job
with autofs4 but the more minds we throw at the problem the better.


* RE: [autofs] [RFC] Towards a Modern Autofs
  2004-01-06 22:28 Ogden, Aaron A.
  2004-01-06 22:41 ` Mike Fedyk
  2004-01-06 22:47 ` Tim Hockin
@ 2004-01-06 22:53 ` Paul Raines
  2004-01-07 23:14 ` Jim Carter
  3 siblings, 0 replies; 85+ messages in thread
From: Paul Raines @ 2004-01-06 22:53 UTC (permalink / raw)
  To: Ogden, Aaron A.
  Cc: thockin, H. Peter Anvin, autofs mailing list, Mike Waychison,
	Kernel Mailing List

As another sysadmin with 300+ linux and solaris boxes, I second
your sentiments exactly.  As my previous post today states, I am
having exactly the problem you describe with automount daemons
becoming hung or unresponsive.  Guess I should give 4.1.0 a try.

Of course the same argument applies to the NFS server, but they went
ahead and moved most of that into the kernel anyway for the
performance gain.

-- 
---------------------------------------------------------------
Paul Raines                   email: raines@nmr.mgh.harvard.edu
MGH/MIT/HMS Athinoula A. Martinos Center for Biomedical Imaging
149 (2301) 13th Street        Charlestown, MA 02129	USA   


* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-06 22:28 Ogden, Aaron A.
  2004-01-06 22:41 ` Mike Fedyk
@ 2004-01-06 22:47 ` Tim Hockin
  2004-01-06 22:53 ` Paul Raines
  2004-01-07 23:14 ` Jim Carter
  3 siblings, 0 replies; 85+ messages in thread
From: Tim Hockin @ 2004-01-06 22:47 UTC (permalink / raw)
  To: Ogden, Aaron A.
  Cc: thockin, H. Peter Anvin, autofs mailing list, Mike Waychison,
	Kernel Mailing List

On Tue, Jan 06, 2004 at 04:28:59PM -0600, Ogden, Aaron A. wrote:
> Solaris there is a command called 'automount' that tells the kernel to
> re-read the automount maps, perhaps it resets the autofs subsystem in
> the kernel as well.  If linux autofs had the same capability we might
> not need the daemon, but until then, having the daemon in userland is a
> good thing.

That's more or less exactly what is proposed.


* Re: [autofs] [RFC] Towards a Modern Autofs
  2004-01-06 22:28 Ogden, Aaron A.
@ 2004-01-06 22:41 ` Mike Fedyk
  2004-01-06 22:47 ` Tim Hockin
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 85+ messages in thread
From: Mike Fedyk @ 2004-01-06 22:41 UTC (permalink / raw)
  To: Ogden, Aaron A.
  Cc: thockin, H. Peter Anvin, autofs mailing list, Mike Waychison,
	Kernel Mailing List

On Tue, Jan 06, 2004 at 04:28:59PM -0600, Ogden, Aaron A. wrote:
[snip]
> having the daemon in userland is a
> good thing.

You and hpa are agreeing...


* RE: [autofs] [RFC] Towards a Modern Autofs
@ 2004-01-06 22:28 Ogden, Aaron A.
  2004-01-06 22:41 ` Mike Fedyk
                   ` (3 more replies)
  0 siblings, 4 replies; 85+ messages in thread
From: Ogden, Aaron A. @ 2004-01-06 22:28 UTC (permalink / raw)
  To: thockin, H. Peter Anvin
  Cc: autofs mailing list, Mike Waychison, Kernel Mailing List



> -----Original Message-----
> From: autofs-bounces@linux.kernel.org 
> [mailto:autofs-bounces@linux.kernel.org] On Behalf Of Tim Hockin
> Sent: Tuesday, January 06, 2004 3:50 PM
> To: H. Peter Anvin
> Cc: autofs mailing list; Mike Waychison; Kernel Mailing List
> Subject: Re: [autofs] [RFC] Towards a Modern Autofs
> 
<...snip...>
>
> > Pardon me for sounding harsh, but I'm seriously sick of the oft-repeated
> > idiocy that effectively boils down to "the daemon can die and would lose
> > its state, so let's put it all in the kernel."  A dead daemon is a
> > painful recovery, admitted.  It is also a THIS SHOULD NOT HAPPEN
> 
> But it *does* happen.
> 
> > condition.  By cramming it into the kernel, you're in fact
> > making the system less stable, not more, because the kernel being tainted with
> > faulty code is a total system malfunction; a crashed userspace daemon is
> 
> I don't think this design crams anything into the kernel.  It
> doesn't put a whole lot more into the kernel than is currently in there
> (expiry and new mount stuff, aside).  All the work still happens in userland.
> 
> The daemon as it stands does NOT handle namespaces, does NOT handle expiry
> well, and is a pretty sad copy of an old design.
> 
> > "merely" a messy cleanup.  In practice, the autofs daemon does not die
> > unless a careless system administrator kills it.  It is a non-problem.
> 
> I have some customers I'd love to send to you, if you really
> think that's true.

Speaking as a sysadmin with 300+ machines (some linux, some solaris)
using autofs, I can say that the linux autofs daemon does die on
occasion, or at least some of the children become hung or unresponsive.
This happened to us with autofs3 and autofs4, leading me to contact Ian
Kent and become involved in testing new versions of autofs4.  I don't
have any problems with the newest versions (4.1.0+) but with previous
code, 4.0.0pre10 for example, I found the ability to restart the daemon
invaluable.  On those occasions where the autofs daemon gets confused
(loses track of mountpoints, gets corruption in its internal
representation of NIS maps, etc.) we could shut down the autofs daemon,
kill any remaining processes, and restart it from scratch.  In most
cases restarting the daemon fixes the problem.  It's worth noting that I
have seen this happen on Solaris 2.6 as well but it is extremely rare.
On the solaris machine there was no automount daemon to restart so I was
forced to reboot it to regain access to the 'missing' mountpoint.

If you've read this far, what I'm trying to say is that having userspace
related code remain in userland is a good thing since you can restart
the daemon if something goes wrong.  If you move all of this to
kernel-space you can't do anything about it if there is a problem.  In
Solaris there is a command called 'automount' that tells the kernel to
re-read the automount maps, perhaps it resets the autofs subsystem in
the kernel as well.  If linux autofs had the same capability we might
not need the daemon, but until then, having the daemon in userland is a
good thing.


end of thread, other threads:[~2004-01-14 15:58 UTC | newest]

Thread overview: 85+ messages
2004-01-06 19:55 [RFC] Towards a Modern Autofs Mike Waychison
2004-01-06 21:01 ` [autofs] " H. Peter Anvin
2004-01-06 21:44   ` Mike Waychison
2004-01-06 21:50   ` Tim Hockin
2004-01-06 22:06     ` H. Peter Anvin
2004-01-06 22:17       ` Tim Hockin
     [not found]       ` <20040106221502.GA7398@hockin.org>
2004-01-06 22:20         ` H. Peter Anvin
2004-01-07 16:19           ` Mike Waychison
2004-01-07 17:55             ` H. Peter Anvin
2004-01-07 21:13               ` Mike Waychison
2004-01-06 22:28       ` name spaces good (was: [autofs] [RFC] Towards a Modern Autofs) Dax Kelson
2004-01-06 22:48         ` name spaces good H. Peter Anvin
2004-01-07 21:14 ` [autofs] [RFC] Towards a Modern Autofs Jim Carter
2004-01-07 22:55   ` Mike Waychison
2004-01-08 12:00     ` Ian Kent
2004-01-08 15:39       ` Mike Waychison
2004-01-09 18:20         ` Ian Kent
2004-01-09 20:06           ` Mike Waychison
2004-01-10  5:43             ` Ian Kent
2004-01-12 13:07               ` Mike Waychison
2004-01-12 16:01                 ` raven
2004-01-12 16:26                   ` Mike Waychison
2004-01-12 22:50                     ` Tim Hockin
2004-01-12 23:28                       ` Mike Waychison
2004-01-13  1:30                       ` Ian Kent
2004-01-12 16:28                   ` raven
2004-01-12 16:58                     ` Mike Waychison
2004-01-13  1:54                       ` Ian Kent
2004-01-13 19:01                         ` Mike Waychison
2004-01-14 15:58                           ` raven
2004-01-13 18:46                   ` Mike Waychison
2004-01-09 20:51           ` Jim Carter
2004-01-10  5:56             ` Ian Kent
2004-01-08 17:34       ` H. Peter Anvin
2004-01-08 19:41         ` Mike Waychison
2004-01-08 23:42         ` Michael Clark
2004-01-09 20:28           ` Mike Waychison
2004-01-09 20:54             ` H. Peter Anvin
2004-01-09 21:43               ` Mike Waychison
2004-01-09 18:32         ` Ian Kent
2004-01-09 20:52           ` Mike Waychison
2004-01-10  6:05             ` Ian Kent
2004-01-08 12:29     ` Olivier Galibert
2004-01-08 13:20       ` Robin Rosenberg
2004-01-08 16:23       ` Mike Waychison
2004-01-08 12:35     ` Ian Kent
2004-01-08 13:08       ` Ian Kent
2004-01-08 18:20     ` Jim Carter
2004-01-08 21:01       ` H. Peter Anvin
2004-01-08  0:48   ` Ian Kent
2004-01-06 22:28 Ogden, Aaron A.
2004-01-06 22:41 ` Mike Fedyk
2004-01-06 22:47 ` Tim Hockin
2004-01-06 22:53 ` Paul Raines
2004-01-07 23:14 ` Jim Carter
2004-01-07 23:32   ` H. Peter Anvin
2004-01-08 12:52     ` Ian Kent
2004-01-08 18:31       ` viro
2004-01-09 18:43         ` Ian Kent
2004-01-09 19:41         ` Mike Waychison
2004-01-09 19:57           ` H. Peter Anvin
2004-01-09 21:31             ` Mike Waychison
2004-01-09 21:36               ` H. Peter Anvin
2004-01-06 23:34 Ogden, Aaron A.
2004-01-06 23:47 ` Tim Hockin
     [not found] <1b5GC-29h-1@gated-at.bofh.it>
     [not found] ` <1b6CO-3v0-15@gated-at.bofh.it>
2004-01-07  4:21   ` Andi Kleen
2004-01-07 17:50     ` H. Peter Anvin
2004-01-07 21:04       ` Mike Waychison
2004-01-07 21:11         ` Mike Fedyk
2004-01-07 23:40           ` Jesper Juhl
2004-01-07 21:24         ` Jeff Garzik
2004-01-07 23:47           ` Mike Waychison
2004-01-07 23:56             ` Jeff Garzik
2004-01-12 16:57               ` Mike Waychison
2004-01-13  7:39                 ` Ian Kent
2004-01-08 19:32 trond.myklebust
2004-01-08 19:41 ` H. Peter Anvin
2004-01-08 20:08   ` trond.myklebust
2004-01-08 21:13     ` H. Peter Anvin
2004-01-08 22:20       ` J. Bruce Fields
2004-01-08 22:24         ` H. Peter Anvin
2004-01-09 20:37       ` Mike Waychison
2004-01-09 21:02         ` H. Peter Anvin
2004-01-09 21:52           ` Mike Waychison
2004-01-09 20:16   ` Mike Waychison
