linux-fsdevel.vger.kernel.org archive mirror
* [PATCH] autofs4 - remove hashed check in validate_wait()
@ 2009-06-08  3:25 Ian Kent
  2009-06-08  5:08 ` Andrew Morton
  0 siblings, 1 reply; 6+ messages in thread
From: Ian Kent @ 2009-06-08  3:25 UTC (permalink / raw)
  To: Andrew Morton; +Cc: autofs mailing list, linux-fsdevel, Kernel Mailing List

The recent ->lookup() deadlock correction required the directory
inode mutex to be dropped while waiting for expire completion. We
were concerned about side effects from this change and one has
been identified.

When checking if a mount has already completed prior to adding a
new mount request to the wait queue we check if the dentry is hashed
and, if so, if it is a mount point. But, if a mount successfully
completed while we slept on the wait queue mutex the dentry must
exist for the mount to have completed so the test is not really
needed.

Mounts can also be done on top of a global root dentry, so for the
above case, where a mount request completes and the wait queue entry
has already been removed, the hashed test returning false can cause
an incorrect callback to the daemon. Also, d_mountpoint() is not
sufficient to check if a mount has completed for the multi-mount
case when we don't have a real mount at the base of the tree.

Signed-off-by: Ian Kent <raven@themaw.net>
---

 fs/autofs4/waitq.c |   22 ++++++++--------------
 1 files changed, 8 insertions(+), 14 deletions(-)

diff --git a/fs/autofs4/waitq.c b/fs/autofs4/waitq.c
index eeb2468..2341375 100644
--- a/fs/autofs4/waitq.c
+++ b/fs/autofs4/waitq.c
@@ -297,20 +297,14 @@ static int validate_request(struct autofs_wait_queue **wait,
 	 */
 	if (notify == NFY_MOUNT) {
 		/*
-		 * If the dentry isn't hashed just go ahead and try the
-		 * mount again with a new wait (not much else we can do).
-		*/
-		if (!d_unhashed(dentry)) {
-			/*
-			 * But if the dentry is hashed, that means that we
-			 * got here through the revalidate path.  Thus, we
-			 * need to check if the dentry has been mounted
-			 * while we waited on the wq_mutex. If it has,
-			 * simply return success.
-			 */
-			if (d_mountpoint(dentry))
-				return 0;
-		}
+		 * If the dentry was successfully mounted while we slept
+		 * on the wait queue mutex we can return success. If it
+		 * isn't mounted (doesn't have submounts for the case of
+		 * a multi-mount with no mount at its base) we can
+		 * continue on and create a new request.
+		 */
+		if (have_submounts(dentry))
+			return 0;
 	}
 
 	return 1;

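For context, a minimal sketch of the semantics the new test relies on,
assuming the 2.6.30-era dcache helpers (the wrapper and its name below
are illustrative, not code from the tree):

#include <linux/dcache.h>

/*
 * d_mountpoint(dentry) is true only when a mount sits directly on the
 * dentry itself. have_submounts(dentry) also walks the subtree below,
 * so it is true when the dentry or any dentry beneath it is a mount
 * point.
 */
static int autofs4_mount_completed(struct dentry *dentry)
{
	/*
	 * A multi-mount may have nothing mounted on its base dentry,
	 * only on offset directories below it, so d_mountpoint() on
	 * the base stays false even after the daemon has finished.
	 * have_submounts() still sees the offset mounts and returns
	 * true, which is why validate_request() can use it alone.
	 */
	return have_submounts(dentry);
}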


* Re: [PATCH] autofs4 - remove hashed check in validate_wait()
  2009-06-08  3:25 [PATCH] autofs4 - remove hashed check in validate_wait() Ian Kent
@ 2009-06-08  5:08 ` Andrew Morton
  2009-06-08  5:23   ` Ian Kent
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2009-06-08  5:08 UTC (permalink / raw)
  To: Ian Kent; +Cc: autofs mailing list, linux-fsdevel, Kernel Mailing List

On Mon, 08 Jun 2009 11:25:37 +0800 Ian Kent <raven@themaw.net> wrote:

> The recent ->lookup() deadlock correction required the directory
> inode mutex to be dropped while waiting for expire completion. We
> were concerned about side effects from this change and one has
> been identified.
> 
> When checking if a mount has already completed prior to adding a
> new mount request to the wait queue we check if the dentry is hashed
> and, if so, if it is a mount point. But, if a mount successfully
> completed while we slept on the wait queue mutex the dentry must
> exist for the mount to have completed so the test is not really
> needed.
> 
> Mounts can also be done on top of a global root dentry, so for the
> above case, where a mount request completes and the wait queue entry
> has already been removed, the hashed test returning false can cause
> an incorrect callback to the daemon. Also, d_mountpoint() is not
> sufficient to check if a mount has completed for the multi-mount
> case when we don't have a real mount at the base of the tree.
> 

I've been scratching my head trying to work out if this is a
needed-in-2.6.30 fix, but all I got was a bald spot.  Help?


* Re: [PATCH] autofs4 - remove hashed check in validate_wait()
  2009-06-08  5:08 ` Andrew Morton
@ 2009-06-08  5:23   ` Ian Kent
  2009-06-08  5:41     ` Andrew Morton
  0 siblings, 1 reply; 6+ messages in thread
From: Ian Kent @ 2009-06-08  5:23 UTC (permalink / raw)
  To: Andrew Morton; +Cc: autofs mailing list, linux-fsdevel, Kernel Mailing List

Andrew Morton wrote:
> On Mon, 08 Jun 2009 11:25:37 +0800 Ian Kent <raven@themaw.net> wrote:
> 
>> The recent ->lookup() deadlock correction required the directory
>> inode mutex to be dropped while waiting for expire completion. We
>> were concerned about side effects from this change and one has
>> been identified.
>>
>> When checking if a mount has already completed prior to adding a
>> new mount request to the wait queue we check if the dentry is hashed
>> and, if so, if it is a mount point. But, if a mount successfully
>> completed while we slept on the wait queue mutex the dentry must
>> exist for the mount to have completed so the test is not really
>> needed.
>>
>> Mounts can also be done on top of a global root dentry, so for the
>> above case, where a mount request completes and the wait queue entry
>> has already been removed, the hashed test returning false can cause
>> an incorrect callback to the daemon. Also, d_mountpoint() is not
>> sufficient to check if a mount has completed for the multi-mount
>> case when we don't have a real mount at the base of the tree.
>>
> 
> I've been scratching my head trying to work out if this is a
> needed-in-2.6.30 fix, but all I got was a bald spot.  Help?

Yeah, and why would you want to know that much about autofs? It's a
wonder I have any hair at all. ;)

I think so if possible, as it resolves an issue that is a side effect of
the last patch I sent, which resolved a deadlock in ->lookup(). The
problem occurs due to dropping the directory inode mutex before waiting
for an expire. What isn't obvious is that holding the mutex (as we did
previously) causes processes wanting to request mounts for other
directories to wait, so we don't see the contention for the mount
request wait queue that this patch addresses.

However, the issue only surfaces when there are a number of processes
all trying to perform mounts at the same time. The test I ran used 10
processes all using the same map.
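
Roughly, the interleaving looks like this (a sketch of my reading, not
the real code; find_wait() is a hypothetical helper, while wq_mutex and
the have_submounts() test are real):

#include <linux/dcache.h>
#include "autofs_i.h"

/*
 * Process A                        Process B
 * ---------                        ---------
 * takes wq_mutex
 * queues wait, daemon mounts,
 * wait entry freed
 * drops wq_mutex
 *                                  takes wq_mutex (was sleeping on it)
 *                                  finds no wait entry for the name:
 *                                  completed already, or never
 *                                  requested?
 */
static int sketch_wait(struct autofs_sb_info *sbi,
		       struct dentry *dentry, struct qstr *qstr)
{
	struct autofs_wait_queue *wq;

	mutex_lock(&sbi->wq_mutex);	/* B sleeps here while A finishes */
	wq = find_wait(sbi, qstr);	/* hypothetical lookup helper */
	if (!wq && have_submounts(dentry)) {
		/* mount completed while we slept on wq_mutex */
		mutex_unlock(&sbi->wq_mutex);
		return 0;		/* success, no daemon callback */
	}
	/* otherwise queue a new wait here and notify the daemon */
	mutex_unlock(&sbi->wq_mutex);
	return 1;
}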

Ian


* Re: [PATCH] autofs4 - remove hashed check in validate_wait()
  2009-06-08  5:23   ` Ian Kent
@ 2009-06-08  5:41     ` Andrew Morton
  2009-06-08  5:53       ` Ian Kent
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2009-06-08  5:41 UTC (permalink / raw)
  To: Ian Kent; +Cc: autofs mailing list, linux-fsdevel, Kernel Mailing List

On Mon, 08 Jun 2009 13:23:40 +0800 Ian Kent <raven@themaw.net> wrote:

> Andrew Morton wrote:
> > On Mon, 08 Jun 2009 11:25:37 +0800 Ian Kent <raven@themaw.net> wrote:
> > 
> >> The recent ->lookup() deadlock correction required the directory
> >> inode mutex to be dropped while waiting for expire completion. We
> >> were concerned about side effects from this change and one has
> >> been identified.
> >>
> >> When checking if a mount has already completed prior to adding a
> >> new mount request to the wait queue we check if the dentry is hashed
> >> and, if so, if it is a mount point. But, if a mount successfully
> >> completed while we slept on the wait queue mutex the dentry must
> >> exist for the mount to have completed so the test is not really
> >> needed.
> >>
> >> Mounts can also be done on top of a global root dentry, so for the
> >> above case, where a mount request completes and the wait queue entry
> >> has already been removed, the hashed test returning false can cause
> >> an incorrect callback to the daemon. Also, d_mountpoint() is not
> >> sufficient to check if a mount has completed for the multi-mount
> >> case when we don't have a real mount at the base of the tree.
> >>
> > 
> > I've been scratching my head trying to work out if this is a
> > needed-in-2.6.30 fix, but all I got was a bald spot.  Help?
> 
> Yeah, and why would you want to know that much about autofs? It's a
> wonder I have any hair at all. ;)
> 
> I think so if possible, as it resolves an issue that is a side effect of
> the last patch I sent, which resolved a deadlock in ->lookup(). The
> problem occurs due to dropping the directory inode mutex before waiting
> for an expire. What isn't obvious is that holding the mutex (as we did
> previously) causes processes wanting to request mounts for other
> directories to wait, so we don't see the contention for the mount
> request wait queue that this patch addresses.
> 
> However, the issue only surfaces when there are a number of processes
> all trying to perform mounts at the same time. The test I ran used 10
> processes all using the same map.

<scratch scratch> 
What are the user-visible effects of the thing-which-this-fixed?


* Re: [PATCH] autofs4 - remove hashed check in validate_wait()
  2009-06-08  5:41     ` Andrew Morton
@ 2009-06-08  5:53       ` Ian Kent
  2009-06-08  5:59         ` Ian Kent
  0 siblings, 1 reply; 6+ messages in thread
From: Ian Kent @ 2009-06-08  5:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: autofs mailing list, linux-fsdevel, Kernel Mailing List

Andrew Morton wrote:
> On Mon, 08 Jun 2009 13:23:40 +0800 Ian Kent <raven@themaw.net> wrote:
> 
>> Andrew Morton wrote:
>>> On Mon, 08 Jun 2009 11:25:37 +0800 Ian Kent <raven@themaw.net> wrote:
>>>
>>>> The recent ->lookup() deadlock correction required the directory
>>>> inode mutex to be dropped while waiting for expire completion. We
>>>> were concerned about side effects from this change and one has
>>>> been identified.
>>>>
>>>> When checking if a mount has already completed prior to adding a
>>>> new mount request to the wait queue we check if the dentry is hashed
>>>> and, if so, if it is a mount point. But, if a mount successfully
>>>> completed while we slept on the wait queue mutex the dentry must
>>>> exist for the mount to have completed so the test is not really
>>>> needed.
>>>>
>>>> Mounts can also be done on top of a global root dentry, so for the
>>>> above case, where a mount request completes and the wait queue entry
>>>> has already been removed, the hashed test returning false can cause
>>>> an incorrect callback to the daemon. Also, d_mountpoint() is not
>>>> sufficient to check if a mount has completed for the multi-mount
>>>> case when we don't have a real mount at the base of the tree.
>>>>
>>> I've been scratching my head trying to work out if this is a
>>> needed-in-2.6.30 fix, but all I got was a bald spot.  Help?
>> Yeah, and why would you want to know that much about autofs? It's a
>> wonder I have any hair at all. ;)
>>
>> I think so if possible, as it resolves an issue that is a side effect of
>> the last patch I sent, which resolved a deadlock in ->lookup(). The
>> problem occurs due to dropping the directory inode mutex before waiting
>> for an expire. What isn't obvious is that holding the mutex (as we did
>> previously) causes processes wanting to request mounts for other
>> directories to wait, so we don't see the contention for the mount
>> request wait queue that this patch addresses.
>>
>> However, the issue only surfaces when there are a number of processes
>> all trying to perform mounts at the same time. The test I ran used 10
>> processes all using the same map.
> 
> <scratch scratch> 
> What are the user-visible effects of the thing-which-this-fixed?

I saw several error messages.

They cause autofs to become quite confused and don't really point to the
actual problem.

Things like:

handle_packet_missing_direct:1376: can't find map entry for (43,1827932)

which is usually totally fatal (although in this case it wouldn't be,
except that I treat it as such because it normally is).

do_mount_direct: direct trigger not valid or already mounted
/test/nested/g3c/s1/ss1

which is recoverable; however, if this problem is at play it can cause
autofs to become quite confused about the dependencies in the mount tree,
because mount triggers end up mounted multiple times. It's hard to
accurately check for this over-mounting case, and automount shouldn't
need to if the kernel module is doing its job.

There was one other message, similar in consequence to this last one,
but I can't locate a log example just now.

Ian


* Re: [PATCH] autofs4 - remove hashed check in validate_wait()
  2009-06-08  5:53       ` Ian Kent
@ 2009-06-08  5:59         ` Ian Kent
  0 siblings, 0 replies; 6+ messages in thread
From: Ian Kent @ 2009-06-08  5:59 UTC (permalink / raw)
  To: Andrew Morton; +Cc: autofs mailing list, linux-fsdevel, Kernel Mailing List

Ian Kent wrote:
> Andrew Morton wrote:
>> On Mon, 08 Jun 2009 13:23:40 +0800 Ian Kent <raven@themaw.net> wrote:
>>
>>> Andrew Morton wrote:
>>>> On Mon, 08 Jun 2009 11:25:37 +0800 Ian Kent <raven@themaw.net> wrote:
>>>>
>>>>> The recent ->lookup() deadlock correction required the directory
>>>>> inode mutex to be dropped while waiting for expire completion. We
>>>>> were concerned about side effects from this change and one has
>>>>> been identified.
>>>>>
>>>>> When checking if a mount has already completed prior to adding a
>>>>> new mount request to the wait queue we check if the dentry is hashed
>>>>> and, if so, if it is a mount point. But, if a mount successfully
>>>>> completed while we slept on the wait queue mutex the dentry must
>>>>> exist for the mount to have completed so the test is not really
>>>>> needed.
>>>>>
>>>>> Mounts can also be done on top of a global root dentry, so for the
>>>>> above case, where a mount request completes and the wait queue entry
>>>>> has already been removed, the hashed test returning false can cause
>>>>> an incorrect callback to the daemon. Also, d_mountpoint() is not
>>>>> sufficient to check if a mount has completed for the multi-mount
>>>>> case when we don't have a real mount at the base of the tree.
>>>>>
>>>> I've been scratching my head trying to work out if this is a
>>>> needed-in-2.6.30 fix, but all I got was a bald spot.  Help?
>>> Yeah, and why would you want to know that much about autofs? It's a
>>> wonder I have any hair at all. ;)
>>>
>>> I think so if possible, as it resolves an issue that is a side effect of
>>> the last patch I sent, which resolved a deadlock in ->lookup(). The
>>> problem occurs due to dropping the directory inode mutex before waiting
>>> for an expire. What isn't obvious is that holding the mutex (as we did
>>> previously) causes processes wanting to request mounts for other
>>> directories to wait, so we don't see the contention for the mount
>>> request wait queue that this patch addresses.
>>>
>>> However, the issue only surfaces when there are a number of processes
>>> all trying to perform mounts at the same time. The test I ran used 10
>>> processes all using the same map.
>> <scratch scratch> 
>> What are the user-visible effects of the thing-which-this-fixed?
> 
> I saw several error messages.
> 
> They cause autofs to become quite confused and don't really point to the
> actual problem.
> 
> Things like:
> 
> handle_packet_missing_direct:1376: can't find map entry for (43,1827932)
> 
> which is usually totally fatal (although in this case it wouldn't be,
> except that I treat it as such because it normally is).

Oh ... and the requesting user always receives a failure in this case,
in spite of the mount actually having previously succeeded.

> 
> do_mount_direct: direct trigger not valid or already mounted
> /test/nested/g3c/s1/ss1
> 
> which is recoverable; however, if this problem is at play it can cause
> autofs to become quite confused about the dependencies in the mount tree,
> because mount triggers end up mounted multiple times. It's hard to
> accurately check for this over-mounting case, and automount shouldn't
> need to if the kernel module is doing its job.
> 
> There was one other message, similar in consequence to this last one,
> but I can't locate a log example just now.

And these two cause failures to be returned to the requesting process,
either immediately or shortly afterwards, as autofs becomes confused.

Sorry I can't give a clearer description.

Ian


