linux-kernel.vger.kernel.org archive mirror
* autofs4 hang in 2.6.37-rc1
@ 2010-11-14 12:55 Avi Kivity
  2010-11-14 13:51 ` Avi Kivity
  0 siblings, 1 reply; 15+ messages in thread
From: Avi Kivity @ 2010-11-14 12:55 UTC (permalink / raw)
  To: Ian Kent, autofs; +Cc: linux-kernel

A 2.6.37-rc1 (f6614b7bb405a) setup hangs in autofs4:

INFO: task automount:10399 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
automount     D 0000000000000000     0 10399      1 0x00000000
  ffff88012a059da8 0000000000000082 0000000000000000 0000000000000000
  ffff88012a058010 ffff88012a059fd8 0000000000011800 ffff8801290f2c40
  ffff8801290f2fb0 ffff8801290f2fa8 0000000000011800 0000000000011800
Call Trace:
  [<ffffffff813a3c3c>] __mutex_lock_common+0x126/0x18b
  [<ffffffff813a3cb5>] __mutex_lock_slowpath+0x14/0x16
  [<ffffffff813a3dbc>] mutex_lock+0x31/0x4b
  [<ffffffff811b17f7>] autofs4_root_ioctl+0x28/0x53
  [<ffffffff810f5e5c>] do_vfs_ioctl+0x557/0x5bb
  [<ffffffff810eca65>] ? sys_newlstat+0x2d/0x38
  [<ffffffff810f5f02>] sys_ioctl+0x42/0x65
  [<ffffffff81002b42>] system_call_fastpath+0x16/0x1b

This is a simple /home automount.  More info on request.

-- 
error compiling committee.c: too many arguments to function



* Re: autofs4 hang in 2.6.37-rc1
  2010-11-14 12:55 autofs4 hang in 2.6.37-rc1 Avi Kivity
@ 2010-11-14 13:51 ` Avi Kivity
  2010-11-14 15:15   ` Arnd Bergmann
  0 siblings, 1 reply; 15+ messages in thread
From: Avi Kivity @ 2010-11-14 13:51 UTC (permalink / raw)
  To: Ian Kent, autofs; +Cc: linux-kernel

On 11/14/2010 02:55 PM, Avi Kivity wrote:
> A 2.6.37-rc1 (f6614b7bb405a) setup hangs in autofs4:
>
> INFO: task automount:10399 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> automount     D 0000000000000000     0 10399      1 0x00000000
>  ffff88012a059da8 0000000000000082 0000000000000000 0000000000000000
>  ffff88012a058010 ffff88012a059fd8 0000000000011800 ffff8801290f2c40
>  ffff8801290f2fb0 ffff8801290f2fa8 0000000000011800 0000000000011800
> Call Trace:
>  [<ffffffff813a3c3c>] __mutex_lock_common+0x126/0x18b
>  [<ffffffff813a3cb5>] __mutex_lock_slowpath+0x14/0x16
>  [<ffffffff813a3dbc>] mutex_lock+0x31/0x4b
>  [<ffffffff811b17f7>] autofs4_root_ioctl+0x28/0x53
>  [<ffffffff810f5e5c>] do_vfs_ioctl+0x557/0x5bb
>  [<ffffffff810eca65>] ? sys_newlstat+0x2d/0x38
>  [<ffffffff810f5f02>] sys_ioctl+0x42/0x65
>  [<ffffffff81002b42>] system_call_fastpath+0x16/0x1b
>
> This is a simple /home automount.  More info on request.
>

Likely culprit:


automount     S ffff88012a28a680     0   399      1 0x00000000
  ffff88012a07bd08 0000000000000082 0000000000000000 0000000000000000
  ffff88012a07a010 ffff88012a07bfd8 0000000000011800 ffff88012693c260
  ffff88012693c5d0 ffff88012693c5c8 0000000000011800 0000000000011800
Call Trace:
  [<ffffffff81056197>] ? prepare_to_wait+0x67/0x74
  [<ffffffff811b23eb>] autofs4_wait+0x5a4/0x6d5
  [<ffffffff81055f25>] ? autoremove_wake_function+0x0/0x34
  [<ffffffff811b2ba5>] autofs4_do_expire_multi+0x5b/0xa3
  [<ffffffff811b2c39>] autofs4_expire_multi+0x4c/0x54
  [<ffffffff811b1750>] autofs4_root_ioctl_unlocked+0x23e/0x252
  [<ffffffff811b1808>] autofs4_root_ioctl+0x39/0x53
  [<ffffffff810f5e5c>] do_vfs_ioctl+0x557/0x5bb
  [<ffffffff810ca644>] ? remove_vma+0x6e/0x76
  [<ffffffff810cb6a2>] ? do_munmap+0x31c/0x33e
  [<ffffffff810f5f02>] sys_ioctl+0x42/0x65
  [<ffffffff81002b42>] system_call_fastpath+0x16/0x1b


Shouldn't we drop autofs4_ioctl_mutex while we wait?

-- 
error compiling committee.c: too many arguments to function



* Re: autofs4 hang in 2.6.37-rc1
  2010-11-14 13:51 ` Avi Kivity
@ 2010-11-14 15:15   ` Arnd Bergmann
  2010-11-14 15:34     ` Avi Kivity
  2010-11-15  1:31     ` Ian Kent
  0 siblings, 2 replies; 15+ messages in thread
From: Arnd Bergmann @ 2010-11-14 15:15 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Ian Kent, autofs, linux-kernel

On Sunday 14 November 2010 14:51:04 Avi Kivity wrote:
> automount     S ffff88012a28a680     0   399      1 0x00000000
>   ffff88012a07bd08 0000000000000082 0000000000000000 0000000000000000
>   ffff88012a07a010 ffff88012a07bfd8 0000000000011800 ffff88012693c260
>   ffff88012693c5d0 ffff88012693c5c8 0000000000011800 0000000000011800
> Call Trace:
>   [<ffffffff81056197>] ? prepare_to_wait+0x67/0x74
>   [<ffffffff811b23eb>] autofs4_wait+0x5a4/0x6d5
>   [<ffffffff81055f25>] ? autoremove_wake_function+0x0/0x34
>   [<ffffffff811b2ba5>] autofs4_do_expire_multi+0x5b/0xa3
>   [<ffffffff811b2c39>] autofs4_expire_multi+0x4c/0x54
>   [<ffffffff811b1750>] autofs4_root_ioctl_unlocked+0x23e/0x252
>   [<ffffffff811b1808>] autofs4_root_ioctl+0x39/0x53
>   [<ffffffff810f5e5c>] do_vfs_ioctl+0x557/0x5bb
>   [<ffffffff810ca644>] ? remove_vma+0x6e/0x76
>   [<ffffffff810cb6a2>] ? do_munmap+0x31c/0x33e
>   [<ffffffff810f5f02>] sys_ioctl+0x42/0x65
>   [<ffffffff81002b42>] system_call_fastpath+0x16/0x1b
> 
> 
> Shouldn't we drop autofs4_ioctl_mutex while we wait?

If the ioctl can sleep for multiple seconds, the mutex should
indeed be dropped, and that would be safe because we used to
do the same with the BKL.

The question is why this would sleep for more than 120 seconds.

	Arnd


* Re: autofs4 hang in 2.6.37-rc1
  2010-11-14 15:15   ` Arnd Bergmann
@ 2010-11-14 15:34     ` Avi Kivity
  2010-11-15  1:45       ` Ian Kent
  2010-11-15  1:31     ` Ian Kent
  1 sibling, 1 reply; 15+ messages in thread
From: Avi Kivity @ 2010-11-14 15:34 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Ian Kent, autofs, linux-kernel

On 11/14/2010 05:15 PM, Arnd Bergmann wrote:
> On Sunday 14 November 2010 14:51:04 Avi Kivity wrote:
> >  automount     S ffff88012a28a680     0   399      1 0x00000000
> >    ffff88012a07bd08 0000000000000082 0000000000000000 0000000000000000
> >    ffff88012a07a010 ffff88012a07bfd8 0000000000011800 ffff88012693c260
> >    ffff88012693c5d0 ffff88012693c5c8 0000000000011800 0000000000011800
> >  Call Trace:
> >    [<ffffffff81056197>] ? prepare_to_wait+0x67/0x74
> >    [<ffffffff811b23eb>] autofs4_wait+0x5a4/0x6d5
> >    [<ffffffff81055f25>] ? autoremove_wake_function+0x0/0x34
> >    [<ffffffff811b2ba5>] autofs4_do_expire_multi+0x5b/0xa3
> >    [<ffffffff811b2c39>] autofs4_expire_multi+0x4c/0x54
> >    [<ffffffff811b1750>] autofs4_root_ioctl_unlocked+0x23e/0x252
> >    [<ffffffff811b1808>] autofs4_root_ioctl+0x39/0x53
> >    [<ffffffff810f5e5c>] do_vfs_ioctl+0x557/0x5bb
> >    [<ffffffff810ca644>] ? remove_vma+0x6e/0x76
> >    [<ffffffff810cb6a2>] ? do_munmap+0x31c/0x33e
> >    [<ffffffff810f5f02>] sys_ioctl+0x42/0x65
> >    [<ffffffff81002b42>] system_call_fastpath+0x16/0x1b
> >
> >
> >  Shouldn't we drop autofs4_ioctl_mutex while we wait?
>
> If the ioctl can sleep for multiple seconds, the mutex should
> indeed be dropped, and that would be safe because we used to
> do the same with the BKL.
>
> The question is why this would sleep for more than 120 seconds.
>

Let's fix first and ask questions later.

-- 
error compiling committee.c: too many arguments to function



* Re: autofs4 hang in 2.6.37-rc1
  2010-11-14 15:15   ` Arnd Bergmann
  2010-11-14 15:34     ` Avi Kivity
@ 2010-11-15  1:31     ` Ian Kent
  2010-11-15  9:02       ` Avi Kivity
  1 sibling, 1 reply; 15+ messages in thread
From: Ian Kent @ 2010-11-15  1:31 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Avi Kivity, autofs, linux-kernel

On Sun, 2010-11-14 at 16:15 +0100, Arnd Bergmann wrote:
> On Sunday 14 November 2010 14:51:04 Avi Kivity wrote:
> > automount     S ffff88012a28a680     0   399      1 0x00000000
> >   ffff88012a07bd08 0000000000000082 0000000000000000 0000000000000000
> >   ffff88012a07a010 ffff88012a07bfd8 0000000000011800 ffff88012693c260
> >   ffff88012693c5d0 ffff88012693c5c8 0000000000011800 0000000000011800
> > Call Trace:
> >   [<ffffffff81056197>] ? prepare_to_wait+0x67/0x74
> >   [<ffffffff811b23eb>] autofs4_wait+0x5a4/0x6d5
> >   [<ffffffff81055f25>] ? autoremove_wake_function+0x0/0x34
> >   [<ffffffff811b2ba5>] autofs4_do_expire_multi+0x5b/0xa3
> >   [<ffffffff811b2c39>] autofs4_expire_multi+0x4c/0x54
> >   [<ffffffff811b1750>] autofs4_root_ioctl_unlocked+0x23e/0x252
> >   [<ffffffff811b1808>] autofs4_root_ioctl+0x39/0x53
> >   [<ffffffff810f5e5c>] do_vfs_ioctl+0x557/0x5bb
> >   [<ffffffff810ca644>] ? remove_vma+0x6e/0x76
> >   [<ffffffff810cb6a2>] ? do_munmap+0x31c/0x33e
> >   [<ffffffff810f5f02>] sys_ioctl+0x42/0x65
> >   [<ffffffff81002b42>] system_call_fastpath+0x16/0x1b
> > 
> > 
> > Shouldn't we drop autofs4_ioctl_mutex while we wait?
> 
> If the ioctl can sleep for multiple seconds, the mutex should
> indeed be dropped, and that would be safe because we used to
> do the same with the BKL.
> 
> The question is why this would sleep for more than 120 seconds.

umount against a server that isn't responding can easily take more than
2 minutes.
 
Ian



* Re: autofs4 hang in 2.6.37-rc1
  2010-11-14 15:34     ` Avi Kivity
@ 2010-11-15  1:45       ` Ian Kent
  2010-11-15  8:54         ` Arnd Bergmann
  0 siblings, 1 reply; 15+ messages in thread
From: Ian Kent @ 2010-11-15  1:45 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Arnd Bergmann, autofs, linux-kernel

On Sun, 2010-11-14 at 17:34 +0200, Avi Kivity wrote:
> On 11/14/2010 05:15 PM, Arnd Bergmann wrote:
> > On Sunday 14 November 2010 14:51:04 Avi Kivity wrote:
> > >  automount     S ffff88012a28a680     0   399      1 0x00000000
> > >    ffff88012a07bd08 0000000000000082 0000000000000000 0000000000000000
> > >    ffff88012a07a010 ffff88012a07bfd8 0000000000011800 ffff88012693c260
> > >    ffff88012693c5d0 ffff88012693c5c8 0000000000011800 0000000000011800
> > >  Call Trace:
> > >    [<ffffffff81056197>] ? prepare_to_wait+0x67/0x74
> > >    [<ffffffff811b23eb>] autofs4_wait+0x5a4/0x6d5
> > >    [<ffffffff81055f25>] ? autoremove_wake_function+0x0/0x34
> > >    [<ffffffff811b2ba5>] autofs4_do_expire_multi+0x5b/0xa3
> > >    [<ffffffff811b2c39>] autofs4_expire_multi+0x4c/0x54
> > >    [<ffffffff811b1750>] autofs4_root_ioctl_unlocked+0x23e/0x252
> > >    [<ffffffff811b1808>] autofs4_root_ioctl+0x39/0x53
> > >    [<ffffffff810f5e5c>] do_vfs_ioctl+0x557/0x5bb
> > >    [<ffffffff810ca644>] ? remove_vma+0x6e/0x76
> > >    [<ffffffff810cb6a2>] ? do_munmap+0x31c/0x33e
> > >    [<ffffffff810f5f02>] sys_ioctl+0x42/0x65
> > >    [<ffffffff81002b42>] system_call_fastpath+0x16/0x1b
> > >
> > >
> > >  Shouldn't we drop autofs4_ioctl_mutex while we wait?
> >
> > If the ioctl can sleep for multiple seconds, the mutex should
> > indeed be dropped, and that would be safe because we used to
> > do the same with the BKL.
> >
> > The question is why this would sleep for more than 120 seconds.
> >
> 
> Let's fix first and ask questions later.

You can't hold an exclusive mutex during an autofs expire because the
daemon will start by calling the ioctl to check for a dentry to expire
then call back to the daemon to perform the umount and wait for a status
return (also an ioctl).

From memory the expire is the only ioctl that is sensitive to this
deadlock.

So, either the mutex must be released while waiting for the status
return or get rid of the autofs4_ioctl_mutex altogether.
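
The deadlock described above can be modelled in userspace. The following
is a minimal sketch, not the autofs4 code: ioctl_mutex is a hypothetical
stand-in for autofs4_ioctl_mutex, expire_ioctl() and status_ioctl() are
made-up names for the expire and status-return paths, and
pthread_mutex_trylock() is an assumption used so that the demo reports
the blocked case instead of hanging.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for autofs4_ioctl_mutex. */
static pthread_mutex_t ioctl_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Models the status-return ioctl the daemon issues after the umount.
 * trylock is a demo-only assumption: the real path would block in
 * mutex_lock(); here we only report that the caller could not get in.
 */
static bool status_ioctl(void)
{
	if (pthread_mutex_trylock(&ioctl_mutex) != 0)
		return false;	/* the expire path still holds the mutex */
	pthread_mutex_unlock(&ioctl_mutex);
	return true;
}

/* Models the daemon side: perform the umount, then report status. */
static void *daemon_thread(void *arg)
{
	bool *delivered = arg;

	*delivered = status_ioctl();
	return NULL;
}

/*
 * Models the expire ioctl. Returns true if the daemon's status ioctl got
 * through, false if it was locked out (where the real system would hang).
 */
static bool expire_ioctl(bool drop_mutex_while_waiting)
{
	pthread_t daemon;
	bool delivered = false;
	int rc;

	pthread_mutex_lock(&ioctl_mutex);
	if (drop_mutex_while_waiting)
		pthread_mutex_unlock(&ioctl_mutex);

	/* The kernel sleeps waiting for the daemon here; the demo joins it. */
	rc = pthread_create(&daemon, NULL, daemon_thread, &delivered);
	assert(rc == 0 && "pthread_create failed");
	if (rc == 0)
		pthread_join(daemon, NULL);

	if (!drop_mutex_while_waiting)
		pthread_mutex_unlock(&ioctl_mutex);
	return delivered;
}
```

In this model expire_ioctl(false) returns false because the status call
is locked out for as long as the expire path holds the mutex, while
expire_ioctl(true) returns true.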

Ian




* Re: autofs4 hang in 2.6.37-rc1
  2010-11-15  1:45       ` Ian Kent
@ 2010-11-15  8:54         ` Arnd Bergmann
  2010-11-15 13:22           ` Ian Kent
  2010-11-18  3:54           ` Ian Kent
  0 siblings, 2 replies; 15+ messages in thread
From: Arnd Bergmann @ 2010-11-15  8:54 UTC (permalink / raw)
  To: Ian Kent; +Cc: Avi Kivity, autofs, linux-kernel

On Monday 15 November 2010 02:45:33 Ian Kent wrote:

> You can't hold an exclusive mutex during an autofs expire because the
> daemon will start by calling the ioctl to check for a dentry to expire
> then call back to the daemon to perform the umount and wait for a status
> return (also an ioctl).

Ok, I see. So it's my fault for not realizing that there are long blocking
ioctls. I was under the assumption that all of these ioctl commands were
simple non-blocking commands.

> From memory the expire is the only ioctl that is sensitive to this
> deadlock.
> 
> So, either the mutex must be released while waiting for the status
> return or get rid of the autofs4_ioctl_mutex altogether.

Right. As I said with the original patch, I don't think the mutex
is really needed, but using it seemed to be the safer alternative.
It was in the sense that it guaranteed the breakage to be obvious
rather than silent...

Ian, if you can prove that the lock is not needed, I think we should
just remove it.

	Arnd


* Re: autofs4 hang in 2.6.37-rc1
  2010-11-15  1:31     ` Ian Kent
@ 2010-11-15  9:02       ` Avi Kivity
  2010-11-22  8:42         ` Thomas Fjellstrom
  0 siblings, 1 reply; 15+ messages in thread
From: Avi Kivity @ 2010-11-15  9:02 UTC (permalink / raw)
  To: Ian Kent; +Cc: Arnd Bergmann, autofs, linux-kernel

On 11/15/2010 03:31 AM, Ian Kent wrote:
> >
> >  If the ioctl can sleep for multiple seconds, the mutex should
> >  indeed be dropped, and that would be safe because we used to
> >  do the same with the BKL.
> >
> >  The question is why this would sleep for more than 120 seconds.
>
> umount against a server that isn't responding can easily take more than
> 2 minutes.

Well, in my setup, the server should be responding.

-- 
error compiling committee.c: too many arguments to function



* Re: autofs4 hang in 2.6.37-rc1
  2010-11-15  8:54         ` Arnd Bergmann
@ 2010-11-15 13:22           ` Ian Kent
  2010-11-15 13:27             ` Avi Kivity
  2010-11-18  3:54           ` Ian Kent
  1 sibling, 1 reply; 15+ messages in thread
From: Ian Kent @ 2010-11-15 13:22 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Avi Kivity, autofs, linux-kernel

On Mon, 2010-11-15 at 09:54 +0100, Arnd Bergmann wrote:
> On Monday 15 November 2010 02:45:33 Ian Kent wrote:
> 
> > You can't hold an exclusive mutex during an autofs expire because the
> > daemon will start by calling the ioctl to check for a dentry to expire
> > then call back to the daemon to perform the umount and wait for a status
> > return (also an ioctl).
> 
> Ok, I see. So it's my fault for not realizing that there are long blocking
> ioctls. I was under the assumption that all of these ioctl commands were
> simple non-blocking commands.

This isn't anyone's fault (except maybe mine) because I'm the one most
likely to realize it was a problem and didn't notice it. I've even been
caught by this deadlock (when holding a singular lock) before when I
tried to use .. ummm .. netlink (I think, not even sure what it's called
any more) instead of an ioctl interface for the new autofs control
interface.

> 
> > From memory the expire is the only ioctl that is sensitive to this
> > deadlock.
> > 
> > So, either the mutex must be released while waiting for the status
> > return or get rid of the autofs4_ioctl_mutex altogether.
> 
> Right. As I said with the original patch, I don't think the mutex
> is really needed, but using it seemed to be the safer alternative.
> It was in the sense that it guaranteed the breakage to be obvious
> rather than silent...
> 
> Ian, if you can prove that the lock is not needed, I think we should
> just remove it.

I don't think I can prove it but I will have a long look at the code.
I don't think it is needed and I expect I'll recommend it be removed.

Oh and btw ... please excuse this off-topic question.

In your recent commit 6e9624b8caec290d28b4c6d9ec75749df6372b87 regarding
BKL removal you implied that blkdev_{get,put} shouldn't need the BKL.
I'm working on a btrfs problem and one of the issues is a deadlock
caused by the out of order acquisition of the BKL and the bdev->bd_mutex
between these two functions. Clearly this isn't a problem from 2.6.36
but do you think it would be safe just to apply the hunks for
blkdev_{get,put} from your commit to fix my problem for an older
kernel, say 2.6.35?

Ian




* Re: autofs4 hang in 2.6.37-rc1
  2010-11-15 13:22           ` Ian Kent
@ 2010-11-15 13:27             ` Avi Kivity
  2010-11-15 13:38               ` Ian Kent
  0 siblings, 1 reply; 15+ messages in thread
From: Avi Kivity @ 2010-11-15 13:27 UTC (permalink / raw)
  To: Ian Kent; +Cc: Arnd Bergmann, autofs, linux-kernel

On 11/15/2010 03:22 PM, Ian Kent wrote:
> >  Ian, if you can prove that the lock is not needed, I think we should
> >  just remove it.
>
> I don't think I can prove it but I will have a long look at the code.
> I don't think it is needed and I expect I'll recommend it be removed.

I've been running with the lock removed for a while with no ill effect.  
Of course it doesn't prove anything but at least it's a workaround for me.

-- 
error compiling committee.c: too many arguments to function



* Re: autofs4 hang in 2.6.37-rc1
  2010-11-15 13:27             ` Avi Kivity
@ 2010-11-15 13:38               ` Ian Kent
  2010-11-15 13:42                 ` Ian Kent
  0 siblings, 1 reply; 15+ messages in thread
From: Ian Kent @ 2010-11-15 13:38 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Arnd Bergmann, autofs, linux-kernel

On Mon, 2010-11-15 at 15:27 +0200, Avi Kivity wrote:
> On 11/15/2010 03:22 PM, Ian Kent wrote:
> > >  Ian, if you can prove that the lock is not needed, I think we should
> > >  just remove it.
> >
> > I don't think I can prove it but I will have a long look at the code.
> > I don't think it is needed and I expect I'll recommend it be removed.
> 
> I've been running with the lock removed for a while with no ill effect.  
> Of course it doesn't prove anything but at least it's a workaround for me.

Yeah, I tried pretty hard over quite a long time, with the expectation
that the BKL would be removed, to try and make the code independent of
it. At one point I patched the kernel to use the unlocked ioctl entry
point during some development testing and found only one fix that was
needed, although a lot has changed since then too.

Ian




* Re: autofs4 hang in 2.6.37-rc1
  2010-11-15 13:38               ` Ian Kent
@ 2010-11-15 13:42                 ` Ian Kent
  0 siblings, 0 replies; 15+ messages in thread
From: Ian Kent @ 2010-11-15 13:42 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Arnd Bergmann, autofs, linux-kernel

On Mon, 2010-11-15 at 21:38 +0800, Ian Kent wrote:
> On Mon, 2010-11-15 at 15:27 +0200, Avi Kivity wrote:
> > On 11/15/2010 03:22 PM, Ian Kent wrote:
> > > >  Ian, if you can prove that the lock is not needed, I think we should
> > > >  just remove it.
> > >
> > > I don't think I can prove it but I will have a long look at the code.
> > > I don't think it is needed and I expect I'll recommend it be removed.
> > 
> > I've been running with the lock removed for a while with no ill effect.  
> > Of course it doesn't prove anything but at least it's a workaround for me.
> 
> Yeah, I tried pretty hard over quite a long time, with the expectation
> that the BKL would be removed, to try and make the code independent of
> it. At one point I patched the kernel to use the unlocked ioctl entry
> point during some development testing and found only one fix that was
> needed, although a lot has changed since then too.

Hahaha, although as you say, I won't really know if there are races
until I get people really hammering autofs. But, since that's where this
is at maybe that's reason enough to remove it so we can get people to
start applying pressure to the code so we find and fix any problems.

> 
> Ian
> 




* Re: autofs4 hang in 2.6.37-rc1
  2010-11-15  8:54         ` Arnd Bergmann
  2010-11-15 13:22           ` Ian Kent
@ 2010-11-18  3:54           ` Ian Kent
  2010-11-25 13:17             ` Arnd Bergmann
  1 sibling, 1 reply; 15+ messages in thread
From: Ian Kent @ 2010-11-18  3:54 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Avi Kivity, autofs, linux-kernel

On Mon, 2010-11-15 at 09:54 +0100, Arnd Bergmann wrote:
> On Monday 15 November 2010 02:45:33 Ian Kent wrote:
> 
> > You can't hold an exclusive mutex during an autofs expire because the
> > daemon will start by calling the ioctl to check for a dentry to expire
> > then call back to the daemon to perform the umount and wait for a status
> > return (also an ioctl).
> 
> Ok, I see. So it's my fault for not realizing that there are long blocking
> ioctls. I was under the assumption that all of these ioctl commands were
> simple non-blocking commands.
> 
> > From memory the expire is the only ioctl that is sensitive to this
> > deadlock.
> > 
> > So, either the mutex must be released while waiting for the status
> > return or get rid of the autofs4_ioctl_mutex altogether.
> 
> Right. As I said with the original patch, I don't think the mutex
> is really needed, but using it seemed to be the safer alternative.
> It was in the sense that it guaranteed the breakage to be obvious
> rather than silent...
> 
> Ian, if you can prove that the lock is not needed, I think we should
> just remove it.

I've looked through the old ioctl interface code and that looks fine.

But the important thing to notice is that the new ioctl interface (in
fs/autofs4/dev-ioctl.c) used the unlocked_ioctl method since it was
merged in 2.6.28 and that calls back into the core ioctl code for its
major functionality. So the core function of the ioctl interface has
been used without the BKL for quite a while now and has been heavily
exercised in subsequent testing since the new ioctl interface has been
in place.

I can't see any reason for keeping the autofs4_ioctl_mutex.

Ian




* Re: autofs4 hang in 2.6.37-rc1
  2010-11-15  9:02       ` Avi Kivity
@ 2010-11-22  8:42         ` Thomas Fjellstrom
  0 siblings, 0 replies; 15+ messages in thread
From: Thomas Fjellstrom @ 2010-11-22  8:42 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Ian Kent, Arnd Bergmann, autofs, linux-kernel

On November 15, 2010, Avi Kivity wrote:
> On 11/15/2010 03:31 AM, Ian Kent wrote:
> > >  If the ioctl can sleep for multiple seconds, the mutex should
> > >  indeed be dropped, and that would be safe because we used to
> > >  do the same with the BKL.
> > >  
> > >  The question is why this would sleep for more than 120 seconds.
> > 
> > umount against a server that isn't responding can easily take more than
> > 2 minutes.
> 
> Well, in my setup, the server should be responding.

Same as in mine. The server is up, but automount is locking up solid.
Trying to access the share, or links pointing to anything on it, also
locks up and can't be killed.

[ 5520.863130] INFO: task automount:1491 blocked for more than 120 seconds.
[ 5520.863139] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5520.863145] automount     D 0000000000000001     0  1491      1 0x00000000
[ 5520.863157]  ffff880137f5fdc8 0000000000000086 ffff880137f5fd48 0000000000000086
[ 5520.863172]  ffff8800379efd78 ffff880137f5e000 ffff880137f5e000 ffff88013f1c06a0
[ 5520.863184]  ffff880137f5e010 ffff88013f1c0960 ffff880137f5ffd8 ffff880137f5ffd8
[ 5520.863197] Call Trace:
[ 5520.863214]  [<ffffffff81038541>] ? get_parent_ip+0x11/0x50
[ 5520.863232]  [<ffffffff813914ea>] __mutex_lock_slowpath+0x11a/0x1e0
[ 5520.863242]  [<ffffffff813962fd>] ? sub_preempt_count+0x9d/0xd0
[ 5520.863251]  [<ffffffff8139106e>] mutex_lock+0x1e/0x40
[ 5520.863264]  [<ffffffffa03390a8>] autofs4_root_ioctl+0x38/0x70 [autofs4]
[ 5520.863274]  [<ffffffff8112d4e7>] do_vfs_ioctl+0x97/0x530
[ 5520.863283]  [<ffffffff8104af2c>] ? sys_wait4+0xac/0xf0
[ 5520.863291]  [<ffffffff8112d9ca>] sys_ioctl+0x4a/0x80
[ 5520.863302]  [<ffffffff81002feb>] system_call_fastpath+0x16/0x1b

I have 8 or more of those in my dmesg right now. Another machine running
2.6.36 is playing music off the same share, and this machine doesn't lock
up at all when it runs 2.6.36 either. I've tried 2.6.37-rc1 and now -rc3.

-- 
Thomas Fjellstrom
thomas@fjellstrom.ca


* Re: autofs4 hang in 2.6.37-rc1
  2010-11-18  3:54           ` Ian Kent
@ 2010-11-25 13:17             ` Arnd Bergmann
  0 siblings, 0 replies; 15+ messages in thread
From: Arnd Bergmann @ 2010-11-25 13:17 UTC (permalink / raw)
  To: Ian Kent; +Cc: Avi Kivity, autofs, linux-kernel

On Thursday 18 November 2010, Ian Kent wrote:
> But the important thing to notice is that the new ioctl interface (in
> fs/autofs4/dev-ioctl.c) used the unlocked_ioctl method since it was
> merged in 2.6.28 and that calls back into the core ioctl code for its
> major functionality. So the core function of the ioctl interface has
> been used without the BKL for quite a while now and has been heavily
> exercised in subsequent testing since the new ioctl interface has been
> in place.
> 
> I can't see any reason for keeping the autofs4_ioctl_mutex.

Ok. Are you submitting a patch to remove it? Your reasoning absolutely
makes sense and we need to fix this regression.

	Arnd

