All of lore.kernel.org
 help / color / mirror / Atom feed
* 5.0.5 non-expiring mounts
@ 2011-02-08  9:30 Philip Ong Jr.
  2011-02-11  3:10 ` Ian Kent
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Philip Ong Jr. @ 2011-02-08  9:30 UTC (permalink / raw)
  To: autofs

I've pulled down the 5.0.5 tree and all patches.

I've set the logging level to debug and can see messages of expiring 
mounts in /var/log/messages...but when I check /etc/mtab or /proc/mounts 
or df, i can still see them there. Any ideas if this is a known issue or 
if I can give more info than below?


Linux sc-sim-208-144 2.6.29.4 #8 SMP Tue May 25 14:00:33 PDT 2010 x86_64 
x86_64 x86_64 GNU/Linux
# automount -V

Linux automount version 5.0.5

Directories:
         config dir:     /etc/sysconfig
         maps dir:       /etc
         modules dir:    /usr/lib/autofs

Compile options:
   DISABLE_MOUNT_LOCKING ENABLE_IGNORE_BUSY_MOUNTS LIBXML2_WORKAROUND

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: 5.0.5 non-expiring mounts
  2011-02-08  9:30 5.0.5 non-expiring mounts Philip Ong Jr.
@ 2011-02-11  3:10 ` Ian Kent
  2011-02-15  1:10 ` Mike Marion
  2011-03-24 22:03 ` Leonardo Chiquitto
  2 siblings, 0 replies; 19+ messages in thread
From: Ian Kent @ 2011-02-11  3:10 UTC (permalink / raw)
  To: Philip Ong Jr.; +Cc: autofs

On Tue, 2011-02-08 at 01:30 -0800, Philip Ong Jr. wrote:
> I've pulled down the 5.0.5 tree and all patches.
> 
> I've set the logging level to debug and can see messages of expiring 
> mounts in /var/log/messages...but when I check /etc/mtab or /proc/mounts 
> or df, i can still see them there. Any ideas if this is a known issue or 
> if I can give more info than below?

You will need to post a full debug log somewhere.
Make sure that syslog is capturing daemon.* to the syslog so that you
get all the debug log messages.

> 
> 
> Linux sc-sim-208-144 2.6.29.4 #8 SMP Tue May 25 14:00:33 PDT 2010 x86_64 
> x86_64 x86_64 GNU/Linux
> # automount -V
> 
> Linux automount version 5.0.5
> 
> Directories:
>          config dir:     /etc/sysconfig
>          maps dir:       /etc
>          modules dir:    /usr/lib/autofs
> 
> Compile options:
>    DISABLE_MOUNT_LOCKING ENABLE_IGNORE_BUSY_MOUNTS LIBXML2_WORKAROUND
> 
> _______________________________________________
> autofs mailing list
> autofs@linux.kernel.org
> http://linux.kernel.org/mailman/listinfo/autofs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: 5.0.5 non-expiring mounts
  2011-02-08  9:30 5.0.5 non-expiring mounts Philip Ong Jr.
  2011-02-11  3:10 ` Ian Kent
@ 2011-02-15  1:10 ` Mike Marion
  2011-02-15  3:11   ` Ian Kent
  2011-03-24 22:03 ` Leonardo Chiquitto
  2 siblings, 1 reply; 19+ messages in thread
From: Mike Marion @ 2011-02-15  1:10 UTC (permalink / raw)
  To: autofs

On Tue, Feb 08, 2011 at 01:30:59AM -0800, Philip Ong Jr. wrote:

> I've set the logging level to debug and can see messages of expiring 
> mounts in /var/log/messages...but when I check /etc/mtab or /proc/mounts 
> or df, i can still see them there. Any ideas if this is a known issue or 
> if I can give more info than below?

Out of curiosity... are these direct or indirect mounts?  We're seeing an
issue where our huge maps served from LDAP are seeing indirects not umounting
while logging expires just fine, but the direct maps are working great.  I
believe a co-worker has replicated this exact same behavior with local files
based maps too though, so pretty sure it's not really related to LDAP at all,
but we've found using LDAP helped clear out a lot of old issues we had a few
years ago related to the hashing bits.

I've even done network sniffing while doing a kill -USR1 and the automount
daemon opens new sockets for each indirect (our homedirs) path it wants to
expire, but there never is any actual network traffic sent across the wire.
Even better, the sockets stay stuck in ESTABLISHED for a long time.  Some more
USR1/expires and/or HUPs can help clear is, or I just run a script that does
an fuser then umount on unused homedirs and that flushes the ports out.  When
the ports pile up to >300 or so, you start getting NFS failures with syslog
showing "couldn't read superblock."

Oh, and when I say huge maps.. I mean it.  We're >10k direct mounts and I don't
know how many indirect mounts, but it's even more then that.

-- 
Mike Marion-Unix SysAdmin/Staff IT Engineer-http://www.qualcomm.com
Antonio: "It's no fair.  Some men drink deep from the fountain of life, while
Antonio takes one sip and it goes down the wrong pipe!"

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: 5.0.5 non-expiring mounts
  2011-02-15  1:10 ` Mike Marion
@ 2011-02-15  3:11   ` Ian Kent
  2011-02-15  3:37     ` Ian Kent
  0 siblings, 1 reply; 19+ messages in thread
From: Ian Kent @ 2011-02-15  3:11 UTC (permalink / raw)
  To: Mike Marion; +Cc: autofs

On Mon, 2011-02-14 at 17:10 -0800, Mike Marion wrote:
> On Tue, Feb 08, 2011 at 01:30:59AM -0800, Philip Ong Jr. wrote:
> 
> > I've set the logging level to debug and can see messages of expiring 
> > mounts in /var/log/messages...but when I check /etc/mtab or /proc/mounts 
> > or df, i can still see them there. Any ideas if this is a known issue or 
> > if I can give more info than below?
> 
> Out of curiosity... are these direct or indirect mounts?  We're seeing an
> issue where our huge maps served from LDAP are seeing indirects not umounting
> while logging expires just fine, but the direct maps are working great.  I
> believe a co-worker has replicated this exact same behavior with local files
> based maps too though, so pretty sure it's not really related to LDAP at all,
> but we've found using LDAP helped clear out a lot of old issues we had a few
> years ago related to the hashing bits.
> 
> I've even done network sniffing while doing a kill -USR1 and the automount
> daemon opens new sockets for each indirect (our homedirs) path it wants to
> expire, but there never is any actual network traffic sent across the wire.
> Even better, the sockets stay stuck in ESTABLISHED for a long time.  Some more
> USR1/expires and/or HUPs can help clear is, or I just run a script that does
> an fuser then umount on unused homedirs and that flushes the ports out.  When
> the ports pile up to >300 or so, you start getting NFS failures with syslog
> showing "couldn't read superblock."
> 
> Oh, and when I say huge maps.. I mean it.  We're >10k direct mounts and I don't
> know how many indirect mounts, but it's even more then that.

What kernel and source?
Ian

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: 5.0.5 non-expiring mounts
  2011-02-15  3:11   ` Ian Kent
@ 2011-02-15  3:37     ` Ian Kent
  2011-02-15  5:28       ` Mike Marion
  0 siblings, 1 reply; 19+ messages in thread
From: Ian Kent @ 2011-02-15  3:37 UTC (permalink / raw)
  To: Mike Marion; +Cc: autofs

On Tue, 2011-02-15 at 11:11 +0800, Ian Kent wrote:
> On Mon, 2011-02-14 at 17:10 -0800, Mike Marion wrote:
> > On Tue, Feb 08, 2011 at 01:30:59AM -0800, Philip Ong Jr. wrote:
> > 
> > > I've set the logging level to debug and can see messages of expiring 
> > > mounts in /var/log/messages...but when I check /etc/mtab or /proc/mounts 
> > > or df, i can still see them there. Any ideas if this is a known issue or 
> > > if I can give more info than below?
> > 
> > Out of curiosity... are these direct or indirect mounts?  We're seeing an
> > issue where our huge maps served from LDAP are seeing indirects not umounting
> > while logging expires just fine, but the direct maps are working great.  I
> > believe a co-worker has replicated this exact same behavior with local files
> > based maps too though, so pretty sure it's not really related to LDAP at all,
> > but we've found using LDAP helped clear out a lot of old issues we had a few
> > years ago related to the hashing bits.
> > 
> > I've even done network sniffing while doing a kill -USR1 and the automount
> > daemon opens new sockets for each indirect (our homedirs) path it wants to
> > expire, but there never is any actual network traffic sent across the wire.
> > Even better, the sockets stay stuck in ESTABLISHED for a long time.  Some more
> > USR1/expires and/or HUPs can help clear is, or I just run a script that does
> > an fuser then umount on unused homedirs and that flushes the ports out.  When
> > the ports pile up to >300 or so, you start getting NFS failures with syslog
> > showing "couldn't read superblock."
> > 
> > Oh, and when I say huge maps.. I mean it.  We're >10k direct mounts and I don't
> > know how many indirect mounts, but it's even more then that.
> 
> What kernel and source?

That is kernel revision and autofs revision?

> Ian
> 
> _______________________________________________
> autofs mailing list
> autofs@linux.kernel.org
> http://linux.kernel.org/mailman/listinfo/autofs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: 5.0.5 non-expiring mounts
  2011-02-15  3:37     ` Ian Kent
@ 2011-02-15  5:28       ` Mike Marion
  2011-02-15 12:28         ` Ian Kent
  0 siblings, 1 reply; 19+ messages in thread
From: Mike Marion @ 2011-02-15  5:28 UTC (permalink / raw)
  To: Ian Kent; +Cc: autofs

On Mon, Feb 14, 2011 at 07:37:01PM -0800, Ian Kent wrote:

> That is kernel revision and autofs revision?

2.6.16.60-0.59.1 (Sles10 sp3 with an updated, but not bleeding edge, patch).
autofs 5.0.5 with most of the patches up to a couple months ago.  It's hard to
get exacts because it's a PTF from Novell (we really pushed them to upgrade to
5.0.5) but it should be pretty much equal to the patch they just released for
sle 11 sp1 that they're recommending as they default going forward.

-- 
Mike Marion-Unix SysAdmin/Staff IT Engineer-http://www.qualcomm.com
Vir: "Londo, are you deliberatly trying to drive me insane?"
Londo: "The Universe is already mad.  Anything else would be redundant."
==> Babylon 5

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: 5.0.5 non-expiring mounts
  2011-02-15  5:28       ` Mike Marion
@ 2011-02-15 12:28         ` Ian Kent
  2011-02-15 18:11           ` Leonardo Chiquitto
  0 siblings, 1 reply; 19+ messages in thread
From: Ian Kent @ 2011-02-15 12:28 UTC (permalink / raw)
  To: Mike Marion; +Cc: autofs

On Mon, 2011-02-14 at 21:28 -0800, Mike Marion wrote:
> On Mon, Feb 14, 2011 at 07:37:01PM -0800, Ian Kent wrote:
> 
> > That is kernel revision and autofs revision?
> 
> 2.6.16.60-0.59.1 (Sles10 sp3 with an updated, but not bleeding edge, patch).
> autofs 5.0.5 with most of the patches up to a couple months ago.  It's hard to
> get exacts because it's a PTF from Novell (we really pushed them to upgrade to
> 5.0.5) but it should be pretty much equal to the patch they just released for
> sle 11 sp1 that they're recommending as they default going forward.

That make it hard, as you know.

But I wouldn't mind spending a bit of time on it, if you can also.

Let's assume that it's a user space problem for now.
Could you post a debug log somewhere and the current sles patched source
somewhere. If you could spend a little time trying to compare the
patches in the sles package to the ones in the Rawhide autofs repo that
would be a big help too.

Ian

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: 5.0.5 non-expiring mounts
  2011-02-15 12:28         ` Ian Kent
@ 2011-02-15 18:11           ` Leonardo Chiquitto
  2011-02-16  7:08             ` Ian Kent
  0 siblings, 1 reply; 19+ messages in thread
From: Leonardo Chiquitto @ 2011-02-15 18:11 UTC (permalink / raw)
  To: Ian Kent; +Cc: autofs, Mike Marion

On Tue, Feb 15, 2011 at 10:28 AM, Ian Kent <raven@themaw.net> wrote:
> On Mon, 2011-02-14 at 21:28 -0800, Mike Marion wrote:
>> On Mon, Feb 14, 2011 at 07:37:01PM -0800, Ian Kent wrote:
>>
>> > That is kernel revision and autofs revision?
>>
>> 2.6.16.60-0.59.1 (Sles10 sp3 with an updated, but not bleeding edge, patch).
>> autofs 5.0.5 with most of the patches up to a couple months ago.  It's hard to
>> get exacts because it's a PTF from Novell (we really pushed them to upgrade to
>> 5.0.5) but it should be pretty much equal to the patch they just released for
>> sle 11 sp1 that they're recommending as they default going forward.
>
> That make it hard, as you know.
>
> But I wouldn't mind spending a bit of time on it, if you can also.
>
> Let's assume that it's a user space problem for now.

Here are the call traces for all automount processes on the kernel side:

PID: 4017   TASK: ffff8104202a9080  CPU: 4   COMMAND: "automount"
 #0 [ffff81040426fc58] schedule at ffffffff802efc13
 #1 [ffff81040426fd30] schedule_timeout at ffffffff802f0748
 #2 [ffff81040426fd80] futex_wait_abstime at ffffffff8014b703
 #3 [ffff81040426fe60] do_futex at ffffffff8014bb33
 #4 [ffff81040426ff20] sys_futex at ffffffff8014c2ed
 #5 [ffff81040426ff80] system_call at ffffffff8010ae16

PID: 4018   TASK: ffff81041cefd810  CPU: 2   COMMAND: "automount"
 #0 [ffff810404211d68] schedule at ffffffff802efc13
 #1 [ffff810404211e40] schedule_timeout at ffffffff802f07b4
 #2 [ffff810404211e90] futex_wait_abstime at ffffffff8014b703
 #3 [ffff810404211f70] futex_wait_restart at ffffffff8014b7d2
 #4 [ffff810404211f80] system_call at ffffffff8010ae16

PID: 4019   TASK: ffff810420ab8080  CPU: 5   COMMAND: "automount"
 #0 [ffff810404213c58] schedule at ffffffff802efc13
 #1 [ffff810404213d30] schedule_timeout at ffffffff802f0748
 #2 [ffff810404213d80] futex_wait_abstime at ffffffff8014b703
 #3 [ffff810404213e60] do_futex at ffffffff8014bb33
 #4 [ffff810404213f20] sys_futex at ffffffff8014c2ed
 #5 [ffff810404213f80] system_call at ffffffff8010ae16

PID: 4022   TASK: ffff81042055b7d0  CPU: 5   COMMAND: "automount"
 #0 [ffff810404c71c58] schedule at ffffffff802efc13
 #1 [ffff810404c71d30] schedule_timeout at ffffffff802f0748
 #2 [ffff810404c71d80] futex_wait_abstime at ffffffff8014b703
 #3 [ffff810404c71e60] do_futex at ffffffff8014bb33
 #4 [ffff810404c71f20] sys_futex at ffffffff8014c2ed
 #5 [ffff810404c71f80] system_call at ffffffff8010ae16

PID: 4025   TASK: ffff81041cefd080  CPU: 2   COMMAND: "automount"
 #0 [ffff8104001d7da8] schedule at ffffffff802efc13
 #1 [ffff8104001d7e80] schedule_timeout at ffffffff802f07b4
 #2 [ffff8104001d7ed0] do_sys_poll at ffffffff80199526
 #3 [ffff8104001d7f60] sys_poll at ffffffff80199640
 #4 [ffff8104001d7f80] system_call at ffffffff8010ae16

PID: 4026   TASK: ffff81041e1bd080  CPU: 7   COMMAND: "automount"
 #0 [ffff8104001f7c58] schedule at ffffffff802efc13
 #1 [ffff8104001f7d30] schedule_timeout at ffffffff802f0748
 #2 [ffff8104001f7d80] futex_wait_abstime at ffffffff8014b703
 #3 [ffff8104001f7e60] do_futex at ffffffff8014bb33
 #4 [ffff8104001f7f20] sys_futex at ffffffff8014c2ed
 #5 [ffff8104001f7f80] system_call at ffffffff8010ae16

PID: 6851   TASK: ffff81013c9187d0  CPU: 6   COMMAND: "automount"
 #0 [ffff810304f97c58] schedule at ffffffff802efc13
 #1 [ffff810304f97d30] schedule_timeout at ffffffff802f0748
 #2 [ffff810304f97d80] futex_wait_abstime at ffffffff8014b703
 #3 [ffff810304f97e60] do_futex at ffffffff8014bb33
 #4 [ffff810304f97f20] sys_futex at ffffffff8014c2ed
 #5 [ffff810304f97f80] system_call at ffffffff8010ae16

PID: 7060   TASK: ffff810188ef5080  CPU: 4   COMMAND: "automount"
 #0 [ffff810319ebdc58] schedule at ffffffff802efc13
 #1 [ffff810319ebdd30] schedule_timeout at ffffffff802f0748
 #2 [ffff810319ebdd80] futex_wait_abstime at ffffffff8014b703
 #3 [ffff810319ebde60] do_futex at ffffffff8014bb33
 #4 [ffff810319ebdf20] sys_futex at ffffffff8014c2ed
 #5 [ffff810319ebdf80] system_call at ffffffff8010ae16

PID: 20702  TASK: ffff810230be5080  CPU: 7   COMMAND: "automount"
 #0 [ffff81026c80dc18] schedule at ffffffff802efc13
 #1 [ffff81026c80dcf0] autofs4_wait at ffffffff8831e5b2 [autofs4]
 #2 [ffff81026c80deb0] autofs4_expire_multi at ffffffff8831ed06 [autofs4]
 #3 [ffff81026c80dee0] do_ioctl at ffffffff80198acd
 #4 [ffff81026c80df00] vfs_ioctl at ffffffff80198d2b
 #5 [ffff81026c80df40] sys_ioctl at ffffffff80198da5
 #6 [ffff81026c80df80] system_call at ffffffff8010ae16

They look regular to me, do you see something suspect?

And here are the call traces from the user land daemon:

Thread 9 (Thread 4017):
#0  0x00002b56e465d6a8 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
#1  0x00002b56e46599fb in _L_mutex_lock_92 () from /lib64/libpthread.so.0
#2  0x00002b56e4659455 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00005555555746cd in master_mutex_lock () at master.c:49
#4  0x000055555555d260 in do_hup_signal (master=0x55555568d010, age=1296063258)
    at automount.c:1276
#5  0x0000555555560bd3 in statemachine (arg=<value optimized out>)
    at automount.c:1354
#6  main (arg=<value optimized out>) at automount.c:2142

Thread 8 (Thread 20702):
#0  0x00002b56e4dd62a7 in brk () from /lib64/libc.so.6
#1  0x0000555555577dfe in expire (logopt=2, cmd=<value optimized out>, fd=21,
    ioctlfd=21, path=0x55555569ca20 "/usr2", arg=0x41c27ef4)
    at dev-ioctl-lib.c:657
#2  0x0000555555577ebe in ioctl_expire (logopt=21, ioctlfd=-1,
    path=0x55555569ca20 "/usr2", when=0) at dev-ioctl-lib.c:701
#3  0x0000555555561e4e in expire_proc_indirect (arg=<value optimized out>)
    at indirect.c:545
#4  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
#5  0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
#6  0x0000000000000000 in ?? ()

Thread 7 (Thread 7060):
#0  0x00002b56e465ac77 in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
#1  0x00005555555752ea in master_source_writelock (entry=<value optimized out>)
    at master.c:527
#2  0x0000555555575f8f in master_add_map_source (entry=0x5555556a10b0,
    type=0x0, format=0x0, age=1296059657, argc=1, argv=<value optimized out>)
    at master.c:191
#3  0x0000555555579ee3 in master_parse_entry (buffer=<value optimized out>,
    default_timeout=86400, logging=<value optimized out>, age=1296059657)
    at master_parse.y:823
#4  0x00002aaaaaab83fe in lookup_read_master (master=<value optimized out>,
    age=1296059657, context=<value optimized out>) at lookup_ldap.c:1625
#5  0x0000555555569052 in do_read_master (master=0x55555568d010,
    type=<value optimized out>, age=1296059657) at lookup.c:96
#6  0x000055555556aa3c in lookup_nss_read_master (master=0x55555568d010,
    age=1296059657) at lookup.c:229
#7  0x0000555555575c28 in master_read_master (master=0x55555568d010,
    age=1296059657, readall=1) at master.c:831
#8  0x000055555555d844 in do_read_master (arg=<value optimized out>)
    at automount.c:1259
#9  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
#10 0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
#11 0x0000000000000000 in ?? ()

Thread 6 (Thread 6851):
#0  0x00002b56e465aa3d in pthread_rwlock_rdlock () from /lib64/libpthread.so.0
#1  0x000055555556deb6 in cache_readlock (mc=0x55555568e5b8) at cache.c:60
#2  0x000055555556baff in do_readmap (arg=<value optimized out>) at state.c:479
#3  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
#4  0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
#5  0x0000000000000000 in ?? ()

Thread 5 (Thread 4026):
#0  0x00002b56e465d6a8 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
#1  0x00002b56e46599fb in _L_mutex_lock_92 () from /lib64/libpthread.so.0
#2  0x00002b56e4659455 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00005555555746cd in master_mutex_lock () at master.c:49
#4  0x0000555555560ff9 in handle_packet_missing_indirect (ap=0x55555569c940,
    pkt=0x41823ec0) at indirect.c:808
#5  0x000055555555fa32 in handle_packet (ap=<value optimized out>)
    at automount.c:1026
#6  handle_mounts (ap=<value optimized out>) at automount.c:1551
#7  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
#8  0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
#9  0x0000000000000000 in ?? ()

Thread 4 (Thread 4025):
#0  0x00002b56e4dd4b26 in fts_read () from /lib64/libc.so.6
#1  0x00007fff37314e80 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 3 (Thread 4022):
#0  0x00002b56e465ac77 in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
#1  0x00005555555752ea in master_source_writelock (entry=<value optimized out>)
    at master.c:527
#2  0x00005555555631af in handle_packet_missing_direct (ap=0x5555556a11c0,
    pkt=0x40821ec0) at direct.c:1296
#3  0x000055555555fa1d in handle_packet (ap=<value optimized out>)
    at automount.c:1029
#4  handle_mounts (ap=<value optimized out>) at automount.c:1551
#5  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
#6  0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
#7  0x0000000000000000 in ?? ()

Thread 2 (Thread 4019):
#0  0x00002b56e465b326 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1  0x000055555556c8b3 in st_queue_handler (arg=<value optimized out>)
    at state.c:1104
#2  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
#3  0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
#4  0x0000000000000000 in ?? ()

Thread 1 (Thread 4018):
#0  0x00002b56e465b517 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1  0x0000555555573762 in alarm_handler (arg=<value optimized out>)
    at alarm.c:206
#2  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
#3  0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
#4  0x0000000000000000 in ?? ()

> Could you post a debug log somewhere and the current sles patched source
> somewhere.

Unfortunately I don't have the full debug logs yet, so I'll leave this to
Mike.

> If you could spend a little time trying to compare the
> patches in the sles package to the ones in the Rawhide autofs repo that
> would be a big help too.

The package includes all upstream patches released so far (ie, up to
autofs-5.0.5-fix-submount-shutdown-wait.patch), minus the following ones:

autofs-5.0.4 - use intr option as hosts mount default (configuration change)
autofs-5.0.5 - add dump maps option (recent feature)
autofs-5.0.5 - add external bind method (recent feature)
autofs-5.0.5 - fix ext4 fsck at mount (no ext4 on SLES 10)

Thanks,
Leonardo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: 5.0.5 non-expiring mounts
  2011-02-15 18:11           ` Leonardo Chiquitto
@ 2011-02-16  7:08             ` Ian Kent
  2011-02-16 12:18               ` Ian Kent
                                 ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Ian Kent @ 2011-02-16  7:08 UTC (permalink / raw)
  To: Leonardo Chiquitto; +Cc: autofs, Mike Marion

On Tue, 2011-02-15 at 16:11 -0200, Leonardo Chiquitto wrote:
> On Tue, Feb 15, 2011 at 10:28 AM, Ian Kent <raven@themaw.net> wrote:
> > On Mon, 2011-02-14 at 21:28 -0800, Mike Marion wrote:
> >> On Mon, Feb 14, 2011 at 07:37:01PM -0800, Ian Kent wrote:
> >>
> >> > That is kernel revision and autofs revision?
> >>
> >> 2.6.16.60-0.59.1 (Sles10 sp3 with an updated, but not bleeding edge, patch).
> >> autofs 5.0.5 with most of the patches up to a couple months ago.  It's hard to
> >> get exacts because it's a PTF from Novell (we really pushed them to upgrade to
> >> 5.0.5) but it should be pretty much equal to the patch they just released for
> >> sle 11 sp1 that they're recommending as they default going forward.
> >
> > That make it hard, as you know.
> >
> > But I wouldn't mind spending a bit of time on it, if you can also.
> >
> > Let's assume that it's a user space problem for now.
> 
> Here are the call traces for all automount processes on the kernel side:

I think it's a user space problem.

snip ...

> And here are the call traces from the user land daemon:
> 
> Thread 9 (Thread 4017):
> #0  0x00002b56e465d6a8 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
> #1  0x00002b56e46599fb in _L_mutex_lock_92 () from /lib64/libpthread.so.0
> #2  0x00002b56e4659455 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x00005555555746cd in master_mutex_lock () at master.c:49
> #4  0x000055555555d260 in do_hup_signal (master=0x55555568d010, age=1296063258)
>     at automount.c:1276
> #5  0x0000555555560bd3 in statemachine (arg=<value optimized out>)
>     at automount.c:1354
> #6  main (arg=<value optimized out>) at automount.c:2142
> 
> Thread 8 (Thread 20702):
> #0  0x00002b56e4dd62a7 in brk () from /lib64/libc.so.6
> #1  0x0000555555577dfe in expire (logopt=2, cmd=<value optimized out>, fd=21,
>     ioctlfd=21, path=0x55555569ca20 "/usr2", arg=0x41c27ef4)
>     at dev-ioctl-lib.c:657
> #2  0x0000555555577ebe in ioctl_expire (logopt=21, ioctlfd=-1,
>     path=0x55555569ca20 "/usr2", when=0) at dev-ioctl-lib.c:701
> #3  0x0000555555561e4e in expire_proc_indirect (arg=<value optimized out>)
>     at indirect.c:545
> #4  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
> #5  0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
> #6  0x0000000000000000 in ?? ()
> 
> Thread 7 (Thread 7060):
> #0  0x00002b56e465ac77 in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
> #1  0x00005555555752ea in master_source_writelock (entry=<value optimized out>)
>     at master.c:527
> #2  0x0000555555575f8f in master_add_map_source (entry=0x5555556a10b0,
>     type=0x0, format=0x0, age=1296059657, argc=1, argv=<value optimized out>)
>     at master.c:191
> #3  0x0000555555579ee3 in master_parse_entry (buffer=<value optimized out>,
>     default_timeout=86400, logging=<value optimized out>, age=1296059657)
>     at master_parse.y:823
> #4  0x00002aaaaaab83fe in lookup_read_master (master=<value optimized out>,
>     age=1296059657, context=<value optimized out>) at lookup_ldap.c:1625
> #5  0x0000555555569052 in do_read_master (master=0x55555568d010,
>     type=<value optimized out>, age=1296059657) at lookup.c:96
> #6  0x000055555556aa3c in lookup_nss_read_master (master=0x55555568d010,
>     age=1296059657) at lookup.c:229
> #7  0x0000555555575c28 in master_read_master (master=0x55555568d010,
>     age=1296059657, readall=1) at master.c:831
> #8  0x000055555555d844 in do_read_master (arg=<value optimized out>)
>     at automount.c:1259
> #9  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
> #10 0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
> #11 0x0000000000000000 in ?? ()
> 
> Thread 6 (Thread 6851):
> #0  0x00002b56e465aa3d in pthread_rwlock_rdlock () from /lib64/libpthread.so.0
> #1  0x000055555556deb6 in cache_readlock (mc=0x55555568e5b8) at cache.c:60
> #2  0x000055555556baff in do_readmap (arg=<value optimized out>) at state.c:479
> #3  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
> #4  0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
> #5  0x0000000000000000 in ?? ()
> 
> Thread 5 (Thread 4026):
> #0  0x00002b56e465d6a8 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
> #1  0x00002b56e46599fb in _L_mutex_lock_92 () from /lib64/libpthread.so.0
> #2  0x00002b56e4659455 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x00005555555746cd in master_mutex_lock () at master.c:49
> #4  0x0000555555560ff9 in handle_packet_missing_indirect (ap=0x55555569c940,
>     pkt=0x41823ec0) at indirect.c:808
> #5  0x000055555555fa32 in handle_packet (ap=<value optimized out>)
>     at automount.c:1026
> #6  handle_mounts (ap=<value optimized out>) at automount.c:1551
> #7  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
> #8  0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
> #9  0x0000000000000000 in ?? ()
> 
> Thread 4 (Thread 4025):
> #0  0x00002b56e4dd4b26 in fts_read () from /lib64/libc.so.6
> #1  0x00007fff37314e80 in ?? ()
> #2  0x0000000000000000 in ?? ()
> 
> Thread 3 (Thread 4022):
> #0  0x00002b56e465ac77 in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
> #1  0x00005555555752ea in master_source_writelock (entry=<value optimized out>)
>     at master.c:527
> #2  0x00005555555631af in handle_packet_missing_direct (ap=0x5555556a11c0,
>     pkt=0x40821ec0) at direct.c:1296
> #3  0x000055555555fa1d in handle_packet (ap=<value optimized out>)
>     at automount.c:1029
> #4  handle_mounts (ap=<value optimized out>) at automount.c:1551
> #5  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
> #6  0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
> #7  0x0000000000000000 in ?? ()
> 
> Thread 2 (Thread 4019):
> #0  0x00002b56e465b326 in pthread_cond_wait@@GLIBC_2.3.2 ()
>    from /lib64/libpthread.so.0
> #1  0x000055555556c8b3 in st_queue_handler (arg=<value optimized out>)
>     at state.c:1104
> #2  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
> #3  0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
> #4  0x0000000000000000 in ?? ()
> 
> Thread 1 (Thread 4018):
> #0  0x00002b56e465b517 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>    from /lib64/libpthread.so.0
> #1  0x0000555555573762 in alarm_handler (arg=<value optimized out>)
>     at alarm.c:206
> #2  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
> #3  0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
> #4  0x0000000000000000 in ?? ()
> 

It looks like a lock ordering bug to me and I think it's the same hang
that Steve Thompson reported.

This will be hard to test I think, hopefully Mike or Steve will be able
to use a couple of machines for testing, even so it may take a while
before we know.

Could you prepare a test package for Mike using this patch please.
It should apply OK but of course you'll need to discard or fix the
CHANGELOG hunk.

I'll prepare a test package for Steve soon as I get time.

autofs-5.0.5 - fix out of order locking in readmap

From: Ian Kent <raven@themaw.net>

When re-reading the master map and a mount lookup for a particular map
entry arrives at the same time a map source is being added to the map
entry and a map entry re-read is also initiated at the same time an
out of order mutex locking bug which results in a hang.
---

 CHANGELOG    |    1 +
 lib/master.c |    8 ++------
 2 files changed, 3 insertions(+), 6 deletions(-)


diff --git a/CHANGELOG b/CHANGELOG
index 11054da..631094f 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -55,6 +55,7 @@
 - add external bind method.
 - fix add simple bind auth.
 - add option to dump configured automount maps.
+- fix out of order locking in readmap.
 
 03/09/2009 autofs-5.0.5
 -----------------------
diff --git a/lib/master.c b/lib/master.c
index 95bd3fb..daedf66 100644
--- a/lib/master.c
+++ b/lib/master.c
@@ -611,8 +611,6 @@ struct master_mapent *master_find_mapent(struct master *master, const char *path
 {
 	struct list_head *head, *p;
 
-	master_mutex_lock();
-
 	head = &master->mounts;
 	list_for_each(p, head) {
 		struct master_mapent *entry;
@@ -625,8 +623,6 @@ struct master_mapent *master_find_mapent(struct master *master, const char *path
 		}
 	}
 
-	master_mutex_unlock();
-
 	return NULL;
 }
 
@@ -703,9 +699,7 @@ struct master_mapent *master_new_mapent(struct master *master, const char *path,
 
 void master_add_mapent(struct master *master, struct master_mapent *entry)
 {
-	master_mutex_lock();
 	list_add_tail(&entry->list, &master->mounts);
-	master_mutex_unlock();
 	return;
 }
 
@@ -813,6 +807,7 @@ int master_read_master(struct master *master, time_t age, int readall)
 	 * We need to clear and re-populate the null map entry cache
 	 * before alowing anyone else to use it.
 	 */
+	master_mutex_lock();
 	if (master->nc) {
 		cache_writelock(master->nc);
 		nc = master->nc;
@@ -831,6 +826,7 @@ int master_read_master(struct master *master, time_t age, int readall)
 	master_init_scan();
 	lookup_nss_read_master(master, age);
 	cache_unlock(nc);
+	master_mutex_unlock();
 
 	if (!master->read_fail)
 		master_mount_mounts(master, age, readall);

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: 5.0.5 non-expiring mounts
  2011-02-16  7:08             ` Ian Kent
@ 2011-02-16 12:18               ` Ian Kent
  2011-02-23 20:22                 ` Leonardo Chiquitto
  2011-02-16 12:58               ` Leonardo Chiquitto
  2011-03-04 20:10               ` Leonardo Chiquitto
  2 siblings, 1 reply; 19+ messages in thread
From: Ian Kent @ 2011-02-16 12:18 UTC (permalink / raw)
  To: Leonardo Chiquitto; +Cc: autofs, Mike Marion

On Wed, 2011-02-16 at 15:08 +0800, Ian Kent wrote:
> On Tue, 2011-02-15 at 16:11 -0200, Leonardo Chiquitto wrote:
> > On Tue, Feb 15, 2011 at 10:28 AM, Ian Kent <raven@themaw.net> wrote:
> > > On Mon, 2011-02-14 at 21:28 -0800, Mike Marion wrote:
> > >> On Mon, Feb 14, 2011 at 07:37:01PM -0800, Ian Kent wrote:
> > >>
> > >> > That is kernel revision and autofs revision?
> > >>
> > >> 2.6.16.60-0.59.1 (Sles10 sp3 with an updated, but not bleeding edge, patch).
> > >> autofs 5.0.5 with most of the patches up to a couple months ago.  It's hard to
> > >> get exacts because it's a PTF from Novell (we really pushed them to upgrade to
> > >> 5.0.5) but it should be pretty much equal to the patch they just released for
> > >> sle 11 sp1 that they're recommending as they default going forward.

Now I'm confused?

I thought that Mike had mention he had seen hangs, similar to Steve, and
this backtrace was an example of that. But the mail thread doesn't read
like that and oddly enough I seem to have identified a locking problem
from looking at the code based on the backtrace.

A backtrace generally doesn't do us any good when were trying to find an
expire problem, the debug log is where we have to start on these.

> > >
> > > That make it hard, as you know.
> > >
> > > But I wouldn't mind spending a bit of time on it, if you can also.
> > >
> > > Let's assume that it's a user space problem for now.
> > 
> > Here are the call traces for all automount processes on the kernel side:
> 
> I think it's a user space problem.
> 
> snip ...
> 
> > And here are the call traces from the user land daemon:
> > 
> > Thread 9 (Thread 4017):
> > #0  0x00002b56e465d6a8 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
> > #1  0x00002b56e46599fb in _L_mutex_lock_92 () from /lib64/libpthread.so.0
> > #2  0x00002b56e4659455 in pthread_mutex_lock () from /lib64/libpthread.so.0
> > #3  0x00005555555746cd in master_mutex_lock () at master.c:49
> > #4  0x000055555555d260 in do_hup_signal (master=0x55555568d010, age=1296063258)
> >     at automount.c:1276
> > #5  0x0000555555560bd3 in statemachine (arg=<value optimized out>)
> >     at automount.c:1354
> > #6  main (arg=<value optimized out>) at automount.c:2142
> > 
> > Thread 8 (Thread 20702):
> > #0  0x00002b56e4dd62a7 in brk () from /lib64/libc.so.6
> > #1  0x0000555555577dfe in expire (logopt=2, cmd=<value optimized out>, fd=21,
> >     ioctlfd=21, path=0x55555569ca20 "/usr2", arg=0x41c27ef4)
> >     at dev-ioctl-lib.c:657
> > #2  0x0000555555577ebe in ioctl_expire (logopt=21, ioctlfd=-1,
> >     path=0x55555569ca20 "/usr2", when=0) at dev-ioctl-lib.c:701
> > #3  0x0000555555561e4e in expire_proc_indirect (arg=<value optimized out>)
> >     at indirect.c:545
> > #4  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
> > #5  0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
> > #6  0x0000000000000000 in ?? ()
> > 
> > Thread 7 (Thread 7060):
> > #0  0x00002b56e465ac77 in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
> > #1  0x00005555555752ea in master_source_writelock (entry=<value optimized out>)
> >     at master.c:527
> > #2  0x0000555555575f8f in master_add_map_source (entry=0x5555556a10b0,
> >     type=0x0, format=0x0, age=1296059657, argc=1, argv=<value optimized out>)
> >     at master.c:191
> > #3  0x0000555555579ee3 in master_parse_entry (buffer=<value optimized out>,
> >     default_timeout=86400, logging=<value optimized out>, age=1296059657)
> >     at master_parse.y:823
> > #4  0x00002aaaaaab83fe in lookup_read_master (master=<value optimized out>,
> >     age=1296059657, context=<value optimized out>) at lookup_ldap.c:1625
> > #5  0x0000555555569052 in do_read_master (master=0x55555568d010,
> >     type=<value optimized out>, age=1296059657) at lookup.c:96
> > #6  0x000055555556aa3c in lookup_nss_read_master (master=0x55555568d010,
> >     age=1296059657) at lookup.c:229
> > #7  0x0000555555575c28 in master_read_master (master=0x55555568d010,
> >     age=1296059657, readall=1) at master.c:831
> > #8  0x000055555555d844 in do_read_master (arg=<value optimized out>)
> >     at automount.c:1259
> > #9  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
> > #10 0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
> > #11 0x0000000000000000 in ?? ()
> > 
> > Thread 6 (Thread 6851):
> > #0  0x00002b56e465aa3d in pthread_rwlock_rdlock () from /lib64/libpthread.so.0
> > #1  0x000055555556deb6 in cache_readlock (mc=0x55555568e5b8) at cache.c:60
> > #2  0x000055555556baff in do_readmap (arg=<value optimized out>) at state.c:479
> > #3  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
> > #4  0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
> > #5  0x0000000000000000 in ?? ()
> > 
> > Thread 5 (Thread 4026):
> > #0  0x00002b56e465d6a8 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
> > #1  0x00002b56e46599fb in _L_mutex_lock_92 () from /lib64/libpthread.so.0
> > #2  0x00002b56e4659455 in pthread_mutex_lock () from /lib64/libpthread.so.0
> > #3  0x00005555555746cd in master_mutex_lock () at master.c:49
> > #4  0x0000555555560ff9 in handle_packet_missing_indirect (ap=0x55555569c940,
> >     pkt=0x41823ec0) at indirect.c:808
> > #5  0x000055555555fa32 in handle_packet (ap=<value optimized out>)
> >     at automount.c:1026
> > #6  handle_mounts (ap=<value optimized out>) at automount.c:1551
> > #7  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
> > #8  0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
> > #9  0x0000000000000000 in ?? ()
> > 
> > Thread 4 (Thread 4025):
> > #0  0x00002b56e4dd4b26 in fts_read () from /lib64/libc.so.6
> > #1  0x00007fff37314e80 in ?? ()
> > #2  0x0000000000000000 in ?? ()
> > 
> > Thread 3 (Thread 4022):
> > #0  0x00002b56e465ac77 in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
> > #1  0x00005555555752ea in master_source_writelock (entry=<value optimized out>)
> >     at master.c:527
> > #2  0x00005555555631af in handle_packet_missing_direct (ap=0x5555556a11c0,
> >     pkt=0x40821ec0) at direct.c:1296
> > #3  0x000055555555fa1d in handle_packet (ap=<value optimized out>)
> >     at automount.c:1029
> > #4  handle_mounts (ap=<value optimized out>) at automount.c:1551
> > #5  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
> > #6  0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
> > #7  0x0000000000000000 in ?? ()
> > 
> > Thread 2 (Thread 4019):
> > #0  0x00002b56e465b326 in pthread_cond_wait@@GLIBC_2.3.2 ()
> >    from /lib64/libpthread.so.0
> > #1  0x000055555556c8b3 in st_queue_handler (arg=<value optimized out>)
> >     at state.c:1104
> > #2  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
> > #3  0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
> > #4  0x0000000000000000 in ?? ()
> > 
> > Thread 1 (Thread 4018):
> > #0  0x00002b56e465b517 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
> >    from /lib64/libpthread.so.0
> > #1  0x0000555555573762 in alarm_handler (arg=<value optimized out>)
> >     at alarm.c:206
> > #2  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
> > #3  0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
> > #4  0x0000000000000000 in ?? ()
> > 
> 
> It looks like a lock ordering bug to me and I think it's the same hang
> that Steve Thompson reported.
> 
> This will be hard to test I think, hopefully Mike or Steve will be able
> to use a couple of machines for testing, even so it may take a while
> before we know.
> 
> Could you prepare a test package for Mike using this patch please.
> It should apply OK but of course you'll need to discard or fix the
> CHANGELOG hunk.
> 
> I'll prepare a test package for Steve soon as I get time.
> 
> autofs-5.0.5 - fix out of order locking in readmap
> 
> From: Ian Kent <raven@themaw.net>
> 
> When re-reading the master map and a mount lookup for a particular map
> entry arrives at the same time a map source is being added to the map
> entry and a map entry re-read is also initiated at the same time an
> out of order mutex locking bug which results in a hang.
> ---
> 
>  CHANGELOG    |    1 +
>  lib/master.c |    8 ++------
>  2 files changed, 3 insertions(+), 6 deletions(-)
> 
> 
> diff --git a/CHANGELOG b/CHANGELOG
> index 11054da..631094f 100644
> --- a/CHANGELOG
> +++ b/CHANGELOG
> @@ -55,6 +55,7 @@
>  - add external bind method.
>  - fix add simple bind auth.
>  - add option to dump configured automount maps.
> +- fix out of order locking in readmap.
>  
>  03/09/2009 autofs-5.0.5
>  -----------------------
> diff --git a/lib/master.c b/lib/master.c
> index 95bd3fb..daedf66 100644
> --- a/lib/master.c
> +++ b/lib/master.c
> @@ -611,8 +611,6 @@ struct master_mapent *master_find_mapent(struct master *master, const char *path
>  {
>  	struct list_head *head, *p;
>  
> -	master_mutex_lock();
> -
>  	head = &master->mounts;
>  	list_for_each(p, head) {
>  		struct master_mapent *entry;
> @@ -625,8 +623,6 @@ struct master_mapent *master_find_mapent(struct master *master, const char *path
>  		}
>  	}
>  
> -	master_mutex_unlock();
> -
>  	return NULL;
>  }
>  
> @@ -703,9 +699,7 @@ struct master_mapent *master_new_mapent(struct master *master, const char *path,
>  
>  void master_add_mapent(struct master *master, struct master_mapent *entry)
>  {
> -	master_mutex_lock();
>  	list_add_tail(&entry->list, &master->mounts);
> -	master_mutex_unlock();
>  	return;
>  }
>  
> @@ -813,6 +807,7 @@ int master_read_master(struct master *master, time_t age, int readall)
>  	 * We need to clear and re-populate the null map entry cache
>  	 * before alowing anyone else to use it.
>  	 */
> +	master_mutex_lock();
>  	if (master->nc) {
>  		cache_writelock(master->nc);
>  		nc = master->nc;
> @@ -831,6 +826,7 @@ int master_read_master(struct master *master, time_t age, int readall)
>  	master_init_scan();
>  	lookup_nss_read_master(master, age);
>  	cache_unlock(nc);
> +	master_mutex_unlock();
>  
>  	if (!master->read_fail)
>  		master_mount_mounts(master, age, readall);
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: 5.0.5 non-expiring mounts
  2011-02-16  7:08             ` Ian Kent
  2011-02-16 12:18               ` Ian Kent
@ 2011-02-16 12:58               ` Leonardo Chiquitto
  2011-03-04 20:10               ` Leonardo Chiquitto
  2 siblings, 0 replies; 19+ messages in thread
From: Leonardo Chiquitto @ 2011-02-16 12:58 UTC (permalink / raw)
  To: Ian Kent; +Cc: autofs, Mike Marion

On Wed, Feb 16, 2011 at 5:08 AM, Ian Kent <raven@themaw.net> wrote:
> On Tue, 2011-02-15 at 16:11 -0200, Leonardo Chiquitto wrote:
>> On Tue, Feb 15, 2011 at 10:28 AM, Ian Kent <raven@themaw.net> wrote:
>> > On Mon, 2011-02-14 at 21:28 -0800, Mike Marion wrote:
>> >> On Mon, Feb 14, 2011 at 07:37:01PM -0800, Ian Kent wrote:
>> >>
>> >> > That is kernel revision and autofs revision?
>> >>
>> >> 2.6.16.60-0.59.1 (Sles10 sp3 with an updated, but not bleeding edge, patch).
>> >> autofs 5.0.5 with most of the patches up to a couple months ago.  It's hard to
>> >> get exacts because it's a PTF from Novell (we really pushed them to upgrade to
>> >> 5.0.5) but it should be pretty much equal to the patch they just released for
>> >> sle 11 sp1 that they're recommending as they default going forward.
>> >
>> > That make it hard, as you know.
>> >
>> > But I wouldn't mind spending a bit of time on it, if you can also.
>> >
>> > Let's assume that it's a user space problem for now.
>>
>> Here are the call traces for all automount processes on the kernel side:
>
> I think it's a user space problem.
>
> snip ...
>
>> And here are the call traces from the user land daemon:
>>
>> Thread 9 (Thread 4017):
>> #0  0x00002b56e465d6a8 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
>> #1  0x00002b56e46599fb in _L_mutex_lock_92 () from /lib64/libpthread.so.0
>> #2  0x00002b56e4659455 in pthread_mutex_lock () from /lib64/libpthread.so.0
>> #3  0x00005555555746cd in master_mutex_lock () at master.c:49
>> #4  0x000055555555d260 in do_hup_signal (master=0x55555568d010, age=1296063258)
>>     at automount.c:1276
>> #5  0x0000555555560bd3 in statemachine (arg=<value optimized out>)
>>     at automount.c:1354
>> #6  main (arg=<value optimized out>) at automount.c:2142
>>
>> Thread 8 (Thread 20702):
>> #0  0x00002b56e4dd62a7 in brk () from /lib64/libc.so.6
>> #1  0x0000555555577dfe in expire (logopt=2, cmd=<value optimized out>, fd=21,
>>     ioctlfd=21, path=0x55555569ca20 "/usr2", arg=0x41c27ef4)
>>     at dev-ioctl-lib.c:657
>> #2  0x0000555555577ebe in ioctl_expire (logopt=21, ioctlfd=-1,
>>     path=0x55555569ca20 "/usr2", when=0) at dev-ioctl-lib.c:701
>> #3  0x0000555555561e4e in expire_proc_indirect (arg=<value optimized out>)
>>     at indirect.c:545
>> #4  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
>> #5  0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
>> #6  0x0000000000000000 in ?? ()
>>
>> Thread 7 (Thread 7060):
>> #0  0x00002b56e465ac77 in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
>> #1  0x00005555555752ea in master_source_writelock (entry=<value optimized out>)
>>     at master.c:527
>> #2  0x0000555555575f8f in master_add_map_source (entry=0x5555556a10b0,
>>     type=0x0, format=0x0, age=1296059657, argc=1, argv=<value optimized out>)
>>     at master.c:191
>> #3  0x0000555555579ee3 in master_parse_entry (buffer=<value optimized out>,
>>     default_timeout=86400, logging=<value optimized out>, age=1296059657)
>>     at master_parse.y:823
>> #4  0x00002aaaaaab83fe in lookup_read_master (master=<value optimized out>,
>>     age=1296059657, context=<value optimized out>) at lookup_ldap.c:1625
>> #5  0x0000555555569052 in do_read_master (master=0x55555568d010,
>>     type=<value optimized out>, age=1296059657) at lookup.c:96
>> #6  0x000055555556aa3c in lookup_nss_read_master (master=0x55555568d010,
>>     age=1296059657) at lookup.c:229
>> #7  0x0000555555575c28 in master_read_master (master=0x55555568d010,
>>     age=1296059657, readall=1) at master.c:831
>> #8  0x000055555555d844 in do_read_master (arg=<value optimized out>)
>>     at automount.c:1259
>> #9  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
>> #10 0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
>> #11 0x0000000000000000 in ?? ()
>>
>> Thread 6 (Thread 6851):
>> #0  0x00002b56e465aa3d in pthread_rwlock_rdlock () from /lib64/libpthread.so.0
>> #1  0x000055555556deb6 in cache_readlock (mc=0x55555568e5b8) at cache.c:60
>> #2  0x000055555556baff in do_readmap (arg=<value optimized out>) at state.c:479
>> #3  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
>> #4  0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
>> #5  0x0000000000000000 in ?? ()
>>
>> Thread 5 (Thread 4026):
>> #0  0x00002b56e465d6a8 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
>> #1  0x00002b56e46599fb in _L_mutex_lock_92 () from /lib64/libpthread.so.0
>> #2  0x00002b56e4659455 in pthread_mutex_lock () from /lib64/libpthread.so.0
>> #3  0x00005555555746cd in master_mutex_lock () at master.c:49
>> #4  0x0000555555560ff9 in handle_packet_missing_indirect (ap=0x55555569c940,
>>     pkt=0x41823ec0) at indirect.c:808
>> #5  0x000055555555fa32 in handle_packet (ap=<value optimized out>)
>>     at automount.c:1026
>> #6  handle_mounts (ap=<value optimized out>) at automount.c:1551
>> #7  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
>> #8  0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
>> #9  0x0000000000000000 in ?? ()
>>
>> Thread 4 (Thread 4025):
>> #0  0x00002b56e4dd4b26 in fts_read () from /lib64/libc.so.6
>> #1  0x00007fff37314e80 in ?? ()
>> #2  0x0000000000000000 in ?? ()
>>
>> Thread 3 (Thread 4022):
>> #0  0x00002b56e465ac77 in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
>> #1  0x00005555555752ea in master_source_writelock (entry=<value optimized out>)
>>     at master.c:527
>> #2  0x00005555555631af in handle_packet_missing_direct (ap=0x5555556a11c0,
>>     pkt=0x40821ec0) at direct.c:1296
>> #3  0x000055555555fa1d in handle_packet (ap=<value optimized out>)
>>     at automount.c:1029
>> #4  handle_mounts (ap=<value optimized out>) at automount.c:1551
>> #5  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
>> #6  0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
>> #7  0x0000000000000000 in ?? ()
>>
>> Thread 2 (Thread 4019):
>> #0  0x00002b56e465b326 in pthread_cond_wait@@GLIBC_2.3.2 ()
>>    from /lib64/libpthread.so.0
>> #1  0x000055555556c8b3 in st_queue_handler (arg=<value optimized out>)
>>     at state.c:1104
>> #2  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
>> #3  0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
>> #4  0x0000000000000000 in ?? ()
>>
>> Thread 1 (Thread 4018):
>> #0  0x00002b56e465b517 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>    from /lib64/libpthread.so.0
>> #1  0x0000555555573762 in alarm_handler (arg=<value optimized out>)
>>     at alarm.c:206
>> #2  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
>> #3  0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
>> #4  0x0000000000000000 in ?? ()
>>
>
> It looks like a lock ordering bug to me and I think it's the same hang
> that Steve Thompson reported.
>
> This will be hard to test I think, hopefully Mike or Steve will be able
> to use a couple of machines for testing, even so it may take a while
> before we know.
>
> Could you prepare a test package for Mike using this patch please.
> It should apply OK but of course you'll need to discard or fix the
> CHANGELOG hunk.

Ian, thanks for the patch. I built the test package and forwarded it
to Mike.

> I'll prepare a test package for Steve soon as I get time.

I took the liberty of building a test package for CentOS and RHEL too.
I used autofs-5.0.1-0.rc2.143.el5_5.6.src.rpm as base and added
your patch. The packages are now available at:

http://download.opensuse.org/repositories/home:/leonardocf:/autofs/

Unfortunately I failed to persuade the Build Service to keep the package
release number, so Steve might need to pass --oldpackage to rpm.

Thanks,
Leonardo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: 5.0.5 non-expiring mounts
  2011-02-16 12:18               ` Ian Kent
@ 2011-02-23 20:22                 ` Leonardo Chiquitto
  0 siblings, 0 replies; 19+ messages in thread
From: Leonardo Chiquitto @ 2011-02-23 20:22 UTC (permalink / raw)
  To: Ian Kent; +Cc: autofs, Mike Marion

On Wed, Feb 16, 2011 at 10:18 AM, Ian Kent <raven@themaw.net> wrote:
> On Wed, 2011-02-16 at 15:08 +0800, Ian Kent wrote:
>> On Tue, 2011-02-15 at 16:11 -0200, Leonardo Chiquitto wrote:
>> > On Tue, Feb 15, 2011 at 10:28 AM, Ian Kent <raven@themaw.net> wrote:
>> > > On Mon, 2011-02-14 at 21:28 -0800, Mike Marion wrote:
>> > >> On Mon, Feb 14, 2011 at 07:37:01PM -0800, Ian Kent wrote:
>> > >>
>> > >> > That is kernel revision and autofs revision?
>> > >>
>> > >> 2.6.16.60-0.59.1 (Sles10 sp3 with an updated, but not bleeding edge, patch).
>> > >> autofs 5.0.5 with most of the patches up to a couple months ago.  It's hard to
>> > >> get exacts because it's a PTF from Novell (we really pushed them to upgrade to
>> > >> 5.0.5) but it should be pretty much equal to the patch they just released for
>> > >> sle 11 sp1 that they're recommending as they default going forward.
>
> Now I'm confused?
>
> I thought that Mike had mention he had seen hangs, similar to Steve, and
> this backtrace was an example of that.

This is right. The backtrace I posted was sent to me as an example of a hang.

> But the mail thread doesn't read
> like that and oddly enough I seem to have identified a locking problem
> from looking at the code based on the backtrace.

I'm sorry that we ended up hijacking the "non-expiring mounts" thread.

Leonardo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: 5.0.5 non-expiring mounts
  2011-02-16  7:08             ` Ian Kent
  2011-02-16 12:18               ` Ian Kent
  2011-02-16 12:58               ` Leonardo Chiquitto
@ 2011-03-04 20:10               ` Leonardo Chiquitto
  2011-03-05  5:18                 ` Ian Kent
  2 siblings, 1 reply; 19+ messages in thread
From: Leonardo Chiquitto @ 2011-03-04 20:10 UTC (permalink / raw)
  To: Ian Kent; +Cc: autofs, Mike Marion

On Wed, Feb 16, 2011 at 5:08 AM, Ian Kent <raven@themaw.net> wrote:
> On Tue, 2011-02-15 at 16:11 -0200, Leonardo Chiquitto wrote:
>> On Tue, Feb 15, 2011 at 10:28 AM, Ian Kent <raven@themaw.net> wrote:
>> > On Mon, 2011-02-14 at 21:28 -0800, Mike Marion wrote:
>> >> On Mon, Feb 14, 2011 at 07:37:01PM -0800, Ian Kent wrote:
>> >>
>> >> > That is kernel revision and autofs revision?
>> >>
>> >> 2.6.16.60-0.59.1 (Sles10 sp3 with an updated, but not bleeding edge, patch).
>> >> autofs 5.0.5 with most of the patches up to a couple months ago.  It's hard to
>> >> get exacts because it's a PTF from Novell (we really pushed them to upgrade to
>> >> 5.0.5) but it should be pretty much equal to the patch they just released for
>> >> sle 11 sp1 that they're recommending as they default going forward.
>> >
>> > That make it hard, as you know.
>> >
>> > But I wouldn't mind spending a bit of time on it, if you can also.
>> >
>> > Let's assume that it's a user space problem for now.
>>
>> Here are the call traces for all automount processes on the kernel side:
>
> I think it's a user space problem.
>
> snip ...
>
>> And here are the call traces from the user land daemon:
>>
>> Thread 9 (Thread 4017):
>> #0  0x00002b56e465d6a8 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
>> #1  0x00002b56e46599fb in _L_mutex_lock_92 () from /lib64/libpthread.so.0
>> #2  0x00002b56e4659455 in pthread_mutex_lock () from /lib64/libpthread.so.0
>> #3  0x00005555555746cd in master_mutex_lock () at master.c:49
>> #4  0x000055555555d260 in do_hup_signal (master=0x55555568d010, age=1296063258)
>>     at automount.c:1276
>> #5  0x0000555555560bd3 in statemachine (arg=<value optimized out>)
>>     at automount.c:1354
>> #6  main (arg=<value optimized out>) at automount.c:2142
>>
>> Thread 8 (Thread 20702):
>> #0  0x00002b56e4dd62a7 in brk () from /lib64/libc.so.6
>> #1  0x0000555555577dfe in expire (logopt=2, cmd=<value optimized out>, fd=21,
>>     ioctlfd=21, path=0x55555569ca20 "/usr2", arg=0x41c27ef4)
>>     at dev-ioctl-lib.c:657
>> #2  0x0000555555577ebe in ioctl_expire (logopt=21, ioctlfd=-1,
>>     path=0x55555569ca20 "/usr2", when=0) at dev-ioctl-lib.c:701
>> #3  0x0000555555561e4e in expire_proc_indirect (arg=<value optimized out>)
>>     at indirect.c:545
>> #4  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
>> #5  0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
>> #6  0x0000000000000000 in ?? ()
>>
>> Thread 7 (Thread 7060):
>> #0  0x00002b56e465ac77 in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
>> #1  0x00005555555752ea in master_source_writelock (entry=<value optimized out>)
>>     at master.c:527
>> #2  0x0000555555575f8f in master_add_map_source (entry=0x5555556a10b0,
>>     type=0x0, format=0x0, age=1296059657, argc=1, argv=<value optimized out>)
>>     at master.c:191
>> #3  0x0000555555579ee3 in master_parse_entry (buffer=<value optimized out>,
>>     default_timeout=86400, logging=<value optimized out>, age=1296059657)
>>     at master_parse.y:823
>> #4  0x00002aaaaaab83fe in lookup_read_master (master=<value optimized out>,
>>     age=1296059657, context=<value optimized out>) at lookup_ldap.c:1625
>> #5  0x0000555555569052 in do_read_master (master=0x55555568d010,
>>     type=<value optimized out>, age=1296059657) at lookup.c:96
>> #6  0x000055555556aa3c in lookup_nss_read_master (master=0x55555568d010,
>>     age=1296059657) at lookup.c:229
>> #7  0x0000555555575c28 in master_read_master (master=0x55555568d010,
>>     age=1296059657, readall=1) at master.c:831
>> #8  0x000055555555d844 in do_read_master (arg=<value optimized out>)
>>     at automount.c:1259
>> #9  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
>> #10 0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
>> #11 0x0000000000000000 in ?? ()
>>
>> Thread 6 (Thread 6851):
>> #0  0x00002b56e465aa3d in pthread_rwlock_rdlock () from /lib64/libpthread.so.0
>> #1  0x000055555556deb6 in cache_readlock (mc=0x55555568e5b8) at cache.c:60
>> #2  0x000055555556baff in do_readmap (arg=<value optimized out>) at state.c:479
>> #3  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
>> #4  0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
>> #5  0x0000000000000000 in ?? ()
>>
>> Thread 5 (Thread 4026):
>> #0  0x00002b56e465d6a8 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
>> #1  0x00002b56e46599fb in _L_mutex_lock_92 () from /lib64/libpthread.so.0
>> #2  0x00002b56e4659455 in pthread_mutex_lock () from /lib64/libpthread.so.0
>> #3  0x00005555555746cd in master_mutex_lock () at master.c:49
>> #4  0x0000555555560ff9 in handle_packet_missing_indirect (ap=0x55555569c940,
>>     pkt=0x41823ec0) at indirect.c:808
>> #5  0x000055555555fa32 in handle_packet (ap=<value optimized out>)
>>     at automount.c:1026
>> #6  handle_mounts (ap=<value optimized out>) at automount.c:1551
>> #7  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
>> #8  0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
>> #9  0x0000000000000000 in ?? ()
>>
>> Thread 4 (Thread 4025):
>> #0  0x00002b56e4dd4b26 in fts_read () from /lib64/libc.so.6
>> #1  0x00007fff37314e80 in ?? ()
>> #2  0x0000000000000000 in ?? ()
>>
>> Thread 3 (Thread 4022):
>> #0  0x00002b56e465ac77 in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
>> #1  0x00005555555752ea in master_source_writelock (entry=<value optimized out>)
>>     at master.c:527
>> #2  0x00005555555631af in handle_packet_missing_direct (ap=0x5555556a11c0,
>>     pkt=0x40821ec0) at direct.c:1296
>> #3  0x000055555555fa1d in handle_packet (ap=<value optimized out>)
>>     at automount.c:1029
>> #4  handle_mounts (ap=<value optimized out>) at automount.c:1551
>> #5  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
>> #6  0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
>> #7  0x0000000000000000 in ?? ()
>>
>> Thread 2 (Thread 4019):
>> #0  0x00002b56e465b326 in pthread_cond_wait@@GLIBC_2.3.2 ()
>>    from /lib64/libpthread.so.0
>> #1  0x000055555556c8b3 in st_queue_handler (arg=<value optimized out>)
>>     at state.c:1104
>> #2  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
>> #3  0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
>> #4  0x0000000000000000 in ?? ()
>>
>> Thread 1 (Thread 4018):
>> #0  0x00002b56e465b517 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>    from /lib64/libpthread.so.0
>> #1  0x0000555555573762 in alarm_handler (arg=<value optimized out>)
>>     at alarm.c:206
>> #2  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
>> #3  0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
>> #4  0x0000000000000000 in ?? ()
>>
>
> It looks like a lock ordering bug to me and I think it's the same hang
> that Steve Thompson reported.
>
> This will be hard to test I think, hopefully Mike or Steve will be able
> to use a couple of machines for testing, even so it may take a while
> before we know.
>
> Could you prepare a test package for Mike using this patch please.
> It should apply OK but of course you'll need to discard or fix the
> CHANGELOG hunk.
>
> I'll prepare a test package for Steve soon as I get time.
>
> autofs-5.0.5 - fix out of order locking in readmap
>
> From: Ian Kent <raven@themaw.net>
>
> When re-reading the master map and a mount lookup for a particular map
> entry arrives at the same time a map source is being added to the map
> entry and a map entry re-read is also initiated at the same time an
> out of order mutex locking bug which results in a hang.
> ---
>
>  CHANGELOG    |    1 +
>  lib/master.c |    8 ++------
>  2 files changed, 3 insertions(+), 6 deletions(-)
>
>
> diff --git a/CHANGELOG b/CHANGELOG
> index 11054da..631094f 100644
> --- a/CHANGELOG
> +++ b/CHANGELOG
> @@ -55,6 +55,7 @@
>  - add external bind method.
>  - fix add simple bind auth.
>  - add option to dump configured automount maps.
> +- fix out of order locking in readmap.
>
>  03/09/2009 autofs-5.0.5
>  -----------------------
> diff --git a/lib/master.c b/lib/master.c
> index 95bd3fb..daedf66 100644
> --- a/lib/master.c
> +++ b/lib/master.c
> @@ -611,8 +611,6 @@ struct master_mapent *master_find_mapent(struct master *master, const char *path
>  {
>        struct list_head *head, *p;
>
> -       master_mutex_lock();
> -
>        head = &master->mounts;
>        list_for_each(p, head) {
>                struct master_mapent *entry;
> @@ -625,8 +623,6 @@ struct master_mapent *master_find_mapent(struct master *master, const char *path
>                }
>        }
>
> -       master_mutex_unlock();
> -
>        return NULL;
>  }
>
> @@ -703,9 +699,7 @@ struct master_mapent *master_new_mapent(struct master *master, const char *path,
>
>  void master_add_mapent(struct master *master, struct master_mapent *entry)
>  {
> -       master_mutex_lock();
>        list_add_tail(&entry->list, &master->mounts);
> -       master_mutex_unlock();
>        return;
>  }
>
> @@ -813,6 +807,7 @@ int master_read_master(struct master *master, time_t age, int readall)
>         * We need to clear and re-populate the null map entry cache
>         * before alowing anyone else to use it.
>         */
> +       master_mutex_lock();
>        if (master->nc) {
>                cache_writelock(master->nc);
>                nc = master->nc;
> @@ -831,6 +826,7 @@ int master_read_master(struct master *master, time_t age, int readall)
>        master_init_scan();
>        lookup_nss_read_master(master, age);
>        cache_unlock(nc);
> +       master_mutex_unlock();
>
>        if (!master->read_fail)
>                master_mount_mounts(master, age, readall);

Hello Ian,

I just got a reply confirming that the hangs are no longer
reproducible with this
patch applied. Please, do you think you can consider it for upstream inclusion?

Thanks,
Leonardo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: 5.0.5 non-expiring mounts
  2011-03-04 20:10               ` Leonardo Chiquitto
@ 2011-03-05  5:18                 ` Ian Kent
  0 siblings, 0 replies; 19+ messages in thread
From: Ian Kent @ 2011-03-05  5:18 UTC (permalink / raw)
  To: Leonardo Chiquitto; +Cc: autofs, Mike Marion

On Fri, 2011-03-04 at 17:10 -0300, Leonardo Chiquitto wrote:
> On Wed, Feb 16, 2011 at 5:08 AM, Ian Kent <raven@themaw.net> wrote:
> > On Tue, 2011-02-15 at 16:11 -0200, Leonardo Chiquitto wrote:
> >> On Tue, Feb 15, 2011 at 10:28 AM, Ian Kent <raven@themaw.net> wrote:
> >> > On Mon, 2011-02-14 at 21:28 -0800, Mike Marion wrote:
> >> >> On Mon, Feb 14, 2011 at 07:37:01PM -0800, Ian Kent wrote:
> >> >>
> >> >> > That is kernel revision and autofs revision?
> >> >>
> >> >> 2.6.16.60-0.59.1 (Sles10 sp3 with an updated, but not bleeding edge, patch).
> >> >> autofs 5.0.5 with most of the patches up to a couple months ago.  It's hard to
> >> >> get exacts because it's a PTF from Novell (we really pushed them to upgrade to
> >> >> 5.0.5) but it should be pretty much equal to the patch they just released for
> >> >> sle 11 sp1 that they're recommending as they default going forward.
> >> >
> >> > That make it hard, as you know.
> >> >
> >> > But I wouldn't mind spending a bit of time on it, if you can also.
> >> >
> >> > Let's assume that it's a user space problem for now.
> >>
> >> Here are the call traces for all automount processes on the kernel side:
> >
> > I think it's a user space problem.
> >
> > snip ...
> >
> >> And here are the call traces from the user land daemon:
> >>
> >> Thread 9 (Thread 4017):
> >> #0  0x00002b56e465d6a8 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
> >> #1  0x00002b56e46599fb in _L_mutex_lock_92 () from /lib64/libpthread.so.0
> >> #2  0x00002b56e4659455 in pthread_mutex_lock () from /lib64/libpthread.so.0
> >> #3  0x00005555555746cd in master_mutex_lock () at master.c:49
> >> #4  0x000055555555d260 in do_hup_signal (master=0x55555568d010, age=1296063258)
> >>     at automount.c:1276
> >> #5  0x0000555555560bd3 in statemachine (arg=<value optimized out>)
> >>     at automount.c:1354
> >> #6  main (arg=<value optimized out>) at automount.c:2142
> >>
> >> Thread 8 (Thread 20702):
> >> #0  0x00002b56e4dd62a7 in brk () from /lib64/libc.so.6
> >> #1  0x0000555555577dfe in expire (logopt=2, cmd=<value optimized out>, fd=21,
> >>     ioctlfd=21, path=0x55555569ca20 "/usr2", arg=0x41c27ef4)
> >>     at dev-ioctl-lib.c:657
> >> #2  0x0000555555577ebe in ioctl_expire (logopt=21, ioctlfd=-1,
> >>     path=0x55555569ca20 "/usr2", when=0) at dev-ioctl-lib.c:701
> >> #3  0x0000555555561e4e in expire_proc_indirect (arg=<value optimized out>)
> >>     at indirect.c:545
> >> #4  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
> >> #5  0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
> >> #6  0x0000000000000000 in ?? ()
> >>
> >> Thread 7 (Thread 7060):
> >> #0  0x00002b56e465ac77 in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
> >> #1  0x00005555555752ea in master_source_writelock (entry=<value optimized out>)
> >>     at master.c:527
> >> #2  0x0000555555575f8f in master_add_map_source (entry=0x5555556a10b0,
> >>     type=0x0, format=0x0, age=1296059657, argc=1, argv=<value optimized out>)
> >>     at master.c:191
> >> #3  0x0000555555579ee3 in master_parse_entry (buffer=<value optimized out>,
> >>     default_timeout=86400, logging=<value optimized out>, age=1296059657)
> >>     at master_parse.y:823
> >> #4  0x00002aaaaaab83fe in lookup_read_master (master=<value optimized out>,
> >>     age=1296059657, context=<value optimized out>) at lookup_ldap.c:1625
> >> #5  0x0000555555569052 in do_read_master (master=0x55555568d010,
> >>     type=<value optimized out>, age=1296059657) at lookup.c:96
> >> #6  0x000055555556aa3c in lookup_nss_read_master (master=0x55555568d010,
> >>     age=1296059657) at lookup.c:229
> >> #7  0x0000555555575c28 in master_read_master (master=0x55555568d010,
> >>     age=1296059657, readall=1) at master.c:831
> >> #8  0x000055555555d844 in do_read_master (arg=<value optimized out>)
> >>     at automount.c:1259
> >> #9  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
> >> #10 0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
> >> #11 0x0000000000000000 in ?? ()
> >>
> >> Thread 6 (Thread 6851):
> >> #0  0x00002b56e465aa3d in pthread_rwlock_rdlock () from /lib64/libpthread.so.0
> >> #1  0x000055555556deb6 in cache_readlock (mc=0x55555568e5b8) at cache.c:60
> >> #2  0x000055555556baff in do_readmap (arg=<value optimized out>) at state.c:479
> >> #3  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
> >> #4  0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
> >> #5  0x0000000000000000 in ?? ()
> >>
> >> Thread 5 (Thread 4026):
> >> #0  0x00002b56e465d6a8 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
> >> #1  0x00002b56e46599fb in _L_mutex_lock_92 () from /lib64/libpthread.so.0
> >> #2  0x00002b56e4659455 in pthread_mutex_lock () from /lib64/libpthread.so.0
> >> #3  0x00005555555746cd in master_mutex_lock () at master.c:49
> >> #4  0x0000555555560ff9 in handle_packet_missing_indirect (ap=0x55555569c940,
> >>     pkt=0x41823ec0) at indirect.c:808
> >> #5  0x000055555555fa32 in handle_packet (ap=<value optimized out>)
> >>     at automount.c:1026
> >> #6  handle_mounts (ap=<value optimized out>) at automount.c:1551
> >> #7  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
> >> #8  0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
> >> #9  0x0000000000000000 in ?? ()
> >>
> >> Thread 4 (Thread 4025):
> >> #0  0x00002b56e4dd4b26 in fts_read () from /lib64/libc.so.6
> >> #1  0x00007fff37314e80 in ?? ()
> >> #2  0x0000000000000000 in ?? ()
> >>
> >> Thread 3 (Thread 4022):
> >> #0  0x00002b56e465ac77 in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
> >> #1  0x00005555555752ea in master_source_writelock (entry=<value optimized out>)
> >>     at master.c:527
> >> #2  0x00005555555631af in handle_packet_missing_direct (ap=0x5555556a11c0,
> >>     pkt=0x40821ec0) at direct.c:1296
> >> #3  0x000055555555fa1d in handle_packet (ap=<value optimized out>)
> >>     at automount.c:1029
> >> #4  handle_mounts (ap=<value optimized out>) at automount.c:1551
> >> #5  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
> >> #6  0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
> >> #7  0x0000000000000000 in ?? ()
> >>
> >> Thread 2 (Thread 4019):
> >> #0  0x00002b56e465b326 in pthread_cond_wait@@GLIBC_2.3.2 ()
> >>    from /lib64/libpthread.so.0
> >> #1  0x000055555556c8b3 in st_queue_handler (arg=<value optimized out>)
> >>     at state.c:1104
> >> #2  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
> >> #3  0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
> >> #4  0x0000000000000000 in ?? ()
> >>
> >> Thread 1 (Thread 4018):
> >> #0  0x00002b56e465b517 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
> >>    from /lib64/libpthread.so.0
> >> #1  0x0000555555573762 in alarm_handler (arg=<value optimized out>)
> >>     at alarm.c:206
> >> #2  0x00002b56e4657193 in start_thread () from /lib64/libpthread.so.0
> >> #3  0x00002b56e4ddcdfd in sysctl () from /lib64/libc.so.6
> >> #4  0x0000000000000000 in ?? ()
> >>
> >
> > It looks like a lock ordering bug to me and I think it's the same hang
> > that Steve Thompson reported.
> >
> > This will be hard to test I think, hopefully Mike or Steve will be able
> > to use a couple of machines for testing, even so it may take a while
> > before we know.
> >
> > Could you prepare a test package for Mike using this patch please.
> > It should apply OK but of course you'll need to discard or fix the
> > CHANGELOG hunk.
> >
> > I'll prepare a test package for Steve soon as I get time.
> >
> > autofs-5.0.5 - fix out of order locking in readmap
> >
> > From: Ian Kent <raven@themaw.net>
> >
> > When re-reading the master map and a mount lookup for a particular map
> > entry arrives at the same time a map source is being added to the map
> > entry and a map entry re-read is also initiated at the same time an
> > out of order mutex locking bug which results in a hang.
> > ---
> >
> >  CHANGELOG    |    1 +
> >  lib/master.c |    8 ++------
> >  2 files changed, 3 insertions(+), 6 deletions(-)
> >
> >
> > diff --git a/CHANGELOG b/CHANGELOG
> > index 11054da..631094f 100644
> > --- a/CHANGELOG
> > +++ b/CHANGELOG
> > @@ -55,6 +55,7 @@
> >  - add external bind method.
> >  - fix add simple bind auth.
> >  - add option to dump configured automount maps.
> > +- fix out of order locking in readmap.
> >
> >  03/09/2009 autofs-5.0.5
> >  -----------------------
> > diff --git a/lib/master.c b/lib/master.c
> > index 95bd3fb..daedf66 100644
> > --- a/lib/master.c
> > +++ b/lib/master.c
> > @@ -611,8 +611,6 @@ struct master_mapent *master_find_mapent(struct master *master, const char *path
> >  {
> >        struct list_head *head, *p;
> >
> > -       master_mutex_lock();
> > -
> >        head = &master->mounts;
> >        list_for_each(p, head) {
> >                struct master_mapent *entry;
> > @@ -625,8 +623,6 @@ struct master_mapent *master_find_mapent(struct master *master, const char *path
> >                }
> >        }
> >
> > -       master_mutex_unlock();
> > -
> >        return NULL;
> >  }
> >
> > @@ -703,9 +699,7 @@ struct master_mapent *master_new_mapent(struct master *master, const char *path,
> >
> >  void master_add_mapent(struct master *master, struct master_mapent *entry)
> >  {
> > -       master_mutex_lock();
> >        list_add_tail(&entry->list, &master->mounts);
> > -       master_mutex_unlock();
> >        return;
> >  }
> >
> > @@ -813,6 +807,7 @@ int master_read_master(struct master *master, time_t age, int readall)
> >         * We need to clear and re-populate the null map entry cache
> >         * before alowing anyone else to use it.
> >         */
> > +       master_mutex_lock();
> >        if (master->nc) {
> >                cache_writelock(master->nc);
> >                nc = master->nc;
> > @@ -831,6 +826,7 @@ int master_read_master(struct master *master, time_t age, int readall)
> >        master_init_scan();
> >        lookup_nss_read_master(master, age);
> >        cache_unlock(nc);
> > +       master_mutex_unlock();
> >
> >        if (!master->read_fail)
> >                master_mount_mounts(master, age, readall);
> 
> Hello Ian,
> 
> I just got a reply confirming that the hangs are no longer
> reproducible with this
> patch applied. Please, do you think you can consider it for upstream inclusion?

Oh wow, that's just plain luck I think.

I'll put the patch back on my commit queue but can't say yet when the
next time I will push updates. At least, as long as it's in the queue it
will be included in the next release, which is long overdue.

> 
> Thanks,
> Leonardo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: 5.0.5 non-expiring mounts
  2011-02-08  9:30 5.0.5 non-expiring mounts Philip Ong Jr.
  2011-02-11  3:10 ` Ian Kent
  2011-02-15  1:10 ` Mike Marion
@ 2011-03-24 22:03 ` Leonardo Chiquitto
  2011-03-28  3:17   ` Ian Kent
  2 siblings, 1 reply; 19+ messages in thread
From: Leonardo Chiquitto @ 2011-03-24 22:03 UTC (permalink / raw)
  To: autofs; +Cc: Philip Ong Jr.

[-- Attachment #1: Type: text/plain, Size: 1949 bytes --]

On Tue, Feb 8, 2011 at 7:30 AM, Philip Ong Jr. <pongjr@gmail.com> wrote:
> I've pulled down the 5.0.5 tree and all patches.
>
> I've set the logging level to debug and can see messages of expiring mounts
> in /var/log/messages...but when I check /etc/mtab or /proc/mounts or df, i
> can still see them there. Any ideas if this is a known issue or if I can
> give more info than below?

Hello,

I finally had time to return to this issue. To avoid the confusion caused by old
kernels, I reproduced the problem on openSUSE Factory (kernel 2.6.38 and
autofs 5.0.5 with all kernel.org patches applied). Here's the
configuration used:

# cat /etc/auto.master
/data	/etc/auto.data

# cat /etc/auto.data
isos	-fstype=nfs,ro,rsize=8192,wsize=8192,intr,nolock,nosuid libre:/isos

# cat /etc/sysconfig/autofs | grep -v '^#'
AUTOFS_OPTIONS=""
LOCAL_OPTIONS=""
APPEND_OPTIONS="yes"
USE_MISC_DEVICE="yes"
DEFAULT_MASTER_MAP_NAME="auto.master"
DEFAULT_TIMEOUT=600
DEFAULT_BROWSE_MODE="yes"
DEFAULT_LOGGING="debug"
DEFAULT_MAP_OBJECT_CLASS="nisMap"
DEFAULT_ENTRY_OBJECT_CLASS="nisObject"
DEFAULT_MAP_ATTRIBUTE="nisMapName"
DEFAULT_ENTRY_ATTRIBUTE="cn"
DEFAULT_VALUE_ATTRIBUTE="nisMapEntry"
DEFAULT_AUTH_CONF_FILE="etc/autofs_ldap_auth.conf"
MAP_HASH_TABLE_SIZE=1024

I'm attaching the automount debug logs showing the following sequence:

- automount startup
- mount of an NFS volume (/data/isos)
- failed attempt to trigger the expiration of the mounted volume (sending
  SIGUSR1)
- successful attempt to trigger the expiration of the mounted volume
  (sending SIGUSR1 again)

I also confirmed that the problem no longer happens if I revert the
following commit:

commit 08aafab4c1d0ab6227c80f8cd1086ae78556a370
Author: Ian Kent <raven@themaw.net>
Date:   Thu Sep 9 11:10:47 2010 +0800

    autofs-5.0.5 - fix direct map not updating on reread

Philip, do you think you could try to revert it in your setup/package just
to confirm this "works"?

Thanks,
Leonardo

[-- Attachment #2: messages-n47.txt --]
[-- Type: text/plain, Size: 4860 bytes --]

== STARTUP

Mar 24 18:41:33 n47 automount[4681]: Starting automounter version 5.0.5, master map auto.master
Mar 24 18:41:33 n47 automount[4681]: using kernel protocol version 5.02
Mar 24 18:41:33 n47 automount[4681]: lookup_nss_read_master: reading master files auto.master
Mar 24 18:41:33 n47 automount[4681]: parse_init: parse(sun): init gathered global options: (null)
Mar 24 18:41:33 n47 automount[4681]: lookup_read_master: lookup(file): read entry /data
Mar 24 18:41:33 n47 automount[4681]: master_do_mount: mounting /data
Mar 24 18:41:33 n47 automount[4681]: automount_path_to_fifo: fifo name /var/run/autofs.fifo-data
Mar 24 18:41:33 n47 automount[4681]: lookup_nss_read_map: reading map file /etc/auto.data
Mar 24 18:41:33 n47 automount[4681]: parse_init: parse(sun): init gathered global options: (null)
Mar 24 18:41:33 n47 automount[4681]: mounted indirect on /data with timeout 600, freq 150 seconds
Mar 24 18:41:33 n47 automount[4681]: st_ready: st_ready(): state = 0 path /data
Mar 24 18:41:33 n47 automount[4681]: ghosting enabled

== COMMAND: ls /data/isos

Mar 24 18:41:54 n47 automount[4681]: handle_packet: type = 3
Mar 24 18:41:54 n47 automount[4681]: handle_packet_missing_indirect: token 3, name isos, request pid 4691
Mar 24 18:41:54 n47 automount[4681]: attempting to mount entry /data/isos
Mar 24 18:41:54 n47 automount[4681]: lookup_mount: lookup(file): looking up isos
Mar 24 18:41:54 n47 automount[4681]: lookup_mount: lookup(file): isos -> -fstype=nfs,ro,rsize=8192,wsize=8192,intr,nolock,nosuid libre:/isos
Mar 24 18:41:54 n47 automount[4681]: parse_mount: parse(sun): expanded entry: -fstype=nfs,ro,rsize=8192,wsize=8192,intr,nolock,nosuid libre:/isos
Mar 24 18:41:54 n47 automount[4681]: parse_mount: parse(sun): gathered options: fstype=nfs,ro,rsize=8192,wsize=8192,intr,nolock,nosuid
Mar 24 18:41:54 n47 automount[4681]: parse_mount: parse(sun): dequote("libre:/isos") -> libre:/isos
Mar 24 18:41:54 n47 automount[4681]: parse_mount: parse(sun): core of entry: options=fstype=nfs,ro,rsize=8192,wsize=8192,intr,nolock,nosuid, loc=libre:/isos
Mar 24 18:41:54 n47 automount[4681]: sun_mount: parse(sun): mounting root /data, mountpoint isos, what libre:/isos, fstype nfs, options ro,rsize=8192,wsize=8192,intr,nolock,nosuid
Mar 24 18:41:54 n47 automount[4681]: mount_mount: mount(nfs): root=/data name=isos what=libre:/isos, fstype=nfs, options=ro,rsize=8192,wsize=8192,intr,nolock,nosuid
Mar 24 18:41:54 n47 automount[4681]: mount_mount: mount(nfs): nfs options="ro,rsize=8192,wsize=8192,intr,nolock,nosuid", nosymlink=0, ro=1
Mar 24 18:41:54 n47 automount[4681]: mount_mount: mount(nfs): calling mkdir_path /data/isos
Mar 24 18:41:54 n47 automount[4681]: mount_mount: mount(nfs): calling mount -t nfs -s -o ro,rsize=8192,wsize=8192,intr,nolock,nosuid libre:/isos /data/isos
Mar 24 18:41:54 n47 automount[4681]: mount_mount: mount(nfs): mounted libre:/isos on /data/isos
Mar 24 18:41:54 n47 automount[4681]: dev_ioctl_send_ready: token = 3
Mar 24 18:41:54 n47 automount[4681]: mounted /data/isos
Mar 24 18:41:54 n47 automount[4681]: st_readmap: state 1 path /data
Mar 24 18:41:54 n47 automount[4681]: re-reading map for /data
Mar 24 18:41:54 n47 automount[4681]: lookup_nss_read_map: reading map file /etc/auto.data
Mar 24 18:41:54 n47 automount[4681]: parse_init: parse(sun): init gathered global options: (null)
Mar 24 18:41:54 n47 automount[4681]: st_ready: st_ready(): state = 4 path /data

== COMMAND: kill -USR1 $(pidof automount)

Mar 24 18:42:07 n47 automount[4681]: do_notify_state: signal 10
Mar 24 18:42:07 n47 automount[4681]: master_notify_state_change: sig 10 switching /data from 1 to 3

== COMMAND: kill -USR1 $(pidof automount)

Mar 24 18:42:45 n47 automount[4681]: do_notify_state: signal 10
Mar 24 18:42:45 n47 automount[4681]: master_notify_state_change: sig 10 switching /data from 1 to 3
Mar 24 18:42:45 n47 automount[4681]: st_prune: state 1 path /data
Mar 24 18:42:45 n47 automount[4681]: expire_proc: exp_proc = 140224412317440 path /data
Mar 24 18:42:45 n47 automount[4681]: expire_proc_indirect: expire /data/isos
Mar 24 18:42:45 n47 automount[4681]: handle_packet: type = 4
Mar 24 18:42:45 n47 automount[4681]: handle_packet_expire_indirect: token 4, name isos
Mar 24 18:42:45 n47 automount[4681]: expiring path /data/isos
Mar 24 18:42:45 n47 automount[4681]: umount_multi: path /data/isos incl 1
Mar 24 18:42:45 n47 automount[4681]: umount_subtree_mounts: unmounting dir = /data/isos
Mar 24 18:42:45 n47 automount[4681]: expired /data/isos
Mar 24 18:42:45 n47 automount[4681]: dev_ioctl_send_ready: token = 4
Mar 24 18:42:45 n47 automount[4681]: expire_cleanup: got thid 140224412317440 path /data stat 0
Mar 24 18:42:45 n47 automount[4681]: expire_cleanup: sigchld: exp 140224412317440 finished, switching from 3 to 1
Mar 24 18:42:45 n47 automount[4681]: st_ready: st_ready(): state = 3 path /data

[-- Attachment #3: Type: text/plain, Size: 140 bytes --]

_______________________________________________
autofs mailing list
autofs@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/autofs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: 5.0.5 non-expiring mounts
  2011-03-24 22:03 ` Leonardo Chiquitto
@ 2011-03-28  3:17   ` Ian Kent
  0 siblings, 0 replies; 19+ messages in thread
From: Ian Kent @ 2011-03-28  3:17 UTC (permalink / raw)
  To: Leonardo Chiquitto; +Cc: autofs, Philip Ong Jr.

On Thu, 2011-03-24 at 19:03 -0300, Leonardo Chiquitto wrote:
> 
> I finally had time to return to this issue. To avoid the confusion caused by old
> kernels, I reproduced the problem on openSUSE Factory (kernel 2.6.38 and
> autofs 5.0.5 with all kernel.org patches applied). Here's the
> configuration used:
> 
> # cat /etc/auto.master
> /data	/etc/auto.data
> 
> # cat /etc/auto.data
> isos	-fstype=nfs,ro,rsize=8192,wsize=8192,intr,nolock,nosuid libre:/isos
> 
> # cat /etc/sysconfig/autofs | grep -v '^#'
> AUTOFS_OPTIONS=""
> LOCAL_OPTIONS=""
> APPEND_OPTIONS="yes"
> USE_MISC_DEVICE="yes"
> DEFAULT_MASTER_MAP_NAME="auto.master"
> DEFAULT_TIMEOUT=600
> DEFAULT_BROWSE_MODE="yes"
> DEFAULT_LOGGING="debug"
> DEFAULT_MAP_OBJECT_CLASS="nisMap"
> DEFAULT_ENTRY_OBJECT_CLASS="nisObject"
> DEFAULT_MAP_ATTRIBUTE="nisMapName"
> DEFAULT_ENTRY_ATTRIBUTE="cn"
> DEFAULT_VALUE_ATTRIBUTE="nisMapEntry"
> DEFAULT_AUTH_CONF_FILE="etc/autofs_ldap_auth.conf"
> MAP_HASH_TABLE_SIZE=1024
> 
> I'm attaching the automount debug logs showing the following sequence:
> 
> - automount startup
> - mount of an NFS volume (/data/isos)
> - failed attempt to trigger the expiration of the mounted volume (sending
>   SIGUSR1)
> - successful attempt to trigger the expiration of the mounted volume
>   (sending SIGUSR1 again)
> 
> I also confirmed that the problem no longer happens if I revert the
> following commit:
> 
> commit 08aafab4c1d0ab6227c80f8cd1086ae78556a370
> Author: Ian Kent <raven@themaw.net>
> Date:   Thu Sep 9 11:10:47 2010 +0800
> 
>     autofs-5.0.5 - fix direct map not updating on reread
> 
> Philip, do you think you could try to revert it in your setup/package just
> to confirm this "works"?

The problem was actually introduced by
autofs-5.0.5-remove-state-machine-timed-wait.patch.

There had been a long standing bug in the state queue handling which I
thought was a pthreads problem. When I added the above patch everything
appeared to work OK.

But the change you mentioned above exposed the bug and the log you
provided allowed me to work out what was broken and fix it.

But the stale map processing has also been broken somewhere along the
way and I'm also working on fixing that.

Once that's done there's not much more needed for 5.0.6, at last.

Ian

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: 5.0.5 non-expiring mounts
@ 2011-02-17 19:21 Deke Clinger
  0 siblings, 0 replies; 19+ messages in thread
From: Deke Clinger @ 2011-02-17 19:21 UTC (permalink / raw)
  To: autofs

(my reply to Ian, without the logfile attached)

--

On Thu, 17 Feb 2011, Ian Kent wrote:

> On Wed, 2011-02-16 at 17:08 -0800, Deke Clinger wrote:
> > On Wed Feb 16 12:18:15 UTC 2011 Ian Kent wrote:
> > 
> > > A backtrace generally doesn't do us any good when were trying to find an
> > > expire problem, the debug log is where we have to start on these.
> > 
> > I've got a debug log from a sled10sp3 machine running the Novell autofs
> > 5.0.5 update. Ian - could I mail you this personally? I'd rather not have
> > this log with usernames, paths, hostnames, etc. on a public archive.
> 
> Yes please.

Attached. This is all automount entries from bootup in case you're
interested. I login and start doing stuff around "Feb 16 16:38:00". There
are entries like:

Feb 16 16:57:40 src@private automount[3833]: expired /usr2/user1
Feb 16 16:57:40 src@private automount[3833]: expiring path /usr2/user2

but these paths never actually go away.

BTW, this host is running 'autofs5-5.0.5-0.7.1.2597.0.TEST.667967' from
Novell, which includes your 'fix-out-of-order-locking-in-readmap' patch.

> > FWIW, I did do a test with autofs5.0.5 built from source with all the
> > patches in the patch order list from kernel.org and it demonstrated the
> > same behavior: a USR1 signal unmounted the direct map entries but not the
> > indirect. I changed the maps to files, pruned them to a few hundred entries
> > and converted the indirect entries to direct and re-ran the
> > test. Configured like this all entries unmounted upon a USR1 signal so I do
> > believe this has something to do with direct vs indirect mounts. 
> 
> There are a couple of possibilities. I'm always working with the current
> source and basic testing includes expiring both direct and indirect
> mounts as a matter of course and I'm not seeing this. So there has to be
> more to it or I have one or other patches already in the queue that
> resolve the problem.
> 
> Does this happen straight away or start after some time of running?

Immediately. 

> Do all indirect mounts stop expiring or only some?

We use /net with a map like this:

# /net, auto_master, automountMaps, sandiego, corp, qualcomm.com
dn: automountKey=/net,automountMapName=auto_master,ou=automountMaps,ou=sandiego,ou=corp,dc=qualcomm,dc=com
qcautomountkey: /net
objectClass: automount
objectClass: top
objectClass: qcNisObject
automountkey: /net
automountinformation: -hosts -nosuid

Those paths do unmount on signal. 

> What is the for of the indirect map entries, are they multi-mount
> entries?

No multimount, the maps are in LDAP like the one above. Here's my home
directory, for example:

# dclinger, auto.home, automountMaps, sandiego, corp, qualcomm.com
dn: qcAutomountKey=dclinger,automountMapName=auto.home,ou=automountMaps,ou=san 
 diego,ou=corp,dc=qualcomm,dc=com
automountinformation: -rw,noquota filer:/vol/vol2/usr2/dclinger
qcautomountkey: dclinger
automountkey: dclinger
objectClass: top
objectClass: automount
objectClass: qcNisObject

The others are similiar, with different keys obviously and various file
servers and volume names.

> Obviously the debug log will probably answer most of these questions.

Please let me know if any other questions. Thanks for looking at this.

-Deke

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: 5.0.5 non-expiring mounts
  2011-02-17  1:08 Deke Clinger
@ 2011-02-17  7:57 ` Ian Kent
  0 siblings, 0 replies; 19+ messages in thread
From: Ian Kent @ 2011-02-17  7:57 UTC (permalink / raw)
  To: Deke Clinger; +Cc: autofs

On Wed, 2011-02-16 at 17:08 -0800, Deke Clinger wrote:
> On Wed Feb 16 12:18:15 UTC 2011 Ian Kent wrote:
> 
> > A backtrace generally doesn't do us any good when were trying to find an
> > expire problem, the debug log is where we have to start on these.
> 
> I've got a debug log from a sled10sp3 machine running the Novell autofs
> 5.0.5 update. Ian - could I mail you this personally? I'd rather not have
> this log with usernames, paths, hostnames, etc. on a public archive.

Yes please.

> 
> FWIW, I did do a test with autofs5.0.5 built from source with all the
> patches in the patch order list from kernel.org and it demonstrated the
> same behavior: a USR1 signal unmounted the direct map entries but not the
> indirect. I changed the maps to files, pruned them to a few hundred entries
> and converted the indirect entries to direct and re-ran the
> test. Configured like this all entries unmounted upon a USR1 signal so I do
> believe this has something to do with direct vs indirect mounts. 

There are a couple of possibilities. I'm always working with the current
source and basic testing includes expiring both direct and indirect
mounts as a matter of course and I'm not seeing this. So there has to be
more to it or I have one or other patches already in the queue that
resolve the problem.

Does this happen straight away or start after some time of running?
Do all indirect mounts stop expiring or only some?
What is the for of the indirect map entries, are they multi-mount
entries?

Obviously the debug log will probably answer most of these questions.

Ian

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: 5.0.5 non-expiring mounts
@ 2011-02-17  1:08 Deke Clinger
  2011-02-17  7:57 ` Ian Kent
  0 siblings, 1 reply; 19+ messages in thread
From: Deke Clinger @ 2011-02-17  1:08 UTC (permalink / raw)
  To: autofs

On Wed Feb 16 12:18:15 UTC 2011 Ian Kent wrote:

> A backtrace generally doesn't do us any good when were trying to find an
> expire problem, the debug log is where we have to start on these.

I've got a debug log from a sled10sp3 machine running the Novell autofs
5.0.5 update. Ian - could I mail you this personally? I'd rather not have
this log with usernames, paths, hostnames, etc. on a public archive.

FWIW, I did do a test with autofs5.0.5 built from source with all the
patches in the patch order list from kernel.org and it demonstrated the
same behavior: a USR1 signal unmounted the direct map entries but not the
indirect. I changed the maps to files, pruned them to a few hundred entries
and converted the indirect entries to direct and re-ran the
test. Configured like this all entries unmounted upon a USR1 signal so I do
believe this has something to do with direct vs indirect mounts. 

Regards,

-Deke

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2011-03-28  3:17 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-02-08  9:30 5.0.5 non-expiring mounts Philip Ong Jr.
2011-02-11  3:10 ` Ian Kent
2011-02-15  1:10 ` Mike Marion
2011-02-15  3:11   ` Ian Kent
2011-02-15  3:37     ` Ian Kent
2011-02-15  5:28       ` Mike Marion
2011-02-15 12:28         ` Ian Kent
2011-02-15 18:11           ` Leonardo Chiquitto
2011-02-16  7:08             ` Ian Kent
2011-02-16 12:18               ` Ian Kent
2011-02-23 20:22                 ` Leonardo Chiquitto
2011-02-16 12:58               ` Leonardo Chiquitto
2011-03-04 20:10               ` Leonardo Chiquitto
2011-03-05  5:18                 ` Ian Kent
2011-03-24 22:03 ` Leonardo Chiquitto
2011-03-28  3:17   ` Ian Kent
2011-02-17  1:08 Deke Clinger
2011-02-17  7:57 ` Ian Kent
2011-02-17 19:21 Deke Clinger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.