* two /net paths to the same local mount?
@ 2010-06-21 19:17 Chris Quenelle
  2010-06-22  2:05 ` Michael Loftis
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Quenelle @ 2010-06-21 19:17 UTC (permalink / raw)
  To: autofs

When I use /net/host and /net/host.domain to access a file system
that is local to the current host, the second flavor of access will
hang.

We have a complex build environment running on multiple hosts,
so directory names often need to work either locally or remotely.
That's why we're using /net paths for the local host.

Is this a known bug?

I can supply more details, if necessary.
Rebuilding from source is not really practical in my environment.

--chris

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: two /net paths to the same local mount?
  2010-06-21 19:17 two /net paths to the same local mount? Chris Quenelle
@ 2010-06-22  2:05 ` Michael Loftis
  2010-06-23 18:32   ` Chris Quenelle
  0 siblings, 1 reply; 16+ messages in thread
From: Michael Loftis @ 2010-06-22  2:05 UTC (permalink / raw)
  To: Chris Quenelle, autofs

<reply inline>

--On Monday, June 21, 2010 12:17 PM -0700 Chris Quenelle 
<chris.quenelle@oracle.com> wrote:

> When I use /net/host and /net/host.domain to access a file system
> that is local to the current host, the second flavor of access will
> hang.

Check the /etc/hosts file on the machine having the problem.  Usually this 
means that you're expecting it to contact from/to one address (say the 
loopback in the case of hostname) but the hosts file has another for the 
hostname.domain case, or doesn't have one, so it's getting the external 
address via DNS.
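One way to act on this advice is to resolve each alias through the system resolver and compare the answers. The sketch below uses Python's `socket.getaddrinfo`, so `/etc/hosts` and DNS are consulted in whatever order the machine is configured for. The helper names (`resolve_ipv4`, `alias_addresses`, `mismatched_aliases`) are hypothetical; they are not part of autofs.

```python
import socket
from typing import Callable, Dict, Iterable, List, Set


def resolve_ipv4(name: str) -> Set[str]:
    """Return the IPv4 addresses the system resolver reports for name."""
    try:
        infos = socket.getaddrinfo(name, None, socket.AF_INET)
    except socket.gaierror:
        return set()  # the name does not resolve at all
    # Each entry is (family, type, proto, canonname, (address, port)).
    return {info[4][0] for info in infos}


def alias_addresses(aliases: Iterable[str],
                    resolve: Callable[[str], Set[str]] = resolve_ipv4
                    ) -> Dict[str, Set[str]]:
    """Resolve every alias; `resolve` can be swapped out for testing."""
    return {alias: resolve(alias) for alias in aliases}


def mismatched_aliases(addresses: Dict[str, Set[str]]) -> List[str]:
    """Aliases whose address set differs from that of the first alias."""
    names = list(addresses)
    if not names:
        return []
    reference = addresses[names[0]]
    return [name for name in names[1:] if addresses[name] != reference]
```

On the affected host, `mismatched_aliases(alias_addresses([...]))` with the short name first and the qualified names after it should return an empty list when every alias points at the same address. Any name it returns is worth checking against `/etc/hosts`.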

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: two /net paths to the same local mount?
  2010-06-22  2:05 ` Michael Loftis
@ 2010-06-23 18:32   ` Chris Quenelle
  2010-06-24  3:00     ` Ian Kent
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Quenelle @ 2010-06-23 18:32 UTC (permalink / raw)
  To: Michael Loftis, autofs

Thanks for your time guys!

It turns out it's not a general bug.  I was just applying
a pessimistic interpretation without checking.

After more testing, it turns out it's just a problem
on one machine, for two mount points under one
host alias.

I'm interested in knowing more about how I could
debug the situation without just rebooting and
hoping it doesn't happen again.

Here's my scenario:

I have three exported filesystems.
I can use three host aliases to refer to my own machine.
(carabas, carabas.sfbay, carabas.sfbay.sun.com)

/etc/hosts says:
[IP addr]     carabas.sfbay.sun.com carabas


All filesystems work under all /net/foo aliases
except two of them hang when accessed via one
of the host aliases.

% showmount -e localhost
Export list for localhost:
/export/home3 *
/export/home2 *
/export/home1 *


% df -kl
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda2             69575344   3408900  66166444   5% /
udev                   8222568       148   8222420   1% /dev
/dev/sdb1             71671728  26507132  45164596  37% /export/home1
/dev/sdc1             71671728   8545196  63126532  12% /export/home2
/dev/sdd1             71671728   5375672  66296056   8% /export/home3
/export/home1         71671728  26507128  45164600  37% /net/carabas/export/home1
/export/home2         71671728   8545196  63126532  12% /net/carabas.sfbay/export/home2
/export/home1         71671728  26507128  45164600  37% /net/carabas.sfbay/export/home1
/export/home3         71671728   5375672  66296056   8% /net/carabas.sfbay/export/home3
/export/home1         71671728  26507128  45164600  37% /net/carabas.sfbay.sun.com/export/home1
/export/home2         71671728   8545196  63126532  12% /net/carabas.sfbay.sun.com/export/home2
/export/home3         71671728   5375672  66296056   8% /net/carabas.sfbay.sun.com/export/home3
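The `df` listing shows which alias/export combinations autofs has actually mounted. A small parser makes the gaps easier to see. This is a sketch assuming the one-line-per-filesystem layout shown above; `net_mounts` and `missing_exports` are hypothetical helpers. An export missing under an alias may simply not have been accessed yet, and an alias with no mounts at all will not appear in the result.

```python
from collections import defaultdict
from typing import Dict, Set


def net_mounts(df_output: str) -> Dict[str, Set[str]]:
    """Group the /net automounts in `df -kl` output by host alias."""
    by_alias: Dict[str, Set[str]] = defaultdict(set)
    for line in df_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # blank or wrapped line
        parts = fields[-1].split("/", 3)  # "", "net", alias, exported path
        if len(parts) < 4 or parts[1] != "net":
            continue  # not an automount under /net/<alias>/...
        by_alias[parts[2]].add("/" + parts[3])
    return dict(by_alias)


def missing_exports(by_alias: Dict[str, Set[str]],
                    exports: Set[str]) -> Dict[str, Set[str]]:
    """For each alias, the exports that are not currently mounted under it."""
    return {alias: exports - mounted for alias, mounted in by_alias.items()}
```

Applied to the listing above with the three exports from `showmount`, it reports `/export/home2` and `/export/home3` as unmounted under `carabas`, which is consistent with two exports hanging under one alias.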


% strace ls /net/carabas/export/home2
...
...
mmap(NULL, 217016, PROT_READ, MAP_SHARED, 4, 0) = 0x2b2388370000
close(4)                                = 0
close(3)                                = 0
open("/net/carabas/export/home2", O_RDONLY|O_NONBLOCK|O_DIRECTORY <unfinished ...>

I hit ctrl-C when the ls command hangs.  Above is the tail of the strace output.
It doesn't tell you much, the interesting stuff happens in the automounter.
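To test each path without tying up a shell, the access can be run in a child process with a time limit. This is a minimal sketch; `probe_path` is a hypothetical helper. It assumes a blocked `ls` can be killed, as it could be with Ctrl-C here. If the child were stuck in an uninterruptible wait, `subprocess.run` would still wait for it to exit after killing it.

```python
import subprocess


def probe_path(path: str, timeout: float = 10.0) -> str:
    """Run `ls` on path in a child process and classify the outcome.

    Returns "ok", "error" (ls exited non-zero) or "timeout" (still blocked
    after `timeout` seconds, e.g. waiting on a mount that never completes).
    """
    try:
        result = subprocess.run(["ls", path],
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL,
                                timeout=timeout)
    except subprocess.TimeoutExpired:
        return "timeout"  # the child has been killed by subprocess.run
    return "ok" if result.returncode == 0 else "error"
```

Looping this over every `/net/<alias>/export/homeN` combination gives a quick matrix of which alias/export pairs hang.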

Is there an error log file for the automounter that is enabled by default?
Is there an easy recipe for restarting the automounter with debugging output on?

--chris

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: two /net paths to the same local mount?
  2010-06-23 18:32   ` Chris Quenelle
@ 2010-06-24  3:00     ` Ian Kent
  2010-06-29 16:14       ` Chris Quenelle
  0 siblings, 1 reply; 16+ messages in thread
From: Ian Kent @ 2010-06-24  3:00 UTC (permalink / raw)
  To: Chris Quenelle; +Cc: autofs

On Wed, 2010-06-23 at 11:32 -0700, Chris Quenelle wrote:
> Thanks for your time guys!
> 
> It turns out it's not a general bug.  I was just applying
> a pessimistic interpretation without checking.
> 
> After more testing, it turns out it's just a problem
> on one machine, for two mount points under one
> host alias.
> 
> I'm interested in knowing more about how I could
> debug the situation without just rebooting and
> hoping it doesn't happen again.
> 
> Here's my scenario:
> 
> I have three exported filesystems.
> I can use three host aliases to refer to my own machine.
> (carabas, carabas.sfbay, carabas.sfbay.sun.com)
> 
> /etc/hosts says:
> [IP addr]     carabas.sfbay.sun.com carabas
> 
> 
> All filesystems work under all /net/foo aliases
> except two of them hang when accessed via one
> of the host aliases.

Are you sure the address lookup is returning what you think it does?
What's in /etc/host.conf?

> 
> % showmount -e localhost
> Export list for localhost:
> /export/home3 *
> /export/home2 *
> /export/home1 *
> 
> 
> % df -kl
> Filesystem           1K-blocks      Used Available Use% Mounted on
> /dev/sda2             69575344   3408900  66166444   5% /
> udev                   8222568       148   8222420   1% /dev
> /dev/sdb1             71671728  26507132  45164596  37% /export/home1
> /dev/sdc1             71671728   8545196  63126532  12% /export/home2
> /dev/sdd1             71671728   5375672  66296056   8% /export/home3
> /export/home1         71671728  26507128  45164600  37% /net/carabas/export/home1
> /export/home2         71671728   8545196  63126532  12% /net/carabas.sfbay/export/home2
> /export/home1         71671728  26507128  45164600  37% /net/carabas.sfbay/export/home1
> /export/home3         71671728   5375672  66296056   8% /net/carabas.sfbay/export/home3
> /export/home1         71671728  26507128  45164600  37% /net/carabas.sfbay.sun.com/export/home1
> /export/home2         71671728   8545196  63126532  12% /net/carabas.sfbay.sun.com/export/home2
> /export/home3         71671728   5375672  66296056   8% /net/carabas.sfbay.sun.com/export/home3
> 
> 
> % strace ls /net/carabas/export/home2
> ...
> ...
> mmap(NULL, 217016, PROT_READ, MAP_SHARED, 4, 0) = 0x2b2388370000
> close(4)                                = 0
> close(3)                                = 0
> open("/net/carabas/export/home2", O_RDONLY|O_NONBLOCK|O_DIRECTORY <unfinished ...>

What does a ps listing give you when it is blocked?
Any mount.nfs processes?

That's a sign that automount thinks that the mount isn't local and is
probably trying to mount from a possibly non-existent IP address.
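Ian's "any mount.nfs processes?" check can also be scripted on Linux by scanning `/proc`. A sketch, with `find_processes` as a hypothetical helper: it reads `/proc/<pid>/comm`, which holds at most 15 characters of the program name, so names are compared after truncating to that length.

```python
import os
from typing import Dict, Iterable, List


def find_processes(names: Iterable[str],
                   proc_root: str = "/proc") -> Dict[str, List[int]]:
    """Return {name: [pid, ...]} for running processes whose comm matches."""
    wanted = {name[:15]: name for name in names}  # comm is truncated to 15 chars
    found: Dict[str, List[int]] = {name: [] for name in wanted.values()}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                comm = f.read().strip()
        except OSError:
            continue  # process exited or is not readable
        if comm in wanted:
            found[wanted[comm]].append(int(entry))
    return found
```

Running `find_processes(["mount.nfs", "automount"])` while the `ls` is blocked shows both whether a mount helper was started and whether the daemon is still present.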

> 
> I hit ctrl-C when the ls command hangs.  Above is the tail of the strace output.
> It doesn't tell you much, the interesting stuff happens in the automounter.
> 
> Is there an error log file for the automounter that is enabled by default?
> Is there an easy recipe for restarting the automounter with debugging output on?

You need to ensure that debug logging is being recorded in syslog.
Ensure you are sending daemon.* somewhere.

Then

automount -l debug /net

will start the debug logging and

automount -l err /net

will set it back to basic (non-debug) logging.

That depends on the version of autofs you are using.

If you don't see this option in the automount(8) man page you don't have
this option and all you can do is enable debug logging in the
configuration.
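The two `automount -l` commands above can be paired so the level is always restored, even if the reproduction step fails. A minimal sketch, assuming the installed automount supports `-l` (check automount(8) as described) and is run with sufficient privileges, likely root. `automount_debug` is a hypothetical helper, and `run` is injectable so the sequence can be checked without autofs.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, List


@contextmanager
def automount_debug(path: str = "/net",
                    run: Callable[[List[str]], object] = subprocess.run
                    ) -> Iterator[None]:
    """Temporarily switch automount to debug logging for one mount point."""
    run(["automount", "-l", "debug", path])
    try:
        yield
    finally:
        # Always drop back to the lower level, as with the second command above.
        run(["automount", "-l", "err", path])
```

Usage: put the failing `ls` inside `with automount_debug("/net"):` and then look for the new messages in the log file that receives `daemon.*`.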

Ian

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: two /net paths to the same local mount?
  2010-06-24  3:00     ` Ian Kent
@ 2010-06-29 16:14       ` Chris Quenelle
  2010-06-30 13:12         ` Ian Kent
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Quenelle @ 2010-06-29 16:14 UTC (permalink / raw)
  To: Ian Kent; +Cc: autofs

Ian,

Thanks again for your help.
I have syslog-ng installed, and I can't figure out how to tell
if "daemon.*" is being sent somewhere or not.  "daemon" doesn't
appear anywhere in /etc/syslog-ng/syslog-ng.conf

But when I grep /var/log/messages I see:

% grep automount messages
Jun 29 02:41:14 carabas automount[11786]: lookup_mount: lookup(yp): key "Codemgr_wsdata" not found in map
Jun 29 02:41:14 carabas automount[11786]: lookup_mount: lookup(yp): key "Codemgr_wsdata" not found in map
Jun 29 03:17:45 carabas automount[11786]: lookup_mount: lookup(yp): key "opt" not found in map
Jun 29 03:24:27 carabas automount[11786]: lookup_mount: lookup(yp): key "Codemgr_wsdata" not found in map
Jun 29 03:24:27 carabas automount[11786]: lookup_mount: lookup(yp): key "Codemgr_wsdata" not found in map
Jun 29 09:04:46 carabas automount[11786]: Debug logging set for /net
Jun 29 09:09:22 carabas automount[11786]: get_pkt: message pending on control fifo.
Jun 29 09:09:22 carabas automount[11786]: Basic logging set for /net

I didn't get the get_pkt message until I turned off debug logging.

While logging was set to debug, I tried the hanging ls command again, and
I also tried listing a different file system that was actually remote.
I didn't see any log messages for those operations.
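The observation that nothing was logged while debug was on can be checked mechanically by pulling out the automount lines between "Debug logging set" and "Basic logging set". This is a sketch assuming the traditional syslog line format shown above; `automount_entries` and `debug_window` are hypothetical names. An empty window during a reproduction suggests, as one possibility, that debug-priority messages are being filtered out before they reach this file.

```python
import re
from typing import List, NamedTuple

# Matches lines such as:
# Jun 29 09:04:46 carabas automount[11786]: Debug logging set for /net
LINE_RE = re.compile(r"^(?P<stamp>\w{3} +\d+ [\d:]{8}) (?P<host>\S+) "
                     r"automount\[(?P<pid>\d+)\]: (?P<msg>.*)$")


class Entry(NamedTuple):
    stamp: str
    host: str
    pid: int
    msg: str


def automount_entries(lines: List[str]) -> List[Entry]:
    """Keep only automount lines, split into their fields."""
    entries = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m:
            entries.append(Entry(m["stamp"], m["host"], int(m["pid"]), m["msg"]))
    return entries


def debug_window(entries: List[Entry]) -> List[Entry]:
    """Entries logged after 'Debug logging set' and before 'Basic logging set'."""
    inside, window = False, []
    for e in entries:
        if e.msg.startswith("Debug logging set"):
            inside = True
        elif e.msg.startswith("Basic logging set"):
            inside = False
        elif inside:
            window.append(e)
    return window
```

For the excerpt above, the only entry in the window is the `get_pkt` line.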

Notes:

1. /etc/host.conf says only:
      order hosts, bind
      multi on

2. No mount.nfs processes are running when the ls command hangs
   on the local mount.
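With `order hosts, bind`, a name listed in `/etc/hosts` should be answered from the file, and anything else should fall through to DNS. On glibc systems the `hosts:` line in `/etc/nsswitch.conf` is worth checking as well. The sketch below assumes that order and reports which source would answer each alias. `parse_hosts` and `lookup_source` are hypothetical helpers. With the `/etc/hosts` line quoted earlier, `carabas.sfbay` is the one alias not listed in the file.

```python
from typing import Dict, Iterable


def parse_hosts(text: str) -> Dict[str, str]:
    """Map every name in /etc/hosts-style text to its address (first match wins)."""
    names: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        address, *hostnames = line.split()
        for name in hostnames:
            names.setdefault(name, address)
    return names


def lookup_source(aliases: Iterable[str], hosts_text: str) -> Dict[str, str]:
    """Say which source answers each alias, assuming 'order hosts, bind'."""
    table = parse_hosts(hosts_text)
    return {a: ("hosts " + table[a]) if a in table else "dns" for a in aliases}
```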

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: two /net paths to the same local mount?
  2010-06-29 16:14       ` Chris Quenelle
@ 2010-06-30 13:12         ` Ian Kent
  2010-06-30 14:57           ` Chris Quenelle
  0 siblings, 1 reply; 16+ messages in thread
From: Ian Kent @ 2010-06-30 13:12 UTC (permalink / raw)
  To: Chris Quenelle; +Cc: autofs

On Tue, 2010-06-29 at 09:14 -0700, Chris Quenelle wrote:
> Ian,
> 
> Thanks again for your help.
> I have syslog-ng installed, and I can't figure out how to tell
> if "daemon.*" is being sent somewhere or not.  "daemon" doesn't
> appear anywhere in /etc/syslog-ng/syslog-ng.conf

Well, you need to consult the man page for the configuration file then
don't you.

For me, when syslog was in use and now with rsyslog, I just add a new
line to send all the facility daemon output to a log file.

I add the line:

daemon.*                                /var/log/debug

to the configuration and then

touch /var/log/debug

and finally restart the logging service.
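If it is unclear where `daemon.*` messages end up, as with an unfamiliar syslog-ng configuration, one direct test is to log a uniquely tagged message with facility daemon at debug priority and then search the candidate files. A sketch using Python's standard `syslog` module. `send_daemon_test` and `files_containing` are hypothetical helpers, and the ident "route-test" is arbitrary.

```python
import syslog
import uuid
from typing import Iterable, List


def send_daemon_test(priority: int = syslog.LOG_DEBUG) -> str:
    """Log a uniquely tagged message with facility daemon and return the tag."""
    tag = "daemon-route-test-" + uuid.uuid4().hex[:8]
    syslog.openlog(ident="route-test", facility=syslog.LOG_DAEMON)
    syslog.syslog(priority, tag)
    syslog.closelog()
    return tag


def files_containing(tag: str, paths: Iterable[str]) -> List[str]:
    """Return the files among paths that contain tag."""
    hits = []
    for path in paths:
        try:
            with open(path, errors="replace") as f:
                if any(tag in line for line in f):
                    hits.append(path)
        except OSError:
            continue  # missing or unreadable file
    return hits
```

For example, `files_containing(send_daemon_test(), ["/var/log/messages", "/var/log/debug"])` shows which of those files, if any, receive daemon debug messages.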

You probably should post your autofs maps as well.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: two /net paths to the same local mount?
  2010-06-30 13:12         ` Ian Kent
@ 2010-06-30 14:57           ` Chris Quenelle
  2010-07-01  2:19             ` Ian Kent
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Quenelle @ 2010-06-30 14:57 UTC (permalink / raw)
  To: Ian Kent; +Cc: autofs

On Wednesday June 30    6:12AM, Ian Kent wrote:
> On Tue, 2010-06-29 at 09:14 -0700, Chris Quenelle wrote:
>> Ian,
>>
>> Thanks again for your help.
>> I have syslog-ng installed, and I can't figure out how to tell
>> if "daemon.*" is being sent somewhere or not.  "daemon" doesn't
>> appear anywhere in /etc/syslog-ng/syslog-ng.conf
>
> Well, you need to consult the man page for the configuration file then
> don't you.

Sorry for sounding lazy there.  I've never used syslog-ng before, and
I read the man page for a while without figuring out how the old style
of names/categories mapped to the new system.  I'll continue to fiddle
around and see if I can find out what's going on.

Thanks for helping me get started with the debugging output.

My autofs map looks like this:
% ypcat -k auto.master | grep /net
/net -hosts -intr,nosuid,retrans=10,retry=3,nobrowse

But I'm not sure that helps.
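For reference, the master map entry has three parts: mount point, map (`-hosts` here), and a `-`-prefixed option list. That list appears to mix NFS mount options (such as `intr`, `nosuid`, `retrans=10`) with the autofs option `nobrowse`. Below is a simplified parser sketch. `parse_master_line` is a hypothetical helper, and real master maps allow more syntax (map type prefixes, quoting) than it handles.

```python
from typing import List, NamedTuple


class MasterEntry(NamedTuple):
    mount_point: str
    map_name: str
    options: List[str]


def parse_master_line(line: str) -> MasterEntry:
    """Split an auto.master line of the form '<mount point> <map> [-options]'."""
    fields = line.split()
    if len(fields) < 2:
        raise ValueError("expected at least a mount point and a map: %r" % line)
    mount_point, map_name, *rest = fields
    options: List[str] = []
    for field in rest:
        # Option fields start with '-' and hold comma-separated options.
        options.extend(opt for opt in field.lstrip("-").split(",") if opt)
    return MasterEntry(mount_point, map_name, options)
```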

--chris

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: two /net paths to the same local mount?
  2010-06-30 14:57           ` Chris Quenelle
@ 2010-07-01  2:19             ` Ian Kent
  2010-07-07  0:27               ` Chris Quenelle
  0 siblings, 1 reply; 16+ messages in thread
From: Ian Kent @ 2010-07-01  2:19 UTC (permalink / raw)
  To: chris.quenelle; +Cc: autofs

On Wed, 2010-06-30 at 07:57 -0700, Chris Quenelle wrote:
> On Wednesday June 30    6:12AM, Ian Kent wrote:
> > On Tue, 2010-06-29 at 09:14 -0700, Chris Quenelle wrote:
> >> Ian,
> >>
> >> Thanks again for your help.
> >> I have syslog-ng installed, and I can't figure out how to tell
> >> if "daemon.*" is being sent somewhere or not.  "daemon" doesn't
> >> appear anywhere in /etc/syslog-ng/syslog-ng.conf
> >
> > Well, you need to consult the man page for the configuration file then
> > don't you.
> 
> Sorry for sounding lazy there.  I've never used syslog-ng before, and
> I read the man page for a while without figuring out how the old style
> of names/categories mapped to the new system.  I'll continue to fiddle
> around and see if I can find out what's going on.
> 
> Thanks for helping me get started with the debugging output.
> 
> My autofs map looks like this:
> % ypcat -k auto.master | grep /net
> /net -hosts -intr,nosuid,retrans=10,retry=3,nobrowse
> 
> But I'm not sure that helps.

Is automount crashing, i.e., does it show up in a ps listing after you see
it hang?
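The syslog lines carry the daemon's process ID (`automount[11786]`), so one quick way to answer this is to check whether that PID still exists after the hang. A sketch: `automount_pid` and `pid_alive` are hypothetical helpers, and signal 0 checks for existence without delivering anything. A restarted daemon gets a new PID, and a PID can in principle be reused by an unrelated process, so a match is only a hint.

```python
import os
import re
from typing import Optional

PID_RE = re.compile(r"automount\[(\d+)\]")


def automount_pid(log_line: str) -> Optional[int]:
    """Extract the automount PID from a syslog line, if present."""
    m = PID_RE.search(log_line)
    return int(m.group(1)) if m else None


def pid_alive(pid: int) -> bool:
    """True if a process with this PID exists (signal 0 sends nothing)."""
    if pid <= 0:
        return False  # 0 and negative values address process groups
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True
```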

> 
> --chris
> 
> 
> 
> >
> > For me, when syslog was in use and now with rsyslog, I just add a new
> > line to send all the facility daemon output to a log file.
> >
> > I add the line:
> >
> > daemon.*                                /var/log/debug
> >
> > to the configuration and then
> >
> > touch /var/log/debug
> >
> > and finally restart the logging service.
> >
> > You probably should post your autofs maps as well.
> >
> >> But when I grep /var/log/messages I see:
> >>
> >> % grep automount messages
> >> Jun 29 02:41:14 carabas automount[11786]: lookup_mount: lookup(yp): key "Codemgr_wsdata" not found in map
> >> Jun 29 02:41:14 carabas automount[11786]: lookup_mount: lookup(yp): key "Codemgr_wsdata" not found in map
> >> Jun 29 03:17:45 carabas automount[11786]: lookup_mount: lookup(yp): key "opt" not found in map
> >> Jun 29 03:24:27 carabas automount[11786]: lookup_mount: lookup(yp): key "Codemgr_wsdata" not found in map
> >> Jun 29 03:24:27 carabas automount[11786]: lookup_mount: lookup(yp): key "Codemgr_wsdata" not found in map
> >> Jun 29 09:04:46 carabas automount[11786]: Debug logging set for /net
> >> Jun 29 09:09:22 carabas automount[11786]: get_pkt: message pending on control fifo.
> >> Jun 29 09:09:22 carabas automount[11786]: Basic logging set for /net
> >>
> >> I didn't get the get_pkt message until I turned off debug logging.
> >>
> >> While logging was set to debug, I tried the hanging ls command again, and
> >> I also tried listing a different file system that was actually remote.
> >> I didn't see any log messages for those operations.
> >>
> >> Notes:
> >>
> >> 1. /etc/host.conf says only:
> >>        order hosts, bind
> >>        multi on
> >>
> >> 2. No mount.nfs processes are running when the ls command hangs
> >>     on the local mount.
> >>
> >>
> >> Ian Kent wrote:
> >>> On Wed, 2010-06-23 at 11:32 -0700, Chris Quenelle wrote:
> >>>> Thanks for your time guys!
> >>>>
> >>>> It turns out it's not a general bug.  I was just applying
> >>>> a pessimistic interpretation without checking.
> >>>>
> >>>> After more testing, it turns out it's just a problem
> >>>> on one machine, for two mount points under one
> >>>> host alias.
> >>>>
> >>>> I'm interested in knowing more about how I could
> >>>> debug the situation without just rebooting and
> >>>> hoping it doesn't happen again.
> >>>>
> >>>> Here's my scenario:
> >>>>
> >>>> I have three exported filesystems.
> >>>> I can use three host aliases to refer to my own machine.
> >>>> (carabas, carabas.sfbay, carabas.sfbay.sun.com)
> >>>>
> >>>> /etc/hosts says:
> >>>> [IP addr]     carabas.sfbay.sun.com carabas
> >>>>
> >>>>
> >>>> All filesystems work under all /net/foo aliases
> >>>> except two of them hang when accessed via one
> >>>> of the host aliases.
> >>>
> >>> Are you sure the address lookup is returning what you think it does?
> >>> What's in /etc/host.conf?
> >>>
> >>>> % showmount -e localhost
> >>>> Export list for localhost:
> >>>> /export/home3 *
> >>>> /export/home2 *
> >>>> /export/home1 *
> >>>>
> >>>>
> >>>> % df -kl
> >>>> Filesystem           1K-blocks      Used Available Use% Mounted on
> >>>> /dev/sda2             69575344   3408900  66166444   5% /
> >>>> udev                   8222568       148   8222420   1% /dev
> >>>> /dev/sdb1             71671728  26507132  45164596  37% /export/home1
> >>>> /dev/sdc1             71671728   8545196  63126532  12% /export/home2
> >>>> /dev/sdd1             71671728   5375672  66296056   8% /export/home3
> >>>> /export/home1         71671728  26507128  45164600  37% /net/carabas/export/home1
> >>>> /export/home2         71671728   8545196  63126532  12% /net/carabas.sfbay/export/home2
> >>>> /export/home1         71671728  26507128  45164600  37% /net/carabas.sfbay/export/home1
> >>>> /export/home3         71671728   5375672  66296056   8% /net/carabas.sfbay/export/home3
> >>>> /export/home1         71671728  26507128  45164600  37% /net/carabas.sfbay.sun.com/export/home1
> >>>> /export/home2         71671728   8545196  63126532  12% /net/carabas.sfbay.sun.com/export/home2
> >>>> /export/home3         71671728   5375672  66296056   8% /net/carabas.sfbay.sun.com/export/home3
> >>>>
> >>>>
> >>>> % strace ls /net/carabas/export/home2
> >>>> ...
> >>>> ...
> >>>> mmap(NULL, 217016, PROT_READ, MAP_SHARED, 4, 0) = 0x2b2388370000
> >>>> close(4)                                = 0
> >>>> close(3)                                = 0
> >>>> open("/net/carabas/export/home2", O_RDONLY|O_NONBLOCK|O_DIRECTORY<unfinished ...>
> >>>
> >>> What does a ps listing give you when it is blocked?
> >>> Any mount.nfs processes?
> >>>
> >>> That's a sign that automount thinks that the mount isn't local and is
> >>> probably trying to mount from a possibly non-existent IP address.
> >>>
> >>>> I hit ctrl-C when the ls command hangs.  Above is the tail of the strace output.
> >>>> It doesn't tell you much, the interesting stuff happens in the automounter.
> >>>>
> >>>> Is there an error log file for the automounter that is enabled by default?
> >>>> Is there an easy recipe for restarting the automounter with debugging output on?
> >>>
> >>> You need to ensure that debug logging is being recorded in syslog.
> >>> Ensure you are sending daemon.* somewhere.
> >>>
> >>> Then
> >>>
> >>> automount -l debug /net
> >>>
> >>> will start the debug logging and
> >>>
> >>> automount -l err /net
> >>>
> >>> will set it back to no logging.
> >>>
> >>> That depends on the version of autofs you are using.
> >>>
> >>> If you don't see this option in the automount(8) man page you don't have
> >>> this option and all you can do is enable debug logging in the
> >>> configuration.
> >>>
> >>> Ian
> >>>
> >>>
> >>
> >
> >
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: two /net paths to the same local mount?
  2010-07-01  2:19             ` Ian Kent
@ 2010-07-07  0:27               ` Chris Quenelle
  2010-07-07  4:22                 ` Ian Kent
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Quenelle @ 2010-07-07  0:27 UTC (permalink / raw)
  To: Ian Kent; +Cc: autofs

Ian Kent wrote:
> On Wed, 2010-06-30 at 07:57 -0700, Chris Quenelle wrote:
>> On Wednesday June 30    6:12AM, Ian Kent wrote:
>>> On Tue, 2010-06-29 at 09:14 -0700, Chris Quenelle wrote:
>>>> Ian,
>>>>
>>>> Thanks again for your help.
>>>> I have syslog-ng installed, and I can't figure out how to tell
>>>> if "daemon.*" is being sent somewhere or not.  "daemon" doesn't
>>>> appear anywhere in /etc/syslog-ng/syslog-ng.conf
>>> Well, you need to consult the man page for the configuration file then,
>>> don't you?
>> Sorry for sounding lazy there.  I've never used syslog-ng before, and
>> I read the man page for a while without figuring out how the old style
>> of names/categories mapped to the new system.  I'll continue to fiddle
>> around and see if I can find out what's going on.
>>
>> Thanks for helping me get started with the debugging output.
>>
>> My autofs map looks like this:
>> % ypcat -k auto.master | grep /net
>> /net -hosts -intr,nosuid,retrans=10,retry=3,nobrowse
>>
>> But I'm not sure that helps.
> 
> Is automount crashing, i.e., does it still show up in a ps listing after you see
> it hang?


Nope.  The ps command doesn't show any change in the status of the automount
process.

In fact, when I attach strace to the automount process I don't see any
sign of activity when I do an "ls" on the unhealthy path.
It sounds like I would need to look at kernel tracing/debugging if I want
to pursue this further.
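If strace was attached without `-f`, it would only have followed the single thread ID it was given, and automount is multi-threaded, so activity in its other threads would not appear. A minimal sketch, assuming `pgrep` and `strace` are installed and the daemon's process name is exactly `automount`; the helper name `trace_automount` and the default output path are hypothetical.

```shell
#!/bin/sh
# trace_automount (hypothetical helper): attach strace to the automount
# daemon and all of its threads, timestamping each system call.
# Stop it with Ctrl-C after reproducing the hang from another shell.
trace_automount() {
    out=${1:-/tmp/automount.strace}
    # -o: oldest matching process; -x: exact process-name match
    pid=$(pgrep -o -x automount) || {
        echo "no automount process found" >&2
        return 1
    }
    # -f together with -p attaches to every thread of the process
    # -tt adds microsecond timestamps, useful for correlating with syslog
    strace -f -tt -p "$pid" -o "$out"
}
```

Run it as root in one window, then repeat the hanging `ls` in another; an empty trace even with `-f` would support the idea that the request never reaches the daemon.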






> 
>> --chris
>>
>>
>>
>>> For me, when syslog was in use and now with rsyslog, I just add a new
>>> line to send all the facility daemon output to a log file.
>>>
>>> I add the line:
>>>
>>> daemon.*                                /var/log/debug
>>>
>>> to the configuration and then
>>>
>>> touch /var/log/debug
>>>
>>> and finally restart the logging service.
>>>
>>> You probably should post your autofs maps as well.
>>>
>>>> But when I grep /var/log/messages I see:
>>>>
>>>> % grep automount messages
>>>> Jun 29 02:41:14 carabas automount[11786]: lookup_mount: lookup(yp): key "Codemgr_wsdata" not found in map
>>>> Jun 29 02:41:14 carabas automount[11786]: lookup_mount: lookup(yp): key "Codemgr_wsdata" not found in map
>>>> Jun 29 03:17:45 carabas automount[11786]: lookup_mount: lookup(yp): key "opt" not found in map
>>>> Jun 29 03:24:27 carabas automount[11786]: lookup_mount: lookup(yp): key "Codemgr_wsdata" not found in map
>>>> Jun 29 03:24:27 carabas automount[11786]: lookup_mount: lookup(yp): key "Codemgr_wsdata" not found in map
>>>> Jun 29 09:04:46 carabas automount[11786]: Debug logging set for /net
>>>> Jun 29 09:09:22 carabas automount[11786]: get_pkt: message pending on control fifo.
>>>> Jun 29 09:09:22 carabas automount[11786]: Basic logging set for /net
>>>>
>>>> I didn't get the get_pkt message until I turned off debug logging.
>>>>
>>>> While logging was set to debug, I tried the hanging ls command again, and
>>>> I also tried listing a different file system that was actually remote.
>>>> I didn't see any log messages for those operations.
>>>>
>>>> Notes:
>>>>
>>>> 1. /etc/host.conf says only:
>>>>        order hosts, bind
>>>>        multi on
>>>>
>>>> 2. No mount.nfs processes are running when the ls command hangs
>>>>     on the local mount.
>>>>
>>>>
>>>> Ian Kent wrote:
>>>>> On Wed, 2010-06-23 at 11:32 -0700, Chris Quenelle wrote:
>>>>>> Thanks for your time guys!
>>>>>>
>>>>>> It turns out it's not a general bug.  I was just applying
>>>>>> a pessimistic interpretation without checking.
>>>>>>
>>>>>> After more testing, it turns out it's just a problem
>>>>>> on one machine, for two mount points under one
>>>>>> host alias.
>>>>>>
>>>>>> I'm interested in knowing more about how I could
>>>>>> debug the situation without just rebooting and
>>>>>> hoping it doesn't happen again.
>>>>>>
>>>>>> Here's my scenario:
>>>>>>
>>>>>> I have three exported filesystems.
>>>>>> I can use three host aliases to refer to my own machine.
>>>>>> (carabas, carabas.sfbay, carabas.sfbay.sun.com)
>>>>>>
>>>>>> /etc/hosts says:
>>>>>> [IP addr]     carabas.sfbay.sun.com carabas
>>>>>>
>>>>>>
>>>>>> All filesystems work under all /net/foo aliases
>>>>>> except two of them hang when accessed via one
>>>>>> of the host aliases.
>>>>> Are you sure the address lookup is returning what you think it does?
>>>>> What's in /etc/host.conf?
>>>>>
>>>>>> % showmount -e localhost
>>>>>> Export list for localhost:
>>>>>> /export/home3 *
>>>>>> /export/home2 *
>>>>>> /export/home1 *
>>>>>>
>>>>>>
>>>>>> % df -kl
>>>>>> Filesystem           1K-blocks      Used Available Use% Mounted on
>>>>>> /dev/sda2             69575344   3408900  66166444   5% /
>>>>>> udev                   8222568       148   8222420   1% /dev
>>>>>> /dev/sdb1             71671728  26507132  45164596  37% /export/home1
>>>>>> /dev/sdc1             71671728   8545196  63126532  12% /export/home2
>>>>>> /dev/sdd1             71671728   5375672  66296056   8% /export/home3
>>>>>> /export/home1         71671728  26507128  45164600  37% /net/carabas/export/home1
>>>>>> /export/home2         71671728   8545196  63126532  12% /net/carabas.sfbay/export/home2
>>>>>> /export/home1         71671728  26507128  45164600  37% /net/carabas.sfbay/export/home1
>>>>>> /export/home3         71671728   5375672  66296056   8% /net/carabas.sfbay/export/home3
>>>>>> /export/home1         71671728  26507128  45164600  37% /net/carabas.sfbay.sun.com/export/home1
>>>>>> /export/home2         71671728   8545196  63126532  12% /net/carabas.sfbay.sun.com/export/home2
>>>>>> /export/home3         71671728   5375672  66296056   8% /net/carabas.sfbay.sun.com/export/home3
>>>>>>
>>>>>>
>>>>>> % strace ls /net/carabas/export/home2
>>>>>> ...
>>>>>> ...
>>>>>> mmap(NULL, 217016, PROT_READ, MAP_SHARED, 4, 0) = 0x2b2388370000
>>>>>> close(4)                                = 0
>>>>>> close(3)                                = 0
>>>>>> open("/net/carabas/export/home2", O_RDONLY|O_NONBLOCK|O_DIRECTORY<unfinished ...>
>>>>> What does a ps listing give you when it is blocked?
>>>>> Any mount.nfs processes?
>>>>>
>>>>> That's a sign that automount thinks that the mount isn't local and is
>>>>> probably trying to mount from a possibly non-existent IP address.
>>>>>
>>>>>> I hit ctrl-C when the ls command hangs.  Above is the tail of the strace output.
>>>>>> It doesn't tell you much, the interesting stuff happens in the automounter.
>>>>>>
>>>>>> Is there an error log file for the automounter that is enabled by default?
>>>>>> Is there an easy recipe for restarting the automounter with debugging output on?
>>>>> You need to ensure that debug logging is being recorded in syslog.
>>>>> Ensure you are sending daemon.* somewhere.
>>>>>
>>>>> Then
>>>>>
>>>>> automount -l debug /net
>>>>>
>>>>> will start the debug logging and
>>>>>
>>>>> automount -l err /net
>>>>>
>>>>> will set it back to no logging.
>>>>>
>>>>> That depends on the version of autofs you are using.
>>>>>
>>>>> If you don't see this option in the automount(8) man page you don't have
>>>>> this option and all you can do is enable debug logging in the
>>>>> configuration.
>>>>>
>>>>> Ian
>>>>>
>>>>>
>>>
> 
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: two /net paths to the same local mount?
  2010-07-07  0:27               ` Chris Quenelle
@ 2010-07-07  4:22                 ` Ian Kent
  2010-07-09 22:04                   ` Chris Quenelle
  0 siblings, 1 reply; 16+ messages in thread
From: Ian Kent @ 2010-07-07  4:22 UTC (permalink / raw)
  To: Chris Quenelle; +Cc: autofs

On Tue, 2010-07-06 at 17:27 -0700, Chris Quenelle wrote:
> Ian Kent wrote:
> > On Wed, 2010-06-30 at 07:57 -0700, Chris Quenelle wrote:
> >> On Wednesday June 30    6:12AM, Ian Kent wrote:
> >>> On Tue, 2010-06-29 at 09:14 -0700, Chris Quenelle wrote:
> >>>> Ian,
> >>>>
> >>>> Thanks again for your help.
> >>>> I have syslog-ng installed, and I can't figure out how to tell
> >>>> if "daemon.*" is being sent somewhere or not.  "daemon" doesn't
> >>>> appear anywhere in /etc/syslog-ng/syslog-ng.conf
> >>> Well, you need to consult the man page for the configuration file then
> >>> don't you.
> >> Sorry for sounding lazy there.  I've never used syslog-ng before, and
> >> I read the man page for a while without figuring how the old style
> >> of names/categories mapped to the new system.  I'll continue to fiddle
> >> around and see if I can find out what's going on.
> >>
> >> Thanks for helping me get started with the debugging output.
> >>
> >> My autofs map looks like this:
> >> % ypcat -k auto.master | grep /net
> >> /net -hosts -intr,nosuid,retrans=10,retry=3,nobrowse
> >>
> >> But I'm not sure that helps.
> > 
> > Is automount crashing, ie. does it show up in a ps listing after you see
> > it hang?
> 
> 
> Nope.  The ps command doesn't show any change in the status of the automount
> process.
> 
> In fact, when I attach strace to the automount process I don't see any
> sign of activity when I do an "ls" on the unhealthy path.
> It sounds like I would need to look at kernel tracing/debugging if I want
> to pursue this further.

strace output is often not very useful.

If you think there is some sort of deadlock going on, get a sysrq-t dump
to syslog. We still haven't seen a debug log?


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: two /net paths to the same local mount?
  2010-07-07  4:22                 ` Ian Kent
@ 2010-07-09 22:04                   ` Chris Quenelle
  2010-07-12  2:53                     ` Ian Kent
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Quenelle @ 2010-07-09 22:04 UTC (permalink / raw)
  To: Ian Kent; +Cc: autofs, Frank Thommen

Ian Kent wrote:

> strace output is often not very useful.
> 
> If you think there is some sort of deadlock going on, get a sysrq-t dump
> to syslog. We still haven't seen a debug log?

I've had reports that my emails are being delayed when they go out to the list.
If anyone is following along and you'd like me to add you to my cc:
lines so you get the email directly, let me know, and I'll do that.

I'm getting close to my limits of what this problem is worth to me.
I suspect the two broken paths will get unwedged if I reboot the system.
But I'd love to know how to prevent it from happening again.

I saw these lines in /var/log/messages:

> >>>> Jun 29 09:04:46 carabas automount[11786]: Debug logging set for /net
> >>>> Jun 29 09:09:22 carabas automount[11786]: get_pkt: message pending on control fifo.
> >>>> Jun 29 09:09:22 carabas automount[11786]: Basic logging set for /net

Does that mean that all debugging output from automount should be
going to that file?  Or could the debug output still be going someplace
else (or into /dev/null)?  In between the first line of that log output and
the last line, I provoked a correctly functioning automount of
a local file system, and I also tried to access the "broken" path
to the local filesystem.

So that, combined with strace on automount not showing any output
when I access the broken path, makes me think the control path
is not getting out of the kernel.

Can you point me to an explanation of what a "sysrq-t dump" is and
how to get it?  I don't have access to the console of this machine,
hopefully it's something I can do from a root term window.

To summarize my problem, I have a test set of paths to access a local
filesystem, 7 work and 2 don't.

/net/carabas/export/home1
/net/carabas/export/home2    <-- fails
/net/carabas/export/home3    <-- fails
/net/carabas.sfbay/export/home1
/net/carabas.sfbay/export/home2
/net/carabas.sfbay/export/home3
/net/carabas.sfbay.sun.com/export/home1
/net/carabas.sfbay.sun.com/export/home2
/net/carabas.sfbay.sun.com/export/home3
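
The path list above lends itself to a repeatable check. A sketch, assuming GNU coreutils `timeout` is available (it may be missing on older distributions); `probe_paths` is a hypothetical helper name. `timeout` exits with status 124 when it has to kill the command, which separates a hang from an ordinary error.

```shell
#!/bin/sh
# probe_paths (hypothetical helper): list each path with a time limit so a
# hanging automount shows up as TIMEOUT instead of blocking the shell.
probe_paths() {
    limit=${PROBE_TIMEOUT:-10}   # seconds allowed per path
    for p in "$@"; do
        # Opening the directory to list it is the step that hung in the
        # strace output earlier in this thread.
        timeout "$limit" ls "$p" >/dev/null 2>&1
        rc=$?
        case $rc in
            0)   echo "ok      $p" ;;
            124) echo "TIMEOUT $p" ;;   # timeout killed the command
            *)   echo "ERROR   $p (rc=$rc)" ;;
        esac
    done
}
```

For example, `probe_paths /net/carabas/export/home1 /net/carabas/export/home2` would print one status line per path.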


I don't see anything suspicious in the output of:
showmount
df
/etc/host.conf
strace automount
automount -l debug /net
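
One item not in that list is name resolution, which Ian asked about earlier ("Are you sure the address lookup is returning what you think it does?"). A sketch, assuming glibc's `getent`, which resolves through the NSS configuration in /etc/nsswitch.conf; whether automount's own lookup takes exactly the same path is an assumption. `check_aliases` is a hypothetical helper name.

```shell
#!/bin/sh
# check_aliases (hypothetical helper): print every address the resolver
# returns for each host name, so the aliases can be compared side by side.
check_aliases() {
    for h in "$@"; do
        addrs=$(getent hosts "$h" | awk '{print $1}' | tr '\n' ' ')
        printf '%-28s %s\n' "$h" "${addrs:-<no answer>}"
    done
}
```

For example, `check_aliases carabas carabas.sfbay carabas.sfbay.sun.com` should show the same address for all three names if they really refer to the same interface.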




--chris

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: two /net paths to the same local mount?
  2010-07-09 22:04                   ` Chris Quenelle
@ 2010-07-12  2:53                     ` Ian Kent
  2010-07-15 20:08                       ` Chris Quenelle
  0 siblings, 1 reply; 16+ messages in thread
From: Ian Kent @ 2010-07-12  2:53 UTC (permalink / raw)
  To: Chris Quenelle; +Cc: autofs, Frank Thommen

On Fri, 2010-07-09 at 15:04 -0700, Chris Quenelle wrote:
> Ian Kent wrote:
> 
> > strace output is often not very useful.
> > 
> > If you think there is some sort of deadlock going on, get a sysrq-t dump
> > to syslog. We still haven't seen a debug log?
> 
> I've had reports that my emails are being delayed when they go out to the list.
> If anyone is following along and you'd like me to add you to my cc:
> lines so you get the email directly, let me know, and I'll do that.

That's going to happen if you post to a subscribers only list without
subscribing to it.

> 
> I'm getting close to my limits of what this problem is worth to me.

And yet you haven't really provided the information requested?

I don't remember, but did we get the distribution and autofs version you're
using?

> I suspect the two broken paths will get unwedged if I reboot the system.
> But I'd love to know how to prevent it from happening again.
> 
> I saw these lines in /var/log/messages:
> 
> > >>>> Jun 29 09:04:46 carabas automount[11786]: Debug logging set for /net
> > >>>> Jun 29 09:09:22 carabas automount[11786]: get_pkt: message pending on control fifo.
> > >>>> Jun 29 09:09:22 carabas automount[11786]: Basic logging set for /net
> 
> Does that mean that all debugging output from automount should be
> going to that file?  Or could the debug output still be going someplace
> else (or into /dev/null?) In between the first line of that log output and
> the last line, I provoked a correctly functioning automount of
> a local file system, and I also tried to access the "broken" path
> to the local filesystem.

What file? I don't understand what you mean.

But you don't mention what you have done to tell syslog to actually send
"all" facility daemon messages to the syslog.

Try having a look at Jeff's page http://people.redhat.com/jmoyer for a
description of how to set up debug logging.
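
To answer the earlier question of whether facility daemon messages reach any file at all, one option is to send a tagged test message and search for it. A sketch, assuming util-linux `logger` is available and the syslog daemon writes plain-text files under /var/log; it uses priority debug, presumably the level automount's debug output is logged at. `check_daemon_logging` is a hypothetical helper name; run it as root so every log file is readable.

```shell
#!/bin/sh
# check_daemon_logging (hypothetical helper): send one tagged message at
# facility daemon, priority debug, then report which files under the log
# directory received it.
check_daemon_logging() {
    logdir=${1:-/var/log}
    tag="daemon-debug-test-$$-$(date +%s)"
    logger -p daemon.debug "$tag"
    sleep 2   # give the syslog daemon a moment to write the message
    if ! grep -rl -- "$tag" "$logdir" 2>/dev/null; then
        echo "no file under $logdir received the daemon.debug test message"
        return 1
    fi
}
```

If nothing matches, the syslog configuration is dropping daemon.debug, and `automount -l debug` output would be lost the same way.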

> 
> So that in combination with strace/automount not giving any output
> when I access the broken path, makes me think the control path
> is not getting out of the kernel.

Maybe.

> 
> Can you point me to an explanation of what a "sysrq-t dump" is and
> how to get it?  I don't have access to the console of this machine,
> hopefully it's something I can do from a root term window.

Wherever your distribution keeps its kernel documentation (or a package that
contains the documentation), look at Documentation/sysrq.txt.

Often, you will find you can:

echo "t" > /proc/sysrq-trigger

to get a trace dump, which is what I'm asking for.
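
Putting that together for a root shell without console access, here is a sketch that also saves the result to a file. Assumptions: /proc/sysrq-trigger exists on this kernel, and the dump fits in the kernel log buffer read by `dmesg` (on a busy system it may not, in which case the copy syslog writes is the better source). Depending on the kernel version, the kernel.sysrq sysctl may also need to be non-zero, so the sketch enables it temporarily and then restores the old value. `sysrq_task_dump` is a hypothetical helper name.

```shell
#!/bin/sh
# sysrq_task_dump (hypothetical helper): must be run as root.
# Writes the state and stack of every task to the kernel log,
# then copies the kernel log to a file and prints its name.
sysrq_task_dump() {
    out=${1:-/tmp/sysrq-t.$(date +%Y%m%d-%H%M%S).txt}
    old=$(cat /proc/sys/kernel/sysrq)
    echo 1 > /proc/sys/kernel/sysrq       # enable all sysrq functions
    echo t > /proc/sysrq-trigger          # "t": dump task list and stacks
    echo "$old" > /proc/sys/kernel/sysrq  # restore the previous setting
    dmesg > "$out"
    echo "$out"
}
```

Taking one dump before reproducing the hang and another while the `ls` is blocked makes it easier to spot which tasks changed state.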

> 
> To summarize my problem, I have a test set of paths to access a local
> filesystem, 7 work and 2 don't.
> 
> /net/carabas/export/home1
> /net/carabas/export/home2    <-- fails
> /net/carabas/export/home3    <-- fails
> /net/carabas.sfbay/export/home1
> /net/carabas.sfbay/export/home2
> /net/carabas.sfbay/export/home3
> /net/carabas.sfbay.sun.com/export/home1
> /net/carabas.sfbay.sun.com/export/home2
> /net/carabas.sfbay.sun.com/export/home3
> 
> 
> I don't see anything suspicious in the output of:
> showmount
> df
> /etc/host.conf
> strace automount
> automount -l debug /net
> 
> 
> 
> 
> --chris

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: two /net paths to the same local mount?
  2010-07-12  2:53                     ` Ian Kent
@ 2010-07-15 20:08                       ` Chris Quenelle
  2010-07-16  7:28                         ` Ian Kent
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Quenelle @ 2010-07-15 20:08 UTC (permalink / raw)
  To: Ian Kent; +Cc: autofs, Frank Thommen

I went through this thread and collected all the information
in a problem description.  I also included sysrq dump
output before and during the problem.  It's 300k.  I can
send it to the list in email if you prefer.  For now
it's available here:

http://quenelle.org/unix/wp-content/uploads/2010/07/linux-log.txt

Again, I want to thank you guys for your time.  I've learned a lot.

From the dump output I can see that there is one additional
"automount" thread when the problem is happening.  I think
the new one has the number 5603.  But that number seems to be
in the "father" column, not the "pid" column.  I'm not sure
what that means.

automount     S 0000555555686e00     0  5603      1                4054 (NOTLB)
ffff810366a07e88 0000000000000086 0000000005f5e100 000000000000000a
       ffff810417dc62d8 ffff810417dc6080 ffff810001033700 001082fb301a703a
       0000000000000653 0000000001037030
Call Trace: <ffffffff8014a06b>{enqueue_hrtimer+90} <ffffffff802ea159>{schedule_hrtimer+41}
       <ffffffff8014a5af>{hrtimer_nanosleep+130} <ffffffff8014a6a5>{sys_nanosleep+76}
       <ffffffff8010ae42>{system_call+126}

Anyway, the full dumps are included in the log I pointed at above.

--chris




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: two /net paths to the same local mount?
  2010-07-15 20:08                       ` Chris Quenelle
@ 2010-07-16  7:28                         ` Ian Kent
       [not found]                           ` <4C47428A.809@oracle.com>
  0 siblings, 1 reply; 16+ messages in thread
From: Ian Kent @ 2010-07-16  7:28 UTC (permalink / raw)
  To: Chris Quenelle; +Cc: autofs, Frank Thommen

On Thu, 2010-07-15 at 13:08 -0700, Chris Quenelle wrote:
> I went through this thread and collected all the information
> in a problem description.  I also included sysrq dump
> output before and during the problem.  It's 300k.  I can
> send it to the list in email if you prefer.  For now
> it's available here:
> 
> http://quenelle.org/unix/wp-content/uploads/2010/07/linux-log.txt

This doesn't look like a deadlock in the kernel.

We still need a full debug log, which would have been useful to relate
to the sysrq-t dump.

You might be seeing a thread create synchronization problem. I've fixed
some problems in that area since 5.0.2 (but then we don't know what
patches the SuSE folks have applied). Information about that possibility
can be obtained by getting a gdb backtrace of the main automount
process. This isn't much use unless debug symbols are available. In
Fedora we have debuginfo packages that correspond to each package. They
can be installed along with the package so that gdb has access to the
program symbols.

In any case once the debug symbols are available you can use:

gdb -p <automount pid> /usr/sbin/automount
(gdb) thread apply all bt

(assuming automount is actually in /usr/sbin) and capture the output of
this so we can see what the automount threads are doing, or not doing,
as the case may be.
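
The same backtrace can be captured without an interactive session by using gdb's batch mode. A sketch, assuming the automount debug symbols are installed, the binary is /usr/sbin/automount as assumed above, and `pgrep` is available; `automount_backtrace` is a hypothetical helper name. If several automount processes exist, it picks the oldest, which is presumably the main daemon.

```shell
#!/bin/sh
# automount_backtrace (hypothetical helper): run as root. Attaches gdb to
# the automount daemon, prints a backtrace of every thread, then exits;
# gdb detaches from an attached process when it quits.
automount_backtrace() {
    bin=${1:-/usr/sbin/automount}
    pid=$(pgrep -o -x automount) || {
        echo "no automount process found" >&2
        return 1
    }
    out=/tmp/automount-bt.$pid.txt
    # -batch: exit after running the -ex commands
    gdb -batch -p "$pid" -ex "thread apply all bt" "$bin" >"$out" 2>&1
    echo "$out"
}
```

Running it while the `ls` is hung, and again after it is interrupted, gives two snapshots of what the automount threads are waiting on.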
 
> 
> Again, I want to thank you guys for your time.  I've learned a lot.
> 
> From the dump output I can see that there is one additional
> "automount" thread when the problem is happening.  I think
> the new one has the number 5603.  But that number seems to be
> in the "father" column, not the "pid" column.  I'm not sure
> what that means.
> 
> automount     S 0000555555686e00     0  5603      1                4054 (NOTLB)
> ffff810366a07e88 0000000000000086 0000000005f5e100 000000000000000a
>        ffff810417dc62d8 ffff810417dc6080 ffff810001033700 001082fb301a703a
>        0000000000000653 0000000001037030
> Call Trace: <ffffffff8014a06b>{enqueue_hrtimer+90} <ffffffff802ea159>{schedule_hrtimer+41}
>        <ffffffff8014a5af>{hrtimer_nanosleep+130} <ffffffff8014a6a5>{sys_nanosleep+76}
>        <ffffffff8010ae42>{system_call+126}
> 
> Anyway, the full dumps are included in the log I pointed at above.
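The columns in that task line can be decoded mechanically. Below is a hypothetical Python sketch; it is not an autofs or kernel tool. It assumes the fixed-width layout that 64-bit 2.6-era kernels print under the header "task PC pid father child younger older". Other kernel versions use different columns, so check the header line in the dump first. Blank columns (for example, a task with no child) are why the fields have to be cut by position rather than split on whitespace.

```python
# Hypothetical sketch: decode one task line from a sysrq-t dump.
# ASSUMPTION: 64-bit 2.6-era kernel whose show_task() prints fixed-width
# columns under the header "task PC pid father child younger older":
#   "%-13.13s " comm, one state letter, " %016lx " saved PC,
#   "%5lu %5d %6d " free stack / pid / father,
#   "%5d " child, "%7d" younger sibling, " %5d" older sibling.
# Missing relatives are printed as blanks, so fields are cut by position.

COLUMNS = [
    ("comm", 0, 13),
    ("state", 14, 15),
    ("pc", 16, 32),
    ("free", 33, 38),
    ("pid", 39, 44),
    ("father", 45, 51),
    ("child", 52, 57),
    ("younger", 58, 65),
    ("older", 66, 71),
]
NUMERIC = {"free", "pid", "father", "child", "younger", "older"}


def parse_task_line(line):
    """Return a dict of named fields; blank numeric columns become None."""
    fields = {}
    for name, start, end in COLUMNS:
        text = line[start:end].strip()
        if name in NUMERIC:
            fields[name] = int(text) if text else None
        else:
            fields[name] = text
    return fields


# The task line quoted above, without the leading "> " quote marker.
SAMPLE = ("automount     S 0000555555686e00     0  5603      1"
          "                4054 (NOTLB)")

if __name__ == "__main__":
    task = parse_task_line(SAMPLE)
    print(task["comm"], "pid", task["pid"], "father", task["father"],
          "older sibling", task["older"])
```

If that layout assumption holds for this kernel, then:

- 5603 falls in the pid column.
- The 1 under "father" is the parent's pid (init).
- 4054 fills the older-sibling column.
- The child and younger-sibling columns are simply empty.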
> 
> --chris
> 
> 
> 
> Ian Kent wrote:
> > On Fri, 2010-07-09 at 15:04 -0700, Chris Quenelle wrote:
> >> Ian Kent wrote:
> >>
> >>> strace output is often not very useful.
> >>>
> >>> If you think there is some sort of deadlock going on, get a sysrq-t dump
> >>> to syslog. We still haven't seen a debug log?
> >> I've had reports that my emails are being delayed when they go out to the list.
> >> If anyone is following along and you'd like me to add you to my cc:
> >> lines so you get the email directly, let me know, and I'll do that.
> > 
> > That's going to happen if you post to a subscribers only list without
> > subscribing to it.
> > 
> >> I'm getting close to my limits of what this problem is worth to me.
> > 
> > And yet you haven't really provided the information requested?
> > 
> > I don't remember, but did we get the distribution and autofs version you're
> > using?
> > 
> >> I suspect the two broken paths will get unwedged if I reboot the system.
> >> But I'd love to know how to prevent it from happening again.
> >>
> >> I saw these lines in /var/log/messages:
> >>
> >>>>>>> Jun 29 09:04:46 carabas automount[11786]: Debug logging set for /net
> >>>>>>> Jun 29 09:09:22 carabas automount[11786]: get_pkt: message pending on control fifo.
> >>>>>>> Jun 29 09:09:22 carabas automount[11786]: Basic logging set for /net
> >> Does that mean that all debugging output from automount should be
> >> going to that file?  Or could the debug output still be going someplace
> >> else (or into /dev/null)? In between the first line of that log output and
> >> the last line, I provoked a correctly functioning automount of
> >> a local file system, and I also tried to access the "broken" path
> >> to the local filesystem.
> > 
> > What file? I don't understand what you mean.
> > 
> > But you don't mention what you have done to tell syslog to actually record
> > daemon facility messages at all priorities (daemon.*).
> > 
> > Try having a look at Jeff's page, http://people.redhat.com/jmoyer, for a
> > description of how to set up debug logging.
> > 
> >> So that in combination with strace/automount not giving any output
> >> when I access the broken path, makes me think the control path
> >> is not getting out of the kernel.
> > 
> > Maybe.
> > 
> >> Can you point me to an explanation of what a "sysrq-t dump" is and
> >> how to get it?  I don't have access to the console of this machine,
> >> hopefully it's something I can do from a root term window.
> > 
> > Wherever your distribution keeps its kernel documentation (or in the package
> > that contains the documentation), look at Documentation/sysrq.txt.
> > 
> > Often, you will find you can:
> > 
> > echo "t" > /proc/sysrq-trigger
> > 
> > to get a trace dump, which is what I'm asking for.
> > 
> >> To summarize my problem, I have a test set of paths to access a local
> >> filesystem; 7 work and 2 don't.
> >>
> >> /net/carabas/export/home1
> >> /net/carabas/export/home2    <-- fails
> >> /net/carabas/export/home3    <-- fails
> >> /net/carabas.sfbay/export/home1
> >> /net/carabas.sfbay/export/home2
> >> /net/carabas.sfbay/export/home3
> >> /net/carabas.sfbay.sun.com/export/home1
> >> /net/carabas.sfbay.sun.com/export/home2
> >> /net/carabas.sfbay.sun.com/export/home3
> >>
> >>
> >> I don't see anything suspicious in the output of:
> >> showmount
> >> df
> >> /etc/host.conf
> >> strace automount
> >> automount -l debug /net
> >>
> >>
> >>
> >>
> >> --chris
> > 
> > 
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: two /net paths to the same local mount?
       [not found]                           ` <4C47428A.809@oracle.com>
@ 2010-07-22  3:45                             ` Ian Kent
  2010-07-23 21:22                               ` Chris Quenelle
  0 siblings, 1 reply; 16+ messages in thread
From: Ian Kent @ 2010-07-22  3:45 UTC
  To: Chris Quenelle; +Cc: autofs, Frank Thommen

On Wed, 2010-07-21 at 11:55 -0700, Chris Quenelle wrote:
> Ian Kent wrote:
> > On Thu, 2010-07-15 at 13:08 -0700, Chris Quenelle wrote:
> >> I went through this thread and collected all the information
> >> in a problem description.  I also included sysrq dump
> >> output before and during the problem.  It's 300k.  I can
> >> send it to the list in email if you prefer.  For now
> >> it's available here:
> >>
> >> http://quenelle.org/unix/wp-content/uploads/2010/07/linux-log.txt
> > 
> > This doesn't look like a deadlock in the kernel.
> > 
> > We still need a full debug log, which would have been useful to relate
> > to the sysrq-t dump.
> 
> There is no debug output when I access the problematic path.
> I verified that I'm actually getting all the debug output
> by accessing an unmounted /net location on another host.
> This surprised me.

I understand that may be the case, but a full debug log is usually the
starting point for automount debugging. A full debug log means a debug
log from the time automount is started (with the syslog facility
daemon.* being recorded) until the problem occurs, with autofs in a
clean state at the start. Most people don't manage the "in a clean state
at the start" part, so don't worry too much about that, but if we can't
see anything in the log then I start asking about the state of autofs
when it was started and move on from there.
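When you assemble such a log from a shared syslog file, it can help to pull out only the automount entries. Below is a minimal Python sketch. It assumes the traditional syslog line format seen in the messages quoted in this thread (for example, "Jun 29 09:04:46 carabas automount[11786]: ..."). The function name is hypothetical. The /var/log/messages path in the usage comment is also an assumption: use wherever your syslog configuration sends daemon.* messages.

```python
import re
import sys

# ASSUMPTION: traditional syslog lines of the form
#   "Jun 29 09:04:46 carabas automount[11786]: Debug logging set for /net"
AUTOMOUNT_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+\S+\s+"
    r"automount\[(?P<pid>\d+)\]:\s?(?P<msg>.*)$"
)


def automount_log(lines, pid=None):
    """Yield (timestamp, pid, message) for each automount entry,
    optionally restricted to a single daemon pid."""
    for line in lines:
        match = AUTOMOUNT_LINE.match(line.rstrip("\n"))
        if match is None:
            continue
        entry_pid = int(match.group("pid"))
        if pid is None or entry_pid == pid:
            yield match.group("stamp"), entry_pid, match.group("msg")


if __name__ == "__main__":
    # Usage: python automount_log.py /var/log/messages [pid]
    if len(sys.argv) >= 2:
        only_pid = int(sys.argv[2]) if len(sys.argv) >= 3 else None
        with open(sys.argv[1]) as log:
            for stamp, entry_pid, msg in automount_log(log, only_pid):
                print("%s automount[%d]: %s" % (stamp, entry_pid, msg))
```

Filtering on one pid keeps the output to a single daemon run. Filtering can only select what syslog actually recorded; it cannot recover debug messages that were never written to the file.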

The whole point is that a problem is often noticed quite some time
after it actually happened, and sometimes there are messages in the log
from that earlier point that help focus efforts and lead to a solution.

But even more importantly, with such an old version, we might have seen
the problem before and the problem signature in the log might just "turn
on that light". Of course, we may have something different or a previous
problem with slightly different symptoms.

This is so important that I'll say it again: a full debug log is pretty
much always the starting point in trying to resolve autofs problems.

> 
> I'm not really in a position to get debug symbols right now.

That's a bummer.

> 
> You've done two important things for me so far.
> 
> 1. You've showed me there is a place I can get help with my
> autofs problems.

Sure. But your distribution's autofs maintainer should be able to go
through the debugging exercise with you and bring his view to the list.
That is actually important because the maintainer should be keeping an
eye on what is happening and what new patches are appearing on
kernel.org and be aware of the patches they have applied in their
distribution version. This last point is the most important as I have no
idea what the SuSE folks have applied to the package and I really
shouldn't have to go through the exercise of finding out, although I
have done so from time to time.

Oh ... I can't resist ... a shameless plug.

If you're a heavy autofs user, the best distribution to use is Red Hat
Enterprise Linux, because I keep up with what is happening with autofs
(obviously).
  
> 
> 2. You've helped me come up to speed on Linux/automount debugging
> so that I can do an initial evaluation for any future problems
> I run into.

I'm sure you will pick up a bit more too as time passes.

> 
> These things are very useful to me, even if I didn't get to the
> root of my problem.  When I get more time I will look for
> debug symbol packages for SUSE, and/or I'll try building
> automount from source with debugging enabled.

Yes, it is a bit hard, since a lot has happened since 5.2.

> 
> --chris

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: two /net paths to the same local mount?
  2010-07-22  3:45                             ` Ian Kent
@ 2010-07-23 21:22                               ` Chris Quenelle
  0 siblings, 0 replies; 16+ messages in thread
From: Chris Quenelle @ 2010-07-23 21:22 UTC
  To: Ian Kent; +Cc: autofs, Frank Thommen

Ian Kent wrote:
> On Wed, 2010-07-21 at 11:55 -0700, Chris Quenelle wrote:
>> Ian Kent wrote:
>>> On Thu, 2010-07-15 at 13:08 -0700, Chris Quenelle wrote:
>>>> I went through this thread and collected all the information
>>>> in a problem description.  I also included sysrq dump
>>>> output before and during the problem.  It's 300k.  I can
>>>> send it to the list in email if you prefer.  For now
>>>> it's available here:
>>>>
>>>> http://quenelle.org/unix/wp-content/uploads/2010/07/linux-log.txt
>>> This doesn't look like a deadlock in the kernel.
>>>
>>> We still need a full debug log, which would have been useful to relate
>>> to the sysrq-t dump.
>> There is no debug output when I access the problematic path.
>> I verified that I'm actually getting all the debug output
>> by accessing an unmounted /net location on another host.
>> This surprised me.
> 
> I understand that may be the case, but a full debug log is usually the
> starting point for automount debugging. A full debug log means a debug
> log from the time automount is started (with the syslog facility
> daemon.* being recorded) until the problem occurs, with autofs in a
> clean state at the start. Most people don't manage the "in a clean state
> at the start" part, so don't worry too much about that, but if we can't
> see anything in the log then I start asking about the state of autofs
> when it was started and move on from there.
> 
> The whole point is that a problem is often noticed quite some time
> after it actually happened, and sometimes there are messages in the log
> from that earlier point that help focus efforts and lead to a solution.
> 
> But even more importantly, with such an old version, we might have seen
> the problem before and the problem signature in the log might just "turn
> on that light". Of course, we may have something different or a previous
> problem with slightly different symptoms.
> 
> This is so important that I'll say it again: a full debug log is pretty
> much always the starting point in trying to resolve autofs problems.

Point taken about the importance of the full debug log.
If the problem turns out to be reproducible after rebooting
the machine (which I haven't done yet) then I'll post a debug
log from the time of boot up until the problem happens.


> 
>> I'm not really in a position to get debug symbols right now.
> 
> That's a bummer.
> 
>> You've done two important things for me so far.
>>
>> 1. You've showed me there is a place I can get help with my
>> autofs problems.
> 
> Sure. But your distribution's autofs maintainer should be able to go
> through the debugging exercise with you and bring his view to the list.
> That is actually important because the maintainer should be keeping an
> eye on what is happening and what new patches are appearing on
> kernel.org and be aware of the patches they have applied in their
> distribution version. This last point is the most important as I have no
> idea what the SuSE folks have applied to the package and I really
> shouldn't have to go through the exercise of finding out, although I
> have done so from time to time.


Okay.

> 
> Oh ... I can't resist ... a shameless plug.
> 
> If you're a heavy autofs user, the best distribution to use is Red Hat
> Enterprise Linux, because I keep up with what is happening with autofs
> (obviously).

If I have an RHEL system and I want to set it up to automatically
pull down the latest patches, is there a web page describing how
I can do that?  Or the same for SUSE for that matter?
I realize this isn't an automount-specific question.  I'm using
an ocean of poorly updated lab machines with different distros
and versions of Linux.  Any tips for keeping some or all
of them updated with official patches would be great.  Like I said,
I know it's off-topic, but pointers to RTFM would make my life much
easier.



>   
>> 2. You've helped me come up to speed on Linux/automount debugging
>> so that I can do an initial evaluation for any future problems
>> I run into.
> 
> I'm sure you will pick up a bit more too as time passes.
> 
>> These things are very useful to me, even if I didn't get to the
>> root of my problem.  When I get more time I will look for
>> debug symbol packages for SUSE, and/or I'll try building
>> automount from source with debugging enabled.
> 
> Yes, it is a bit hard, since a lot has happened since 5.2.
> 
>> --chris

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2010-07-23 21:22 UTC | newest]

Thread overview: 16+ messages
2010-06-21 19:17 two /net paths to the same local mount? Chris Quenelle
2010-06-22  2:05 ` Michael Loftis
2010-06-23 18:32   ` Chris Quenelle
2010-06-24  3:00     ` Ian Kent
2010-06-29 16:14       ` Chris Quenelle
2010-06-30 13:12         ` Ian Kent
2010-06-30 14:57           ` Chris Quenelle
2010-07-01  2:19             ` Ian Kent
2010-07-07  0:27               ` Chris Quenelle
2010-07-07  4:22                 ` Ian Kent
2010-07-09 22:04                   ` Chris Quenelle
2010-07-12  2:53                     ` Ian Kent
2010-07-15 20:08                       ` Chris Quenelle
2010-07-16  7:28                         ` Ian Kent
     [not found]                           ` <4C47428A.809@oracle.com>
2010-07-22  3:45                             ` Ian Kent
2010-07-23 21:22                               ` Chris Quenelle
