linux-kernel.vger.kernel.org archive mirror
* Re: NULL pointer dereference in autofs4_expire_wait
       [not found] <525736C7.9080400@gmail.com>
@ 2013-10-11  2:06 ` Ian Kent
  2013-10-11  9:55   ` Ian Kent
  0 siblings, 1 reply; 5+ messages in thread
From: Ian Kent @ 2013-10-11  2:06 UTC
  To: David Ahern; +Cc: autofs, viro, linux-kernel

On Thu, 2013-10-10 at 17:22 -0600, David Ahern wrote:
> Running 3.12-rc3 just hit BUG in autofs4_expire_wait

It doesn't look like this could be due to Al's change to the locking in
autofs4_wait(), and that's the only change to autofs that I'm aware of.

Could you do a bisect please?

> 
> [787422.065405] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
> [787422.065567] IP: [<ffffffff812722d8>] autofs4_expire_wait+0x38/0x120
> [787422.065659] PGD 163bdb067 PUD 163bbc067 PMD 0
> [787422.065744] Oops: 0000 [#1] SMP
> [787422.065825] Modules linked in: binfmt_misc nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache bridge stp llc ipt_MASQUERADE xt_nat iptable_nat nf_nat_ipv4 nf_nat xt_physdev nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_multiport nfsd lockd nfs_acl auth_rpcgss sunrpc ipmi_si ipmi_msghandler vhost_net iTCO_wdt macvtap macvlan vhost iTCO_vendor_support pcspkr i7core_edac lpc_ich mfd_core tun edac_core bnx2 hpwdt microcode acpi_power_meter oid_registry kvm_intel kvm usb_storage hpsa ttm drm_kms_helper drm i2c_algo_bit i2c_core
> [787422.066557] CPU: 10 PID: 20498 Comm: sed Not tainted 3.12.0-rc3+ #8
> [787422.066640] Hardware name: HP ProLiant DL380 G6, BIOS P62 05/05/2011
> [787422.066722] task: ffff88030e941790 ti: ffff880182a16000 task.ti: ffff880182a16000
> [787422.066872] RIP: 0010:[<ffffffff812722d8>]  [<ffffffff812722d8>] autofs4_expire_wait+0x38/0x120
> [787422.067029] RSP: 0000:ffff880182a17aa8  EFLAGS: 00010246
> [787422.067121] RAX: 00000000b1acb1ac RBX: ffff8802e1056a80 RCX: 0000000000000010
> [787422.067270] RDX: 000000000000b1ac RSI: ffffffff81c3e3e0 RDI: ffff88060e187d98
> [787422.067457] RBP: ffff880182a17ad8 R08: 0000000000000000 R09: ffffffff811a5748
> [787422.067607] R10: ff030306ff030001 R11: ffffffffffffffff R12: ffff88060e187d00
> [787422.067758] R13: 0000000000000000 R14: 0000000000637461 R15: ffff8802e1056a80
> [787422.067909] FS:  0000000000000000(0000) GS:ffff880313ca0000(0000) knlGS:0000000000000000
> [787422.068061] CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
> [787422.068141] CR2: 0000000000000010 CR3: 000000010f106000 CR4: 00000000000027e0
> [787422.068302] Stack:
> [787422.068414]  ffff880182a17af8 ffffffff810768fe 0000000000000100 ffff8802e1056a80
> [787422.068575]  ffff88060e187dc0 ffff88060e187dc0 ffff880182a17b48 ffffffff8126f5fc
> [787422.068736]  0000000000000000 ffff880192afb890 ffff8802e1056ab8 0000000392afb890
> [787422.068896] Call Trace:
> [787422.068976]  [<ffffffff810768fe>] ? prepare_to_wait+0x5e/0x90
> [787422.069060]  [<ffffffff8126f5fc>] do_expire_wait+0x17c/0x190
> [787422.069142]  [<ffffffff8126f9a4>] autofs4_d_manage+0xb4/0x170
> [787422.069227]  [<ffffffff8119af4d>] follow_managed+0xcd/0x2c0
> [787422.069323]  [<ffffffff8162a3f3>] lookup_slow+0x7b/0xaa
> [787422.069441]  [<ffffffff8119c4fa>] link_path_walk+0x34a/0x8d0
> [787422.069524]  [<ffffffff811a67d1>] ? dput+0x31/0x1f0
> [787422.069606]  [<ffffffff811aeac9>] ? mntput_no_expire+0x49/0x140
> [787422.069690]  [<ffffffff8119c0bc>] ? path_init+0x30c/0x400
> [787422.069772]  [<ffffffff8119cad8>] path_lookupat+0x58/0x740
> [787422.069856]  [<ffffffff8117a153>] ? kmem_cache_alloc+0x1c3/0x200
> [787422.069939]  [<ffffffff8117a12d>] ? kmem_cache_alloc+0x19d/0x200
> [787422.071815]  [<ffffffff8119d1f4>] filename_lookup+0x34/0xc0
> [787422.071898]  [<ffffffff811a0a59>] user_path_at_empty+0x59/0xa0
> [787422.071981]  [<ffffffff811a0b73>] ? do_filp_open+0x43/0xa0
> [787422.072064]  [<ffffffff811a0ab1>] user_path_at+0x11/0x20
> [787422.072146]  [<ffffffff81195761>] vfs_fstatat+0x51/0xb0
> [787422.072228]  [<ffffffff8119588b>] vfs_stat+0x1b/0x20
> [787422.072311]  [<ffffffff8104d6ca>] sys32_stat64+0x1a/0x40
> [787422.072453]  [<ffffffff8118f74a>] ? do_sys_open+0x1aa/0x220
> [787422.072539]  [<ffffffff8163cf49>] ia32_do_call+0x13/0x13
> [787422.072619] Code: 48 89 5d e8 4c 89 65 f0 48 89 fb 4c 89 6d f8 48 8b 47 68 4c 8b 6f 78 4c 8b a0 00 03 00 00 49 8d bc 24 98 00 00 00 e8 78 0d 3c 00 <41> f6 45 10 01 74 61 66 41 83 84 24 98 00 00 00 01 f6 05 52 4a
> [787422.073004] RIP  [<ffffffff812722d8>] autofs4_expire_wait+0x38/0x120
> [787422.073089]  RSP <ffff880182a17aa8>
> [787422.073164] CR2: 0000000000000010
> [787422.073595] ---[ end trace c75e278f6383bf9a ]---




* Re: NULL pointer dereference in autofs4_expire_wait
  2013-10-11  2:06 ` NULL pointer dereference in autofs4_expire_wait Ian Kent
@ 2013-10-11  9:55   ` Ian Kent
  2013-10-11 13:29     ` David Ahern
  0 siblings, 1 reply; 5+ messages in thread
From: Ian Kent @ 2013-10-11  9:55 UTC
  To: David Ahern; +Cc: autofs, viro, linux-kernel

On Fri, 2013-10-11 at 10:06 +0800, Ian Kent wrote:
> On Thu, 2013-10-10 at 17:22 -0600, David Ahern wrote:
> > Running 3.12-rc3 just hit BUG in autofs4_expire_wait
> 
> It doesn't look like this could be due to Al's change to the locking in
> autofs4_wait(), and that's the only change to autofs that I'm aware of.
> 
> Could you do a bisect please?

Of course that assumes it's repeatable.
Is it?

Can you provide any information about the environment and activity that
was happening at the time of the BUG()?
 




* Re: NULL pointer dereference in autofs4_expire_wait
  2013-10-11  9:55   ` Ian Kent
@ 2013-10-11 13:29     ` David Ahern
  2013-10-12  1:56       ` Ian Kent
  0 siblings, 1 reply; 5+ messages in thread
From: David Ahern @ 2013-10-11 13:29 UTC
  To: Ian Kent; +Cc: autofs, viro, linux-kernel

On 10/11/13 3:55 AM, Ian Kent wrote:
> On Fri, 2013-10-11 at 10:06 +0800, Ian Kent wrote:
>> On Thu, 2013-10-10 at 17:22 -0600, David Ahern wrote:
>>> Running 3.12-rc3 just hit BUG in autofs4_expire_wait
>>
>> It doesn't look like this could be due to Al's change to the locking in
>> autofs4_wait(), and that's the only change to autofs that I'm aware of.
>>
>> Could you do a bisect please?
>
> Of course that assumes it's repeatable.
> Is it?
>
> Can you provide any information about the environment and activity that
> was happening at the time of the BUG()?

The system was up and running for 9 days before hitting the BUG. After
that, with 3 CPUs in softlockup, I had to force a reboot. After the
reboot I resumed the workload without a repeat incident (yet),
so I am not sure a bisect is going to be possible.

This is a corporate environment where practically everything is in an 
automount. Specific to this problem I was repeatedly building a 
workspace in one window, using cscope in another and checking code 
against a different workspace in a third -- all 3 of those were 
different automounts and different NAS servers.

From objdump on vmlinux, the line in question is fs/autofs4/expire.c:465

     if (ino->flags & AUTOFS_INF_EXPIRING) {
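
A minimal sketch of how the fault address relates to that line. It
assumes (the layout is not shown in this thread) that flags sits after
two pointer-sized fields at the start of the autofs info struct. The
struct and helper names are hypothetical. With a NULL ino, reading
ino->flags touches address 0 plus the field offset, which on a 64-bit
build matches the CR2 value of 0x10 in the oops:

```c
#include <stddef.h>

/* Hypothetical stand-in for the leading fields of the autofs info
 * struct; the real definition in the kernel tree may differ. */
struct ino_sketch {
    void *dentry;   /* assumed first field */
    void *inode;    /* assumed second field */
    int flags;      /* field tested against AUTOFS_INF_EXPIRING */
};

/* Address the CPU would touch when loading ino->flags, given the
 * numeric value of the ino pointer. */
static size_t flags_fault_address(size_t ino_base)
{
    return ino_base + offsetof(struct ino_sketch, flags);
}
```

Calling flags_fault_address(0) models the NULL case. A fault address
this small suggests a NULL base pointer plus a field offset rather than
a wild pointer.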

I will be continuing the sequence above today (working through compile 
problems for an OS port). I will bump the kernel to top of tree and see 
if it repeats.

David


* Re: NULL pointer dereference in autofs4_expire_wait
  2013-10-11 13:29     ` David Ahern
@ 2013-10-12  1:56       ` Ian Kent
  2013-10-13 20:13         ` David Ahern
  0 siblings, 1 reply; 5+ messages in thread
From: Ian Kent @ 2013-10-12  1:56 UTC
  To: David Ahern; +Cc: autofs, viro, linux-kernel

On Fri, 2013-10-11 at 07:29 -0600, David Ahern wrote:
> On 10/11/13 3:55 AM, Ian Kent wrote:
> > On Fri, 2013-10-11 at 10:06 +0800, Ian Kent wrote:
> >> On Thu, 2013-10-10 at 17:22 -0600, David Ahern wrote:
> >>> Running 3.12-rc3 just hit BUG in autofs4_expire_wait
> >>
> >> It doesn't look like this could be due to Al's change to the locking in
> >> autofs4_wait(), and that's the only change to autofs that I'm aware of.
> >>
> >> Could you do a bisect please?
> >
> > Of course that assumes it's repeatable.
> > Is it?
> >
> > Can you provide any information about the environment and activity that
> > was happening at the time of the BUG()?
> 
> The system was up and running for 9 days before hitting the BUG. After
> that, with 3 CPUs in softlockup, I had to force a reboot. After the
> reboot I resumed the workload without a repeat incident (yet),
> so I am not sure a bisect is going to be possible.

Yeah, it isn't repeatable.

> 
> This is a corporate environment where practically everything is in an 
> automount. Specific to this problem I was repeatedly building a 
> workspace in one window, using cscope in another and checking code 
> against a different workspace in a third -- all 3 of those were 
> different automounts and different NAS servers.
> 
> From objdump on vmlinux, the line in question is fs/autofs4/expire.c:465
> 
>      if (ino->flags & AUTOFS_INF_EXPIRING) {

Right, there haven't been changes to the autofs kernel code that affect
the reference counting of dentries, so I have to conclude this is being
caused by other changes.

When walking an autofs path, the walk should always be put into refwalk
mode, so the function containing this line should always have a dentry
with a reference held. That means the autofs info struct (ino here)
should still be valid.

Now ->d_release() (which frees ino) is only called after the dentry
reference count falls to zero and the dentry is going away.

We can't check ino for NULL here because the dentry pointer to it isn't
set to NULL when it's freed in ->d_release(). Setting the dentry field
to NULL is futile because the next thing the VFS does is to free the
dentry itself. Well, it calls RCU to schedule the free anyway.
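
The lifetime rule described above can be sketched in userspace C. This
is only a model under stated assumptions: dentry_sketch, ino_sketch and
the helpers are hypothetical stand-ins, not the VFS or autofs API, and
the free is simulated with a flag so the sketch never touches freed
memory.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the autofs info struct. */
struct ino_sketch {
    int flags;
    bool freed;     /* models "this memory has been freed" */
};

/* Hypothetical stand-in for a dentry. */
struct dentry_sketch {
    int count;                  /* reference count */
    struct ino_sketch *fsdata;  /* pointer to the info struct */
};

/* Models ->d_release(): the info struct is freed, but the dentry's
 * pointer to it is deliberately left as it was. */
static void release_sketch(struct dentry_sketch *d)
{
    d->fsdata->freed = true;    /* real code would free the memory */
}

/* Take a reference. */
static void dget_sketch(struct dentry_sketch *d)
{
    d->count++;
}

/* Drop a reference; the release hook runs only when the count hits 0. */
static void dput_sketch(struct dentry_sketch *d)
{
    if (--d->count == 0)
        release_sketch(d);
}

/* A NULL test on the pointer still passes after release, so it cannot
 * tell a live info struct from a freed one. */
static bool null_check_passes(const struct dentry_sketch *d)
{
    return d->fsdata != NULL;
}
```

In this model, a caller that holds its own reference never sees the
release hook run underneath it. Once the hook has run,
null_check_passes() still returns true, which is why a NULL test would
not help here.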

The fact that ->d_release() has been called makes me think there's a
reference counting problem somewhere in the VFS.

Al, is my thinking correct here?

There were some significant changes to this area of the VFS in 3.11 by
the look of it.

So more history please: had you used 3.11 for an extended amount of
time before using the 3.12-rc? IOW, what's your kernel version use
history?

Ian




* Re: NULL pointer dereference in autofs4_expire_wait
  2013-10-12  1:56       ` Ian Kent
@ 2013-10-13 20:13         ` David Ahern
  0 siblings, 0 replies; 5+ messages in thread
From: David Ahern @ 2013-10-13 20:13 UTC
  To: Ian Kent; +Cc: autofs, viro, linux-kernel

On 10/11/13 7:56 PM, Ian Kent wrote:
> So more history please: had you used 3.11 for an extended amount of
> time before using the 3.12-rc? IOW, what's your kernel version use
> history?

I think I did have a 3.11 installed, but was not doing a heavy 
nfs/autofs load. That started in the past week.

David

