* RE: NFS hit me (2.4.9-xfs) again
From: Christian, Chip @ 2001-11-09 19:31 UTC
To: 'Peter Wächtler', linux-xfs; +Cc: lkml
That dput was dropped in 2.4.10 because another dput also gets executed right after the
if (!pdentry) { ... } code block, and the double dput causes the kernel oops.
I think you have a filesystem in need of repair.
RedHat ships this patch with their 2.4.9 kernels:
--- linux/fs/nfsd/nfsfh.c~	Fri Aug 17 21:30:25 2001
+++ linux/fs/nfsd/nfsfh.c	Fri Aug 17 21:30:40 2001
@@ -275,7 +275,6 @@
 	d_drop(tdentry); /* we never want ".." hashed */
 	if (!pdentry && tdentry->d_inode == NULL) {
 		/* File system cannot find ".." ... sad but possible */
-		dput(tdentry);
 		pdentry = ERR_PTR(-EINVAL);
 	}
 	if (!pdentry) {
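To make the double dput concrete, here is a minimal userspace sketch in C. It is not kernel code: toy_dentry, toy_alloc, toy_put and the two findparent_* functions are hypothetical names, and the sketch assumes the second dput is the unconditional one on the function's exit path. toy_put returns -1 at the point where the real kernel would be touching an already released dentry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a dentry: only the reference count is modelled. */
struct toy_dentry {
    int count;  /* plays the role of the dentry reference count */
    int freed;  /* set once the last reference has been dropped */
};

/* Hypothetical helper: hand out the initial reference, as an allocation would. */
static void toy_alloc(struct toy_dentry *d)
{
    d->count = 1;
    d->freed = 0;
}

/*
 * Hypothetical helper: drop one reference.
 * Returns 0 on success and -1 when the object was already released,
 * which is the situation that oopses in the real kernel.
 */
static int toy_put(struct toy_dentry *d)
{
    if (d->freed || d->count <= 0)
        return -1;
    if (--d->count == 0)
        d->freed = 1;
    return 0;
}

/* Error path with the extra put still inside the "cannot find .." branch. */
static int findparent_unpatched(struct toy_dentry *t)
{
    toy_alloc(t);
    if (toy_put(t) != 0)    /* put inside the error branch */
        return -1;
    return toy_put(t);      /* assumed unconditional put on the exit path */
}

/* Same path with the put in the error branch removed, as in the patch. */
static int findparent_patched(struct toy_dentry *t)
{
    toy_alloc(t);
    return toy_put(t);      /* only the put on the exit path remains */
}
```

In this model findparent_unpatched() returns -1 because its second put hits a released object, while findparent_patched() returns 0 and leaves the object released exactly once.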
-----Original Message-----
From: Peter Wächtler [mailto:pwaechtler@loewe-komp.de]
Sent: Friday, November 09, 2001 5:01
To: linux-xfs@oss.sgi.com
Cc: lkml
Subject: NFS hit me (2.4.9-xfs) again
Unable to handle kernel NULL pointer dereference at virtual address 00000000
00000000
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<00000000>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010286
eax: 00000000 ebx: c9c0d7e0 ecx: 00000000 edx: c03a7b00
esi: c9c0d560 edi: c9c0d7e0 ebp: c9c0d7e0 esp: cbddde84
ds: 0018 es: 0018 ss: 0018
Process nfsd (pid: 233, stackpage=cbddd000)
Stack: c01729f4 cc97f420 c9c0d560 00000005 cdc40a00 c0172e56 c9c0d7e0 00000005
cd390200 cd390000 cbed2000 cbdddf20 cdc40be8 cd390200 c0173199 cdc40a00
cd390010 00000005 00000001 00000001 00000008 cbe17e00 cbee48c0 cd390200
Call Trace: [<c01729f4>] [<c0172e56>] [<c0173199>] [<c0173ad2>] [<c017843d>]
[<c017173c>] [<c0171003>] [<c0240318>] [<c0170dab>] [<c010557f>]
Code: Bad EIP value.
>>EIP; 00000000 Before first symbol
Trace; c01729f4 <nfsd_findparent+34/104>
Trace; c0172e56 <find_fh_dentry+226/34c>
Trace; c0173199 <fh_verify+21d/438>
Trace; c0173ad2 <nfsd_lookup+92/4b8>
Trace; c017843d <nfssvc_decode_diropargs+5d/d0>
Trace; c017173c <nfsd_proc_lookup+88/9c>
Trace; c0171003 <nfsd_dispatch+cb/168>
Trace; c0240318 <svc_process+2ac/544>
Trace; c0170dab <nfsd+1ab/338>
Trace; c010557f <kernel_thread+23/30>
This is not the initial crash location - the machine was dead (and no serial
console yet). But after restarting, about 6-10 clients tried to reconnect
to NFSD and caused the crash.
The crash appears because "child->d_inode->i_op->lookup == NULL"
struct dentry *nfsd_findparent(struct dentry *child)
{
	struct dentry *tdentry, *pdentry;
	tdentry = d_alloc(child, &(const struct qstr) {"..", 2, 0});
	if (!tdentry)
		return ERR_PTR(-ENOMEM);
	/* I'm going to assume that if the returned dentry is different, then
	 * it is well connected. But nobody returns different dentrys do they?
	 */
	/* added safety check to prevent crash - peewee */
	if (child->d_inode->i_op && child->d_inode->i_op->lookup){
		pdentry = child->d_inode->i_op->lookup(child->d_inode, tdentry);
	} else {
		printk("normally we had been crashing\n");
		printk("child: %p\n",child);
		printk("child->d_inode: %p\n",child->d_inode);
		printk("child->d_inode->i_op: %p\n",child->d_inode->i_op);
		printk("child->d_inode->i_op->lookup: %p\n",child->d_inode->i_op->lookup);
		return( ERR_PTR(-EINVAL) );
	}
	d_drop(tdentry); /* we never want ".." hashed */
	if (!pdentry && tdentry->d_inode == NULL) {
		/* File system cannot find ".." ... sad but possible */
		dput(tdentry);
		^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ this one was removed in 2.4.10
		pdentry = ERR_PTR(-EINVAL);
	}
	if (!pdentry) {
		/* I don't want to return a ".." dentry.
		 * I would prefer to return an unconnected "IS_ROOT" dentry,
[...]
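The safety check above can be modelled outside the kernel. The following is only a sketch with hypothetical names (toy_inode, toy_inode_ops, guarded_lookup, TOY_EINVAL); it shows the same "is there a lookup method at all?" test, with the diagnostics written so that they never dereference a pointer that has just been found to be NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, cut-down stand-ins for the kernel structures. */
struct toy_inode;

struct toy_inode_ops {
    int (*lookup)(struct toy_inode *dir);
};

struct toy_inode {
    const struct toy_inode_ops *i_op;
};

#define TOY_EINVAL 22   /* hypothetical error value for this sketch */

/* Hypothetical lookup method that always succeeds. */
static int toy_lookup_ok(struct toy_inode *dir)
{
    (void)dir;
    return 0;
}

/*
 * Call the lookup method only when every pointer on the way to it is set.
 * Otherwise log what is known and fail with -TOY_EINVAL instead of
 * jumping through a NULL function pointer.
 */
static int guarded_lookup(struct toy_inode *inode)
{
    if (!inode || !inode->i_op || !inode->i_op->lookup) {
        fprintf(stderr, "inode: %p\n", (void *)inode);
        if (inode)  /* only dereference what is known to be non-NULL */
            fprintf(stderr, "inode->i_op: %p\n", (void *)inode->i_op);
        return -TOY_EINVAL;
    }
    return inode->i_op->lookup(inode);
}
```

With an inode whose i_op is NULL, or whose i_op has no lookup method, guarded_lookup() returns -TOY_EINVAL; with toy_lookup_ok installed it returns 0.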
If I use 2.4.12-xfs (with nfs-utils 0.3.3), clients can't create an archive with "ar":
[ strace output of "ar" creating a lib out of several *.o]
write(5, "\0\0\1\2\0\0H\2\0\0\1\2\0\0L\2\0\0\1\2\0\0P\2\0\0\1\2\0"..., 3254) = 3
close(5) = 0
munmap(0x4001f000, 4096) = 0
lstat64("lumenuila.a", {st_mode=S_IFREG|0644, st_size=8, ...}) = 0
rename("stO1wjGV", "lumenuila.a") = -1 ESTALE (Stale NFS file handle)
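The write-to-temporary-then-rename sequence in the strace can be reproduced without "ar". The helper below is a hedged sketch: write_then_rename is a hypothetical name, and the file names are chosen by the caller. Run in a directory on the NFS mount, it returns the errno of the failing step, so an ESTALE from rename() would show up as the return value.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: write buf to tmp, then rename tmp over final.
 * Returns 0 on success, otherwise the errno of the step that failed
 * (for example ESTALE if rename() fails as in the strace above).
 */
static int write_then_rename(const char *tmp, const char *final,
                             const void *buf, size_t len)
{
    int err;
    ssize_t n;
    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return errno;

    n = write(fd, buf, len);
    if (n < 0 || (size_t)n != len) {
        err = (n < 0) ? errno : EIO;    /* treat a short write as an I/O error */
        close(fd);
        unlink(tmp);
        return err;
    }
    if (close(fd) < 0) {
        err = errno;
        unlink(tmp);
        return err;
    }
    if (rename(tmp, final) < 0) {
        err = errno;
        unlink(tmp);
        return err;
    }
    return 0;
}
```

A caller could compare the result against ESTALE and print it with strerror() to see whether the server handed back a stale file handle.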
My main question is: is it possible that some interaction between xfs and nfsd
causes this kind of trouble, especially when lookup("..") fails and nfsd has to
deal with "disconnected dentries"? Does the nfs_fh not carry enough information?
(When is oldfh used, and when the newer one? [ref_fh->fh_handle.fh_version == 0xca])
So we have an inode with no proper inode_operations, huh?
I don't use NFSv3, should I?
* Re: NFS hit me (2.4.9-xfs) again
From: Peter Wächtler @ 2001-11-13 9:01 UTC
To: Christian, Chip; +Cc: linux-xfs, lkml
"Christian, Chip" wrote:
>
> That dput was dropped in 2.4.10 because another dput also gets executed right after the
> if (!pdentry) { ... } code block, and the double dput causes the kernel oops.
>
> I think you have a filesystem in need of repair.
>
> RedHat ships this patch with their 2.4.9 kernels:
>
> --- linux/fs/nfsd/nfsfh.c~	Fri Aug 17 21:30:25 2001
> +++ linux/fs/nfsd/nfsfh.c	Fri Aug 17 21:30:40 2001
> @@ -275,7 +275,6 @@
>  	d_drop(tdentry); /* we never want ".." hashed */
>  	if (!pdentry && tdentry->d_inode == NULL) {
>  		/* File system cannot find ".." ... sad but possible */
> -		dput(tdentry);
>  		pdentry = ERR_PTR(-EINVAL);
>  	}
>  	if (!pdentry) {
Yes, the patch makes sense.
I've checked both /home and /usr/local with xfs_check and with xfs_repair -n:
no warnings, no errors.
I have removed /var/tmp, which is on reiserfs, from /etc/exports,
but it was seldom used (if at all).
Thanks for your effort. I will now closely follow any nfs related patches.
* NFS hit me (2.4.9-xfs) again
From: Peter Wächtler @ 2001-11-09 10:00 UTC
To: linux-xfs; +Cc: lkml
Unable to handle kernel NULL pointer dereference at virtual address 00000000
00000000
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<00000000>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010286
eax: 00000000 ebx: c9c0d7e0 ecx: 00000000 edx: c03a7b00
esi: c9c0d560 edi: c9c0d7e0 ebp: c9c0d7e0 esp: cbddde84
ds: 0018 es: 0018 ss: 0018
Process nfsd (pid: 233, stackpage=cbddd000)
Stack: c01729f4 cc97f420 c9c0d560 00000005 cdc40a00 c0172e56 c9c0d7e0 00000005
cd390200 cd390000 cbed2000 cbdddf20 cdc40be8 cd390200 c0173199 cdc40a00
cd390010 00000005 00000001 00000001 00000008 cbe17e00 cbee48c0 cd390200
Call Trace: [<c01729f4>] [<c0172e56>] [<c0173199>] [<c0173ad2>] [<c017843d>]
[<c017173c>] [<c0171003>] [<c0240318>] [<c0170dab>] [<c010557f>]
Code: Bad EIP value.
>>EIP; 00000000 Before first symbol
Trace; c01729f4 <nfsd_findparent+34/104>
Trace; c0172e56 <find_fh_dentry+226/34c>
Trace; c0173199 <fh_verify+21d/438>
Trace; c0173ad2 <nfsd_lookup+92/4b8>
Trace; c017843d <nfssvc_decode_diropargs+5d/d0>
Trace; c017173c <nfsd_proc_lookup+88/9c>
Trace; c0171003 <nfsd_dispatch+cb/168>
Trace; c0240318 <svc_process+2ac/544>
Trace; c0170dab <nfsd+1ab/338>
Trace; c010557f <kernel_thread+23/30>
This is not the initial crash location - the machine was dead (and no serial
console yet). But after restarting, about 6-10 clients tried to reconnect
to NFSD and caused the crash.
The crash appears because "child->d_inode->i_op->lookup == NULL"
struct dentry *nfsd_findparent(struct dentry *child)
{
	struct dentry *tdentry, *pdentry;
	tdentry = d_alloc(child, &(const struct qstr) {"..", 2, 0});
	if (!tdentry)
		return ERR_PTR(-ENOMEM);
	/* I'm going to assume that if the returned dentry is different, then
	 * it is well connected. But nobody returns different dentrys do they?
	 */
	/* added safety check to prevent crash - peewee */
	if (child->d_inode->i_op && child->d_inode->i_op->lookup){
		pdentry = child->d_inode->i_op->lookup(child->d_inode, tdentry);
	} else {
		printk("normally we had been crashing\n");
		printk("child: %p\n",child);
		printk("child->d_inode: %p\n",child->d_inode);
		printk("child->d_inode->i_op: %p\n",child->d_inode->i_op);
		printk("child->d_inode->i_op->lookup: %p\n",child->d_inode->i_op->lookup);
		return( ERR_PTR(-EINVAL) );
	}
	d_drop(tdentry); /* we never want ".." hashed */
	if (!pdentry && tdentry->d_inode == NULL) {
		/* File system cannot find ".." ... sad but possible */
		dput(tdentry);
		^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ this one was removed in 2.4.10
		pdentry = ERR_PTR(-EINVAL);
	}
	if (!pdentry) {
		/* I don't want to return a ".." dentry.
		 * I would prefer to return an unconnected "IS_ROOT" dentry,
[...]
If I use 2.4.12-xfs (with nfs-utils 0.3.3), clients can't create an archive with "ar":
[ strace output of "ar" creating a lib out of several *.o]
write(5, "\0\0\1\2\0\0H\2\0\0\1\2\0\0L\2\0\0\1\2\0\0P\2\0\0\1\2\0"..., 3254) = 3
close(5) = 0
munmap(0x4001f000, 4096) = 0
lstat64("lumenuila.a", {st_mode=S_IFREG|0644, st_size=8, ...}) = 0
rename("stO1wjGV", "lumenuila.a") = -1 ESTALE (Stale NFS file handle)
My main question is: is it possible that some interaction between xfs and nfsd
causes this kind of trouble, especially when lookup("..") fails and nfsd has to
deal with "disconnected dentries"? Does the nfs_fh not carry enough information?
(When is oldfh used, and when the newer one? [ref_fh->fh_handle.fh_version == 0xca])
So we have an inode with no proper inode_operations, huh?
I don't use NFSv3, should I?