* regression, bisected: getcwd() ENOENT on NFS4...
@ 2009-10-25 23:31 Daniel J Blueman
  2009-10-26 13:19   ` Trond Myklebust
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel J Blueman @ 2009-10-25 23:31 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Linux Kernel, linux-nfs

Since 2.6.30-rc, I've been experiencing various issues relating to
getcwd() returning ENOENT on NFS4 clients. I used an over-complicated
but reliable reproducer [1] (on Karmic RC against a 2.6.32-rc5 NFS4
server) to bisect [2].

The impact of this regression is moderate (side-effects range from
benign to failure), so we should get a fix into 2.6.32 if at all
possible and strongly consider a 2.6.31 stable update.

Thanks,
  Daniel

--- [1]

$ apt-get source apt
$ cd apt-*
$ ./configure && make
[snip]
sh: getcwd() failed: No such file or directory

--- [2]

a65318bf3afc93ce49227e849d213799b072c5fd is first bad commit
commit a65318bf3afc93ce49227e849d213799b072c5fd
Author: Trond Myklebust <Trond.Myklebust@netapp.com>
Date:   Wed Mar 11 14:10:28 2009 -0400

    NFSv4: Simplify some cache consistency post-op GETATTRs

    Certain asynchronous operations such as write() do not expect
    (or care) that other metadata such as the file owner, mode, acls, ...
    change. All they want to do is update and/or check the change attribute,
    ctime, and mtime.
    By skipping the file owner and group update, we also avoid having to do a
    potential idmapper upcall for these asynchronous RPC calls.

    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

:040000 040000 0c89c426fff3ad249757e3f77327ea79902df64b
6c96fe34f548df08cd11771fbcc839b44a5607e1 M	fs
:040000 040000 6ce4d50124b9dd214c54a42bd922a2b9da10aca2
16f551da881e2d2faeb786e5c47cfe0a21f42ade M	include
-- 
Daniel J Blueman

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: regression, bisected: getcwd() ENOENT on NFS4...
@ 2009-10-26 13:19   ` Trond Myklebust
  0 siblings, 0 replies; 15+ messages in thread
From: Trond Myklebust @ 2009-10-26 13:19 UTC (permalink / raw)
  To: Daniel J Blueman; +Cc: Linux Kernel, linux-nfs

On Sun, 2009-10-25 at 23:31 +0000, Daniel J Blueman wrote:
> Since 2.6.30-rc, I've been experiencing various issues relating to
> getcwd() returning ENOENT on NFS4 clients. I used an over-complicated
> but reliable reproducer [1] (on Karmic RC against a 2.6.32-rc5 NFS4
> server) to bisect [2].
> 
> The impact of this regression is moderate (side-effects range from
> benign to failure), so we should get a fix into 2.6.32 if at all
> possible and strongly consider a 2.6.31 stable update.
> 
> Thanks,
>   Daniel
> 
> --- [1]
> 
> $ apt-get source apt
> $ cd apt-*
> $ ./configure && make
> [snip]
> sh: getcwd() failed: No such file or directory
> 
> --- [2]
> 
> a65318bf3afc93ce49227e849d213799b072c5fd is first bad commit
> commit a65318bf3afc93ce49227e849d213799b072c5fd
> Author: Trond Myklebust <Trond.Myklebust@netapp.com>
> Date:   Wed Mar 11 14:10:28 2009 -0400
> 
>     NFSv4: Simplify some cache consistency post-op GETATTRs

I'm having a lot of trouble seeing how this patch could result in
ENOENT. All it should be doing is reducing the frequency with which we
update some of the inode metadata.

Have you ever been able to capture one of these errors using strace?

Cheers
  Trond
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: regression, bisected: getcwd() ENOENT on NFS4...
@ 2009-11-01 12:47     ` Daniel J Blueman
  0 siblings, 0 replies; 15+ messages in thread
From: Daniel J Blueman @ 2009-11-01 12:47 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Linux Kernel, linux-nfs

Hi Trond,

On Mon, Oct 26, 2009 at 1:19 PM, Trond Myklebust
<Trond.Myklebust@netapp.com> wrote:
> On Sun, 2009-10-25 at 23:31 +0000, Daniel J Blueman wrote:
>> Since 2.6.30-rc, I've been experiencing various issues relating to
>> getcwd() returning ENOENT on NFS4 clients. I used an over-complicated
>> but reliable reproducer [1] (on Karmic RC against a 2.6.32-rc5 NFS4
>> server) to bisect [2].
>>
>> The impact of this regression is moderate (side-effects range from
>> benign to failure), so we should get a fix into 2.6.32 if at all
>> possible and strongly consider a 2.6.31 stable update.
>>
>> Thanks,
>>   Daniel
>>
>> --- [1]
>>
>> $ apt-get source apt
>> $ cd apt-*
>> $ ./configure && make
>> [snip]
>> sh: getcwd() failed: No such file or directory
>>
>> --- [2]
>>
>> a65318bf3afc93ce49227e849d213799b072c5fd is first bad commit
>> commit a65318bf3afc93ce49227e849d213799b072c5fd
>> Author: Trond Myklebust <Trond.Myklebust@netapp.com>
>> Date:   Wed Mar 11 14:10:28 2009 -0400
>>
>>     NFSv4: Simplify some cache consistency post-op GETATTRs
>
> I'm having a lot of trouble seeing how this patch could result in
> ENOENT. All it should be doing is reducing the frequency with which we
> update some of the inode metadata.
>
> Have you ever been able to capture one of these errors using strace?

Backing this patch out by hand against stock 2.6.32-rc5 (w/ 2.6.32-rc5
on server) corrects the behaviour. It's readily reproducible [1];
the issue is not seen with 2.6.30, so it is a regression.

To observe the change to user-level behaviour (after the reproducer commands):
# make clean
# strace -ffe getcwd make -n >list
[pid  3829] getcwd(0x7fffa269a380, 4096) = -1 ENOENT (No such file or directory)
make: getcwd: No such file or directory
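
The same failure can be checked without strace. Below is a minimal
sketch, not part of the reproducer above: probe_getcwd() is a
hypothetical helper name, and the 4096-byte buffer simply mirrors the
size seen in the strace output. It only reports what getcwd(3) returns
for the calling process's current directory.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): call getcwd() once and
 * return 0 on success, or the errno value it failed with (ENOENT in
 * the failing case reported here).
 */
int probe_getcwd(void)
{
	char buf[4096];	/* same buffer size as in the strace output */

	if (getcwd(buf, sizeof(buf)) != NULL) {
		printf("getcwd: %s\n", buf);
		return 0;
	}

	int err = errno;	/* save before stdio can clobber it */
	fprintf(stderr, "getcwd: %s\n", strerror(err));
	return err;
}
```

Calling probe_getcwd() from a small main() run inside the build
directory on the NFS4 mount should return ENOENT on an affected client
and 0 otherwise, assuming the failure is triggered the same way as in
the make run.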

Would it help if I logged this as a bugzilla.kernel.org ticket?

Thanks,
  Daniel

--- [1]

booting eg:
http://mira.sunsite.utk.edu/ubuntu-releases/karmic/ubuntu-9.10-desktop-amd64.iso

$ sudo bash
# apt-get install build-essential
# apt-get build-dep apt
# mount server:/ /mnt -tnfs4 && cd /mnt
# apt-get source apt
# cd apt-0.7.23.1ubuntu2
# ./configure && make
 -> "getcwd: No such file or directory" messages observed with cited
patch and not without
-- 
Daniel J Blueman

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: regression, bisected: getcwd() ENOENT on NFS4...
  2009-11-01 12:47     ` Daniel J Blueman
@ 2009-11-04  9:36       ` Daniel J Blueman
  -1 siblings, 0 replies; 15+ messages in thread
From: Daniel J Blueman @ 2009-11-04  9:36 UTC (permalink / raw)
  To: Linux Kernel, linux-nfs

On Sun, Nov 1, 2009 at 12:47 PM, Daniel J Blueman
<daniel.blueman@gmail.com> wrote:
> Hi Trond,
>
> On Mon, Oct 26, 2009 at 1:19 PM, Trond Myklebust
> <Trond.Myklebust@netapp.com> wrote:
>> On Sun, 2009-10-25 at 23:31 +0000, Daniel J Blueman wrote:
>>> Since 2.6.30-rc, I've been experiencing various issues relating to
>>> getcwd() returning ENOENT on NFS4 clients. I used an over-complicated
>>> but reliable reproducer [1] (on Karmic RC against a 2.6.32-rc5 NFS4
>>> server) to bisect [2].
>>>
>>> The impact of this regression is moderate (side-effects range from
>>> benign to failure), so we should get a fix into 2.6.32 if at all
>>> possible and strongly consider a 2.6.31 stable update.
>>>
>>> Thanks,
>>>   Daniel
>>>
>>> --- [1]
>>>
>>> $ apt-get source apt
>>> $ cd apt-*
>>> $ ./configure && make
>>> [snip]
>>> sh: getcwd() failed: No such file or directory
>>>
>>> --- [2]
>>>
>>> a65318bf3afc93ce49227e849d213799b072c5fd is first bad commit
>>> commit a65318bf3afc93ce49227e849d213799b072c5fd
>>> Author: Trond Myklebust <Trond.Myklebust@netapp.com>
>>> Date:   Wed Mar 11 14:10:28 2009 -0400
>>>
>>>     NFSv4: Simplify some cache consistency post-op GETATTRs
>>
>> I'm having a lot of trouble seeing how this patch could result in
>> ENOENT. All it should be doing is reducing the frequency with which we
>> update some of the inode metadata.
>>
>> Have you ever been able to capture one of these errors using strace?
>
> Backing this patch out by hand against stock 2.6.32-rc5 (w/ 2.6.32-rc5
> on server) corrects the behaviour. It's readily reproducible [1];
> using 2.6.30, the issue is not seen, thus is a regression.
>
> To observe the change to user-level behaviour (after the reproducer commands):
> # make clean
> # strace -ffe getcwd make -n >list
> [pid  3829] getcwd(0x7fffa269a380, 4096) = -1 ENOENT (No such file or directory)
> make: getcwd: No such file or directory
>
> Would this help for me to log this via a bugzilla.kernel.org ticket?
>
> Thanks,
>  Daniel
>
> --- [1]
>
> booting eg:
> http://mira.sunsite.utk.edu/ubuntu-releases/karmic/ubuntu-9.10-desktop-amd64.iso
>
> $ sudo bash
> # apt-get install build-essential
> # apt-get build-dep apt
> # mount server:/ /mnt -tnfs4 && cd /mnt
> # apt-get source apt
> # cd apt-0.7.23.1ubuntu2
> # ./configure && make
>  -> "getcwd: No such file or directory" messages observed with cited
> patch and not without

For continuity with the mailing list thread, I've created a bug report
of this at:

http://bugzilla.kernel.org/show_bug.cgi?id=14541
-- 
Daniel J Blueman

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: regression, bisected: getcwd() ENOENT on NFS4...
@ 2009-11-05 17:45         ` Trond Myklebust
  0 siblings, 0 replies; 15+ messages in thread
From: Trond Myklebust @ 2009-11-05 17:45 UTC (permalink / raw)
  To: Daniel J Blueman; +Cc: Linux Kernel, linux-nfs

On Wed, 2009-11-04 at 09:36 +0000, Daniel J Blueman wrote:
> On Sun, Nov 1, 2009 at 12:47 PM, Daniel J Blueman
> <daniel.blueman@gmail.com> wrote:
> > Hi Trond,
> >
> > On Mon, Oct 26, 2009 at 1:19 PM, Trond Myklebust
> > <Trond.Myklebust@netapp.com> wrote:
> >> On Sun, 2009-10-25 at 23:31 +0000, Daniel J Blueman wrote:
> >>> Since 2.6.30-rc, I've been experiencing various issues relating to
> >>> getcwd() returning ENOENT on NFS4 clients. I used an over-complicated
> >>> but reliable reproducer [1] (on Karmic RC against a 2.6.32-rc5 NFS4
> >>> server) to bisect [2].
> >>>
> >>> The impact of this regression is moderate (side-effects range from
> >>> benign to failure), so we should get a fix into 2.6.32 if at all
> >>> possible and strongly consider a 2.6.31 stable update.
> >>>
> >>> Thanks,
> >>>   Daniel
> >>>
> >>> --- [1]
> >>>
> >>> $ apt-get source apt
> >>> $ cd apt-*
> >>> $ ./configure && make
> >>> [snip]
> >>> sh: getcwd() failed: No such file or directory
> >>>
> >>> --- [2]
> >>>
> >>> a65318bf3afc93ce49227e849d213799b072c5fd is first bad commit
> >>> commit a65318bf3afc93ce49227e849d213799b072c5fd
> >>> Author: Trond Myklebust <Trond.Myklebust@netapp.com>
> >>> Date:   Wed Mar 11 14:10:28 2009 -0400
> >>>
> >>>     NFSv4: Simplify some cache consistency post-op GETATTRs
> >>
> >> I'm having a lot of trouble seeing how this patch could result in
> >> ENOENT. All it should be doing is reducing the frequency with which we
> >> update some of the inode metadata.
> >>
> >> Have you ever been able to capture one of these errors using strace?
> >
> > Backing this patch out by hand against stock 2.6.32-rc5 (w/ 2.6.32-rc5
> > on server) corrects the behaviour. It's readily reproducible [1];
> > using 2.6.30, the issue is not seen, thus is a regression.
> >
> > To observe the change to user-level behaviour (after the reproducer commands):
> > # make clean
> > # strace -ffe getcwd make -n >list
> > [pid  3829] getcwd(0x7fffa269a380, 4096) = -1 ENOENT (No such file or directory)
> > make: getcwd: No such file or directory
> >
> > Would this help for me to log this via a bugzilla.kernel.org ticket?
> >
> > Thanks,
> >  Daniel
> >
> > --- [1]
> >
> > booting eg:
> > http://mira.sunsite.utk.edu/ubuntu-releases/karmic/ubuntu-9.10-desktop-amd64.iso
> >
> > $ sudo bash
> > # apt-get install build-essential
> > # apt-get build-dep apt
> > # mount server:/ /mnt -tnfs4 && cd /mnt
> > # apt-get source apt
> > # cd apt-0.7.23.1ubuntu2
> > # ./configure && make
> >  -> "getcwd: No such file or directory" messages observed with cited
> > patch and not without
> 
> For continuity with the mailing list thread, I've created a bug report
> of this at:
> 
> http://bugzilla.kernel.org/show_bug.cgi?id=14541

I just committed the following patch into the above bugzilla entry. I
hope it suffices to fix the bug.

Cheers
  Trond
-------------------------------------------------------------------
NFSv4: Fix a cache validation bug which causes getcwd() to return ENOENT
From: Trond Myklebust <Trond.Myklebust@netapp.com>

Changeset a65318bf3afc93ce49227e849d213799b072c5fd (NFSv4: Simplify some
cache consistency post-op GETATTRs) incorrectly changed the getattr
bitmap for readdir(). This causes the readdir() function to fail to
return a fileid/inode number, which in turn exposed a bug in the NFS
readdir code that causes spurious ENOENT errors to appear in
applications (see http://bugzilla.kernel.org/show_bug.cgi?id=14541).

The immediate band-aid is to revert the incorrect bitmap change, but in
the longer term we should change the NFS readdir code to cope with the
fact that NFSv4 servers are not required to support fileids/inode numbers.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/nfs4proc.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)


diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index ff37454..741a562 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2767,7 +2767,7 @@ static int _nfs4_proc_readdir(struct dentry *dentry, struct rpc_cred *cred,
 		.pages = &page,
 		.pgbase = 0,
 		.count = count,
-		.bitmask = NFS_SERVER(dentry->d_inode)->cache_consistency_bitmask,
+		.bitmask = NFS_SERVER(dentry->d_inode)->attr_bitmask,
 	};
 	struct nfs4_readdir_res res;
 	struct rpc_message msg = {
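
To illustrate why the choice of bitmask matters, here is a minimal
user-space sketch, not kernel code. It assumes the reduced
cache-consistency mask carries only the change attribute, ctime and
mtime (as the original commit message describes) and that the full
mask additionally requests the fileid. The attribute numbers are the
ones assigned by RFC 3530; the macro and function names are
illustrative only and do not match the kernel's.

```c
#include <stdbool.h>
#include <stdint.h>

/* NFSv4 attribute numbers per RFC 3530 (names are illustrative). */
#define ATTR_CHANGE         3
#define ATTR_FILEID        20
#define ATTR_TIME_METADATA 52	/* ctime */
#define ATTR_TIME_MODIFY   53	/* mtime */

/* Attribute n lives in word n / 32, bit n % 32 of the bitmap. */
void bitmap_set(uint32_t bm[2], unsigned int attr)
{
	bm[attr / 32] |= UINT32_C(1) << (attr % 32);
}

bool bitmap_has(const uint32_t bm[2], unsigned int attr)
{
	return (bm[attr / 32] >> (attr % 32)) & 1;
}

/* Assumed reduced mask: change attribute, ctime and mtime only. */
void fill_consistency_mask(uint32_t bm[2])
{
	bm[0] = bm[1] = 0;
	bitmap_set(bm, ATTR_CHANGE);
	bitmap_set(bm, ATTR_TIME_METADATA);
	bitmap_set(bm, ATTR_TIME_MODIFY);
}

/* Assumed fuller mask: the above plus the fileid readdir needs. */
void fill_full_mask(uint32_t bm[2])
{
	fill_consistency_mask(bm);
	bitmap_set(bm, ATTR_FILEID);
}
```

With these assumptions, bitmap_has(mask, ATTR_FILEID) is false for the
reduced mask and true for the fuller one, which is the difference the
one-line change above restores for readdir().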



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: regression, bisected: getcwd() ENOENT on NFS4...
  2009-11-05 17:45         ` Trond Myklebust
@ 2009-11-06  0:41           ` Daniel J Blueman
  -1 siblings, 0 replies; 15+ messages in thread
From: Daniel J Blueman @ 2009-11-06  0:41 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Linux Kernel, linux-nfs

On Thu, Nov 5, 2009 at 5:45 PM, Trond Myklebust
<trond.myklebust@fys.uio.no> wrote:
> On Wed, 2009-11-04 at 09:36 +0000, Daniel J Blueman wrote:
>> On Sun, Nov 1, 2009 at 12:47 PM, Daniel J Blueman
>> <daniel.blueman@gmail.com> wrote:
>> > Hi Trond,
>> >
>> > On Mon, Oct 26, 2009 at 1:19 PM, Trond Myklebust
>> > <Trond.Myklebust@netapp.com> wrote:
>> >> On Sun, 2009-10-25 at 23:31 +0000, Daniel J Blueman wrote:
>> >>> Since 2.6.30-rc, I've been experiencing various issues relating to
>> >>> getcwd() returning ENOENT on NFS4 clients. I used an over-complicated
>> >>> but reliable reproducer [1] (on Karmic RC against a 2.6.32-rc5 NFS4
>> >>> server) to bisect [2].
>> >>>
>> >>> The impact of this regression is moderate (side-effects range from
>> >>> benign to failure), so we should get a fix into 2.6.32 if at all
>> >>> possible and strongly consider a 2.6.31 stable update.
>> >>>
>> >>> Thanks,
>> >>>   Daniel
>> >>>
>> >>> --- [1]
>> >>>
>> >>> $ apt-get source apt
>> >>> $ cd apt-*
>> >>> $ ./configure && make
>> >>> [snip]
>> >>> sh: getcwd() failed: No such file or directory
>> >>>
>> >>> --- [2]
>> >>>
>> >>> a65318bf3afc93ce49227e849d213799b072c5fd is first bad commit
>> >>> commit a65318bf3afc93ce49227e849d213799b072c5fd
>> >>> Author: Trond Myklebust <Trond.Myklebust@netapp.com>
>> >>> Date:   Wed Mar 11 14:10:28 2009 -0400
>> >>>
>> >>>     NFSv4: Simplify some cache consistency post-op GETATTRs
>> >>
>> >> I'm having a lot of trouble seeing how this patch could result in
>> >> ENOENT. All it should be doing is reducing the frequency with which we
>> >> update some of the inode metadata.
>> >>
>> >> Have you ever been able to capture one of these errors using strace?
>> >
>> > Backing this patch out by hand against stock 2.6.32-rc5 (w/ 2.6.32-rc5
>> > on server) corrects the behaviour. It's readily reproducible [1];
>> > using 2.6.30, the issue is not seen, thus is a regression.
>> >
>> > To observe the change to user-level behaviour (after the reproducer commands):
>> > # make clean
>> > # strace -ffe getcwd make -n >list
>> > [pid  3829] getcwd(0x7fffa269a380, 4096) = -1 ENOENT (No such file or directory)
>> > make: getcwd: No such file or directory
>> >
>> > Would this help for me to log this via a bugzilla.kernel.org ticket?
>> >
>> > Thanks,
>> >  Daniel
>> >
>> > --- [1]
>> >
>> > booting eg:
>> > http://mira.sunsite.utk.edu/ubuntu-releases/karmic/ubuntu-9.10-desktop-amd64.iso
>> >
>> > $ sudo bash
>> > # apt-get install build-essential
>> > # apt-get build-dep apt
>> > # mount server:/ /mnt -tnfs4 && cd /mnt
>> > # apt-get source apt
>> > # cd apt-0.7.23.1ubuntu2
>> > # ./configure && make
>> >  -> "getcwd: No such file or directory" messages observed with cited
>> > patch and not without
>>
>> For continuity with the mailing list thread, I've created a bug report
>> of this at:
>>
>> http://bugzilla.kernel.org/show_bug.cgi?id=14541
>
> I just committed the following patch into the above bugzilla entry. I
> hope it suffices to fix the bug.
>
> Cheers
>  Trond
> -------------------------------------------------------------------
> NFSv4: Fix a cache validation bug which causes getcwd() to return ENOENT
> From: Trond Myklebust <Trond.Myklebust@netapp.com>
>
> Changeset a65318bf3afc93ce49227e849d213799b072c5fd (NFSv4: Simplify some
> cache consistency post-op GETATTRs) incorrectly changed the getattr
> bitmap for readdir().
> This causes the readdir() function to fail to return a
> fileid/inode number, which again exposed a bug in the NFS readdir code that
> causes spurious ENOENT errors to appear in applications (see
> http://bugzilla.kernel.org/show_bug.cgi?id=14541).
>
> The immediate band aid is to revert the incorrect bitmap change, but more
> long term, we should change the NFS readdir code to cope with the
> fact that NFSv4 servers are not required to support fileids/inode numbers.
>
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> ---
>
>  fs/nfs/nfs4proc.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
>
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index ff37454..741a562 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -2767,7 +2767,7 @@ static int _nfs4_proc_readdir(struct dentry *dentry, struct rpc_cred *cred,
>                .pages = &page,
>                .pgbase = 0,
>                .count = count,
> -               .bitmask = NFS_SERVER(dentry->d_inode)->cache_consistency_bitmask,
> +               .bitmask = NFS_SERVER(dentry->d_inode)->attr_bitmask,
>        };
>        struct nfs4_readdir_res res;
>        struct rpc_message msg = {
>

This fixes the behaviour and passes heavy testing with two good
test cases on 2.6.32-rc6. It would also be a good candidate for the
stable stream. I've synced the bugzilla report.

Thanks for your work on this, Trond!

Daniel
-- 
Daniel J Blueman

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: regression, bisected: getcwd() ENOENT on NFS4...
@ 2009-11-13 12:52             ` Daniel J Blueman
  0 siblings, 0 replies; 15+ messages in thread
From: Daniel J Blueman @ 2009-11-13 12:52 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Linux Kernel, linux-nfs

Hi Trond,

On Fri, Nov 6, 2009 at 12:41 AM, Daniel J Blueman
<daniel.blueman@gmail.com> wrote:
> On Thu, Nov 5, 2009 at 5:45 PM, Trond Myklebust
> <trond.myklebust@fys.uio.no> wrote:
>> On Wed, 2009-11-04 at 09:36 +0000, Daniel J Blueman wrote:
>>> On Sun, Nov 1, 2009 at 12:47 PM, Daniel J Blueman
>>> <daniel.blueman@gmail.com> wrote:
>>> > Hi Trond,
>>> >
>>> > On Mon, Oct 26, 2009 at 1:19 PM, Trond Myklebust
>>> > <Trond.Myklebust@netapp.com> wrote:
>>> >> On Sun, 2009-10-25 at 23:31 +0000, Daniel J Blueman wrote:
>>> >>> Since 2.6.30-rc, I've been experiencing various issues relating to
>>> >>> getcwd() returning ENOENT on NFS4 clients. I used an over-complicated
>>> >>> but reliable reproducer [1] (on Karmic RC against a 2.6.32-rc5 NFS4
>>> >>> server) to bisect [2].
>>> >>>
>>> >>> The impact of this regression is moderate (side-effects range from
>>> >>> benign to failure), so we should get a fix into 2.6.32 if at all
>>> >>> possible and strongly consider a 2.6.31 stable update.
>>> >>>
>>> >>> Thanks,
>>> >>>   Daniel
>>> >>>
>>> >>> --- [1]
>>> >>>
>>> >>> $ apt-get source apt
>>> >>> $ cd apt-*
>>> >>> $ ./configure && make
>>> >>> [snip]
>>> >>> sh: getcwd() failed: No such file or directory
>>> >>>
>>> >>> --- [2]
>>> >>>
>>> >>> a65318bf3afc93ce49227e849d213799b072c5fd is first bad commit
>>> >>> commit a65318bf3afc93ce49227e849d213799b072c5fd
>>> >>> Author: Trond Myklebust <Trond.Myklebust@netapp.com>
>>> >>> Date:   Wed Mar 11 14:10:28 2009 -0400
>>> >>>
>>> >>>     NFSv4: Simplify some cache consistency post-op GETATTRs
>>> >>
>>> >> I'm having a lot of trouble seeing how this patch could result in
>>> >> ENOENT. All it should be doing is reducing the frequency with which we
>>> >> update some of the inode metadata.
>>> >>
>>> >> Have you ever been able to capture one of these errors using strace?
>>> >
>>> > Backing this patch out by hand against stock 2.6.32-rc5 (w/ 2.6.32-rc5
>>> > on server) corrects the behaviour. It's readily reproducible [1];
>>> > using 2.6.30, the issue is not seen, thus is a regression.
>>> >
>>> > To observe the change to user-level behaviour (after the reproducer commands):
>>> > # make clean
>>> > # strace -ffe getcwd make -n >list
>>> > [pid  3829] getcwd(0x7fffa269a380, 4096) = -1 ENOENT (No such file or directory)
>>> > make: getcwd: No such file or directory
>>> >
>>> > Would this help for me to log this via a bugzilla.kernel.org ticket?
>>> >
>>> > Thanks,
>>> >  Daniel
>>> >
>>> > --- [1]
>>> >
>>> > booting eg:
>>> > http://mira.sunsite.utk.edu/ubuntu-releases/karmic/ubuntu-9.10-desktop-amd64.iso
>>> >
>>> > $ sudo bash
>>> > # apt-get install build-essential
>>> > # apt-get build-dep apt
>>> > # mount server:/ /mnt -tnfs4 && cd /mnt
>>> > # apt-get source apt
>>> > # cd apt-0.7.23.1ubuntu2
>>> > # ./configure && make
>>> >  -> "getcwd: No such file or directory" messages observed with cited
>>> > patch and not without
>>>
>>> For continuity with the mailing list thread, I've created a bug report
>>> of this at:
>>>
>>> http://bugzilla.kernel.org/show_bug.cgi?id=14541
>>
>> I just committed the following patch into the above bugzilla entry. I
>> hope it suffices to fix the bug.
>>
>> Cheers
>>  Trond
>> -------------------------------------------------------------------
>> NFSv4: Fix a cache validation bug which causes getcwd() to return ENOENT
>> From: Trond Myklebust <Trond.Myklebust@netapp.com>
>>
>> Changeset a65318bf3afc93ce49227e849d213799b072c5fd (NFSv4: Simplify some
>> cache consistency post-op GETATTRs) incorrectly changed the getattr
>> bitmap for readdir().
>> This causes the readdir() function to fail to return a
>> fileid/inode number, which again exposed a bug in the NFS readdir code that
>> causes spurious ENOENT errors to appear in applications (see
>> http://bugzilla.kernel.org/show_bug.cgi?id=14541).
>>
>> The immediate band aid is to revert the incorrect bitmap change, but more
>> long term, we should change the NFS readdir code to cope with the
>> fact that NFSv4 servers are not required to support fileids/inode numbers.
>>
>> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
>> ---
>>
>>  fs/nfs/nfs4proc.c |    2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>>
>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>> index ff37454..741a562 100644
>> --- a/fs/nfs/nfs4proc.c
>> +++ b/fs/nfs/nfs4proc.c
>> @@ -2767,7 +2767,7 @@ static int _nfs4_proc_readdir(struct dentry *dentry, struct rpc_cred *cred,
>>                .pages = &page,
>>                .pgbase = 0,
>>                .count = count,
>> -               .bitmask = NFS_SERVER(dentry->d_inode)->cache_consistency_bitmask,
>> +               .bitmask = NFS_SERVER(dentry->d_inode)->attr_bitmask,
>>        };
>>        struct nfs4_readdir_res res;
>>        struct rpc_message msg = {
>>
>
> This fixes the behaviour and passes some heavy testing with two good
> test-cases, with 2.6.32-rc6. As well, this would be good value for the
> stable stream. I've sync'd the bugzilla report.

Is there opportunity to get this regression fix into 2.6.32-rc8, since
-rc8 may be the (pen)ultimate before final?

Thanks,
  Daniel
-- 
Daniel J Blueman

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: regression, bisected: getcwd() ENOENT on NFS4...
@ 2009-11-13 12:59               ` Trond Myklebust
  0 siblings, 0 replies; 15+ messages in thread
From: Trond Myklebust @ 2009-11-13 12:59 UTC (permalink / raw)
  To: Daniel J Blueman; +Cc: Linux Kernel, linux-nfs

On Fri, 2009-11-13 at 12:52 +0000, Daniel J Blueman wrote:
> Hi Trond,
> 
> On Fri, Nov 6, 2009 at 12:41 AM, Daniel J Blueman
> <daniel.blueman@gmail.com> wrote:
> > This fixes the behaviour and passes some heavy testing with two good
> > test-cases, with 2.6.32-rc6. As well, this would be good value for the
> > stable stream. I've sync'd the bugzilla report.
> 
> Is there opportunity to get this regression fix into 2.6.32-rc8, since
> -rc8 may be the (pen)ultimate before final?

It has been committed in the linux-next tree for testing for a few days
now. The plan is to send it on to Linus and stable@kernel.org in a
couple more days.

Cheers
  Trond


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2009-11-13 12:59 UTC | newest]

Thread overview: 15+ messages
2009-10-25 23:31 regression, bisected: getcwd() ENOENT on NFS4 Daniel J Blueman
2009-10-26 13:19 ` Trond Myklebust
2009-10-26 13:19   ` Trond Myklebust
2009-11-01 12:47   ` Daniel J Blueman
2009-11-01 12:47     ` Daniel J Blueman
2009-11-04  9:36     ` Daniel J Blueman
2009-11-04  9:36       ` Daniel J Blueman
2009-11-05 17:45       ` Trond Myklebust
2009-11-05 17:45         ` Trond Myklebust
2009-11-06  0:41         ` Daniel J Blueman
2009-11-06  0:41           ` Daniel J Blueman
2009-11-13 12:52           ` Daniel J Blueman
2009-11-13 12:52             ` Daniel J Blueman
2009-11-13 12:59             ` Trond Myklebust
2009-11-13 12:59               ` Trond Myklebust
