linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* 2.6.34rc4 NFS writeback regression (bisected): client often fails to delete things it just created
@ 2010-04-17 19:43 Nix
  2010-04-18 19:21 ` Trond Myklebust
  0 siblings, 1 reply; 12+ messages in thread
From: Nix @ 2010-04-17 19:43 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-nfs, Trond Myklebust

[Trond Cc:ed as this seems to be a bug in one of your
 writeback-for-2.6.34 commits.]

In 2.6.34rcX (tip of tree) I've started seeing this sort of thing when
building over NFS (v3):

[...]
-- Found LibXslt: /usr/lib64/libxslt.so
--   found libxml-2.0, version 2.7.6
-- Found LibXml2: /usr/lib64/libxml2.so
-- Found shared-mime-info version: 0.71
-- Looking for __progname
CMake Error: Remove failed on file: /usr/src/kde/x86_64-mutilate/build/CMakeFiles/CMakeTmp/CMakeFiles/cmTryCompileExec.dir/.nfs000000000031fc510000082f: System Error: Device or resource busy
[... eventually, cmake fails because of this error.]

The silly-renamed files are invariably no longer in use (they tend to be
GCC output, ELF executables run as part of testsuites) but haven't been
removed, and they -EBUSY when removal is attempted.

A complete strace log of running cmake against current HEAD (with lots
of these errors) is at
<http://www.esperi.org.uk/~nix/temporary/strace-kdelibs-nfs-EBUSY.log.lzma>.
I can do a packet capture too if you like.

I also see it after doing 'make install's followed by an 'rm -rf' of the
build tree: the rm -rf fails because half the files are 'in use' (they
aren't). Repeating the rm -rf a few seconds later works. fuser, even as
root, shows no processes holding these files open.

This bisects down to

commit acdc53b2146c7ee67feb1f02f7bc3020126514b8
Author: Trond Myklebust <Trond.Myklebust@netapp.com>
Date:   Fri Feb 19 17:03:26 2010 -0800

    NFS: Replace __nfs_write_mapping with sync_inode()
    
    Now that we have correct COMMIT semantics in writeback_single_inode, we can
    reduce and simplify nfs_wb_all(). Also replace nfs_wb_nocommit() with a
    call to filemap_write_and_wait(), which doesn't need to hold the
    inode->i_mutex.
    
    With that done, we can eliminate nfs_write_mapping() altogether.
    
    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

I suspect that unlink()ing a not otherwise open file for which writeback
is still underway is causing the files to be sillyrenamed because
writeback is holding them open. If writeback is the only user, they
should surely not be held open: nobody cares what their contents are,
and a lot of code depends on rm -r of directories containing recently-
written-but-still-closed files succeeding.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.34rc4 NFS writeback regression (bisected): client often fails to delete things it just created
  2010-04-17 19:43 2.6.34rc4 NFS writeback regression (bisected): client often fails to delete things it just created Nix
@ 2010-04-18 19:21 ` Trond Myklebust
  2010-04-18 19:27   ` Nix
  0 siblings, 1 reply; 12+ messages in thread
From: Trond Myklebust @ 2010-04-18 19:21 UTC (permalink / raw)
  To: Nix; +Cc: linux-kernel, linux-nfs

On Sat, 2010-04-17 at 20:43 +0100, Nix wrote: 
> [Trond Cc:ed as this seems to be a bug in one of your
>  writeback-for-2.6.34 commits.]
> 
> In 2.6.34rcX (tip of tree) I've started seeing this sort of thing when
> building over NFS (v3):
> 
> [...]
> -- Found LibXslt: /usr/lib64/libxslt.so
> --   found libxml-2.0, version 2.7.6
> -- Found LibXml2: /usr/lib64/libxml2.so
> -- Found shared-mime-info version: 0.71
> -- Looking for __progname
> CMake Error: Remove failed on file: /usr/src/kde/x86_64-mutilate/build/CMakeFiles/CMakeTmp/CMakeFiles/cmTryCompileExec.dir/.nfs000000000031fc510000082f: System Error: Device or resource busy
> [... eventually, cmake fails because of this error.]
> 
> The silly-renamed files are invariably no longer in use (they tend to be
> GCC output, ELF executables run as part of testsuites) but haven't been
> removed, and they -EBUSY when removal is attempted.
> 
> A complete strace log of running cmake against current HEAD (with lots
> of these errors) is at
> <http://www.esperi.org.uk/~nix/temporary/strace-kdelibs-nfs-EBUSY.log.lzma>.
> I can do a packet capture too if you like.
> 
> I also see it after doing 'make install's followed by an 'rm -rf' of the
> build tree: the rm -rf fails because half the files are 'in use' (they
> aren't). Repeating the rm -rf a few seconds later works. fuser, even as
> root, shows no processes holding these files open.
> 
> This bisects down to
> 
> commit acdc53b2146c7ee67feb1f02f7bc3020126514b8
> Author: Trond Myklebust <Trond.Myklebust@netapp.com>
> Date:   Fri Feb 19 17:03:26 2010 -0800
> 
>     NFS: Replace __nfs_write_mapping with sync_inode()
>     
>     Now that we have correct COMMIT semantics in writeback_single_inode, we can
>     reduce and simplify nfs_wb_all(). Also replace nfs_wb_nocommit() with a
>     call to filemap_write_and_wait(), which doesn't need to hold the
>     inode->i_mutex.
>     
>     With that done, we can eliminate nfs_write_mapping() altogether.
>     
>     Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> 
> I suspect that unlink()ing a not otherwise open file for which writeback
> is still underway is causing the files to be sillyrenamed because
> writeback is holding them open. If writeback is the only user, they
> should surely not be held open: nobody cares what their contents are,
> and a lot of code depends on rm -r of directories containing recently-
> written-but-still-closed files succeeding.

Did you test with commit b80c3cb628f0ebc241b02e38dd028969fb8026a2 (NFS:
Ensure that writeback_single_inode() calls write_inode() when syncing)?
That fixed the above problem on my setup.

Cheers
  Trond

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.34rc4 NFS writeback regression (bisected): client often fails to delete things it just created
  2010-04-18 19:21 ` Trond Myklebust
@ 2010-04-18 19:27   ` Nix
  2010-04-18 19:59     ` Trond Myklebust
  0 siblings, 1 reply; 12+ messages in thread
From: Nix @ 2010-04-18 19:27 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-kernel, linux-nfs

On 18 Apr 2010, Trond Myklebust verbalised:

> On Sat, 2010-04-17 at 20:43 +0100, Nix wrote: 
>> I suspect that unlink()ing a not otherwise open file for which writeback
>> is still underway is causing the files to be sillyrenamed because
>> writeback is holding them open. If writeback is the only user, they
>> should surely not be held open: nobody cares what their contents are,
>> and a lot of code depends on rm -r of directories containing recently-
>> written-but-still-closed files succeeding.
>
> Did you test with commit b80c3cb628f0ebc241b02e38dd028969fb8026a2 (NFS:
> Ensure that writeback_single_inode() calls write_inode() when syncing)?
> That fixed the above problem on my setup.

tip-of-tree includes that commit, and it's still happening for me there.
(Just verified again.)

(The exported filesystem is coming from a box running 2.6.33 atop ext4,
in case it matters.)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.34rc4 NFS writeback regression (bisected): client often fails to delete things it just created
  2010-04-18 19:27   ` Nix
@ 2010-04-18 19:59     ` Trond Myklebust
  2010-04-18 20:03       ` Nix
  0 siblings, 1 reply; 12+ messages in thread
From: Trond Myklebust @ 2010-04-18 19:59 UTC (permalink / raw)
  To: Nix; +Cc: linux-kernel, linux-nfs

On Sun, 2010-04-18 at 20:27 +0100, Nix wrote: 
> On 18 Apr 2010, Trond Myklebust verbalised:
> 
> > On Sat, 2010-04-17 at 20:43 +0100, Nix wrote: 
> >> I suspect that unlink()ing a not otherwise open file for which writeback
> >> is still underway is causing the files to be sillyrenamed because
> >> writeback is holding them open. If writeback is the only user, they
> >> should surely not be held open: nobody cares what their contents are,
> >> and a lot of code depends on rm -r of directories containing recently-
> >> written-but-still-closed files succeeding.
> >
> > Did you test with commit b80c3cb628f0ebc241b02e38dd028969fb8026a2 (NFS:
> > Ensure that writeback_single_inode() calls write_inode() when syncing)?
> > That fixed the above problem on my setup.
> 
> tip-of-tree includes that commit, and it's still happening for me there.
> (Just verified again.)

Does a 'sync' clear out the sillyrenamed files?

> (The exported filesystem is coming from a box running 2.6.33 atop ext4,
> in case it matters.)

That shouldn't matter.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.34rc4 NFS writeback regression (bisected): client often fails to delete things it just created
  2010-04-18 19:59     ` Trond Myklebust
@ 2010-04-18 20:03       ` Nix
  2010-04-18 20:13         ` Trond Myklebust
  0 siblings, 1 reply; 12+ messages in thread
From: Nix @ 2010-04-18 20:03 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-kernel, linux-nfs

On 18 Apr 2010, Trond Myklebust said:

> On Sun, 2010-04-18 at 20:27 +0100, Nix wrote: 
>> On 18 Apr 2010, Trond Myklebust verbalised:
>> tip-of-tree includes that commit, and it's still happening for me there.
>> (Just verified again.)
>
> Does a 'sync' clear out the sillyrenamed files?

Yes.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.34rc4 NFS writeback regression (bisected): client often fails to delete things it just created
  2010-04-18 20:03       ` Nix
@ 2010-04-18 20:13         ` Trond Myklebust
  2010-04-18 21:09           ` Nix
  0 siblings, 1 reply; 12+ messages in thread
From: Trond Myklebust @ 2010-04-18 20:13 UTC (permalink / raw)
  To: Nix; +Cc: linux-kernel, linux-nfs

On Sun, 2010-04-18 at 21:03 +0100, Nix wrote: 
> On 18 Apr 2010, Trond Myklebust said:
> 
> > On Sun, 2010-04-18 at 20:27 +0100, Nix wrote: 
> >> On 18 Apr 2010, Trond Myklebust verbalised:
> >> tip-of-tree includes that commit, and it's still happening for me there.
> >> (Just verified again.)
> >
> > Does a 'sync' clear out the sillyrenamed files?
> 
> Yes.

OK. Next question is how these files are being written. If you 'strace'
the process, do you see ordinary write() calls being used, or is the
file perhaps being written via mmap()?


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.34rc4 NFS writeback regression (bisected): client often fails to delete things it just created
  2010-04-18 20:13         ` Trond Myklebust
@ 2010-04-18 21:09           ` Nix
  2010-04-19 13:10             ` Trond Myklebust
  0 siblings, 1 reply; 12+ messages in thread
From: Nix @ 2010-04-18 21:09 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-kernel, linux-nfs

On 18 Apr 2010, Trond Myklebust outgrape:
> OK. Next question is how these files are being written. If you 'strace'
> the process, do you see ordinary write() calls being used, or is the
> file perhaps being written via mmap()?

(strace output is at
<http://www.esperi.org.uk/~nix/temporary/strace-kdelibs-nfs-EBUSY.log.lzma>
if you want to look for yourself, but it's about 70Mb unlzmaed, so I can
understand if you run very fast in the opposite direction...)

The files are all ELF object files:

build/CMakeFiles/CMakeTmp/CMakeFiles/cmTryCompileExec.dir/.nfs000000000031c7510000159b: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), stripped

... which means they're being written by gas, and that writes them out
with write() (it's easy to spot gas writing things out, because one
of the first things it writes is an ELF note of the general form
"GCC: (GNU) 4.4.4" which is very rare elsewhere in an strace dump.)

A complete as run looks like this (I think this is one of the failing
ones, only of course it's cmake's subsequent recursive unlink()/rmdir()
that fails: I speculate that the non-failing ones only work because they
ran early enough that writeback had completed for them...):

23109 mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fafe3703000
23109 mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fafe3682000
23109 mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fafe3601000
23109 mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fafe29f7000
23109 stat("CMakeFiles/cmTryCompileExec.dir/src.cxx.o", 0x7fffe1073b40) = -1 ENOENT (No such file or directory)
23109 open("CMakeFiles/cmTryCompileExec.dir/src.cxx.o", O_RDWR|O_CREAT|O_TRUNC, 0666) = 3
23109 fcntl(3, F_GETFD)                 = 0
23109 fcntl(3, F_SETFD, FD_CLOEXEC)     = 0
23109 mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fafe2976000
23109 brk(0x1006000)                    = 0x1006000
23109 mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fafe28f5000
23109 open("/tmp/ccybr1Xe.s", O_RDONLY) = 4
23109 fstat(4, {st_mode=S_IFREG|0600, st_size=356, ...}) = 0
23109 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fafe37b3000
23109 read(4, "\t.file\t\"src.cxx\"\n\t.text\n.globl m"..., 4096) = 356
23109 read(4, "", 28672)                = 0
23109 read(4, "", 32768)                = 0
23109 read(4, "", 32768)                = 0
23109 close(4)                          = 0
23109 munmap(0x7fafe37b3000, 4096)      = 0
23109 brk(0x102a000)                    = 0x102a000
23109 fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
23109 mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fafe27f5000
23109 lseek(3, 0, SEEK_SET)             = 0
23109 read(3, "", 1184)                 = 0
23109 lseek(3, 1184, SEEK_CUR)          = 1184
23109 write(3, "\0src.cxx\0main\0", 14) = 14
23109 lseek(3, 0, SEEK_SET)             = 0
23109 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1198
23109 lseek(3, -1134, SEEK_CUR)         = 64
23109 write(3, "UH\211\345\270\0\0\0\0\311\303", 11) = 11
23109 lseek(3, 0, SEEK_SET)             = 0
23109 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1198
23109 lseek(3, -1122, SEEK_CUR)         = 76
23109 write(3, "\0GCC: (GNU) 4.4.4 20100308 (prer"..., 40) = 40
23109 lseek(3, 0, SEEK_SET)             = 0
23109 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1198
23109 lseek(3, -1078, SEEK_CUR)         = 120
23109 write(3, "\24\0\0\0\0\0\0\0\1zR\0\1x\20\1\33\f\7\10\220\1\0\0\34\0\0\0\34\0\0\0"..., 56) = 56
23109 lseek(3, 0, SEEK_SET)             = 0
23109 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1198
23109 lseek(3, 2, SEEK_CUR)             = 1200
23109 write(3, " \0\0\0\0\0\0\0\2\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0", 24) = 24
23109 lseek(3, 0, SEEK_SET)             = 0
23109 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1224
23109 lseek(3, -256, SEEK_CUR)          = 968
23109 write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\4\0\361\377"..., 216) = 216
23109 lseek(3, 0, SEEK_SET)             = 0
23109 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1224
23109 lseek(3, -1048, SEEK_CUR)         = 176
23109 write(3, "\0.symtab\0.strtab\0.shstrtab\0.text"..., 84) = 84
23109 lseek(3, 0, SEEK_SET)             = 0
23109 write(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\1\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 64) = 64
23109 lseek(3, 0, SEEK_SET)             = 0
23109 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\1\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1224
23109 brk(0x1021000)                    = 0x1021000
23109 lseek(3, -960, SEEK_CUR)          = 264
23109 write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 704) = 704
23109 close(3)                          = 0
23109 munmap(0x7fafe27f5000, 1048576)   = 0
23109 exit_group(0)                     = ?

However, this can't be the whole picture, because this does *not* fail:

mkdir foo; (cd foo; echo 'int main (void) {}' | c++ -x c++ -c -o foo.o -pipe -); rm -r foo

So it's something more than just as.

I wonder if the failures are caused by ld -r invocations?

... no, there are none: the relict files must therefore be the result of
'as' runs. It's a shame I can't tell what they were called before they
were deleted...

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.34rc4 NFS writeback regression (bisected): client often fails to delete things it just created
  2010-04-18 21:09           ` Nix
@ 2010-04-19 13:10             ` Trond Myklebust
  2010-04-19 18:54               ` Nix
  0 siblings, 1 reply; 12+ messages in thread
From: Trond Myklebust @ 2010-04-19 13:10 UTC (permalink / raw)
  To: Nix; +Cc: linux-kernel, linux-nfs

On Sun, 2010-04-18 at 22:09 +0100, Nix wrote: 
> On 18 Apr 2010, Trond Myklebust outgrape:
> > OK. Next question is how these files are being written. If you 'strace'
> > the process, do you see ordinary write() calls being used, or is the
> > file perhaps being written via mmap()?
> 
> (strace output is at
> <http://www.esperi.org.uk/~nix/temporary/strace-kdelibs-nfs-EBUSY.log.lzma>
> if you want to look for yourself, but it's about 70Mb unlzmaed, so I can
> understand if you run very fast in the opposite direction...)
> 
> The files are all ELF object files:
> 
> build/CMakeFiles/CMakeTmp/CMakeFiles/cmTryCompileExec.dir/.nfs000000000031c7510000159b: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), stripped
> 
> ... which means they're being written by gas, and that writes them out
> with write() (it's easy to spot gas writing things out, because one
> of the first things it writes is an ELF note of the general form
> "GCC: (GNU) 4.4.4" which is very rare elsewhere in an strace dump.)
> 
> A complete as run looks like this (I think this is one of the failing
> ones, only of course it's cmake's subsequent recursive unlink()/rmdir()
> that fails: I speculate that the non-failing ones only work because they
> ran early enough that writeback had completed for them...):
> 
> 23109 mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fafe3703000
> 23109 mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fafe3682000
> 23109 mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fafe3601000
> 23109 mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fafe29f7000
> 23109 stat("CMakeFiles/cmTryCompileExec.dir/src.cxx.o", 0x7fffe1073b40) = -1 ENOENT (No such file or directory)
> 23109 open("CMakeFiles/cmTryCompileExec.dir/src.cxx.o", O_RDWR|O_CREAT|O_TRUNC, 0666) = 3
> 23109 fcntl(3, F_GETFD)                 = 0
> 23109 fcntl(3, F_SETFD, FD_CLOEXEC)     = 0
> 23109 mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fafe2976000
> 23109 brk(0x1006000)                    = 0x1006000
> 23109 mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fafe28f5000
> 23109 open("/tmp/ccybr1Xe.s", O_RDONLY) = 4
> 23109 fstat(4, {st_mode=S_IFREG|0600, st_size=356, ...}) = 0
> 23109 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fafe37b3000
> 23109 read(4, "\t.file\t\"src.cxx\"\n\t.text\n.globl m"..., 4096) = 356
> 23109 read(4, "", 28672)                = 0
> 23109 read(4, "", 32768)                = 0
> 23109 read(4, "", 32768)                = 0
> 23109 close(4)                          = 0
> 23109 munmap(0x7fafe37b3000, 4096)      = 0
> 23109 brk(0x102a000)                    = 0x102a000
> 23109 fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> 23109 mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fafe27f5000
> 23109 lseek(3, 0, SEEK_SET)             = 0
> 23109 read(3, "", 1184)                 = 0
> 23109 lseek(3, 1184, SEEK_CUR)          = 1184
> 23109 write(3, "\0src.cxx\0main\0", 14) = 14
> 23109 lseek(3, 0, SEEK_SET)             = 0
> 23109 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1198
> 23109 lseek(3, -1134, SEEK_CUR)         = 64
> 23109 write(3, "UH\211\345\270\0\0\0\0\311\303", 11) = 11
> 23109 lseek(3, 0, SEEK_SET)             = 0
> 23109 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1198
> 23109 lseek(3, -1122, SEEK_CUR)         = 76
> 23109 write(3, "\0GCC: (GNU) 4.4.4 20100308 (prer"..., 40) = 40
> 23109 lseek(3, 0, SEEK_SET)             = 0
> 23109 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1198
> 23109 lseek(3, -1078, SEEK_CUR)         = 120
> 23109 write(3, "\24\0\0\0\0\0\0\0\1zR\0\1x\20\1\33\f\7\10\220\1\0\0\34\0\0\0\34\0\0\0"..., 56) = 56
> 23109 lseek(3, 0, SEEK_SET)             = 0
> 23109 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1198
> 23109 lseek(3, 2, SEEK_CUR)             = 1200
> 23109 write(3, " \0\0\0\0\0\0\0\2\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0", 24) = 24
> 23109 lseek(3, 0, SEEK_SET)             = 0
> 23109 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1224
> 23109 lseek(3, -256, SEEK_CUR)          = 968
> 23109 write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\4\0\361\377"..., 216) = 216
> 23109 lseek(3, 0, SEEK_SET)             = 0
> 23109 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1224
> 23109 lseek(3, -1048, SEEK_CUR)         = 176
> 23109 write(3, "\0.symtab\0.strtab\0.shstrtab\0.text"..., 84) = 84
> 23109 lseek(3, 0, SEEK_SET)             = 0
> 23109 write(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\1\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 64) = 64
> 23109 lseek(3, 0, SEEK_SET)             = 0
> 23109 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\1\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1224
> 23109 brk(0x1021000)                    = 0x1021000
> 23109 lseek(3, -960, SEEK_CUR)          = 264
> 23109 write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 704) = 704
> 23109 close(3)                          = 0
> 23109 munmap(0x7fafe27f5000, 1048576)   = 0
> 23109 exit_group(0)                     = ?
> 
> However, this can't be the whole picture, because this does *not* fail:
> 
> mkdir foo; (cd foo; echo 'int main (void) {}' | c++ -x c++ -c -o foo.o -pipe -); rm -r foo
> 
> So it's something more than just as.
> 
> I wonder if the failures are caused by ld -r invocations?
> 
> ... no, there are none: the relict files must therefore be the result of
> 'as' runs. It's a shame I can't tell what they were called before they
> were deleted...

Hmm... Could you please try reverting commit
2c61be0a9478258f77b66208a0c4b1f5f8161c3c (NFS: Ensure that the WRITE and
COMMIT RPC calls are always uninterruptible). I wonder if that
introduced a new race...

Cheers
  Trond

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.34rc4 NFS writeback regression (bisected): client often fails to delete things it just created
  2010-04-19 13:10             ` Trond Myklebust
@ 2010-04-19 18:54               ` Nix
  2010-04-20 12:37                 ` Trond Myklebust
  0 siblings, 1 reply; 12+ messages in thread
From: Nix @ 2010-04-19 18:54 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-kernel, linux-nfs

On 19 Apr 2010, Trond Myklebust verbalised:
> Hmm... Could you please try reverting commit
> 2c61be0a9478258f77b66208a0c4b1f5f8161c3c (NFS: Ensure that the WRITE and
> COMMIT RPC calls are always uninterruptible). I wonder if that
> introduced a new race...

That seems to have fixed it.

(This must be a heck of a wide race, or I'm very unlucky :) I hit it
every single time, even if the machine is under heavy load...)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.34rc4 NFS writeback regression (bisected): client often fails to delete things it just created
  2010-04-19 18:54               ` Nix
@ 2010-04-20 12:37                 ` Trond Myklebust
  2010-04-20 22:19                   ` Nix
  2010-04-21 22:13                   ` Nix
  0 siblings, 2 replies; 12+ messages in thread
From: Trond Myklebust @ 2010-04-20 12:37 UTC (permalink / raw)
  To: Nix; +Cc: linux-kernel, linux-nfs

On Mon, 2010-04-19 at 19:54 +0100, Nix wrote: 
> On 19 Apr 2010, Trond Myklebust verbalised:
> > Hmm... Could you please try reverting commit
> > 2c61be0a9478258f77b66208a0c4b1f5f8161c3c (NFS: Ensure that the WRITE and
> > COMMIT RPC calls are always uninterruptible). I wonder if that
> > introduced a new race...
> 
> That seems to have fixed it.
> 
> (This must be a heck of a wide race, or I'm very unlucky :) I hit it
> every single time, even if the machine is under heavy load...)

Hmm... Unfortunately I don't think that reverting the patch is
sufficient to completely close the race. For instance, it is entirely
possible for the background flush threads to schedule a COMMIT which can
race with the close() call in a similar fashion.

I therefore think we need something like the following patch.

Cheers
  Trond
------------------------------------------------------------------------------------------------- 
NFS: Fix an unstable write data integrity race

From: Trond Myklebust <Trond.Myklebust@netapp.com>

Commit 2c61be0a9478258f77b66208a0c4b1f5f8161c3c (NFS: Ensure that the WRITE
and COMMIT RPC calls are always uninterruptible) exposed a race on file
close. In order to ensure correct close-to-open behaviour, we want to wait
for all outstanding background commit operations to complete.

This patch adds an inode flag that indicates if a commit operation is under
way, and provides a mechanism to allow ->write_inode() to wait for its
completion if this is a data integrity flush.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/write.c         |   36 ++++++++++++++++++++++++++++++++----
 include/linux/nfs_fs.h |    1 +
 2 files changed, 33 insertions(+), 4 deletions(-)


diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index de38d63..ccde2ae 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1201,6 +1201,25 @@ int nfs_writeback_done(struct rpc_task *task, struct nfs_write_data *data)
 

 #if defined(CONFIG_NFS_V3) || defined(CONFIG_NFS_V4)
+static int nfs_commit_set_lock(struct nfs_inode *nfsi, int may_wait)
+{
+	if (!test_and_set_bit(NFS_INO_COMMIT, &nfsi->flags))
+		return 1;
+	if (may_wait && !out_of_line_wait_on_bit_lock(&nfsi->flags,
+				NFS_INO_COMMIT, nfs_wait_bit_killable,
+				TASK_KILLABLE))
+		return 1;
+	return 0;
+}
+
+static void nfs_commit_clear_lock(struct nfs_inode *nfsi)
+{
+	clear_bit(NFS_INO_COMMIT, &nfsi->flags);
+	smp_mb__after_clear_bit();
+	wake_up_bit(&nfsi->flags, NFS_INO_COMMIT);
+}
+
+
 static void nfs_commitdata_release(void *data)
 {
 	struct nfs_write_data *wdata = data;
@@ -1262,8 +1281,6 @@ static int nfs_commit_rpcsetup(struct list_head *head,
 	task = rpc_run_task(&task_setup_data);
 	if (IS_ERR(task))
 		return PTR_ERR(task);
-	if (how & FLUSH_SYNC)
-		rpc_wait_for_completion_task(task);
 	rpc_put_task(task);
 	return 0;
 }
@@ -1294,6 +1311,7 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how)
 				BDI_RECLAIMABLE);
 		nfs_clear_page_tag_locked(req);
 	}
+	nfs_commit_clear_lock(NFS_I(inode));
 	return -ENOMEM;
 }
 
@@ -1349,6 +1367,7 @@ static void nfs_commit_release(void *calldata)
 	next:
 		nfs_clear_page_tag_locked(req);
 	}
+	nfs_commit_clear_lock(NFS_I(data->inode));
 	nfs_commitdata_release(calldata);
 }
 
@@ -1363,8 +1382,11 @@ static const struct rpc_call_ops nfs_commit_ops = {
 static int nfs_commit_inode(struct inode *inode, int how)
 {
 	LIST_HEAD(head);
-	int res;
+	int may_wait = how & FLUSH_SYNC;
+	int res = 0;
 
+	if (!nfs_commit_set_lock(NFS_I(inode), may_wait))
+		goto out;
 	spin_lock(&inode->i_lock);
 	res = nfs_scan_commit(inode, &head, 0, 0);
 	spin_unlock(&inode->i_lock);
@@ -1372,7 +1394,13 @@ static int nfs_commit_inode(struct inode *inode, int how)
 		int error = nfs_commit_list(inode, &head, how);
 		if (error < 0)
 			return error;
-	}
+		if (may_wait)
+			wait_on_bit(&NFS_I(inode)->flags, NFS_INO_COMMIT,
+					nfs_wait_bit_killable,
+					TASK_KILLABLE);
+	} else
+		nfs_commit_clear_lock(NFS_I(inode));
+out:
 	return res;
 }
 
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 1a0b85a..07ce460 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -209,6 +209,7 @@ struct nfs_inode {
 #define NFS_INO_FLUSHING	(4)		/* inode is flushing out data */
 #define NFS_INO_FSCACHE		(5)		/* inode can be cached by FS-Cache */
 #define NFS_INO_FSCACHE_LOCK	(6)		/* FS-Cache cookie management lock */
+#define NFS_INO_COMMIT		(7)		/* inode is committing unstable writes */
 
 static inline struct nfs_inode *NFS_I(const struct inode *inode)
 {


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: 2.6.34rc4 NFS writeback regression (bisected): client often fails to delete things it just created
  2010-04-20 12:37                 ` Trond Myklebust
@ 2010-04-20 22:19                   ` Nix
  2010-04-21 22:13                   ` Nix
  1 sibling, 0 replies; 12+ messages in thread
From: Nix @ 2010-04-20 22:19 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-kernel, linux-nfs

On 20 Apr 2010, Trond Myklebust spake thusly:
> I therefore think we need something like the following patch.

OK. It'll be 24 hours before I can test this, I'm afraid (real life
intervenes).

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.34rc4 NFS writeback regression (bisected): client often fails to delete things it just created
  2010-04-20 12:37                 ` Trond Myklebust
  2010-04-20 22:19                   ` Nix
@ 2010-04-21 22:13                   ` Nix
  1 sibling, 0 replies; 12+ messages in thread
From: Nix @ 2010-04-21 22:13 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-kernel, linux-nfs

On 20 Apr 2010, Trond Myklebust uttered the following:
> I therefore think we need something like the following patch.

I tried three cmakes in succession: the problem, at least in this
manifestation, appears to be gone.

Thanks for the fix! (Races are always horrible: it's impressive to me
that you diagnosed it correctly first time...)

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2010-04-21 22:14 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-04-17 19:43 2.6.34rc4 NFS writeback regression (bisected): client often fails to delete things it just created Nix
2010-04-18 19:21 ` Trond Myklebust
2010-04-18 19:27   ` Nix
2010-04-18 19:59     ` Trond Myklebust
2010-04-18 20:03       ` Nix
2010-04-18 20:13         ` Trond Myklebust
2010-04-18 21:09           ` Nix
2010-04-19 13:10             ` Trond Myklebust
2010-04-19 18:54               ` Nix
2010-04-20 12:37                 ` Trond Myklebust
2010-04-20 22:19                   ` Nix
2010-04-21 22:13                   ` Nix

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).