* 2.6.34rc4 NFS writeback regression (bisected): client often fails to delete things it just created @ 2010-04-17 19:43 Nix 2010-04-18 19:21 ` Trond Myklebust 0 siblings, 1 reply; 12+ messages in thread From: Nix @ 2010-04-17 19:43 UTC (permalink / raw) To: linux-kernel; +Cc: linux-nfs, Trond Myklebust [Trond Cc:ed as this seems to be a bug in one of your writeback-for-2.6.34 commits.] In 2.6.34rcX (tip of tree) I've started seeing this sort of thing when building over NFS (v3): [...] -- Found LibXslt: /usr/lib64/libxslt.so -- found libxml-2.0, version 2.7.6 -- Found LibXml2: /usr/lib64/libxml2.so -- Found shared-mime-info version: 0.71 -- Looking for __progname CMake Error: Remove failed on file: /usr/src/kde/x86_64-mutilate/build/CMakeFiles/CMakeTmp/CMakeFiles/cmTryCompileExec.dir/.nfs000000000031fc510000082f: System Error: Device or resource busy [... eventually, cmake fails because of this error.] The silly-renamed files are invariably no longer in use (they tend to be GCC output, ELF executables run as part of testsuites) but haven't been removed, and they -EBUSY when removal is attempted. A complete strace log of running cmake against current HEAD (with lots of these errors) is at <http://www.esperi.org.uk/~nix/temporary/strace-kdelibs-nfs-EBUSY.log.lzma>. I can do a packet capture too if you like. I also see it after doing 'make install's followed by an 'rm -rf' of the build tree: the rm -rf fails because half the files are 'in use' (they aren't). Repeating the rm -rf a few seconds later works. fuser, even as root, shows no processes holding these files open. This bisects down to commit acdc53b2146c7ee67feb1f02f7bc3020126514b8 Author: Trond Myklebust <Trond.Myklebust@netapp.com> Date: Fri Feb 19 17:03:26 2010 -0800 NFS: Replace __nfs_write_mapping with sync_inode() Now that we have correct COMMIT semantics in writeback_single_inode, we can reduce and simplify nfs_wb_all(). Also replace nfs_wb_nocommit() with a call to filemap_write_and_wait(), which doesn't need to hold the inode->i_mutex. With that done, we can eliminate nfs_write_mapping() altogether. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> I suspect that unlink()ing a not otherwise open file for which writeback is still underway is causing the files to be sillyrenamed because writeback is holding them open. If writeback is the only user, they should surely not be held open: nobody cares what their contents are, and a lot of code depends on rm -r of directories containing recently- written-but-still-closed files succeeding. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.34rc4 NFS writeback regression (bisected): client often fails to delete things it just created 2010-04-17 19:43 2.6.34rc4 NFS writeback regression (bisected): client often fails to delete things it just created Nix @ 2010-04-18 19:21 ` Trond Myklebust 2010-04-18 19:27 ` Nix 0 siblings, 1 reply; 12+ messages in thread From: Trond Myklebust @ 2010-04-18 19:21 UTC (permalink / raw) To: Nix; +Cc: linux-kernel, linux-nfs On Sat, 2010-04-17 at 20:43 +0100, Nix wrote: > [Trond Cc:ed as this seems to be a bug in one of your > writeback-for-2.6.34 commits.] > > In 2.6.34rcX (tip of tree) I've started seeing this sort of thing when > building over NFS (v3): > > [...] > -- Found LibXslt: /usr/lib64/libxslt.so > -- found libxml-2.0, version 2.7.6 > -- Found LibXml2: /usr/lib64/libxml2.so > -- Found shared-mime-info version: 0.71 > -- Looking for __progname > CMake Error: Remove failed on file: /usr/src/kde/x86_64-mutilate/build/CMakeFiles/CMakeTmp/CMakeFiles/cmTryCompileExec.dir/.nfs000000000031fc510000082f: System Error: Device or resource busy > [... eventually, cmake fails because of this error.] > > The silly-renamed files are invariably no longer in use (they tend to be > GCC output, ELF executables run as part of testsuites) but haven't been > removed, and they -EBUSY when removal is attempted. > > A complete strace log of running cmake against current HEAD (with lots > of these errors) is at > <http://www.esperi.org.uk/~nix/temporary/strace-kdelibs-nfs-EBUSY.log.lzma>. > I can do a packet capture too if you like. > > I also see it after doing 'make install's followed by an 'rm -rf' of the > build tree: the rm -rf fails because half the files are 'in use' (they > aren't). Repeating the rm -rf a few seconds later works. fuser, even as > root, shows no processes holding these files open. > > This bisects down to > > commit acdc53b2146c7ee67feb1f02f7bc3020126514b8 > Author: Trond Myklebust <Trond.Myklebust@netapp.com> > Date: Fri Feb 19 17:03:26 2010 -0800 > > NFS: Replace __nfs_write_mapping with sync_inode() > > Now that we have correct COMMIT semantics in writeback_single_inode, we can > reduce and simplify nfs_wb_all(). Also replace nfs_wb_nocommit() with a > call to filemap_write_and_wait(), which doesn't need to hold the > inode->i_mutex. > > With that done, we can eliminate nfs_write_mapping() altogether. > > Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> > > I suspect that unlink()ing a not otherwise open file for which writeback > is still underway is causing the files to be sillyrenamed because > writeback is holding them open. If writeback is the only user, they > should surely not be held open: nobody cares what their contents are, > and a lot of code depends on rm -r of directories containing recently- > written-but-still-closed files succeeding. Did you test with commit b80c3cb628f0ebc241b02e38dd028969fb8026a2 (NFS: Ensure that writeback_single_inode() calls write_inode() when syncing)? That fixed the above problem on my setup. Cheers Trond ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.34rc4 NFS writeback regression (bisected): client often fails to delete things it just created 2010-04-18 19:21 ` Trond Myklebust @ 2010-04-18 19:27 ` Nix 2010-04-18 19:59 ` Trond Myklebust 0 siblings, 1 reply; 12+ messages in thread From: Nix @ 2010-04-18 19:27 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-kernel, linux-nfs On 18 Apr 2010, Trond Myklebust verbalised: > On Sat, 2010-04-17 at 20:43 +0100, Nix wrote: >> I suspect that unlink()ing a not otherwise open file for which writeback >> is still underway is causing the files to be sillyrenamed because >> writeback is holding them open. If writeback is the only user, they >> should surely not be held open: nobody cares what their contents are, >> and a lot of code depends on rm -r of directories containing recently- >> written-but-still-closed files succeeding. > > Did you test with commit b80c3cb628f0ebc241b02e38dd028969fb8026a2 (NFS: > Ensure that writeback_single_inode() calls write_inode() when syncing)? > That fixed the above problem on my setup. tip-of-tree includes that commit, and it's still happening for me there. (Just verified again.) (The exported filesystem is coming from a box running 2.6.33 atop ext4, in case it matters.) ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.34rc4 NFS writeback regression (bisected): client often fails to delete things it just created 2010-04-18 19:27 ` Nix @ 2010-04-18 19:59 ` Trond Myklebust 2010-04-18 20:03 ` Nix 0 siblings, 1 reply; 12+ messages in thread From: Trond Myklebust @ 2010-04-18 19:59 UTC (permalink / raw) To: Nix; +Cc: linux-kernel, linux-nfs On Sun, 2010-04-18 at 20:27 +0100, Nix wrote: > On 18 Apr 2010, Trond Myklebust verbalised: > > > On Sat, 2010-04-17 at 20:43 +0100, Nix wrote: > >> I suspect that unlink()ing a not otherwise open file for which writeback > >> is still underway is causing the files to be sillyrenamed because > >> writeback is holding them open. If writeback is the only user, they > >> should surely not be held open: nobody cares what their contents are, > >> and a lot of code depends on rm -r of directories containing recently- > >> written-but-still-closed files succeeding. > > > > Did you test with commit b80c3cb628f0ebc241b02e38dd028969fb8026a2 (NFS: > > Ensure that writeback_single_inode() calls write_inode() when syncing)? > > That fixed the above problem on my setup. > > tip-of-tree includes that commit, and it's still happening for me there. > (Just verified again.) Does a 'sync' clear out the sillyrenamed files? > (The exported filesystem is coming from a box running 2.6.33 atop ext4, > in case it matters.) That shouldn't matter. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.34rc4 NFS writeback regression (bisected): client often fails to delete things it just created 2010-04-18 19:59 ` Trond Myklebust @ 2010-04-18 20:03 ` Nix 2010-04-18 20:13 ` Trond Myklebust 0 siblings, 1 reply; 12+ messages in thread From: Nix @ 2010-04-18 20:03 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-kernel, linux-nfs On 18 Apr 2010, Trond Myklebust said: > On Sun, 2010-04-18 at 20:27 +0100, Nix wrote: >> On 18 Apr 2010, Trond Myklebust verbalised: >> tip-of-tree includes that commit, and it's still happening for me there. >> (Just verified again.) > > Does a 'sync' clear out the sillyrenamed files? Yes. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.34rc4 NFS writeback regression (bisected): client often fails to delete things it just created 2010-04-18 20:03 ` Nix @ 2010-04-18 20:13 ` Trond Myklebust 2010-04-18 21:09 ` Nix 0 siblings, 1 reply; 12+ messages in thread From: Trond Myklebust @ 2010-04-18 20:13 UTC (permalink / raw) To: Nix; +Cc: linux-kernel, linux-nfs On Sun, 2010-04-18 at 21:03 +0100, Nix wrote: > On 18 Apr 2010, Trond Myklebust said: > > > On Sun, 2010-04-18 at 20:27 +0100, Nix wrote: > >> On 18 Apr 2010, Trond Myklebust verbalised: > >> tip-of-tree includes that commit, and it's still happening for me there. > >> (Just verified again.) > > > > Does a 'sync' clear out the sillyrenamed files? > > Yes. OK. Next question is how these files are being written. If you 'strace' the process, do you see ordinary write() calls being used, or is the file perhaps being written via mmap()? ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.34rc4 NFS writeback regression (bisected): client often fails to delete things it just created 2010-04-18 20:13 ` Trond Myklebust @ 2010-04-18 21:09 ` Nix 2010-04-19 13:10 ` Trond Myklebust 0 siblings, 1 reply; 12+ messages in thread From: Nix @ 2010-04-18 21:09 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-kernel, linux-nfs On 18 Apr 2010, Trond Myklebust outgrape: > OK. Next question is how these files are being written. If you 'strace' > the process, do you see ordinary write() calls being used, or is the > file perhaps being written via mmap()? (strace output is at <http://www.esperi.org.uk/~nix/temporary/strace-kdelibs-nfs-EBUSY.log.lzma> if you want to look for yourself, but it's about 70Mb unlzmaed, so I can understand if you run very fast in the opposite direction...) The files are all ELF object files: build/CMakeFiles/CMakeTmp/CMakeFiles/cmTryCompileExec.dir/.nfs000000000031c7510000159b: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), stripped ... which means they're being written by gas, and that writes them out with write() (it's easy to spot gas writing things out, because one of the first things it writes is an ELF note of the general form "GCC: (GNU) 4.4.4" which is very rare elsewhere in an strace dump.) A complete as run looks like this (I think this is one of the failing ones, only of course it's cmake's subsequent recursive unlink()/rmdir() that fails: I speculate that the non-failing ones only work because they ran early enough that writeback had completed for them...): 23109 mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fafe3703000 23109 mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fafe3682000 23109 mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fafe3601000 23109 mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fafe29f7000 23109 stat("CMakeFiles/cmTryCompileExec.dir/src.cxx.o", 0x7fffe1073b40) = -1 ENOENT (No such file or directory) 23109 open("CMakeFiles/cmTryCompileExec.dir/src.cxx.o", O_RDWR|O_CREAT|O_TRUNC, 0666) = 3 23109 fcntl(3, F_GETFD) = 0 23109 fcntl(3, F_SETFD, FD_CLOEXEC) = 0 23109 mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fafe2976000 23109 brk(0x1006000) = 0x1006000 23109 mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fafe28f5000 23109 open("/tmp/ccybr1Xe.s", O_RDONLY) = 4 23109 fstat(4, {st_mode=S_IFREG|0600, st_size=356, ...}) = 0 23109 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fafe37b3000 23109 read(4, "\t.file\t\"src.cxx\"\n\t.text\n.globl m"..., 4096) = 356 23109 read(4, "", 28672) = 0 23109 read(4, "", 32768) = 0 23109 read(4, "", 32768) = 0 23109 close(4) = 0 23109 munmap(0x7fafe37b3000, 4096) = 0 23109 brk(0x102a000) = 0x102a000 23109 fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 23109 mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fafe27f5000 23109 lseek(3, 0, SEEK_SET) = 0 23109 read(3, "", 1184) = 0 23109 lseek(3, 1184, SEEK_CUR) = 1184 23109 write(3, "\0src.cxx\0main\0", 14) = 14 23109 lseek(3, 0, SEEK_SET) = 0 23109 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1198 23109 lseek(3, -1134, SEEK_CUR) = 64 23109 write(3, "UH\211\345\270\0\0\0\0\311\303", 11) = 11 23109 lseek(3, 0, SEEK_SET) = 0 23109 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1198 23109 lseek(3, -1122, SEEK_CUR) = 76 23109 write(3, "\0GCC: (GNU) 4.4.4 20100308 (prer"..., 40) = 40 23109 lseek(3, 0, SEEK_SET) = 0 23109 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1198 23109 lseek(3, -1078, SEEK_CUR) = 120 23109 write(3, "\24\0\0\0\0\0\0\0\1zR\0\1x\20\1\33\f\7\10\220\1\0\0\34\0\0\0\34\0\0\0"..., 56) = 56 23109 lseek(3, 0, SEEK_SET) = 0 23109 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1198 23109 lseek(3, 2, SEEK_CUR) = 1200 23109 write(3, " \0\0\0\0\0\0\0\2\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0", 24) = 24 23109 lseek(3, 0, SEEK_SET) = 0 23109 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1224 23109 lseek(3, -256, SEEK_CUR) = 968 23109 write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\4\0\361\377"..., 216) = 216 23109 lseek(3, 0, SEEK_SET) = 0 23109 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1224 23109 lseek(3, -1048, SEEK_CUR) = 176 23109 write(3, "\0.symtab\0.strtab\0.shstrtab\0.text"..., 84) = 84 23109 lseek(3, 0, SEEK_SET) = 0 23109 write(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\1\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 64) = 64 23109 lseek(3, 0, SEEK_SET) = 0 23109 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\1\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1224 23109 brk(0x1021000) = 0x1021000 23109 lseek(3, -960, SEEK_CUR) = 264 23109 write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 704) = 704 23109 close(3) = 0 23109 munmap(0x7fafe27f5000, 1048576) = 0 23109 exit_group(0) = ? However, this can't be the whole picture, because this does *not* fail: mkdir foo; (cd foo; echo 'int main (void) {}' | c++ -x c++ -c -o foo.o -pipe -); rm -r foo So it's something more than just as. I wonder if the failures are caused by ld -r invocations? ... no, there are none: the relict files must therefore be the result of 'as' runs. It's a shame I can't tell what they were called before they were deleted... ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.34rc4 NFS writeback regression (bisected): client often fails to delete things it just created 2010-04-18 21:09 ` Nix @ 2010-04-19 13:10 ` Trond Myklebust 2010-04-19 18:54 ` Nix 0 siblings, 1 reply; 12+ messages in thread From: Trond Myklebust @ 2010-04-19 13:10 UTC (permalink / raw) To: Nix; +Cc: linux-kernel, linux-nfs On Sun, 2010-04-18 at 22:09 +0100, Nix wrote: > On 18 Apr 2010, Trond Myklebust outgrape: > > OK. Next question is how these files are being written. If you 'strace' > > the process, do you see ordinary write() calls being used, or is the > > file perhaps being written via mmap()? > > (strace output is at > <http://www.esperi.org.uk/~nix/temporary/strace-kdelibs-nfs-EBUSY.log.lzma> > if you want to look for yourself, but it's about 70Mb unlzmaed, so I can > understand if you run very fast in the opposite direction...) > > The files are all ELF object files: > > build/CMakeFiles/CMakeTmp/CMakeFiles/cmTryCompileExec.dir/.nfs000000000031c7510000159b: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), stripped > > ... which means they're being written by gas, and that writes them out > with write() (it's easy to spot gas writing things out, because one > of the first things it writes is an ELF note of the general form > "GCC: (GNU) 4.4.4" which is very rare elsewhere in an strace dump.) > > A complete as run looks like this (I think this is one of the failing > ones, only of course it's cmake's subsequent recursive unlink()/rmdir() > that fails: I speculate that the non-failing ones only work because they > ran early enough that writeback had completed for them...): > > 23109 mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fafe3703000 > 23109 mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fafe3682000 > 23109 mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fafe3601000 > 23109 mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fafe29f7000 > 23109 stat("CMakeFiles/cmTryCompileExec.dir/src.cxx.o", 0x7fffe1073b40) = -1 ENOENT (No such file or directory) > 23109 open("CMakeFiles/cmTryCompileExec.dir/src.cxx.o", O_RDWR|O_CREAT|O_TRUNC, 0666) = 3 > 23109 fcntl(3, F_GETFD) = 0 > 23109 fcntl(3, F_SETFD, FD_CLOEXEC) = 0 > 23109 mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fafe2976000 > 23109 brk(0x1006000) = 0x1006000 > 23109 mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fafe28f5000 > 23109 open("/tmp/ccybr1Xe.s", O_RDONLY) = 4 > 23109 fstat(4, {st_mode=S_IFREG|0600, st_size=356, ...}) = 0 > 23109 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fafe37b3000 > 23109 read(4, "\t.file\t\"src.cxx\"\n\t.text\n.globl m"..., 4096) = 356 > 23109 read(4, "", 28672) = 0 > 23109 read(4, "", 32768) = 0 > 23109 read(4, "", 32768) = 0 > 23109 close(4) = 0 > 23109 munmap(0x7fafe37b3000, 4096) = 0 > 23109 brk(0x102a000) = 0x102a000 > 23109 fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 > 23109 mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fafe27f5000 > 23109 lseek(3, 0, SEEK_SET) = 0 > 23109 read(3, "", 1184) = 0 > 23109 lseek(3, 1184, SEEK_CUR) = 1184 > 23109 write(3, "\0src.cxx\0main\0", 14) = 14 > 23109 lseek(3, 0, SEEK_SET) = 0 > 23109 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1198 > 23109 lseek(3, -1134, SEEK_CUR) = 64 > 23109 write(3, "UH\211\345\270\0\0\0\0\311\303", 11) = 11 > 23109 lseek(3, 0, SEEK_SET) = 0 > 23109 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1198 > 23109 lseek(3, -1122, SEEK_CUR) = 76 > 23109 write(3, "\0GCC: (GNU) 4.4.4 20100308 (prer"..., 40) = 40 > 23109 lseek(3, 0, SEEK_SET) = 0 > 23109 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1198 > 23109 lseek(3, -1078, SEEK_CUR) = 120 > 23109 write(3, "\24\0\0\0\0\0\0\0\1zR\0\1x\20\1\33\f\7\10\220\1\0\0\34\0\0\0\34\0\0\0"..., 56) = 56 > 23109 lseek(3, 0, SEEK_SET) = 0 > 23109 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1198 > 23109 lseek(3, 2, SEEK_CUR) = 1200 > 23109 write(3, " \0\0\0\0\0\0\0\2\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0", 24) = 24 > 23109 lseek(3, 0, SEEK_SET) = 0 > 23109 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1224 > 23109 lseek(3, -256, SEEK_CUR) = 968 > 23109 write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\4\0\361\377"..., 216) = 216 > 23109 lseek(3, 0, SEEK_SET) = 0 > 23109 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1224 > 23109 lseek(3, -1048, SEEK_CUR) = 176 > 23109 write(3, "\0.symtab\0.strtab\0.shstrtab\0.text"..., 84) = 84 > 23109 lseek(3, 0, SEEK_SET) = 0 > 23109 write(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\1\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 64) = 64 > 23109 lseek(3, 0, SEEK_SET) = 0 > 23109 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\1\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1224 > 23109 brk(0x1021000) = 0x1021000 > 23109 lseek(3, -960, SEEK_CUR) = 264 > 23109 write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 704) = 704 > 23109 close(3) = 0 > 23109 munmap(0x7fafe27f5000, 1048576) = 0 > 23109 exit_group(0) = ? > > However, this can't be the whole picture, because this does *not* fail: > > mkdir foo; (cd foo; echo 'int main (void) {}' | c++ -x c++ -c -o foo.o -pipe -); rm -r foo > > So it's something more than just as. > > I wonder if the failures are caused by ld -r invocations? > > ... no, there are none: the relict files must therefore be the result of > 'as' runs. It's a shame I can't tell what they were called before they > were deleted... Hmm... Could you please try reverting commit 2c61be0a9478258f77b66208a0c4b1f5f8161c3c (NFS: Ensure that the WRITE and COMMIT RPC calls are always uninterruptible). I wonder if that introduced a new race... Cheers Trond ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.34rc4 NFS writeback regression (bisected): client often fails to delete things it just created 2010-04-19 13:10 ` Trond Myklebust @ 2010-04-19 18:54 ` Nix 2010-04-20 12:37 ` Trond Myklebust 0 siblings, 1 reply; 12+ messages in thread From: Nix @ 2010-04-19 18:54 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-kernel, linux-nfs On 19 Apr 2010, Trond Myklebust verbalised: > Hmm... Could you please try reverting commit > 2c61be0a9478258f77b66208a0c4b1f5f8161c3c (NFS: Ensure that the WRITE and > COMMIT RPC calls are always uninterruptible). I wonder if that > introduced a new race... That seems to have fixed it. (This must be a heck of a wide race, or I'm very unlucky :) I hit it every single time, even if the machine is under heavy load...) ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.34rc4 NFS writeback regression (bisected): client often fails to delete things it just created 2010-04-19 18:54 ` Nix @ 2010-04-20 12:37 ` Trond Myklebust 2010-04-20 22:19 ` Nix 2010-04-21 22:13 ` Nix 0 siblings, 2 replies; 12+ messages in thread From: Trond Myklebust @ 2010-04-20 12:37 UTC (permalink / raw) To: Nix; +Cc: linux-kernel, linux-nfs On Mon, 2010-04-19 at 19:54 +0100, Nix wrote: > On 19 Apr 2010, Trond Myklebust verbalised: > > Hmm... Could you please try reverting commit > > 2c61be0a9478258f77b66208a0c4b1f5f8161c3c (NFS: Ensure that the WRITE and > > COMMIT RPC calls are always uninterruptible). I wonder if that > > introduced a new race... > > That seems to have fixed it. > > (This must be a heck of a wide race, or I'm very unlucky :) I hit it > every single time, even if the machine is under heavy load...) Hmm... Unfortunately I don't think that reverting the patch is sufficient to completely close the race. For instance, it is entirely possible for the background flush threads to schedule a COMMIT which can race with the close() call in a similar fashion. I therefore think we need something like the following patch. Cheers Trond ------------------------------------------------------------------------------------------------- NFS: Fix an unstable write data integrity race From: Trond Myklebust <Trond.Myklebust@netapp.com> Commit 2c61be0a9478258f77b66208a0c4b1f5f8161c3c (NFS: Ensure that the WRITE and COMMIT RPC calls are always uninterruptible) exposed a race on file close. In order to ensure correct close-to-open behaviour, we want to wait for all outstanding background commit operations to complete. This patch adds an inode flag that indicates if a commit operation is under way, and provides a mechanism to allow ->write_inode() to wait for its completion if this is a data integrity flush. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> --- fs/nfs/write.c | 36 ++++++++++++++++++++++++++++++++---- include/linux/nfs_fs.h | 1 + 2 files changed, 33 insertions(+), 4 deletions(-) diff --git a/fs/nfs/write.c b/fs/nfs/write.c index de38d63..ccde2ae 100644 --- a/fs/nfs/write.c +++ b/fs/nfs/write.c @@ -1201,6 +1201,25 @@ int nfs_writeback_done(struct rpc_task *task, struct nfs_write_data *data) #if defined(CONFIG_NFS_V3) || defined(CONFIG_NFS_V4) +static int nfs_commit_set_lock(struct nfs_inode *nfsi, int may_wait) +{ + if (!test_and_set_bit(NFS_INO_COMMIT, &nfsi->flags)) + return 1; + if (may_wait && !out_of_line_wait_on_bit_lock(&nfsi->flags, + NFS_INO_COMMIT, nfs_wait_bit_killable, + TASK_KILLABLE)) + return 1; + return 0; +} + +static void nfs_commit_clear_lock(struct nfs_inode *nfsi) +{ + clear_bit(NFS_INO_COMMIT, &nfsi->flags); + smp_mb__after_clear_bit(); + wake_up_bit(&nfsi->flags, NFS_INO_COMMIT); +} + + static void nfs_commitdata_release(void *data) { struct nfs_write_data *wdata = data; @@ -1262,8 +1281,6 @@ static int nfs_commit_rpcsetup(struct list_head *head, task = rpc_run_task(&task_setup_data); if (IS_ERR(task)) return PTR_ERR(task); - if (how & FLUSH_SYNC) - rpc_wait_for_completion_task(task); rpc_put_task(task); return 0; } @@ -1294,6 +1311,7 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how) BDI_RECLAIMABLE); nfs_clear_page_tag_locked(req); } + nfs_commit_clear_lock(NFS_I(inode)); return -ENOMEM; } @@ -1349,6 +1367,7 @@ static void nfs_commit_release(void *calldata) next: nfs_clear_page_tag_locked(req); } + nfs_commit_clear_lock(NFS_I(data->inode)); nfs_commitdata_release(calldata); } @@ -1363,8 +1382,11 @@ static const struct rpc_call_ops nfs_commit_ops = { static int nfs_commit_inode(struct inode *inode, int how) { LIST_HEAD(head); - int res; + int may_wait = how & FLUSH_SYNC; + int res = 0; + if (!nfs_commit_set_lock(NFS_I(inode), may_wait)) + goto out; spin_lock(&inode->i_lock); res = nfs_scan_commit(inode, &head, 0, 0); spin_unlock(&inode->i_lock); @@ -1372,7 +1394,13 @@ static int nfs_commit_inode(struct inode *inode, int how) int error = nfs_commit_list(inode, &head, how); if (error < 0) return error; - } + if (may_wait) + wait_on_bit(&NFS_I(inode)->flags, NFS_INO_COMMIT, + nfs_wait_bit_killable, + TASK_KILLABLE); + } else + nfs_commit_clear_lock(NFS_I(inode)); +out: return res; } diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index 1a0b85a..07ce460 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -209,6 +209,7 @@ struct nfs_inode { #define NFS_INO_FLUSHING (4) /* inode is flushing out data */ #define NFS_INO_FSCACHE (5) /* inode can be cached by FS-Cache */ #define NFS_INO_FSCACHE_LOCK (6) /* FS-Cache cookie management lock */ +#define NFS_INO_COMMIT (7) /* inode is committing unstable writes */ static inline struct nfs_inode *NFS_I(const struct inode *inode) { ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: 2.6.34rc4 NFS writeback regression (bisected): client often fails to delete things it just created 2010-04-20 12:37 ` Trond Myklebust @ 2010-04-20 22:19 ` Nix 2010-04-21 22:13 ` Nix 1 sibling, 0 replies; 12+ messages in thread From: Nix @ 2010-04-20 22:19 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-kernel, linux-nfs On 20 Apr 2010, Trond Myklebust spake thusly: > I therefore think we need something like the following patch. OK. It'll be 24 hours before I can test this, I'm afraid (real life intervenes). ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.34rc4 NFS writeback regression (bisected): client often fails to delete things it just created 2010-04-20 12:37 ` Trond Myklebust 2010-04-20 22:19 ` Nix @ 2010-04-21 22:13 ` Nix 1 sibling, 0 replies; 12+ messages in thread From: Nix @ 2010-04-21 22:13 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-kernel, linux-nfs On 20 Apr 2010, Trond Myklebust uttered the following: > I therefore think we need something like the following patch. I tried three cmakes in succession: the problem, at least in this manifestation, appears to be gone. Thanks for the fix! (Races are always horrible: it's impressive to me that you diagnosed it correctly first time...) ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2010-04-21 22:14 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2010-04-17 19:43 2.6.34rc4 NFS writeback regression (bisected): client often fails to delete things it just created Nix 2010-04-18 19:21 ` Trond Myklebust 2010-04-18 19:27 ` Nix 2010-04-18 19:59 ` Trond Myklebust 2010-04-18 20:03 ` Nix 2010-04-18 20:13 ` Trond Myklebust 2010-04-18 21:09 ` Nix 2010-04-19 13:10 ` Trond Myklebust 2010-04-19 18:54 ` Nix 2010-04-20 12:37 ` Trond Myklebust 2010-04-20 22:19 ` Nix 2010-04-21 22:13 ` Nix
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).